樹洞 Tree Hole 2.0

Reading, Caffeine, Alcohol, Peanuts, Cynicism…

Ten Years Ago Today - May 3, 2017


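Ten years ago, when the Web 2.0 craze was still in full swing, I joined the crowd and started a blog. I had just begun studying data mining, organizing the literature and taking notes as I went. In a burst of enthusiasm, I tidied the notes up a little and put them on the blog.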

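Probably nobody cares anymore about the historical significance of the 1989 IJCAI. That naive earnestness back then was really rather endearing.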

Data mining is the evolution of a field with a long history. The term “data mining” emerged in the late ’80s, and data mining research has flourished since the 1990s. Many believe that the birth of data mining (or knowledge discovery) traces back to the 1989 IJCAI workshop on Knowledge Discovery in Databases, which took place in Detroit, Michigan, USA. The report was published in AI Magazine, and the BibTeX entry can be found at the ACM Digital Library. The content of the document can be found at KDnuggets. (The proceedings of the conference may also be of interest.)

Machine Learning vs. Statistics - March 27, 2017


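As machine learning draws more and more public attention, takes on how it differs from statistics keep multiplying. I have seen and heard all sorts of biases and opinions myself, both in written material and in discussions (read: shouting matches). One small surprise: KDnuggets, a big name in data mining circles, actually got Aatash Shah, who comes from an investment banking background, to explain how machine learning and statistics differ.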

Source: SAS Institute – A Venn diagram that shows how machine learning and statistics are related

Put the textbook definitions together with the general public's impression, and the summary looks like this:

Machine learning is all about predictions, supervised learning, unsupervised learning, etc.

Statistics is about sample, population, hypothesis, etc.

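Aatash Shah then says that statistics is a branch of mathematics, while the theory and techniques of machine learning derive from artificial intelligence.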

Machine learning is a subfield of computer science and artificial intelligence. It deals with building systems that can learn from data, instead of explicitly programmed instructions.

statistical model, on the other hand, is a subfield of mathematics.

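Since the two trace their inner training to different schools and have been tempered along different lines, how would a machine learning expert and a statistician each describe the same thing? Honestly, the example below does not make it easy to tell them apart.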

ML professional: “The model is 85% accurate in predicting Y, given a, b and c.”

Statistician: “The model is 85% accurate in predicting Y, given a, b and c; and I am 90% certain that you will obtain the same result.”
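To make the contrast concrete, here is a minimal sketch of the two reports. It reads the statistician's "90% certain" loosely as a 90% confidence interval around the accuracy, which is my interpretation and not something the KDnuggets article spells out. The counts (170 correct out of 200) and the function name `accuracy_with_interval` are made up for illustration, and the Wilson score interval is just one reasonable choice of interval.

```python
from math import sqrt
from statistics import NormalDist


def accuracy_with_interval(correct, total, confidence=0.90):
    """Return (accuracy, lower, upper) using the Wilson score interval.

    Hypothetical helper for illustration; not taken from the article.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, center - half, center + half


# Made-up evaluation result: 170 correct predictions out of 200 test cases.
acc, low, high = accuracy_with_interval(correct=170, total=200)

# "ML professional" style: the point estimate only.
print(f"The model is {acc:.0%} accurate.")

# "Statistician" style: the same estimate plus its uncertainty.
print(f"The model is {acc:.0%} accurate; 90% interval: [{low:.1%}, {high:.1%}].")
```

Both lines report the same 85%; the second only adds how far that number might move on another sample of the same size, which is about all the difference the example in the article shows.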

The final conclusion:

The difference between the two has reduced significantly over the past decade. Both the branches have learned from each other a lot and will continue to come closer together in the future.

But, understanding the association and knowing their differences enables machine learners and statisticians to expand their knowledge and even apply methods outside their domain of expertise. This is the notion of “data science” itself, which aims to bridge the gap.

So it seems there really is no difference!? Leave it to someone in investing to pull that off.

Source: Machine Learning vs Statistics – KDnuggets

Should We Be Worried? - March 14, 2017


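A few days ago the MIT Technology Review website ran an article in which deep learning heavyweight Yann LeCun argues that machines can use computer vision to extract "common sense" level knowledge from large volumes of video. Another article discussed how machine learning can help judges decide cases.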

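The titles of these two articles alone were enough to give me chills. With artificial intelligence advancing so quickly today, the balance between IA (intelligence augmentation) and AI (artificial intelligence) described in John Markoff's book Machines of Loving Grace seems to be swinging ever faster and ever wider.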

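After reading these two articles, I can't help wondering: as the balance swings between the IA and AI approaches, will there be a winner and a loser? And whichever side wins, how would the ultimate effect on humanity differ?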

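After AlphaGo's debut, Kai-Fu Lee wrote an article titled "What Is the Real Threat of Artificial Intelligence to Humanity?" (《人工智慧對人類真正的威脅是什麼?》), and I feel his view on AI leans slightly toward the IA side. But if machines can observe the characteristics and constraints of things from large volumes of video (really heading toward common sense?), that would truly be a big step on the road to "learning": not augmentation or amplification, but intelligence.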

One of the things we really want to do is get machines to acquire the very large number of facts that represent the constraints of the real world just by observing it through video or other channels. That’s what would allow them to acquire common sense, in the end. These are things that animals and babies learn in the first few months of life—you learn a ridiculously large amount about the world just by observation.

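Last year someone explained why Facebook's advertising and recommendation algorithms are different (alas, I have forgotten the source): Facebook does not need to profile what kind of person you are, because it simply knows who you are. Yann LeCun is now the head of Facebook's AI research division. If Facebook's research has taken such a big step forward, how could I not feel a chill?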

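Seriously, whether artificial intelligence is a "threat" to humanity is something everyone sees differently, and it is genuinely hard to say. There can be no answer right now, but even needless worrying makes for good conversation. Vox had its columnist Sean Illing ask a dozen or so experts, "How worried should we be about artificial intelligence?" You could say the answers are poles apart, or you could say each answer is determined by the person's position and industry. The issue of AI taking jobs is bound to come up again and again, and views on the pace of technological progress are bound to differ.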

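Of course, someone is bound to urge everyone to take the threat of AI seriously. The first to open fire is Nick Bostrom, the philosopher from the University of Oxford: AI might cause a major incident some day, so how can we not be careful?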

The transition to machine superintelligence is a very grave matter, and we should take seriously the possibility that things could go radically wrong. This should motivate having some top talent in mathematics and computer science research the problems of AI safety and AI control.

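Andrew Ng, who in recent years has stolen the spotlight whenever data mining, big data, or AI comes up, says bluntly that our descendants may need to worry about this, but that worrying about it now is like worrying about a corruption case on Mars. A very Chinese-flavored answer. Perhaps because he joined Baidu, reads Chinese material often, and has absorbed China's anti-corruption slogans, it slipped into the conversation.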

Worrying about evil-killer AI today is like worrying about overpopulation on the planet Mars. Perhaps it’ll be a problem someday, but we haven’t even landed on the planet yet. This hype has been unnecessarily distracting everyone from the much bigger problem AI creates, which is job displacement.

I think the best answer, and also the most chicken-soup-for-the-soul answer, is probably the declaration from MIT's Daniela Rus!

It’s understandable that people have fears and anxieties about AI, and, as researchers, we have a duty to recognize those fears and provide different perspectives and solutions. I am optimistic about the future of AI in enabling people and machines to work together to make our lives better.

None of It Is True - February 26, 2017


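The image below is an illustration from a product white paper from 2009; its original source is unknown. Its deeper meaning stays fresh with time.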

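According to data analysis, the people living in my neighborhood all like non-spicy food, prefer tennis to golf, subscribe to at least one news magazine, own 30-odd ties, never buy lemon-scented products, and have a heavy-duty machine tool in the basement.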

None of these are true; the one below is what's true, ha ha ha…

Data Mining Doesn’t Cure Stupidity — January 25, 2017


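In late 2007, I came across "Data mining doesn’t cure stupidity" on the blog Data Strategy. The post was very short, but the title was so punchy that it has stuck with me ever since, and I often use the line to remind myself.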

Data mining, when done correctly, can improve understanding and provide insight, but data mining just doesn’t work under stupid assumptions.

At the end of that year, I naturally wrote this into my year-end review. At the close of that post, I concluded:

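For what a fool's many thoughts are worth, my pick for the most meaningful article of this year is Data Strategy's "Data Mining doesn’t cure stupidity". The post is not long, and its first paragraph concisely lays out the truth that a clear head matters more than technique: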

…..

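Replace "Data mining" in that passage with any other noun and it still holds. However many rankings there are and however elaborate the reviews, a clear head is what matters most. As the year ends and we look to the next, this is the most important takeaway. Remember it, remember it.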

~ originally published at blurkerlab.blogspot.com on Dec 16, 2007

To keep it short, and to say it once more: whatever you do, a clear head is what matters most.


Also, Data Strategy has stopped updating, so to be safe I took a screenshot as a keepsake. Noted for the record.
