The Turing test: Can a computer pass for a human? - Alex Gendler

What is consciousness? Can an artificial machine really think? Does the mind just consist of neurons in the brain, or is there some intangible spark at its core? For many, these have been vital considerations for the future of artificial intelligence. But British computer scientist Alan Turing decided to disregard all these questions in favor of a much simpler one: can a computer talk like a human? This question led to an idea for measuring artificial intelligence that would famously come to be known as the Turing test.

In his 1950 paper, "Computing Machinery and Intelligence," Turing proposed the following game. A human judge has a text conversation with unseen players and evaluates their responses. To pass the test, a computer must be able to replace one of the players without substantially changing the results. In other words, a computer would be considered intelligent if its conversation couldn't be easily distinguished from a human's.

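To make the setup concrete, here is a minimal sketch of the imitation game as a program. Everything in it is hypothetical scaffolding invented for illustration: the judge, human, and machine objects, along with their ask, reply, and guess_machine methods, are assumed interfaces, not anything Turing specified. The 50% threshold encodes "couldn't be easily distinguished": a judge reduced to guessing does no better than a coin flip.

```python
import random

def run_imitation_game(judge, human, machine, num_turns=5):
    """One round of the imitation game: the judge holds a text
    conversation with two unseen players, then guesses which one
    is the machine. Returns True if the machine was caught."""
    # Hide the players behind randomly assigned anonymous labels.
    contestants = [human, machine]
    random.shuffle(contestants)
    players = dict(zip(("A", "B"), contestants))

    transcripts = {label: [] for label in players}
    for _ in range(num_turns):
        for label, player in players.items():
            question = judge.ask(label, transcripts[label])  # hypothetical API
            answer = player.reply(question)                  # hypothetical API
            transcripts[label].append((question, answer))

    guess = judge.guess_machine(transcripts)  # returns "A" or "B"
    return players[guess] is machine

def passes_turing_test(judges, humans, machine, threshold=0.5):
    """The machine 'passes' if judges identify it no better than
    chance, i.e. replacing a human player didn't substantially
    change the results."""
    caught = sum(run_imitation_game(j, h, machine)
                 for j, h in zip(judges, humans))
    return caught / len(judges) <= threshold
```
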
Turing predicted that by the year 2000, machines with 100 megabytes of memory would be able to easily pass his test, but he may have jumped the gun. Even though today's computers have far more memory than that, few have succeeded, and those that have done well focused more on finding clever ways to fool judges than on using overwhelming computing power.

Though it was never subjected to a real test, the first program with some claim to success was called ELIZA. With only a fairly short and simple script, it managed to mislead many people by mimicking a psychologist, encouraging them to talk more, and reflecting their own questions back at them.

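ELIZA's original source is not reproduced here; the following is a toy sketch, in modern Python, of the general trick just described: match the user's statement against a short list of patterns, reflect first-person words back as second-person, and fall back on open-ended prompts that keep the user talking. The specific rules and reflections are invented for illustration, not Weizenbaum's actual script.

```python
import re

# Pronoun swaps used to "reflect" the speaker's own words back at them.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my", "yours": "mine",
}

# (pattern, response template) pairs, checked in order. The catch-all
# rules at the end nudge the user to keep talking, ELIZA's signature move.
RULES = [
    (r"i need (.*)", "Why do you need {0}?"),
    (r"i feel (.*)", "Why do you feel {0}?"),
    (r"my (.*)", "Tell me more about your {0}."),
    (r"(.*)\?$", "Why do you ask that?"),  # turn questions back on the user
    (r"(.*)", "Please, go on."),           # default encouragement
]

def reflect(fragment: str) -> str:
    """Swap first- and second-person words: 'my boss' -> 'your boss'."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.split())

def eliza_reply(statement: str) -> str:
    text = statement.lower().strip(" .!")
    for pattern, template in RULES:
        match = re.match(pattern, text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))

print(eliza_reply("I need a vacation."))   # Why do you need a vacation?
print(eliza_reply("Are you a computer?"))  # Why do you ask that?
```
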
Another early script, PARRY, took the opposite approach, imitating a paranoid schizophrenic who kept steering the conversation back to his own preprogrammed obsessions. Their success in fooling people highlighted one weakness of the test: humans regularly attribute intelligence to a whole range of things that are not actually intelligent.

Nonetheless, annual competitions like the Loebner Prize have made the test more formal, with judges knowing ahead of time that some of their conversation partners are machines. But while the quality has improved, many chatbot programmers have used strategies similar to ELIZA's and PARRY's.

1997's winner, Catherine, could carry on amazingly focused and intelligent conversation, but mostly only if the judge wanted to talk about Bill Clinton. And the more recent winner, Eugene Goostman, was given the persona of a 13-year-old Ukrainian boy, so judges interpreted its non sequiturs and awkward grammar as language and culture barriers.

Meanwhile, other programs like Cleverbot have taken a different approach: statistically analyzing huge databases of real conversations to determine the best responses. Some also store memories of previous conversations in order to improve over time. But while Cleverbot's individual responses can sound incredibly human, its lack of a consistent personality and its inability to deal with brand-new topics are a dead giveaway.

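Cleverbot's actual engine is not public, but the retrieval idea just described can be sketched in a few lines: score each stored utterance against the incoming message with a bag-of-words cosine similarity, answer with the reply that followed the best match in a past conversation, and append new exchanges to the database so later answers can draw on them. The RetrievalBot class and its tiny corpus are invented for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> Counter:
    """Bag-of-words vector: lowercase word counts, punctuation dropped."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class RetrievalBot:
    """Answers by finding the most similar stored utterance and
    returning the reply that followed it in a real conversation."""

    def __init__(self, corpus):
        # corpus: list of (utterance, reply) pairs from past conversations
        self.corpus = [(tokenize(u), r) for u, r in corpus]

    def reply(self, message: str) -> str:
        query = tokenize(message)
        _, best_reply = max(self.corpus,
                            key=lambda pair: cosine(query, pair[0]))
        return best_reply

    def remember(self, message: str, reply: str) -> None:
        # Store the new exchange so future lookups can reuse it:
        # the "improve over time" part of the approach.
        self.corpus.append((tokenize(message), reply))

bot = RetrievalBot([
    ("hello there", "Hi! How are you today?"),
    ("do you like music", "Yes, especially jazz."),
    ("what is your favorite movie", "I never get tired of Blade Runner."),
])
print(bot.reply("Hello!"))  # Hi! How are you today?
```
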
Who in Turing's day could have predicted that today's computers would be able to pilot spacecraft, perform delicate surgeries, and solve massive equations, yet still struggle with the most basic small talk? Human language turns out to be an amazingly complex phenomenon that can't be captured by even the largest dictionary. Chatbots can be baffled by a simple pause, like "umm...," or by a question with no correct answer.

And a simple conversational sentence, like "I took the juice out of the fridge and gave it to him, but forgot to check the date," requires a wealth of underlying knowledge and intuition to parse. It turns out that simulating a human conversation takes more than just increasing memory and processing power, and as we get closer to Turing's goal, we may have to deal with all those big questions about consciousness after all.