0:00:06.677,0:00:11.306 How is it that so many [br]intergalactic species in movies and TV 0:00:11.306,0:00:14.483 just happen to speak perfect English? 0:00:14.483,0:00:17.886 The short answer is that no one[br]wants to watch a starship crew 0:00:17.886,0:00:21.774 spend years compiling an alien dictionary. 0:00:21.774,0:00:23.392 But to keep things consistent, 0:00:23.392,0:00:26.789 the creators of Star Trek[br]and other science-fiction worlds 0:00:26.789,0:00:30.514 have introduced the concept[br]of a universal translator, 0:00:30.514,0:00:35.012 a portable device that can instantly[br]translate between any languages. 0:00:35.012,0:00:38.539 So is a universal translator [br]possible in real life? 0:00:38.539,0:00:42.137 We already have many programs[br]that claim to do just that, 0:00:42.137,0:00:45.954 taking a word, sentence, [br]or entire book in one language 0:00:45.954,0:00:49.004 and translating it into almost any other, 0:00:49.004,0:00:52.337 whether it's modern English[br]or Ancient Sanskrit. 0:00:52.337,0:00:55.913 And if translation were just a matter[br]of looking up words in a dictionary, 0:00:55.913,0:00:59.825 these programs would run circles[br]around humans. 0:00:59.825,0:01:03.299 The reality, however, [br]is a bit more complicated. 0:01:03.299,0:01:07.349 A rule-based translation program[br]uses a lexical database, 0:01:07.349,0:01:10.302 which includes all the words [br]you'd find in a dictionary 0:01:10.302,0:01:13.283 and all grammatical forms they can take, 0:01:13.283,0:01:18.925 and a set of rules to recognize the basic[br]linguistic elements in the input language. 
0:01:18.925,0:01:22.396 For a seemingly simple sentence like,[br]"The children eat the muffins," 0:01:22.396,0:01:27.050 the program first parses its syntax,[br]or grammatical structure, 0:01:27.050,0:01:29.587 by identifying the children [br]as the subject, 0:01:29.587,0:01:32.317 and the rest of the sentence [br]as the predicate 0:01:32.317,0:01:34.368 consisting of a verb "eat," 0:01:34.368,0:01:37.422 and a direct object "the muffins." 0:01:37.422,0:01:40.249 It then needs to recognize[br]English morphology, 0:01:40.249,0:01:44.681 or how the language can be broken down[br]into its smallest meaningful units, 0:01:44.681,0:01:46.124 such as the word muffin 0:01:46.124,0:01:49.755 and the suffix "s," [br]used to indicate plural. 0:01:49.755,0:01:52.449 Finally, it needs to understand [br]the semantics, 0:01:52.449,0:01:56.178 what the different parts of the sentence[br]actually mean. 0:01:56.178,0:01:58.074 To translate this sentence properly, 0:01:58.074,0:02:01.982 the program would refer to a different set[br]of vocabulary and rules 0:02:01.982,0:02:05.166 for each element of the target language. 0:02:05.166,0:02:07.020 But this is where it gets tricky. 0:02:07.020,0:02:11.820 The syntax of some languages[br]allows words to be arranged in any order, 0:02:11.820,0:02:16.954 while in others, doing so could make[br]the muffin eat the child. 0:02:16.954,0:02:19.647 Morphology can also pose a problem. 0:02:19.647,0:02:23.243 Slovene distinguishes between[br]two children and three or more 0:02:23.243,0:02:27.097 using a dual suffix absent [br]in many other languages, 0:02:27.097,0:02:30.532 while Russian's lack of definite articles[br]might leave you wondering 0:02:30.532,0:02:33.575 whether the children are eating [br]some particular muffins, 0:02:33.575,0:02:36.719 or just eat muffins in general. 
0:02:36.719,0:02:39.708 Finally, even when the semantics[br]are technically correct, 0:02:39.708,0:02:42.757 the program might miss their finer points, 0:02:42.757,0:02:45.809 such as whether the children [br]"mangiano" the muffins, 0:02:45.809,0:02:47.794 or "divorano" them. 0:02:47.794,0:02:51.558 Another method is [br]statistical machine translation, 0:02:51.558,0:02:55.762 which analyzes a database [br]of books, articles, and documents 0:02:55.762,0:02:59.488 that have already [br]been translated by humans. 0:02:59.488,0:03:02.959 By finding matches between source[br]and translated text 0:03:02.959,0:03:05.393 that are unlikely to occur by chance, 0:03:05.393,0:03:09.345 the program can identify corresponding[br]phrases and patterns, 0:03:09.345,0:03:12.429 and use them for future translations. 0:03:12.429,0:03:14.969 However, the quality [br]of this type of translation 0:03:14.969,0:03:17.690 depends on the size [br]of the initial database 0:03:17.690,0:03:21.357 and the availability of samples [br]for certain languages 0:03:21.357,0:03:23.383 or styles of writing. 0:03:23.383,0:03:27.140 The difficulty that computers have[br]with the exceptions, irregularities 0:03:27.140,0:03:30.994 and shades of meaning[br]that seem to come instinctively to humans 0:03:30.994,0:03:35.045 has led some researchers to believe[br]that our understanding of language 0:03:35.045,0:03:39.251 is a unique product [br]of our biological brain structure. 0:03:39.251,0:03:43.101 In fact, one of the most famous[br]fictional universal translators, 0:03:43.101,0:03:46.439 the Babel fish from [br]"The Hitchhiker's Guide to the Galaxy", 0:03:46.439,0:03:49.726 is not a machine at all[br]but a small creature 0:03:49.726,0:03:54.210 that translates the brain waves[br]and nerve signals of sentient species 0:03:54.210,0:03:57.005 through a form of telepathy. 
0:03:57.005,0:03:59.726 For now, learning a language[br]the old-fashioned way 0:03:59.726,0:04:05.106 will still give you better results than[br]any currently available computer program. 0:04:05.106,0:04:06.749 But this is no easy task, 0:04:06.749,0:04:09.014 and the sheer number [br]of languages in the world, 0:04:09.014,0:04:12.989 as well as the increasing interaction[br]between the people who speak them, 0:04:12.989,0:04:18.004 will only continue to spur greater[br]advances in automatic translation. 0:04:18.004,0:04:21.409 Perhaps by the time we encounter[br]intergalactic life forms, 0:04:21.409,0:04:24.660 we'll be able to communicate with them[br]through a tiny gizmo, 0:04:24.660,0:04:29.026 or we might have to start compiling[br]that dictionary, after all.
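
As a rough illustration of the statistical approach described earlier (finding matches between source and translated text that are unlikely to occur by chance), here is a minimal sketch. It is not how any real translation system is implemented. The tiny English-Italian corpus, the names `PARALLEL`, `dice_scores`, and `best_match`, and the choice of the Dice coefficient as the association score are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus (English, Italian), invented for this sketch.
PARALLEL = [
    ("the children eat the muffins", "i bambini mangiano i muffin"),
    ("the children sleep", "i bambini dormono"),
    ("the dogs eat", "i cani mangiano"),
    ("the dogs sleep", "i cani dormono"),
]


def dice_scores(pairs):
    """Score each (source word, target word) pair by how consistently
    the two words appear in the same sentence pair (Dice coefficient)."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)   # sentence pairs containing each source word
        tgt_count.update(tgt_words)   # sentence pairs containing each target word
        joint.update(product(src_words, tgt_words))  # co-occurrences
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


def best_match(word, scores):
    """Return the target word most strongly associated with `word`,
    or None if the word never appeared in the corpus."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


scores = dice_scores(PARALLEL)
print(best_match("children", scores))
print(best_match("eat", scores))
```

With only four sentence pairs, each word has one clear partner. With real text the scores are far noisier, which is why the quality of this kind of translation depends on the size of the database and on how many samples exist for a given language or style.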