WEBVTT

00:00:06.677 --> 00:00:11.306
How is it that so many intergalactic species in movies and TV

00:00:11.306 --> 00:00:14.483
just happen to speak perfect English?

00:00:14.483 --> 00:00:17.886
The short answer is that no one wants to watch a starship crew

00:00:17.886 --> 00:00:21.774
spend years compiling an alien dictionary.

00:00:21.774 --> 00:00:23.392
But to keep things consistent,

00:00:23.392 --> 00:00:26.789
the creators of Star Trek and other science-fiction worlds

00:00:26.789 --> 00:00:30.514
have introduced the concept of a universal translator,

00:00:30.514 --> 00:00:35.012
a portable device that can instantly translate between any languages.

00:00:35.012 --> 00:00:38.539
So is a universal translator possible in real life?

00:00:38.539 --> 00:00:42.137
We already have many programs that claim to do just that,

00:00:42.137 --> 00:00:45.954
taking a word, sentence, or entire book in one language

00:00:45.954 --> 00:00:49.004
and translating it into almost any other,

00:00:49.004 --> 00:00:52.337
whether it's modern English or ancient Sanskrit.

00:00:52.337 --> 00:00:55.913
And if translation were just a matter of looking up words in a dictionary,

00:00:55.913 --> 00:00:59.825
these programs would run circles around humans.

00:00:59.825 --> 00:01:03.299
The reality, however, is a bit more complicated.

00:01:03.299 --> 00:01:07.349
A rule-based translation program uses a lexical database,

00:01:07.349 --> 00:01:10.302
which includes all the words you'd find in a dictionary

00:01:10.302 --> 00:01:13.283
and all grammatical forms they can take,

00:01:13.283 --> 00:01:18.925
and a set of rules to recognize the basic linguistic elements in the input language.

00:01:18.925 --> 00:01:22.396
For a seemingly simple sentence like, "The children eat the muffins,"

00:01:22.396 --> 00:01:27.050
the program first parses its syntax, or grammatical structure,

00:01:27.050 --> 00:01:29.587
by identifying the children as the subject,

00:01:29.587 --> 00:01:32.317
and the rest of the sentence as the predicate

00:01:32.317 --> 00:01:34.368
consisting of a verb "eat,"

00:01:34.368 --> 00:01:37.422
and a direct object "the muffins."

00:01:37.422 --> 00:01:40.249
It then needs to recognize English morphology,

00:01:40.249 --> 00:01:44.681
or how the language can be broken down into its smallest meaningful units,

00:01:44.681 --> 00:01:46.124
such as the word "muffin"

00:01:46.124 --> 00:01:49.755
and the suffix "s," used to indicate plural.

00:01:49.755 --> 00:01:52.449
Finally, it needs to understand the semantics,

00:01:52.449 --> 00:01:56.178
what the different parts of the sentence actually mean.

00:01:56.178 --> 00:01:58.074
To translate this sentence properly,

00:01:58.074 --> 00:02:01.982
the program would refer to a different set of vocabulary and rules

00:02:01.982 --> 00:02:05.166
for each element of the target language.

00:02:05.166 --> 00:02:07.020
But this is where it gets tricky.

00:02:07.020 --> 00:02:11.820
The syntax of some languages allows words to be arranged in any order,

00:02:11.820 --> 00:02:16.954
while in others, doing so could make the muffin eat the child.

00:02:16.954 --> 00:02:19.647
Morphology can also pose a problem.

00:02:19.647 --> 00:02:23.243
Slovene distinguishes between two children and three or more

00:02:23.243 --> 00:02:27.097
using a dual suffix absent in many other languages,

00:02:27.097 --> 00:02:30.532
while Russian's lack of definite articles might leave you wondering

00:02:30.532 --> 00:02:33.575
whether the children are eating some particular muffins,

00:02:33.575 --> 00:02:36.719
or just eat muffins in general.

00:02:36.719 --> 00:02:39.708
Finally, even when the semantics are technically correct,

00:02:39.708 --> 00:02:42.757
the program might miss their finer points,

00:02:42.757 --> 00:02:45.809
such as whether the children "mangiano" the muffins,

00:02:45.809 --> 00:02:47.794
or "divorano" them.

00:02:47.794 --> 00:02:51.558
Another method is statistical machine translation,

00:02:51.558 --> 00:02:55.762
which analyzes a database of books, articles, and documents

00:02:55.762 --> 00:02:59.488
that have already been translated by humans.

00:02:59.488 --> 00:03:02.959
By finding matches between source and translated text

00:03:02.959 --> 00:03:05.393
that are unlikely to occur by chance,

00:03:05.393 --> 00:03:09.345
the program can identify corresponding phrases and patterns,

00:03:09.345 --> 00:03:12.429
and use them for future translations.

00:03:12.429 --> 00:03:14.969
However, the quality of this type of translation

00:03:14.969 --> 00:03:17.690
depends on the size of the initial database

00:03:17.690 --> 00:03:21.357
and the availability of samples for certain languages

00:03:21.357 --> 00:03:23.383
or styles of writing.

00:03:23.383 --> 00:03:27.140
The difficulty that computers have with the exceptions, irregularities,

00:03:27.140 --> 00:03:30.994
and shades of meaning that seem to come instinctively to humans

00:03:30.994 --> 00:03:35.045
has led some researchers to believe that our understanding of language

00:03:35.045 --> 00:03:39.251
is a unique product of our biological brain structure.

00:03:39.251 --> 00:03:43.101
In fact, one of the most famous fictional universal translators,

00:03:43.101 --> 00:03:46.439
the Babel fish from "The Hitchhiker's Guide to the Galaxy,"

00:03:46.439 --> 00:03:49.726
is not a machine at all but a small creature

00:03:49.726 --> 00:03:54.210
that translates the brain waves and nerve signals of sentient species

00:03:54.210 --> 00:03:57.005
through a form of telepathy.

00:03:57.005 --> 00:03:59.726
For now, learning a language the old-fashioned way

00:03:59.726 --> 00:04:05.106
will still give you better results than any currently available computer program.

00:04:05.106 --> 00:04:06.749
But this is no easy task,

00:04:06.749 --> 00:04:09.014
and the sheer number of languages in the world,

00:04:09.014 --> 00:04:12.989
as well as the increasing interaction between the people who speak them,

00:04:12.989 --> 00:04:18.004
will only continue to spur greater advances in automatic translation.

00:04:18.004 --> 00:04:21.409
Perhaps by the time we encounter intergalactic life forms,

00:04:21.409 --> 00:04:24.660
we'll be able to communicate with them through a tiny gizmo,

00:04:24.660 --> 00:04:29.026
or we might have to start compiling that dictionary, after all.