WEBVTT

99:59:59.999 --> 99:59:59.999
♪ (Music fades in) ♪

99:59:59.999 --> 99:59:59.999
(Chirping)

99:59:59.999 --> 99:59:59.999
(Vocalizations, different languages)

99:59:59.999 --> 99:59:59.999
(Talking overlaps in background)

99:59:59.999 --> 99:59:59.999
(Computerized beeping)

99:59:59.999 --> 99:59:59.999
(Man) We come into this world with the innate ability to learn to interact

99:59:59.999 --> 99:59:59.999
with other sentient beings.

99:59:59.999 --> 99:59:59.999
(Child vocalizing)

99:59:59.999 --> 99:59:59.999
(Man) Suppose you are to interact with other people by writing little messages.

99:59:59.999 --> 99:59:59.999
(Man) It'd be a real pain.

99:59:59.999 --> 99:59:59.999
(Man) And that's how we interact with computers.

99:59:59.999 --> 99:59:59.999
It's much easier just to talk to them... just so much easier...

99:59:59.999 --> 99:59:59.999
(Man) If the computers could understand what we're saying.

99:59:59.999 --> 99:59:59.999
For that, you need really good speech recognition.

99:59:59.999 --> 99:59:59.999
(Narrator) The first speech recognition system was developed by Bell Laboratories

99:59:59.999 --> 99:59:59.999
(Narrator) in 1952. It could only recognize numbers spoken by one person.

99:59:59.999 --> 99:59:59.999
(Narrator) In the 1970s, Carnegie Mellon came out with the Harpy System.

99:59:59.999 --> 99:59:59.999
(Narrator) This was able to recognize over 1,000 words and different pronunciations

99:59:59.999 --> 99:59:59.999
(Narrator) of the same word.
- (Man) Tomato
- (Woman) Tomato

99:59:59.999 --> 99:59:59.999
(Narrator) Speech recognition continued in the '80s with the introduction of the

99:59:59.999 --> 99:59:59.999
(Narrator) Hidden Markov Model, which used a more mathematical approach

99:59:59.999 --> 99:59:59.999
(Narrator) to analyzing sound waves that led to many breakthroughs we have today.

99:59:59.999 --> 99:59:59.999
You're taking in very raw audio waveforms

99:59:59.999 --> 99:59:59.999
like you get through a microphone