1
99:59:59,999 --> 99:59:59,999
♪ (Music fades in) ♪

2
99:59:59,999 --> 99:59:59,999
(Chirping)

3
99:59:59,999 --> 99:59:59,999
(Vocalizations, different languages)

4
99:59:59,999 --> 99:59:59,999
(Talking overlaps in background)

5
99:59:59,999 --> 99:59:59,999
(Computerized beeping)

6
99:59:59,999 --> 99:59:59,999
(Man) We come into this world with the
innate ability to learn to interact

7
99:59:59,999 --> 99:59:59,999
with other sentient beings.

8
99:59:59,999 --> 99:59:59,999
(Child vocalizing)

9
99:59:59,999 --> 99:59:59,999
(Man) Suppose you are to interact with
other people by writing little messages.

10
99:59:59,999 --> 99:59:59,999
(Man) It'd be a real pain.

11
99:59:59,999 --> 99:59:59,999
(Man) And that's how we interact
with computers.

12
99:59:59,999 --> 99:59:59,999
It's much easier just to talk to them...
just so much easier...

13
99:59:59,999 --> 99:59:59,999
(Man) If the computers could understand
what we're saying.

14
99:59:59,999 --> 99:59:59,999
For that, you need really
good speech recognition.

15
99:59:59,999 --> 99:59:59,999
(Narrator) The first speech recognition
system was developed by Bell Laboratories

16
99:59:59,999 --> 99:59:59,999
(Narrator) in 1952. It could only recognize
numbers spoken by one person.

17
99:59:59,999 --> 99:59:59,999
(Narrator) In the 1970s, Carnegie Mellon
came out with the Harpy System.

18
99:59:59,999 --> 99:59:59,999
(Narrator) This was able to recognize over
1,000 words and different pronunciations

19
99:59:59,999 --> 99:59:59,999
(Narrator) of the same word.
- (Man) Tomato - (Woman) Tomato

20
99:59:59,999 --> 99:59:59,999
(Narrator) Speech recognition continued
in the 80s with the introduction of the

21
99:59:59,999 --> 99:59:59,999
(Narrator) Hidden Markov Model, which
used a more mathematical approach

22
99:59:59,999 --> 99:59:59,999
(Narrator) to analyzing sound waves that
led to many breakthroughs we have today.

23
99:59:59,999 --> 99:59:59,999
You're taking in very raw audio waveforms

24
99:59:59,999 --> 99:59:59,999
like you get through a microphone