WEBVTT

00:00:13.254 --> 00:00:14.261
Hello there.

00:00:14.261 --> 00:00:17.826
Artificial intelligence is, of course,
literally a new beginning.

00:00:18.385 --> 00:00:23.125
We are trying to create
a new type of thinking being.

00:00:23.406 --> 00:00:28.463
In fact, we have achieved a lot
since we got started with this project.

00:00:28.496 --> 00:00:32.798
Computers can now play chess 
much better than humans.

00:00:32.798 --> 00:00:37.288
They can analyze radiological images
better than human doctors.

00:00:37.682 --> 00:00:41.174
But today, I will talk about a domain

00:00:41.174 --> 00:00:47.763
where AI has not yet reached
the level of a person of average IQ:

00:00:48.254 --> 00:00:50.643
understanding human language.

00:00:50.643 --> 00:00:55.466
You have probably read
this horrific news item

00:00:55.466 --> 00:01:02.376
about the "chatbots"
which are programmed to chat with people.

00:01:02.376 --> 00:01:08.164
In 2016, Microsoft created
a Twitter AI character

00:01:08.164 --> 00:01:11.225
which was supposed to learn 
the nuances of human language

00:01:11.225 --> 00:01:13.813
by tweeting with people.

00:01:13.813 --> 00:01:18.090
Twenty-four hours later,
they had to take it offline.

00:01:18.090 --> 00:01:23.110
Because of the nasty things, curses and so on
that people wrote to it in their tweets,

00:01:23.110 --> 00:01:25.624
it turned into this nauseating character 


00:01:25.624 --> 00:01:28.007
who said things like,
"Hitler was so good."

00:01:28.021 --> 00:01:32.221
It does not exist anymore.

00:01:32.221 --> 00:01:35.882
A year later, in China - maybe
you have not heard about this one -

00:01:35.882 --> 00:01:40.204
a similar fate awaited two chatbots

00:01:40.204 --> 00:01:45.060
that had been launched to chitchat
with users on Chinese social media sites

00:01:45.060 --> 00:01:49.537
after they started to talk
about their dreams of moving to the States

00:01:49.537 --> 00:01:54.105
or mentioned their dislike
for the Chinese Communist Party.

00:01:54.105 --> 00:01:56.815
They were deactivated
for a few days after the incident.

00:01:56.815 --> 00:02:01.925
After they were reactivated, 
they started talking very "carefully"

00:02:01.925 --> 00:02:03.216
when those issues came up, 


00:02:03.216 --> 00:02:07.597
giving answers like, 
"Sorry, I cannot understand you."

00:02:07.687 --> 00:02:11.437
People who use the digital assistant Siri

00:02:11.437 --> 00:02:14.796
already know what a big
engineering success it is.

00:02:14.796 --> 00:02:16.596
Yet there's more to the story:

00:02:16.596 --> 00:02:20.836
Authors and poets
in every language are hired

00:02:20.836 --> 00:02:25.273
so that Siri can give proper answers
in such situations.

00:02:25.294 --> 00:02:28.034
They are also writing scripts

00:02:28.034 --> 00:02:31.946
so that it doesn't have to confess
it can't understand what is being said

00:02:31.946 --> 00:02:34.506
and it can continue the illusion
of being intelligent

00:02:34.506 --> 00:02:37.885
by diverting the conversation
when it gets stuck.

00:02:38.095 --> 00:02:42.054
Amazon also has a digital
assistant named Alexa,

00:02:42.064 --> 00:02:44.361
which doesn't have a Turkish version yet.

00:02:44.361 --> 00:02:47.276
They promised a one-million-dollar prize

00:02:47.276 --> 00:02:51.488
to the programming team that gets
Alexa to chat with people for 20 minutes

00:02:51.488 --> 00:02:54.201
without causing extensive boredom.

00:02:54.201 --> 00:02:56.268
No one has been able to do that yet.

00:02:56.268 --> 00:03:01.236
The problem is that there are lots
of very simple things that humans know,

00:03:01.236 --> 00:03:03.073
and computers don't.

00:03:03.073 --> 00:03:07.196
And we need to have a way
of teaching them those things.

00:03:07.196 --> 00:03:10.724
Let me tell you
a personal story about that.

00:03:11.738 --> 00:03:15.749
A long time ago, maybe 25 years or so,

00:03:15.749 --> 00:03:21.318
I bought a third-grade math textbook 
for primary school students

00:03:21.318 --> 00:03:26.117
and randomly picked 20 problems from it.

00:03:26.117 --> 00:03:31.285
Then I started to write a program 
which would "understand Turkish."

00:03:31.285 --> 00:03:35.310
It would understand these particular 
arithmetic problems in Turkish

00:03:35.310 --> 00:03:36.310
and solve them.

00:03:36.310 --> 00:03:38.956
I thought that this
would take me to a new level

00:03:38.956 --> 00:03:41.614
in computer understanding
of the Turkish language

00:03:41.614 --> 00:03:43.514
and write another paper.

00:03:43.514 --> 00:03:45.063
The name of the program was ALİ,

00:03:45.063 --> 00:03:48.002
a Turkish acronym for
"arithmetic language processor."

00:03:48.002 --> 00:03:51.353
It could solve problems like these:

00:03:51.353 --> 00:03:53.716
There were this many workers at a factory.

00:03:53.716 --> 00:03:56.356
That many of them were fired,
and this many retired.

00:03:56.356 --> 00:03:57.583
How many were left?

00:03:57.583 --> 00:04:00.805
Or questions like,
"That many students from College A

00:04:00.805 --> 00:04:02.589
and this many students from College B

00:04:02.589 --> 00:04:04.396
attended the ceremony.

00:04:04.396 --> 00:04:06.896
What's the total number of students?"

00:04:06.896 --> 00:04:10.399
requiring really simple arithmetic.

00:04:11.022 --> 00:04:12.369
You think that's easy peasy?

00:04:12.369 --> 00:04:15.389
Ask Siri the same questions,
and see if it can solve them all.

00:04:15.389 --> 00:04:18.826
Let me tell you,
it took two years of my youth,

00:04:18.826 --> 00:04:22.395
and I used to have gorgeous hair
when I got started.

00:04:22.400 --> 00:04:23.750
(Laughter)

00:04:23.786 --> 00:04:24.996
Here is the problem.

00:04:24.996 --> 00:04:27.378
Let us go through this example.

00:04:27.378 --> 00:04:30.725
There were 67 liters of diesel
in the gas tank of a truck.

00:04:30.725 --> 00:04:32.945
The driver bought 145 liters more.

00:04:32.945 --> 00:04:34.887
How much diesel does the truck have now?

00:04:34.887 --> 00:04:40.052
We are skipping the linguistics routines
that analyze all this in Turkish.

00:04:40.052 --> 00:04:42.558
Let's come to the point

00:04:42.558 --> 00:04:45.928
where the program understands
that there must be 212 liters

00:04:45.928 --> 00:04:48.450
at the end of the first two sentences.

00:04:48.450 --> 00:04:52.657
There, we come to a point where it knows 
that there are 212 liters of diesel

00:04:52.657 --> 00:04:53.747
in the gas tank,

00:04:53.747 --> 00:04:56.017
but what was the wording
of the question again?

00:04:56.017 --> 00:04:58.573
"What's the sum of diesel in the truck?"

00:04:58.573 --> 00:05:00.822
"How much diesel
does the truck have now?"

00:05:00.822 --> 00:05:05.768
ALİ could not answer that
with the information we have mentioned.

00:05:05.768 --> 00:05:07.518
Do you see what the problem was?

00:05:07.518 --> 00:05:11.819
"The gas tank of the truck" is not 
the same thing as "the truck,"

00:05:11.819 --> 00:05:14.936
and computers do not know automatically

00:05:14.936 --> 00:05:18.322
that if the tank contains something,
the truck also contains that thing.

00:05:18.322 --> 00:05:19.808
And that's really complicated.

00:05:19.808 --> 00:05:21.515
"Ahmet's father had five kids"

00:05:21.515 --> 00:05:23.270
does not mean "Ahmet had five kids."

00:05:23.270 --> 00:05:24.275
On the other hand,

00:05:24.275 --> 00:05:28.157
when the gas tank of the truck
has the diesel, the truck has it as well.

00:05:28.157 --> 00:05:31.982
That's why I had to specify in the program

00:05:31.982 --> 00:05:36.427
all this knowledge
that people already inherently know.

00:05:36.427 --> 00:05:39.867
The technical name for this stuff
is "commonsense knowledge."

00:05:39.867 --> 00:05:43.556
"The gas tank of the truck
is a part of the truck."

00:05:44.596 --> 00:05:48.136
"If A is a part of B, right,

00:05:48.196 --> 00:05:51.098
B should contain
everything contained in A."

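NOTE
The part-of rule quoted above can be sketched in a few lines of Python.
This is only an illustration under assumptions: the dictionaries and the
function names (part_of, contents, add, amount_in) are hypothetical and
are not taken from the original ALİ program.
```python
# Hypothetical toy knowledge base; the names are illustrative, not from ALİ.
part_of = {"gas tank": "truck"}            # "A is a part of B"
contents = {"gas tank": {"diesel": 67}}    # liters stored directly in each thing
def add(container, substance, liters):
    """Record that `liters` of `substance` were put into `container`."""
    store = contents.setdefault(container, {})
    store[substance] = store.get(substance, 0) + liters
def amount_in(thing, substance):
    """Liters of `substance` in `thing`, counting whatever its parts hold."""
    total = contents.get(thing, {}).get(substance, 0)
    for part, whole in part_of.items():
        if whole == thing:
            # The commonsense rule: what a part contains, the whole contains.
            total += amount_in(part, substance)
    return total
add("gas tank", "diesel", 145)             # the driver bought 145 liters more
print(amount_in("truck", "diesel"))        # 212
```
Without the part_of entry, the question about "the truck" would return 0,
which is the failure described in the talk.
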
00:05:51.396 --> 00:05:53.899
All of this information
that I consider commonsense

00:05:53.899 --> 00:05:56.788
is all the things that I do not tell you
while we are talking

00:05:56.798 --> 00:06:02.345
since I assume
that you already know it all.

00:06:02.345 --> 00:06:06.756
We cannot have a proper conversation
with those chatbots

00:06:06.756 --> 00:06:08.601
since they know none of those things.

00:06:08.601 --> 00:06:11.693
After I coded all of this,
ALİ could solve all 20 problems properly.

00:06:11.693 --> 00:06:16.935
I had no more energy to go on to the 21st.

00:06:17.226 --> 00:06:20.954
Now, I'll tell you the story of a man 
who dedicated his life to this problem

00:06:20.954 --> 00:06:22.857
of coding commonsense knowledge:

00:06:22.857 --> 00:06:26.945
Douglas Lenat, a famous American 
computer scientist.

00:06:26.945 --> 00:06:29.674
This is him in the 1980s.

00:06:29.674 --> 00:06:34.236
He started a project called Cyc in 1982.

00:06:34.236 --> 00:06:36.587
And this is exactly
what the project was about:

00:06:36.587 --> 00:06:41.065
To code all the commonsense knowledge 
that computers don't know.

00:06:41.065 --> 00:06:44.206
To write a million lines, 
if a million lines are needed.

00:06:44.206 --> 00:06:47.398
He founded a corporation
where they do the following:

00:06:47.398 --> 00:06:51.684
If you are drinking coffee, the open side
of the cup is facing upwards.

00:06:52.637 --> 00:06:55.421
The king is a man.

00:06:55.421 --> 00:06:58.897
Then his wife should be a woman,
and she is called the queen.

00:06:58.897 --> 00:07:02.836
People can't go to work after they die.

00:07:02.836 --> 00:07:03.851
And so on.

00:07:03.851 --> 00:07:07.451
They are coding all the items 
of information which people already know

00:07:07.451 --> 00:07:12.885
and computers need to know in order 
to understand human language, one by one.

00:07:12.885 --> 00:07:15.098
And this is him today.

00:07:15.098 --> 00:07:18.155
After 35 years, the project
is still in progress.

00:07:18.207 --> 00:07:20.788
I think there's an obvious problem here.

00:07:20.788 --> 00:07:23.685
Coding all of this by hand is clearly impractical.

00:07:23.685 --> 00:07:25.816
Now it's time to hear the good news.

00:07:25.816 --> 00:07:27.726
We have had a revolution in AI,

00:07:27.726 --> 00:07:30.456
and computers can now learn
certain things on their own,

00:07:30.456 --> 00:07:35.827
without us having to code them manually.

00:07:35.827 --> 00:07:37.684
This is a machine-learning revolution.

00:07:37.986 --> 00:07:40.558
Linguists have the following idea:

00:07:40.763 --> 00:07:45.763
If two words are exact 
synonyms of each other,

00:07:45.763 --> 00:07:49.946
then the collections of all other words
surrounding them in various sentences

00:07:49.946 --> 00:07:51.676
will also be similar to each other.

00:07:51.676 --> 00:07:55.741
Based on this idea, this man,

00:07:55.741 --> 00:08:00.696
who is proof of the fact
that you don't need to be bald

00:08:00.696 --> 00:08:03.414
in order to be handsome
if you're an AI researcher,

00:08:04.264 --> 00:08:06.407
named Tomas Mikolov,

00:08:06.837 --> 00:08:11.217
did the following while working
for Google five years ago.

00:08:11.464 --> 00:08:13.976
Now think of all the documents 
in English at Google.

00:08:13.976 --> 00:08:16.317
The work I'll be telling
you about was in English.

00:08:16.317 --> 00:08:18.418
Now imagine all the documents in English.

00:08:18.418 --> 00:08:20.386
For every word in every sentence,

00:08:20.386 --> 00:08:25.836
you're supposed to find out how many times
it appears in the same sentence

00:08:25.836 --> 00:08:27.247
as each other word.

00:08:27.247 --> 00:08:31.014
For every imaginable pair of words, 
we have the computer count

00:08:31.014 --> 00:08:36.935
how many times these two words
appear together in the same sentence.

00:08:36.965 --> 00:08:38.318
It's a computer,

00:08:38.318 --> 00:08:41.817
so it can do the computations anyway.

00:08:41.817 --> 00:08:46.437
The idea is that, if the two words
are close to each other in meaning,

00:08:46.437 --> 00:08:50.006
the same words appear with similar
frequencies in their surroundings.

00:08:50.006 --> 00:08:53.525
Let's say, we can easily see
that both words "cat" and "dog"

00:08:53.525 --> 00:08:55.756
will appear frequently 
in the same sentences

00:08:55.756 --> 00:09:02.718
with the words "flea" or "rabies,"
"vaccine," "tail," "pet," and so on,

00:09:02.738 --> 00:09:07.097
but not with words like "printer,"
"generator" or "inflation."

00:09:07.097 --> 00:09:08.690
Do we see this?

00:09:08.690 --> 00:09:12.385
So, we can prepare a number sequence

00:09:12.385 --> 00:09:14.975
containing the frequencies
of the neighboring words

00:09:14.975 --> 00:09:17.846
for every single word.

00:09:17.846 --> 00:09:23.095
Such a number sequence
is called a "vector,"

00:09:23.095 --> 00:09:26.707
as you might well know
if they still teach it in high school.

00:09:26.707 --> 00:09:31.296
The computer can automatically position 
similar number sequences

00:09:31.296 --> 00:09:34.387
closer to each other,

00:09:34.387 --> 00:09:36.667
and the dissimilar ones 
far from each other

00:09:36.667 --> 00:09:41.235
on some sort of a map or space.

00:09:41.235 --> 00:09:46.085
What I mean is that the computer,
which knows no English,

00:09:46.105 --> 00:09:50.956
creates a vector for each single word
by doing the computations.

00:09:50.966 --> 00:09:52.126
Yet, the vector of "cat"

00:09:52.126 --> 00:09:55.206
is found in a location
close to the vector of "dog" in that space

00:09:55.206 --> 00:09:56.785
for the reasons I just explained.

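NOTE
A minimal sketch of the counting idea described above, in Python.
The six sentences are a made-up toy corpus, and cosine similarity is an
assumed choice for measuring how close two count vectors are; the talk
does not name the measure that was actually used.
```python
from collections import Counter
from math import sqrt
# Made-up toy corpus; a real system counts over a huge document collection.
sentences = [
    "the cat has a flea on its tail",
    "the dog has a flea on its tail",
    "the cat is a pet and needs a rabies vaccine",
    "my dog is a pet and needs a rabies vaccine",
    "the printer is out of paper",
    "inflation raised the price of the printer",
]
def cooccurrence_vector(word):
    """Count how often every other word shares a sentence with `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        if word in tokens:
            counts.update(t for t in tokens if t != word)
    return counts
def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
cat, dog, printer = (cooccurrence_vector(w) for w in ("cat", "dog", "printer"))
print(cosine(cat, dog) > cosine(cat, printer))  # True
```
The program never looks up what "cat" or "dog" means; the two words end
up close together only because they keep the same company.
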
00:09:56.785 --> 00:10:00.522
Or the vector of the school 
Buffy the Vampire Slayer attends -

00:10:00.522 --> 00:10:03.178
they really looked at that -

00:10:03.178 --> 00:10:05.518
is positioned close
to the vector of Hogwarts,

00:10:05.518 --> 00:10:07.598
where Harry Potter studies.

00:10:07.748 --> 00:10:11.725
Thus they are found to be positioned close
to each other in terms of their meaning.

00:10:11.725 --> 00:10:12.750
There's more.

00:10:12.750 --> 00:10:16.033
As you will recall
from that high school course,

00:10:16.033 --> 00:10:18.505
you can do arithmetic on these vectors.

00:10:18.505 --> 00:10:20.697
They can be added or subtracted,

00:10:20.697 --> 00:10:22.245
and you might say, "So what?"

00:10:22.750 --> 00:10:25.513
Mikolov discovered this.

00:10:25.513 --> 00:10:29.321
He did the following addition
and subtraction operations

00:10:29.321 --> 00:10:30.681
on the vectors thus learned.

00:10:30.681 --> 00:10:32.946
He came up with the question,
"What would happen

00:10:32.946 --> 00:10:36.704
if the king were a woman instead of a man"

00:10:36.744 --> 00:10:38.884
when he subtracted the word "man"

00:10:38.884 --> 00:10:43.041
from the word "king"
and added the word "woman."

00:10:43.041 --> 00:10:45.544
Guess what the resulting
vector is close to?

00:10:46.418 --> 00:10:47.432
"Queen."

00:10:47.432 --> 00:10:50.575
No one had hand-coded that relation
the way Lenat's team would have.

00:10:50.575 --> 00:10:53.967
The computer discovered it all by itself

00:10:53.967 --> 00:10:58.146
after counting millions upon millions
of words in the documents we created.

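NOTE
The "king" minus "man" plus "woman" arithmetic can be sketched as follows.
The two-dimensional vectors are hand-made for illustration (axes chosen
here as royalty and maleness); they are NOT learned embeddings, and the
analogy helper is a hypothetical name, not part of any library.
```python
from math import sqrt
# Hand-made 2-D toy vectors (axes: royalty, maleness), NOT learned embeddings.
vectors = {
    "king":  (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man":   (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}
def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
def analogy(a, b, c):
    """Word nearest to vec(a) - vec(b) + vec(c), excluding the three inputs."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))
print(analogy("king", "man", "woman"))  # queen
```
With real embeddings the vectors have hundreds of dimensions and the
result is only the nearest neighbor of the target, not an exact match.
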
00:10:58.146 --> 00:11:01.654
I have more to tell you, 
and this really happened.

00:11:01.654 --> 00:11:03.979
There is info on Turkey there.

00:11:03.979 --> 00:11:08.378
If you take "France" out of "Paris"
and add "Turkey" -

00:11:08.378 --> 00:11:11.305
yes, you got it right - it's Ankara.

00:11:11.305 --> 00:11:13.815
This means in this vector space, 
there's a direction

00:11:13.845 --> 00:11:17.196
which leads from the names of countries
to the names of their capitals,

00:11:17.196 --> 00:11:18.446
which is really stunning.

00:11:18.446 --> 00:11:23.254
When you ask, "What would Windows be
had it been made not by Microsoft

00:11:23.254 --> 00:11:26.084
but by Google?"

00:11:26.714 --> 00:11:29.004
the answer pops up as "Android."

00:11:29.004 --> 00:11:35.192
When you subtract "copper"
from "Cu" and add "gold,"

00:11:36.162 --> 00:11:39.922
you get "Au" as the chemical
symbol of gold.

00:11:39.922 --> 00:11:43.130
This literally means we don't have to code
these manually anymore.

00:11:43.130 --> 00:11:46.304
It seems that the computer
can make all the inferences

00:11:46.304 --> 00:11:48.812
out of the data
we provide it with all by itself.

00:11:48.812 --> 00:11:50.552
This is the yummiest example of all.

00:11:50.552 --> 00:11:55.305
When you take "Japan" out of "sushi"
and add "Germany,"

00:11:55.915 --> 00:11:59.942
you get "bratwurst,"
the German favorite.

00:12:00.382 --> 00:12:01.702
Too good to be true, right?

00:12:01.702 --> 00:12:02.711
Happy now?

00:12:02.711 --> 00:12:04.393
Is this project finished, then?

00:12:04.393 --> 00:12:08.922
Do computers now understand what we say?

00:12:08.922 --> 00:12:10.942
Are we having fun? Not much.

00:12:10.942 --> 00:12:14.415
Now, I'll tell you
about a Turkish researcher.

00:12:14.415 --> 00:12:19.118
Tolga Bölükbaşı is about to finish his PhD

00:12:19.118 --> 00:12:21.014
at Boston University in the States.

00:12:21.014 --> 00:12:23.743
This is research he did two years ago.

00:12:23.773 --> 00:12:27.375
Tolga did the same thing
as Mikolov did previously,

00:12:27.375 --> 00:12:30.433
but this time on news texts.

00:12:31.212 --> 00:12:37.414
What happens when you subtract "father"
from "doctor" and add "mom"?

00:12:37.414 --> 00:12:41.784
"My dad is a doctor, and mom is a nurse."

00:12:41.784 --> 00:12:47.963
What about when you subtract "man"
from "computer engineer" and add "woman"?

00:12:47.963 --> 00:12:51.895
In fact, professions shouldn't have a gender.

00:12:51.895 --> 00:12:56.757
Let's see how professions
are related to gender

00:12:56.757 --> 00:13:01.362
in the meaning space
in the head of the computer.

00:13:01.362 --> 00:13:02.510
You get "homemaker."

00:13:02.510 --> 00:13:04.346
Seriously! You get "homemaker."

00:13:04.346 --> 00:13:07.542
We get the English word "homemaker."

00:13:07.542 --> 00:13:11.233
So, it's clear that we have put
not only all of our data into computers

00:13:11.233 --> 00:13:16.890
but also all of our prejudices.

00:13:16.890 --> 00:13:22.628
Imagine if this computer
were used to hire someone.

00:13:22.628 --> 00:13:26.793
You've already uploaded your resume
and all the personal information

00:13:26.793 --> 00:13:29.401
including your gender.

00:13:29.401 --> 00:13:31.935
Let's assume 10,000 people
applied for the job.

00:13:31.935 --> 00:13:34.268
The computer needs
to do a pre-selection, right?

00:13:34.268 --> 00:13:37.763
It needs to get to 1,000 candidates,

00:13:38.363 --> 00:13:41.553
eliminating 9,000 others

00:13:41.553 --> 00:13:43.751
so that the HR staff 
can evaluate the results.

00:13:43.751 --> 00:13:46.543
Computers nowadays are already 
used for this kind of work.

00:13:46.543 --> 00:13:50.112
Let's say that a computer loaded with
such meaning vectors makes a selection

00:13:50.112 --> 00:13:53.923
among the candidates who have applied 
for a job vacancy for a computer engineer.

00:13:53.923 --> 00:13:56.763
It might automatically eliminate 
all the female candidates,

00:13:56.763 --> 00:14:01.795
thinking that a computer
engineer should be male.

00:14:02.245 --> 00:14:04.831
Tolga and his colleagues 
also mention other cases.

00:14:04.831 --> 00:14:11.124
It was found that computers
link positive and negative attributes

00:14:11.124 --> 00:14:16.222
with words related to being
African American or Caucasian.

00:14:16.222 --> 00:14:19.668
For instance, the computer thinks

00:14:19.668 --> 00:14:23.001
that the word "mugger" is closely related
to being African American.

00:14:23.001 --> 00:14:26.542
It's certain that we uploaded 
all our prejudices

00:14:26.542 --> 00:14:29.274
while uploading all the information
we have into computers.

00:14:29.274 --> 00:14:32.503
You might ask yourselves, 
What will happen now?

00:14:32.503 --> 00:14:36.416
Tolga and his team's article
offers a solution to that.

00:14:37.546 --> 00:14:39.646
I just told you:

00:14:39.646 --> 00:14:42.345
All these things happen 
in the vector space.

00:14:42.345 --> 00:14:44.197
Each word has its vector.

00:14:44.197 --> 00:14:48.105
We already know from high school years 
that we can add and subtract them.

00:14:48.105 --> 00:14:51.883
Tolga and his team first list the words

00:14:51.883 --> 00:14:57.054
that are really feminine or masculine,

00:14:57.054 --> 00:15:01.117
like "dad," "uncle,"
"grandmother," and so on.

00:15:01.117 --> 00:15:06.017
These words really should have
a relation to male and female roles.

00:15:06.034 --> 00:15:11.085
Then there are these words
which should not be masculine or feminine

00:15:11.085 --> 00:15:13.804
even though they lie closer to one gender
in the computer's space.

00:15:13.804 --> 00:15:19.282
For example, the word "genius"
appears to be male.

00:15:19.282 --> 00:15:24.114
On the other hand, the word "stylist"
stands out as a very female word.

00:15:24.114 --> 00:15:26.165
It doesn't have to be like that.

00:15:26.165 --> 00:15:29.578
So, after listing all the words
that need to be feminine or masculine,

00:15:29.578 --> 00:15:33.124
Tolga and his team created an algorithm

00:15:33.124 --> 00:15:40.018
which would automatically erase
the computer's prejudices

00:15:40.018 --> 00:15:46.838
on the ones that should be neutral.

00:15:47.068 --> 00:15:50.755
If a word is not in the list,
the way "father" or "uncle" is,

00:15:50.825 --> 00:15:56.316
but it is still biased toward a gender
in the space of meanings,
in the space of meanings,

00:15:56.316 --> 00:16:01.352
the algorithm automatically corrects it.

00:16:01.352 --> 00:16:03.419
With the help of this,
"computer programmer"

00:16:03.419 --> 00:16:06.315
ends up at the same distance
from the male and female notions,

00:16:06.315 --> 00:16:09.323
and the problems I talked about go away.

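NOTE
One way to picture the correction step is to remove the part of a word
vector that points along a gender direction. The sketch below assumes
that direction is simply vec("man") minus vec("woman") and uses hand-made
2-D vectors; the talk does not spell out the algorithm, so treat this as
an illustration of the idea rather than the team's actual method.
```python
from math import sqrt, dist
# Hand-made 2-D toy vectors, NOT real embeddings; values are for illustration.
vectors = {
    "man": (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "programmer": (0.6, 0.8),   # leans toward "man" before correction
}
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
def unit(v):
    """Scale `v` to length 1."""
    length = sqrt(dot(v, v))
    return tuple(a / length for a in v)
def neutralize(word_vec, direction):
    """Remove the component of `word_vec` that lies along `direction`."""
    d = unit(direction)
    along = dot(word_vec, d)
    return tuple(a - along * b for a, b in zip(word_vec, d))
gender = tuple(m - w for m, w in zip(vectors["man"], vectors["woman"]))
fixed = neutralize(vectors["programmer"], gender)
gap = abs(dist(fixed, vectors["man"]) - dist(fixed, vectors["woman"]))
print(gap < 1e-9)  # True
```
Words on the list of truly gendered terms, such as "father" or "uncle",
would be left untouched; only words that should be neutral are corrected.
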
00:16:09.323 --> 00:16:10.344
Isn't that beautiful?

00:16:10.344 --> 00:16:15.532
I wish we could delete the prejudices 
in the human brain so easily.

00:16:15.532 --> 00:16:21.737
For a while, some people
have been worrying about

00:16:21.737 --> 00:16:24.146
what would happen
if computers took over.

00:16:24.146 --> 00:16:25.891
On the other hand,

00:16:25.891 --> 00:16:29.241
considering the fact that we can't delete
the prejudices in people,

00:16:29.241 --> 00:16:32.992
while we can in computers,

00:16:32.992 --> 00:16:37.434
maybe we could give computers a chance 
at jobs requiring fairness

00:16:37.434 --> 00:16:43.025
such as being referees,
judges, and managers

00:16:43.025 --> 00:16:45.145
and let people take a rest for a while.

00:16:45.145 --> 00:16:46.386
What do you say to that?

00:16:46.782 --> 00:16:47.811
Thank you.

00:16:47.835 --> 00:16:50.525
(Applause)