
Chatbots and Large Language Models Part 2

  • 0:07 - 0:09
    Hi, I'm Mira Murati.
  • 0:09 - 0:12
    I'm the chief technology officer
    at OpenAI, the company that created
  • 0:12 - 0:14
    ChatGPT.
  • 0:14 - 0:17
    I really wanted to work on AI
  • 0:17 - 0:21
    because it has the potential
    to really improve
  • 0:21 - 0:26
    almost every aspect of life
    and help us tackle really hard challenges.
  • 0:27 - 0:31
    Hi, I'm Cristobal Valenzuela,
    CEO and co-founder of Runway.
  • 0:31 - 0:35
    Runway is a research company
    that builds AI algorithms
  • 0:35 - 0:38
    for storytelling and video creation.
  • 0:40 - 0:43
    Chatbots like ChatGPT
    are based on a new type of AI
  • 0:43 - 0:46
    technology
    that's called large language models.
  • 0:47 - 0:52
    So instead of a typical neural network
    which trains on a specific task
  • 0:52 - 0:56
    like how to recognize faces
    or images, a large language
  • 0:56 - 1:00
    model is trained on the largest amount
    of information possible,
  • 1:01 - 1:04
    such as everything
    available on the Internet.
  • 1:04 - 1:07
    It uses this training to then be able
  • 1:07 - 1:10
    to generate completely new information,
  • 1:10 - 1:15
    like to write essays or poems,
    have conversations, or even write code.
  • 1:16 - 1:18
    The possibilities seem endless,
  • 1:18 - 1:22
    but how does this work
    and what are its shortcomings?
  • 1:22 - 1:24
    Let's dive in.
  • 1:24 - 1:28
    While a chatbot built on a large
    language model may seem magical,
  • 1:29 - 1:32
    it works
    based on some really simple ideas.
  • 1:32 - 1:38
    In fact, most of the magic of AI
    is based on very simple math concepts
  • 1:38 - 1:43
    from statistics applied billions of times
    using fast computers.
  • 1:43 - 1:47
    The AI uses probabilities to predict
    the text that you want it to produce
  • 1:47 - 1:50
    based on all the previous text
    that it has been trained on.
  • 1:51 - 1:54
    Suppose that we want to train
    a large language model
  • 1:54 - 1:57
    to read every play written
    by William Shakespeare
  • 1:57 - 2:00
    so that it could write new plays
    in the same style.
  • 2:00 - 2:03
    We'd start with all the texts
    from Shakespeare's plays
  • 2:04 - 2:07
    stored letter by letter in a sequence.
  • 2:07 - 2:10
    Next, we'd analyze each letter
  • 2:10 - 2:14
    to see what letter
    is most likely to come next. After an I,
  • 2:14 - 2:18
    the next most likely letters
    to show up in Shakespeare's plays are
  • 2:18 - 2:22
    S or N; after an S:
  • 2:22 - 2:26
    T, C, or H, and so on.
  • 2:26 - 2:29
    This creates a table of probabilities.
  • 2:30 - 2:33
    With just this,
    we can try to generate new writing.
  • 2:34 - 2:36
    We pick a random letter to start.
  • 2:37 - 2:39
    Starting with the first letter,
  • 2:39 - 2:41
    we can see
    what's most likely to come next.
  • 2:41 - 2:44
    We don't always have to pick
    the most popular choice
  • 2:44 - 2:47
    because that would lead
    to repetitive cycles.
  • 2:47 - 2:49
    Instead, we pick randomly.
  • 2:49 - 2:53
    Once we have the next letter,
    we repeat the process
  • 2:53 - 2:56
    to find the next letter
    and then the next one and so on.
  • 2:56 - 3:00
    Okay, well,
    that doesn't look at all like Shakespeare.
  • 3:00 - 3:02
    It's not even English,
    but it's a first step.
  • 3:02 - 3:06
    This simple system might not seem
    even remotely intelligent,
  • 3:06 - 3:10
    but as we build up from here,
    you'll be surprised where it goes.
  • 3:10 - 3:15
    The problem in the last example
    is that at any point the AI only considers
  • 3:15 - 3:18
    a single letter to pick what comes next.
  • 3:19 - 3:23
    That's not enough context,
    and so the output is not helpful.
  • 3:24 - 3:25
    What if we could
  • 3:25 - 3:29
    train it to consider
    a sequence of letters, like sentences
  • 3:29 - 3:33
    or paragraphs, to give it more context
    to pick the next one?
  • 3:33 - 3:36
    To do this, we don't use a simple table
    of probabilities.
  • 3:37 - 3:39
    We use a neural network.
  • 3:39 - 3:42
    A neural network is a computer system
    that is loosely inspired
  • 3:42 - 3:44
    by the neurons in the brain.
  • 3:44 - 3:48
    It is trained on a body of information,
    and with enough training,
  • 3:49 - 3:53
    it can learn to take in new information
    and give simple answers.
  • 3:54 - 3:57
    The answers always include probabilities
  • 3:57 - 3:59
    because there can be many options.
  • 4:00 - 4:03
    Now let's take a neural network
    and train it on
  • 4:03 - 4:08
    all the letter sequences
    in Shakespeare's plays to learn
  • 4:08 - 4:11
    what letter is likely
    to come next at any point.
  • 4:14 - 4:17
    Once we do this,
    the neural network can take
  • 4:17 - 4:21
    any new sequence and predict
    what could be a good next letter.
  • 4:21 - 4:24
    Sometimes the answer is
    obvious, but usually it's not.
  • 4:25 - 4:26
    It turns out
  • 4:26 - 4:29
    this new approach works
    better, much better.
  • 4:30 - 4:33
    By looking at a long enough
    sequence of letters, the AI
  • 4:33 - 4:39
    can learn complicated patterns, and
    it uses those to produce all new texts.
  • 4:39 - 4:42
    It starts
    the same way with a starting letter
  • 4:43 - 4:47
    and then uses probabilities
    to pick the next letter, and so on.
  • 4:48 - 4:50
    But this time, the probabilities are based
  • 4:50 - 4:54
    on the entire context
    of what came beforehand.
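The real system learns this context-sensitivity with a neural network, but the effect of looking at more than one previous letter can be sketched with a lookup table keyed on the last few letters. The four-letter window and the toy corpus here are illustrative choices, not what ChatGPT actually does:

```python
import random
from collections import defaultdict

text = "to be or not to be that is the question"
CONTEXT = 4  # how many previous letters to consider (an illustrative choice)

# Map each 4-letter context seen in the corpus to the letters that followed it.
followers = defaultdict(list)
for i in range(len(text) - CONTEXT):
    followers[text[i:i + CONTEXT]].append(text[i + CONTEXT])

# Generate: start from a context that appeared in training,
# then slide the window forward one letter at a time.
context = text[:CONTEXT]
output = list(context)
for _ in range(30):
    choices = followers.get(context)
    if not choices:  # this context never appeared in training: stop early
        break
    letter = random.choice(choices)
    output.append(letter)
    context = context[1:] + letter
print("".join(output))
```

With enough context the output starts reproducing real words and phrases; a neural network goes further by generalizing to contexts it has never seen, instead of stopping when a lookup fails.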
  • 4:55 - 4:58
    As you see, this works surprisingly well.
  • 4:59 - 5:03
    Now, a system like ChatGPT uses a similar approach,
  • 5:03 - 5:06
    but with three very important additions.
  • 5:07 - 5:10
    First,
    instead of just training on Shakespeare,
  • 5:10 - 5:13
    it looks at all the information
    it can find on the Internet,
  • 5:14 - 5:18
    including all the articles on Wikipedia
    or all the code on GitHub.
  • 5:19 - 5:22
    Second,
    instead of learning and predicting letters
  • 5:22 - 5:27
    from just the 26 choices in the alphabet,
    it looks at tokens
  • 5:27 - 5:33
    which are either full words
    or word parts or even code.
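A rough sketch of that idea, assuming a small hand-picked vocabulary (real tokenizers, such as byte-pair encoding, learn their vocabulary from data rather than having it written in by hand):

```python
# Hand-picked vocabulary of words and word parts, purely for illustration.
vocab = ["writ", "ing", "play", "s", " "]

def tokenize(text):
    """Greedily match the longest known token at each position."""
    tokens = []
    while text:
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece):
                tokens.append(piece)
                text = text[len(piece):]
                break
        else:
            tokens.append(text[0])  # unknown character becomes its own token
            text = text[1:]
    return tokens

print(tokenize("writing plays"))  # → ['writ', 'ing', ' ', 'play', 's']
```

The model then predicts the next token from this vocabulary instead of the next letter from 26 choices, which lets it work in whole words and word parts at a time.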
  • 5:34 - 5:35
    And the third
  • 5:35 - 5:39
    difference
    is that a system of this complexity
  • 5:39 - 5:44
    needs a lot of human tuning to make sure
    it produces reasonable results
  • 5:44 - 5:49
    in a wide variety of situations,
    while also protecting against problems
  • 5:49 - 5:53
    like producing highly biased
    or even dangerous content.
  • 5:54 - 5:58
    Even after we do this tuning,
    it's important to note that this system
  • 5:58 - 6:02
    is still just using random probabilities
    to choose words.
  • 6:03 - 6:05
    A large language model can produce
  • 6:05 - 6:08
    unbelievable results that seem like magic,
  • 6:09 - 6:13
    but because it's not actually magic,
    it can often get things wrong.
  • 6:14 - 6:17
    And when it gets things wrong, people ask, does
  • 6:17 - 6:20
    a large language
    model have actual intelligence?
  • 6:21 - 6:24
    Discussions about AI often spark
  • 6:24 - 6:27
    philosophical debates
    about the meaning of intelligence.
  • 6:28 - 6:31
    Some argue that a neural network
    producing words
  • 6:31 - 6:34
    using probabilities
    doesn't really have intelligence.
  • 6:35 - 6:38
    But what isn't under debate
    is that large language
  • 6:38 - 6:41
    models produce amazing results
  • 6:41 - 6:44
    with applications in many fields.
  • 6:44 - 6:49
    This technology is already being used
    to create apps and websites,
  • 6:49 - 6:54
    help produce movies and video games,
    and even discover new drugs.
  • 6:54 - 6:59
    The rapid acceleration of
    AI will have enormous impacts on society,
  • 6:59 - 7:03
    and it's important for everybody
    to understand this technology.
  • 7:03 - 7:06
    What I'm looking forward to
    is the amazing things
  • 7:06 - 7:10
    people will create with AI,
    and I hope you dive in to learn
  • 7:10 - 7:15
    more about how AI works
    and explore what you can build with it.
Title:
Chatbots and Large Language Models Part 2
Video Language:
English
Team:
Code.org
Project:
How AI Works
Duration:
04:16