
Chatbots and Large Language Models Part 2

  • 0:07 - 0:09
    Hi, I'm Mira Murati.
  • 0:09 - 0:12
    I'm the chief technology officer
    at OpenAI, the company that created
  • 0:12 - 0:14
    ChatGPT.
  • 0:14 - 0:17
    I really wanted to work on AI
  • 0:17 - 0:21
    because it has the potential
    to really improve
  • 0:21 - 0:26
    almost every aspect of life
    and help us tackle really hard challenges.
  • 0:27 - 0:31
    Hi, I'm Cristobal Valenzuela,
    CEO and co-founder of Runway.
  • 0:31 - 0:35
    Runway is a research company
    that builds AI algorithms
  • 0:35 - 0:38
    for storytelling and video creation.
  • 0:40 - 0:43
    Chatbots like ChatGPT
    are based on a new type of AI
  • 0:43 - 0:46
    technology
    that's called large language models.
  • 0:47 - 0:52
    So instead of a typical neural network
    which trains on a specific task
  • 0:52 - 0:56
    like how to recognize faces
    or images, a large language
  • 0:56 - 1:00
    model is trained on the largest amount
    of information possible,
  • 1:01 - 1:04
    such as everything
    available on the Internet.
  • 1:04 - 1:07
    It uses this training to then be able
  • 1:07 - 1:10
    to generate completely new information,
  • 1:10 - 1:15
    like to write essays or poems,
    have conversations, or even write code.
  • 1:16 - 1:18
    The possibilities seem endless,
  • 1:18 - 1:22
    but how does this work
    and what are its shortcomings?
  • 1:22 - 1:24
    Let's dive in.
  • 1:24 - 1:28
    While a chatbot built on a large
    language model may seem magical,
  • 1:29 - 1:32
    it works
    based on some really simple ideas.
  • 1:32 - 1:38
    In fact, most of the magic of AI
    is based on very simple math concepts
  • 1:38 - 1:43
    from statistics applied billions of times
    using fast computers.
  • 1:43 - 1:47
    The AI uses probabilities to predict
    the text that you want it to produce
  • 1:47 - 1:50
    based on all the previous text
    that it has been trained on.
  • 1:51 - 1:54
    Suppose that we want to train
    a large language model
  • 1:54 - 1:57
    to read every play written
    by William Shakespeare
  • 1:57 - 2:00
    so that it could write new plays
    in the same style.
  • 2:00 - 2:03
    We'd start with all the texts
    from Shakespeare's plays
  • 2:04 - 2:07
    stored letter by letter in a sequence.
  • 2:07 - 2:10
    Next, we'd analyze each letter
  • 2:10 - 2:14
    to see what letter
    is most likely to come next. After an I,
  • 2:14 - 2:18
    the next most likely letters
    to show up in Shakespeare's plays are
  • 2:18 - 2:22
    S or N; after an S:
  • 2:22 - 2:26
    T, C, or H, and so on.
  • 2:26 - 2:29
    This creates a table of probabilities.
  • 2:30 - 2:33
    With just this,
    we can try to generate new writing.
  • 2:34 - 2:36
    We pick a random letter to start.
  • 2:37 - 2:39
    Starting with the first letter,
  • 2:39 - 2:41
    we can see
    what's most likely to come next.
  • 2:41 - 2:44
    We don't always have to pick
    the most popular choice
  • 2:44 - 2:47
    because that would lead
    to repetitive cycles.
  • 2:47 - 2:49
    Instead, we pick randomly.
  • 2:49 - 2:53
    Once we have the next letter,
    we repeat the process
  • 2:53 - 2:56
    to find the next letter
    and then the next one and so on.
  • 2:56 - 3:00
    Okay, well,
    that doesn't look at all like Shakespeare.
  • 3:00 - 3:02
    It's not even English,
    but it's a first step.
  • 3:02 - 3:06
    This simple system might not seem
    even remotely intelligent,
  • 3:06 - 3:10
    but as we build up from here,
    you'll be surprised where it goes.
  • 3:10 - 3:15
    The problem in the last example
    is that at any point the AI only considers
  • 3:15 - 3:18
    a single letter to pick what comes next.
  • 3:19 - 3:23
    That's not enough context,
    and so the output is not helpful.
  • 3:24 - 3:25
    What if we could
  • 3:25 - 3:29
    train it to consider
    a sequence of letters, like sentences
  • 3:29 - 3:33
    or paragraphs, to give it more context
    to pick the next one?
  • 3:33 - 3:36
    To do this, we don't use a simple table
    of probabilities.
  • 3:37 - 3:39
    We use a neural network.
  • 3:39 - 3:42
    A neural network is a computer system
    that is loosely inspired
  • 3:42 - 3:44
    by the neurons in the brain.
  • 3:44 - 3:48
    It is trained on a body of information,
    and with enough training,
  • 3:49 - 3:53
    it can learn to take in new information
    and give simple answers.
  • 3:54 - 3:57
    The answers always include probabilities
  • 3:57 - 3:59
    because there can be many options.
  • 4:00 - 4:03
    Now let's take a neural network
    and train it on
  • 4:03 - 4:08
    all the letter sequences
    in Shakespeare's plays to learn
  • 4:08 - 4:11
    what letter is likely
    to come next at any point.
  • 4:14 - 4:17
    Once we do this,
    the neural network can take
  • 4:17 - 4:21
    any new sequence and predict
    what could be a good next letter.
  • 4:21 - 4:24
    Sometimes the answer is
    obvious, but usually it's not.
  • 4:25 - 4:26
    It turns out
  • 4:26 - 4:29
    this new approach works
    better, much better.
  • 4:30 - 4:33
    By looking at a long enough
    sequence of letters, the AI
  • 4:33 - 4:39
    can learn complicated patterns, and
    it uses those to produce all new texts.
  • 4:39 - 4:42
    It starts
    the same way with a starting letter
  • 4:43 - 4:47
    and then uses probabilities
    to pick the next letter, and so on.
  • 4:48 - 4:50
    But this time, the probabilities are based
  • 4:50 - 4:54
    on the entire context
    of what came beforehand.
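The real system learns this context-sensitivity with a neural network, but the effect of looking at more than one previous letter can be sketched with a lookup table keyed on the last few letters. The four-letter window and the toy corpus here are illustrative choices, not what ChatGPT actually does:

```python
import random
from collections import defaultdict

text = "to be or not to be that is the question"
CONTEXT = 4  # how many previous letters to consider (an illustrative choice)

# Map each 4-letter context seen in the corpus to the letters that followed it.
followers = defaultdict(list)
for i in range(len(text) - CONTEXT):
    followers[text[i:i + CONTEXT]].append(text[i + CONTEXT])

# Generate: start from a context that appeared in training,
# then slide the window forward one letter at a time.
context = text[:CONTEXT]
output = list(context)
for _ in range(30):
    choices = followers.get(context)
    if not choices:  # this context never appeared in training: stop early
        break
    letter = random.choice(choices)
    output.append(letter)
    context = context[1:] + letter
print("".join(output))
```

With enough context the output starts reproducing real words and phrases; a neural network goes further by generalizing to contexts it has never seen, instead of stopping when a lookup fails.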
  • 4:55 - 4:58
    As you see, this works surprisingly well.
  • 4:59 - 5:03
    Now, a system like ChatGPT uses a similar approach,
  • 5:03 - 5:06
    but with three very important additions.
  • 5:07 - 5:10
    First,
    instead of just training on Shakespeare,
  • 5:10 - 5:13
    it looks at all the information
    it can find on the Internet,
  • 5:14 - 5:18
    including all the articles on Wikipedia
    or all the code on GitHub.
  • 5:19 - 5:22
    Second,
    instead of learning and predicting letters
  • 5:22 - 5:27
    from just the 26 choices in the alphabet,
    it looks at tokens
  • 5:27 - 5:33
    which are either full words
    or word parts or even code.
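A rough sketch of that idea, assuming a small hand-picked vocabulary (real tokenizers, such as byte-pair encoding, learn their vocabulary from data rather than having it written in by hand):

```python
# Hand-picked vocabulary of words and word parts, purely for illustration.
vocab = ["writ", "ing", "play", "s", " "]

def tokenize(text):
    """Greedily match the longest known token at each position."""
    tokens = []
    while text:
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece):
                tokens.append(piece)
                text = text[len(piece):]
                break
        else:
            tokens.append(text[0])  # unknown character becomes its own token
            text = text[1:]
    return tokens

print(tokenize("writing plays"))  # → ['writ', 'ing', ' ', 'play', 's']
```

The model then predicts the next token from this vocabulary instead of the next letter from 26 choices, which lets it work in whole words and word parts at a time.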
  • 5:34 - 5:35
    And the third
  • 5:35 - 5:39
    difference
    is that a system of this complexity
  • 5:39 - 5:44
    needs a lot of human tuning to make sure
    it produces reasonable results
  • 5:44 - 5:49
    in a wide variety of situations,
    while also protecting against problems
  • 5:49 - 5:53
    like producing highly biased
    or even dangerous content.
  • 5:54 - 5:58
    Even after we do this tuning,
    it's important to note that this system
  • 5:58 - 6:02
    is still just using random probabilities
    to choose words.
  • 6:03 - 6:05
    A large language model can produce
  • 6:05 - 6:08
    unbelievable results that seem like magic,
  • 6:09 - 6:13
    but because it's not actually magic,
    it can often get things wrong.
  • 6:14 - 6:17
    And when it gets things wrong, people ask, does
  • 6:17 - 6:20
    a large language
    model have actual intelligence?
  • 6:21 - 6:24
    Discussions about AI often spark
  • 6:24 - 6:27
    philosophical debates
    about the meaning of intelligence.
  • 6:28 - 6:31
    Some argue that a neural network
    producing words
  • 6:31 - 6:34
    using probabilities
    doesn't really have intelligence.
  • 6:35 - 6:38
    But what isn't under debate
    is that large language
  • 6:38 - 6:41
    models produce amazing results
  • 6:41 - 6:44
    with applications in many fields.
  • 6:44 - 6:49
    This technology is already being used
    to create apps and websites,
  • 6:49 - 6:54
    help produce movies and video games,
    and even discover new drugs.
  • 6:54 - 6:59
    The rapid acceleration of
    AI will have enormous impacts on society,
  • 6:59 - 7:03
    and it's important for everybody
    to understand this technology.
  • 7:03 - 7:06
    What I'm looking forward to
    is the amazing things
  • 7:06 - 7:10
    people will create with AI,
    and I hope you dive in to learn
  • 7:10 - 7:15
    more about how AI works
    and explore what you can build with it.
Title:
Chatbots and Large Language Models Part 2
Video Language:
English
Team:
Code.org
Project:
How AI Works
Duration:
04:16