
How gestures and other non-verbal cues facilitate comprehension - Xaver Funk | PGO 2021

  • 0:07 - 0:11
    Hello everyone, and a warm welcome to
    Multimodal Language Processing.
  • 0:11 - 0:17
My name is Xaver Funk, and I recently had
the chance to really immerse myself in
  • 0:17 - 0:20
this topic, because I am studying
neuroscience, and this was kind of
  • 0:20 - 0:26
    something that I had to do. And, yeah,
    that's what I want to share with you today.
  • 0:26 - 0:33
    So, what I have been doing recently also,
is learning Arabic, and a little bit of
  • 0:33 - 0:37
Mongolian. And mostly what I did was,
    I had this stream of auditory signals
  • 0:37 - 0:43
that maybe came from the Assimil audio, and
    I tried to match those to symbols that
  • 0:43 - 0:46
    were representing these, right, in the
    book.
  • 0:47 - 0:52
    And I kind of had this feeling that this
    is incomplete.
  • 0:52 - 0:54
    So there is something missing there.
  • 0:54 - 1:00
    And while I was on the other hand,
    studying a lot about multimodal language
  • 1:00 - 1:04
processing, which is how gestures influence
    processing and stuff like that.
  • 1:05 - 1:08
    I came to the conclusion that, yeah, there
    is something missing.
  • 1:09 - 1:12
In our world today, we are all literate,
so we mostly think of languages as
  • 1:12 - 1:16
    these auditory signals, these mouth noises
    and the symbols that represent these.
  • 1:16 - 1:22
    But there is so much more going on, in
    face to face communication and, yeah,
  • 1:22 - 1:25
    I want to make this point clear, with a
    virtual experiment.
  • 1:26 - 1:33
    So, I want to invite you to first of all,
    listen to this audio excerpt, from an
  • 1:33 - 1:38
    "Easy Languages" video. And I give you the
subtitles here, with the English
  • 1:38 - 1:44
    translation as well. So, basically, these
    are auditory signals in Dutch,
  • 1:44 - 1:47
    and sequences of symbols
    in Dutch and English.
  • 1:47 - 1:51
    And for the people learning Dutch, please
    just ignore the English, just to make it
  • 1:51 - 1:55
    a little bit harder. And people who know
    Dutch, please close your eyes, so that
  • 1:55 - 2:00
    you don't see it at all.
    So, let's go.
  • 2:41 - 2:46
    So, when I was listening to this at first,
    I was - because I know some Dutch,
  • 2:46 - 2:50
    I was understanding quite a lot,
    but, kind of, not everything.
  • 2:50 - 2:55
And then, when I watched the video that goes
with it, it was kind of a different experience.
  • 2:55 - 3:01
    And that's what we are going to do now.
    So just watch the video, and if you can,
  • 3:01 - 3:07
see how these two women, who are
interviewed here, are interacting
  • 3:07 - 3:10
    with the interviewer, and
with each other.
  • 3:51 - 3:54
So, I hope this worked, and that you feel
a little bit different now.
  • 3:54 - 3:58
    And even for the people who don't know
    Dutch, I hope you could kind of follow
  • 3:58 - 4:03
    what was going on. And even if you
    didn't, the point I want to make is that
  • 4:03 - 4:06
    messages are not only auditory,
    they are always also visual.
  • 4:07 - 4:11
    We have a lot of non-auditory
    articulators, like 43 face muscles
  • 4:11 - 4:16
for example, and then 2 × 34 muscles
    in the hands, and then even more in
  • 4:16 - 4:21
    the arms, in our torso.
    And the people in this video
  • 4:21 - 4:25
    really knew how to use these.
    So for example we had a lot of
  • 4:25 - 4:28
    facial movement going on,
    like you see on the top, here.
  • 4:28 - 4:32
See how she raises her eyebrows,
    and then, you have this head tilting
  • 4:32 - 4:35
at the end, that really puts
an emphasis on what she's saying.
  • 4:35 - 4:39
    And then there is a lot of gaze switching
as well. That's right at the beginning,
  • 4:39 - 4:43
    when she says
(Dutch): Oh genoeg! Heb je even?
  • 4:43 - 4:46
    So, "Oh, there is so much that
    I want to see! Do you have some time?"
  • 4:47 - 4:52
    But, she doesn't really say "time", she
    says "Heb je even", "Do you have a little"
  • 4:52 - 4:55
    And for me, when I was only listening,
    I didn't quite get what she was saying,
  • 4:55 - 4:58
but when I saw how she addresses
    the interviewer, I kind of got it,
  • 4:58 - 4:59
    afterwards.
  • 5:01 - 5:06
So then there are of course manual gestures
like "hoop op mijn lijst",
  • 5:06 - 5:12
that's that one here, "hoop op mijn lijst".
    She says "berglandschap", so that's
  • 5:12 - 5:17
a mountain landscape, and then "lang geleden",
"a long time ago", right?
  • 5:17 - 5:22
So there are a lot of messages that are
supported by these manual gestures.
  • 5:23 - 5:27
    Then there is also stuff like this
    nose scratching, where we don't even know
  • 5:27 - 5:30
whether there is something to it, or it is just
a nose scratch.
  • 5:31 - 5:34
    Does it carry some information?
    We don't know.
  • 5:35 - 5:38
    And then, lastly also arm and torso
    movements.
  • 5:38 - 5:41
    And also if you watch at the top here,
    you have nodding. So you see how
  • 5:41 - 5:46
    these two kind of nod together, they
    really give us the impression of
  • 5:46 - 5:51
    how good friends they are, right.
    And then if you look at this bottom part
  • 5:51 - 5:55
    here, that's my favorite part of the
    video. You really have this complex
  • 5:55 - 6:00
    orchestration of different gestures,
    and they are turn-taking.
  • 6:01 - 6:07
    So, the one on the right says something,
    and the one on the left answers that
  • 6:07 - 6:13
perfectly, and then you have gestures,
and then they're putting their hair back,
  • 6:13 - 6:18
    right, so there is so much going on
    between them, and it really gives more
  • 6:18 - 6:22
    than just the auditory message, right.
  • 6:23 - 6:27
    So, note that there is something that
    our brain has to achieve here.
  • 6:27 - 6:31
Mainly two things: first, it has to
segregate all of the stuff that is not
  • 6:31 - 6:34
important for the message from the
important stuff. That's the segregation problem.
  • 6:35 - 6:38
And then it has to take all
the important stuff, all
  • 6:38 - 6:42
the auditory and visual information, and
    put it together into a coherent message.
  • 6:42 - 6:46
    That's our binding problem.
    And all of this, note, all of this is
  • 6:46 - 6:49
    under a really tight time constraint,
when you're turn-taking, when you're
  • 6:49 - 6:52
    having a conversation.
    And if you say something and
  • 6:52 - 6:55
the other person says something,
    and there is not that much time
  • 6:55 - 6:57
    between turns. And if you need
    more time, then that also has
  • 6:57 - 7:02
a meaning, right? If you take time, then
    that means that you're hesitating
  • 7:02 - 7:05
    to answer, maybe there is something
    going on with you emotionally...
  • 7:05 - 7:10
So you don't want that either.
    So, yeah, so basically, this is really
  • 7:10 - 7:13
    a huge computational problem
    for your brain.
  • 7:14 - 7:18
    And well, how did your brain do?
    Did you feel the video was more difficult
  • 7:18 - 7:22
than the audio? Did you understand more,
    or did you understand less?
  • 7:22 - 7:26
    Did you feel more in the scene, maybe?
Catching more information between the lines?
  • 7:27 - 7:31
And well, for me at least, as you might
guess, it was way easier
  • 7:31 - 7:35
    to follow with the video to interpret
    these gestures. And this is kind of
  • 7:35 - 7:41
    a paradox. So, how come that processing
    more signals simultaneously is easier
  • 7:41 - 7:46
than processing speech alone?
    And this also was shown
  • 7:46 - 7:49
in the literature, so people have done
experiments on this.
  • 7:50 - 7:55
    And this is really a surprising
    facilitation. For example, there are
  • 7:55 - 7:57
a lot of studies, I'll just give you
    one example.
  • 7:57 - 8:01
    So in this study they showed people
    a "prime", so this was some video
  • 8:01 - 8:05
    of an action that somebody did, and then
    they showed the people different videos.
  • 8:06 - 8:10
    And the videos were either completely
    congruent, so what was said was the same
  • 8:10 - 8:14
    as the gesture, and was the same as
    this prime. So in that case it would be
  • 8:14 - 8:19
    "chop" and doing the chopping gesture.
    And then there were different conditions
  • 8:19 - 8:26
where the speech was congruent
and the gesture was incongruent,
  • 8:27 - 8:30
or the speech was incongruent and
the gesture was congruent.
  • 8:30 - 8:34
And then they also had weakly incongruent
stuff, like this for "chopping",
  • 8:34 - 8:38
but this is actually cutting, so this is
only weakly incongruent.
  • 8:38 - 8:42
    And then this twisting, which is
    strongly incongruent.
  • 8:42 - 8:47
    And then people had to press a button
    for "yes" if either the speech or
  • 8:47 - 8:52
the gesture was related to the prime,
    and no if neither speech nor gesture
  • 8:52 - 8:55
    was related to the prime.
And what the researchers found was that
  • 8:55 - 8:59
    there were differences in response times,
    and also in the proportion of errors
  • 8:59 - 9:02
that people made, as soon as
there was something incongruent.
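The condition structure and the response rule described above can be sketched as follows; the condition labels and the toy relatedness check are illustrative assumptions, not the study's actual stimuli or analysis code.

```python
# Hypothetical sketch of the speech-gesture priming design described above.
# Condition labels and the relatedness judgment are illustrative, not the
# actual stimulus set of the study.

PRIME = "chop"  # participants first see a video of someone chopping

# Each trial pairs a spoken word with a gesture; both are judged against the prime.
trials = [
    {"speech": "chop",  "gesture": "chop",  "label": "fully congruent"},
    {"speech": "chop",  "gesture": "twist", "label": "speech related, gesture strongly incongruent"},
    {"speech": "twist", "gesture": "chop",  "label": "speech incongruent, gesture related"},
    {"speech": "chop",  "gesture": "cut",   "label": "gesture only weakly incongruent"},
    {"speech": "twist", "gesture": "screw", "label": "neither related"},
]

def related(word: str, prime: str) -> bool:
    """Toy relatedness judgment: here simply identity with the prime."""
    return word == prime

def correct_button(trial: dict, prime: str = PRIME) -> str:
    """Task rule from the talk: press 'yes' if EITHER speech OR gesture
    is related to the prime, and 'no' only if neither is."""
    return "yes" if related(trial["speech"], prime) or related(trial["gesture"], prime) else "no"

for t in trials:
    print(f'{t["label"]:<50} -> correct answer: {correct_button(t)}')
```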
  • 9:02 - 9:08
And from that the authors came to
    the conclusion that really, speech and
  • 9:08 - 9:12
    gestures are two sides of the
    same coin, they mutually interact
  • 9:12 - 9:16
    to enhance comprehension.
    And now the big question is, of course,
  • 9:16 - 9:19
    how does our brain achieve this surprising
    facilitation?
  • 9:22 - 9:26
    And we can look back at turn-taking,
    to maybe get some clues here.
  • 9:26 - 9:32
So on average a turn transition takes only
about 0 to 200 milliseconds, which is
  • 9:32 - 9:38
at most a fifth of a second. You can see in this
    video how fast she is responding,
  • 9:38 - 9:43
    right now, after this, like, this is
    an instant, right?
  • 9:44 - 9:48
    And this is quite extraordinary, because
    producing a single word actually takes
  • 9:48 - 9:49
about 600 milliseconds.
  • 9:50 - 9:55
    So if I just prompt you to say a word,
you would take about 600 milliseconds to say it.
  • 9:56 - 10:00
    So there's something going on, it seems
    like we are predicting already what we are
  • 10:00 - 10:04
    going to say before the turn of the other
    person is finished, and we already prepare
  • 10:04 - 10:05
    our turn.
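As a rough back-of-the-envelope version of this timing argument, using only the figures quoted in the talk:

```latex
% Typical turn gap vs. single-word production latency (values from the talk).
\[
\underbrace{t_{\text{production}}}_{\approx 600\,\text{ms}}
\;-\;
\underbrace{t_{\text{gap}}}_{\approx 0\text{--}200\,\text{ms}}
\;\approx\; 400\text{--}600\,\text{ms},
\]
% so planning of the reply must begin roughly half a second before the other
% speaker's turn ends, i.e. while we are still listening and predicting.
```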
  • 10:05 - 10:10
    So there is something that is going on,
    that has to do with prediction.
  • 10:11 - 10:15
    Most language use in conversation
    has to be based on prediction somehow.
  • 10:15 - 10:18
    And this is quite nice, because prediction
is currently the big hype
  • 10:18 - 10:22
in neuroscience, and it's
    basically a good candidate for
  • 10:22 - 10:24
    the overarching function of the brain.
  • 10:25 - 10:31
    And many people think that what we are
    doing in our daily lives is basically
  • 10:31 - 10:34
    constantly computing and updating
    probability distributions.
  • 10:35 - 10:39
    And this applies both to action,
    to perception, and also to language.
  • 10:40 - 10:44
    So, this will be a rephrasing of
    the problem we had before,
  • 10:44 - 10:47
    as a prediction problem.
And this then becomes,
  • 10:47 - 10:51
    "given the preceding context - so, given
    all the words that come before - what word
  • 10:51 - 10:57
    is most likely to come up next?"
    Right? And to make this more clear,
  • 10:57 - 11:01
    let me give you a quick example:
    so, imagine I come to you and I say,
  • 11:01 - 11:04
without any more context, "I would like to".
  • 11:05 - 11:08
    And then, you don't know what I'm going
    to say next, right?
  • 11:08 - 11:12
    It could be any of these, for example.
    I would like to drink, eat, work...
  • 11:12 - 11:16
    And so on.
    And now, if I shape my hand
  • 11:16 - 11:21
    in the form of a "C", and I put it to
    my mouth, like this, while I say
  • 11:21 - 11:27
    "I would like to", then your probability
    distribution over these words changes
  • 11:27 - 11:31
    in such a way that "drink" is much more
    likely to be the next word.
  • 11:31 - 11:35
    And maybe "eat" also a little bit, but
the other words probably not,
  • 11:35 - 11:39
    because, you can associate this
    gesture with drinking, or a little bit
  • 11:39 - 11:43
    with eating, because it's also
    something that you put to your mouth,
  • 11:43 - 11:47
    but mostly this is commonly understood
    as "drinking", right.
  • 11:49 - 11:54
    So, in this way gestures add context to
    predictions and help this process of
  • 11:54 - 11:57
    predicting, and that also helps the
    comprehension.
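A minimal sketch of this idea in code; the candidate words, the prior probabilities, and the gesture-likelihood values are all made up for illustration.

```python
# Toy illustration of the "I would like to ..." example: a gesture cue
# reweights the listener's probability distribution over candidate next words.
# All numbers are invented for illustration.

prior = {"drink": 0.25, "eat": 0.25, "work": 0.25, "sleep": 0.25}

# How compatible each candidate word is with seeing a "cup to the mouth" gesture.
likelihood_given_gesture = {"drink": 0.80, "eat": 0.15, "work": 0.02, "sleep": 0.03}

def update(prior, likelihood):
    """Bayes-style reweighting: posterior is proportional to prior * likelihood."""
    unnorm = {w: prior[w] * likelihood[w] for w in prior}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

posterior = update(prior, likelihood_given_gesture)
for word in prior:
    print(f"{word:>5}: prior {prior[word]:.2f} -> after gesture {posterior[word]:.2f}")
# "drink" ends up dominating the distribution, exactly the shift described above.
```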
  • 11:58 - 12:01
    And, we can actually measure prediction,
    using neurophysiology.
  • 12:03 - 12:10
So this is EEG, and "EEG" stands for
"electroencephalography", and
  • 12:10 - 12:13
    it's basically putting electrodes on the
    scalp, and then measuring
  • 12:13 - 12:20
    the brain activity that's below.
    If you do this, well you can measure
  • 12:20 - 12:24
    brain activity basically.
    What people usually do, is that
  • 12:24 - 12:28
    they give people these sentences.
    So these could be normal sentences,
  • 12:28 - 12:31
like this one: "It was his first day
    at work."
  • 12:32 - 12:37
Or it could be semantically anomalous
sentences. So these are sentences that are
  • 12:37 - 12:43
    somehow manipulated artificially
    to elicit some response. Right?
  • 12:43 - 12:46
So this would be: "He spread the warm
    bread with socks."
  • 12:46 - 12:51
So you may have a weird feeling in
your head, because nobody spreads
  • 12:51 - 12:57
the bread with socks. And this weird
feeling, if we measured you with
  • 12:57 - 13:04
    an EEG, would constitute this reaction
    here, that's a so-called N400.
  • 13:04 - 13:10
    "N" because it is a negative polarity,
    and it's 400 miliseconds after the word.
  • 13:10 - 13:14
    So, all of this above here is just
    electrical activity, right? And you have
  • 13:14 - 13:19
    this really pronounced peak, when
    there is a violation of the semantics,
  • 13:19 - 13:25
    like with "socks".
    And it's also taken to be a prediction
  • 13:25 - 13:29
    error. So you did not predict socks,
    you predicted Nutella, for example,
  • 13:29 - 13:33
    or honey. But not socks. And this is
    reflected in this N400 prediction error.
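Roughly, an N400-style comparison is obtained by epoching the EEG around each word onset, averaging per condition, and comparing the averages around 400 ms. Below is a minimal sketch with simulated data; the sampling rate, onset indices, and array shapes are assumptions, not any particular study's pipeline.

```python
# Sketch of extracting an ERP such as the N400 from one EEG channel:
# epoch around word onsets, average per condition, compare the two averages.
import numpy as np

fs = 500                              # sampling rate in Hz (assumed)
eeg = np.random.randn(600_000)        # one EEG channel (stand-in for real data)
onsets_expected = [10_000, 30_000, 55_000]   # sample indices of expected final words
onsets_anomalous = [20_000, 40_000, 65_000]  # sample indices of anomalous final words ("socks")

def erp(signal, onsets, fs, tmin=-0.2, tmax=0.8):
    """Average epochs time-locked to the given onsets (no baseline correction here)."""
    start, stop = int(tmin * fs), int(tmax * fs)
    epochs = np.stack([signal[o + start:o + stop] for o in onsets])
    return epochs.mean(axis=0)

erp_expected = erp(eeg, onsets_expected, fs)
erp_anomalous = erp(eeg, onsets_anomalous, fs)

# The N400 effect is the amplitude difference roughly 300-500 ms after word onset
# (indices are offset by the 200 ms pre-stimulus part of each epoch).
win = slice(int((0.2 + 0.3) * fs), int((0.2 + 0.5) * fs))
n400_effect = (erp_anomalous[win] - erp_expected[win]).mean()
print(f"mean amplitude difference in the N400 window: {n400_effect:.3f} (arbitrary units)")
```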
  • 13:33 - 13:38
    So people are doing this a lot, like, showing
    these sentences that are somehow manipulated.
  • 13:38 - 13:43
    We have another example here, this is
another topic: if you write in all caps
  • 13:43 - 13:46
    you have this kind of response for
    example.
  • 13:47 - 13:51
    But, what I want to do now with you
    is bringing you more to the cutting edge
  • 13:51 - 13:56
    of what is currently done in multimodal
    processing research.
  • 13:57 - 14:01
    So the trend is to go away from these
    artificially constructed sentences, and
  • 14:01 - 14:06
    more towards naturalistic language
    comprehension. So, using actual stories,
  • 14:06 - 14:12
    actual sentences, that are not manipulated
    in any way. And this will be combined with
  • 14:12 - 14:16
    computational linguistics - how that
    works, you will see in a bit.
  • 14:16 - 14:21
    And also, yeah, with that you can look at
    multimodal processing if you just add
  • 14:21 - 14:26
    a video to the audio that
    you make people listen to.
  • 14:27 - 14:33
    And what it might look like
    is like this. So, this is one study
  • 14:33 - 14:36
    that is currently not published
officially yet. It is already available on
  • 14:36 - 14:44
a preprint archive. And I want to use this to
    illustrate to you how we might research
  • 14:44 - 14:49
    naturalistic language comprehension.
So the general plan is to get some
  • 14:49 - 14:54
    per-word measures - so these would be
    these ones here. So for each word,
  • 14:54 - 15:00
    there is some value attached.
    And then we can use these as regressors
  • 15:00 - 15:05
    in the big linear regression model.
    So, using fancy statistics, and with that
  • 15:05 - 15:13
    we're basically asking our data "how well
    are you predicted by these regressors?"
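A minimal sketch of such a per-word regression: one row per word, one column per regressor, fit with ordinary least squares. The variable names and the simulated data below are placeholders, not the study's actual pipeline.

```python
# Per-word regression sketch: each word contributes one row of regressor values,
# and we ask how well these predict the measured brain response for that word.
import numpy as np

n_words = 200
rng = np.random.default_rng(0)

# Per-word regressors of the kind described in the talk (all simulated here).
surprisal = rng.gamma(2.0, 1.5, n_words)   # -log p(word | context)
pitch = rng.normal(120, 20, n_words)       # fundamental frequency, a prosody control
gesture = rng.integers(0, 2, n_words)      # 1 if a meaningful gesture accompanies the word
mouth = rng.random(n_words)                # amount of mouth movement

X = np.column_stack([np.ones(n_words), surprisal, pitch, gesture, mouth])

# Simulated per-word brain response (e.g. mean EEG amplitude in the N400 window).
y = -0.5 * surprisal + 0.3 * gesture + rng.normal(0, 1, n_words)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares fit
for name, b in zip(["intercept", "surprisal", "pitch", "gesture", "mouth"], beta):
    print(f"{name:>9}: {b:+.3f}")
```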
  • 15:14 - 15:19
    And for example, this one here is
surprisal, and this is closely related
  • 15:19 - 15:25
    to predictions or prediction errors.
    So this is the negative log probability
  • 15:25 - 15:29
    of a word, given all of the words
    that come before it.
  • 15:29 - 15:33
So, this is the context, basically,
    and this is some word, "w".
  • 15:34 - 15:37
So this is basically telling you how
unpredictable a given word is.
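Written out, this is the standard definition of surprisal, matching the verbal description above:

```latex
% Surprisal of word w_t given the preceding words (its context):
\[
S(w_t) \;=\; -\log P\!\left(w_t \mid w_1, w_2, \ldots, w_{t-1}\right)
\]
% A perfectly predictable word has surprisal 0; the less likely a word is
% in its context, the larger its surprisal.
```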
  • 15:39 - 15:43
And this measure is based on computational
    language models, so for example,
  • 15:43 - 15:48
    they would take the whole corpus of
a language, and then see which words
  • 15:48 - 15:54
occur after one another, and thereby get
    to this value of how unpredictable it is.
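A toy version of how such a value could be estimated from a corpus, here with simple bigram counts; real studies use far larger corpora and stronger language models, so this only illustrates the idea.

```python
# Toy per-word surprisal estimated from bigram counts over a tiny "corpus".
import math
from collections import Counter, defaultdict

corpus = "i would like to drink water i would like to eat bread i would like to sleep".split()

bigrams = Counter(zip(corpus, corpus[1:]))
context_counts = defaultdict(int)
for (w1, _), c in bigrams.items():
    context_counts[w1] += c

def surprisal(prev: str, word: str) -> float:
    """-log2 P(word | prev), with a tiny floor so unseen pairs stay finite."""
    p = bigrams[(prev, word)] / context_counts[prev] if context_counts[prev] else 0.0
    return -math.log2(max(p, 1e-6))

print(f"surprisal of 'drink' after 'to': {surprisal('to', 'drink'):.2f} bits")
print(f"surprisal of 'sleep' after 'to': {surprisal('to', 'sleep'):.2f} bits")
print(f"surprisal of 'water' after 'to': {surprisal('to', 'water'):.2f} bits")  # unseen -> very high
```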
  • 15:56 - 16:01
    And then, they have another thing here.
    They use the fundamental frequency of each
  • 16:01 - 16:06
    word as a pitch indicator, to control
    for prosody, which is also pretty cool.
  • 16:06 - 16:11
    So they let loose their linear
    regression models, with these predictors,
  • 16:11 - 16:18
    so they have a surprisal value for each
word, for example, a prosody value for
  • 16:18 - 16:22
    each word, then they indicate where
    there are meaningful gestures happening,
  • 16:22 - 16:25
    and, yeah, also, mouth movements.
  • 16:27 - 16:30
    And, what came out of this, one finding
    that might be interesting for us,
  • 16:30 - 16:37
    now, is that for meaningful gestures,
    the N400 is less negative.
  • 16:38 - 16:43
So you can also see this here: for
    meaningful gesture, this blue line,
  • 16:43 - 16:48
    you see that it is a lot less negative
    than the red line where the gestures are
  • 16:48 - 16:53
    absent. And then there's also, that's why
    I told you about surprisal, an interesting
  • 16:53 - 16:58
    interaction between gestures and
    surprisal. So, the higher the surprisal,
  • 16:58 - 17:03
the more unexpected a word, the stronger
    this facilitating effect of gestures is.
  • 17:04 - 17:10
    Which is also really interesting.
    Then there's, this is a similar study,
  • 17:12 - 17:15
    that I actually got the chance to work on,
    with a colleague.
  • 17:17 - 17:21
    So what we did here, we had a measure of
    entropy. This measures basically the
  • 17:22 - 17:26
    uncertainty about the next word.
    So if you think back to the example
  • 17:26 - 17:31
    we had before, where I was telling
    you "I would like to", and then something,
  • 17:31 - 17:35
but without context, that would be really
high entropy, really high uncertainty:
  • 17:35 - 17:40
    you don't know what's coming next. Right?
    Then we also had surprisal, we had word
  • 17:40 - 17:44
    frequency, how often the word came up.
And then there is IVC -
  • 17:44 - 17:50
that's an abbreviation for "instantaneous
visual change" - so, how much the actor
  • 17:50 - 17:56
    moved while we were showing this to
    the people. And then speech envelope,
  • 17:56 - 18:01
    this is basically a measure of the level
    of the sound.
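For completeness, the entropy measure mentioned here can be written as follows; this is the standard definition, and details such as the log base may differ in the actual study.

```latex
% Entropy of the next-word distribution given the context so far:
\[
H(\text{context}) \;=\; -\sum_{w} P(w \mid \text{context})\,\log P(w \mid \text{context})
\]
% High entropy: many next words are about equally likely ("I would like to ...").
% Low entropy: the context already pins the next word down.
% Surprisal (above) scores the word that actually occurred; entropy scores the
% uncertainty before it occurred.
```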
  • 18:03 - 18:09
    And what we found is - and this
by the way was an fMRI experiment,
  • 18:09 - 18:13
so we can look at which regions are active
    during some condition.
  • 18:13 - 18:18
    And for words where the surprisal was
    really high, there were these regions
  • 18:18 - 18:21
    in red active, and for words where
    entropy was really high,
  • 18:21 - 18:26
    these regions in blue. And now if
    we look at interactions with gestures
  • 18:26 - 18:32
    for the entropy condition, we can see that
    when there were gestures present, we had
  • 18:32 - 18:38
    really specific activations compared
    to when there were no gestures present,
  • 18:38 - 18:41
    in situations where there is high
    entropy, so high uncertainty.
  • 18:43 - 18:50
    So with these tools we try to get into
    the processes that underlie prediction
  • 18:50 - 18:56
    in language.
    So let's take a step back, and have a look
  • 18:56 - 19:00
    at kind of a more global
evolutionary perspective.
  • 19:02 - 19:06
    We know from primate research that gesture
    and gaze are crucial for communication.
  • 19:07 - 19:10
You can see it in this video: this ape
    right here does this gesture,
  • 19:10 - 19:15
and this signals to its mother
to pick it up. Right?
  • 19:15 - 19:22
    So these are bonobos, and you can see
    right now, this "pick me up" gesture.
  • 19:24 - 19:29
And Federico Rossano, from the Max Planck
Institute for Evolutionary Anthropology,
  • 19:30 - 19:36
could show that this gesture gets more
and more ritualized, to the point where
  • 19:36 - 19:46
    it becomes only a small wrist bend with
    the arm and one gaze, to instantiate
  • 19:46 - 19:50
    this carry behavior. Right?
    So you see that there is also kind of
  • 19:50 - 19:57
a prediction involved: the mother has
to predict what the child wants
  • 19:57 - 20:00
    to do, right?
    Going from this, to only this.
  • 20:01 - 20:07
    Then, building on this, there are some
    authors that propose that speech and
  • 20:07 - 20:12
    gesture have a common origin.
    And the idea here is that, through
  • 20:12 - 20:15
    these ritualized gestures that
    we've just seen in those bonobos,
  • 20:15 - 20:19
after a time a proto-sign language
would evolve.
  • 20:19 - 20:22
Which then at some point would be
accompanied by sound as well,
  • 20:22 - 20:28
evolving into a proto-speech language.
And then proto-sign and proto-speech
  • 20:28 - 20:33
would reinforce each other more and more,
    until language emerges.
  • 20:34 - 20:40
    And another point, here,
or an observation: those of you who have
  • 20:40 - 20:44
tried sign language, it kind of feels
surprisingly natural, right?
  • 20:44 - 20:51
    So, if speech is the true communication
    medium for humans, why is it
  • 20:51 - 20:56
that sign language feels so real,
    so natural, right?
  • 20:58 - 21:03
    And then another point that goes into this
    theory is that voluntary hand movements
  • 21:03 - 21:08
    came before voluntary breathing. And you
    need voluntary breathing to articulate
  • 21:08 - 21:09
    yourself, right?
  • 21:10 - 21:14
So, also, just as a complement to speech,
  • 21:14 - 21:18
    you can more easily show spatial
    relations between things.
  • 21:19 - 21:23
And then, if you look at child development,
you see the same pattern: gestures develop
  • 21:23 - 21:30
before speech, and pre-speech turn-taking
is faster than later turn-taking.
  • 21:30 - 21:35
    So if you're a baby and you gesture,
    the turn-taking with your mother,
  • 21:35 - 21:40
    the communication is quite fast,
    it's almost adult level turn-taking.
  • 21:41 - 21:45
    Then as you learn language it gets way
slower, and only in middle school it
  • 21:45 - 21:48
gets back to adult-level turn-taking.
  • 21:49 - 21:53
    So, what's the point, right? What does all
    of this mean for language learning?
  • 21:54 - 21:59
    So for this, let's do another
    time-travel, back to 1768,
  • 21:59 - 22:07
and meet this French Jesuit monk:
    Claude-François Lizarde de Radonvilliers.
  • 22:08 - 22:12
And he wrote this book:
(in French) "About the Way to Learn Languages",
  • 22:12 - 22:18
    back in the day, where he reflected on how
    we should teach people languages.
  • 22:18 - 22:24
    And interestingly, this is basically
    the grandfather of the Assimil method,
  • 22:24 - 22:30
    and also the Méthode
Toussaint-Langenscheidt, also called
  • 22:30 - 22:34
    "interlinearversion". So this would be
    this sheet here.
  • 22:34 - 22:39
    This was a way people learned languages
    at the turn of the previous century.
  • 22:39 - 22:45
    And you can see here that you have
the Spanish at the top, then some
  • 22:45 - 22:50
    consideration in the middle,
and on the bottom the German.
  • 22:51 - 22:54
    And this is kind of similar to what
    Assimil does, right?
  • 22:55 - 22:58
    So this is really interesting,
    but that's not the point here.
  • 22:59 - 23:04
    What he also did in this book is to compare
    L1 - so first language acquisition -
  • 23:04 - 23:09
    with second language learning.
    And he noted that it seems that,
  • 23:09 - 23:16
    for the first language, parents show their
    children pictures, and enact words or
  • 23:16 - 23:20
    concepts, and encourage the children to do
    the same, like this little boy does here.
  • 23:20 - 23:25
    But for second language acquisition all we
    do is give people these vocabulary lists,
  • 23:25 - 23:27
    and expect them to learn it
    just like that.
  • 23:28 - 23:33
    So this is kind of an interesting point,
and since then it has been shown
  • 23:33 - 23:37
- and this is actually pretty robust,
I was really surprised that it is
  • 23:37 - 23:43
such a robust finding - that gesture-
enriched material enhances learning.
  • 23:43 - 23:52
    So in this study, for example, people
tried to teach English-speaking people
  • 23:52 - 23:56
Japanese words, and they had four
    different training conditions.
  • 23:57 - 24:03
    So one, only speech, one repeated speech,
    one speech plus incongruent gestures
  • 24:03 - 24:07
    - so gestures that would not match -
    and then, congruent gestures.
  • 24:08 - 24:10
    And this is the interesting condition,
    right?
  • 24:11 - 24:16
And then they tested the people after
encoding at three different times:
  • 24:16 - 24:18
    after five minutes, after two days,
    and after one week.
  • 24:19 - 24:25
    And also they tested them on forced choice
    so it's basically multiple choice,
  • 24:26 - 24:29
    and free recall, so it's prompting
    the people with the word, and then they
  • 24:29 - 24:35
    come up themselves with the answer.
    So these numbers here are basically
  • 24:35 - 24:37
    the proportion of correct
    answers that people give.
  • 24:38 - 24:40
    And you can see that,
    across the board,
  • 24:40 - 24:43
    the speech plus congruent
    gesture condition is
  • 24:43 - 24:51
clearly superior to the other ones,
    which is, yeah, which is interesting, and
  • 24:51 - 24:57
    so, you would maybe think that the point
    is "okay, so we just use videos instead
  • 24:57 - 25:00
    of audios", right?
    And this is what I would call
  • 25:00 - 25:03
multisensory enrichment.
    And there is nothing wrong with this,
  • 25:03 - 25:11
    this is really useful, you have
these YouTube channels like Easy Languages
  • 25:11 - 25:15
    - I'm not sponsored by the way (laugh) -
    where you have conversations with
  • 25:15 - 25:19
    real people that from time to time
    make gestures, and you get the full
  • 25:19 - 25:23
    conversation thing, right?
    And you have these one-on-one videos,
  • 25:23 - 25:30
    like this one from Mandarin Corner.
where there are also a lot of gestures
  • 25:30 - 25:34
    involved, so the host, Eileen, really
    tries to integrate a lot of gestures.
  • 25:35 - 25:39
    But this is actually not the point
    - I mean, this is cool but I think
  • 25:39 - 25:44
    you already do that.
    The point is way deeper.
  • 25:44 - 25:50
    So, there's another thing going on,
    not only when you watch gestures,
  • 25:50 - 25:54
    but when you enact them.
    This is called the enactment effect.
  • 25:55 - 25:58
    This was actually coined in 1980,
by two Germans.
  • 25:58 - 26:04
They first called it the "Tu-Effekt",
    which translates literally to "Do-Effect".
  • 26:05 - 26:09
    And you can see why people chose to call
    it the enactment effect, because it sounds
  • 26:09 - 26:13
    way more fancy (laugh) but I really like
    the "tu-effekt", it sounds funny.
  • 26:14 - 26:20
    Anyways, the point is that action words
    or phrases, this is what they - Engelkamp
  • 26:20 - 26:23
and Krumnacker - noticed: that action
    words and phrases are remembered better
  • 26:23 - 26:27
    if they're acted out,
    or accompanied by gestures.
  • 26:28 - 26:33
    So if you would learn the phrase
    "chopping garlic", then if you enact it
  • 26:33 - 26:37
enact it while learning it,
    you will retain it way better.
  • 26:37 - 26:40
    And this effect is also
    really well replicated,
  • 26:40 - 26:42
    and this was also
    really surprising to me,
  • 26:42 - 26:49
    because it is virtually not at all
    translated into actual teaching.
  • 26:49 - 26:53
    Nobody does this, nobody tells
    the students to enact things, right?
  • 26:53 - 26:56
    Enact words, enact anything.
  • 26:56 - 27:01
    And it has been well replicated
    across tasks, across materials and also
  • 27:01 - 27:05
across populations: across children,
adults, even clinical populations:
  • 27:05 - 27:08
people with Alzheimer's, people recovering
    from stroke...
  • 27:08 - 27:13
    Somehow people made them
    learn words and then act the words,
  • 27:13 - 27:18
    and it worked better
    than without enactment.
  • 27:19 - 27:23
    And also, this is not only true for
    action words and concrete words,
  • 27:23 - 27:26
but also for abstract words.
    Anything you can somehow find
  • 27:26 - 27:32
    a representation - with gestures - for,
    you can use this enactment effect.
  • 27:33 - 27:36
    And this is way more powerful than
multisensory enrichment,
  • 27:36 - 27:40
    and we can call this
    "sensorimotor enrichment",
  • 27:40 - 27:43
because you use
your senses and your motor system.
  • 27:44 - 27:51
    So, this is also, this ties in with
    another really interesting development
  • 27:51 - 27:53
in neuroscience, called
    "embodied cognition".
  • 27:54 - 27:58
    Basically this is the idea that many
    features of cognition - and these might be
  • 27:58 - 28:02
    concepts, categories, reasoning or
    judgement - are shaped by aspects
  • 28:02 - 28:06
    of the body. And this would be
    the motor system - so how we move -
  • 28:06 - 28:11
    the perceptual system - what we see, what
    we feel, what we hear, and so on.
  • 28:11 - 28:15
    And also bodily interactions with
    the environment.
  • 28:15 - 28:19
And you might see where I'm going with this,
if you think about concepts and categories.
  • 28:20 - 28:24
    What are words, if not concepts and
    categories, right?
  • 28:24 - 28:28
    So we might ask the question, "how are
    words represented in the brain?"
  • 28:29 - 28:31
    And there is this really funny study,
  • 28:32 - 28:38
    they showed people words that had strong
    olfactory associations, which means
  • 28:38 - 28:42
they either stink really badly, or
they smell really good.
  • 28:43 - 28:49
    And, in case you're looking for some
inspiration for your Spanish poem,
  • 28:49 - 28:53
    you can go (laugh) to this publication and
    search through the list of words.
  • 28:53 - 28:57
    This is also a small - this is only
    a small sample, there are tons of
  • 28:57 - 29:02
really strong-smelling words
    in the study and, yeah.
  • 29:03 - 29:06
    So basically, what they found is that
    when they showed people these words,
  • 29:06 - 29:12
    as compared to words that did not smell
    that much, some regions in the brain
  • 29:12 - 29:16
that are associated with olfaction,
so, with smelling, lit up.
  • 29:18 - 29:24
    And this kind of has been
    extended as well to actions.
  • 29:24 - 29:29
    So, on the left here, these are
    all the regions that light up
  • 29:29 - 29:32
when you move your foot,
    when you move your fingers,
  • 29:32 - 29:36
    or when you move your tongue.
    And on the right here, these are
  • 29:36 - 29:42
    the regions that light up when you read
    leg-related words, arm-related words,
  • 29:42 - 29:46
    or face-related words.
And you can see that these activations,
  • 29:46 - 29:51
more or less,
fit each other, right?
  • 29:51 - 29:56
    So, in some way, leg-related words
are stored where you also move your leg,
  • 29:56 - 29:59
    arm-related words are stored where
    you also move your arms, and so on.
  • 30:00 - 30:07
    So, we can think about words actually
    as functional networks, like this.
  • 30:08 - 30:13
    And, note that words are
    experience-dependent functional networks.
  • 30:14 - 30:17
    And experience is connected
    to the body, right?
  • 30:18 - 30:26
So for example, you surely have not only
    read and heard the word "garlic",
  • 30:26 - 30:29
    you also have smelled garlic,
    you touched garlic, you tasted garlic
  • 30:29 - 30:32
and, really importantly,
you have chopped garlic.
  • 30:32 - 30:35
    So when you read "garlic",
    you not only have
  • 30:35 - 30:39
    the core language areas - in yellow here -
    activated, but also
  • 30:39 - 30:45
    subcortical olfactory areas, and some
    gustatory areas - so, for taste -
  • 30:45 - 30:49
    action areas, right, and visual areas
    as well.
  • 30:50 - 30:54
    So, what I want to tell you here,
    think about this when you learn languages.
  • 30:55 - 31:02
    Did you do the same for "knoblauch", for
    example - the german word for "garlic"?
  • 31:03 - 31:09
If you learn German, do you actually
    get into this huge associated network?
  • 31:10 - 31:13
    So, and that's the point basically,
    we are coming to the end,
  • 31:14 - 31:18
    the point is that language is multimodal,
you should use sensorimotor enrichment
  • 31:18 - 31:22
    when learning languages, and thereby
    embody your languages.
  • 31:24 - 31:27
    And if you want to learn more about this,
    and also for me to give credit,
  • 31:27 - 31:31
    this is basically where I got most
    of my input from.
  • 31:32 - 31:37
    These are four big review articles that
    discuss all of this stuff.
  • 31:38 - 31:44
    So yeah, that's basically it.
Thanks for listening, and I'm hoping for some cool questions.
  • 31:53 - 31:58
    I would most certainly guess so.
    This thing with the phone is also
  • 31:58 - 32:01
    something that I have
    experienced quite a lot.
  • 32:04 - 32:09
    I have lived in Chile for some time,
and I got myself a Chilean SIM card.
  • 32:10 - 32:15
    And I didn't give this number to a lot
    of people, but somehow, this number got
  • 32:15 - 32:18
    to people that were, I don't know,
    trying to sell me something.
  • 32:19 - 32:22
    And I would get a call from
    somebody, pick up the phone,
  • 32:22 - 32:24
    and I would not understand
    a single word.
  • 32:24 - 32:28
Like, Chilean Spanish is already
    really hard, and then it's completely
  • 32:28 - 32:31
    out of context, I don't know what
    this person wants from me, and then
  • 32:31 - 32:34
    it's just [gestures] and I'm like
    "sorry, I don't understand you"
  • 32:34 - 32:36
    "I don't understand you",
    "I don't understand you",
  • 32:36 - 32:38
    over and over again.
  • 32:38 - 32:41
    And yeah, I mean, if you're on the phone,
  • 32:43 - 32:46
    there's also a little bit of noise maybe,
  • 32:46 - 32:50
    and I really have the feeling that
    that makes,
  • 32:50 - 32:53
    especially in a foreign language,
  • 32:53 - 32:56
    conversing that much harder, because
    you don't see the mouth movements,
  • 32:57 - 33:02
    it's not that clear of a sound,
    you don't see anything else, and yeah.
  • 33:02 - 33:04
    I would say so.
    Cool question.
  • 33:18 - 33:21
    I would guess so, I would guess so.
    Like, I mean,
  • 33:23 - 33:26
    especially for autistic people
    there is a lot of research
  • 33:26 - 33:28
    on language processing in general,
  • 33:30 - 33:34
    but I don't know of any studies that are
  • 33:35 - 33:38
    specifically for multimodal processing,
  • 33:38 - 33:41
    but I think there are quite a few.
  • 33:41 - 33:44
    Actually the experiment that I showed you,
  • 33:47 - 33:51
    the one that I worked on, as well,
    in the middle of the presentation,
  • 33:51 - 33:55
    the entropy stuff, we also did this
    with schizophrenic patients,
  • 33:56 - 33:58
    but we have not looked at the data yet.
  • 33:58 - 34:00
    So once this publication is done then,
  • 34:01 - 34:05
    somebody else will deal with
  • 34:05 - 34:08
    the clinical data,
    with the schizophrenic people,
  • 34:08 - 34:11
    and in general for schizophrenics there's,
  • 34:11 - 34:13
    there's a lot of,
  • 34:15 - 34:19
    like, language-related abnormalities,
  • 34:21 - 34:23
    and I think for autistic people as well.
  • 34:23 - 34:25
    I'm not sure about ADHD
  • 34:28 - 34:31
    but yeah, it would be
    a really interesting thing to,
  • 34:33 - 34:36
    to look at this for autistic people, for sure.
  • 34:36 - 34:40
    And maybe people did this.
    You can, maybe, you can look it up.
  • 34:41 - 34:46
    I don't have anything in my head right now
    but yeah.
  • 34:47 - 34:50
    There should be something.
  • 34:59 - 35:04
    I think one of the studies that
    I glanced over actually tried this.
  • 35:04 - 35:08
    So they had some gestures that were nonsense
  • 35:08 - 35:11
    I don't know if it was with abstract words
    or with concrete words,
  • 35:12 - 35:14
    but they used nonsense gestures,
  • 35:14 - 35:17
    and they still had an effect,
    but it was smaller.
  • 35:18 - 35:21
    So if you try to make this,
  • 35:21 - 35:23
    to integrate this into your studies,
  • 35:24 - 35:29
I would suggest that you try to find
an enactment that makes as much sense as possible.
  • 35:30 - 35:34
    I mean, it's not that it's impossible
    to get enactment for abstract words,
  • 35:35 - 35:38
    you just have to be a little bit more
    creative, and I think the more creative
  • 35:38 - 35:43
you are, the more effective it will be.
    Like, similar to mnemonics,
  • 35:43 - 35:48
like, the crazier a mnemonic is,
the easier it is to remember.
  • 35:49 - 35:53
    I could see the same effect with
    enactments as well.
  • 35:54 - 35:58
    And if you should use signs from
    sign languages,
  • 35:59 - 36:04
    I think if you want to that's a cool idea,
    because then you automatically also learn
  • 36:04 - 36:07
    the sign language.
    And when I was preparing this presentation
  • 36:07 - 36:11
    I actually thought about this.
    Like why do we learn languages?
  • 36:13 - 36:17
    Like if I now start learning a language,
    why do I not learn the sign language
  • 36:17 - 36:19
    that goes with it, right?
  • 36:19 - 36:23
    I think it would make things easier,
    because you actually have
  • 36:23 - 36:24
    the enactment ready for you,
  • 36:25 - 36:28
    and it's just a cool thing, right?
  • 36:28 - 36:32
    You can talk with so many more people.
  • 36:33 - 36:40
    And also, I think in general, people
should learn sign languages regardless.
  • 36:42 - 36:47
    This became really clear to me actually
    at the Polyglot Gathering in 2019.
  • 36:47 - 36:53
    In Bratislava we were on some ship where
    there was a party.
  • 36:55 - 36:59
    There was loud music, then there were some
people who knew sign language -
  • 36:59 - 37:00
    maybe they are listening right now.
  • 37:01 - 37:04
    So they just started, they were like,
    on the dance floor,
  • 37:04 - 37:08
    and instead of screaming into
    each other's ears, like people usually do,
  • 37:08 - 37:10
    they just started to sign,
    and it was so smooth, like
  • 37:11 - 37:15
    why should we communicate with sound
    when we can do it with gestures. Right?
  • 37:16 - 37:19
    I think for many situations
    it would be a lot easier.
  • 37:20 - 37:24
    So yeah, if you can use the signs
    of your target language,
  • 37:25 - 37:27
    I think that's a cool idea.
  • 37:31 - 37:35
    Yeah, it does sound like that.
    Indeed, indeed.
  • 37:37 - 37:40
    Yeah, I can totally see that.
  • 37:51 - 37:55
There is a study that I came across while
researching, but I didn't look into it.
  • 37:56 - 38:00
    If you want the reference, you can
    reach out to me somehow and I can see
  • 38:00 - 38:03
    if I can find it and send it to you.
  • 38:05 - 38:08
    I didn't look deeply into it,
  • 38:10 - 38:14
    and I think I wouldn't find it now quickly.
  • 38:15 - 38:20
    So again, there's something done, but
    I can't recall it from my head right now.
  • 38:24 - 38:27
Well so, there are two things -
  • 38:28 - 38:30
maybe more things, but let's start with two.
  • 38:30 - 38:33
So first of all, when you learn a new word,
  • 38:34 - 38:37
    try to get the whole picture of the word.
  • 38:38 - 38:44
    Like the garlic example, try to imagine
    how it smells, how it feels,
  • 38:44 - 38:47
    how you chop it, try to enact it.
  • 38:48 - 38:54
    Take a moment, and really try to activate
    the whole functional network of this word.
  • 38:55 - 39:02
    And then the other thing was also just
    to use input with video,
  • 39:02 - 39:04
    if you're learning with some input.
  • 39:05 - 39:09
See if you can find some interesting
    channels on YouTube or something.
  • 39:09 - 39:13
And then also - this might not have
been clear from my presentation -
  • 39:13 - 39:16
    if you are conversing with people, use signs.
  • 39:20 - 39:25
    I don't know if people do this naturally
    in general, I think I kind of do it,
  • 39:25 - 39:29
    if I talk in my target language,
    and I'm not sure about a word
  • 39:29 - 39:33
I will try to convey it with my hands,
so that somehow,
  • 39:33 - 39:37
something gets across, like, to the other person.
  • 39:37 - 39:40
    So what I'm going for is that the other
    person recognizes
  • 39:40 - 39:43
    what I'm trying to say, and then
    gives me the word, right.
  • 39:44 - 39:46
    Like in the example with the glass of water,
  • 39:47 - 39:49
    if I don't know "drink" in some language
    I would,
  • 39:49 - 39:52
    I would try to [MIMES]
    Right? "I want to [MIMES]" Right?
  • 39:52 - 39:58
And then have the other person give me
    the word, because I'm actively reducing
  • 39:58 - 40:01
the uncertainty that the other person has,
    that is trying to predict
  • 40:01 - 40:04
    what I'm going to say,
    by giving gestures. Right?
  • 40:05 - 40:09
    So that would be my three
    practical implications for now.
  • 40:19 - 40:21
    Yeah so this is something that
    I don't know.
  • 40:22 - 40:27
    Again I think it's worth trying to do this
    with the sign language.
  • 40:27 - 40:32
    I mean, there's a system of really...
  • 40:35 - 40:39
    There's a system of really fitting
    gestures that people use,
  • 40:40 - 40:47
    and it might actually be a good idea
    to try this out, to use sign language
  • 40:47 - 40:50
    as you're learning the actual language,
  • 40:52 - 40:55
    to get this enactment working.
  • 40:56 - 41:00
    Might be more effective than
    making up your own gestures.
  • 41:01 - 41:06
    I mean if you make up your own gestures
    you have the advantage that
  • 41:10 - 41:12
    during the process of
    coming up with the gesture,
  • 41:12 - 41:15
    you are engaging your brain
    in a specific way
  • 41:15 - 41:19
    that's not there if you just get
    the gesture from somebody.
  • 41:20 - 41:22
    So there might be an advantage there,
  • 41:22 - 41:28
    but the other advantage is of course
    time that you can save,
  • 41:28 - 41:33
and the ability to communicate
with people who can't hear.
  • 41:34 - 41:38
    So yeah, I think that's open for
    exploration, for sure.
  • 41:44 - 41:46
    Well you see it in sign language, right?
  • 41:48 - 41:51
People who sign don't really speak.
  • 41:52 - 41:55
    And they get along pretty nicely.
  • 41:56 - 42:01
Another question would be whether society
as a whole could do without verbal language.
  • 42:02 - 42:06
    That's another question, but I think
    you can restructure society
  • 42:06 - 42:10
    in a way that everybody can communicate
    with gestures, for sure.
  • 42:12 - 42:17
And according to some people, it was like
that before speech developed.
  • 42:26 - 42:29
    So yeah, there has been some research,
    not much.
  • 42:30 - 42:34
    If you are interested in this, make sure
    to check out my presentation on this topic
  • 42:34 - 42:40
    from last year's Gathering,
    and also from last year's conference.
  • 42:41 - 42:45
The conference one is not up on YouTube
yet, but the Gathering one is,
  • 42:45 - 42:49
    and there's, in the end, I show some...
  • 42:52 - 42:56
    I show a study that was done on
polyglots and hyperpolyglots -
  • 42:57 - 42:59
actually only hyperpolyglots, I think.
  • 43:00 - 43:05
And so they put people in an fMRI scanner,
    and just gave them language material.
  • 43:06 - 43:09
    And what they found is that
    the language network
  • 43:09 - 43:16
    was less active than for monolinguals.
  • 43:16 - 43:20
    So if you listen to something
    there's some areas
  • 43:20 - 43:22
    on the left side of your brain
that light up:
  • 43:22 - 43:27
    you have some typical areas,
    like Broca's area, Wernicke's area,
  • 43:27 - 43:29
    and some other ones.
  • 43:29 - 43:34
And they found that, for polyglots,
this "lighting up" is weaker,
  • 43:34 - 43:39
    and the interpretation was that
    the polyglots' language network,
  • 43:39 - 43:43
    through extensive practice, has become
    more and more efficient
  • 43:43 - 43:48
    at dealing with language.
    And therefore it needs less activation.
  • 43:48 - 43:53
    So this is one thing that
    you observe quite often,
  • 43:59 - 44:02
    when there's some process that
    you get really good at,
  • 44:03 - 44:06
    in your brain the activity
    that you see goes down,
  • 44:06 - 44:08
    because the network gets more efficient.
  • 44:09 - 44:11
    So that's why this paper was aptly titled
  • 44:11 - 44:15
    "The Small And Efficient Network
    Of Polyglots And Hyperpolyglots".
  • 44:16 - 44:18
    And they, you can also look this up as well,
  • 44:19 - 44:22
    they also made them listen to
    different languages,
  • 44:23 - 44:28
    and there, the better known
    the foreign language was
  • 44:28 - 44:32
    so the first experiment was completed
in English, their mother tongue,
  • 44:33 - 44:37
    and then the second experiment they used
    their target languages like,
  • 44:37 - 44:40
    the second best language, third best
    language and so on.
  • 44:41 - 44:45
And there, the less well known a language,
the less active the language network,
  • 44:45 - 44:49
    and the better known, the more active.
    So you have kind of the opposite effect.
  • 44:49 - 44:52
    And they interpreted this as reflecting
    that the more you know
  • 44:52 - 44:55
    in a target language,
    in a foreign language,
  • 44:55 - 45:00
    the more of the language network
    gets recruited, the more context you have.
  • 45:02 - 45:05
    So you have this effect of getting really
    efficient for your mother tongue,
  • 45:06 - 45:10
and getting more of the whole message
  • 45:12 - 45:13
    the better you know a foreign language.
  • 45:16 - 45:18
So this was the last question.
Alright.
  • 45:19 - 45:24
    Thanks for listening, thanks to
    the organizers for organizing this,
  • 45:24 - 45:27
the streaming works really well,
I'm really impressed.
  • 45:28 - 45:29
    Thanks guys!