How gestures and other non-verbal cues facilitate comprehension - Xaver Funk | PGO 2021

0:07 - 0:11

Hello everyone, and a warm welcome to
Multimodal Language Processing.
0:11 - 0:17

My name is Xaver Funk, and I recently had
the chance to really [involve] myself in
0:17 - 0:20

this topic, because I am studying
neurosciences and this was kind of
0:20 - 0:26

something that I had to do. And, yeah,
that's what I want to share with you today.
0:26 - 0:33

So, what I have been doing recently also,
is learning Arabic, and a little bit of
0:33 - 0:37

Mongolian. And mostly what I did was,
I had this stream of auditory signals
0:37 - 0:43

that maybe came from the Assimil audio, and
I tried to match those to symbols that
0:43 - 0:46

were representing these, right, in the
book.
0:47 - 0:52

And I kind of had this feeling that this
is incomplete.
0:52 - 0:54

So there is something missing there.
0:54 - 1:00

And while I was on the other hand,
studying a lot about multimodal language
1:00 - 1:04

processing, which is how gestures influence
processing and stuff like that.
1:05 - 1:08

I came to the conclusion that, yeah, there
is something missing.
1:09 - 1:12

In our world today, we are all literate
so we mostly think of languages as
1:12 - 1:16

these auditory signals, these mouth noises
and the symbols that represent these.
1:16 - 1:22

But there is so much more going on, in
face to face communication and, yeah,
1:22 - 1:25

I want to make this point clear, with a
virtual experiment.
1:26 - 1:33

So, I want to invite you to first of all,
listen to this audio excerpt, from an
1:33 - 1:38

"Easy Languages" video. And I give you the
subtitles here, with the English
1:38 - 1:44

translation as well. So, basically, these
are auditory signals in Dutch,
1:44 - 1:47

and sequences of symbols
in Dutch and English.
1:47 - 1:51

And for the people learning Dutch, please
just ignore the English, just to make it
1:51 - 1:55

a little bit harder. And people who know
Dutch, please close your eyes, so that
1:55 - 2:00

you don't see it at all.
So, let's go.
2:41 - 2:46

So, when I was listening to this at first,
I was - because I know some Dutch,
2:46 - 2:50

I was understanding quite a lot,
but, kind of, not everything.
2:50 - 2:55

And then, when I watched the video that goes
with it, it was kind of a different experience.
2:55 - 3:01

And that's what we are going to do now.
So just watch the video, and if you can,
3:01 - 3:07

see how these two women, that are
interviewed here, are interacting
3:07 - 3:10

with the interviewer, and
between each other.
3:51 - 3:54

So, I hope this worked, and you felt
a little bit different now.
3:54 - 3:58

And even for the people who don't know
Dutch, I hope you could kind of follow
3:58 - 4:03

what was going on. And even if you
didn't, the point I want to make is that
4:03 - 4:06

messages are not only auditory,
they are always also visual.
4:07 - 4:11

We have a lot of non-auditory
articulators, like 43 face muscles
4:11 - 4:16

for example, and then 2x 34 muscles
in the hands, and then even more in
4:16 - 4:21

the arms, in our torso.
And the people in this video
4:21 - 4:25

really knew how to use these.
So for example we had a lot of
4:25 - 4:28

facial movement going on,
like you see on the top, here.
4:28 - 4:32

See how she raises her eyebrows,
and then, you have this head tilting
4:32 - 4:35

at the end, that really puts
an emphasis on what she's saying.
4:35 - 4:39

And then there is a lot of gaze switching
as well. That's right in the beginning,
4:39 - 4:43

when she says
(Dutch): Oh genoeg! Heb je even?
4:43 - 4:46

So, "Oh, there is so much that
I want to see! Do you have some time?"
4:47 - 4:52

But, she doesn't really say "time", she
says "Heb je even", "Do you have a little"
4:52 - 4:55

And for me, when I was only listening,
I didn't quite get what she was saying,
4:55 - 4:58

but when I saw how she addresses
the interviewer, I kind of got it,
4:58 - 4:59

afterwards.
5:01 - 5:06

So then there is of course manual gestures
like "hoop op mijn lijst",
5:06 - 5:12

that's that one here, "hoop op mijn lijst".
She says "berglandschap", so that's
5:12 - 5:17

a mountain range, and then "lang geleden"
"long time ago", right?
5:17 - 5:22

So there are a lot of messages that are
supported with these manual gestures.
5:23 - 5:27

Then there is also stuff like this
nose scratching, where we don't even know
5:27 - 5:30

whether there is something to it, or whether it is just
nose scratching.
5:31 - 5:34

Does it carry some information?
We don't know.
5:35 - 5:38

And then, lastly also arm and torso
movements.
5:38 - 5:41

And also if you look at the top here,
you have nodding. So you see how
5:41 - 5:46

these two kind of nod together, they
really give us the impression of
5:46 - 5:51

how good friends they are, right.
And then if you look at this bottom part
5:51 - 5:55

here, that's my favorite part of the
video. You really have this complex
5:55 - 6:00

orchestration of different gestures,
and they are turn-taking.
6:01 - 6:07

So, the one on the right says something,
and the one on the left answers that
6:07 - 6:13

perfectly, and then you have gestures,
and then the, putting their hair back,
6:13 - 6:18

right, so there is so much going on
between them, and it really gives more
6:18 - 6:22

than just the auditory message, right.
6:23 - 6:27

So, note that there is something that
our brain has to achieve here.
6:27 - 6:31

Mainly two things: first, it has to
segregate all of the stuff that is not
6:31 - 6:34

important for the message from the
important stuff. That's the segregation problem.
6:35 - 6:38

And then, it has to
take all the important stuff, all
6:38 - 6:42

the auditory and visual information and
put it together into a coherent message.
6:42 - 6:46

That's our binding problem.
And all of this, note, all of this is
6:46 - 6:49

under a really tight time constraint,
when you're turn taking, when you're
6:49 - 6:52

having a conversation.
And if you say something and
6:52 - 6:55

the other person says something,
and there is not that much time
6:55 - 6:57

between turns. And if you need
more time, then that also has
6:57 - 7:02

a meaning, right? If you take time, then
that means that you're hesitating
7:02 - 7:05

to answer, maybe there is something
going on with you emotionally...
7:05 - 7:10

So you don't want to have that as well.
So, yeah, so basically, this is really
7:10 - 7:13

a huge computational problem
for your brain.
7:14 - 7:18

And well, how did your brain do?
Did you feel the video was more difficult
7:18 - 7:22

than the audio? Did you understand more,
or did you understand less?
7:22 - 7:26

Did you feel more in the scene, maybe?
Catching more information between the lines?
7:27 - 7:31

And well, for me, at least as you might
guess, for me it was way easier
7:31 - 7:35

to follow with the video and to interpret
these gestures. And this is kind of
7:35 - 7:41

a paradox. So, how come that processing
more signals simultaneously is easier
7:41 - 7:46

than processing speech alone?
And this also was shown
7:46 - 7:49

in the literature, so people have run
experiments on this.
7:50 - 7:55

And this is really a surprising
facilitation. For example, there are
7:55 - 7:57

a lot of studies; I'll just give you
one example.
7:57 - 8:01

So in this study they showed people
a "prime", so this was some video
8:01 - 8:05

of an action that somebody did, and then
they showed the people different videos.
8:06 - 8:10

And the videos were either completely
congruent, so what was said was the same
8:10 - 8:14

as the gesture, and was the same as
this prime. So in that case it would be
8:14 - 8:19

"chop" and doing the chopping gesture.
And then there were different conditions
8:19 - 8:26

where either the speech was congruent
and the gesture was incongruent.
8:27 - 8:30

Or the speech was incongruent and
the gesture was congruent.
8:30 - 8:34

And then they also had weakly incongruent
stuff, like, this for "chopping",
8:34 - 8:38

but this is actually cutting so this is
only weakly incongruent.
8:38 - 8:42

And then this twisting, which is
strongly incongruent.
8:42 - 8:47

And then people had to press a button
for "yes" if either the speech or
8:47 - 8:52

the gesture were related to the prime,
and "no" if neither speech nor gesture
8:52 - 8:55

was related to the prime.
And what the researchers found was that
8:55 - 8:59

there were differences in response times,
and also in the proportion of errors
8:59 - 9:02

that people made, as soon as
there was something incongruent.
9:02 - 9:08

And from that the authors come to
the conclusion that really, speech and
9:08 - 9:12

gestures are two sides of the
same coin, they mutually interact
9:12 - 9:16

to enhance comprehension.
And now the big question is, of course,
9:16 - 9:19

how does our brain achieve this surprising
facilitation?
9:22 - 9:26

And we can look back at turn-taking,
to maybe get some clues here.
9:26 - 9:32

So on average a turn-take takes only
about 0 to 200 milliseconds, which is at most
9:32 - 9:38

a fifth of a second. You can see in this
video how fast she is responding,
9:38 - 9:43

right now, after this, like, this is
an instant, right?
9:44 - 9:48

And this is quite extraordinary, because
producing a single word actually takes
9:48 - 9:49

about 600 milliseconds.
9:50 - 9:55

So if I just prompt you to say a word,
you would take 600 milliseconds to say it.
9:56 - 10:00

So there's something going on, it seems
like we are predicting already what we are
10:00 - 10:04

going to say before the turn of the other
person is finished, and we already prepare
10:04 - 10:05

our turn.
10:05 - 10:10

So there is something that is going on,
that has to do with prediction.
10:11 - 10:15

Most language use in conversation
has to be based on prediction somehow.
10:15 - 10:18

And this is quite nice, because prediction
is anyways the current hype
10:18 - 10:22

in neuroscience nowadays, and it's
basically a good candidate for
10:22 - 10:24

the overarching function of the brain.
10:25 - 10:31

And many people think that what we are
doing in our daily lives is basically
10:31 - 10:34

constantly computing and updating
probability distributions.
10:35 - 10:39

And this applies both to action,
to perception, and also to language.
10:40 - 10:44

So, this will be a rephrasing of
the problem we had before,
10:44 - 10:47

as a prediction problem.
And this then becomes,
10:47 - 10:51

"given the preceding context - so, given
all the words that come before - what word
10:51 - 10:57

is most likely to come up next?"
Right? And to make this more clear,
10:57 - 11:01

let me give you a quick example:
so, imagine I come to you and I say,
11:01 - 11:04

without any more context "I would like to".
11:05 - 11:08

And then, you don't know what I'm going
to say next, right?
11:08 - 11:12

It could be any of these, for example.
I would like to drink, eat, work...
11:12 - 11:16

And so on.
And now, if I shape my hand
11:16 - 11:21

in the form of a "C", and I put it to
my mouth, like this, while I say
11:21 - 11:27

"I would like to", then your probability
distribution over these words changes
11:27 - 11:31

in such a way that "drink" is much more
likely to be the next word.
11:31 - 11:35

And maybe "eat" also a little bit, but
the other words probably not,
11:35 - 11:39

because, you can associate this
gesture with drinking, or a little bit
11:39 - 11:43

with eating, because it's also
something that you put to your mouth,
11:43 - 11:47

but mostly this is commonly understood
as "drinking", right.
11:49 - 11:54

So, in this way gestures add context to
predictions and help this process of
11:54 - 11:57

predicting, and that also helps the
comprehension.
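
To make this reweighting concrete, here is a minimal sketch in Python. It treats the gesture as evidence and applies a simple Bayes-style update to the next-word probabilities. All numbers are made up for illustration, the word "sleep" is added only to fill out the list, and the function name `update_with_gesture` is hypothetical; this is not how any of the studies in this talk modelled it.

```python
def update_with_gesture(prior, gesture_fit):
    """Reweight next-word probabilities by how well each word fits the gesture."""
    unnormalized = {w: prior[w] * gesture_fit.get(w, 0.0) for w in prior}
    total = sum(unnormalized.values())
    if total == 0:
        return dict(prior)  # the gesture tells us nothing usable; keep the prior
    return {w: p / total for w, p in unnormalized.items()}

# Assumed numbers: without context, every continuation is equally likely.
prior = {"drink": 0.25, "eat": 0.25, "work": 0.25, "sleep": 0.25}
# Assumed fit between the C-shaped hand at the mouth and each word.
gesture_fit = {"drink": 0.80, "eat": 0.15, "work": 0.025, "sleep": 0.025}

posterior = update_with_gesture(prior, gesture_fit)
for word, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {p:.3f}")
```

With these assumed values, "drink" ends up far more likely than "eat", and the remaining words become very unlikely, which is the shift described above.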
11:58 - 12:01

And, we can actually measure prediction,
using neurophysiology.
12:03 - 12:10

So this is EEG, and "EEG" stands for
"electroencephalography", and
12:10 - 12:13

it's basically putting electrodes on the
scalp, and then measuring
12:13 - 12:20

the brain activity that's below.
If you do this, well you can measure
12:20 - 12:24

brain activity basically.
What people usually do, is that
12:24 - 12:28

they give people these sentences.
So these could be normal sentences,
12:28 - 12:31

like this one: "It was his first day
at work."
12:32 - 12:37

Or it could be semantically anomalous
sentences. So these are sentences that are
12:37 - 12:43

somehow manipulated artificially
to elicit some response. Right?
12:43 - 12:46

So this would be: "He spread the warm
bread with socks."
12:46 - 12:51

So you may have a weird feeling in
your head, because nobody spreads
12:51 - 12:57

the bread with socks. And this weird
feeling, if we measured you with
12:57 - 13:04

an EEG, would show up as this reaction
here, that's a so-called N400.
13:04 - 13:10

"N" because it is a negative polarity,
and it's 400 milliseconds after the word.
13:10 - 13:14

So, all of this above here is just
electrical activity, right? And you have
13:14 - 13:19

this really pronounced peak, when
there is a violation of the semantics,
13:19 - 13:25

like with "socks".
And it's also taken to be a prediction
13:25 - 13:29

error. So you did not predict socks,
you predicted Nutella, for example,
13:29 - 13:33

or honey. But not socks. And this is
reflected in this N400 prediction error.
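
As a rough illustration of how such a peak is usually made visible, here is a small simulation sketch in Python with NumPy. It averages many noisy single-word responses, so the noise cancels and a negative bump around 400 milliseconds remains. Everything here is simulated: the amplitudes, the noise level, the trial counts and the 300 to 500 millisecond window are assumptions for the sake of the example, not values from any study.

```python
import numpy as np

rng = np.random.default_rng(0)
times_ms = np.arange(-100, 800, 4)  # one sample every 4 ms (250 Hz)

def simulate_epochs(n_trials, n400_amplitude_uv):
    """Simulated single-trial EEG: noise plus a negative bump around 400 ms."""
    bump = n400_amplitude_uv * np.exp(-((times_ms - 400) ** 2) / (2 * 60 ** 2))
    noise = rng.normal(0.0, 5.0, size=(n_trials, times_ms.size))
    return noise + bump

expected = simulate_epochs(200, n400_amplitude_uv=-1.0)    # e.g. "... with honey"
unexpected = simulate_epochs(200, n400_amplitude_uv=-6.0)  # e.g. "... with socks"

# The event-related potential is the average over trials.
erp_expected = expected.mean(axis=0)
erp_unexpected = unexpected.mean(axis=0)

# Compare the mean amplitude in an assumed N400 window.
window = (times_ms >= 300) & (times_ms <= 500)
difference = erp_unexpected[window].mean() - erp_expected[window].mean()
print(f"Mean 300-500 ms difference: {difference:.2f} microvolts")
```

In this simulation the unexpected condition comes out more negative than the expected one in that window, which is the pattern the N400 label refers to.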
13:33 - 13:38

So people are doing this a lot, like, showing
these sentences that are somehow manipulated.
13:38 - 13:43

We have another example here, this is
another topic: if you write in all caps
13:43 - 13:46

you have this kind of response for
example.
13:47 - 13:51

But, what I want to do now with you
is to bring you closer to the cutting edge
13:51 - 13:56

of what is currently done in multimodal
processing research.
13:57 - 14:01

So the trend is to go away from these
artificially constructed sentences, and
14:01 - 14:06

more towards naturalistic language
comprehension. So, using actual stories,
14:06 - 14:12

actual sentences, that are not manipulated
in any way. And this will be combined with
14:12 - 14:16

computational linguistics - how that
works, you will see in a bit.
14:16 - 14:21

And also, yeah, with that you can look at
multimodal processing if you just add
14:21 - 14:26

a video to the audio that
you make people listen to.
14:27 - 14:33

And what it might look like
is like this. So, this is one study
14:33 - 14:36

that is currently not published
officially yet. It is already on
14:36 - 14:44

the archive. And I want to use this to
illustrate to you how we might research
14:44 - 14:49

naturalistic language comprehension.
So the general plan is to get some
14:49 - 14:54

per-word measures - so these would be
these ones here. So for each word,
14:54 - 15:00

there is some value attached.
And then we can use these as regressors
15:00 - 15:05

in a big linear regression model.
So, using fancy statistics, and with that
15:05 - 15:13

we're basically asking our data "how well
are you predicted by these regressors?"
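
A minimal sketch of that question in Python with NumPy, assuming plain ordinary least squares on simulated data. The regressors, the weights and the noise are all invented for the example, and the actual studies may well use more elaborate models than this.

```python
import numpy as np

rng = np.random.default_rng(1)
n_words = 500

# Assumed per-word measures (one value per word in the story).
surprisal = rng.gamma(shape=2.0, scale=2.0, size=n_words)
pitch = rng.normal(0.0, 1.0, size=n_words)
gesture = rng.integers(0, 2, size=n_words).astype(float)  # 1 = gesture present

# Simulated per-word brain response, built from known weights plus noise.
true_weights = np.array([0.5, -0.8, 0.1, 1.2])  # intercept, surprisal, pitch, gesture
X = np.column_stack([np.ones(n_words), surprisal, pitch, gesture])
y = X @ true_weights + rng.normal(0.0, 1.0, size=n_words)

# Ordinary least squares: which weights best predict the data from the regressors?
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

# R squared: the share of variance in the data that the regressors explain.
r_squared = 1.0 - np.sum((y - X @ weights) ** 2) / np.sum((y - y.mean()) ** 2)
print("estimated weights:", np.round(weights, 2))
print("R squared:", round(float(r_squared), 2))
```

Because the data are simulated with known weights, the fit can be checked against them; with real recordings, the estimated weight of each regressor is the quantity of interest.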
15:14 - 15:19

And for example, this one here is
surprisal, and this is closely related
15:19 - 15:25

to predictions or prediction errors.
So this is the negative log probability
15:25 - 15:29

of a word, given all of the words
that come before it.
15:29 - 15:33

So, this is the context, basically,
and this is some word, "w".
15:34 - 15:37

So this is basically telling you how
unpredictable a given word is.
15:39 - 15:43

And this measure is based on computational
language models, so for example,
15:43 - 15:48

they would take the whole corpus of
a language, and then, see which words
15:48 - 15:54

occur after each other, and thereby get
to this value of how unpredictable it is.
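
Here is a toy sketch of that idea in Python. Surprisal is the negative log probability of a word given its context; as a strong simplification, this example uses only the single preceding word as context (a bigram model with add-one smoothing) and a three-sentence made-up corpus. Real language models condition on much more context, and the function names here are hypothetical.

```python
import math
from collections import Counter

def train_bigram_counts(tokens):
    """Count how often each word follows each preceding word."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])
    return pair_counts, context_counts

def surprisal(prev_word, word, pair_counts, context_counts, vocab_size):
    """-log2 P(word | prev_word), with add-one smoothing for unseen pairs."""
    p = (pair_counts[(prev_word, word)] + 1) / (context_counts[prev_word] + vocab_size)
    return -math.log2(p)

# Tiny made-up corpus, for illustration only.
corpus = ("he spread the bread with butter . she spread the bread with honey . "
          "he spread the toast with butter .").split()
pair_counts, context_counts = train_bigram_counts(corpus)
vocab_size = len(set(corpus))

s_butter = surprisal("with", "butter", pair_counts, context_counts, vocab_size)
s_socks = surprisal("with", "socks", pair_counts, context_counts, vocab_size)
print(f"surprisal of 'butter' after 'with': {s_butter:.2f} bits")
print(f"surprisal of 'socks' after 'with':  {s_socks:.2f} bits")
```

A word that never followed "with" in the corpus, like "socks", gets a higher surprisal than one that followed it often, like "butter".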
15:56 - 16:01

And then, they have another thing here.
They use the fundamental frequency of each
16:01 - 16:06

word as a pitch indicator, to control
for prosody, which is also pretty cool.
16:06 - 16:11

So they let loose their linear
regression models, with these predictors,
16:11 - 16:18

so they have a surprisal value for each
word, for example, a prosody value for
16:18 - 16:22

each word, then they indicate where
there are meaningful gestures happening,
16:22 - 16:25

and, yeah, also, mouth movements.
16:27 - 16:30

And, what came out of this, one finding
that might be interesting for us,
16:30 - 16:37

now, is that for meaningful gestures,
the N400 is less negative.
16:38 - 16:43

So you can also see this here: for
meaningful gesture, this blue line,
16:43 - 16:48

you see that it is a lot less negative
than the red line where the gestures are
16:48 - 16:53

absent. And then there's also, that's why
I told you about surprisal, an interesting
16:53 - 16:58

interaction between gestures and
surprisal. So, the higher the surprisal,
16:58 - 17:03

the more unexpected a word, the stronger
this facilitating effect of gestures is.
17:04 - 17:10

Which is also really interesting.
Then there is this similar study,
17:12 - 17:15

that I actually got the chance to work on,
with a colleague.
17:17 - 17:21

So what we did here, we had a measure of
entropy. This measures basically the
17:22 - 17:26

uncertainty about the next word.
So if you think back to the example
17:26 - 17:31

we had before, where I was telling
you "I would like to", and then something,
17:31 - 17:35

but without context, that would be really
high entropy, really high uncertainty:
17:35 - 17:40

you don't know what's coming next. Right?
Then we also had surprisal, we had word
17:40 - 17:44

frequency, how often the word came up.
And IVC is a measure of,
17:44 - 17:50

it's an abbreviation for "instantaneous
visual change", so, how much the actor
17:50 - 17:56

moved while we were showing this to
the people. And then speech envelope,
17:56 - 18:01

this is basically a measure of the level
of the sound.
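
To make the entropy measure mentioned above a bit more tangible, here is a minimal Python sketch that computes Shannon entropy over a next-word distribution. The two distributions are invented for illustration (the "I would like to" example without context, and the same sentence start with a drinking gesture); they are not values from our experiment.

```python
import math

def entropy_bits(distribution):
    """Shannon entropy H = -sum(p * log2(p)) of a next-word distribution."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# Assumed probabilities: no context, so every continuation is equally likely.
no_context = {"drink": 0.25, "eat": 0.25, "work": 0.25, "sleep": 0.25}
# Assumed probabilities once a drinking gesture narrows things down.
with_gesture = {"drink": 0.80, "eat": 0.15, "work": 0.025, "sleep": 0.025}

h_flat = entropy_bits(no_context)
h_gesture = entropy_bits(with_gesture)
print(f"entropy without context: {h_flat:.2f} bits")
print(f"entropy with gesture:    {h_gesture:.2f} bits")
```

The flat distribution has the highest possible entropy for four options, and the entropy drops once one continuation dominates. Note the difference to surprisal: entropy describes the uncertainty before the word arrives, surprisal describes how unexpected the word was once it did.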
18:03 - 18:09

And what we found is - and this
by the way was an fMRI experiment,
18:09 - 18:13

so we can look at which regions are active
during some condition.
18:13 - 18:18

And for words where the surprisal was
really high, there were these regions
18:18 - 18:21

in red active, and for words where
entropy was really high,
18:21 - 18:26

these regions in blue. And now if
we look at interactions with gestures
18:26 - 18:32

for the entropy condition, we can see that
when there were gestures present, we had
18:32 - 18:38

really specific activations compared
to when there were no gestures present,
18:38 - 18:41

in situations where there is high
entropy, so high uncertainty.
18:43 - 18:50

So with these tools we try to get into
the processes that underlie prediction
18:50 - 18:56

in language.
So let's take a step back, and have a look
18:56 - 19:00

at kind of a more global
evolutionary perspective.
19:02 - 19:06

We know from primate research that gesture
and gaze are crucial for communication.
19:07 - 19:10

You can see it in this video: this ape
right here does this gesture,
19:10 - 19:15

and this signals to its mother
to pick it up. Right?
19:15 - 19:22

So these are bonobos, and you can see
right now, this "pick me up" gesture.
19:24 - 19:29

And Federico Rossano, from the Max Planck
Institute for Evolutionary Anthropology,
19:30 - 19:36

could show that this gesture gets more
and more ritualized, to the point where
19:36 - 19:46

it becomes only a small wrist bend with
the arm and one gaze, to initiate
19:46 - 19:50

this carry behavior. Right?
So you see that there is also kind of
19:50 - 19:57

a prediction involved: the mother has
to predict what the child wants
19:57 - 20:00

to do, right?
Going from this, to only this.
20:01 - 20:07

Then, building on this, there are some
authors that propose that speech and
20:07 - 20:12

gesture have a common origin.
And the idea here is that, through
20:12 - 20:15

these ritualized gestures that
we've just seen in those bonobos,
20:15 - 20:19

after a time there will be a proto
sign language evolving.
20:19 - 20:22

Which then at some point will be
accompanied by sound as well,
20:22 - 20:28

evolving into a proto speech language.
And then the proto sign, the proto speech,
20:28 - 20:33

will reinforce each other more and more,
until language emerges.
20:34 - 20:40

And another point, here,
or an observation: for those of you who have
20:40 - 20:44

tried sign language, it kind of feels
surprisingly natural, right?
20:44 - 20:51

So, if speech is the true communication
medium for humans, why is it
20:51 - 20:56

that sign language feels so real,
so natural, right?
20:58 - 21:03

And then another point that goes into this
theory is that voluntary hand movements
21:03 - 21:08

came before voluntary breathing. And you
need voluntary breathing to articulate
21:08 - 21:09

yourself, right?
21:10 - 21:14

So, also, just as complementary to speech,
21:14 - 21:18

you can more easily show spatial
relations between things.
21:19 - 21:23

And then, if you look at child development,
you see the same pattern: gestures develop
21:23 - 21:30

before speech, and pre-speech turn-taking
is faster than later.
21:30 - 21:35

So if you're a baby and you gesture,
the turn-taking with your mother,
21:35 - 21:40

the communication is quite fast,
it's almost adult level turn-taking.
21:41 - 21:45

Then as you learn language it gets way
slower, and only in middle school does it get
21:45 - 21:48

back to the adult level turn-taking.
21:49 - 21:53

So, what's the point, right? What does all
of this mean for language learning?
21:54 - 21:59

So for this, let's do another
time-travel, back to 1768,
21:59 - 22:07

and meet this French Jesuit monk:
Claude-François Lizarde de Radonvilliers.
22:08 - 22:12

And he wrote this book:
(French) "About The Way To Learn Languages"
22:12 - 22:18

back in the day, where he reflected on how
we should teach people languages.
22:18 - 22:24

And interestingly, this is basically
the grandfather of the Assimil method,
22:24 - 22:30

and also the Méthode
Toussaint-Langenscheidt, or, also called
22:30 - 22:34

"interlinearversion". So this would be
this sheet here.
22:34 - 22:39

This was a way people learned languages
at the turn of the previous century.
22:39 - 22:45

And you can see here that you have
the Spanish at the top, then some
22:45 - 22:50

consideration in the middle,
and on the bottom the German.
22:51 - 22:54

And this is kind of similar to what
Assimil does, right?
22:55 - 22:58

So this is really interesting,
but that's not the point here.
22:59 - 23:04

What he also did in this book is to compare
L1 - so first language acquisition -
23:04 - 23:09

with second language learning.
And he noted that it seems that,
23:09 - 23:16

for the first language, parents show their
children pictures, and enact words or
23:16 - 23:20

concepts, and encourage the children to do
the same, like this little boy does here.
23:20 - 23:25

But for second language acquisition all we
do is give people these vocabulary lists,
23:25 - 23:27

and expect them to learn it
just like that.
23:28 - 23:33

So this is kind of an interesting point,
and since then it has been shown,
23:33 - 23:37

- and this is actually pretty robust,
I was really surprised, that it has been
23:37 - 23:43

a really robust finding, that gesture
enriched material enhances learning.
23:43 - 23:52

So in this study, for example, people
tried to teach English-speaking people
23:52 - 23:56

Japanese words, and they had four
different training conditions.
23:57 - 24:03

So one, only speech, one repeated speech,
one speech plus incongruent gestures
24:03 - 24:07

- so gestures that would not match -
and then, congruent gestures.
24:08 - 24:10

And this is the interesting condition,
right?
24:11 - 24:16

And then they tested the people after
encoding at three different times:
24:16 - 24:18

after five minutes, after two days,
and after one week.
24:19 - 24:25

And also they tested them on forced choice
so it's basically multiple choice,
24:26 - 24:29

and free recall, so it's prompting
the people with the word, and then they
24:29 - 24:35

come up themselves with the answer.
So these numbers here are basically
24:35 - 24:37

the proportion of correct
answers that people give.
24:38 - 24:40

And you can see that,
across the board,
24:40 - 24:43

the speech plus congruent
gesture condition is
24:43 - 24:51

far superior to the other ones,
which is, yeah, which is interesting, and
24:51 - 24:57

so, you would maybe think that the point
is "okay, so we just use videos instead
24:57 - 25:00

of audios", right?
And this is what I would call
25:00 - 25:03

multisensory enrichment.
And there is nothing wrong with this,
25:03 - 25:11

this is really useful, you have
these YouTube channels like Easy Languages
25:11 - 25:15

- I'm not sponsored by the way (laugh) -
where you have conversations with
25:15 - 25:19

real people that from time to time
make gestures, and you get the full
25:19 - 25:23

conversation thing, right?
And you have these one-on-one videos,
25:23 - 25:30

like this one from Mandarin Corner,
where there are also a lot of gestures
25:30 - 25:34

involved, so the host, Eileen, really
tries to integrate a lot of gestures.
25:35 - 25:39

But this is actually not the point
- I mean, this is cool but I think
25:39 - 25:44

you already do that.
The point is way deeper.
25:44 - 25:50

So, there's another thing going on,
not only when you watch gestures,
25:50 - 25:54

but when you enact them.
This is called the enactment effect.
25:55 - 25:58

This was actually coined in 1980,
by two Germans.
25:58 - 26:04

They called it first the "Tu-Effekt",
which translates literally to "Do-Effect".
26:05 - 26:09

And you can see why people chose to call
it the enactment effect, because it sounds
26:09 - 26:13

way more fancy (laugh) but I really like
the "tu-effekt", it sounds funny.
26:14 - 26:20

Anyways, the point is that action words
or phrases, this is what they - Engelkamp
26:20 - 26:23

and Krumnacker - noticed: that action
words and phrases are remembered better
26:23 - 26:27

if they're acted out,
or accompanied by gestures.
26:28 - 26:33

So if you would learn the phrase
"chopping garlic", then if you enact it
26:33 - 26:37

actually while learning it,
you will retain it way better.
26:37 - 26:40

And this effect is also
really well replicated,
26:40 - 26:42

and this was also
really surprising to me,
26:42 - 26:49

because it is virtually not at all
translated into actual teaching.
26:49 - 26:53

Nobody does this, nobody tells
the students to enact things, right?
26:53 - 26:56

Enact words, enact anything.
26:56 - 27:01

And it has been well replicated
across tasks, across materials and also
27:01 - 27:05

across populations: across children,
adults, even clinical populations:
27:05 - 27:08

people with Alzheimer's, people recovering
from stroke...
27:08 - 27:13

Somehow people made them
learn words and then act the words,
27:13 - 27:18

and it worked better
than without enactment.
27:19 - 27:23

And also, this is not only true for
action words and concrete words,
27:23 - 27:26

but also abstract words.
Anything you can somehow find
27:26 - 27:32

a representation - with gestures - for,
you can use this enactment effect.
27:33 - 27:36

And this is way more powerful than
multisensory enrichment,
27:36 - 27:40

and we can call this
"sensorimotor enrichment",
27:40 - 27:43

because you use
your senses and your "motor".
27:44 - 27:51

So, this also ties in with
another really interesting development
27:51 - 27:53

in neurosciences, called
"embodied cognition".
27:54 - 27:58

Basically this is the idea that many
features of cognition - and these might be
27:58 - 28:02

concepts, categories, reasoning or
judgement - are shaped by aspects
28:02 - 28:06

of the body. And this would be
the motor system - so how we move -
28:06 - 28:11

the perceptual system - what we see, what
we feel, what we hear, and so on.
28:11 - 28:15

And also bodily interactions with
the environment.
28:15 - 28:19

And you might see where I am going with this,
if you think about concepts and categories
28:20 - 28:24

What are words, if not concepts and
categories, right?
28:24 - 28:28

So we might ask the question, "how are
words represented in the brain?"
28:29 - 28:31

And there is this really funny study,
28:32 - 28:38

they showed people words that had strong
olfactory associations, which means
28:38 - 28:42

they either stink really badly, or
they smell really good.
28:43 - 28:49

And, in case you're looking for some
inspiration for your Spanish poem,
28:49 - 28:53

you can go (laugh) to this publication and
search through the list of words.
28:53 - 28:57

This is also a small - this is only
a small sample, there are tons of
28:57 - 29:02

really strong smelling words
in the study and, yeah.
29:03 - 29:06

So basically, what they found is that
when they showed people these words,
29:06 - 29:12

as compared to words that did not smell
that much, some regions in the brain
29:12 - 29:16

that are associated with olfaction,
so, with smelling, lit up.
29:18 - 29:24

And this kind of has been
extended as well to actions.
29:24 - 29:29

So, on the left here, these are
all the regions that light up
29:29 - 29:32

when you move your foot,
when you move your fingers,
29:32 - 29:36

or when you move your tongue.
And on the right here, these are
29:36 - 29:42

the regions that light up when you read
leg-related words, arm-related words,
29:42 - 29:46

or face-related words.
And you can see that, this more or less,
29:46 - 29:51

this is more or less,
these activations fit each other, right?
29:51 - 29:56

So, in some way, leg-related words
are stored where you also move your leg,
29:56 - 29:59

arm-related words are stored where
you also move your arms, and so on.
30:00 - 30:07

So, we can think about words actually
as functional networks, like this.
30:08 - 30:13

And, note that words are
experience-dependent functional networks.
30:14 - 30:17

And experience is connected
to the body, right?
30:18 - 30:26

So for example, you have surely not only
read and heard the word "garlic",
30:26 - 30:29

you also have smelled garlic,
you touched garlic, you tasted garlic
30:29 - 30:32

and, really importantly,
you chopped garlic.
30:32 - 30:35

So when you read "garlic",
you not only have
30:35 - 30:39

the core language areas - in yellow here -
activated, but also
30:39 - 30:45

subcortical olfactory areas, and some
gustatory areas - so, for taste -
30:45 - 30:49

action areas, right, and visual areas
as well.
30:50 - 30:54

So, what I want to tell you here,
think about this when you learn languages.
30:55 - 31:02

Did you do the same for "Knoblauch", for
example - the German word for "garlic"?
31:03 - 31:09

If you learn German, do you actually
get into this huge associated network?
31:10 - 31:13

So, and that's the point basically,
we are coming to the end,
31:14 - 31:18

the point is that language is multimodal,
you should use sensorimotor enrichment
31:18 - 31:22

when learning languages, and thereby
embody your languages.
31:24 - 31:27

And if you want to learn more about this,
and also for me to give credit,
31:27 - 31:31

this is basically where I got most
of my input from.
31:32 - 31:37

These are four big review articles that
discuss all of this stuff.
31:38 - 31:44

So yeah, that's basically it.
Thanks for listening, and hoping for some cool questions.
31:53 - 31:58

I would most certainly guess so.
This thing with the phone is also
31:58 - 32:01

something that I have
experienced quite a lot.
32:04 - 32:09

I lived in Chile for some time,
and I got myself a Chilean SIM card.
32:10 - 32:15

And I didn't give this number to a lot
of people, but somehow, this number got
32:15 - 32:18

to people that were, I don't know,
trying to sell me something.
32:19 - 32:22

And I would get a call from
somebody, pick up the phone,
32:22 - 32:24

and I would not understand
a single word.
32:24 - 32:28

Like, Chilean Spanish is already
really hard, and then it's completely
32:28 - 32:31

out of context, I don't know what
this person wants from me, and then
32:31 - 32:34

it's just [gestures] and I'm like
"sorry, I don't understand you"
32:34 - 32:36

"I don't understand you",
"I don't understand you",
32:36 - 32:38

over and over again.
32:38 - 32:41

And yeah, I mean, if you're on the phone,
32:43 - 32:46

there's also a little bit of noise maybe,
32:46 - 32:50

and I really have the feeling that
that makes,
32:50 - 32:53

especially in a foreign language,
32:53 - 32:56

conversing that much harder, because
you don't see the mouth movements,
32:57 - 33:02

it's not that clear of a sound,
you don't see anything else, and yeah.
33:02 - 33:04

I would say so.
Cool question.
33:18 - 33:21

I would guess so, I would guess so.
Like, I mean,
33:23 - 33:26

especially for autistic people
there is a lot of research
33:26 - 33:28

on language processing in general,
33:30 - 33:34

but I don't know of any studies that are
33:35 - 33:38

specifically for multimodal processing,
33:38 - 33:41

but I think there are quite a few.
33:41 - 33:44

Actually the experiment that I showed you,
33:47 - 33:51

the one that I worked on, as well,
in the middle of the presentation,
33:51 - 33:55

the entropy stuff, we also did this
with schizophrenic patients,
33:56 - 33:58

but we have not looked at the data yet.
33:58 - 34:00

So once this publication is done then,
34:01 - 34:05

somebody else will deal with
34:05 - 34:08

the clinical data,
with the schizophrenic people,
34:08 - 34:11

and in general for schizophrenics there's,
34:11 - 34:13

there's a lot of,
34:15 - 34:19

like, language-related abnormalities,
34:21 - 34:23

and I think for autistic people as well.
34:23 - 34:25

I'm not sure about ADHD
34:28 - 34:31

but yeah, it would be
a really interesting thing to,
34:33 - 34:36

to look at this for autistic people, for sure.
34:36 - 34:40

And maybe people did this.
You can, maybe, you can look it up.
34:41 - 34:46

I don't have anything in my head right now
but yeah.
34:47 - 34:50

There should be something.
34:59 - 35:04

I think one of the studies that
I glanced over actually tried this.
35:04 - 35:08

So they had some gestures that were nonsense
35:08 - 35:11

I don't know if it was with abstract words
or with concrete words,
35:12 - 35:14

but they used nonsense gestures,
35:14 - 35:17

and they still had an effect,
but it was smaller.
35:18 - 35:21

So if you try to make this,
35:21 - 35:23

to integrate this into your studies,
35:24 - 35:29

I would suggest that you try to find
an enactment that is as sensical as it gets
35:30 - 35:34

I mean, it's not that it's impossible
to get enactment for abstract words,
35:35 - 35:38

you just have to be a little bit more
creative, and I think the more creative
35:38 - 35:43

you will be the more effective.
Like, similar to mnemonics,
35:43 - 35:48

like, the crazier a mnemonic is,
the easier it is to remember.
35:49 - 35:53

I could see the same effect with
enactments as well.
35:54 - 35:58

And if you should use signs from
sign languages,
35:59 - 36:04

I think if you want to that's a cool idea,
because then you automatically also learn
36:04 - 36:07

the sign language.
And when I was preparing this presentation
36:07 - 36:11

I actually thought about this.
Like why do we learn languages?
36:13 - 36:17

Like if I now start learning a language,
why do I not learn the sign language
36:17 - 36:19

that goes with it, right?
36:19 - 36:23

I think it would make things easier,
because you actually have
36:23 - 36:24

the enactment ready for you,
36:25 - 36:28

and it's just a cool thing, right?
36:28 - 36:32

You can talk with so many more people.
36:33 - 36:40

And also, I think in general, people
should learn sign languages regardless
36:42 - 36:47

This became really clear to me actually
at the Polyglot Gathering in 2019.
36:47 - 36:53

In Bratislava we were on some ship where
there was a party.
36:55 - 36:59

There was loud music, then there were some
people who knew sign language
36:59 - 37:00

maybe they are listening right now.
37:01 - 37:04

So they just started, they were like,
on the dance floor,
37:04 - 37:08

and instead of screaming into
each other's ears, like people usually do,
37:08 - 37:10

they just started to sign,
and it was so smooth, like
37:11 - 37:15

why should we communicate with sound
when we can do it with gestures. Right?
37:16 - 37:19

I think for many situations
it would be a lot easier.
37:20 - 37:24

So yeah, if you can use the signs
of your target language,
37:25 - 37:27

I think that's a cool idea.
37:31 - 37:35

Yeah, it does sound like that.
Indeed, indeed.
37:37 - 37:40

Yeah, I can totally see that.
37:51 - 37:55

There is a study, that I came across while
researching, but I didn't look into it.
37:56 - 38:00

If you want the reference, you can
reach out to me somehow and I can see
38:00 - 38:03

if I can find it and send it to you.
38:05 - 38:08

I didn't look deeply into it,
38:10 - 38:14

and I think I wouldn't find it now quickly.
38:15 - 38:20

So again, there's something done, but
I can't recall it from my head right now.
38:24 - 38:27

Well so, there's two things
38:28 - 38:30

maybe more things but let's start with two
38:30 - 38:33

So first of all when you learn a new word,
38:34 - 38:37

try to get the whole picture of the word.
38:38 - 38:44

Like the garlic example, try to imagine
how it smells, how it feels,
38:44 - 38:47

how you chop it, try to enact it.
38:48 - 38:54

Take a moment, and really try to activate
the whole functional network of this word.
38:55 - 39:02

And then the other thing was also just
to use input with video,
39:02 - 39:04

if you're learning with some input.
39:05 - 39:09

Look up if you find some interesting
channels on YouTube or something.
39:09 - 39:13

And then also, that might have not
been clear from my presentation,
39:13 - 39:16

if you are conversing with people, use signs.
39:20 - 39:25

I don't know if people do this naturally
in general, I think I kind of do it,
39:25 - 39:29

if I talk in my target language,
and I'm not sure about a word
39:29 - 39:33

I will try to make sure with my hands
that somehow,
39:33 - 39:37

something gets, like, to the other person.
39:37 - 39:40

So what I'm going for is that the other
person recognizes
39:40 - 39:43

what I'm trying to say, and then
gives me the word, right.
39:44 - 39:46

Like in the example with the glass of water,
39:47 - 39:49

if I don't know "drink" in some language
I would,
39:49 - 39:52

I would try to [MIMES]
Right? "I want to [MIMES]" Right?
39:52 - 39:58

And then have the other person give me
the word, because I'm actively reducing
39:58 - 40:01

the uncertainty that the other person has,
that is trying to predict
40:01 - 40:04

what I'm going to say,
by giving gestures. Right?
40:05 - 40:09

So that would be my three
practical implications for now.
40:19 - 40:21

Yeah so this is something that
I don't know.
40:22 - 40:27

Again I think it's worth trying to do this
with the sign language.
40:27 - 40:32

I mean, there's a system of really...
40:35 - 40:39

There's a system of really fitting
gestures that people use,
40:40 - 40:47

and it might actually be a good idea
to try this out, to use sign language
40:47 - 40:50

as you're learning the actual language,
40:52 - 40:55

to get this enactment working.
40:56 - 41:00

Might be more effective than
making up your own gestures.
41:01 - 41:06

I mean if you make up your own gestures
you have the advantage that
41:10 - 41:12

during the process of
coming up with the gesture,
41:12 - 41:15

you are engaging your brain
in a specific way
41:15 - 41:19

that's not there if you just get
the gesture from somebody.
41:20 - 41:22

So there might be an advantage there,
41:22 - 41:28

but the other advantage is of course
time that you can save,
41:28 - 41:33

and the ability to communicate
with people that can't hear
41:34 - 41:38

So yeah, I think that's open for
exploration, for sure.
41:44 - 41:46

Well you see it in sign language, right?
41:48 - 41:51

People that sign don't really speak.
41:52 - 41:55

And they get along pretty nicely.
41:56 - 42:01

Another question would be if all of society
as a whole can do without verbal.
42:02 - 42:06

That's another question, but I think
you can restructure society
42:06 - 42:10

in a way that everybody can communicate
with gestures, for sure.
42:12 - 42:17

And according to some people, it was like
that before speech developed.
42:26 - 42:29

So yeah, there has been some research,
not much.
42:30 - 42:34

If you are interested in this, make sure
to check out my presentation on this topic
42:34 - 42:40

from last year's Gathering,
and also from last year's conference.
42:41 - 42:45

The conference one is not up on YouTube
yet, but the Gathering one is,
42:45 - 42:49

and there's, in the end, I show some...
42:52 - 42:56

I show a study that was done on
polyglots and hyperpolyglots
42:57 - 42:59

actually only hyperpolyglots I think.
43:00 - 43:05

And so they put people in an fMRI scanner,
and just gave them language material.
43:06 - 43:09

And what they found is that
the language network
43:09 - 43:16

was less active than for monolinguals.
43:16 - 43:20

So if you listen to something
there's some areas
43:20 - 43:22

on the left side of your brain
that light up:
43:22 - 43:27

you have some typical areas,
like Broca's area, Wernicke's area,
43:27 - 43:29

and some other ones.
43:29 - 43:34

And they found that, for polyglots
this "lighting up" is less,
43:34 - 43:39

and the interpretation was that
the polyglots' language network,
43:39 - 43:43

through extensive practice, has become
more and more efficient
43:43 - 43:48

at dealing with language.
And therefore it needs less activation.
43:48 - 43:53

So this is one thing that
you observe quite often,
43:59 - 44:02

when there's some process that
you get really good at,
44:03 - 44:06

in your brain the activity
that you see goes down,
44:06 - 44:08

because the network gets more efficient.
44:09 - 44:11

So that's why this paper was aptly titled
44:11 - 44:15

"The Small And Efficient Network
Of Polyglots And Hyperpolyglots".
44:16 - 44:18

And they, you can also look this up as well,
44:19 - 44:22

they also made them listen to
different languages,
44:23 - 44:28

and there, the better known
the foreign language was
44:28 - 44:32

so the first experiment was completed
in English, their mother tongue,
44:33 - 44:37

and then the second experiment they used
their target languages like,
44:37 - 44:40

the second best language, third best
language and so on.
44:41 - 44:45

And there, the lesser known a language,
the less active the language network,
44:45 - 44:49

and the better known, the more active.
So you have kind of the opposite effect.
44:49 - 44:52

And they interpreted this as reflecting
that the more you know
44:52 - 44:55

in a target language,
in a foreign language,
44:55 - 45:00

the more of the language network
gets recruited, the more context you have.
45:02 - 45:05

So you have this effect of getting really
efficient for your mother tongue,
45:06 - 45:10

and getting more of the whole message for,
45:12 - 45:13

the better you know a foreign language.
45:16 - 45:18

So this was the last question.
Alright
45:19 - 45:24

Thanks for listening, thanks to
the organizers for organizing this,
45:24 - 45:27

the streaming works really well,
I'm really impressed
45:28 - 45:29

Thanks guys!

Title:: How gestures and other non-verbal cues facilitate comprehension - Xaver Funk | PGO 2021
Video Language:: English
Duration:: 45:42
