
cdn.media.ccc.de/.../wikidatacon2019-14-eng-Keynote_Why_is_collecting_lexical_data_one_of_the_best_ways_we_can_help_support_underserved_and_endangered_languages_hd.mp4

  • 0:06 - 0:09
    Now, there are approximately
    7,500 languages
  • 0:09 - 0:11
    spoken on the planet today.
  • 0:12 - 0:14
    Of those, it's estimated
  • 0:14 - 0:18
    that about 70%
    are at risk of not surviving
  • 0:18 - 0:20
    the end of the 21st century.
  • 0:22 - 0:24
    Every time a language dies,
  • 0:25 - 0:27
    it's severing a connection
  • 0:27 - 0:31
    that has lasted for hundreds
    to thousands of years,
  • 0:31 - 0:35
    to culture, to history,
  • 0:35 - 0:38
    and to traditions, and to knowledge.
  • 0:39 - 0:42
    The linguist Kenneth Hale once said
  • 0:42 - 0:44
    that every time a language dies,
  • 0:44 - 0:47
    it's like dropping
    an atom bomb on the Louvre.
  • 0:49 - 0:52
    So the question is,
  • 0:53 - 0:55
    why do languages die?
  • 0:56 - 1:00
    Well, perhaps the simple answer might be
  • 1:00 - 1:03
    that one could imagine
    authoritarian governments
  • 1:03 - 1:05
    preventing people from speaking
    their native language,
  • 1:06 - 1:10
    children being punished
    for speaking their language at school,
  • 1:10 - 1:13
    or the government
    shutting down radio stations
  • 1:13 - 1:15
    in the minority language.
  • 1:15 - 1:17
    And this definitely happened in the past,
  • 1:17 - 1:19
    and it still, to some extent,
    happens today.
  • 1:20 - 1:23
    But the honest answer
  • 1:23 - 1:27
    is that for the vast majority
    of the cases of language extinction,
  • 1:27 - 1:29
    it's a much simpler
  • 1:29 - 1:33
    and a much easier-to-explain answer.
  • 1:34 - 1:36
    The languages go extinct
  • 1:36 - 1:38
    because they are not passed down
  • 1:38 - 1:40
    from one generation to the next.
  • 1:42 - 1:44
    Every single time a person who speaks
  • 1:44 - 1:46
    a minority language has a child,
  • 1:47 - 1:50
    they go through a calculus.
  • 1:51 - 1:53
    They ask themselves,
  • 1:54 - 1:56
    "Do I pass my language down to my child,
  • 1:57 - 2:01
    or do I instead teach them
    only the majority language?"
  • 2:01 - 2:03
    Essentially, there is a scale
  • 2:04 - 2:06
    that they assess in their heads,
  • 2:07 - 2:08
    in which on one side
  • 2:10 - 2:12
    every single time in their lives
  • 2:12 - 2:14
    that they've had an opportunity
    to use their native language
  • 2:15 - 2:18
    for communication,
    for access to traditional culture,
  • 2:20 - 2:22
    a stone is placed on the left side.
  • 2:22 - 2:24
    And every time that they find themselves
  • 2:24 - 2:26
    unable to use their native language,
  • 2:26 - 2:28
    and instead have to rely on
    the majority language,
  • 2:28 - 2:30
    a stone is placed on the right side.
  • 2:32 - 2:35
    Now, due to the strength and the dignity
  • 2:35 - 2:37
    of being able to speak
    one's mother tongue,
  • 2:37 - 2:39
    the stones on the left
    tend to be a bit heavier.
  • 2:39 - 2:42
    But with enough stones on the right side,
  • 2:43 - 2:45
    then eventually the scale tips,
  • 2:45 - 2:47
    and then when a person makes the decision
  • 2:47 - 2:49
    to pass their language down,
  • 2:49 - 2:51
    they see their own language
  • 2:51 - 2:53
    as more of a burden than a blessing.
  • 2:55 - 2:59
    So the question is,
    how do we reverse this?
  • 2:59 - 3:02
    First, we need to think
    about the fact that,
  • 3:04 - 3:05
    for any given language,
  • 3:05 - 3:08
    there are certain social spheres
    that it can be used in.
  • 3:08 - 3:09
    So any language
  • 3:09 - 3:11
    that's a mother tongue spoken today
  • 3:11 - 3:13
    can be used with one's family.
  • 3:14 - 3:17
    A smaller set of languages
    can be used within one's community,
  • 3:17 - 3:19
    a smaller set, maybe within one's region,
  • 3:19 - 3:22
    and for a small handful of languages,
  • 3:23 - 3:24
    they can be used
    for international communication.
  • 3:26 - 3:29
    And then even across these spheres,
  • 3:29 - 3:32
    there's the question of whether someone
    can use their language
  • 3:32 - 3:36
    for the purpose of education or business,
  • 3:36 - 3:38
    or in technology?
  • 3:39 - 3:42
    So, to better explain
  • 3:43 - 3:45
    what I'm talking about here,
  • 3:45 - 3:46
    I would like to use an anecdote.
  • 3:48 - 3:50
    Let's say that you are about to go
  • 3:50 - 3:52
    on your dream vacation to India,
  • 3:53 - 3:56
    and you have an eight-hour
    layover in Istanbul.
  • 3:57 - 4:01
    Now, you weren't necessarily
    planning on visiting Turkey,
  • 4:01 - 4:04
    but with your layover
    and with a Turkish friend
  • 4:04 - 4:06
    telling you about an amazing restaurant
  • 4:06 - 4:07
    that's not too far from the airport,
  • 4:08 - 4:11
    you say, "Hey, you know,
    maybe I'll stop by during my layover."
  • 4:11 - 4:13
    So, you exit the airport,
  • 4:14 - 4:15
    you get to your restaurant,
  • 4:15 - 4:17
    and they hand you a menu,
  • 4:17 - 4:19
    and the menu is entirely in Turkish.
  • 4:20 - 4:23
    Now, let's say,
    for the point of this exercise,
  • 4:23 - 4:24
    that you don't speak Turkish.
  • 4:25 - 4:27
    What do you do?
  • 4:28 - 4:30
    Well, best-case scenario,
  • 4:30 - 4:32
    you find someone perhaps
    who can speak your native language,
  • 4:32 - 4:34
    German, English, etc.
  • 4:36 - 4:38
    But let's say it's not your lucky day
  • 4:38 - 4:41
    and nobody in the restaurant can speak
    any German or any English.
  • 4:42 - 4:43
    So what do you do?
  • 4:43 - 4:46
    Well, if you are like me,
    and I imagine most of you,
  • 4:46 - 4:48
    you'd probably turn
    to a technological solution,
  • 4:50 - 4:52
    machine translation
    or a digital dictionary,
  • 4:53 - 4:54
    look up each word individually,
  • 4:54 - 4:58
    and eventually order yourself
    a delicious Turkish meal.
  • 5:00 - 5:03
    Now, let's imagine this scenario instead,
  • 5:04 - 5:06
    in which you are the native speaker
    of a minority language.
  • 5:07 - 5:09
    Let's say, Lower Sorbian.
  • 5:09 - 5:11
    Lower Sorbian is an endangered language
  • 5:11 - 5:12
    spoken here in Germany,
  • 5:12 - 5:17
    about 130 kilometers
    to the southeast of here,
  • 5:18 - 5:21
    that's spoken only by
    a few thousand people, mostly elderly.
  • 5:23 - 5:25
    Now, let's say your mother tongue
    is Lower Sorbian.
  • 5:25 - 5:27
    You end up in the restaurant.
  • 5:27 - 5:28
    Now, of course, the odds
    of finding someone
  • 5:28 - 5:31
    who speaks your native language
    in the restaurant are extraordinarily low.
  • 5:32 - 5:36
    But, again, you can just go
    to a technological solution.
  • 5:37 - 5:39
    However, for your native language,
  • 5:39 - 5:42
    these technological solutions don't exist.
  • 5:42 - 5:45
    You would have to rely on
    German or English
  • 5:45 - 5:47
    as your pivot language into Turkish.
  • 5:49 - 5:52
    Now, of course, you still end up
    getting your delicious Turkish meal,
  • 5:52 - 5:55
    but you begin to think about
    how difficult this would have been
  • 5:55 - 5:57
    if you were your grandfather,
    who spoke no German at all.
  • 5:58 - 6:00
    Now, this is just a small incident,
  • 6:00 - 6:05
    but it's going to place a stone
    on the right side of that scale,
  • 6:05 - 6:07
    and make you think that perhaps,
  • 6:07 - 6:10
    when you have children,
    or when you have another child,
  • 6:11 - 6:15
    the burden that you went through with this
  • 6:15 - 6:17
    may mean it's not worth it
    to keep your language.
  • 6:19 - 6:21
    And imagine if this was a scenario
  • 6:21 - 6:26
    that was of significantly more importance,
  • 6:26 - 6:28
    such as being in a hospital.
  • 6:31 - 6:36
    Now, this is the point
    at which we can help--
  • 6:37 - 6:40
    and by we, I mean you and me
    in this room.
  • 6:41 - 6:43
    We have the tools
    to help with this.
  • 6:45 - 6:47
    If technological tools
    are available for people
  • 6:47 - 6:49
    who speak minority
    and underserved languages,
  • 6:51 - 6:54
    it puts a little finger
    on the left side of the scale.
  • 6:54 - 6:56
    Someone doesn't necessarily have to think
  • 6:56 - 6:58
    that they have to rely on
    the majority language
  • 6:58 - 6:59
    in order to interact
    with the outside world,
  • 7:00 - 7:05
    because it opens the social spheres
  • 7:05 - 7:06
    a little bit more.
  • 7:08 - 7:10
    So, of course, the ideal solution
  • 7:10 - 7:13
    is that we have machine translation
    in every language in the world.
  • 7:13 - 7:17
    But, unfortunately,
    that's just not feasible.
  • 7:17 - 7:20
    Machine translation
    requires large corpora of text,
  • 7:20 - 7:21
    and for many of these languages
  • 7:21 - 7:23
    that are endangered or underserved,
  • 7:23 - 7:25
    such data is simply not available.
  • 7:26 - 7:28
    Some of them aren't even commonly written
  • 7:29 - 7:33
    and thus getting enough data
    to make a machine translation engine
  • 7:33 - 7:34
    is unlikely.
  • 7:34 - 7:38
    But what is available is lexical data.
  • 7:40 - 7:43
    Through the work of many linguists
  • 7:43 - 7:45
    over the past few hundred years,
  • 7:48 - 7:50
    dictionaries and grammars
    have been produced
  • 7:50 - 7:52
    for most of the world's languages.
  • 7:54 - 7:57
    But, unfortunately, most of these works
  • 7:57 - 8:01
    are not accessible
    or available to the world,
  • 8:01 - 8:04
    let alone to speakers
    of these minority languages.
  • 8:05 - 8:06
    And it's not an intentional process,
  • 8:06 - 8:08
    a lot of times it's simply because
  • 8:08 - 8:11
    the initial print run
    of these dictionaries was small,
  • 8:11 - 8:13
    and the only copies
  • 8:13 - 8:16
    are moldering away
    in a university library somewhere.
  • 8:18 - 8:21
    But we have the ability to take that data
  • 8:21 - 8:23
    and make it accessible to the world.
  • 8:24 - 8:28
    The Wikimedia Foundation
    is one of the best organizations,
  • 8:28 - 8:31
    I would say the best
    organization in the world,
  • 8:31 - 8:33
    for making data available
  • 8:33 - 8:37
    to the vast majority
    of the population of this planet.
  • 8:39 - 8:40
    So let's work on that.
  • 8:41 - 8:43
    So to explain a little bit
  • 8:43 - 8:45
    about what we've been doing
    in this regard,
  • 8:45 - 8:48
    I'd like to introduce
    my organization, PanLex,
  • 8:49 - 8:52
    which is an organization
    that is attempting
  • 8:52 - 8:54
    to collect lexical data for this purpose.
  • 8:55 - 8:57
    We got started about 12 years ago
  • 8:57 - 9:00
    at the University of Washington,
    as a research project.
  • 9:00 - 9:01
    The idea behind it
  • 9:01 - 9:04
    was to show that inferred translations
  • 9:04 - 9:07
    could create an effective
    translation device,
  • 9:07 - 9:09
    essentially a lexical translation device.
  • 9:09 - 9:12
    This is an example
    from PanLex data itself.
  • 9:13 - 9:14
    This is showing how to translate
  • 9:14 - 9:18
    the word "ev" in Turkish,
    which means house,
  • 9:18 - 9:20
    to Lower Sorbian,
  • 9:20 - 9:21
    the language I was referring to earlier.
  • 9:21 - 9:23
    So you're unlikely to find
  • 9:24 - 9:26
    Turkish to Lower Sorbian dictionaries,
  • 9:26 - 9:28
    but by passing it through
  • 9:28 - 9:30
    many, many different
    intermediate languages,
  • 9:30 - 9:33
    you can create effective translations.
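
To make the mechanism concrete, here is a toy sketch of the inference just described: candidate translations are found by hopping through every available intermediate language, and each pivot path counts as one vote for the candidate it reaches. The mini-dictionaries and the one-vote-per-path scoring below are illustrative assumptions, not PanLex's actual data or algorithm.

```python
# Toy sketch of inferred (two-hop) lexical translation. Hypothetical
# mini-dictionaries; candidates are scored by how many pivot paths reach them.
from collections import Counter

# (source_lang, target_lang) -> {word: [translations]}
DICTS = {
    ("tur", "eng"): {"ev": ["house", "home"]},
    ("tur", "deu"): {"ev": ["Haus"]},
    ("tur", "pol"): {"ev": ["dom"]},
    ("eng", "dsb"): {"house": ["dom"]},
    ("deu", "dsb"): {"Haus": ["dom"]},
    ("pol", "dsb"): {"dom": ["dom"]},
}

def infer_translations(word, src, dst):
    """Translate src -> dst through every available pivot language."""
    scores = Counter()
    for (a, pivot_lang), entries in DICTS.items():
        if a != src:
            continue
        for pivot_word in entries.get(word, []):
            bridge = DICTS.get((pivot_lang, dst), {})
            for candidate in bridge.get(pivot_word, []):
                scores[candidate] += 1  # one vote per pivot path
    return scores.most_common()

print(infer_translations("ev", "tur", "dsb"))  # [('dom', 3)]
```

Candidates reached through more independent pivots rank higher, which is what makes an inferred Turkish-to-Lower-Sorbian pairing usable even though no direct dictionary exists.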
  • 9:34 - 9:37
    So, once this was shown
    in the research project,
  • 9:37 - 9:40
    the founder of PanLex,
    Dr. Jonathan Pool,
  • 9:41 - 9:44
    decided, "Well, you know,
    why not actually just do this?"
  • 9:44 - 9:45
    So he started a non-profit
  • 9:45 - 9:49
    to collect as much lexical data
    as possible and make it accessible.
  • 9:49 - 9:51
    That's what we've been doing
    for the past 12 years.
  • 9:51 - 9:55
    In that time, we've collected
    thousands and thousands of dictionaries,
  • 9:55 - 9:56
    and extracted lexical data out of them
  • 9:56 - 10:01
    and compiled a database that allows
    inferred lexical translation
  • 10:01 - 10:04
    across any of--
  • 10:04 - 10:06
    Our current count is around 5,500
  • 10:06 - 10:08
    of the 7,500 languages in the world.
  • 10:09 - 10:11
    And, of course,
  • 10:11 - 10:12
    we're constantly trying to expand that
  • 10:12 - 10:15
    and expand the data
    on each individual language.
  • 10:17 - 10:21
    So, the next question is,
  • 10:22 - 10:26
    what can we do to work together on this?
  • 10:27 - 10:29
    We, at PanLex, have been
    extremely excited to watch
  • 10:29 - 10:31
    the development of lexical data
  • 10:31 - 10:34
    that Wikidata has been working on lately.
  • 10:35 - 10:38
    It's very fascinating to see organizations
  • 10:38 - 10:39
    that are working in a very similar sphere,
  • 10:39 - 10:41
    but on different aspects.
  • 10:42 - 10:44
    And we are extremely excited to see
  • 10:45 - 10:46
    the results of this from Wikidata.
  • 10:46 - 10:51
    And also we are looking forward
    to collaborating with Wikidata.
  • 10:54 - 10:56
    I think that the special skills
  • 10:56 - 10:58
    that we've developed
    over the past 12 years,
  • 10:58 - 11:02
    not just in collecting lexical data,
    but also in database design,
  • 11:02 - 11:04
    could be extremely useful for Wikidata.
  • 11:04 - 11:07
    And on the other side, I think that--
  • 11:08 - 11:11
    I especially am excited about Wikidata's
  • 11:12 - 11:15
    ability to do crowdsourcing of data.
  • 11:15 - 11:18
    PanLex, currently,
    our sources are entirely
  • 11:18 - 11:21
    printed lexical sources
    or other types of lexical sources,
  • 11:21 - 11:23
    but we don't do any crowdsourcing.
  • 11:23 - 11:25
    We simply don't have
    the infrastructure for it available
  • 11:25 - 11:27
    and of course, the Wikimedia Foundation
  • 11:27 - 11:29
    is the world expert in crowdsourcing.
  • 11:32 - 11:34
    I'm really looking
    forward to seeing exactly
  • 11:34 - 11:36
    how we can apply these skills together.
  • 11:39 - 11:42
    But, overall, I think the main thing
    to remember here
  • 11:42 - 11:43
    is that when we're
    working on these things,
  • 11:43 - 11:45
    it's minute detail.
  • 11:45 - 11:48
    We're sitting around
    looking at grammatical forms,
  • 11:48 - 11:52
    or paging our way through
    dictionaries, ancient dictionaries,
  • 11:52 - 11:54
    or sometimes
    recently published dictionaries
  • 11:54 - 11:57
    and getting into written forms of words,
  • 11:57 - 12:00
    and it feels very close up.
  • 12:00 - 12:02
    But, occasionally, we need to remember
  • 12:02 - 12:03
    to take a step back
  • 12:03 - 12:05
    and realize that, even though what we're doing
  • 12:06 - 12:09
    can feel mundane at times,
  • 12:10 - 12:12
    the work we're doing
    is extremely important.
  • 12:13 - 12:16
    This is, in my opinion,
    the absolute best way
  • 12:16 - 12:19
    that we can support endangered languages
  • 12:19 - 12:21
    and make sure that the linguistic
    diversity of the planet
  • 12:21 - 12:26
    is preserved up to the end
    of this century or longer.
  • 12:26 - 12:30
    It's entirely possible that the work
    that we're doing today
  • 12:30 - 12:33
    may result in languages
  • 12:33 - 12:35
    being preserved and passed down,
  • 12:35 - 12:37
    and not going extinct.
  • 12:39 - 12:41
    So just remember
  • 12:41 - 12:43
    that even if you're sitting
    around on your computer
  • 12:43 - 12:44
    editing an individual entry
  • 12:44 - 12:50
    and adding the dative form
    of a small minority language
  • 12:50 - 12:52
    for every single noun,
  • 12:52 - 12:55
    the little thing
    that you're doing right now,
  • 12:55 - 12:58
    might actually be partially responsible
  • 12:58 - 12:59
    for making sure that language survives,
  • 12:59 - 13:01
    until the end of the century or longer.
  • 13:03 - 13:04
    Thank you very much,
  • 13:04 - 13:06
    and I'd like to open
    the floor to questions.
  • 13:06 - 13:08
    (applause)
  • 13:24 - 13:25
    (woman 1) Thank you.
  • 13:25 - 13:27
    - Thank you for your talk.
    - Thank you.
  • 13:27 - 13:29
    (woman 1) I just have a question
    about dictionaries.
  • 13:29 - 13:31
    You said that you work
    with printed dictionaries?
  • 13:31 - 13:32
    - Yes.
    - (woman 1) So my question
  • 13:32 - 13:35
    is what do you take
    from those dictionaries
  • 13:35 - 13:38
    and if there's any copyright thing
    you have to deal with?
  • 13:38 - 13:41
    I anticipated this to be
    the first question that I would get.
  • 13:41 - 13:43
    (laughter)
  • 13:43 - 13:46
    So, first off, for PanLex,
  • 13:46 - 13:50
    according to the legal
    resources that we have consulted,
  • 13:53 - 13:57
    whereas the arrangement and organization
    of a dictionary is copyrightable,
  • 13:57 - 14:03
    the translations themselves
    are not considered copyrightable.
  • 14:04 - 14:06
    A good example:
  • 14:06 - 14:11
    a phone book is considered,
    at least according to US law,
  • 14:11 - 14:12
    copyrightable.
  • 14:12 - 14:17
    But saying that person X's
    phone number is digits D
  • 14:17 - 14:18
    is not copyrightable.
  • 14:22 - 14:23
    So like I said,
  • 14:23 - 14:25
    according to our legal scholars,
  • 14:25 - 14:27
    this is how we can deal with this.
  • 14:27 - 14:31
    But even if that's not
    a solid enough legal argument,
  • 14:31 - 14:32
    one important thing to remember
  • 14:32 - 14:38
    is that the vast majority
    of this lexical data
  • 14:39 - 14:41
    is actually out of copyright.
  • 14:41 - 14:43
    A significant number
    of these are out of copyright
  • 14:43 - 14:44
    and thus can be used without--
  • 14:44 - 14:47
    And the other thing
    is that oftentimes, for example,
  • 14:47 - 14:50
    if we're working with
    a recently made print dictionary,
  • 14:50 - 14:52
    rather than trying to scan it and OCR it,
  • 14:52 - 14:53
    we just email the person who made it.
  • 14:53 - 14:58
    And it turns out that
    most linguists are really excited
  • 14:58 - 15:00
    that their data can be made accessible.
  • 15:00 - 15:01
    And so they're like, "Sure, please,
  • 15:01 - 15:03
    just put it all in there
    and make it accessible."
  • 15:06 - 15:08
    So like I said, at least
    according to our legal opinions,
  • 15:08 - 15:09
    we have the ability,
  • 15:09 - 15:11
    but even if you don't want
    to go with that,
  • 15:11 - 15:16
    it's very easy to get
    the data publicly accessible.
  • 15:26 - 15:28
    - (man 1) Thank you. Hi.
    - Hi.
  • 15:28 - 15:30
    (man 1) Can you say a little more
  • 15:30 - 15:35
    about how the person who speaks
    Lower Sorbian is accessing the data?
  • 15:35 - 15:38
    Like specifically how
    that information is getting to them
  • 15:38 - 15:41
    and how that might help to convince them
  • 15:41 - 15:43
    to either try out the--
  • 15:43 - 15:45
    Great question and this is actually
  • 15:45 - 15:46
    one that I think about a lot as well,
  • 15:46 - 15:50
    because I think that
    when we talk about data access,
  • 15:50 - 15:53
    there are actually
    multiple steps to this.
  • 15:53 - 15:56
    One is, of course, data preservation:
    making sure the data doesn't go away.
  • 15:56 - 15:59
    Second is making sure it's interoperable
  • 15:59 - 16:02
    and can be used.
  • 16:02 - 16:05
    And third is making sure
    that it's available.
  • 16:06 - 16:07
    So in PanLex's case,
  • 16:07 - 16:10
    we have an API that can be used,
  • 16:10 - 16:12
    but, obviously,
    that can't be used by an end user.
  • 16:12 - 16:15
    But we've also developed interfaces.
  • 16:15 - 16:20
    And so, for example,
    if you go to translate.panlex.org,
  • 16:20 - 16:23
    you can do translations on our database.
  • 16:23 - 16:26
    If you want to mess around
    with the API, just go to dev.panlex.org,
  • 16:26 - 16:29
    and you can find a bunch of stuff
    on the API, or just api.panlex.org.
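
For anyone who wants to try this from code, here is a hedged sketch of a query from Python. The endpoint, parameter names, and response shape are assumptions made for illustration; dev.panlex.org has the authoritative API reference.

```python
# Hedged sketch of a PanLex API lookup; the endpoint and parameter names are
# assumed for illustration, not verified against dev.panlex.org.
import requests

API = "https://api.panlex.org/v2/expr"  # assumed endpoint

def translate(txt, from_uid, to_uid):
    """Request expressions in `to_uid` that translate `txt` from `from_uid`.
    UIDs like "tur-000" pair an ISO 639-3 code with a variant number."""
    body = {
        "uid": to_uid,          # language variant of the results (assumed)
        "trans_uid": from_uid,  # language variant of the query (assumed)
        "trans_txt": txt,       # the word to translate (assumed)
    }
    resp = requests.post(API, json=body, timeout=10)
    resp.raise_for_status()
    return [row["txt"] for row in resp.json().get("result", [])]

print(translate("ev", "tur-000", "dsb-000"))  # Turkish -> Lower Sorbian
```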
  • 16:31 - 16:33
    But there's another step too,
  • 16:33 - 16:37
    which is that even if you make
    all of your data completely accessible
  • 16:37 - 16:41
    with tools that are super useful
    to be able to access it,
  • 16:41 - 16:43
    if you don't actually promote the tools,
  • 16:43 - 16:45
    then people won't actually
    be able to use it.
  • 16:45 - 16:47
    And this is honestly kind of a...
  • 16:49 - 16:51
    the thing that isn't talked about enough,
  • 16:51 - 16:53
    and I don't have a good answer for it.
  • 16:53 - 16:55
    How do we make sure that--
  • 16:55 - 16:57
    For example, I only fairly recently,
  • 16:57 - 17:00
    only a few years ago
    got acquainted with Wikidata,
  • 17:00 - 17:02
    and it's exactly the kind
    of thing that I'm interested in.
  • 17:03 - 17:07
    So, how do we promote
    ourselves to others?
  • 17:07 - 17:09
    I'm leaving that as an open question.
  • 17:09 - 17:11
    Like I said, I don't have
    a good answer for this.
  • 17:11 - 17:13
    But, of course, in order to do that,
  • 17:13 - 17:15
    we still need to accomplish
    the first few steps.
  • 17:22 - 17:25
    (man 2) If we want to have
    machine translation,
  • 17:25 - 17:28
    don't we need a translation memory?
  • 17:28 - 17:31
    I'm not sure that the individual words
  • 17:31 - 17:33
    that we put into Wikidata,
  • 17:33 - 17:37
    these short phrases
    that we put into Wikidata,
  • 17:37 - 17:41
    either as ordinary Wikidata items
    or as Wikidata lexemes,
  • 17:41 - 17:44
    are sufficient to do a proper translation.
  • 17:44 - 17:47
    We need to have full sentences,
    for example, for--
  • 17:47 - 17:48
    (Benjamin) Yeah, absolutely.
  • 17:49 - 17:51
    (man 2) And where do we get
    this data structure?
  • 17:51 - 17:55
    I'm not sure that, currently,
  • 17:55 - 18:00
    Wikidata is able to handle very well
  • 18:00 - 18:03
    the issue of a translation memory,
  • 18:04 - 18:06
    or translatewiki.net,
  • 18:06 - 18:09
    for getting into that gap of...
  • 18:12 - 18:15
    Should we do anything
    in that respect, or should we--
  • 18:15 - 18:17
    Yeah, and I really
    appreciate your question.
  • 18:17 - 18:19
    I touched on this a little bit earlier,
  • 18:19 - 18:20
    but I'd love to reiterate it.
  • 18:21 - 18:25
    This is precisely the reason
    that PanLex works in lexical data
  • 18:25 - 18:27
    and why I'm excited about lexical data,
  • 18:27 - 18:30
    as opposed to--
    not as opposed to, but in addition
  • 18:30 - 18:35
    to machine translation engines
    and machine translation in general.
  • 18:36 - 18:39
    As you said, machine translation
    requires a specific kind of data,
  • 18:40 - 18:43
    and that data is not available
    for most of the world's languages.
  • 18:43 - 18:45
    For the vast majority
    of the world's languages,
  • 18:45 - 18:46
    that simply is not available.
  • 18:47 - 18:48
    But that doesn't mean
    we should just give up.
  • 18:48 - 18:50
    Like why?
  • 18:51 - 18:54
    If I needed to translate
    my Turkish restaurant menu,
  • 18:55 - 18:59
    then lexical translation will likely
    be an exceptionally good tool for that.
  • 18:59 - 19:02
    Now, I'm not saying
    that you can use lexical translation
  • 19:02 - 19:05
    to do perfect paragraph
    to paragraph translation.
  • 19:05 - 19:07
    When I say lexical translation,
    I mean word to word
  • 19:07 - 19:10
    and word to word translation
    can be extremely useful,
  • 19:12 - 19:15
    It's funny to think about it,
    but we didn't really have access
  • 19:15 - 19:17
    to really good machine translation.
  • 19:17 - 19:20
    Nobody had
    access to that until fairly recently.
  • 19:20 - 19:24
    And we still got by with dictionaries,
  • 19:24 - 19:28
    and they're an incredibly good resource.
  • 19:28 - 19:31
    And the data is available,
    so why not make it available
  • 19:31 - 19:34
    to the world at large
    and to the speakers of these languages?
  • 19:36 - 19:39
    (woman 2) Hi, what mechanisms
    do you have in place
  • 19:39 - 19:41
    when the community itself--I'm over here.
  • 19:41 - 19:43
    - Where are you? Okay, right.
    - (woman 2) Yeah, sorry. (laughs)
  • 19:43 - 19:45
    ...when the community itself
  • 19:45 - 19:47
    doesn't want part of their data in PanLex?
  • 19:47 - 19:49
    Great question.
  • 19:49 - 19:52
    So the way that we work with that
  • 19:52 - 19:56
    is that if a dictionary is published
    and made publicly available,
  • 19:57 - 19:58
    that's a good indication.
  • 19:58 - 20:02
    Like you could buy it in a store
    or at a university library,
  • 20:02 - 20:05
    or a public library anyone can access.
  • 20:05 - 20:08
    That's a good indication
    that that decision has been made.
  • 20:08 - 20:12
    (woman 2) [inaudible]
  • 20:16 - 20:18
    (man 3) Please, [inaudible],
    could you speak in the microphone?
  • 20:19 - 20:20
    Can you say it again?
  • 20:20 - 20:23
    (woman 2) Linguists don't always have
    the permission of the community.
  • 20:23 - 20:24
    When they publish things,
  • 20:24 - 20:28
    they oftentimes do so
    without the consent of the community.
  • 20:28 - 20:30
    And that's absolutely true.
  • 20:30 - 20:33
    I would say that is a--
  • 20:33 - 20:34
    That does happen.
  • 20:34 - 20:37
    I would say it's generally
    a small minority of cases,
  • 20:37 - 20:41
    mostly confined
    to generally North America,
  • 20:41 - 20:43
    although sometimes
    South American languages as well.
  • 20:45 - 20:46
    It's something we have
    to take into account.
  • 20:46 - 20:49
    If we were to receive word, for example,
  • 20:49 - 20:52
    that the data that is in PanLex
  • 20:52 - 20:56
    should not be accessed
    by the greater world,
  • 20:56 - 20:58
    then, of course, we would remove it.
  • 20:58 - 20:59
    (woman 2) Good, good.
  • 21:01 - 21:02
    That doesn't mean, of course,
  • 21:02 - 21:04
    that we'll listen
    to copyright rules necessarily
  • 21:04 - 21:07
    but we will listen
    to traditional communities,
  • 21:07 - 21:08
    and that's the major difference.
  • 21:08 - 21:10
    (woman 2) Yeah,
    that's what I'm referring to.
  • 21:15 - 21:17
    It brings up a really interesting point,
  • 21:17 - 21:18
    which is that
  • 21:19 - 21:22
    sometimes it's a really big question
    of who speaks for a language.
  • 21:23 - 21:28
    I had some experience actually
    visiting the American Southwest
  • 21:28 - 21:30
    and working with some groups,
  • 21:30 - 21:32
    who work on the indigenous
    Pueblo languages out there.
  • 21:36 - 21:38
    So there are approximately
  • 21:38 - 21:40
    six Pueblo languages,
    depending on how you slice it,
  • 21:40 - 21:42
    spoken in that area.
  • 21:42 - 21:44
    But they are divided
    amongst 18 different Pueblos
  • 21:44 - 21:47
    and each one has their own
    tribal government,
  • 21:47 - 21:50
    and each government
    may have a different opinion
  • 21:50 - 21:54
    on whether their language
    should be accessible to outsiders or not.
  • 21:57 - 21:58
    Like, for example, Zuni Pueblo,
  • 21:58 - 22:01
    it's a single Pueblo
    that speaks the Zuni language.
  • 22:03 - 22:05
    And they're really big
    on their language going everywhere,
  • 22:05 - 22:08
    they put it on the street signs
    and everything, it's great.
  • 22:08 - 22:11
    But for some of the other languages,
  • 22:11 - 22:13
    you might have one group that says,
  • 22:13 - 22:16
    "Yeah, we don't want our language
    being accessed by outsiders."
  • 22:16 - 22:19
    But then you have the neighboring Pueblo
    who speaks the same language say,
  • 22:19 - 22:22
    "We really want our language
    accessible to outsiders
  • 22:22 - 22:24
    in using these technological tools,
  • 22:24 - 22:27
    because we want our language
    to be able to continue on."
  • 22:27 - 22:29
    And it raises a really
    interesting ethical question.
  • 22:29 - 22:32
    Because if you default by saying,
  • 22:32 - 22:35
    "Fine, I'm cutting it off because
    this group said we should cut it off"--
  • 22:35 - 22:37
    aren't you also doing a disservice
    to the second group,
  • 22:37 - 22:39
    because they actively
    want you to roll out these things?
  • 22:39 - 22:43
    So I don't think this is a question
    that has an easy answer.
  • 22:43 - 22:45
    But I would say
    at least in terms of PanLex--
  • 22:45 - 22:49
    And for the record, we actually
    haven't encountered this yet,
  • 22:49 - 22:50
    that I'm aware of.
  • 22:51 - 22:53
    Now, that could be partially because...
  • 22:54 - 22:55
    Getting back to his question,
  • 22:56 - 22:58
    we may need to promote more. (chuckles)
  • 22:59 - 23:02
    But, in general, as far as I know,
  • 23:02 - 23:04
    we have not had this come up.
  • 23:04 - 23:07
    But our game plan for this
  • 23:07 - 23:11
    is if a community says they don't want
    their data in a database,
  • 23:11 - 23:12
    then we remove it.
  • 23:12 - 23:15
    (woman 2) Because we have come up
    with it in Wikidata and Wikipedia...
  • 23:15 - 23:16
    - You have?
    - (woman 2) ...in comments.
  • 23:16 - 23:17
    - Really?
    - (woman 2) It's been a problem.
  • 23:17 - 23:20
    Yeah, I can imagine especially in comments
    for photos or certain things.
  • 23:20 - 23:22
    (woman 2) Correct.
  • 23:27 - 23:33
    (man 4) Hi, I had a question about
    the crowdsourcing aspect of this.
  • 23:34 - 23:37
    As far as going in and asking a community
  • 23:37 - 23:40
    to annotate or add data for a dataset,
  • 23:40 - 23:44
    one of the things
    that's a little intimidating is like,
  • 23:45 - 23:49
    as an editor, I can only see
    what things are missing.
  • 23:49 - 23:53
    But if I'm going to spend time
    on things, having an idea
  • 23:54 - 23:57
    that there's a list of high-priority items,
  • 23:58 - 24:01
    that's, I guess,
    very motivating in this aspect.
  • 24:01 - 24:04
    And I was curious if you had a system
  • 24:04 - 24:08
    which is, essentially, like,
    we know the gaps in our own data,
  • 24:08 - 24:12
    we have linguistic evidence
    to know that these are the ones
  • 24:12 - 24:16
    that, if annotated,
    would be the high-impact drivers.
  • 24:16 - 24:17
    So I can imagine
  • 24:18 - 24:21
    having the lexeme
    for "house" be very impactful,
  • 24:21 - 24:25
    maybe not a lexeme
    for "data" or some other word like that.
  • 24:25 - 24:29
    But I was curious if you had that,
    and if it is something
  • 24:30 - 24:35
    that could be used
    to drive these community efforts.
  • 24:36 - 24:37
    Great question.
  • 24:37 - 24:41
    So one thing that Wikidata
    has a whole lot of--
  • 24:41 - 24:45
    sorry, excuse me, PanLex
    has a whole lot of are Swadesh lists.
  • 24:45 - 24:48
    We have apparently the largest collection
    of Swadesh lists in the world,
  • 24:48 - 24:49
    which is interesting.
  • 24:49 - 24:50
    If you don't know what a Swadesh list is,
  • 24:50 - 24:56
    it's essentially a regularized
    list of lexical items
  • 24:56 - 25:00
    that can be used
    for analysis of languages.
  • 25:00 - 25:03
    They contain really basic sets.
  • 25:03 - 25:05
    So there's a couple
    of different kinds of Swadesh lists.
  • 25:05 - 25:07
    But there are 100 or 207 items
  • 25:07 - 25:09
    and they might contain
  • 25:09 - 25:13
    words like "house" and "eye" and "skin"
  • 25:13 - 25:14
    and basically general words
  • 25:14 - 25:16
    that you should be able
    to find in any language.
  • 25:16 - 25:20
    So that's like a really
    good starting point
  • 25:20 - 25:23
    for having that kind of data available.
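
As a sketch of how a Swadesh-style list could drive crowdsourcing priorities, the snippet below surfaces, per language, the basic-vocabulary items still missing. The mini-list and coverage data are invented; this is a workflow idea suggested by the answer, not an existing PanLex or Wikidata feature.

```python
# Hypothetical: use a Swadesh-style list as a crowdsourcing priority queue.
SWADESH_MINI = ["I", "you", "water", "fire", "house", "eye", "skin"]

coverage = {  # invented per-language coverage of the list
    "dsb": {"I", "you", "water", "house"},
    "zun": {"water", "fire", "eye", "skin"},
}

def missing_items(lang):
    """Return the high-priority items not yet attested for `lang`."""
    return [w for w in SWADESH_MINI if w not in coverage.get(lang, set())]

for lang in coverage:
    print(lang, "->", missing_items(lang))
# dsb -> ['fire', 'eye', 'skin']
# zun -> ['I', 'you', 'house']
```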
  • 25:29 - 25:31
    Now, as I mentioned before,
  • 25:31 - 25:34
    crowdsourcing is something
    that we don't do yet
  • 25:34 - 25:36
    and we're actually
    really excited to be able to do.
  • 25:36 - 25:38
    One of the things I'm really excited
  • 25:38 - 25:39
    to talk to people
    at this conference about,
  • 25:39 - 25:43
    is how crowdsourcing can be used
  • 25:43 - 25:46
    and the logistics behind it,
  • 25:46 - 25:49
    and these are the kind
    of questions that can come up.
  • 25:51 - 25:53
    So I guess the answer I can say to you
  • 25:53 - 25:55
    is that we do have a priority list--
  • 25:55 - 25:58
    Actually, one thing I can say
    is we definitely do have a priority list
  • 25:58 - 26:00
    when it comes to which languages
    we are seeking out.
  • 26:00 - 26:02
    So the way we do this
    is that we look for languages
  • 26:02 - 26:05
    that are not currently served
    by technological solutions,
  • 26:05 - 26:07
    which are oftentimes minority languages,
  • 26:07 - 26:09
    or usually minority languages,
  • 26:09 - 26:12
    and then prioritize those.
  • 26:14 - 26:17
    But in terms of individual lexical items
  • 26:17 - 26:20
    the general way we get new data
  • 26:20 - 26:23
    is essentially by ingesting
    an entire dictionary's worth.
  • 26:23 - 26:26
    We are relying on the dictionary's choice
  • 26:26 - 26:29
    of lexical items,
    rather than necessarily saying,
  • 26:29 - 26:32
    we're really looking for the word
    for "house" in every language.
  • 26:32 - 26:35
    But when it comes to data crowdsourcing,
    we will need something like that.
  • 26:35 - 26:38
    So this is an opportunity
    for research and growth.
  • 26:40 - 26:43
    (man 5) Hi, I'm Victor,
    and this is awesome.
  • 26:45 - 26:47
    As you have slides here,
  • 26:47 - 26:49
    can you talk a little bit
    about the technical status
  • 26:49 - 26:51
    of whether you currently have data
  • 26:51 - 26:57
    or information flow
    from and to Wikidata and PanLex?
  • 26:57 - 27:00
    Is that currently implemented already
  • 27:00 - 27:04
    and how do you deal with
  • 27:04 - 27:07
    back and forth or even
    feedback loop information
  • 27:07 - 27:10
    between PanLex and Wikidata?
  • 27:10 - 27:14
    So we actually don't have any formal
    connections to Wikidata at this point,
  • 27:14 - 27:15
    and this is something that I'm, again,
  • 27:15 - 27:18
    I'm really excited to talk
    to people in this conference about.
  • 27:18 - 27:21
    We've had some interaction
    with Wiktionary,
  • 27:22 - 27:25
    but Wikidata is actually
    a better fit, honestly,
  • 27:25 - 27:27
    for what we are looking for.
  • 27:27 - 27:29
    Having the lexical stuff directly
  • 27:29 - 27:32
    means that we have to do a lot less
    data analysis and extraction.
  • 27:33 - 27:37
    And so the answer is,
    we don't yet, but we want to.
  • 27:37 - 27:40
    (man 5) And if not,
    what are the obstacles?
  • 27:40 - 27:44
    And as we can see, Wikidata
    already supports several languages,
  • 27:44 - 27:47
    but when I look up translate.panlex.org,
  • 27:47 - 27:49
    you apparently support
    many, many variants,
  • 27:49 - 27:51
    much more than Wikidata.
  • 27:51 - 27:53
    How do you see the gap
  • 27:53 - 27:57
    between translation
    or lexical translation first,
  • 27:57 - 28:00
    as an application, versus an effort
  • 28:00 - 28:04
    at trying to map a knowledge structure?
  • 28:04 - 28:06
    Mapping knowledge
    will actually be very interesting.
  • 28:06 - 28:07
    We've had some
    very interesting discussions
  • 28:07 - 28:12
    about the way that Wikidata
    organizes their lexical data,
  • 28:12 - 28:14
    your lexical data,
  • 28:14 - 28:16
    and how we organize our lexical data.
  • 28:16 - 28:21
    And there are subtle differences
    that would require a mapping strategy,
  • 28:21 - 28:25
    some of which will not
    necessarily be automatic,
  • 28:25 - 28:27
    but we might be able to develop
    techniques to be able to do this.
  • 28:27 - 28:31
    You gave the example of language variants.
  • 28:31 - 28:34
    We tend to be very "splittery"
    when it comes to language variants.
  • 28:34 - 28:36
    In other words,
    if we get a source that says
  • 28:36 - 28:39
    that this is the dialect spoken
  • 28:39 - 28:42
    on the left side of the river
    in Papua New Guinea, for this language,
  • 28:42 - 28:43
    and we get another source that says
  • 28:43 - 28:45
    this is the dialect spoken
    on the right side of the river,
  • 28:45 - 28:47
    then we consider them
    essentially separate languages.
  • 28:47 - 28:51
    And so we do this in order to basically
    preserve the most data that we can.
  • 28:52 - 28:54
    Being able to map that
    to how Wikidata does it--
  • 28:54 - 28:57
    Actually, what I would love
    is to have conversations
  • 28:57 - 29:01
    about how languages
  • 29:01 - 29:06
    are designated on Wikidata.
  • 29:08 - 29:12
    Again, we go with the strategy
    of very much a "splittery" strategy.
  • 29:14 - 29:17
    We broadly rely on ISO 639-3 codes,
  • 29:18 - 29:20
    which are provided by the Ethnologue,
  • 29:20 - 29:24
    and then within each individual code,
    we allow multiple variants,
  • 29:24 - 29:29
    either for script variants
    or regional dialects or sociolects, etc.
  • 29:30 - 29:33
    Again, opportunity
    for discussion and work.
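
As a sketch of what this "splittery" designation can look like in practice: an identifier pairs an ISO 639-3 code with a variant number, so two dialects under the same code stay distinct. The field names and the made-up code "xyz" are illustrative, not PanLex's actual schema; only the code-plus-variant idea comes from the talk.

```python
# Illustrative model of "ISO 639-3 code + variant" language identification.
from dataclasses import dataclass

@dataclass(frozen=True)
class LanguageVariant:
    iso639_3: str    # e.g. "dsb" for Lower Sorbian; "xyz" below is made up
    variant: int     # 0 for the default variety, 1+ for others
    label: str = ""  # free-text note: script, regional dialect, sociolect, ...

    @property
    def uid(self) -> str:
        return f"{self.iso639_3}-{self.variant:03d}"  # e.g. "dsb-000"

left_bank = LanguageVariant("xyz", 1, "dialect, left bank of the river")
right_bank = LanguageVariant("xyz", 2, "dialect, right bank of the river")
assert left_bank.uid != right_bank.uid  # kept separate to preserve data
```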
  • 29:36 - 29:39
    (woman 3) Hi, I would like to know
    if you have an OCR pipeline
  • 29:39 - 29:45
    and especially because
    we've been trying to do OCR on Maya,
  • 29:45 - 29:48
    and we don't get any results.
  • 29:48 - 29:50
    It doesn't understand anything--
  • 29:50 - 29:53
    - Oh, yeah! (laughs)
    - (woman 3) And... yeah.
  • 29:53 - 29:56
    So if your pipelines are available.
  • 29:56 - 30:00
    And the other one is just
    on the overlap of ISO codes,
  • 30:00 - 30:02
    like sometimes they say,
  • 30:02 - 30:04
    "Oh, this is a language,
    and this is another language,"
  • 30:04 - 30:07
    but there are sources
    that say other stuff,
  • 30:07 - 30:10
    as you were mentioning,
    but they tend to overlap.
  • 30:10 - 30:13
    So how do you go on...? Yeah.
  • 30:13 - 30:15
    Yeah, that's absolutely
    an amazing question.
  • 30:15 - 30:17
    I really like it.
  • 30:17 - 30:20
    So we don't have a formalized
    OCR pipeline per se;
  • 30:20 - 30:24
    we do it on a sort of
    source-by-source basis.
  • 30:24 - 30:26
    One of the reasons why
    is because we oftentimes have sources
  • 30:26 - 30:28
    that don't necessarily need to be OCR'd,
  • 30:28 - 30:30
    that are available
    for some of these languages,
  • 30:30 - 30:33
    and we concentrate on those because
    they require the least amount of work.
  • 30:33 - 30:35
    But, obviously,
    if we really want to dive deep
  • 30:35 - 30:37
    into some of our sources
    that are in our backlog,
  • 30:37 - 30:41
    we're going to need to essentially
    develop strong OCR pipelines.
  • 30:41 - 30:44
    But there's another aspect too,
    which is that, as you mentioned...
  • 30:44 - 30:49
    like the people who designed OCR engines
  • 30:49 - 30:53
    I think don't realize
    how much you can stress-test them.
  • 30:53 - 30:55
    Like, you know what's fun?--
  • 30:55 - 30:58
    trying to OCR
    a Russian-Tibetan dictionary.
  • 30:59 - 31:00
    It's really hard, it turns out...
  • 31:02 - 31:04
    We gave up, and we hired
    someone to just type it up,
  • 31:04 - 31:06
    which was totally doable.
  • 31:06 - 31:07
    And actually, it turns out
  • 31:07 - 31:10
    that this amazing Russian woman
    learned to read Tibetan
  • 31:10 - 31:13
    so she could type this up,
    which was super cool.
  • 31:15 - 31:18
    I think that if you're dealing
    with stuff in the Latin scripts,
  • 31:18 - 31:23
    then I think that OCR solutions
    can be developed, that are more robust,
  • 31:23 - 31:25
    that deal with
    multilingual sources like this
  • 31:25 - 31:27
    and expect that you're going
    to get a random four in there,
  • 31:27 - 31:28
    if you're dealing with something like
  • 31:28 - 31:31
    16th-century Mayan sources,
    you know, with the digit four.
  • 31:32 - 31:38
    But there are some sources
  • 31:38 - 31:40
    that OCR is probably just
    never really going to catch up to,
  • 31:40 - 31:42
    or would require such an immense amount of work.
  • 31:43 - 31:47
    Actually, we put a little
    bit of this to use right now.
  • 31:47 - 31:49
    We have another project
    we're running at PanLex
  • 31:49 - 31:54
    to transcribe all of the traditional
    literature of Bali,
  • 31:54 - 31:58
    and we found that in handwritten
    Balinese manuscripts,
  • 31:58 - 32:00
    there's just no chance of OCR.
  • 32:00 - 32:02
    So we got a bunch
    of Balinese people to type them up,
  • 32:02 - 32:05
    and it's become a really cool
    cultural project within Bali,
  • 32:05 - 32:07
    and it's become news and stuff like that.
  • 32:07 - 32:09
    So I would say
  • 32:09 - 32:11
    that you don't necessarily
    need to rely on OCR,
  • 32:11 - 32:13
    but there is a lot out there.
  • 32:13 - 32:15
    So having good OCR solutions
    would be good.
  • 32:17 - 32:21
    Also, if anyone out here
    is into super multilingual OCR,
  • 32:21 - 32:23
    please come talk to me.
  • 32:30 - 32:31
    (man 6) Thank you for your presentation.
  • 32:32 - 32:35
    You talked about integration
  • 32:35 - 32:37
    between PanLex and Wikidata,
  • 32:37 - 32:39
    but you haven't gone into the specifics.
  • 32:39 - 32:43
    So I was checking your data license,
    and it is under CC0.
  • 32:43 - 32:44
    - Yes.
    - (man 6) That's really great.
  • 32:44 - 32:46
    So there are two possible ways
  • 32:46 - 32:49
    that either we can import the data
  • 32:49 - 32:53
    or we can continue something similar
    to the Freebase way,
  • 32:53 - 32:56
    where we had the complete
    database from the Freebase,
  • 32:56 - 32:59
    and we imported them, and we made a link,
  • 32:59 - 33:04
    an external identifier
    to the Freebase database.
  • 33:04 - 33:08
    So if you have something in mind,
    are you thinking similar?
  • 33:08 - 33:10
    Or you just want to make...
  • 33:15 - 33:19
    an independent database
    which can be linked to Wikidata?
  • 33:19 - 33:21
    Yeah, so this is a great question
  • 33:21 - 33:23
    and actually I feel
    like it's about one step ahead
  • 33:23 - 33:26
    of some of the stuff
    that I've already been thinking about,
  • 33:26 - 33:30
    partially because, like I said,
  • 33:30 - 33:32
    getting the two databases to work together
  • 33:32 - 33:34
    is a step in and of itself.
  • 33:34 - 33:35
    I think the first step that we can take
  • 33:35 - 33:38
    is literally just pooling
    our skills together.
  • 33:38 - 33:40
    We have a lot of experience
    dealing with stuff
  • 33:40 - 33:43
    like classifications of properties
    of individual lexemes
  • 33:43 - 33:45
    that I'd love to share.
  • 33:46 - 33:49
    But being able to link the databases
    themselves would be wonderful.
  • 33:49 - 33:51
    I'm 100% for that.
  • 33:51 - 33:54
    I think it would be a little bit easier
  • 33:54 - 33:56
    in the Wikidata-to-PanLex direction,
  • 33:56 - 33:59
    but maybe I'm just biased
    because I can see how that could work.
  • 34:02 - 34:06
    Yeah, essentially, as long
    as Wikidata is comfortable
  • 34:06 - 34:10
    with all the licensing stuff like that,
    or we work something out,
  • 34:10 - 34:12
    then I think that would be a great idea.
  • 34:13 - 34:16
    We'd just have to figure out ways
    of linking the data itself.
  • 34:16 - 34:22
    One thing I can imagine is, essentially,
    that I would love for edits to Wikidata
  • 34:23 - 34:26
    to immediately be propagated
    to the PanLex database,
  • 34:26 - 34:29
    without having to essentially
  • 34:29 - 34:31
    just reingest it every...
  • 34:31 - 34:36
    essentially making Wikidata
    a crowdsourceable interface to PanLex
  • 34:36 - 34:37
    would be really awesome.
  • 34:37 - 34:40
    And then being able to use
    PanLex in immediate translations,
  • 34:40 - 34:42
    to be able to do translations
    across Wikidata lexical items--
  • 34:42 - 34:44
    that would be glorious.
  • 34:55 - 35:00
    (man 7) This is like the auditing process
    of this semantic web
  • 35:00 - 35:04
    to close holes by inference.
  • 35:06 - 35:10
    If we think this further,
    this kind of translation,
  • 35:10 - 35:13
    how do you deal with semantic mismatch
  • 35:13 - 35:16
    and grammatical mismatch?
  • 35:16 - 35:19
    For instance, if you try
    to translate something in German,
  • 35:19 - 35:22
    you can simply put several words together
  • 35:22 - 35:26
    and reach something that's sensible,
  • 35:26 - 35:29
    and on the other hand,
    I think I read somewhere that
  • 35:31 - 35:38
    not every language
    has the same granular system
  • 35:38 - 35:40
    for colors, for instance.
  • 35:42 - 35:43
    You said everything
  • 35:43 - 35:45
    uses a different system
    for colors, or the same?
  • 35:46 - 35:48
    (man 7) I remember maybe
    that it's just about evolution of language
  • 35:48 - 35:52
    that they started out
    with black and white and then--
  • 35:52 - 35:53
    Yeah, the color hierarchy.
  • 35:53 - 35:54
    Actually, the color hierarchy
  • 35:54 - 35:57
    is a great way to illustrate
    how this works, right?
  • 35:58 - 36:01
    So, essentially, when you have
    a single pivot language--
  • 36:02 - 36:05
    it's really interesting when
    you read papers on machine translation
  • 36:05 - 36:08
    because oftentimes they'll talk about
    some hypothetical pivot language,
  • 36:08 - 36:10
    that they say, "Oh yeah,
    there is a pivot language,"
  • 36:10 - 36:12
    and then you read the paper
    and see, "It's English."
  • 36:12 - 36:17
    And so what this form
    of lexical translation does,
  • 36:17 - 36:20
    by passing it through
    many different intermediate languages,
  • 36:21 - 36:26
    it has the effect of being able
    to deal with a lot of semantic ambiguity.
  • 36:26 - 36:28
    Because as long as you're passing it
    through languages
  • 36:28 - 36:33
    that contain reasonably similar
    semantic boundaries for a word,
  • 36:33 - 36:37
    then you can avoid
    the problem of essentially
  • 36:37 - 36:40
    introducing semantic ambiguity
    through the pivot language.
  • 36:40 - 36:43
    So using the color hierarchy thing
    as an example,
  • 36:43 - 36:46
    if you take a language that has
    a single color word for green and blue
  • 36:46 - 36:51
    and you translate it into "blue"
  • 36:51 - 36:53
    in your single pivot language
  • 36:53 - 36:54
    and then into another language
  • 36:54 - 36:57
    that has different ambiguities
    on these things,
  • 36:57 - 37:00
    then you end up introducing
    semantic ambiguity.
  • 37:00 - 37:02
    But if you pass it through
    a bunch of other languages
  • 37:02 - 37:06
    that also contain a single
    lexical item for green and blue,
  • 37:06 - 37:11
    then, essentially,
    that semantic specificity
  • 37:11 - 37:17
    gets passed along
    to the resultant language.
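
Here is a toy illustration of that point, with hypothetical words: the source term "grue" colexifies green and blue; a single splitting pivot (English-like) scatters the votes, while pivots that also colexify concentrate them on the colexifying target word. The vote-counting is a simplification assumed for illustration, not PanLex's actual scoring.

```python
# Toy demo: multiple colexifying pivots outvote one meaning-splitting pivot.
from collections import Counter

# pivot -> (source-to-pivot dict, pivot-to-target dict); all words invented
PIVOT_DICTS = {
    "eng":  ({"grue": ["green", "blue"]}, {"green": ["zel"], "blue": ["blu"]}),
    "pivA": ({"grue": ["gruA"]},          {"gruA": ["grue2"]}),
    "pivB": ({"grue": ["gruB"]},          {"gruB": ["grue2"]}),
}

def translate(word):
    scores = Counter()
    for src_to_piv, piv_to_dst in PIVOT_DICTS.values():
        for pivot_word in src_to_piv.get(word, []):
            for candidate in piv_to_dst.get(pivot_word, []):
                scores[candidate] += 1
    return scores.most_common()

print(translate("grue"))  # [('grue2', 2), ('zel', 1), ('blu', 1)]
```

The colexified target "grue2" wins because two pivots preserve the green/blue boundary, which is the agreement effect described above.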
  • 37:18 - 37:21
    As far as the grammatical feature aspects,
  • 37:21 - 37:23
    PanLex has been primarily, in its history,
  • 37:23 - 37:29
    collecting essentially lexemes,
    essentially lexical forms.
  • 37:30 - 37:32
    And, by that, I mean, essentially,
  • 37:32 - 37:34
    whatever you get
    as the headword for a dictionary.
  • 37:35 - 37:38
    So we don't necessarily
    concentrate at this time
  • 37:39 - 37:41
    on collecting grammatical variant forms,
  • 37:41 - 37:43
    things like [inaudible], dative, etc.,
  • 37:43 - 37:45
    or past tense and present tense.
  • 37:45 - 37:46
    But it's something we're looking into.
  • 37:46 - 37:48
    One thing that it's always
    important to remember
  • 37:48 - 37:51
    is that because our focus is--
  • 37:51 - 37:54
    is on underserved and endangered
    minority languages,
  • 37:55 - 37:58
    we want to make sure
    that something is available
  • 37:58 - 38:00
    before we make it perfect.
  • 38:02 - 38:03
    A phrase I absolutely love
  • 38:03 - 38:05
    is "Don't let the perfect
    be the enemy of the good,"
  • 38:05 - 38:07
    and that's what we intend to do.
  • 38:07 - 38:09
    But we are super interested in the idea
  • 38:09 - 38:12
    of being able to handle grammatical forms,
  • 38:12 - 38:14
    and being able to translate
    across grammatical forms,
  • 38:14 - 38:16
    and it's some stuff
    we've done some research on
  • 38:16 - 38:17
    but we haven't fully implemented yet.
  • 38:25 - 38:29
    (man 8) So, of the 7,500 or so languages,
  • 38:30 - 38:33
    I assume you're relying on dictionaries
    which are written for us,
  • 38:33 - 38:36
    but do all those languages
    have standard written forms
  • 38:36 - 38:38
    and how do you deal with...?
  • 38:38 - 38:40
    That's a great question.
  • 38:42 - 38:45
    Essentially, yes, a lot of these languages,
  • 38:45 - 38:48
    as everyone's aware, are unwritten.
  • 38:48 - 38:51
    However, any language
    for which a dictionary has been produced
  • 38:51 - 38:52
    has some kind of orthography,
  • 38:52 - 38:57
    and we rely on the orthography
    produced for the dictionary.
  • 38:57 - 39:00
    We occasionally do some
    slight massaging of orthography
  • 39:01 - 39:03
    if we can guarantee
    it to be lossless, basically.
  • 39:03 - 39:05
    But we tend to avoid it
    as much as possible.
  • 39:08 - 39:11
    So, essentially,
    we don't get into the business
  • 39:11 - 39:13
    of developing orthographies
    for languages,
  • 39:13 - 39:15
    because oftentimes they have already been developed,
  • 39:15 - 39:17
    even if they're not really
    widely published.
  • 39:17 - 39:22
    So, for example,
  • 39:22 - 39:26
    for a lot of languages
    that are spoken in New Guinea,
  • 39:26 - 39:29
    there may not be a commonly
    used orthographic form,
  • 39:29 - 39:31
    but some linguists
    just come up with something
  • 39:31 - 39:32
    and that's a good first step.
  • 39:33 - 39:37
    We also collect phonetic forms
    when they're available in dictionaries,
  • 39:37 - 39:38
    and so that's another way in,
  • 39:38 - 39:41
    essentially an IPA
    representation of the word,
  • 39:41 - 39:42
    if that's available.
  • 39:42 - 39:43
    So that can also be used as well.
  • 39:43 - 39:46
    But we typically don't
    use that as a pivot
  • 39:46 - 39:48
    because it introduces certain ambiguities.
  • 39:53 - 39:55
    (woman 4) Thank you,
    this might be a super silly question,
  • 39:56 - 40:01
    but are those only the intermediate
    languages you work with?
  • 40:01 - 40:02
    Oh, no. Oh, no.
  • 40:02 - 40:04
    (woman 4) Oh, yes, alright. Thank you.
  • 40:04 - 40:06
    No, I'm glad you asked.
    It answers the question.
  • 40:06 - 40:11
    So this is actually a screenshot
    from translate.panlex.org.
  • 40:11 - 40:13
    If you do a translation,
  • 40:13 - 40:15
    you'll get a list of translations
    on the right side.
  • 40:15 - 40:18
    You click a little dot dot dot button,
    you'll get a graph like this.
  • 40:18 - 40:22
    And what this shows
    is the intermediate languages,
  • 40:22 - 40:24
    the top 20 by score--
  • 40:24 - 40:26
    I could go into the details
    of how we do the score
  • 40:26 - 40:27
    but it's not super important now--
  • 40:27 - 40:30
    that are being used.
  • 40:30 - 40:33
    But to make the translation,
    we're actually using way more than 20.
  • 40:33 - 40:36
    The reason I cap it at 20
    is because if you have more than 20--
  • 40:36 - 40:38
    like this is actually
    a kind of a physics simulation
  • 40:38 - 40:40
    you can move the things around
    and they squiggle.
  • 40:40 - 40:42
    If you have more than 20,
    your computer gets really mad.
  • 40:45 - 40:47
    So it's more of a demonstration, yeah.
  • 40:56 - 40:58
    (woman 5) Leila,
    from Wikimedia Foundation.
  • 40:58 - 41:00
    Just one note on--
  • 41:00 - 41:03
    You mentioned Wikimedia Foundation
    a couple of times in your presentation,
  • 41:03 - 41:07
    I wanted to say if you want to do
    any kind of data ingestion
  • 41:07 - 41:08
    or a collaboration with Wikidata,
  • 41:09 - 41:11
    perhaps Wikimedia Deutschland
    would be a better place
  • 41:11 - 41:13
    to have these conversations?
  • 41:13 - 41:16
    Because Wikidata lives
    within Wikimedia Deutschland
  • 41:16 - 41:18
    and the team is there,
  • 41:18 - 41:20
    and also the community
    of volunteers around Wikidata
  • 41:20 - 41:24
    would be the perfect place to talk
  • 41:24 - 41:26
    about any kind of ingestions
  • 41:26 - 41:31
    or working on bringing
    PanLex closer to Wikidata.
  • 41:32 - 41:33
    Great, thank you very much,
  • 41:33 - 41:35
    because honestly I'm not
    exactly super familiar
  • 41:35 - 41:38
    with all of the intricacies
    of the architecture
  • 41:38 - 41:40
    of how all the projects
    relate to each other.
  • 41:40 - 41:42
    I'm guessing by the laughs
    that it's complicated.
  • 41:42 - 41:44
    But, yeah, so basically
    we would want to talk
  • 41:44 - 41:48
    with whoever is responsible for Wikidata.
  • 41:48 - 41:52
    So just do a little
    [inaudible] place thing,
  • 41:53 - 41:56
    whoever is responsible for Wikidata,
    that's who we're interested in talking to,
  • 41:56 - 41:58
    which is all of you volunteers.
  • 42:03 - 42:05
    Any further questions?
  • 42:10 - 42:14
    Okay, well, if anyone does end up having
    any further questions beyond this
  • 42:14 - 42:18
    or ones that I talked about-- the details
    and specifics about these things,
  • 42:18 - 42:20
    please come and talk to me,
    I'm super interested.
  • 42:20 - 42:24
    And especially if you're dealing
    with anything involving lexical stuff,
  • 42:24 - 42:29
    anything involving
    endangered minority languages
  • 42:29 - 42:30
    and underserved languages,
  • 42:30 - 42:34
    and also Unicode,
    which is something I do as well.
  • 42:36 - 42:38
    So thank you very much
  • 42:38 - 42:40
    and thank you
    for inviting me to come speak,
  • 42:40 - 42:42
    I'm hoping that you enjoyed all this.
  • 42:42 - 42:44
    (applause)