cdn.media.ccc.de/.../wikidatacon2019-1009-eng-Experimenting_with_Wikidata_in_the_Saami_languages_hd.mp4

  • 0:06 - 0:10
    (Susanna) ...Wikimedia Finland,
    and we have during this year
  • 0:10 - 0:12
    started working
    with the Saami communities,
  • 0:12 - 0:16
    the culture and language,
    starting experimenting
  • 0:16 - 0:19
    doing the groundwork for future projects.
  • 0:19 - 0:22
    (Kimberli) Well, actually she started
    working this year.
  • 0:22 - 0:25
    I've been working since 2006 so...
    (laughter)
  • 0:26 - 0:28
    (Susanna) Well, it's at
    the end of chapter...
  • 0:32 - 0:35
    Yep here we go. Let's see what we have.
  • 0:38 - 0:39
    I don't know which one it is.
  • 0:43 - 0:45
    [inaudible]
  • 0:49 - 0:52
    So usually when we give presentations,
    we realize nobody knows
  • 0:52 - 0:54
    what we're talking about,
    the Saami languages.
  • 0:54 - 0:58
    So this is Norway, Sweden,
    Finland and Russia.
  • 0:58 - 1:02
    And the yellow part--
    and it starts quite far down here--
  • 1:02 - 1:05
    is the Saami dialect continuum
    or language continuum.
  • 1:05 - 1:08
    And the languages
    that have Wikipedias are five--
  • 1:08 - 1:11
    or there's actually only one,
    Northern Saami Wikipedia.
  • 1:12 - 1:16
    And then the other languages
    that we work with are six and seven,
  • 1:16 - 1:18
    and Jon Harald is from Wikimedia Norway,
  • 1:18 - 1:21
    and they work with the other ones
    in Norway and Sweden
  • 1:21 - 1:23
    and the Northern Saami one.
  • 1:25 - 1:28
    Sää'mjânnam is the name
    for this area in Skolt Saami.
  • 1:30 - 1:32
    This is somehow...
  • 1:39 - 1:41
    Yeah, so.
  • 1:41 - 1:46
    (Susanna) Oh yes, while thinking
  • 1:46 - 1:49
    about how to serve
    these language communities,
  • 1:49 - 1:56
    as Kimberli was showing there--
    maybe we'll go back to the map,
  • 1:56 - 2:02
    the biggest language community
    in Saami area is the Northern Saami.
  • 2:02 - 2:06
    And when we think of Saami,
    we think of Northern Saami,
  • 2:06 - 2:10
    but there are at least
    eight other Saami communities
  • 2:10 - 2:11
    and language groups.
  • 2:11 - 2:13
    So we are working with two,
  • 2:13 - 2:19
    which is here--it's Inari Saami
  • 2:19 - 2:23
    as well as Skolt Saami,
    they both have around 300 speakers.
  • 2:23 - 2:25
    So we cannot expect--
  • 2:25 - 2:28
    now going to the next slide--
  • 2:28 - 2:32
    there are two different types
    of language communities,
  • 2:32 - 2:35
    those that have Wikipedias
    and therefore are served
  • 2:35 - 2:38
    within the Wikimedia ecosystem
  • 2:38 - 2:41
    and those that don't have a Wikipedia,
  • 2:41 - 2:43
    and therefore it's
    much more difficult for them.
  • 2:43 - 2:47
    And we find that working
    with structured data,
  • 2:47 - 2:50
    we can serve
    these language communities as well.
  • 2:50 - 2:57
    So Kimberli may tell you
    about this sticker that you have got.
  • 2:58 - 2:59
    (Kimberli) So the sticker says--
  • 2:59 - 3:01
    in Skolt Saami
    which is spoken by about 300 people--
  • 3:01 - 3:06
    it says Wikimedia Finland wishes
    everyone a happy United Nations
  • 3:06 - 3:10
    International Year
    of Indigenous Languages 2019.
  • 3:10 - 3:12
    And the sticker was created
    for an event that we went to
  • 3:12 - 3:15
    at the end of August in Northern Finland.
  • 3:18 - 3:22
    (Susanna) So, it wasn't that easy.
  • 3:22 - 3:27
    So we started setting up language codes
  • 3:27 - 3:31
    for Skolt Saami and Inari Saami
    and found out that it's not
  • 3:31 - 3:32
    a straightforward process.
  • 3:32 - 3:33
    It's not really documented.
  • 3:33 - 3:36
    It was really, really hard
    to find out how to do it.
  • 3:36 - 3:44
    So we made this elephant metaphor
    here as a reindeer.
  • 3:44 - 3:48
    So there are different parts
    of this Wikimedia environment
  • 3:48 - 3:52
    that look at some specific area
    of this language,
  • 3:54 - 3:59
    definitions and there doesn't seem
    to be an overall way
  • 3:59 - 4:02
    and process of how to deal
    with adding your languages.
  • 4:02 - 4:07
    So what we did was we made
    a lot of noise
  • 4:07 - 4:12
    and tried to ask everyone
    to help us, and in the end,
  • 4:12 - 4:17
    we managed to first have
    Skolt Saami and Inari Saami
  • 4:17 - 4:20
    for monolingual properties;
  • 4:20 - 4:23
    then to labels in Wikidata;
  • 4:23 - 4:26
    and then only to find out
    that they wouldn't work
  • 4:26 - 4:28
    in structured data on Commons.
  • 4:28 - 4:34
    Then again after another process
    for that, maybe six months after,
  • 4:34 - 4:38
    we find out that they wouldn't work
    in Wikipedias
  • 4:38 - 4:41
    so I think that's still unsolved.
  • 4:41 - 4:43
    (Kimberli) When we first started,
    you could only use Northern Saami
  • 4:43 - 4:46
    and Southern Saami
    in Wikimedia projects.
  • 4:46 - 4:51
    And as a bonus part of this,
    we have now the ability to use
  • 4:52 - 4:54
    the Finnish Romani language also
  • 4:54 - 4:56
    within the Wikimedia projects.
  • 4:59 - 5:06
    This trying to get your language--
    the ability to be able to use
  • 5:06 - 5:09
    your language in a Wikimedia project
    is not straightforward.
  • 5:09 - 5:11
    It's really difficult,
    and when you talk to people,
  • 5:11 - 5:14
    they're like, "Oh yeah, I'll fix it.
    It'll take me five minutes."
  • 5:14 - 5:17
    And then, yeah, it takes them
    five minutes to fix one thing,
  • 5:17 - 5:19
    but then the next thing is not working,
  • 5:19 - 5:22
    the next thing, something else breaks,
    things like that.
  • 5:22 - 5:26
    And if we, people who have been
    in the Wikimedia projects forever,
  • 5:26 - 5:29
    can't figure out how this thing works
  • 5:29 - 5:32
    and how to get things
    straightforwardly working,
  • 5:32 - 5:36
    then we can't expect communities--
  • 5:36 - 5:40
    language communities that aren't
    familiar with the Wikimedia projects
  • 5:40 - 5:43
    to be able to figure out where to start
  • 5:43 - 5:45
    and how to navigate this process.
  • 5:45 - 5:47
    It's not possible.
  • 5:47 - 5:48
    And there are actual pages
  • 5:48 - 5:51
    that people are like, "Oh yeah,
    there's a page for this."
  • 5:51 - 5:54
    And you're going, "But it doesn't come up
    in Google Search for instance,
  • 5:54 - 5:56
    so it's not findable."
  • 5:56 - 5:59
    - Do you want to say something about that?
    - (Susanna) No, that's fine.
  • 5:59 - 6:03
    So well we tried to come up
    with some things
  • 6:03 - 6:05
    that should be looked into.
  • 6:05 - 6:07
    This is not an exhaustive list,
  • 6:07 - 6:12
    but well, obviously, the process
    needs to be streamlined.
  • 6:14 - 6:16
    (Kimberli) The one that I really hate
    are the language codes.
  • 6:17 - 6:20
    Because for instance I did research
    with [inaudible]
  • 6:20 - 6:23
    which is a specific language of its own.
  • 6:23 - 6:25
    And there is no ISO code for it.
  • 6:25 - 6:28
    There is an ISO code for [inaudible].
  • 6:28 - 6:30
    And they've lumped together
    two different languages
  • 6:30 - 6:33
    that are completely
    unintelligible to each other.
  • 6:33 - 6:39
    And so Wikimedia projects use ISO codes
    for these types of things.
  • 6:39 - 6:41
    And we really think
    that there should be
  • 6:41 - 6:44
    a more fine-grained level to this.
  • 6:44 - 6:47
    For Skolt Saami, even though
    there's only 300 people that speak it,
  • 6:47 - 6:49
    we have a lot of data for it.
  • 6:49 - 6:51
    And there's four main dialects,
  • 6:51 - 6:54
    and the words aren't the same
    in the four dialects.
  • 6:54 - 6:57
    So I would really like to be able to put
    this is from the Paaččjokk dialect,
  • 6:57 - 7:00
    this is from the Suõ´nn’jel dialect,
    and that type of stuff.
  • 7:00 - 7:01
    But we can't do that.
  • 7:01 - 7:02
    We can't do that for Spanish.
  • 7:02 - 7:03
    We can't do it for English even.
  • 7:03 - 7:07
    And so something has to be done
    about the language codes
  • 7:07 - 7:08
    in the Wikimedia projects.
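
The dialect-level tagging described here could be sketched with BCP 47 private-use subtags, which follow an `x` separator and are 1 to 8 ASCII letters or digits each. This is only an illustration under stated assumptions: `sms` is the ISO 639-3 code for Skolt Saami, but the dialect subtags `paatsjok` and `suonjel` are invented for this example and are not registered anywhere, and the helper `dialect_tag` is hypothetical. It does not describe how Wikimedia projects handle language codes.

```python
import re

# A BCP 47 private-use subtag is 1 to 8 ASCII letters or digits.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def dialect_tag(language: str, dialect: str) -> str:
    """Build a tag such as 'sms-x-suonjel'.

    `language` is assumed to be a valid language code already;
    only the dialect subtag is checked here.
    """
    if not _SUBTAG.fullmatch(dialect):
        raise ValueError(f"not a valid private-use subtag: {dialect!r}")
    return f"{language.lower()}-x-{dialect.lower()}"


# Hypothetical subtags, invented for this sketch.
print(dialect_tag("sms", "paatsjok"))
print(dialect_tag("sms", "suonjel"))
```

Private-use subtags need no registration, which is why they suit a sketch, but they are only meaningful to parties that agree on them in advance.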
  • 7:09 - 7:12
    Yeah, and something that started to happen
  • 7:12 - 7:17
    I think is to engage maybe
    the broader language,
  • 7:17 - 7:22
    linguist language communities
    into the decision-making process,
  • 7:22 - 7:25
    and maybe there are, like, the decisions
    that need to be made.
  • 7:25 - 7:29
    The bureaucracy maybe has
    to be somehow assessed.
  • 7:29 - 7:35
    What are the decisions that are needed
    in this sphere?
  • 7:35 - 7:40
    Like what are the application processes?
  • 7:40 - 7:44
    What are the... yeah, so.
  • 7:45 - 7:49
    Thanks to Benjamin's presentation today,
  • 7:49 - 7:51
    I think PanLex needs
    to be added to this too.
  • 7:51 - 7:53
    (laughing)
  • 7:53 - 7:56
    (man) We have individual ISO codes
  • 7:56 - 7:57
    for all the languages you mentioned.
  • 7:58 - 7:59
    Are you using IETF or... ?
  • 8:00 - 8:04
    (man) We start with [inaudible] codes
    and [inaudible] codes
  • 8:04 - 8:09
    and then they can just get
    a variety ID [inaudible].
  • 8:10 - 8:12
    [inaudible]
  • 8:12 - 8:16
    (Kimberli) Good. We'll talk
    about it more in the Q&A then.
  • 8:16 - 8:18
    (moderator) If we can repeat
    that for the stream
  • 8:18 - 8:19
    because it was...
  • 8:19 - 8:22
    (Susanna) Okay, I can't. (chuckles)
  • 8:24 - 8:26
    - (moderator) We can do it after.
    - (Susanna) Right.
  • 8:29 - 8:31
    (Kimberli) So some of the ways
    that we work together...
  • 8:31 - 8:33
    We work with the communities themselves,
  • 8:33 - 8:38
    and we were invited
    to this 70-year anniversary
  • 8:38 - 8:40
    of the Skolts living in Finland.
  • 8:40 - 8:42
    They were relocated to Finland
  • 8:42 - 8:44
    from when the border was closed off.
  • 8:44 - 8:46
    And so they've been living in this area
    for 70 years,
  • 8:46 - 8:48
    and there was a big party going on,
  • 8:48 - 8:49
    and we were there.
  • 8:50 - 8:53
    She was working with little kids
    putting in Moomin characters
  • 8:53 - 8:57
    in the different Saami languages
    and different words like that.
  • 8:58 - 9:00
    Do you want to say
    something else about that?
  • 9:00 - 9:04
    (Susanna) Yeah, just
    to also pinpoint that.
  • 9:04 - 9:10
    We can find new ways of working
    with data or language
  • 9:10 - 9:11
    so we can go to this--
  • 9:11 - 9:15
    We can go together with the communities.
  • 9:15 - 9:21
    We want to create participatory methods
  • 9:21 - 9:24
    in which we can add more information.
  • 9:24 - 9:29
    I think we have come up with this idea
    of the term of "depictathons"
  • 9:29 - 9:33
    now that we can work with images
    or translateathons which have been
  • 9:33 - 9:37
    done earlier as well,
    but these are the kinds of events
  • 9:37 - 9:43
    together with the communities
    that we can work with the language.
  • 9:47 - 9:50
    (Kimberli) So some
    of the solutions that we have.
  • 9:50 - 9:53
    (Susanna) Here are two ideas
    for next year that we have.
  • 9:53 - 9:56
    We are developing and seeing
    what can be done with them.
  • 9:56 - 9:59
    One of them comes
    as a collaborative project
  • 9:59 - 10:01
    together with the Saami archives
  • 10:01 - 10:07
    and the Saami museum in Inari
    in the North of Finland,
  • 10:07 - 10:11
    and we could collect
    cultural heritage concepts
  • 10:11 - 10:15
    across these Nordic countries
    in different Saami languages,
  • 10:15 - 10:19
    but not only Saami languages
    but also in the Nordic languages
  • 10:19 - 10:23
    because we share
    a similar cultural heritage/history
  • 10:23 - 10:26
    that we have similar monuments.
  • 10:26 - 10:30
    This, of course, came up
    with a Wiki Loves Monuments competition
  • 10:30 - 10:33
    and archeological finds
    across the area are similar.
  • 10:34 - 10:38
    And the other one is place names,
  • 10:38 - 10:44
    that is a fortunate new project
    starting at Wikimedia
  • 10:44 - 10:48
    Norway, that we could expand
    to be Pan Nordic,
  • 10:48 - 10:51
    to include place names in all these.
  • 10:52 - 10:55
    - Pan Saami.
    - Pan Saami, ooh.
  • 10:58 - 11:00
    (Kimberli) So these are depictathons.
  • 11:00 - 11:03
    The Skolt Saami--
    there are thousands of pictures
  • 11:03 - 11:05
    of the Skolt Saami in Commons.
  • 11:05 - 11:08
    They come from different archives,
    and they have data,
  • 11:08 - 11:13
    the structured data on them
    is basically from 100 years ago
  • 11:13 - 11:15
    so it's describing things
    in the way that they would have been
  • 11:15 - 11:17
    described 100 years ago.
  • 11:17 - 11:21
    We don't want those,
    those ways of description there anymore
  • 11:21 - 11:24
    because a lot of them are racist,
    quite racist.
  • 11:24 - 11:26
    We don't want them.
  • 11:26 - 11:28
    The community doesn't want them.
  • 11:28 - 11:31
    The community wants to be able
    to write what they want to say
  • 11:31 - 11:33
    about the pictures in their own language,
  • 11:33 - 11:36
    or in Finnish or Norwegian or Swedish.
  • 11:36 - 11:39
    And so we've been having depictathons
    as an idea that--
  • 11:39 - 11:41
    well, we've done it.
  • 11:41 - 11:45
    So people can change the captions,
    change the descriptions
  • 11:45 - 11:48
    of these pictures in Commons,
  • 11:48 - 11:51
    and you work with structured data
    so I'll let you talk about that.
  • 11:53 - 11:55
    (Susanna) Yeah, and well,
    let's see our next slide
  • 11:55 - 11:58
    because this is just as--
  • 11:58 - 12:03
    you all know structured data on Commons
    so for you this is no news.
  • 12:03 - 12:09
    And I think, well from these,
    we also enter delicate questions
  • 12:09 - 12:13
    of what are the descriptions,
  • 12:13 - 12:15
    but we'll come back to that.
  • 12:17 - 12:19
    (Kimberli) In the Northern Saami,
    we've been creating
  • 12:19 - 12:22
    autogenerated Wikidata info boxes.
  • 12:22 - 12:25
    They've been pulling in data
    from Wikidata
  • 12:25 - 12:28
    because I'm the one person
    that's correcting everything
  • 12:28 - 12:30
    in the Northern Saami Wikipedia,
  • 12:30 - 12:33
    and I don't have time
    to change every mayor,
  • 12:33 - 12:35
    the population of every country,
    things like that.
  • 12:35 - 12:40
    So I've been really blessed
    with the people
  • 12:40 - 12:43
    that have come up and started helping
    create these info boxes.
  • 12:43 - 12:46
    And it's expanded the amount of knowledge
  • 12:46 - 12:49
    we have in the Northern Saami
    Wikipedia greatly.
  • 12:52 - 12:54
    So this is Nils-Aslak Valkeapää,
  • 12:54 - 12:59
    who is one of the most famous Saami
    multi-talent--he's a polymath.
  • 12:59 - 13:02
    I mean, he was a singer, a writer,
  • 13:03 - 13:08
    artist, and we now have
    this info box there for him,
  • 13:08 - 13:11
    all of the data which is pulled
    from Wikidata.
  • 13:12 - 13:15
    Before we had maybe three lines
    and no picture.
  • 13:15 - 13:17
    (Susanna) And this applies specifically
  • 13:17 - 13:20
    of course to the languages
    that have a Wikipedia.
  • 13:20 - 13:22
    (Kimberli) Yeah, but doesn't work
    in an incubator.
  • 13:22 - 13:23
    (Susanna) Yep.
  • 13:24 - 13:26
    This is quite exciting now.
  • 13:26 - 13:28
    Once we have the--
  • 13:28 - 13:31
    well, we are not working
    with lexicographical data,
  • 13:31 - 13:34
    like specifically.
  • 13:34 - 13:37
    We will extend to it,
  • 13:37 - 13:44
    but we are concerned mainly
    about labels and items so far.
  • 13:45 - 13:50
    So what this makes possible
    is tagging content,
  • 13:50 - 13:55
    museums, libraries
    as well as broadcasters.
  • 13:55 - 13:56
    Yle, the Finnish Broadcasting Company
  • 13:56 - 14:00
    as they are already using
    Wikidata for tagging,
  • 14:00 - 14:05
    this might be an opportunity
    for the small Saami languages
  • 14:05 - 14:06
    in the Nordic area.
  • 14:07 - 14:10
    And this is my opportunity to show
  • 14:10 - 14:12
    my project Wikidocumentaries as well
  • 14:12 - 14:16
    because it is a project that reads--
  • 14:16 - 14:21
    well, it's difficult to make the change...
  • 14:21 - 14:24
    Let me have [inaudible] help.
  • 14:30 - 14:31
    Yeah, there.
  • 14:31 - 14:36
    So here we have a page
    in Wikidocumentaries,
  • 14:36 - 14:38
    which is now in English.
  • 14:38 - 14:44
    This is a project that consumes
    information from the Wikimedia sphere.
  • 14:45 - 14:49
    Every item in Wikidata has a page,
  • 14:49 - 14:53
    or can be made into a page
  • 14:53 - 14:56
    or is automatically created into a page.
  • 14:56 - 15:01
    Then it gathers all this information
    across Wikimedia projects,
  • 15:04 - 15:09
    and the interface exists already
    in 40 plus languages,
  • 15:10 - 15:13
    and I would be able
    to change the interface
  • 15:14 - 15:20
    and then see all the same data
    in another language.
  • 15:20 - 15:25
    I could also, as you can see,
    or you were able to see
  • 15:25 - 15:29
    in the English one,
    that there is no article on this
  • 15:29 - 15:31
    in the English Wikipedia.
  • 15:31 - 15:34
    Therefore you could go to see
    which languages it exists,
  • 15:34 - 15:36
    and this one is in Northern Saami.
  • 15:37 - 15:41
    So you would be able to switch
    only the article language.
  • 15:41 - 15:48
    But also then it can also display
    any language
  • 15:50 - 15:54
    that is encoded in Wikidata.
  • 15:54 - 15:59
    So we also get it
    in the same page in Skolt Saami.
  • 15:59 - 16:01
    Although, there is no Wikipedia,
  • 16:01 - 16:05
    you get all the same content
  • 16:05 - 16:07
    in these languages.
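
The idea of showing a page in any language that has labels in Wikidata can be sketched as a simple fallback lookup. This is a minimal sketch and not Wikidocumentaries' actual code: the function `pick_label` and the sample entity are hypothetical, and the label values are placeholders. The only assumptions taken from Wikidata itself are the shape of the `labels` field in entity JSON and the language codes `sms` (Skolt Saami), `smn` (Inari Saami) and `se` (Northern Saami).

```python
from typing import List, Optional, Tuple


def pick_label(entity: dict, preferred: List[str]) -> Optional[Tuple[str, str]]:
    """Return (language, label) for the first preferred language that has a label."""
    labels = entity.get("labels", {})
    for lang in preferred:
        if lang in labels:
            return lang, labels[lang]["value"]
    return None


# Placeholder entity in the shape of Wikidata's entity JSON; values are invented.
entity = {
    "labels": {
        "se": {"language": "se", "value": "placeholder label (se)"},
        "en": {"language": "en", "value": "placeholder label (en)"},
    }
}

# Ask for Skolt Saami first, then fall back through the other languages.
print(pick_label(entity, ["sms", "smn", "se", "en"]))
```

Once a Skolt Saami label is added to the item, the same call returns it without any other change, which is the point made in the talk: the content follows the data, whether or not a Wikipedia exists in that language.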
  • 16:07 - 16:09
    (Kimberli) There is actually
    an article about her
  • 16:09 - 16:11
    in Skolt Saami on the incubator,
  • 16:11 - 16:13
    but it doesn't work with Wikidocumentaries
  • 16:13 - 16:17
    because of the way
    the incubator is encoded.
  • 16:17 - 16:18
    (Susanna) Oh yeah.
  • 16:19 - 16:26
    And just briefly, I'm very excited
    in thinking about an app
  • 16:26 - 16:31
    that will gamify this
    or like collecting these terms
  • 16:31 - 16:33
    into Wikidata.
  • 16:33 - 16:39
    But I haven't landed on one,
    and I'm sure there are experiences
  • 16:39 - 16:43
    of that across this community,
  • 16:43 - 16:48
    and it would be interesting
    to put together our thoughts on that.
  • 16:48 - 16:50
    (Kimberli) So there's
    quite a few challenges
  • 16:50 - 16:52
    that we have in these projects.
  • 16:52 - 16:54
    This picture, if you come across it
  • 16:54 - 16:56
    on any Wikipedia please delete it.
  • 16:56 - 16:59
    It's two Finns dressed as Saami people.
  • 16:59 - 17:02
    It's labeled fake Saami clothing,
  • 17:02 - 17:05
    and people still use it
    on Wikipedia projects.
  • 17:05 - 17:07
    I don't know why.
  • 17:07 - 17:09
    So we have false data.
  • 17:09 - 17:10
    We have racist--and with the Saami,
  • 17:10 - 17:12
    we have a lot of eugenics-based data.
  • 17:12 - 17:15
    So when they were trying to prove
    that the Saami were a lower race
  • 17:15 - 17:17
    so they could sterilize them
    and things like that,
  • 17:17 - 17:18
    we have a lot of that data
  • 17:18 - 17:21
    because that's the stuff
    that comes out of archives.
  • 17:21 - 17:24
    Data usage--data has been used
    without the consent
  • 17:24 - 17:25
    of the communities,
  • 17:25 - 17:30
    and for instance, the Skolt community
    was kind of shocked to see
  • 17:30 - 17:32
    that their relatives are in Commons,
  • 17:32 - 17:35
    and they weren't very appreciative of it.
  • 17:35 - 17:38
    Sensitive data,
    which Stacy can talk more about.
  • 17:40 - 17:42
    Yeah, this is used
    on the Hungarian Wikipedia.
  • 17:42 - 17:45
    Here's that lovely picture
  • 17:45 - 17:48
    describing that these people
    are Saami people.
  • 17:48 - 17:49
    Please delete it.
  • 17:50 - 17:55
    Yeah, this is more
    what Stacy will talk about.
  • 17:55 - 17:57
    (Susanna) Leave it to you?
  • 18:01 - 18:02
    (Kimberli) Sensitive data.
  • 18:02 - 18:04
    TK labels--you want to talk about before.
  • 18:04 - 18:06
    (Susanna) You're not addressing them.
  • 18:06 - 18:13
    I think we could also look
    into identifying content
  • 18:13 - 18:16
    already on Commons
    or just about to enter Commons,
  • 18:16 - 18:23
    how to tag and identify, tag
    and perhaps delete
  • 18:24 - 18:30
    or then find out restricting
    the usage of this media.
  • 18:31 - 18:33
    Well, it's very short,
  • 18:33 - 18:38
    but let's see if we have
    more opportunities to discuss that.
  • 18:40 - 18:42
    (Kimberli) We can skip this part.
  • 18:42 - 18:43
    Sorry.
  • 18:43 - 18:46
    I want to say that this is the week
  • 18:46 - 18:47
    of the Saami Language Week this week
  • 18:47 - 18:53
    so please feel free to use hashtags
    for Saami languages.
  • 18:53 - 18:55
    Gæjhtoe!
  • 18:55 - 18:56
    (Susanna) Spä'sseb!
  • 18:56 - 18:58
    (Kimberli) Spä'sseb!
  • 18:58 - 18:59
    Takkâ.
  • 18:59 - 19:02
    (applause)
Video Language:
English
Duration:
19:10