cdn.media.ccc.de/.../wikidatacon2019-1035-eng-Data_import_process_overview_hd.mp4

  • 0:00 - 0:08
    Good afternoon, everybody.
  • 0:09 - 0:12
    Welcome to our GLAM panel.
  • 0:13 - 0:17
    Before we start, I just have
    two announcements to make.
  • 0:17 - 0:23
    First of all, please extensively make use
    of our Etherpad to take notes.
  • 0:24 - 0:28
    And the second one is directed
    at our audience at home,
  • 0:28 - 0:30
    or wherever you are.
  • 0:30 - 0:31
    If you have any questions,
  • 0:31 - 0:34
    you can also write that into the Etherpad,
  • 0:34 - 0:38
    and our room angels
    will keep track of them.
  • 0:39 - 0:44
    So, we decided that for this year's panel,
  • 0:45 - 0:49
    after seeing all the contributions
    that were made,
  • 0:49 - 0:54
    we would focus on the role of Wikidata
    within data ecosystems
  • 0:54 - 0:57
    that go beyond the actual
    Wikimedia projects,
  • 0:57 - 1:00
    which is also absolutely in line
  • 1:00 - 1:04
    with the new Wikimedia
    Foundation strategy.
  • 1:05 - 1:08
    And we have, today, four panelists.
  • 1:08 - 1:10
    Three plus one.
  • 1:10 - 1:14
    So, I would like to ask you on stage,
  • 1:14 - 1:16
    so we can introduce you.
  • 1:22 - 1:25
    So, we have Susanna Ånäs.
  • 1:25 - 1:29
    She's a long time free-knowledge activist
  • 1:29 - 1:31
    involved in many WikiProjects.
  • 1:32 - 1:36
    And she will be reporting today
    on the project in cooperation
  • 1:36 - 1:38
    with the Finnish National Library.
  • 1:39 - 1:43
    Then we have, next to me, Mike Dickison,
  • 1:43 - 1:46
    who will be second in this order.
  • 1:47 - 1:50
    He is a museum curator from New Zealand.
  • 1:50 - 1:54
    He's a zoologist and a Wikipedia editor.
  • 1:54 - 1:59
    And he was New Zealand's
    first Wikipedian at Large
  • 1:59 - 2:03
    in 2018 and 2019.
  • 2:03 - 2:07
    And he will tell us
    about his experience in that role,
  • 2:07 - 2:13
    and what kind of role Wikidata
    is starting to play in that context.
  • 2:16 - 2:18
    Then we have Joachim Neubert
  • 2:18 - 2:23
    from the Leibniz Information Center
    for Economics in Kiel and Hamburg.
  • 2:24 - 2:29
    He has been working on making the largest
    public press archives worldwide
  • 2:29 - 2:35
    more accessible to the public,
    and he's using Wikidata to do that.
  • 2:36 - 2:39
    And then I will go last.
    My name is Beat Estermann.
  • 2:39 - 2:43
    I work for Bern University
    of Applied Sciences, in Switzerland.
  • 2:44 - 2:50
    And I've been a long-time promoter
    for OpenGLAM in Switzerland and Austria.
  • 2:50 - 2:55
    And I will today report
    about my activities in connection
  • 2:55 - 2:59
    with the mandate from the Canadian Arts
    Presenting Association,
  • 2:59 - 3:01
    focusing on performing arts.
  • 3:02 - 3:04
    Not primarily on Wikidata,
  • 3:04 - 3:08
    but you will see Wikidata
    is starting to play a role there, as well.
  • 3:09 - 3:13
    So now, most of us
    will take our seat here,
  • 3:13 - 3:17
    and I will give the floor to Susanna.
  • 3:18 - 3:23
    Okay. So, hello. My name is Susanna Ånäs,
  • 3:23 - 3:26
    and I work part-time for Wikimedia Finland
  • 3:26 - 3:27
    as a GLAM coordinator,
  • 3:27 - 3:33
    and I also do consulting
    in the open knowledge sphere.
  • 3:33 - 3:36
    And this is a discourse,
    maybe, of [inaudible].
  • 3:36 - 3:39
    So, I have been involved in the workings
  • 3:39 - 3:46
    of geographic data group of the--
  • 3:48 - 3:51
    well, I looked it up,
    but it isn't in English,
  • 3:51 - 3:54
    but, cultural heritage initiative
    of the Finnish government.
  • 3:55 - 4:00
    So, this is about place names
  • 4:00 - 4:03
    and how they are represented
  • 4:03 - 4:07
    in different repositories
    in the GLAM sector in Finland,
  • 4:07 - 4:12
    and how they are trying to pull together
    these different sources,
  • 4:12 - 4:18
    and how they are informed
    by modeling in Wikidata and elsewhere.
  • 4:18 - 4:23
    So, here we see the three main sources
    for these YSO places,
  • 4:23 - 4:28
    which is part of the national ontology--
    general ontology.
  • 4:28 - 4:30
    AHAA is for Finnish archives,
  • 4:30 - 4:32
    Melinda is for Finnish libraries,
  • 4:32 - 4:34
    and KOOKOS is for Finnish museums.
  • 4:34 - 4:38
    So, there are three, also,
    content management systems
  • 4:38 - 4:40
    that come together in these YSO places.
  • 4:41 - 4:47
    And there are exchanges between Wikidata
    already taking place,
  • 4:48 - 4:53
    as well as the names project
    for the National Land Survey.
  • 4:53 - 4:56
    And then, there's a third project,
    the Finnish Names Archive,
  • 4:56 - 5:00
    which doesn't yet contribute to this,
  • 5:00 - 5:03
    but there are plans for that.
  • 5:03 - 5:09
    So, one of the key modeling issues
    in this whole problem area
  • 5:09 - 5:15
    is that there are three types
    of elements in place names
  • 5:16 - 5:18
    represented in this project.
  • 5:18 - 5:21
    One of them is the place,
    the one that has location.
  • 5:21 - 5:25
    And one of them is the place name,
    the toponym, for example.
  • 5:25 - 5:28
    And then, there are sources,
    which are documents
  • 5:28 - 5:31
    from which both of these can be derived,
  • 5:31 - 5:33
    or like, backed up with.
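The three kinds of elements described here (place, place name, source) can be sketched as a small data model. This is only an illustration of the distinction the speaker draws, not the schema of any of the Finnish repositories; all class names, field names, and the sample values are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Source:
    """A document that backs up a place or a place name (hypothetical fields)."""
    source_id: str
    title: str


@dataclass
class PlaceName:
    """A toponym: the name itself, in one language."""
    name_id: str
    text: str
    language: str
    sources: list[Source] = field(default_factory=list)


@dataclass
class Place:
    """The located thing that one or more names refer to."""
    place_id: str
    latitude: float
    longitude: float
    names: list[PlaceName] = field(default_factory=list)
    sources: list[Source] = field(default_factory=list)


def names_by_language(place: Place) -> dict[str, list[str]]:
    """Group the name texts of a place by language code."""
    grouped: dict[str, list[str]] = {}
    for name in place.names:
        grouped.setdefault(name.language, []).append(name.text)
    return grouped


# Invented sample data: one place, two names, one field note as a source.
note = Source("src-1", "Field note (invented)")
lake = Place(
    "place-1",
    latitude=0.0,
    longitude=0.0,
    names=[
        PlaceName("name-1", "Esimerkkijärvi", "fi", [note]),
        PlaceName("name-2", "Exempelträsk", "sv"),
    ],
)
print(names_by_language(lake))
```

Keeping names as separate objects is what lets each name carry its own identifier and its own sources, which is the point of the three-way split.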
  • 5:33 - 5:36
    The YSO places--
    here, on the top right,
  • 5:36 - 5:39
    you will see the same diagram again.
  • 5:39 - 5:41
    It focuses mainly on the places.
  • 5:43 - 5:46
    The main thing of this
    is the Finnish National Library,
  • 5:46 - 5:49
    and the Finto project.
  • 5:50 - 5:56
    There are now more than 7,000 places
    in Finnish and Swedish
  • 5:56 - 5:59
    and over 3,000 in English,
  • 5:59 - 6:03
    and they are licensed CC0.
  • 6:03 - 6:06
    So, here you can see the service of Finto.
  • 6:06 - 6:10
    And a place-- I chose Sevettijärvi.
  • 6:10 - 6:14
    It is now also related
    to our language project
  • 6:14 - 6:15
    with the Skolt Sámi--
  • 6:15 - 6:19
    this is a place
    in the very north of Finland
  • 6:19 - 6:22
    inhabited by Skolt Sámi.
  • 6:22 - 6:27
    So, here you can see the place
    which belongs to the--
  • 6:27 - 6:33
    well, you will see the data
    about this place.
  • 6:33 - 6:38
    You can see that it is connected
    to Wikidata,
  • 6:38 - 6:42
    as well as this National Land Survey data.
  • 6:43 - 6:47
    Here we go. And you will see
    this in more detail, here.
  • 6:49 - 6:52
    It is also hierarchically arranged
  • 6:52 - 6:56
    inside this repository.
  • 6:58 - 7:00
    Well, actually,
    the actual place is not seen,
  • 7:00 - 7:06
    but it is underneath this municipality,
  • 7:06 - 7:08
    as well as the region,
  • 7:08 - 7:10
    and Finland as a country,
    and Nordic countries,
  • 7:10 - 7:13
    the broader region.
  • 7:13 - 7:14
    Here you can see that many of these
  • 7:14 - 7:18
    have been matched
    with Wikidata previously
  • 7:19 - 7:22
    through Mix'n'Match,
    and there are still remaining ones.
  • 7:22 - 7:28
    But then, the amount of names
    is not that high.
  • 7:28 - 7:31
    It's only less than 5,000.
  • 7:32 - 7:34
    So, then there is this other repository
  • 7:34 - 7:38
    by the Finnish Geospatial
    Platform Project--
  • 7:38 - 7:39
    Place Names Cards.
  • 7:39 - 7:42
    These are all the place names
    that are on Finnish maps.
  • 7:42 - 7:48
    And they have the linked data,
    which is licensed CC BY 4.0.
  • 7:49 - 7:54
    800,000 map labels in Finnish, Swedish,
    and all those three Saami languages
  • 7:54 - 7:56
    that are in Finland.
  • 7:56 - 7:59
    And they have
    two different types of entities.
  • 7:59 - 8:01
    The other ones are places,
    and the other ones
  • 8:01 - 8:03
    are place names, toponyms.
  • 8:03 - 8:05
    And they both have persistent URIs.
  • 8:06 - 8:10
    Here's, for example,
    the same Sevettijärvi, first in Finnish,
  • 8:10 - 8:14
    and then all those three Saami languages,
    as well as the geographic data,
  • 8:14 - 8:19
    and then there is more information
    about that, like the place type,
  • 8:20 - 8:21
    et cetera.
  • 8:22 - 8:28
    Here is the card for the place name,
    the toponym, having its own URI.
  • 8:30 - 8:34
    Sorry, it seems that it's not translated
    into the English list.
  • 8:34 - 8:39
    So, multilinguality
    is not covering the whole project.
  • 8:40 - 8:43
    Okay, we come
    to the Finnish Names Archive.
  • 8:43 - 8:46
    This is a project by the Institute
    for the Languages of Finland,
  • 8:46 - 8:50
    and these represent not the places,
    not the place names,
  • 8:50 - 8:53
    but they are actually sources for those.
  • 8:53 - 8:57
    So, these are three million
    field notes of place names,
  • 8:58 - 9:00
    and it is a Wikibase project.
  • 9:00 - 9:03
    They are in a Wikibase,
    mainly in Finnish, some in Swedish.
  • 9:03 - 9:08
    An outstanding collection of Saami names,
    which we are very interested in.
  • 9:08 - 9:10
    And they are licensed CC BY.
  • 9:10 - 9:15
    And that is also a challenge
    from the Wikidata point of view.
  • 9:15 - 9:18
    But if there was a Finnish local Wikibase,
  • 9:18 - 9:23
    we might be able to first work
    on them in that project.
  • 9:23 - 9:25
    So, here's a screenshot of that,
  • 9:26 - 9:31
    showing that there's information
    about the place, the maps--
  • 9:31 - 9:35
    the maps that the collectors
    initially use,
  • 9:35 - 9:41
    and the card that they produce
    of the information they collected.
  • 9:41 - 9:46
    So, here's one of those cards
  • 9:46 - 9:49
    broken down into data
  • 9:49 - 9:51
    that is included in them.
  • 9:51 - 9:54
    So, then there is
    this linked data project
  • 9:54 - 9:56
    by the Helsinki Digital Humanities Lab
  • 9:56 - 9:58
    and the Semantic Computing
  • 9:58 - 10:01
    group of Aalto University--
  • 10:01 - 10:07
    and together with this Institute
    for the Languages of Finland--
  • 10:07 - 10:08
    the Names Sampo.
  • 10:08 - 10:11
    And this is an aggregated
    research interface
  • 10:11 - 10:14
    to several place name sources.
  • 10:14 - 10:18
    Here you can see that many
    of the sources are out there on the left,
  • 10:18 - 10:21
    and then, you can make
    different kinds of visualizations
  • 10:21 - 10:23
    based on this data.
  • 10:23 - 10:24
    And, yeah.
  • 10:25 - 10:31
    So, I've been bringing up this idea
    of modeling for a local Wikibase
  • 10:31 - 10:33
    that we could do with this data.
  • 10:33 - 10:37
    But when we enter
    these modeling questions,
  • 10:37 - 10:38
    how do we model?
  • 10:38 - 10:42
    There are different ways,
    different traditions in each of these.
  • 10:46 - 10:50
    And the good thing about it
    is it could also serve minority languages
  • 10:50 - 10:52
    with very little effort.
  • 10:53 - 10:57
    Okay. So, here we have
    the two basic options:
  • 10:57 - 11:02
    the SAPO model, which is
    the Finnish Space-Time Ontology,
  • 11:03 - 11:04
    and the Wikidata model.
  • 11:04 - 11:08
    Here you can see
    that Wikidata items tend to zero.
  • 11:08 - 11:13
    Ideally, they remain the same
    with the changing properties.
  • 11:13 - 11:17
    Whereas, in the SAPO model,
    these items become new
  • 11:17 - 11:20
    when there is a change,
    such as area change and name change.
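The contrast between the two options can be sketched in code. This is a simplified reading of what the speaker describes, not the actual SAPO schema or the Wikibase data model; every class, field, and identifier below is a hypothetical name chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Statement:
    prop: str
    value: str
    start_year: Optional[int] = None
    end_year: Optional[int] = None


@dataclass
class StableItem:
    """Wikidata-style: one identifier, changes recorded as dated statements."""
    item_id: str
    statements: list[Statement] = field(default_factory=list)

    def rename(self, new_name: str, year: int) -> None:
        # Close the currently open name statement, then add the new one.
        for st in self.statements:
            if st.prop == "name" and st.end_year is None:
                st.end_year = year
        self.statements.append(Statement("name", new_name, start_year=year))


@dataclass(frozen=True)
class VersionedPlace:
    """SAPO-style: each change yields a new entity with its own identifier."""
    item_id: str
    name: str
    start_year: int
    end_year: Optional[int] = None


def rename_versioned(
    history: list[VersionedPlace], new_id: str, new_name: str, year: int
) -> list[VersionedPlace]:
    """Close the latest version and append a successor (history must be non-empty)."""
    *earlier, latest = history
    closed = VersionedPlace(latest.item_id, latest.name, latest.start_year, year)
    successor = VersionedPlace(new_id, new_name, year)
    return [*earlier, closed, successor]


# Same rename, two outcomes: one item with two name statements,
# versus two items that each cover one period.
item = StableItem("item-1", [Statement("name", "Old name", start_year=1900)])
item.rename("New name", 1950)

history = rename_versioned(
    [VersionedPlace("place-1", "Old name", 1900)], "place-2", "New name", 1950
)
print(len(item.statements), [v.item_id for v in history])
```

In the first model anything linking to `item-1` keeps working after the rename; in the second, links point at a specific period of the place, which is more precise but means consumers have to follow the chain of versions.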
  • 11:21 - 11:26
    So here, we come back to this division
  • 11:26 - 11:32
    between these three different dimensions
    of places, place names.
  • 11:32 - 11:38
    So, should we make these place names
    into entities or properties?
  • 11:38 - 11:39
    Wikidata uses properties,
  • 11:39 - 11:43
    whereas this land survey
    project has entities.
  • 11:44 - 11:46
    Or should we make them into lexemes?
  • 11:46 - 11:51
    Wikidata has chosen to work
    with properties,
  • 11:51 - 11:55
    textual properties
    for place names over lexemes.
  • 11:56 - 11:58
    I'm sorry, the other way around.
  • 11:58 - 12:00
    So, the names are...
  • 12:03 - 12:05
    properties, not lexemes.
  • 12:06 - 12:07
    Right.
  • 12:07 - 12:11
    And maybe the shortcoming of the Wikibase
  • 12:11 - 12:16
    is the lack of geographical
    shapes inside that--
  • 12:16 - 12:21
    like in the basic setup of it,
  • 12:21 - 12:25
    so one would have to add
    more technology into the stack
  • 12:25 - 12:30
    to be able to use local geographic shapes.
  • 12:30 - 12:32
    And a federation is really needed
  • 12:32 - 12:38
    to be able to take advantage
    of the Wikidata corpus.
  • 12:39 - 12:43
    So, I'm done already. Thank you.
  • 12:44 - 12:46
    (applause)
  • 13:01 - 13:03
    Okay.
  • 13:03 - 13:05
    (speaking in Maori)
  • 13:05 - 13:08
    Welcome, everyone.
    My name is Mike Dickison.
  • 13:08 - 13:10
    And for a year,
  • 13:10 - 13:13
    I was New Zealand Wikipedian at Large.
  • 13:14 - 13:17
    You might wonder
    what a Wikipedian at Large is.
  • 13:18 - 13:22
    Because if you actually look out for it,
    there is no such thing, as we can see.
  • 13:23 - 13:26
    It's a term that I made up
    in the grant proposal,
  • 13:26 - 13:29
    which the foundation
    seemed to like very much.
  • 13:30 - 13:32
    And so, we ran with it.
  • 13:32 - 13:37
    So, for a year, I went through
    35 different institutions,
  • 13:37 - 13:41
    in residence at most of them,
    running training sessions,
  • 13:41 - 13:44
    organizing public events,
    and trying to develop
  • 13:44 - 13:47
    a Wikimedia strategy for each one.
  • 13:48 - 13:49
    It was a very interesting experience,
  • 13:49 - 13:53
    and you encounter a wide range
    of different projects and people.
  • 13:53 - 13:58
    And I wanted to try and talk through
    some of the different projects
  • 13:58 - 14:00
    that dealt with Wikidata
  • 14:01 - 14:05
    in interesting or, perhaps,
    illuminating ways,
  • 14:05 - 14:08
    that might be useful for folks to discuss.
  • 14:09 - 14:12
    The project was initially
    a Wikipedia project by the name,
  • 14:12 - 14:15
    simply because that was what people
    were familiar with,
  • 14:15 - 14:18
    and so we organized
    multiple different events
  • 14:18 - 14:23
    at very traditional edit-a-thons,
    gender gap work, and so forth.
  • 14:25 - 14:27
    [And a bunch you can see] [inaudible],
  • 14:27 - 14:31
    and a bunch of very successful
    new editors recruited, and so forth.
  • 14:32 - 14:34
    We did bulk uploads into Commons.
  • 14:35 - 14:41
    In this case, there was a collection
    of over 1,000 original artworks
  • 14:41 - 14:46
    by an entomological
    illustrator, Des Helmore,
  • 14:46 - 14:48
    which had been sitting on a hard drive,
  • 14:48 - 14:50
    [lacking] research for ten years,
  • 14:50 - 14:52
    and we were able
    to get clearance to release those
  • 14:52 - 14:54
    all under a CC BY license.
  • 14:54 - 14:58
    So, easy wins to show to people there.
  • 14:58 - 15:01
    Everyone can understand
    lots of pictures of beetles.
  • 15:01 - 15:07
    Everyone can understand workshops
    devoted to fixing the gender gap.
  • 15:07 - 15:10
    But Wikidata
    is much more difficult to sell
  • 15:10 - 15:12
    to people in the GLAM sector,
  • 15:12 - 15:15
    or anyone outside
    of our particular movement.
  • 15:16 - 15:20
    So, I began to realize that Wikidata
  • 15:20 - 15:23
    was going to be a more
    and more important part
  • 15:23 - 15:26
    of the Wikipedian at Large projects.
  • 15:26 - 15:30
    So, as we went through, it became
    a larger and larger component
  • 15:30 - 15:32
    of what I was doing.
  • 15:32 - 15:36
    And I began to try and teach myself
    more about Wikidata as well,
  • 15:37 - 15:40
    because I was beginning to see
    how important it was.
  • 15:40 - 15:42
    So, this one project--
  • 15:42 - 15:46
    the kakapo is a native
    New Zealand flightless parrot.
  • 15:48 - 15:51
    We worked with
    the Department of Conservation,
  • 15:51 - 15:54
    whose job is to save
    this species from extinction,
  • 15:54 - 15:56
    and pitched the idea,
  • 15:56 - 15:59
    "What if we put every
    single kakapo into Wikidata?"
  • 16:01 - 16:03
    And that may seem ridiculous,
  • 16:03 - 16:06
    but it's actually
    a perfectly doable project.
  • 16:07 - 16:08
    A few of them are in there already.
  • 16:09 - 16:12
    A key thing to notice here
    is there are not many kakapos.
  • 16:12 - 16:13
    So, it's a manageable task.
  • 16:13 - 16:17
    There were 148 when I started,
    and then one died.
  • 16:17 - 16:21
    And they've just had
    a great breeding season up to 213.
  • 16:22 - 16:25
    This is great. This is the most kakapo
    there have been for over 50 years.
  • 16:26 - 16:28
    So, this was also a big deal.
  • 16:28 - 16:31
    This was on the news
    every day in New Zealand.
  • 16:31 - 16:33
    Each new one that was born--
  • 16:33 - 16:34
    (man) In the New York Times.
  • 16:34 - 16:36
    (Mike) Did it? Oh, lovely.
  • 16:36 - 16:39
    Yeah, this was national news.
    Everyone likes these birds.
  • 16:39 - 16:41
    But something interesting about them
  • 16:41 - 16:44
    is that, unlike species
    that are more populous,
  • 16:44 - 16:48
    every single kakapo is named,
    has a unique name
  • 16:48 - 16:50
    and a unique ID number.
  • 16:50 - 16:52
    And often has good biographical data
  • 16:52 - 16:55
    about where and when they were born,
  • 16:55 - 16:57
    were hatched, who their father
    and mother were,
  • 16:57 - 16:59
    when they died, if they died.
  • 16:59 - 17:01
    So, there is, in fact,
    a Department of Conservation database
  • 17:01 - 17:03
    of all this information.
  • 17:03 - 17:07
    And one of the most famous kakapos,
    of course, is Sirocco,
  • 17:07 - 17:10
    who you can see is named
    after a wind, was born there.
  • 17:10 - 17:13
    Sirocco has a Twitter account,
  • 17:14 - 17:16
    which Wikidata had some problems with,
  • 17:16 - 17:19
    because, apparently,
    they just can't have Twitter accounts.
  • 17:19 - 17:20
    I don't know about that.
  • 17:21 - 17:23
    He's even featured
    on an album cover, and so forth.
  • 17:23 - 17:26
    So there are multiple properties of this,
  • 17:26 - 17:28
    probably one of the most famous
    individual kakapo.
  • 17:28 - 17:30
    So, I pitched to the Department
    of Conservation,
  • 17:30 - 17:33
    "Why don't we try and do this
    with every single one?"
  • 17:33 - 17:38
    And so, they had to think about
    how much of the biographical data
  • 17:38 - 17:39
    could be made public.
  • 17:39 - 17:41
    And they came up with a short list.
  • 17:41 - 17:47
    And now we've got, I think, 212,
    210--I think a couple died--
  • 17:47 - 17:51
    living kakapo that are all candidates now.
  • 17:51 - 17:53
    And they only get a name when they fledge.
  • 17:53 - 17:56
    They have a code number until then,
    while they're still babies.
  • 17:56 - 17:58
    So, when we've got the full-fledged crop,
  • 17:58 - 18:02
    we're going to create
    a complete Wikidata--
  • 18:02 - 18:04
    the entire species will be in Wikidata.
  • 18:05 - 18:07
    But we need to come up
    with a property for DOC ID--
  • 18:07 - 18:09
    I actually would like to talk
    with folks about that.
  • 18:09 - 18:11
    Should we be using a very specific ID,
  • 18:11 - 18:13
    or should we be coming up with an ID
  • 18:13 - 18:18
    that would work for all individual birds
    or plants or animals
  • 18:18 - 18:22
    that have been tagged
    in any scientific research project?
  • 18:22 - 18:24
    It's a good question.
  • 18:25 - 18:27
    Second project was
    Christchurch Art Gallery.
  • 18:28 - 18:32
    There are very few paintings
    of Colin McCahon,
  • 18:32 - 18:34
    New Zealand's most famous
    artist in existence.
  • 18:34 - 18:37
    This is a drawing he did
    for the New Zealand School Journal,
  • 18:37 - 18:38
    which was government-funded at the time.
  • 18:38 - 18:41
    So, it's actually in Archives New Zealand
  • 18:41 - 18:42
    who own the copyright for that.
  • 18:42 - 18:44
    This is a very unusual situation.
  • 18:45 - 18:47
    So, I worked with
    Christchurch Art Gallery
  • 18:47 - 18:49
    who, along with Auckland Art Gallery,
  • 18:49 - 18:53
    maintain a site called
    Find New Zealand Artists.
  • 18:53 - 18:56
    The job of which is to keep track
    of the holdings--
  • 18:56 - 18:58
    every institution that has holdings
    of the New Zealand artist.
  • 18:58 - 19:03
    So, about 18,000 different artists
    in their database,
  • 19:03 - 19:06
    and most with very little
    information at all.
  • 19:06 - 19:09
    So, we did a standard sort of Mix'n'Match.
  • 19:09 - 19:14
    We did an export of the ones
    that had at least a birth date,
  • 19:14 - 19:18
    or a death date, or a place of birth,
    or a place of death.
  • 19:18 - 19:21
    So, that's not restricting it very much.
  • 19:21 - 19:23
    And even then, we were not able
    to match quite a few,
  • 19:23 - 19:26
    but we've got about 1,500 now
  • 19:26 - 19:29
    that are matched
    to known artists in Wikidata,
  • 19:29 - 19:30
    which is nice.
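The export step described here (keep only artists with at least one of birth date, death date, place of birth, or place of death) could look roughly like the sketch below. The field names are hypothetical, and the three-column tab-separated layout (id, name, description) is an assumption that should be checked against the Mix'n'Match import documentation before use.

```python
import csv
import io

# Hypothetical field names for the gallery's artist records.
DATE_OR_PLACE_FIELDS = ("birth_date", "death_date", "birth_place", "death_place")


def has_any_life_data(record: dict[str, str]) -> bool:
    """True if at least one of the four biographical fields is filled in."""
    return any(record.get(f, "").strip() for f in DATE_OR_PLACE_FIELDS)


def export_for_matching(records: list[dict[str, str]]) -> str:
    """Write matchable records as tab-separated id, name, description rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for rec in records:
        if not has_any_life_data(rec):
            continue  # nothing to disambiguate on, so leave it out
        details = [
            rec[f].strip() for f in DATE_OR_PLACE_FIELDS if rec.get(f, "").strip()
        ]
        writer.writerow([rec["id"], rec["name"], "; ".join(details)])
    return buffer.getvalue()


# Invented sample rows: only the first and third carry any biographical data.
sample = [
    {"id": "1", "name": "A Artist", "birth_date": "1900"},
    {"id": "2", "name": "B Artist"},
    {"id": "3", "name": "C Artist", "birth_place": "Christchurch"},
]
print(export_for_matching(sample))
```

Running records through a script like this is also where the typos mentioned later tend to surface, since malformed dates and stray whitespace become visible once the data sits in columns.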
  • 19:30 - 19:32
    But what was appealing to them--
  • 19:32 - 19:34
    this is their website,
  • 19:34 - 19:39
    which really just maintains
    the holdings links there.
  • 19:39 - 19:45
    But this biographical data,
    which they create by hand, currently,
  • 19:45 - 19:46
    for every single artist.
  • 19:46 - 19:49
    And the act of exporting
    and putting into Mix'n'Match
  • 19:49 - 19:52
    exposed numerous typos
    and mistakes and such
  • 19:52 - 19:54
    that they haven't noticed.
  • 19:54 - 19:56
    And it's only when you start
    running things through [Excel],
  • 19:56 - 19:57
    these things show up.
  • 19:57 - 20:02
    And the value of Wikidata
    was suddenly conveyed to them
  • 20:02 - 20:06
    when I said, "You can just suck in
    that information from Wikidata."
  • 20:07 - 20:10
    And that made them sit up straight.
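Pulling that information back out of Wikidata could be sketched as a query builder for the Wikidata Query Service. The property IDs are the standard ones for date of birth (P569), date of death (P570), place of birth (P19), and place of death (P20); the function name and the idea of passing in already-matched item IDs are assumptions for illustration, not the gallery's actual setup.

```python
import re

# Wikidata item IDs look like "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"^Q[1-9]\d*$")


def build_biography_query(qids: list[str]) -> str:
    """Build a SPARQL query for basic life data of the given Wikidata items."""
    for qid in qids:
        if not QID_PATTERN.match(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    # The wd:, wdt:, wikibase: and bd: prefixes are predefined by the query service.
    return f"""SELECT ?item ?itemLabel ?birth ?death ?birthPlaceLabel ?deathPlaceLabel WHERE {{
  VALUES ?item {{ {values} }}
  OPTIONAL {{ ?item wdt:P569 ?birth. }}
  OPTIONAL {{ ?item wdt:P570 ?death. }}
  OPTIONAL {{ ?item wdt:P19 ?birthPlace. }}
  OPTIONAL {{ ?item wdt:P20 ?deathPlace. }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


# Q1 and Q2 are placeholders standing in for matched artist items.
print(build_biography_query(["Q1", "Q2"]))
```

The resulting string can be pasted into the query service's web interface or sent to its SPARQL endpoint; the sketch stops at building the query so that it runs without network access.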
  • 20:10 - 20:12
    So this, I think, is one
    of the selling points.
  • 20:12 - 20:15
    When you have this carefully
    hand-curated website
  • 20:15 - 20:19
    with 18,000 entries, full of mistakes,
    and tell them there's another way,
  • 20:19 - 20:21
    that they can get other people
  • 20:21 - 20:23
    to do some of this fact-checking
    and correction for them--
  • 20:23 - 20:25
    that's when it sinks home.
  • 20:25 - 20:27
    And then announced I was pitching the idea
  • 20:27 - 20:30
    that they "Wikidatafy"
    this entire history book
  • 20:30 - 20:33
    of the New Zealand artists
    in Christchurch in the '30s,
  • 20:33 - 20:37
    and run through--just published--
    and run through every single person,
  • 20:37 - 20:39
    connection, place, exhibition, and such.
  • 20:39 - 20:43
    But it's a manageable sized project,
    and they're very excited by this.
  • 20:44 - 20:47
    And thirdly, I wanted to show you
    Maori Subject Headings.
  • 20:47 - 20:51
    A waka is a Maori name
    for a particular kind of canoe,
  • 20:51 - 20:53
    a war canoe.
  • 20:53 - 20:56
    So, in the National Library
    of New Zealand,
  • 20:56 - 20:59
    there's a listing for waka,
    because the National Library
  • 20:59 - 21:03
    actually has its own dictionary
    of Maori Subject Headings,
  • 21:03 - 21:04
    in the Maori language.
  • 21:04 - 21:06
    So, there it defines a waka,
  • 21:07 - 21:10
    in Maori and English.
  • 21:10 - 21:12
    But it also has a whole lot
    of narrower terms,
  • 21:12 - 21:14
    you can see there on the side there.
  • 21:14 - 21:16
    A typical one would be taurapa.
  • 21:16 - 21:20
    And a definition first in Maori,
    and then in English.
  • 21:20 - 21:22
    It's the carved sternpost
    that you can see there.
  • 21:23 - 21:24
    And in English, you would say "sternpost,"
  • 21:24 - 21:27
    but you can't use
    the word "sternpost" for taurapa,
  • 21:27 - 21:31
    because taurapa only works
    for particular kinds of war canoes.
  • 21:31 - 21:34
    So, there's no English word
    equivalent for that.
  • 21:35 - 21:38
    And I suddenly realized
    that here is an entire ontology
  • 21:38 - 21:42
    of cultural-specific terms that have been
    very carefully worked out
  • 21:42 - 21:45
    and verified by the National
    Library with Maori,
  • 21:45 - 21:50
    constantly being added to and improved
    with definitions, with descriptions,
  • 21:50 - 21:52
    in both English and Maori.
  • 21:52 - 21:53
    Really exciting.
  • 21:53 - 21:56
    I suddenly thought we could put
    this whole lot into Wikidata--
  • 21:56 - 22:01
    Maori first, and then translated
    into English, as required.
  • 22:01 - 22:02
    Be a nice change, wouldn't it!
  • 22:03 - 22:05
    And here's the copyright licensing.
  • 22:05 - 22:09
    Unfortunately, NonCommercial-NoDerivs.
  • 22:10 - 22:12
    So now I have to start
    the conversation with them
  • 22:12 - 22:15
    about why did they pick that license.
  • 22:16 - 22:20
    And possibly because they only got
    [buy in] from Maori,
  • 22:20 - 22:23
    who agreed to sit down
    and [inaudible] this stuff
  • 22:23 - 22:24
    if there was a guarantee
  • 22:24 - 22:27
    that none of this information
    could be used for commercial purposes.
  • 22:28 - 22:32
    So, that's one of the frustrating
    aspects of the task
  • 22:32 - 22:34
    is coming up against
    these sorts of restrictions.
  • 22:34 - 22:37
    So, those are the three things
    I wanted to put out in front
  • 22:37 - 22:38
    to spark discussion.
  • 22:38 - 22:41
    Putting an entire species into Wikidata,
  • 22:41 - 22:44
    what it takes to actually change
    an art gallery's curator's mind
  • 22:44 - 22:46
    about the value of Wikidata,
  • 22:46 - 22:50
    and what do we do when we would see
    a complete ontology
  • 22:50 - 22:52
    in another language that,
    unfortunately, has been slapped
  • 22:52 - 22:56
    with a restrictive
    Creative Commons license.
  • 22:56 - 22:57
    Thank you.
  • 22:57 - 22:59
    (applause)
  • 23:11 - 23:14
    Hello. My name is Joachim Neubert.
  • 23:14 - 23:16
    I'm working for the ZBW,
  • 23:18 - 23:21
    that is, Information Center
    for Economics in Hamburg,
  • 23:21 - 23:24
    as a scientific software developer.
  • 23:25 - 23:31
    And one of my tasks last year
    was preparing a data donation to Wikidata.
  • 23:32 - 23:37
    And I want to give some report on this
    on our first experiences
  • 23:38 - 23:43
    from donating metadata
    from the 20th-Century Press Archives.
  • 23:46 - 23:48
    To our best knowledge,
  • 23:48 - 23:53
    this is the largest public
    press archive in the world.
  • 23:54 - 23:59
    It has been collected
    between 1908 and 2005,
  • 24:01 - 24:04
    and has been drawn from
  • 24:05 - 24:09
    more than 1,500 newspapers
    and periodicals
  • 24:09 - 24:13
    from Germany, and also internationally.
  • 24:15 - 24:19
    And it has covered everything
    which could be of interest
  • 24:19 - 24:23
    for the Hamburg,
  • 24:26 - 24:28
    the Hamburg businesspeople
  • 24:28 - 24:32
    who wanted to expand over the world.
  • 24:35 - 24:39
    As you can see, this material
    has been clipped from newspapers
  • 24:39 - 24:42
    and put onto paper,
  • 24:42 - 24:45
    and then collected in folders.
  • 24:46 - 24:50
    Here you see a small corner
    of the Person's Archive,
  • 24:51 - 24:56
    and, similarly, information
    has been collected on companies,
  • 24:56 - 25:00
    on general topics, on wares,
    on everybody,
  • 25:02 - 25:06
    on everything which could be interesting.
  • 25:07 - 25:11
    These folders have been scanned
  • 25:13 - 25:16
    up to roughly 1949
  • 25:17 - 25:23
    by a DFG-funded project in 2004 to 2007.
  • 25:24 - 25:31
    As a result, up to now,
    it was 25,000 thematic dossiers
  • 25:32 - 25:34
    of this time.
  • 25:34 - 25:38
    This contained about 2 million,
    or more than 2 million pages.
  • 25:39 - 25:42
    And these are online.
  • 25:44 - 25:48
    This application developed
    at that time by ZBW,
  • 25:50 - 25:54
    which now looks a bit outdated,
  • 25:55 - 25:58
    not so fancy,
    and what’s more of a problem.
  • 25:59 - 26:04
    It's an application which was built
    architecturally on Oracle,
  • 26:04 - 26:09
    it was built on ColdFusion,
    it runs on Windows servers,
  • 26:09 - 26:15
    so it's not very sustainable
    in the long term.
  • 26:16 - 26:19
    And we have discussed
    should we migrate this
  • 26:19 - 26:23
    to a more fancy linked data application,
  • 26:24 - 26:28
    or should we take a radical step
  • 26:28 - 26:32
    and put all this data in the open.
  • 26:33 - 26:37
    We have assigned a CC0 license to that data
  • 26:37 - 26:41
    and, currently, moving some main--
  • 26:42 - 26:46
    access layer, some main discovery layer--
    so it's a primary access layer
  • 26:48 - 26:51
    to the open linked data web,
  • 26:51 - 26:57
    where it actually makes most sense
  • 26:57 - 27:01
    to put some metadata into Wikidata,
  • 27:02 - 27:07
    and to make sure that all folders
  • 27:08 - 27:11
    of the collections are linked to Wikidata,
  • 27:11 - 27:13
    so they are findable,
  • 27:14 - 27:18
    and that all metadata about these folders
  • 27:18 - 27:23
    is also transferred to Wikidata.
  • 27:23 - 27:28
    So it can be used there,
    and it can be enriched there, possibly.
  • 27:29 - 27:32
    Corrections can be made to that data.
  • 27:33 - 27:39
    What is still maintained by ZBW is,
    of course, the storage of the images,
  • 27:40 - 27:44
    which we can't put in any way,
  • 27:46 - 27:47
    or we can't give a license on that
  • 27:47 - 27:51
    because this was owned
    by the original creators.
  • 27:52 - 27:55
    But we make sure that they are accessible
  • 27:56 - 28:02
    by some, again, metadata files
    via DFG Viewer, and
  • 28:03 - 28:06
    in the future by IIIF manifests.
  • 28:07 - 28:11
    And we will prepare
    some static landing pages
  • 28:12 - 28:18
    which will serve as a data point
    of reference for Wikidata,
  • 28:18 - 28:23
    as well as still making available data
  • 28:23 - 28:26
    which doesn't fit well into Wikidata.
  • 28:31 - 28:37
    [For us] is migration
    and data donation to Wikidata
  • 28:37 - 28:41
    with our custom infrastructure
  • 28:41 - 28:45
    of SPARQL endpoint with that data,
  • 28:46 - 28:49
    and we basically used federated queries
  • 28:50 - 28:54
    between that endpoint
    and the Wikidata Query Service
  • 28:54 - 28:58
    to create corresponding statements
  • 28:59 - 29:02
    through [eyes of] concatenated
  • 29:02 - 29:07
    in SPARQL queries themselves,
    or transformed via a script,
  • 29:08 - 29:12
    which also generated references
    for the statements.
  • 29:13 - 29:19
    And then put that into QuickStatements
    of the code to use this online.
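The script step in this pipeline (turning query results into referenced statements for QuickStatements) might look something like the sketch below. It is not ZBW's actual script: the function names, the row layout, and the choice of a reference URL as the source are assumptions. The output follows the tab-separated QuickStatements version 1 syntax as far as I understand it, with qualifiers P580 (start time) and P582 (end time) and a source given with the `S` prefix; the property for the role is left as a parameter because the talk does not name it.

```python
from typing import Optional


def year_value(year: int) -> str:
    """Format a year as a QuickStatements time value with year precision (/9)."""
    return f"+{year:04d}-00-00T00:00:00Z/9"


def to_quickstatement(
    person_qid: str,
    role_property: str,
    company_qid: str,
    start_year: Optional[int],
    end_year: Optional[int],
    folder_url: str,
) -> str:
    """Build one tab-separated statement line with qualifiers and a reference."""
    parts = [person_qid, role_property, company_qid]
    if start_year is not None:
        parts += ["P580", year_value(start_year)]  # P580: start time
    if end_year is not None:
        parts += ["P582", year_value(end_year)]  # P582: end time
    # S854 is P854 (reference URL) used as a source; strings go in double quotes.
    parts += ["S854", f'"{folder_url}"']
    return "\t".join(parts)


# Placeholder IDs and URL; a real run would take them from the query results.
print(to_quickstatement("Q1", "P2", "Q3", 1920, 1933, "https://example.org/folder/1"))
```

Generating the reference in the same line as the statement is what makes every imported claim traceable back to the archive folder it came from.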
  • 29:23 - 29:24
    So, this is what we get.
  • 29:24 - 29:29
    It's not only simple things
    like birth dates, but, sorry--
  • 29:30 - 29:35
    but also complex statements
  • 29:35 - 29:40
    about already existing items,
  • 29:40 - 29:45
    like this person was a supervisory
    board member of said company
  • 29:47 - 29:49
    during this period of time,
  • 29:50 - 29:57
    and referenced for use in...
  • 29:58 - 30:02
    in the scientific context.
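A complex, referenced statement of this kind can be written as one tab-separated QuickStatements command. The sketch below is an illustration under assumptions, not the script used in the project: the item IDs `Q_PERSON` and `Q_ROLE` and the example URL are placeholders, and the choice of "position held" (P39) to model the board membership is an assumption. The qualifier properties "start time" (P580) and "end time" (P582) and the reference property "reference URL" (P854, written `S854` in a reference) are existing Wikidata properties.

```python
# Minimal sketch: format one QuickStatements (V1, tab-separated) command
# with qualifiers and a reference. Item IDs and the URL are placeholders.


def year(y: int) -> str:
    """Format a year as a QuickStatements time value with year precision (/9)."""
    return f"+{y:04d}-01-01T00:00:00Z/9"


def quickstatement(item, prop, value, qualifiers=(), references=()):
    """Join the statement, its qualifiers and its reference parts with tabs."""
    parts = [item, prop, value]
    for q_prop, q_value in qualifiers:
        parts += [q_prop, q_value]
    for r_prop, r_value in references:
        # In a reference, the property is written with S instead of P.
        parts += ["S" + r_prop[1:], r_value]
    return "\t".join(parts)


line = quickstatement(
    "Q_PERSON", "P39", "Q_ROLE",                     # placeholder item IDs
    qualifiers=[("P580", year(1920)), ("P582", year(1925))],
    references=[("P854", '"https://example.org/folder/123"')],
)
print(line)
```

A qualifier naming the company would be added in the same way as the two dates; it is left out here because the property used for it in the project is not stated in the talk.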
  • 30:08 - 30:11
    The first part of this data donation
    has been finished.
  • 30:13 - 30:17
    The Person's Archive
    is completely linked to Wikidata.
  • 30:18 - 30:24
    And this is also an information tool.
  • 30:24 - 30:27
    A lot of items had not previously
  • 30:27 - 30:30
    had any external references.
  • 30:31 - 30:36
    And we have more
    than 6,000 statements,
  • 30:36 - 30:42
    which are now sourced
    in this archive's metadata.
  • 30:45 - 30:50
    Well, this was the easiest part,
  • 30:51 - 30:55
    because persons are easily
    identifiable in Wikidata.
  • 30:56 - 31:00
    More than 90% already existed here,
  • 31:00 - 31:02
    so we could link to that.
  • 31:02 - 31:06
    We created some 100 items for these,
  • 31:06 - 31:09
    for the ones which were missing.
  • 31:09 - 31:14
    But now, we are working
  • 31:14 - 31:18
    on the rest of the archive,
  • 31:18 - 31:20
    particularly on the topics archive.
  • 31:21 - 31:27
    Which means mapping a historic system
    for the organization of knowledge
  • 31:27 - 31:30
    about the whole world,
  • 31:30 - 31:34
    materialized as newspaper
    clippings to Wikidata.
  • 31:36 - 31:42
    To give you a basic idea,
    the Countries and Topics archive
  • 31:43 - 31:49
    is organized by a hierarchy of countries
  • 31:49 - 31:51
    and other geographic entities,
  • 31:52 - 31:56
    which is translated to English,
    which makes this easier.
  • 31:56 - 32:02
    And a German, deeply nested...
  • 32:04 - 32:08
    deeply nested classification of topics.
  • 32:08 - 32:12
    And this combination defines one...
  • 32:13 - 32:16
    one folder.
  • 32:16 - 32:21
    So, what we now want to do
    is to match this
  • 32:21 - 32:25
    as a structure to Wikidata,
    and to bring the data in.
  • 32:25 - 32:29
    And I want to invite you
  • 32:29 - 32:34
    to join this really nice challenge
  • 32:34 - 32:36
    in terms of knowledge organization.
  • 32:38 - 32:41
    So, it's a WikiProject
    where this work is tracked,
  • 32:41 - 32:46
    and you can follow this
    or participate in this.
  • 32:47 - 32:49
    And, yes, thank you very much.
  • 32:50 - 32:52
    (applause)
  • 33:04 - 33:07
    So, we're taking
    performing arts to Wikidata.
  • 33:08 - 33:12
    And we're taking performing arts
    to the linked open data cloud,
  • 33:12 - 33:16
    by building a linked open data
    ecosystem for the performing arts.
  • 33:16 - 33:21
    And the question I'm trying to answer,
  • 33:21 - 33:24
    and I hope you'll help me
    in answering it, is:
  • 33:24 - 33:27
    which place for Wikidata in all that?
  • 33:27 - 33:31
    But let me first start with my experiences
  • 33:31 - 33:34
    which I made this year,
  • 33:35 - 33:38
    the first half of the year,
    when I had the pleasure
  • 33:38 - 33:39
    to work with CAPACOA,
  • 33:39 - 33:42
    which is the Canadian Arts
    Presenting Association,
  • 33:42 - 33:47
    which actually launched a project
    called Linked Digital Future Initiative,
  • 33:48 - 33:53
    to actually get the entire art sector
    in Canada to embrace linked open data.
  • 33:53 - 33:57
    And they did that based on the observation
  • 33:57 - 33:59
    that over the past five years,
  • 34:00 - 34:04
    the [inaudible]-- the important topic
    within performing arts
  • 34:04 - 34:09
    was the fact that metadata
    was not around in sufficient quality
  • 34:09 - 34:12
    and not interlinked, not interoperable.
  • 34:12 - 34:16
    And that was why some of the performances,
  • 34:16 - 34:20
    some of the events
    are not so well findable
  • 34:20 - 34:25
    by Google and by personal
    computer-based assistants, and so on.
  • 34:26 - 34:30
    So, the vision we kind
    of developed together
  • 34:30 - 34:33
    is that we want to have a knowledge base
  • 34:34 - 34:36
    for many stakeholders at once.
  • 34:36 - 34:40
    So we looked at the entire
    performing arts value network,
  • 34:40 - 34:42
    we identified key stakeholders in there,
  • 34:42 - 34:47
    we looked at the usage scenarios
    that we like to pursue,
  • 34:48 - 34:52
    and we kind of mapped it
    to the whole architecture
  • 34:52 - 34:57
    of such a knowledge base,
    or of the different platforms in there,
  • 34:57 - 35:00
    which, obviously,
    is a distributed architecture,
  • 35:00 - 35:01
    and not one big monolith.
  • 35:02 - 35:06
    I'm just going to run
    through that quite quickly
  • 35:06 - 35:08
    because we have ten minutes each.
  • 35:09 - 35:14
    But I think we'll have plenty of time
    tonight or tomorrow to deepen that
  • 35:14 - 35:16
    if anybody's interested in the details.
  • 35:16 - 35:19
    So, we started from
    that Performing Arts Value Network,
  • 35:19 - 35:23
    which, interestingly,
    was just published last year.
  • 35:23 - 35:28
    So, we're lucky to be able
    to build on previous work,
  • 35:28 - 35:31
    like you have the primary value chain
    of the performing arts in the middle,
  • 35:31 - 35:34
    and various stakeholders around that.
  • 35:34 - 35:37
    All in all, we identified
    20 stakeholder groups,
  • 35:37 - 35:43
    which then we kind of boiled down
    into seven larger categories
  • 35:43 - 35:45
    for each of the stakeholder groups.
  • 35:45 - 35:52
    We kind of formulated what kind of needs
  • 35:52 - 35:55
    they would have in terms
    of such an infrastructure,
  • 35:55 - 35:59
    and what would they be able to achieve
    if the whole thing was interlinked
  • 35:59 - 36:02
    and the data was publicly accessible.
  • 36:03 - 36:05
    And so, you can see the types here,
  • 36:05 - 36:09
    the different types are Production,
    then Presentation & Promotion,
  • 36:09 - 36:12
    Coverage & Reuse, Live Audiences,
  • 36:12 - 36:14
    Online Consumption, Heritage,
  • 36:14 - 36:16
    Research & Education.
  • 36:16 - 36:19
    And after kind of setting up a big table,
  • 36:19 - 36:21
    of which you can see
    just the first part here,
  • 36:21 - 36:25
    we kind of compared [over there],
    had a look at which type of data
  • 36:25 - 36:27
    were actually used across the board
  • 36:27 - 36:31
    by all different groups of stakeholders.
  • 36:31 - 36:37
    And there's quite a large basis of data
    that is common to all of them,
  • 36:37 - 36:38
    and that really is the area
  • 36:38 - 36:43
    where it makes a lot of sense, actually,
    to cooperate and to keep that--
  • 36:43 - 36:46
    to maintain the data together.
  • 36:48 - 36:51
    So, when talking about
    platform architecture,
  • 36:51 - 36:54
    you can see that we have four layers here.
  • 36:54 - 36:56
    At the bottom is the data layer.
  • 36:56 - 36:59
    Of course, Wikidata plays a part in it,
  • 36:59 - 37:03
    but also a lot of other databases,
    distributed databases
  • 37:03 - 37:08
    that can expose data
    through SPARQL endpoints.
  • 37:09 - 37:13
    The yellow part in the middle,
    that's the semantic layer.
  • 37:13 - 37:16
    It's our common language
    to describe our things,
  • 37:16 - 37:22
    to make statements about things
    around the performing arts, the ontology.
  • 37:22 - 37:25
    Then we have an application layer
  • 37:25 - 37:31
    that consists of various modules,
    for example, data analysis,
  • 37:31 - 37:35
    data extraction-- so, how do you
    actually get unstructured data
  • 37:35 - 37:36
    into structured data--
  • 37:36 - 37:39
    how can we support that by tools.
  • 37:39 - 37:42
    Then, obviously, there's
    a visualization of data--
  • 37:42 - 37:47
    so if there are large quantities of data,
    you want to visualize it in some way.
  • 37:48 - 37:50
    And on the top, you have
    the presentation layer,
  • 37:50 - 37:55
    that's what the ordinary people
    are actually interacting with
  • 37:55 - 37:56
    on a daily basis--
  • 37:56 - 38:00
    search engines, encyclopedias,
    cultural agendas,
  • 38:00 - 38:02
    and a variety of other services.
  • 38:03 - 38:05
    We're not starting from scratch.
  • 38:05 - 38:09
    Some work has already
    been done in this area.
  • 38:09 - 38:13
    I'll just cite a few examples
    from a project
  • 38:13 - 38:15
    which I have been involved in.
  • 38:15 - 38:18
    Some other stuff going on as well.
  • 38:18 - 38:21
    And so, I started in this area
  • 38:21 - 38:24
    with the Swiss Archive
    of the Performing Arts.
  • 38:25 - 38:28
    [Until] building a Swiss
    Performing Arts database,
  • 38:28 - 38:31
    we created the performing arts ontology,
  • 38:31 - 38:34
    that's currently being
    implemented into RDF.
  • 38:35 - 38:40
    And there we have the database
    of like 60, 70 years
  • 38:40 - 38:43
    of performance history in Switzerland.
  • 38:43 - 38:45
    So, that's something that we can build on,
  • 38:45 - 38:49
    and that's something
    that's been transformed into RDF.
  • 38:50 - 38:55
    And there was a builder platform
    where this data can be accessed.
  • 38:56 - 39:02
    Then we have done
    several ingests into Wikidata,
  • 39:02 - 39:03
    partly from Switzerland,
  • 39:03 - 39:09
    partly also from
    the performance arts institutes,
  • 39:10 - 39:12
    for example, Bart Magnus
    was involved in that.
  • 39:13 - 39:15
    He was the driving force behind that.
  • 39:15 - 39:17
    There's also stuff from Wikimedia Commons,
  • 39:17 - 39:21
    but not very well interlinked
    with all the rest of our metadata.
  • 39:21 - 39:25
    And obviously, by doing this ingest,
  • 39:25 - 39:29
    we also kind of started to implement
    parts of this Swiss data model
  • 39:29 - 39:31
    into Wikidata.
  • 39:33 - 39:38
    Then one of the Canadian
    implementation partners
  • 39:38 - 39:39
    is Culture Creates.
  • 39:39 - 39:44
    They're running a platform that actually
    scrapes information from theater websites,
  • 39:44 - 39:47
    and inputs it into a knowledge graph,
  • 39:48 - 39:54
    to then expose it to search engines
    and other search devices.
  • 39:56 - 40:03
    And there again, we kind of had
    to implement and extend this ontology.
  • 40:03 - 40:08
    And what you can see from the slide
    is that there are so many empty spaces,
  • 40:08 - 40:10
    but there's also some overlap,
  • 40:10 - 40:13
    and an important overlap, obviously,
    is the common shared language,
  • 40:13 - 40:19
    which will help us actually interlink
    the various data sets.
  • 40:21 - 40:23
    What is also important, obviously,
  • 40:23 - 40:26
    is that we're using the same
    base registers and authority files.
  • 40:26 - 40:31
    And this is a place where Wikidata
    plays an important role
  • 40:31 - 40:34
    by kind of interlinking these.
  • 40:35 - 40:38
    Now, I'd like to share the recommendations
  • 40:38 - 40:42
    by the Linked Digital Future Initiative's
    Advisory Committee.
  • 40:43 - 40:45
    At least the two first recommendations.
  • 40:45 - 40:48
    So, for the Canadians,
    now it's absolutely crucial
  • 40:48 - 40:53
    to kind of fill in their own Canadian
    performing arts knowledge graph,
  • 40:53 - 40:56
    because unlike the Swiss Archive
    of the Performing Arts,
  • 40:56 - 40:59
    they're not starting
    with an already existing database,
  • 40:59 - 41:02
    but they're kind of
    creating it from scratch.
  • 41:02 - 41:04
    And it's absolutely crucial
    to have data in there.
  • 41:04 - 41:09
    And second, as you can see,
    Wikidata already comes in.
  • 41:09 - 41:12
    Wikidata, by the Advisory Committee,
  • 41:12 - 41:18
    has been seen as complementary
    to Artsdata.ca, this knowledge graph,
  • 41:18 - 41:21
    and, therefore, efforts should
    be undertaken to contribute
  • 41:21 - 41:25
    to its population
    with performing arts-related data.
  • 41:26 - 41:31
    And that's what we're going to work on
    over the coming months and years,
  • 41:31 - 41:35
    and that's also why
    I'm kind of on the lookout here
  • 41:35 - 41:39
    to see who else will join that effort.
  • 41:41 - 41:45
    So, right now, obviously,
    we're saying they're complementary.
  • 41:45 - 41:48
    So, we have to think about
    the pluses and the minuses
  • 41:48 - 41:50
    of each of the approaches.
  • 41:50 - 41:52
    And you can see here a comparison
  • 41:52 - 41:56
    between Wikidata and the Classical
    Linked Open Data approach.
  • 41:57 - 42:00
    I would be happy to discuss
    that further with you guys,
  • 42:00 - 42:03
    what your experiences are there.
  • 42:03 - 42:08
    But, as I see it, Wikidata is a huge plus
    because it's a crowdsourcing platform,
  • 42:08 - 42:12
    and it's easy to invite further parties
    to actually contribute.
  • 42:12 - 42:17
    On the negative side, obviously,
    you get this problem of loss of control.
  • 42:18 - 42:23
    Data owners have to give up control
    over their graphs, data quality,
  • 42:23 - 42:24
    and completeness.
  • 42:27 - 42:31
    It's harder to track on Wikidata
    than if you have it under your control.
  • 42:31 - 42:34
    And the other strength of Wikidata
  • 42:34 - 42:40
    is that it requires immediate integration
    into that worldwide graph.
  • 42:40 - 42:42
    And you kind of just do it--
  • 42:43 - 42:47
    kind of reconcile step by step
    against other databases,
  • 42:47 - 42:50
    which may also be seen by some
    as an advantage,
  • 42:50 - 42:54
    but of course, if you're looking
    for integration and interoperability,
  • 42:54 - 42:57
    Wikidata forces you to go for that
    from the beginning.
  • 42:59 - 43:03
    And then, obviously, harmonizing
    data modeling practices
  • 43:03 - 43:06
    is an issue in both cases.
  • 43:06 - 43:11
    But it may seem, at the beginning,
    easier to do just in your own silo,
  • 43:11 - 43:13
    because at some point,
    you're done with the task,
  • 43:13 - 43:17
    and it would be
    an ongoing task on Wikidata.
  • 43:18 - 43:23
    So, when it now comes to prioritizing
    the data to be ingested,
  • 43:24 - 43:28
    that's like the rules
    I kind of go by at the moment.
  • 43:30 - 43:32
    First of all, we'd like to ingest it
  • 43:32 - 43:36
    where it's unclear who would be
    the natural authority in the given area.
  • 43:36 - 43:40
    So that's definitely data
    that will be managed in a shared manner.
  • 43:41 - 43:44
    And we'd like to ingest it where we see
  • 43:44 - 43:47
    a high potential
    for crowdsourcing approaches.
  • 43:47 - 43:52
    We'd like to ingest data where the data
    is likely to be reused
  • 43:52 - 43:54
    in the context of Wikipedia.
  • 43:55 - 44:00
    And there's also hope that some part
    of the international coordination
  • 44:00 - 44:04
    around the whole data modeling,
    about the standardization,
  • 44:04 - 44:08
    could actually take place
    directly on Wikidata,
  • 44:08 - 44:09
    if it's not taking place elsewhere,
  • 44:09 - 44:12
    because it kind of forces people
    to start interacting
  • 44:12 - 44:15
    if they ingest data in the same part.
  • 44:16 - 44:22
    And we'd like to focus now next
    on base registers and authority files
  • 44:22 - 44:26
    because they kind of help us
    create the linkages
  • 44:26 - 44:29
    between different data,
    and on controlled vocabularies
  • 44:29 - 44:33
    as an extension of the existing ontology.
  • 44:34 - 44:36
    So, just two more slides.
  • 44:36 - 44:41
    The next steps will be that we're taking
    the sum of all GLAMs approach
  • 44:41 - 44:43
    to Wiki Loves Performing Arts.
  • 44:43 - 44:48
    That means we're describing
    venues and organizations,
  • 44:48 - 44:51
    and trying to push the data to Wikipedia
  • 44:51 - 44:54
    in the form of infoboxes
    and [bubble] templates.
  • 44:54 - 45:00
    And the other one, the other project
    I'm going to pursue, is a COST Action
  • 45:00 - 45:02
    that we'll submit next year
  • 45:03 - 45:06
    around that Linked Open Data Ecosystem
    for the Performing Arts.
  • 45:06 - 45:10
    COST is a European program
    that supports networking activities,
  • 45:10 - 45:14
    and the topics to be covered
    are listed here.
  • 45:14 - 45:16
    Two of them, I have highlighted--
  • 45:16 - 45:21
    one of them is like the question
    of federation between Wikidata
  • 45:21 - 45:24
    and the classical linked
    open data approaches.
  • 45:24 - 45:28
    And the other one, I think,
    is very important also,
  • 45:28 - 45:31
    where we have a huge potential still,
  • 45:31 - 45:36
    is implementing international campaigns
    to supplement data on Wikidata.
  • 45:38 - 45:41
    So, that's it. Thank you
    for your attention.
  • 45:41 - 45:46
    Now, I would like to ask
    my colleagues up here.
  • 45:47 - 45:51
    To the panel, maybe you'll get them
    microphones as well.
  • 45:54 - 45:56
    And then I would like to...
  • 45:57 - 46:00
    give you the chance to ask questions.
  • 46:01 - 46:05
    And obviously, also ask my colleagues
  • 46:06 - 46:08
    whether they have questions to each other.
  • 46:12 - 46:15
    So, do we have maybe a question
    from the audience?
  • 46:21 - 46:23
    (man) [inaudible]
  • 46:24 - 46:27
    I would like to ask from each of you
  • 46:27 - 46:31
    where would you draw the line,
  • 46:31 - 46:33
    basically, how you define--
  • 46:33 - 46:36
    when do you need to run your own Wikibase,
  • 46:36 - 46:39
    and what do you want to put on Wikidata?
  • 46:39 - 46:44
    Like, is this a clear delineation
    of what is seen
  • 46:44 - 46:46
    behind of putting it [into order.]
  • 46:48 - 46:51
    I can answer first because I have the mic.
  • 46:51 - 46:57
    So, I've been thinking
    that one of the issues is notability.
  • 46:59 - 47:02
    I'm addressing that
    in a different project.
  • 47:02 - 47:06
    And I think licensing could be one,
  • 47:06 - 47:10
    because you can apply your own terms
    in your own database,
  • 47:10 - 47:14
    and then I think wherever it's possible.
  • 47:14 - 47:20
    And then, the third one
    is just to have it as a sandbox,
  • 47:20 - 47:23
    prepare it for ingestion into Wikidata.
  • 47:23 - 47:26
    These are the three main things
    that I come up with now,
  • 47:26 - 47:29
    but I can come up with more.
  • 47:30 - 47:32
    For me, rights are always
    going to be an issue.
  • 47:32 - 47:37
    So, if the National Library
    wanted to move towards Wikibase,
  • 47:37 - 47:40
    that would enable them to continue
    to control the licensing
  • 47:40 - 47:43
    for the work they've done
    with Maori language terms.
  • 47:43 - 47:46
    The kakapo database only contains data
  • 47:46 - 47:50
    that the Department of Conservation
    felt could be made public,
  • 47:50 - 47:53
    but I suspect if they see it
    up and running,
  • 47:53 - 47:56
    they might be tempted
    to use a private Wikibase
  • 47:56 - 47:58
    to maintain their own database,
  • 47:58 - 48:01
    simply because some
    of the visualization tools
  • 48:01 - 48:04
    that could be applied might be better
  • 48:04 - 48:07
    than the sort of Excel spreadsheet system
    that they currently run.
  • 48:12 - 48:17
    Well, I think this very much depends
    on the kind of data.
  • 48:18 - 48:22
    We are, with the Press Archive, of course,
    in a quite lucky position,
  • 48:22 - 48:27
    in that this was material
    which was published,
  • 48:27 - 48:30
    it was published at the time,
  • 48:30 - 48:32
    but it was expensive to publish.
  • 48:33 - 48:36
    So, this is quite easy.
  • 48:36 - 48:39
    I think, also, projects--
  • 48:40 - 48:42
    and this is a typical project,
  • 48:42 - 48:46
    so it was funded for some time,
    and then funding ended,
  • 48:46 - 48:52
    and what happens with the data
    which is enclosed in some silo,
  • 48:52 - 48:55
    and some software
    which will not run forever.
  • 48:56 - 48:59
    And so, it makes
    absolute sense in my eyes.
  • 49:00 - 49:03
    At the time, Wikidata
    wasn't around, but now it is,
  • 49:03 - 49:07
    and it makes absolute sense
    for our project to early on
  • 49:07 - 49:13
    discuss sustainability in the context
    of how could we put this
  • 49:13 - 49:17
    into a larger ecosystem like Wikidata,
  • 49:19 - 49:21
    and discuss this with the data community
  • 49:21 - 49:27
    what is notable and what makes sense
    to add to Wikidata,
  • 49:27 - 49:32
    and what makes sense to keep
    in a proprietary form.
  • 49:32 - 49:38
    Maybe in a simpler form
    than a sophisticated application,
  • 49:38 - 49:43
    but make it discoverable
    and make it linked to the large data cloud
  • 49:43 - 49:46
    instead of investing lots of money
  • 49:46 - 49:53
    into some silo which will not sustain.
  • 49:55 - 50:00
    Yeah, as I said before,
    in the project I was presenting here,
  • 50:00 - 50:05
    there are dualities between Wikidata
    and classical linked open data approaches.
  • 50:05 - 50:08
    So, it's not so much about
    setting up a private Wikibase.
  • 50:11 - 50:15
    Like one challenge we have had,
    and, of course, in Wikidata,
  • 50:15 - 50:18
    is that when you ingest
    your own data there,
  • 50:18 - 50:20
    you also have to do some housekeeping
  • 50:21 - 50:24
    of people, of other people, actually.
  • 50:24 - 50:28
    And that can put people off,
    [or it also means] that we will address it
  • 50:28 - 50:30
    just step by step.
  • 50:30 - 50:33
    So, there will be, at the moment,
    a database living--
  • 50:34 - 50:36
    in classical linked open data
  • 50:36 - 50:38
    and we're starting to link it
    with Wikidata,
  • 50:38 - 50:41
    and it's a continuous process to find out
  • 50:42 - 50:48
    for which areas the most data
    will be eventually on Wikidata,
  • 50:48 - 50:52
    and for which areas it will actually
    live on other databases.
  • 50:53 - 50:57
    Obviously, we'll have challenges
    regarding synchronization,
  • 50:57 - 50:59
    as we probably all have,
  • 50:59 - 51:02
    because that linked data field,
  • 51:02 - 51:05
    where we still have
    to negotiate who we trust,
  • 51:05 - 51:09
    who has authority about what.
  • 51:14 - 51:16
    (assistant) Other questions?
  • 51:24 - 51:26
    (woman) Thank you.
  • 51:26 - 51:31
    So, fully agree with that issue of--
  • 51:34 - 51:41
    where to put the boundary
    between why do we put data on Wikidata,
  • 51:43 - 51:49
    or why do we keep them,
    and create, manage, and maintain them
  • 51:49 - 51:53
    in local databases and for what purposes.
  • 51:54 - 51:57
    And I think that
    this is a large discussion
  • 51:57 - 52:02
    that goes beyond just the excitement
  • 52:02 - 52:07
    of putting data on Wikidata
    because it is public,
  • 52:07 - 52:11
    because it serves humanity, because--
  • 52:11 - 52:13
    well, there are cool tools,
  • 52:13 - 52:18
    and things are more complicated
    in real life, I think.
  • 52:19 - 52:24
    Well, despite this,
    it's quite an interesting discussion.
  • 52:24 - 52:30
    And then this is another issue, also,
    or another problem that is being discussed
  • 52:30 - 52:35
    in this event in different panels.
  • 52:36 - 52:41
    It is on one side, have your own database,
  • 52:41 - 52:43
    whatever the technology is
  • 52:43 - 52:47
    and publish things on Wikidata,
  • 52:47 - 52:51
    or build your own system
  • 52:51 - 52:55
    of creating and managing information
  • 52:55 - 52:58
    on the Wikibase technology.
  • 52:59 - 53:04
    And then, synchronize or whatever--
    do federation or things,
  • 53:04 - 53:08
    so it's a matter
    of technology that is used,
  • 53:09 - 53:15
    and the fact that you use Wikidata
    just for publishing,
  • 53:15 - 53:19
    or the infrastructure
    that is underneath Wikidata
  • 53:19 - 53:23
    to create and manage your data.
  • 53:27 - 53:31
    I mean, we had a discussion
  • 53:31 - 53:34
    about the Wikibase panel,
  • 53:34 - 53:37
    and there will be other discussions here,
  • 53:37 - 53:41
    but things are
    on different levels, I think.
  • 53:42 - 53:48
    Maybe [you sort of get] to that discussion
    about Wikibase or Wikidata--
  • 53:49 - 53:52
    I think it's problematic
    that we are focusing so much
  • 53:52 - 53:56
    on this Wikibase infrastructure,
    because there are other infrastructures,
  • 53:56 - 53:59
    like in the area of performing arts.
  • 54:00 - 54:04
    We have another complementary community,
    which is MusicBrainz
  • 54:04 - 54:09
    that runs on their own platform
    that provides linked open data,
  • 54:10 - 54:13
    and as I understand it,
  • 54:14 - 54:17
    there's agreement
    within the Wikidata community
  • 54:17 - 54:20
    that we're not going
    to double all their data--
  • 54:20 - 54:24
    we're not going to copy all their data,
    but we accept that they're complementary.
  • 54:25 - 54:30
    So, what will happen when you start
    integrating this data in Wikipedia?
  • 54:30 - 54:32
    Infoboxes, for example.
  • 54:32 - 54:36
    Would we be able to pull that data
    directly from their SPARQL endpoint?
  • 54:37 - 54:40
    Or would we be obliged
    to kind of copy all the data,
  • 54:40 - 54:42
    and what kind of processes
    are involved in that?
  • 54:42 - 54:45
    (woman) Discussions are open, I think,
  • 54:45 - 54:50
    because within this event,
    you have both interested communities--
  • 54:50 - 54:52
    those that are interested in Wikibase,
  • 54:52 - 54:54
    and those that are interested in Wikidata,
  • 54:54 - 54:56
    and those who are interested in both.
  • 54:56 - 55:00
    Yeah, but we're not going
    to oblige them to move to Wikibase.
  • 55:00 - 55:03
    - (woman) Not necessarily.
    - MusicBrainz is not running on Wikibase.
  • 55:03 - 55:07
    (woman) No, I just wanted to say
    that you have separate problems,
  • 55:07 - 55:11
    sometimes interrelated,
    sometimes not completely separated.
  • 55:12 - 55:17
    And I had another question or remark
  • 55:17 - 55:22
    regarding the management of hierarchies
    in controlled vocabularies,
  • 55:22 - 55:26
    like thesauri, like you have in Finto.
  • 55:28 - 55:31
    You do have the places
  • 55:32 - 55:35
    in the Maori
  • 55:36 - 55:41
    Subject Headings.
  • 55:42 - 55:48
    Well, they have to deal with
    the management of concepts in hierarchy.
  • 55:48 - 55:52
    What is your take, your opinion
  • 55:52 - 55:57
    about the possibility
    of managing these controlled
  • 55:59 - 56:02
    knowledge organization
    systems in Wikidata?
  • 56:07 - 56:10
    I think in the case
    of Finto and YSO places,
  • 56:11 - 56:14
    the repository will be a collection
  • 56:14 - 56:19
    of several sources, eventually.
  • 56:19 - 56:22
    So, it is in flux, anyway.
  • 56:22 - 56:25
    So, we don't have to necessarily--
  • 56:25 - 56:28
    well, I don't represent
    the National Library,
  • 56:28 - 56:32
    but in that possible project,
  • 56:32 - 56:36
    we would not have
    to maintain an existing--
  • 56:36 - 56:39
    or fight with an existing structure.
  • 56:39 - 56:45
    So, in that sense, it is an area
    open for exploration.
  • 56:49 - 56:52
    The Maori Subject Headings
    seem to lend themselves ideally
  • 56:52 - 56:54
    to Wikidata structure,
  • 56:54 - 56:57
    but the licensing,
    of course, forbids that.
  • 56:57 - 56:59
    I suspect that if the licensing
    were different
  • 56:59 - 57:02
    and they were put into Wikidata,
  • 57:02 - 57:05
    as soon as somebody decided
    they didn't like the hierarchy
  • 57:05 - 57:06
    and started to change things,
  • 57:06 - 57:10
    there would be an immediate outcry
    from people who worked very hard
  • 57:10 - 57:12
    to create that structure
  • 57:12 - 57:16
    and get the sign-off
    from various different Maori
  • 57:16 - 57:18
    that was the current hierarchy.
  • 57:18 - 57:21
    So, that's an issue to try and resolve.
  • 57:24 - 57:27
    I think in terms of knowledge
    organization systems,
  • 57:27 - 57:28
    they are all different.
  • 57:28 - 57:32
    And I'm not sure
    if it would be a good idea
  • 57:32 - 57:37
    to represent different hierarchies
    in Wikidata as such,
  • 57:38 - 57:42
    but it maybe makes sense
    to think about overlays
  • 57:43 - 57:45
    of the data.
  • 57:45 - 57:48
    So, to do mappings on the content level.
  • 57:49 - 57:54
    For example, ZBW publishes the
    Thesaurus for Economics.
  • 57:55 - 57:59
    And this thesaurus has its own hierarchy,
  • 58:00 - 58:04
    and, of course, it would be possible
    to project the hierarchy
  • 58:04 - 58:08
    of this thesaurus into Wikidata concepts
  • 58:08 - 58:12
    without actually storing
    this kind of structure
  • 58:12 - 58:15
    as an alternative structure
    within Wikidata
  • 58:15 - 58:19
    which would cause a lot of confusion.
  • 58:19 - 58:25
    But I think we should think
    of Wikidata, also, as a pool of concepts
  • 58:25 - 58:30
    which can be connected on layers
    which are outside,
  • 58:30 - 58:33
    and which give another view of the world
  • 58:33 - 58:39
    which does not necessarily have to be
    within Wikidata.
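The overlay idea can be illustrated with a small sketch: the hierarchy stays in the external thesaurus, a mapping connects each thesaurus concept to a Wikidata item, and the hierarchy is projected onto Wikidata concepts on demand without being stored there. All identifiers below (`T1` to `T3`, `Q_A` to `Q_C`) are hypothetical placeholders, not real thesaurus or Wikidata IDs.

```python
# Minimal sketch of an external "overlay" on Wikidata concepts.
# The hierarchy lives outside Wikidata; only the concept mapping links the two.

# External hierarchy: thesaurus concept -> its broader thesaurus concept.
broader = {"T3": "T2", "T2": "T1"}

# Mapping maintained outside Wikidata: thesaurus concept -> Wikidata item.
to_wikidata = {"T1": "Q_A", "T2": "Q_B", "T3": "Q_C"}


def projected_ancestors(concept: str) -> list[str]:
    """Walk up the external hierarchy and return the mapped Wikidata items."""
    result = []
    current = broader.get(concept)
    while current is not None:
        qid = to_wikidata.get(current)
        if qid is not None:  # unmapped concepts are skipped, not guessed
            result.append(qid)
        current = broader.get(current)
    return result


print(projected_ancestors("T3"))
```

Because the `broader` relation is never written to Wikidata, editors there cannot break the external structure, and the thesaurus can keep its own view of the world alongside any other overlay.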
  • 58:46 - 58:48
    (assistant) Alright. Some other questions?
  • 58:49 - 58:52
    Otherwise-- okay.
  • 58:55 - 58:58
    (man 2) Joachim, I just wanted
    to follow up on that last point.
  • 58:58 - 59:01
    So, these layers, as you picture it,
  • 59:02 - 59:04
    they would be maintained externally
  • 59:04 - 59:07
    and somehow integrated
  • 59:09 - 59:12
    with Wikidata from the Wikidata side,
  • 59:12 - 59:17
    or have you thought a bit further
  • 59:17 - 59:19
    about how that might be managed?
  • 59:22 - 59:25
    Actually, no, I have no--
  • 59:25 - 59:30
    I have done experiments
    with ZBW and Wikidata.
  • 59:31 - 59:33
    I was [inaudible] here at Wikidata.
  • 59:33 - 59:39
    But I think this is
    a whole new complex thing,
  • 59:39 - 59:46
    and so, it's up to [discuss],
    [to give up a lot of control]
  • 59:46 - 59:48
    to do such things.
  • 59:48 - 59:50
    But it has to be figured out.
  • 59:57 - 59:58
    Should we take one more?
  • 59:58 - 60:00
    (man 3) Ah, great.
  • 60:00 - 60:03
    I was just wondering
    about the kakapo project.
  • 60:04 - 60:05
    Uh-hmm.
  • 60:05 - 60:11
    (man 3) Okay. So, did you get
    any pushback from the Wikidata community
  • 60:11 - 60:15
    about having individual animals
    as items?
  • 60:16 - 60:17
    Not so far.
  • 60:17 - 60:19
    (man 3) Has anyone heard
    about this before?
  • 60:19 - 60:22
    Is it "not so far" because
    no one has heard about it yet?
  • 60:23 - 60:26
    There's been a small discussion
    for quite some time now--
  • 60:26 - 60:29
    those people interested
    in this sort of thing in Wikidata,
  • 60:29 - 60:32
    and we all seem to think
    that it's a natural extension
  • 60:32 - 60:36
    of giving individual Wikidata items
    to a famous racehorse
  • 60:36 - 60:40
    or someone's cat, which--
    that's modeled pretty well.
  • 60:40 - 60:44
    I guess just the audacious thing
    is putting the entire species in there.
  • 60:44 - 60:48
    But I think it's perfectly manageable.
  • 60:48 - 60:50
    (man 3) Don't try it with cats and dogs.
  • 60:50 - 60:52
    (laughter)
  • 60:52 - 60:54
    (assistant) Okay. I think
    the time is finished.
  • 60:54 - 60:56
    Thank you very much for attending.
  • 60:56 - 60:59
    I think the speakers will still be open
    for questions in the break.
  • 60:59 - 61:01
    And have fun.
  • 61:01 - 61:02
    Thank you very much.
  • 61:02 - 61:04
    (applause)
Title:
cdn.media.ccc.de/.../wikidatacon2019-1035-eng-Data_import_process_overview_hd.mp4
Video Language:
English
Duration:
54:29
