
cdn.media.ccc.de/.../wikidatacon2019-8-eng-Libraries_panel_hd.mp4

  • 0:07 - 0:12
    I work as a teacher
    at the University of Alicante,
  • 0:12 - 0:17
    where I recently obtained my PhD
    on data libraries and linked open data.
  • 0:17 - 0:19
    And I'm also a software developer
  • 0:19 - 0:22
    at the Biblioteca Virtual
    Miguel de Cervantes.
  • 0:22 - 0:24
    And today, I'm going to talk
    about data quality.
  • 0:28 - 0:32
    Well, those are my colleagues
    at the university.
  • 0:32 - 0:37
    And as you may know, many organizations
    are publishing their data
  • 0:37 - 0:38
    as linked open data--
  • 0:38 - 0:41
    for example,
    the National Library of France,
  • 0:41 - 0:46
    the National Library of Spain,
    us, which is Cervantes Virtual,
  • 0:46 - 0:49
    the British National Bibliography,
  • 0:49 - 0:52
    the Library of Congress and Europeana.
  • 0:52 - 0:56
    All of them provide a SPARQL endpoint,
  • 0:56 - 0:59
    which is useful in order
    to retrieve the data.
  • 0:59 - 1:01
    And if I'm not wrong,
  • 1:01 - 1:06
    the Library of Congress only provides
    the data as a dump that you can't use.
  • 1:08 - 1:14
    When we published our repository
    as linked open data,
  • 1:14 - 1:17
    my idea was that it would be reused
    by other institutions.
  • 1:18 - 1:24
    But what if I'm an institution
    that wants to enrich its data
  • 1:24 - 1:27
    with data from other data libraries?
  • 1:28 - 1:31
    Which data set should I use?
  • 1:31 - 1:34
    Which data set is better
    in terms of quality?
  • 1:37 - 1:41
    The benefits of the evaluation
    of data quality in libraries are many.
  • 1:41 - 1:47
    For example, methodologies can be improved
    to include new criteria
  • 1:47 - 1:49
    for assessing the quality.
  • 1:49 - 1:55
    And also, organizations can benefit
    from best practices and guidelines
  • 1:55 - 1:58
    in order to publish their data
    as linked open data.
  • 2:00 - 2:03
    What do we need
    in order to assess the quality?
  • 2:03 - 2:07
    Well, obviously, a set of candidates
    and a set of features.
  • 2:07 - 2:10
    For example, do they have
    a SPARQL endpoint,
  • 2:10 - 2:13
    do they have a web interface,
    how many publications do they have,
  • 2:13 - 2:18
    how many vocabularies do they use,
    how many Wikidata properties do they have,
  • 2:18 - 2:21
    and where can I get those candidates?
  • 2:21 - 2:22
    I used the LOD Cloud--
  • 2:22 - 2:27
    but when I was doing this slide,
    I thought about using Wikidata
  • 2:28 - 2:30
    in order to retrieve those candidates.
  • 2:30 - 2:34
    For example, getting entities
    of type data library,
  • 2:34 - 2:36
    which have a SPARQL endpoint.
  • 2:36 - 2:39
    You have here the link.
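    A minimal SPARQL sketch of such a candidate query (not the exact query from
    the talk; the endpoint property P5305 and the "library" class filter are
    assumptions to adjust):

        SELECT ?library ?libraryLabel ?endpoint WHERE {
          ?library wdt:P5305 ?endpoint ;   # assumed: P5305 = SPARQL endpoint URL
                   wdt:P31 ?class .        # instance of some class
          ?class rdfs:label ?classLabel .
          # keep only candidates whose class label mentions "library"
          FILTER(LANG(?classLabel) = "en" && CONTAINS(LCASE(?classLabel), "library"))
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }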
  • 2:41 - 2:45
    And I came up with these data libraries.
  • 2:45 - 2:50
    The first one uses the Bibliographic Ontology
    as its main vocabulary,
  • 2:50 - 2:54
    and the others are based,
    more or less, on FRBR,
  • 2:54 - 2:57
    which is a vocabulary published by IFLA.
  • 2:57 - 3:00
    And this is just an example
    of how we could compare
  • 3:00 - 3:04
    data libraries using
    bubble charts on Wikidata.
  • 3:04 - 3:09
    And this is just an example comparing
    how many Wikidata properties
  • 3:09 - 3:11
    are per data library.
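    A hedged sketch of that kind of comparison; the identifier properties below
    (BnF ID P268, Library of Congress authority ID P244, BNE ID P950) are only
    examples standing in for the ones compared on the slide:

        #defaultView:BubbleChart
        SELECT ?property ?propertyLabel (COUNT(?item) AS ?count) WHERE {
          VALUES ?directProp { wdt:P268 wdt:P244 wdt:P950 }
          ?item ?directProp ?id .                       # items carrying the identifier
          ?property wikibase:directClaim ?directProp .  # map the wdt: predicate back to the property entity
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }
        GROUP BY ?property ?propertyLabel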
  • 3:13 - 3:16
    Well, how can we measure quality?
  • 3:16 - 3:18
    There are different methodologies,
  • 3:18 - 3:20
    for example, FRBR 1,
  • 3:20 - 3:24
    which provides a set of criteria
    grouped by dimensions,
  • 3:24 - 3:28
    and those in green
    are the ones that I found--
  • 3:28 - 3:31
    that I could assess by means of Wikidata.
  • 3:34 - 3:39
    And we also found that we
    could define new criteria,
  • 3:39 - 3:45
    for example, a new one to evaluate
    the number of duplications in Wikidata.
  • 3:45 - 3:47
    We use those properties.
  • 3:47 - 3:50
    And this is an example of SPARQL,
  • 3:50 - 3:54
    in order to count the number
    of duplicates for a property.
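    A hedged sketch of such a duplicate check, using the BnF ID (P268) as an
    example identifier property; the properties actually checked in the study
    may differ:

        SELECT ?id (COUNT(?item) AS ?items) WHERE {
          ?item wdt:P268 ?id .          # P268 = Bibliothèque nationale de France ID
        }
        GROUP BY ?id
        HAVING (COUNT(?item) > 1)       # identifier values attached to more than one item
        ORDER BY DESC(?items)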
  • 3:57 - 4:00
    And about the results--
    well, at the moment of doing this study,
  • 4:00 - 4:05
    not the slides, there was no property
    for the British National Bibliography.
  • 4:06 - 4:08
    They don't provide provenance information,
  • 4:08 - 4:12
    which could be useful
    for metadata enrichment.
  • 4:12 - 4:15
    And they don't allow
    editing the information.
  • 4:15 - 4:17
    So, we've been talking
    about Wikibase the whole weekend,
  • 4:17 - 4:21
    and maybe we should try to adopt
    Wikibase as an interface.
  • 4:23 - 4:25
    And they are focused on their own content,
  • 4:25 - 4:29
    and this is just the SPARQL query
    based on Wikidata
  • 4:29 - 4:31
    in order to assess the population.
  • 4:32 - 4:36
    And the BnF provides labels
    in multiple languages,
  • 4:36 - 4:39
    and they all use self-describing URIs,
  • 4:39 - 4:43
    which means that in the URI,
    they have the type of entity,
  • 4:43 - 4:48
    which allows the human reader
    to understand what they are using.
  • 4:51 - 4:55
    And more results: they provide
    different output formats,
  • 4:55 - 4:59
    they use external vocabularies.
  • 4:59 - 5:01
    Only the British National Bibliography
  • 5:01 - 5:04
    provides machine-readable
    licensing information.
  • 5:04 - 5:09
    And up to one-third of the instances
    are connected to external repositories,
  • 5:09 - 5:11
    which is really nice.
  • 5:13 - 5:18
    And this study, this work,
    has been done in our Labs team--
  • 5:18 - 5:22
    a lab in a GLAM is a group of people
  • 5:22 - 5:28
    who want to explore new ways
  • 5:28 - 5:30
    of reusing data collections.
  • 5:31 - 5:35
    And there's a community
    led by the British Library,
  • 5:35 - 5:37
    and in particular, Mahendra Mahey,
  • 5:37 - 5:41
    and we had a first event in London,
  • 5:41 - 5:43
    and another one in Copenhagen,
  • 5:43 - 5:45
    and we're going to have a new one in May
  • 5:45 - 5:48
    at the Library of Congress in Washington.
  • 5:49 - 5:52
    And we are now 250 people.
  • 5:52 - 5:56
    And I'm so glad that I found
    somebody here at the WikidataCon
  • 5:56 - 5:59
    who has just joined us--
  • 5:59 - 6:01
    Sylvia from [inaudible], Mexico.
  • 6:01 - 6:05
    And I'd like to invite you
    to our community,
  • 6:05 - 6:10
    since you may be part
    of a GLAM institution.
  • 6:11 - 6:13
    So, we can talk later
    if you want to know about this.
  • 6:15 - 6:17
    And this--it's all about people.
  • 6:17 - 6:20
    This is me, people
    from the British Library,
  • 6:20 - 6:25
    Library of Congress, Universities,
    and National Libraries in Europe.
  • 6:25 - 6:28
    And there's a link here
    in case you want to know more.
  • 6:28 - 6:33
    And, well, last month,
    we decided to meet in Doha
  • 6:33 - 6:37
    in order to write a book
    about how to create a lab in a GLAM.
  • 6:39 - 6:43
    And they chose 15 people,
    and I was so lucky to be there.
  • 6:45 - 6:49
    And the book follows
    the Booksprint methodology,
  • 6:49 - 6:52
    which means that nothing
    is prepared beforehand.
  • 6:52 - 6:53
    All is done there in a week.
  • 6:53 - 6:56
    And believe me, it was really hard work
  • 6:56 - 6:59
    to have the whole book
    done in that week.
  • 7:00 - 7:04
    And I'd like to introduce you to the book,
    which will be published--
  • 7:04 - 7:06
    it was supposed to be published this week,
  • 7:06 - 7:08
    but it will be next week.
  • 7:09 - 7:13
    And it will be published openly,
    so you can have it,
  • 7:13 - 7:16
    and I can show you
    a little bit later if you want.
  • 7:16 - 7:18
    And those are the authors.
  • 7:18 - 7:20
    I'm here-- I'm so happy, too.
  • 7:20 - 7:22
    And those are the institutions--
  • 7:22 - 7:27
    Library of Congress, British Library--
    and this is the title.
  • 7:27 - 7:30
    And now, I'd like to show you--
  • 7:31 - 7:34
    a map that I'm doing.
  • 7:34 - 7:37
    We are launching a website
    for our community,
  • 7:37 - 7:43
    and I'm in charge of creating a map
    with our institutions there.
  • 7:43 - 7:45
    This is not finished.
  • 7:45 - 7:50
    But this is just SPARQL, and below,
  • 7:52 - 7:53
    we see the map.
  • 7:53 - 7:58
    And we see here
    the new people that I found, here,
  • 7:58 - 8:00
    at the WikidataCon--
    I'm so happy for this.
  • 8:01 - 8:06
    And here we have the data library
    of my university,
  • 8:06 - 8:08
    and many other institutions.
  • 8:09 - 8:11
    Also, from Australia--
  • 8:12 - 8:13
    if I can do it.
  • 8:14 - 8:16
    Well, here, we have some links.
  • 8:20 - 8:21
    There you go.
  • 8:21 - 8:23
    Okay, this is not finished.
  • 8:24 - 8:26
    We are still working on this,
    and that's all.
  • 8:26 - 8:28
    Thank you very much for your attention.
  • 8:29 - 8:34
    (applause)
  • 8:42 - 8:48
    [inaudible]
  • 8:59 - 9:01
    Good morning, everybody.
  • 9:01 - 9:02
    I'm Olaf Janssen.
  • 9:02 - 9:04
    I'm the Wikimedia coordinator
  • 9:04 - 9:06
    at the National Library
    of the Netherlands.
  • 9:06 - 9:08
    And I would like to share my work,
  • 9:08 - 9:12
    which I'm doing about creating
    Linked Open Data
  • 9:12 - 9:15
    for Dutch Public Libraries using Wikidata.
  • 9:18 - 9:21
    And my story starts roughly a year ago
  • 9:21 - 9:25
    when I was at the GLAM Wiki conference
    in Tel Aviv, in Israel.
  • 9:25 - 9:28
    And there are two men
    with very similar shirts,
  • 9:28 - 9:31
    and equally similar hairdos, [Matt]...
  • 9:31 - 9:33
    (laughter)
  • 9:33 - 9:35
    And on the left, that's me.
  • 9:35 - 9:39
    And a year ago, I didn't have
    any practical knowledge and skills
  • 9:39 - 9:40
    about Wikidata.
  • 9:40 - 9:43
    I looked at Wikidata,
    and I looked at the items,
  • 9:43 - 9:45
    and I played with it.
  • 9:45 - 9:47
    But I wasn't able to make a SPARQL query
  • 9:47 - 9:50
    or to do data modeling
    with the right shape expression.
  • 9:51 - 9:53
    That's a year ago.
  • 9:53 - 9:57
    And on the lefthand side,
    that's Simon Cobb, user: Sic19.
  • 9:57 - 10:00
    And I was talking to him,
    because, just before,
  • 10:01 - 10:02
    he had given a presentation
  • 10:02 - 10:06
    about improving the coverage
    of public libraries in Wikidata.
  • 10:07 - 10:09
    And I was very inspired by his talk.
  • 10:10 - 10:13
    And basically, he was talking
    about adding basic data
  • 10:13 - 10:15
    about public libraries.
  • 10:15 - 10:19
    So, the name of the library, if available,
    the photo of the building,
  • 10:19 - 10:21
    the address data of the library,
  • 10:21 - 10:25
    the geo-coordinates
    latitude and longitude,
  • 10:25 - 10:26
    and some other things,
  • 10:26 - 10:29
    all with source references.
  • 10:31 - 10:35
    And what I was very impressed
    about a year ago was this map.
  • 10:35 - 10:37
    This is a map about
    public libraries in the U.K.
  • 10:37 - 10:39
    with all the colors.
  • 10:39 - 10:43
    And you can see that all the libraries
    are layered by library organizations.
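    A sketch of that kind of layered map query; the class Q28564 ("public
    library") and the operator property P137 are assumptions about how the
    organizations are attached:

        #defaultView:Map
        SELECT ?library ?libraryLabel ?coords ?layer ?layerLabel WHERE {
          ?library wdt:P31/wdt:P279* wd:Q28564 ;      # instance of (a subclass of) public library
                   wdt:P17 wd:Q145 ;                  # country: United Kingdom
                   wdt:P625 ?coords .                 # coordinate location for the map
          OPTIONAL { ?library wdt:P137 ?operator . }  # operator = library organization
          BIND(COALESCE(?operator, ?library) AS ?layer)  # one colour layer per organization
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }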
  • 10:43 - 10:46
    And when he showed this,
    I was really, "Wow, that's cool."
  • 10:47 - 10:49
    So, then, one minute later, I thought,
  • 10:49 - 10:53
    "Well, let's do it
    for my own country, then."
  • 10:53 - 10:55
    (laughter)
  • 10:57 - 10:59
    And something about public libraries
    in the Netherlands--
  • 10:59 - 11:03
    there are about 1,300 library
    branches in our country,
  • 11:03 - 11:07
    grouped into 160 library organizations.
  • 11:08 - 11:11
    And you might wonder why
    do I want to do this project?
  • 11:11 - 11:14
    Well, first of all,
    for the common good, for society,
  • 11:14 - 11:17
    because I think using Wikidata,
  • 11:17 - 11:21
    and from there,
    creating Wikipedia articles,
  • 11:21 - 11:23
    and opening it up
    via the linked open data cloud--
  • 11:23 - 11:29
    it's improving visibility and reusability
    of public libraries in the Netherlands.
  • 11:30 - 11:32
    And my second goal was actually
    a more personal one,
  • 11:32 - 11:37
    because a year ago, I had this
    yearly evaluation with my manager,
  • 11:37 - 11:42
    and we decided it was a good idea
    that I got more practical skills
  • 11:42 - 11:46
    on linked open data, data modeling,
    and also on Wikidata.
  • 11:46 - 11:50
    And of course, I wanted to be able to make
    these kinds of maps myself.
  • 11:50 - 11:51
    (laughter)
  • 11:54 - 11:57
    Then you might wonder
    why do I want to do this?
  • 11:57 - 12:02
    Isn't there already enough basic
    library data out there in the Netherlands
  • 12:02 - 12:04
    to have a good coverage?
  • 12:06 - 12:08
    So, let me show you some of the websites
  • 12:08 - 12:13
    that are available to discover
    address and location information
  • 12:13 - 12:15
    about Dutch public libraries.
  • 12:15 - 12:18
    And the first one is this one--
    Gidsvoornederland.nl--
  • 12:18 - 12:21
    and that's the official
    public library inventory
  • 12:21 - 12:23
    maintained by my library,
    the National Library.
  • 12:24 - 12:29
    And you can look up addresses
    and geo-coordinates on that website.
  • 12:30 - 12:33
    Then there is this site,
    Bibliotheekinzicht--
  • 12:33 - 12:37
    this is also an official website
    maintained by my National Library.
  • 12:37 - 12:39
    And this is about
    public library statistics.
  • 12:41 - 12:44
    Then there is another one,
    debibliotheken.nl--
  • 12:44 - 12:46
    as you can see there is also
    address information
  • 12:46 - 12:50
    about library organizations,
    not about individual branches.
  • 12:52 - 12:55
    And there's even this one,
    which also has address information.
  • 12:57 - 12:59
    And of course, there's something
    like Google Maps,
  • 12:59 - 13:02
    which also has all the names
    and the locations and the addresses.
  • 13:03 - 13:06
    And this one, the International
    Library of Technology,
  • 13:06 - 13:10
    which has a worldwide
    inventory of libraries,
  • 13:10 - 13:11
    including the Netherlands.
  • 13:13 - 13:15
    And I even discovered there is a data set
  • 13:15 - 13:18
    you can buy for 50 euros or so
    to download it.
  • 13:18 - 13:21
    And there is also--I didn't download it,
  • 13:21 - 13:24
    but there seems to be address
    information available.
  • 13:24 - 13:30
    You might wonder: is this kind of data
    good enough for the purposes I had?
  • 13:32 - 13:37
    So, this is my birthday list
    for my ideal public library data list.
  • 13:37 - 13:39
    And what's on my list?
  • 13:39 - 13:44
    First of all, the data I want to have
    must be up-to-date-ish--
  • 13:44 - 13:46
    it must be fairly up-to-date.
  • 13:46 - 13:49
    So, it doesn't have to be real time,
  • 13:49 - 13:51
    but let's say, a couple
    of months, or half a year,
  • 13:53 - 13:57
    behind the official publication--
    that's okay for my purposes.
  • 13:58 - 14:01
    And I want to have both
    the library branches
  • 14:01 - 14:03
    and the library organizations.
  • 14:04 - 14:08
    Then I want my data to be structured,
    because it has to be machine-readable.
  • 14:08 - 14:12
    It has to be in an open file format,
    such as CSV or JSON or RDF.
  • 14:13 - 14:15
    It has to be linked
    to other resources preferably.
  • 14:16 - 14:22
    And the license on the data
    needs to be explicitly public domain or CC0.
  • 14:24 - 14:26
    Then, I would like my data to have an API,
  • 14:27 - 14:31
    which must be public, free,
    and preferably also anonymous
  • 14:31 - 14:35
    so you don't have to use an API key
    or register an account.
  • 14:36 - 14:39
    And I also want to have
    a SPARQL interface.
  • 14:41 - 14:44
    So, now, these are all the sites
    I just showed you.
  • 14:44 - 14:46
    And I'm going to make a big grid.
  • 14:47 - 14:50
    And then, this is about
    the evaluation I did.
  • 14:51 - 14:54
    I'm not going into it,
    but there is no single column
  • 14:54 - 14:56
    which has all green check marks.
  • 14:56 - 14:58
    That's the important thing to take away.
  • 14:59 - 15:04
    And so, in summary, there was no
    public, free linked open data
  • 15:04 - 15:09
    for Dutch public libraries available
    before I started my project.
  • 15:09 - 15:13
    So, this was the ideal motivation
    to actually work on it.
  • 15:15 - 15:17
    So, that's what I've been doing
    for a year now.
  • 15:18 - 15:23
    And I've been adding libraries bit by bit,
    organization by organization to Wikidata.
  • 15:23 - 15:26
    I created also a project website on it.
  • 15:27 - 15:30
    It's still rather messy,
    but it has all the information,
  • 15:30 - 15:33
    and I try to keep it
    as up-to-date as possible.
  • 15:33 - 15:36
    And also all the SPARQL queries
    you can see are linked from here.
  • 15:38 - 15:40
    And I'm just adding
    really basic information.
  • 15:40 - 15:44
    You see the instances,
    images if available,
  • 15:44 - 15:47
    addresses, locations, et cetera,
    municipalities.
  • 15:49 - 15:53
    And where possible, I also try to link
    the libraries to external identifiers.
  • 15:56 - 15:58
    And then, you can really easily--
    as we all know--
  • 15:58 - 16:03
    generate some Listeria lists
    with public libraries grouped
  • 16:03 - 16:05
    by organizations, for instance.
  • 16:05 - 16:08
    Or using SPARQL queries,
    you can also do aggregation on data--
  • 16:08 - 16:11
    let's say, give me all
    the municipalities in the Netherlands
  • 16:11 - 16:15
    and the number of library branches
    in all the municipalities.
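    A sketch of that aggregation; the branch class Q28564 and the use of P131
    for the municipality are assumptions about how the branches are modeled:

        SELECT ?municipality ?municipalityLabel (COUNT(DISTINCT ?branch) AS ?branches) WHERE {
          ?branch wdt:P31/wdt:P279* wd:Q28564 ;   # public library branch
                  wdt:P17 wd:Q55 ;                # country: Netherlands
                  wdt:P131 ?municipality .        # located in the administrative entity
          SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
        }
        GROUP BY ?municipality ?municipalityLabel
        ORDER BY DESC(?branches)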
  • 16:17 - 16:20
    With one click, you can make
    these kinds of photo galleries.
  • 16:22 - 16:24
    And what I set out to do first,
  • 16:24 - 16:26
    you can really create these kinds of maps.
  • 16:27 - 16:30
    And you might wonder,
    "Are there any libraries here or there?"
  • 16:31 - 16:33
    There are--they are not yet in Wikidata.
  • 16:33 - 16:35
    We're still working on that.
  • 16:35 - 16:38
    And actually, last week,
    I spoke with a volunteer,
  • 16:38 - 16:41
    who's helping now
    with entering the libraries.
  • 16:42 - 16:45
    You can really make cool maps in Wikidata,
  • 16:45 - 16:48
    and also, using
    the Kartographer extension,
  • 16:48 - 16:50
    you can use these kinds of maps.
  • 16:52 - 16:54
    And I even took it one step further.
  • 16:54 - 16:57
    I also have some Python skills,
    and some Leaflet skills--
  • 16:57 - 17:00
    so, I created, and I'm quite
    proud of it, actually.
  • 17:00 - 17:03
    I created this library heat map,
    which is fully interactive.
  • 17:03 - 17:06
    You can zoom in to it,
    and you can see all the libraries,
  • 17:07 - 17:09
    and you can also run it off Wiki.
  • 17:09 - 17:11
    So, you can just embed it
    in your own website,
  • 17:11 - 17:13
    and it fully runs interactively.
  • 17:15 - 17:18
    So, now going back to my big scary table.
  • 17:20 - 17:23
    There is one column
    on the right, which is blank.
  • 17:23 - 17:25
    And no surprise, it will be Wikidata.
  • 17:25 - 17:26
    Let's see how it scores there.
  • 17:26 - 17:30
    (cheering)
  • 17:33 - 17:35
    So, I actually think
    of printing this on a T-shirt.
  • 17:35 - 17:37
    (laughter)
  • 17:38 - 17:40
    So, just to summarize this in words,
  • 17:40 - 17:41
    thanks to my project, now,
  • 17:41 - 17:46
    there is public free linked open data
    available for Dutch public libraries.
  • 17:47 - 17:50
    And who can benefit from my effort?
  • 17:50 - 17:52
    Well, all kinds of parties--
  • 17:52 - 17:54
    you see Wikipedia,
    because you can generate lists
  • 17:54 - 17:56
    and overviews and articles,
  • 17:56 - 18:00
    for instance, using this data
    from Wikidata--
  • 18:00 - 18:02
    for our National Library--
  • 18:03 - 18:05
    IFLA also has an inventory
    of worldwide libraries,
  • 18:05 - 18:07
    they can also reuse the data.
  • 18:08 - 18:09
    And especially for Sandra,
  • 18:10 - 18:13
    it's also important for the Ministry--
    Dutch Ministry of Culture--
  • 18:13 - 18:16
    because Sandra is going
    to have a talk about Wikidata
  • 18:16 - 18:18
    with the Ministry this Monday,
    next Monday.
  • 18:20 - 18:22
    And also, on the righthand side,
    for instance,
  • 18:24 - 18:27
    Amazon with Alexa, the assistant,
  • 18:27 - 18:29
    they're also using Wikidata,
  • 18:29 - 18:31
    so you can imagine that,
  • 18:31 - 18:33
    if you're looking for public
    library information,
  • 18:33 - 18:37
    they can also use Wikidata for that.
  • 18:39 - 18:42
    Because one year ago,
    Simon Cobb inspired me
  • 18:42 - 18:44
    to do this project,
    I would like to call upon you,
  • 18:44 - 18:46
    if you have time available,
  • 18:46 - 18:50
    and if you have data from your own country
    about public libraries,
  • 18:52 - 18:54
    make the coverage better,
    add more red dots,
  • 18:55 - 18:57
    and of course, I'm willing
    to help you with that.
  • 18:57 - 18:59
    And Simon is also willing
    to help with this.
  • 19:00 - 19:01
    And so, I hope next year, somebody else
  • 19:01 - 19:04
    will be at this conference
    or another conference
  • 19:04 - 19:06
    and there will be more
    red dots on the map.
  • 19:08 - 19:09
    Thank you very much.
  • 19:09 - 19:13
    (applause)
  • 19:18 - 19:20
    Thank you, Olaf.
  • 19:20 - 19:24
    Next we have Ursula Oberst
    and Heleen Smits
  • 19:24 - 19:28
    presenting how can a small
    research library benefit from Wikidata:
  • 19:28 - 19:31
    enhancing library products using Wikidata.
  • 19:54 - 19:58
    Okay. Good morning.
    My name is Heleen Smits.
  • 19:59 - 20:02
    And my colleague,
    Ursula Oberst--where are you?
  • 20:02 - 20:04
    (laughter)
  • 20:04 - 20:09
    And I work at the Library
    of the African Studies Center
  • 20:09 - 20:11
    in Leiden, in the Netherlands.
  • 20:11 - 20:15
    And the African Studies Center
    is a center devoted--
  • 20:15 - 20:21
    is an academic institution
    devoted entirely to the study of Africa,
  • 20:21 - 20:24
    focusing on Humanities and Social Studies.
  • 20:25 - 20:28
    We used to be an independent
    research organization,
  • 20:28 - 20:33
    but in 2016, we became part
    of Leiden University,
  • 20:33 - 20:38
    and our catalog was integrated
    into the larger university catalog.
  • 20:39 - 20:44
    Though it remained possible
    to do a search in the African Studies part
  • 20:44 - 20:46
    of the Leiden catalog alone,
  • 20:48 - 20:51
    we remained independent in some respects.
  • 20:51 - 20:53
    For example, with respect
    to our thesaurus.
  • 20:55 - 21:00
    And also with respect
    to the products we make for our users,
  • 21:01 - 21:04
    such as acquisition lists
    and web dossiers.
  • 21:05 - 21:12
    And it is in the field of the web dossiers
  • 21:12 - 21:15
    that we have been looking
  • 21:15 - 21:20
    for possible ways to apply Wikidata,
  • 21:20 - 21:23
    and that's the part where Ursula
    will in the second part of this talk
  • 21:24 - 21:27
    show you a bit
    of what we've been doing there.
  • 21:31 - 21:35
    The web dossiers are our collections
  • 21:35 - 21:39
    of titles from our catalog
    that we compile
  • 21:39 - 21:46
    around a theme usually connected
    to, for example, a conference,
  • 21:46 - 21:51
    or to a special event, and actually,
    the most recent web dossier we made
  • 21:51 - 21:56
    was connected to the Year
    of Indigenous Languages,
  • 21:56 - 22:00
    and that was around proverbs
    in African languages.
  • 22:01 - 22:02
    Our first steps--
  • 22:04 - 22:09
    next slide--our first steps
    on the Wiki path as a library,
  • 22:10 - 22:15
    were in 2013, when we were one
    of 12 GLAM institutions
  • 22:15 - 22:16
    in the Netherlands,
  • 22:16 - 22:21
    part of the project
    of Wikipedians in Residence,
  • 22:21 - 22:26
    and we had for two months,
    a Wikipedian in the house,
  • 22:27 - 22:33
    and he gave us training
    in adding articles to Wikipedia,
  • 22:33 - 22:38
    and also, we made a start with uploading
    photo collections to Commons,
  • 22:39 - 22:43
    which always remained a little bit
    dependent on funding, as well,
  • 22:43 - 22:46
    whether we would be able to digitize them,
  • 22:46 - 22:50
    and mostly on having
    a student assistant to do this.
  • 22:51 - 22:55
    But it was actually a great addition
    to what we could offer
  • 22:55 - 22:58
    as an academic library.
  • 22:59 - 23:05
    In May 2018--so, that is my Ursula,
    my colleague Ursula--
  • 23:05 - 23:09
    she started to really explore--
    dive into Wikidata
  • 23:09 - 23:15
    and see what we, as a small
    and not very experienced library
  • 23:15 - 23:18
    in these fields could do with that.
  • 23:25 - 23:27
    So, I mentioned, we have
    our own thesaurus.
  • 23:28 - 23:31
    And this is where we started.
  • 23:31 - 23:35
    This is a thesaurus of 13,000 terms,
  • 23:35 - 23:38
    all in the field of African studies.
  • 23:38 - 23:41
    It contains a lot of African languages,
  • 23:43 - 23:46
    names of ethnic groups in Africa,
  • 23:48 - 23:49
    and other proper names,
  • 23:49 - 23:56
    which are perhaps especially
    interesting for Wikidata.
  • 23:59 - 24:05
    So, it is a real authority-controlled
  • 24:05 - 24:08
    vocabulary
    with 5,000 preferred terms.
  • 24:09 - 24:11
    So, we submitted the request to Wikidata,
  • 24:11 - 24:17
    and that was actually very quickly
    met with a positive response,
  • 24:17 - 24:19
    which was very encouraging for us.
  • 24:23 - 24:26
    Our thesaurus was loaded into Mix-n-Match,
  • 24:26 - 24:32
    and by now, 75% of the terms
  • 24:32 - 24:36
    have been manually matched with Wikidata.
  • 24:38 - 24:42
    So, it means, well, that we are now--
  • 24:43 - 24:48
    we are added as an identifier--
  • 24:48 - 24:52
    for example, if you click
    on Swahili language,
  • 24:52 - 24:57
    you can see what happens in Wikidata;
    and the number that
  • 24:59 - 25:02
    connects to our term--
    the Wikidata item--
  • 25:03 - 25:06
    we enter into our thesaurus,
  • 25:06 - 25:10
    and from there, you can do a search
    directly in the catalog
  • 25:10 - 25:13
    by clicking the button again.
  • 25:13 - 25:18
    It means, also, that Wikidata
    is not really integrated
  • 25:18 - 25:20
    into our catalog.
  • 25:20 - 25:22
    But that's also more difficult.
  • 25:22 - 25:26
    Okay, we have to give the floor
  • 25:26 - 25:31
    to Ursula for the next part.
  • 25:31 - 25:33
    (Ursula) Thank you very much, Heleen.
  • 25:33 - 25:37
    So, I will talk about our experiences
  • 25:37 - 25:40
    with incorporating Wikidata elements
  • 25:40 - 25:41
    into our web dossiers.
  • 25:41 - 25:45
    A web dossier is--oh, sorry, yeah, sorry.
  • 25:45 - 25:50
    A web dossier, or a classical web dossier,
    consists of three parts:
  • 25:50 - 25:53
    an introduction to the subject,
  • 25:53 - 25:56
    mostly written by one of our researchers;
  • 25:56 - 26:01
    a selection of titles, both books
    and articles from our collection;
  • 26:01 - 26:06
    and the third part, an annotated list
  • 26:06 - 26:09
    with links to electronic resources.
  • 26:09 - 26:16
    And this year, we added a fourth part
    to our web dossiers,
  • 26:16 - 26:18
    which is the Wikidata elements.
  • 26:19 - 26:22
    And it all started last year,
  • 26:22 - 26:25
    and my story is similar
    to the story of Olaf, actually.
  • 26:25 - 26:30
    Last year, when I had no clue
    about Wikidata,
  • 26:30 - 26:33
    I discovered this wonderful
    article by Alex Stinson
  • 26:33 - 26:37
    on how to write a query in Wikidata.
  • 26:37 - 26:42
    And he chose a subject--
    a very appealing subject to me.
  • 26:42 - 26:46
    Namely, "Discovering Women Writers
    from North Africa."
  • 26:46 - 26:51
    I can really recommend this article,
  • 26:51 - 26:53
    because it's very instructive.
  • 26:53 - 26:57
    And I thought I will be--
    I'm going to work on this query,
  • 26:57 - 27:03
    and try to change it to:
    "Southern African Women Writers,"
  • 27:03 - 27:07
    and try to add a link
    to their work in our catalog.
  • 27:07 - 27:11
    And on the right-hand side,
    you see the SPARQL query
  • 27:12 - 27:15
    which searches for
    "Southern African Women Writers."
  • 27:15 - 27:21
    If you click on the button,
    on the blue button on the lefthand side,
  • 27:22 - 27:24
    the search result will appear beneath.
  • 27:24 - 27:26
    The search result can have
    different formats.
  • 27:26 - 27:30
    In my case, the search result is a map.
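    A hedged sketch along the lines of the query described, restricted here to
    South Africa (Q258) for brevity rather than the whole Southern African
    region covered in the web dossier:

        #defaultView:Map
        SELECT ?writer ?writerLabel ?birthplaceLabel ?coords ?image WHERE {
          ?writer wdt:P21 wd:Q6581072 ;    # sex or gender: female
                  wdt:P106 wd:Q36180 ;     # occupation: writer
                  wdt:P19 ?birthplace .    # place of birth
          ?birthplace wdt:P17 wd:Q258 ;    # birthplace in South Africa
                      wdt:P625 ?coords .   # coordinates drive the map view
          OPTIONAL { ?writer wdt:P18 ?image . }
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }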
  • 27:30 - 27:33
    And the nice thing about Wikidata
  • 27:33 - 27:37
    is that you can embed
    this search result
  • 27:37 - 27:39
    into your own webpage,
  • 27:39 - 27:42
    and that's what we are now doing
    with our web dossiers.
  • 27:42 - 27:47
    So, this was the very first one
    on Southern African women writers,
  • 27:47 - 27:50
    with the classical three elements listed,
  • 27:50 - 27:53
    plus this map on the lefthand side,
  • 27:53 - 27:56
    which gives extra information--
  • 27:56 - 27:58
    a link to the Southern African
    women writer--
  • 27:58 - 28:01
    a link to her works in our catalog,
  • 28:01 - 28:07
    and a link to the Wikidata record
    of her birth place, and her name,
  • 28:08 - 28:13
    her personal record, plus a photo,
    if it's available on Wikidata.
  • 28:16 - 28:20
    And to retrieve a nice map
  • 28:20 - 28:24
    with a lot of red dots
    on the African continent,
  • 28:24 - 28:29
    You need nice data in Wikidata,
    complete, sufficient data.
  • 28:29 - 28:33
    So, with our second web dossier
    on public art in Africa,
  • 28:33 - 28:38
    we also started to enhance
    the data in Wikidata.
  • 28:38 - 28:43
    In this case, for public art,
    we added geo-locations--
  • 28:43 - 28:47
    geo-locations to Wikidata.
  • 28:47 - 28:51
    And we also searched for works
    of public art in Commons,
  • 28:51 - 28:55
    and if they don't have
    a record on Wikidata yet,
  • 28:55 - 29:01
    we added the record to Wikidata.
  • 29:01 - 29:05
    And the third thing we do,
  • 29:05 - 29:10
    because when we prepare a web dossier,
  • 29:10 - 29:16
    we download the titles from our catalog,
  • 29:16 - 29:18
    and the titles are in MARC 21,
  • 29:18 - 29:23
    so we have to convert them to a format
    that is presentable on the website,
  • 29:23 - 29:28
    and it doesn't take much time and effort
    to convert the same set of titles
  • 29:28 - 29:30
    to Wikidata QuickStatements,
  • 29:30 - 29:37
    and then, we also upload
    a title set to Wikidata,
  • 29:37 - 29:41
    and you can see the titles we uploaded
  • 29:41 - 29:44
    from our latest web dossier
  • 29:44 - 29:48
    on African proverbs in Scholia.
  • 29:49 - 29:52
    Scholia is a really nice tool
    that visualizes publications
  • 29:52 - 29:55
    present in Wikidata.
  • 29:55 - 30:00
    And, one second--when it is possible,
    we add a Scholia template
  • 30:00 - 30:02
    to our web dossier's topic.
  • 30:02 - 30:03
    Thank you very much.
  • 30:03 - 30:08
    (applause)
  • 30:09 - 30:12
    Thank you, Heleen and Ursula.
  • 30:12 - 30:17
    Next we have Adrian Pohl
    presenting using Wikidata
  • 30:17 - 30:22
    to improve spatial subject indexing
    and regional bibliography.
  • 30:45 - 30:47
    Okay, hello everybody.
  • 30:47 - 30:50
    I'm going right into the topic.
  • 30:50 - 30:54
    I only have ten minutes to present
    a three-year project.
  • 30:55 - 30:57
    It wasn't full time. (laughs)
  • 30:57 - 31:00
    Okay, what's the NWBib?
  • 31:00 - 31:04
    It's an acronym for the North Rhine-
    Westphalian Bibliography.
  • 31:04 - 31:08
    It's a regional bibliography
    that records literature
  • 31:08 - 31:11
    about people and places
    in North Rhine-Westphalia.
  • 31:13 - 31:14
    And there are monographs in it--
  • 31:15 - 31:19
    there are a lot of articles in it,
    and most of them are quite unique,
  • 31:19 - 31:22
    so, that's the interesting thing
    about this bibliography--
  • 31:22 - 31:25
    because it's often,
    let's say, quite obscure stuff--
  • 31:25 - 31:28
    local people writing
    about that tradition,
  • 31:28 - 31:29
    and something like this.
  • 31:30 - 31:33
    And there's over 400,000 entries in there.
  • 31:33 - 31:38
    And the bibliography started in 1983,
  • 31:38 - 31:43
    and so we only have titles
    from this publication year onwards.
  • 31:45 - 31:49
    If you want to take a look at it,
    it's at nwbib.de,
  • 31:49 - 31:51
    that's the web application.
  • 31:51 - 31:55
    It's based on our service,
    lobid.org, the API.
  • 31:57 - 32:01
    Because it's cataloged as part
    of the hbz union catalog,
  • 32:01 - 32:05
    which comprises around 20 million records,
  • 32:05 - 32:09
    it's an [inaudible] Aleph system
    we get the data out of there,
  • 32:09 - 32:11
    and make RDF out of it,
  • 32:11 - 32:16
    and provide it as JSON
    via the HTTP API.
  • 32:17 - 32:21
    So, the initial status in 2017
  • 32:21 - 32:25
    was we had nearly 9,000 distinct strings
  • 32:25 - 32:29
    about places--referring to places,
    in North Rhine-Westphalia.
  • 32:29 - 32:34
    Mostly, those were administrative areas,
    like towns and districts,
  • 32:34 - 32:38
    but also monasteries, principalities,
    or natural regions.
  • 32:39 - 32:44
    And we already used Wikidata in 2017,
  • 32:44 - 32:48
    and matched those strings
    to Wikidata entries with the Wikidata API
  • 32:48 - 32:52
    quite naively to get
    the geo-coordinates from there,
  • 32:52 - 32:57
    and do some geo-based
    discovery stuff with it.
  • 32:57 - 33:00
    But this had some drawbacks.
  • 33:00 - 33:03
    And so, the matching was really poor,
  • 33:03 - 33:05
    and there were a lot of false positives,
  • 33:05 - 33:09
    and we still had no hierarchy
    in those places,
  • 33:09 - 33:13
    and we still had a lot
    of non-unique names.
  • 33:14 - 33:15
    So, this is an example here.
  • 33:17 - 33:18
    Does this work?
  • 33:18 - 33:22
    Yeah, as you can see,
    for one place, Brauweiler,
  • 33:22 - 33:25
    there are four different strings in there.
  • 33:25 - 33:28
    So, we all know how this happens.
  • 33:28 - 33:32
    If there's no authority file,
    you end up with this data.
  • 33:32 - 33:34
    But we want to improve on that.
  • 33:35 - 33:38
    And you can also see
    why the matching didn't work--
  • 33:38 - 33:40
    so you have this name of the place
  • 33:40 - 33:45
    and there's often the name
    of the superior administrative area,
  • 33:45 - 33:51
    and even on the second level,
    a superior administrative area
  • 33:51 - 33:52
    often in the name
  • 33:52 - 33:59
    to identify the place successfully.
  • 33:59 - 34:05
    So, the goal was to build a full-fledged
    spatial classification based on this data,
  • 34:05 - 34:07
    with a hierarchical view of places,
  • 34:09 - 34:11
    with one entry or ID for each place.
  • 34:12 - 34:17
    And we got this mock-up
    from the NWBib editors in 2016, made in Excel,
  • 34:18 - 34:23
    to get a feeling of what
    they would like to have.
  • 34:25 - 34:28
    There you have the--
    Regierungsbezirk--
  • 34:28 - 34:31
    that's the most superior
    administrative area--
  • 34:31 - 34:35
    we have in there some towns
    or districts--rural districts--
  • 34:35 - 34:40
    and then, it's going down
    to the parts of towns,
  • 34:40 - 34:42
    even to this level.
  • 34:43 - 34:46
    And we chose Wikidata for this task.
  • 34:46 - 34:50
    We also looked at the GND,
    the Integrated Authority File,
  • 34:50 - 34:55
    and GeoNames--but Wikidata
    had the best coverage,
  • 34:55 - 34:57
    and the best infrastructure.
  • 34:58 - 35:02
    The coverage for the places
    and the geo-coordinates we need,
  • 35:02 - 35:05
    and the hierarchical
    information, for example.
  • 35:05 - 35:07
    There were a lot of places,
    also, in the GND,
  • 35:07 - 35:10
    but there was no hierarchical
    information in there.
  • 35:11 - 35:14
    And also, Wikidata provides
    the infrastructure
  • 35:14 - 35:15
    for editing and versioning.
  • 35:15 - 35:20
    And there's also a community
    that helps maintaining the data,
  • 35:20 - 35:22
    which was quite good.
  • 35:23 - 35:27
    Okay, but there was a requirement
    by the NWBib editors.
  • 35:28 - 35:31
    They did not want to directly
    rely on Wikidata,
  • 35:31 - 35:33
    which was understandable.
  • 35:33 - 35:35
    We don't have those servers
    under our control,
  • 35:35 - 35:38
    and we won't know what's going on there.
  • 35:38 - 35:42
    There might be some unwelcome edits
    that destroy the classification,
  • 35:42 - 35:44
    or parts of it, or vandalism.
  • 35:44 - 35:51
    So, we decided to put
    an intermediate SKOS file in between,
  • 35:51 - 35:56
    on which the application would rely--
    and which would be generated from Wikidata.
  • 35:57 - 35:59
    And SKOS is the Simple Knowledge
    Organization System--
  • 35:59 - 36:04
    it's the standard way to model
  • 36:04 - 36:08
    a classification in the linked data world.
  • 36:08 - 36:09
    So, how did we do it? Five steps.
  • 36:09 - 36:14
    I will come to each
    of the steps in more detail.
  • 36:14 - 36:18
    We matched the strings to Wikidata
    with a better approach than before.
  • 36:19 - 36:23
    Created the classification based
    on Wikidata, then
  • 36:23 - 36:26
    added the backlinks
    from Wikidata to NWBib
  • 36:26 - 36:28
    with a custom property.
  • 36:28 - 36:33
    And now, we are in the process
    of establishing a good process
  • 36:33 - 36:37
    for updating the classification
    in Wikidata.
  • 36:37 - 36:39
    Seeing--having a diff
    of the changes,
  • 36:39 - 36:41
    and then publishing it to the SKOS file.
  • 36:43 - 36:45
    I will come to the details.
  • 36:45 - 36:46
    So, the matching approach--
  • 36:46 - 36:48
    as the API wasn't sufficient,
  • 36:48 - 36:54
    and because we have those
    different levels in the strings,
  • 36:54 - 36:59
    we built a custom Elasticsearch
    index for our task.
  • 37:00 - 37:04
    I think by now, you could probably,
    as well, use OpenRefine for doing this,
  • 37:04 - 37:09
    but at that point in time,
    it wasn't available for Wikidata.
  • 37:10 - 37:14
    And we built this index based
    on a SPARQL query,
  • 37:14 - 37:20
    and for entities in NRW,
    and with a specific type.
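    A hedged sketch of the kind of query that could feed such an index: places
    located (transitively) in North Rhine-Westphalia (Q1198), with label,
    aliases, type, coordinates, and the direct superior area; the production
    query was considerably more elaborate:

        SELECT ?place ?placeLabel ?alias ?type ?coords ?broader WHERE {
          ?place wdt:P131+ wd:Q1198 ;       # transitively located in NRW
                 wdt:P31 ?type .            # type information, needed for matching
          OPTIONAL { ?place wdt:P625 ?coords . }      # geo-coordinates
          OPTIONAL { ?place wdt:P131 ?broader . }     # direct superior administrative area
          OPTIONAL { ?place skos:altLabel ?alias . FILTER(LANG(?alias) = "de") }  # aliases
          SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
        }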
  • 37:20 - 37:25
    And the query evolved over time a lot.
  • 37:25 - 37:29
    And we have a few entries--
    you can see the history on GitHub.
  • 37:30 - 37:32
    So, what we put in the matching index,
  • 37:32 - 37:36
    in the spatial object,
    is what we need in our data.
  • 37:36 - 37:40
    It's the label and the ID
    or the link to Wikidata,
  • 37:40 - 37:44
    the geo-coordinates, and the type
    from Wikidata [inaudible], as well.
  • 37:44 - 37:50
    But also very important for the matching
    are the aliases and the broader entity--
  • 37:50 - 37:54
    and this is also an example where the name
    of the broader entity
  • 37:54 - 37:58
    and the district itself are very similar.
  • 37:58 - 38:03
    So, it's important to have
    some type information, as well,
  • 38:03 - 38:05
    for the matching.
  • 38:05 - 38:08
    So, the nationwide results
    were very good.
  • 38:08 - 38:11
    We could automatically match
    more than 99% of records
  • 38:11 - 38:12
    with this approach.
  • 38:14 - 38:16
    These were only 92% of the strings.
  • 38:17 - 38:18
    So, obviously, the results--
  • 38:18 - 38:21
    those strings that only occurred
    one or two times
  • 38:21 - 38:22
    often didn't appear in Wikidata.
  • 38:22 - 38:26
    And so, we had to do a lot of work
    with those with the [long tail].
  • 38:28 - 38:32
    And for around 1,000 strings,
    the matching was incorrect.
  • 38:32 - 38:35
    But the catalogers did a lot of work
    in the Aleph catalog,
  • 38:35 - 38:40
    but also in Wikidata, they made
    more than 6,000 manual edits to Wikidata
  • 38:40 - 38:45
    to reach 100% coverage by adding
    aliases-type information,
  • 38:45 - 38:47
    creating new entries.
  • 38:47 - 38:49
    Okay, so, I have to speed up.
  • 38:50 - 38:54
    We created the classification based on this,
    on the hierarchical statements.
  • 38:54 - 38:59
    P131 is the main property there.
  • 39:00 - 39:02
    We added the information to our data.
  • 39:03 - 39:07
    So, we now have this
    in our data's spatial object--
  • 39:07 - 39:12
    and the focus--the link to Wikidata--
    and the types are there,
  • 39:13 - 39:18
    and here's the ID
    from the SKOS classification
  • 39:18 - 39:19
    we built based on Wikidata.
  • 39:20 - 39:24
    And you can see there
    are Q identifiers in there.
  • 39:27 - 39:29
    Now, you can basically query our API
  • 39:29 - 39:34
    with such a query using Wikidata URIs,
  • 39:34 - 39:39
    and get literature, in this example,
    about Cologne back.
  • 39:40 - 39:46
    Then we created a Wikidata property
    for NWBib and added those links
  • 39:46 - 39:51
    from Wikidata to the classification--
    batch-loaded them with QuickStatements.
  • 39:52 - 39:54
    And there's also a nice--
  • 39:54 - 39:59
    also a move to using a qualifier
    on this property
  • 39:59 - 40:03
    to add the broader information there.
  • 40:03 - 40:06
    So, I think people won't mess around
    with this as much
  • 40:06 - 40:09
    as with the P131 statements.
  • 40:10 - 40:12
    So, this is what it looks like.
  • 40:13 - 40:16
    This will go to the classification
    where you can then start a query.
  • 40:19 - 40:23
    Now, we have to build this
    update and review process,
  • 40:23 - 40:29
    and we will add those data like this,
  • 40:29 - 40:32
    with a $0 subfield, to Aleph,
  • 40:32 - 40:37
    and the catalogers will start
    using those Wikidata-based IDs,
  • 40:37 - 40:41
    or URIs, for cataloging, for spatial indexing.
  • 40:45 - 40:50
    So, by now, there are more than 400,000
    NWBib entries with links to Wikidata,
  • 40:50 - 40:56
    and more than 4,400 Wikidata entries
    with links to NWBib.
  • 40:57 - 40:58
    Thank you.
  • 40:58 - 41:03
    (applause)
  • 41:08 - 41:10
    Thank you, Adrian.
  • 41:13 - 41:15
    I got it. Thank you.
  • 41:31 - 41:34
    So, as you've seen me before,
    I'm Hilary Thorsen.
  • 41:34 - 41:36
    I'm Wikimedian in residence
  • 41:36 - 41:38
    with the Linked Data
    for Production Project.
  • 41:38 - 41:40
    I am based at Stanford,
  • 41:40 - 41:43
    and I'm here today
    with my colleague, Lena Denis,
  • 41:43 - 41:46
    who is Cartographic Assistant
    at Harvard Library.
  • 41:46 - 41:50
    And Christine Fernsebner Eslao
    is here in spirit.
  • 41:50 - 41:54
    She is currently back in Boston,
    but supporting us from afar.
  • 41:54 - 41:56
    So, we'll be talking
    about Wikidata and Libraries
  • 41:56 - 42:00
    as partners in data production,
    organization, and project inspiration.
  • 42:01 - 42:04
    And our work is part of the Linked Data
    for Production Project.
  • 42:05 - 42:08
    So, Linked Data for Production
    is in its second phase,
  • 42:08 - 42:10
    called Pathway for Implementation.
  • 42:10 - 42:13
    And it's an Andrew W. Mellon
    Foundation grant,
  • 42:13 - 42:16
    involving the partnership
    of several universities,
  • 42:16 - 42:20
    with the goal of constructing a pathway
    for shifting the catalog community
  • 42:20 - 42:25
    to begin describing library
    resources with linked data.
  • 42:25 - 42:27
    And it builds upon a previous grant,
  • 42:27 - 42:30
    but this iteration is focused
    on the practical aspects
  • 42:30 - 42:32
    of the transition.
  • 42:34 - 42:36
    One of these pathways of investigation
  • 42:36 - 42:39
    has been integrating
    library metadata with Wikidata.
  • 42:39 - 42:41
    We have a lot of questions,
  • 42:41 - 42:43
    but some of the ones
    we're most interested in
  • 42:43 - 42:46
    are how we can integrate
    library metadata with Wikidata,
  • 42:46 - 42:50
    and make contribution
    a part of our cataloging workflows,
  • 42:50 - 42:54
    how Wikidata can help us improve
    our library discovery environment,
  • 42:54 - 42:56
    how it can help us reveal
    more relationships
  • 42:56 - 43:00
    and connections within our data
    and with external data sets,
  • 43:00 - 43:04
    and if we have connections in our own data
    that can be added to Wikidata,
  • 43:04 - 43:07
    how libraries can help
    fill in gaps in Wikidata,
  • 43:07 - 43:10
    and how libraries can work
    with local communities
  • 43:10 - 43:13
    to describe library
    and archival resources.
  • 43:14 - 43:17
    Finding answers to these questions
    has focused on the mutual benefit
  • 43:17 - 43:20
    for the library and Wikidata communities.
  • 43:20 - 43:23
    We've learned through starting to work
    on our different Wikidata projects,
  • 43:23 - 43:25
    that many of the issues
    libraries grapple with,
  • 43:25 - 43:29
    like data modeling, identity management,
    data maintenance, documentation,
  • 43:29 - 43:31
    and instruction on linked data,
  • 43:31 - 43:34
    are ones the Wikidata
    community works on too.
  • 43:34 - 43:36
    I'm going to turn things over to Lena
  • 43:36 - 43:40
    to talk about what
    she's been working on now.
  • 43:47 - 43:51
    Hi, so, as Hilary briefly mentioned,
    I work as a map librarian at Harvard,
  • 43:51 - 43:54
    where I process maps, atlases,
    and archives for our online catalog.
  • 43:54 - 43:57
    And while processing two-dimensional
    cartographic works
  • 43:57 - 44:00
    is relatively straightforward,
    cataloging archival collections
  • 44:00 - 44:02
    so that their cartographic resources
    can be made discoverable,
  • 44:02 - 44:04
    has always been more difficult.
  • 44:04 - 44:07
    So, my use case for Wikidata
    is visually modeling relationships
  • 44:07 - 44:10
    between archival collections
    and the individual items within them,
  • 44:10 - 44:13
    as well as between archival drafts
    in published works.
  • 44:13 - 44:17
    So, I used Wikidata to highlight the work
    of our cartographer named Erwin Raisz,
  • 44:17 - 44:20
    who worked at Harvard
    in the early 20th century.
  • 44:20 - 44:23
    He was known for his vividly detailed
    and artistic landforms,
  • 44:23 - 44:24
    like this one on the screen--
  • 44:24 - 44:26
    but also for inventing
    the armadillo projection,
  • 44:26 - 44:29
    writing the first cartography
    textbook in English
  • 44:29 - 44:31
    and various other
    important contributions
  • 44:31 - 44:33
    to the field of geography.
  • 44:33 - 44:35
    And at the Harvard Map Collection,
  • 44:35 - 44:39
    we have a 66-item collection
    of Raisz's field notebooks,
  • 44:39 - 44:41
    which begin when he was a student
    and end just before his death.
  • 44:44 - 44:46
    So, this is the collection-level record
    that I made for them,
  • 44:46 - 44:48
    which merely gives an overview,
  • 44:48 - 44:51
    but his notebooks are full of information
  • 44:51 - 44:53
    that he used in later atlases,
    maps, and textbooks.
  • 44:53 - 44:56
    But researchers don't know how to find
    that trajectory information,
  • 44:56 - 44:59
    and the system
    is not designed to show them.
  • 45:01 - 45:04
    So, I felt that with Wikidata,
    and other Wikimedia platforms,
  • 45:04 - 45:05
    I'd be able to take advantage
  • 45:05 - 45:08
    of information that already exists
    about him on the open web,
  • 45:08 - 45:11
    along with library records
    and a notebook inventory
  • 45:11 - 45:13
    that I had made in an Excel spreadsheet
  • 45:13 - 45:15
    to show relationships and influences
    between his works.
  • 45:16 - 45:19
    So here, you can see how I edited
    and reconciled library data
  • 45:19 - 45:20
    in OpenRefine.
  • 45:20 - 45:23
    And then, I used QuickStatements
    to batch import my results.
  • 45:23 - 45:25
    So, now, I was ready
    to create knowledge graphs
  • 45:25 - 45:28
    with SPARQL queries
    to show patterns of influence.
  • 45:30 - 45:33
    The examples here show
    how I leveraged Wikimedia Commons images
  • 45:33 - 45:35
    that I connected to him,
  • 45:35 - 45:36
    and the hierarchy of some of his works
  • 45:36 - 45:39
    that were contributing
    factors to other works.
  • 45:39 - 45:42
    So, modeling Raisz's works on Wikidata
    allowed me to encompass in a single image,
  • 45:42 - 45:46
    or in this case, in two images,
    the connections that require many pages
  • 45:46 - 45:48
    of bibliographic data to reveal.
  • 45:52 - 45:56
    So, this video is going to load.
  • 45:56 - 45:57
    Yes! Alright.
  • 45:57 - 46:00
    This video is a minute and a half long
    screencast I made,
  • 46:00 - 46:02
    that I'm going to narrate as you watch.
  • 46:02 - 46:05
    It shows the process of inputting
    and then running a SPARQL query,
  • 46:05 - 46:09
    showing hierarchical relationships
    between notebooks, an atlas, and a map
  • 46:09 - 46:11
    that Raisz created about Cuba.
  • 46:11 - 46:13
    He worked there before the revolution,
  • 46:13 - 46:15
    so he had the unique position
    of having support
  • 46:15 - 46:17
    from both the American
    and the Cuban governments.
  • 46:17 - 46:21
    So, I made this query as an example
    to show people who work on Raisz,
  • 46:21 - 46:24
    and who are interested in narrowing down
    what materials they'd like to request
  • 46:24 - 46:26
    when they come to us for research.
  • 46:26 - 46:30
    To make the approach replicable
    for other archival collections,
  • 46:30 - 46:33
    I hope that Harvard and other institutions
    will prioritize Wikidata look-ups
  • 46:33 - 46:35
    as they move to linked data
    cataloging production,
  • 46:35 - 46:38
    which my co-presenters
    can speak to the progress on
  • 46:38 - 46:39
    better than I can.
  • 46:39 - 46:42
    But my work has brought me--
    has brought to mind a particular issue
  • 46:42 - 46:47
    that I see as a future opportunity,
    which is that of archival modeling.
  • 46:47 - 46:52
    So, to an archivist, an item
    is a discrete archival material
  • 46:52 - 46:55
    within a larger collection
    of archival materials
  • 46:55 - 46:57
    that is not a physical location.
  • 46:57 - 47:01
    So an archivist from the American National
    Archives and Records Administration,
  • 47:01 - 47:03
    who is also a Wikidata enthusiast,
  • 47:03 - 47:06
    advised me when I was trying
    to determine how to express this
  • 47:06 - 47:08
    using an example item,
  • 47:08 - 47:10
    that I'm going to show
    as soon as this video is finally over.
  • 47:11 - 47:14
    Alright. Great.
  • 47:20 - 47:22
    Nope, that's not what I wanted.
  • 47:22 - 47:24
    Here we go.
  • 47:31 - 47:32
    It's doing that.
  • 47:32 - 47:34
    (humming)
  • 47:34 - 47:37
    Nope. Sorry. Sorry.
  • 47:40 - 47:43
    Alright, I don't know why
    it's not going full screen again.
  • 47:43 - 47:44
    I can't get it to do anything.
  • 47:44 - 47:47
    But this is the-- oh, my gosh.
  • 47:47 - 47:48
    Stop that. Alright.
  • 47:48 - 47:51
    So, this is the item that I mentioned.
  • 47:52 - 47:54
    So, this was what the archivist
  • 47:54 - 47:56
    from the National Archives
    and Records Administration
  • 47:56 - 47:57
    showed me as an example.
  • 47:57 - 48:02
    And he recommended this compromise,
    which is to use the part of property
  • 48:02 - 48:06
    to connect a lower level description
    to a higher level of description,
  • 48:06 - 48:09
    which allows the relationships
    between different hierarchical levels
  • 48:09 - 48:11
    to be asserted as statements
    and qualifiers.
  • 48:11 - 48:13
    So, in this example that's on screen,
  • 48:13 - 48:16
    the relationship between an item,
    a series, a collection, and a record group
  • 48:16 - 48:20
    are thus contained and described
    within a Wikidata item entity.
  • 48:20 - 48:22
    So, I followed this model
    in my work on Raisz.
  • 48:23 - 48:26
    And one of my images is missing.
  • 48:26 - 48:28
    No, it's not. It's right there. I'm sorry.
  • 48:28 - 48:31
    And so, I followed this model
    on my work on Raisz,
  • 48:31 - 48:33
    but I look forward
    to further standardization.
  • 48:39 - 48:41
    So, another archival project
    Harvard is working on
  • 48:41 - 48:45
    is the Arthur Freedman collection
    of more than 2,000 hours
  • 48:45 - 48:49
    of punk rock performances
    from the 1970s to early 2000s
  • 48:49 - 48:52
    in the Boston and Cambridge,
    Massachusetts areas.
  • 48:52 - 48:55
    It includes many bands and venues
    that no longer exist.
  • 48:56 - 49:00
    So far, work has been done in OpenRefine
    on reconciliation of the bands and venues
  • 49:00 - 49:02
    to see which need an item
    created in Wikidata.
  • 49:03 - 49:06
    A basic item will be created
    via batch process next spring,
  • 49:06 - 49:09
    and then, an edit-a-thon will be
    held in conjunction
  • 49:09 - 49:12
    with the New England Music Library
    Association's meeting in Boston
  • 49:12 - 49:16
    to focus on adding more statements
    to the batch-created items,
  • 49:16 - 49:19
    by drawing on local music
    community knowledge.
  • 49:19 - 49:22
    We're interested in learning more
    about models for pairing librarians
  • 49:22 - 49:26
    and Wiki enthusiasts with new contributors
    who have domain knowledge.
  • 49:26 - 49:29
    Items will eventually be linked
    to digitized video
  • 49:29 - 49:31
    in Harvard's digital collection platform
  • 49:31 - 49:33
    once rights have
    been cleared with artists,
  • 49:33 - 49:35
    which will likely be a slow process.
  • 49:36 - 49:38
    There's also a great amount of interest
  • 49:38 - 49:42
    in moving away from manual cataloging
    and creation of authority data
  • 49:42 - 49:43
    towards identity management,
  • 49:43 - 49:46
    where descriptions
    can be created in batches.
  • 49:46 - 49:48
    An additional project that focused on
  • 49:48 - 49:51
    creating international standard
    name identifiers, or ISNIs,
  • 49:51 - 49:53
    for avant-garde and women filmmakers
  • 49:53 - 49:58
    can be adapted for creating Wikidata items
    for these filmmakers, as well.
  • 49:58 - 50:01
    Spreadsheets with the ISNIs,
    filmmaker names, and other details
  • 50:01 - 50:05
    can be reconciled in OpenRefine,
    and uploaded with QuickStatements.
  • 50:05 - 50:07
    Once people in organizations
    have been described,
  • 50:07 - 50:09
    we'll move toward describing
    the films in Wikidata,
  • 50:09 - 50:13
    which will likely present
    some additional modeling challenges.
  • 50:13 - 50:15
    A library presentation
    wouldn't be complete
  • 50:15 - 50:17
    without a MARC record.
  • 50:17 - 50:20
    Here, you can see the record
    for Karen Aqua's taxonomy film,
  • 50:20 - 50:22
    where her ISNI and Wikidata Q number
  • 50:22 - 50:24
    have been added to the 100 field.
  • 50:24 - 50:27
    The ISNIs and Wikidata Q numbers
    that have been created
  • 50:27 - 50:30
    can then be batch added
    back into MARC records via MarcEdit.
  • 50:30 - 50:33
    You might be asking why I'm showing you
    this ugly MARC record,
  • 50:33 - 50:36
    instead of some beautiful
    linked data statements.
  • 50:36 - 50:39
    And that's because our libraries
    will be working in a hybrid environment
  • 50:39 - 50:40
    for some time.
  • 50:40 - 50:42
    Our library catalogs still rely
    on MARC records,
  • 50:42 - 50:44
    so by adding in these URIs,
  • 50:44 - 50:46
    we can try to take advantage
    of linked data,
  • 50:46 - 50:48
    while our systems still use MARC.
  • 50:49 - 50:53
    Adding URIs into MARC records
    makes an additional aspect
  • 50:53 - 50:54
    of our project possible.
  • 50:54 - 50:57
    Work has been done at Stanford
    and Cornell to bring data
  • 50:57 - 51:02
    from Wikidata into our library catalog
    using URIs already in our MARC records.
  • 51:02 - 51:05
    You can see an example
    of a knowledge panel,
  • 51:05 - 51:07
    where all the data is sourced
    from Wikidata,
  • 51:07 - 51:11
    and links back to the item itself,
    along with an invitation to contribute.
  • 51:11 - 51:15
    This is currently in a test environment,
    not in production in our catalog.
  • 51:15 - 51:17
    Ideally, eventually,
    these will be generated
  • 51:17 - 51:20
    from linked data descriptions
    of library resources
  • 51:20 - 51:23
    created using Sinopia,
    our linked data editor
  • 51:23 - 51:25
    developed for cataloging.
  • 51:25 - 51:28
    We found that adding a look-up
    to Wikidata in Sinopia is difficult.
  • 51:28 - 51:32
    The scale and modeling of Wikidata
    makes it hard to partition the data
  • 51:32 - 51:34
    to be able to look up typed entities,
  • 51:34 - 51:35
    and we've run into the problem
  • 51:35 - 51:37
    of SPARQL not being good
    for keyword search,
  • 51:37 - 51:42
    but wanting our keyword APIs
    to return SPARQL-like RDF descriptions.
  • 51:42 - 51:45
    So, as you can see, we still have
    quite a bit of work to do.
  • 51:45 - 51:48
    This round of the grant
    runs until June 2020,
  • 51:48 - 51:50
    so, we'll be continuing our exploration.
  • 51:50 - 51:53
    And I just wanted to invite anyone
  • 51:53 - 51:58
    who has a continued interest in talking
    about Wikidata and libraries,
  • 51:58 - 52:01
    I lead a Wikidata Affinity Group
    that's open to anyone to join.
  • 52:01 - 52:03
    We meet every two weeks,
  • 52:03 - 52:06
    and our next call is Tuesday,
    November the 5th,
  • 52:06 - 52:08
    so if you're interested
    in continuing discussions,
  • 52:08 - 52:10
    I would love to talk with you further.
  • 52:10 - 52:12
    Thank you, everyone.
  • 52:12 - 52:14
    And thank you to the other presenters
  • 52:14 - 52:17
    for talking about all
    of their wonderful projects.
  • 52:17 - 52:21
    (applause)
Video Language:
English
Duration:
52:29
