- 
I work as a teacher
at the University of Alicante,
 
- 
where I recently obtained my PhD
on data libraries and linked open data.
 
- 
And I'm also a software developer
 
- 
at the Biblioteca Virtual
Miguel de Cervantes.
 
- 
And today, I'm going to talk
about data quality.
 
- 
Well, those are my colleagues
at the university.
 
- 
And as you may know, many organizations
are publishing their data
 
- 
or linked open data--
 
- 
for example,
the National Library of France,
 
- 
the National Library of Spain,
us, which is Cervantes Virtual,
 
- 
the British National Bibliography,
 
- 
the Library of Congress and Europeana.
 
- 
All of them provide a SPARQL endpoint,
 
- 
which is useful in order
to retrieve the data.
 
- 
And if I'm not wrong,
 
- 
the Library of Congress only provides
the data as a dump, which you can't use directly.
 
- 
When we published our repository
as linked open data,
 
- 
my idea was for it to be reused
by other institutions.
 
- 
But what if I'm an institution
that wants to enrich its data
 
- 
with data from other data libraries?
 
- 
Which data set should I use?
 
- 
Which data set is better
in terms of quality?
 
- 
The benefits of the evaluation
of data quality in libraries are many.
 
- 
For example, methodologies can be improved
in order to include new criteria,
 
- 
for assessing the quality.
 
- 
And also, organizations can benefit
from best practices and guidelines
 
- 
in order to publish their data
as linked open data.
 
- 
What do we need
in order to assess the quality?
 
- 
Well, obviously, a set of candidates
and a set of features.
 
- 
For example, do they have
a SPARQL endpoint,
 
- 
do they have a web interface,
how many publications do they have,
 
- 
how many vocabularies do they use,
how many Wikidata properties do they have,
 
- 
and where can I get those candidates?
 
- 
I used the LOD Cloud--
 
- 
but when I was doing this slide,
I thought about using Wikidata
 
- 
in order to retrieve those candidates.
 
- 
For example, getting entities
of type data library,
 
- 
which has a SPARQL endpoint.
 
- 
You have here the link.
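A minimal sketch of the kind of candidate query described here, runnable on the Wikidata Query Service; the class and property IDs are assumptions (Q212805 for digital library, P5305 for SPARQL endpoint URL), not taken from the slide:

```sparql
# Hedged sketch: candidate data libraries that declare a SPARQL endpoint.
SELECT ?library ?libraryLabel ?endpoint WHERE {
  ?library wdt:P31/wdt:P279* wd:Q212805 ;  # instance of (a subclass of) digital library -- assumed class
           wdt:P5305 ?endpoint .           # SPARQL endpoint URL -- assumed property
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```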
 
- 
And I came up with these data libraries.
 
- 
The first one uses the bibliographic ontology
as its main vocabulary,
 
- 
and the others are based,
more or less, on FRBR,
 
- 
which is a vocabulary published by IFLA.
 
- 
And this is just an example
of how we could compare
 
- 
data libraries using
bubble charts on Wikidata.
 
- 
And this is just an example comparing
how many Wikidata properties
 
- 
are per data library.
 
- 
Well, how can we measure quality?
 
- 
There are different methodologies,
 
- 
for example, FRBR 1,
 
- 
which provides a set of criteria
grouped by dimensions,
 
- 
and those in green
are the ones that I found--
 
- 
that I could assess by means of Wikidata.
 
- 
And we also found that we
could define new criteria,
 
- 
for example, a new one to evaluate
the number of duplications in Wikidata.
 
- 
We use those properties.
 
- 
And this is an example of SPARQL,
 
- 
in order to count the number
of duplicates for a given property.
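This is not the query from the slide; it is only a hedged sketch of one way such a duplicates criterion could be counted, using the BnF ID (P268) as an example identifier property:

```sparql
# Hedged sketch: identifier values that occur on more than one item,
# a rough proxy for duplicates in Wikidata.
SELECT ?id (COUNT(?item) AS ?items) WHERE {
  ?item wdt:P268 ?id .                 # P268: Bibliothèque nationale de France ID (example)
}
GROUP BY ?id
HAVING (COUNT(?item) > 1)
ORDER BY DESC(?items)
```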
 
- 
And about the results,
well, at the moment of doing this study,
 
- 
not the slides, there was no property
for the British National Bibliography.
 
- 
They don't provide provenance information,
 
- 
which could be useful
for metadata enrichment.
 
- 
And they don't allow you
to edit the information.
 
- 
So, we've been talking
about Wikibase the whole weekend,
 
- 
and maybe we should try to adopt
Wikibase as an interface.
 
- 
And they are focused on their own content,
 
- 
and this is just the SPARQL query
based on Wikidata
 
- 
in order to assess the population.
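Again, only a sketch rather than the slide's query: one rough way to assess the population is to count how many Wikidata items carry each library's external identifier, here with the BnF ID (P268) and BNE ID (P950) as examples:

```sparql
# Hedged sketch: number of Wikidata items per external-identifier property.
SELECT ?property ?propertyLabel (COUNT(?item) AS ?items) WHERE {
  VALUES ?property { wd:P268 wd:P950 }     # BnF ID and BNE ID, as example properties
  ?property wikibase:directClaim ?claim .  # map the property entity to its direct-claim predicate
  ?item ?claim [] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?property ?propertyLabel
ORDER BY DESC(?items)
```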
 
- 
And the BnF provides labels
in multiple languages,
 
- 
and they all use self-describing URIs,
 
- 
which means that the URI
includes the type of the entity,
 
- 
which allows the human reader
to understand what they are using.
 
- 
And more results, they provide
different output formats,
 
- 
they use external vocabularies.
 
- 
Only the British National Bibliography
 
- 
provides machine-readable
licensing information.
 
- 
And up to one-third of the instances
are connected to external repositories,
 
- 
which is really nice.
 
- 
And this study, this work,
has been done in our Labs team--
 
- 
a lab in a GLAM is a group of people
 
- 
who want to explore new ways
 
- 
of reusing data collections.
 
- 
And there's a community
led by the British Library,
 
- 
and in particular, Mahendra Mahey,
 
- 
and we had a first event in London,
 
- 
and another one in Copenhagen,
 
- 
and we're going to have a new one in May
 
- 
at the Library of Congress in Washington.
 
- 
And we are now 250 people.
 
- 
And I'm so glad that I found
somebody here at the WikidataCon
 
- 
who has just joined us--
 
- 
Sylvia from [inaudible], Mexico.
 
- 
And I'd like to invite you
to our community,
 
- 
since you may be part
of a GLAM institution.
 
- 
So, we can talk later
if you want to know about this.
 
- 
And this--it's all about people.
 
- 
This is me, people
from the British Library,
 
- 
the Library of Congress, universities,
and national libraries in Europe.
 
- 
And there's a link here
in case you want to know more.
 
- 
And, well, last month,
we decided to meet in Doha
 
- 
in order to write a book
about how to create a lab in a GLAM.
 
- 
And they chose 15 people,
and I was so lucky to be there.
 
- 
And the book follows
the Booksprint methodology,
 
- 
which means that nothing
is prepared beforehand.
 
- 
All is done there in a week.
 
- 
And believe me, it was really hard work
 
- 
to have the whole book
done in that week.
 
- 
And I'd like to introduce you to the book,
which will be published--
 
- 
it was supposed to be published this week,
 
- 
but it will be next week.
 
- 
And it will be published openly,
so you can have it,
 
- 
and I can show you
a little bit later if you want.
 
- 
And those are the authors.
 
- 
I'm here-- I'm so happy, too.
 
- 
And those are the institutions--
 
- 
Library of Congress, British Library--
and this is the title.
 
- 
And now, I'd like to show you--
 
- 
a map that I'm doing.
 
- 
We are launching a website
for our community,
 
- 
and I'm in charge of creating a map
with our institutions there.
 
- 
This is not finished.
 
- 
But this is just SPARQL, and below,
 
- 
we see the map.
 
- 
And we see here
the new people that I found, here,
 
- 
at the WikidataCon--
I'm so happy for this.
 
- 
And we have here the data library
of my university,
 
- 
and many other institutions.
 
- 
Also, from Australia--
 
- 
if I can do it.
 
- 
Well, here, we have some links.
 
- 
There you go.
 
- 
Okay, this is not finished.
 
- 
We are still working on this,
and that's all.
 
- 
Thank you very much for your attention.
 
- 
(applause)
 
- 
[inaudible]
 
- 
Good morning, everybody.
 
- 
I'm Olaf Janssen.
 
- 
I'm the Wikimedia coordinator
 
- 
at the National Library
of the Netherlands.
 
- 
And I would like to share my work,
 
- 
which I'm doing about creating
Linked Open Data
 
- 
for Dutch Public Libraries using Wikidata.
 
- 
And my story starts roughly a year ago
 
- 
when I was at the GLAM Wiki conference
in Tel Aviv, in Israel.
 
- 
And there are two men
with very similar shirts,
 
- 
and equally similar hairdos, [Matt]...
 
- 
(laughter)
 
- 
And on the left, that's me.
 
- 
And a year ago, I didn't have
any practical knowledge and skills
 
- 
about Wikidata.
 
- 
I looked at Wikidata,
and I looked at the items,
 
- 
and I played with it.
 
- 
But I wasn't able to make a SPARQL query
 
- 
or to do data modeling
with the right shape expression.
 
- 
That's a year ago.
 
- 
And on the lefthand side,
that's Simon Cobb, user: Sic19.
 
- 
And I was talking to him,
because, just before,
 
- 
he had given a presentation
 
- 
about improving the coverage
of public libraries in Wikidata.
 
- 
And I was very inspired by his talk.
 
- 
And basically, he was talking
about adding basic data
 
- 
about public libraries.
 
- 
So, the name of the library, if available,
the photo of the building,
 
- 
the address data of the library,
 
- 
the geo-coordinates
latitude and longitude,
 
- 
and some other things,
 
- 
all with source references.
 
- 
And what I was very impressed
about a year ago was this map.
 
- 
This is a map about
public libraries in the U.K.
 
- 
with all the colors.
 
- 
And you can see that all the libraries
are layered by library organizations.
 
- 
And when he showed this,
I was really, "Wow, that's cool."
 
- 
So, then, one minute later, I thought,
 
- 
"Well, let's do it
for the country for that one."
 
- 
(laughter)
 
- 
And something about public libraries
in the Netherlands--
 
- 
there are about 1,300 library
branches in our country,
 
- 
grouped into 160 library organizations.
 
- 
And you might wonder why
do I want to do this project?
 
- 
Well, first of all,
for the common good, for society,
 
- 
because I think using Wikidata,
 
- 
and from there,
creating Wikipedia articles,
 
- 
and opening it up
via the linked open data cloud--
 
- 
it's improving visibility and reusability
of public libraries in the Netherlands.
 
- 
And my second goal was actually
a more personal one,
 
- 
because a year ago, I had this
yearly evaluation with my manager,
 
- 
and we decided it was a good idea
that I got more practical skills
 
- 
on linked open data, data modeling,
and also on Wikidata.
 
- 
And of course, I wanted to be able to make
these kinds of maps myself.
 
- 
(laughter)
 
- 
Then you might wonder
why do I want to do this?
 
- 
Isn't there already enough basic
library data out there in the Netherlands
 
- 
to have good coverage?
 
- 
So, let me show you some of the websites
 
- 
that are available to discover
address and location information
 
- 
about Dutch public libraries.
 
- 
And the first one is this one--
Gidsvoornederland.nl--
 
- 
and that's the official
public library inventory
 
- 
maintained by my library,
the National Library.
 
- 
And you can look up addresses
and geo-coordinates on that website.
 
- 
Then there is this site,
Bibliotheekinzicht--
 
- 
this is also an official website
maintained by my National Library.
 
- 
And this is about
public library statistics.
 
- 
Then there is another one,
debibliotheken.nl--
 
- 
as you can see there is also
address information
 
- 
about library organizations,
not about individual branches.
 
- 
And there's even this one,
which also has address information.
 
- 
And of course, there's something
like Google Maps,
 
- 
which also has all the names
and the locations and the addresses.
 
- 
And this one, the International
Library of Technology,
 
- 
which has a worldwide
inventory of libraries,
 
- 
including the Netherlands.
 
- 
And I even discovered there is a data set
 
- 
you can buy for 50 euros or so
to download it.
 
- 
And there is also--it seems--
I didn't download it,
 
- 
but there seems to be address
information available.
 
- 
You might wonder is this kind of data
good enough for the purposes I had?
 
- 
So, this is my birthday list
for my ideal public library data set.
 
- 
And what's on my list?
 
- 
First of all, the data I want to have
must be up-to-date-ish--
 
- 
it must be fairly up-to-date.
 
- 
So, it doesn't have to be real time,
 
- 
but let's say, a couple
of months, or half a year,
 
- 
delayed from the official publication--
that's okay for my purposes.
 
- 
And I want to have it for both
library branches
 
- 
and the library organizations.
 
- 
Then I want my data to be structured,
because it has to be machine-readable.
 
- 
It has to be in an open file format,
such as CSV or JSON or RDF.
 
- 
It has to be linked
to other resources preferably.
 
- 
And the license on the data
needs to be explicitly public domain or CC0.
 
- 
Then, I would like my data to have an API,
 
- 
which must be public, free,
and preferably also anonymous
 
- 
so you don't have to use an API key
or register an account.
 
- 
And I also want to have
a SPARQL interface.
 
- 
So, now, these are all the sites
I just showed you.
 
- 
And I'm going to make a big grid.
 
- 
And then, this is about
the evaluation I did.
 
- 
I'm not going into it,
but there is no single column
 
- 
which has all green check marks.
 
- 
That's the important thing to take away.
 
- 
And so, in summary, there was no
public, free linked open data
 
- 
for Dutch public libraries available
before I started my project.
 
- 
So, this was the ideal motivation
to actually work on it.
 
- 
So, that's what I've been doing
for a year now.
 
- 
And I've been adding libraries bit by bit,
organization by organization to Wikidata.
 
- 
I also created a project website for it.
 
- 
It's still rather messy,
but it has all the information,
 
- 
and I try to keep it
as up-to-date as possible.
 
- 
And also all the SPARQL queries
you can see are linked from here.
 
- 
And I'm just adding
really basic information.
 
- 
You see the instances,
images if available,
 
- 
addresses, locations, et cetera,
municipalities.
 
- 
And where possible, I also try to link
the libraries to external identifiers.
 
- 
And then, you can really easily--
as we all know--
 
- 
generate some Listeria lists
with public libraries grouped
 
- 
by organizations, for instance.
 
- 
Or using SPARQL queries,
you can also do aggregation on data--
 
- 
let's say, give me all
the municipalities in the Netherlands
 
- 
and the number of library branches
in each municipality.
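A minimal sketch of that kind of aggregation query, assuming branches are modeled as instances of public library (Q28564) with country (P17) Netherlands and located in (P131) a municipality:

```sparql
# Hedged sketch: Dutch municipalities and the number of library branches in each.
SELECT ?municipality ?municipalityLabel (COUNT(?branch) AS ?branches) WHERE {
  ?branch wdt:P31 wd:Q28564 ;        # instance of: public library (assumed class)
          wdt:P17 wd:Q55 ;           # country: Netherlands
          wdt:P131 ?municipality .   # located in the administrative territorial entity
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
}
GROUP BY ?municipality ?municipalityLabel
ORDER BY DESC(?branches)
```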
 
- 
With one click, you can make
these kinds of photo galleries.
 
- 
And what I set out to do first,
 
- 
you can really create these kinds of maps.
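A hedged sketch of such a map query; the #defaultView:Map comment makes the query service draw the results on a map, and P137 (operator) is only an assumption about how a branch is linked to its library organization (P749, parent organization, is another possibility):

```sparql
#defaultView:Map
# Hedged sketch: Dutch public-library branches, layered by the organization running them.
SELECT ?branch ?branchLabel ?coord ?layer WHERE {
  ?branch wdt:P31 wd:Q28564 ;     # instance of: public library (assumed class)
          wdt:P17 wd:Q55 ;        # country: Netherlands
          wdt:P625 ?coord .       # coordinate location drives the map view
  OPTIONAL {
    ?branch wdt:P137 ?org .       # operator -- assumed link to the library organization
    ?org rdfs:label ?layer .
    FILTER(LANG(?layer) = "nl")
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
}
```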
 
- 
And you might wonder,
"Are there any libraries here or there?"
 
- 
There are, but they are not yet in Wikidata.
 
- 
We're still working on that.
 
- 
And actually, last week,
I spoke with a volunteer,
 
- 
who's helping now
with entering the libraries.
 
- 
You can really make cool maps in Wikidata,
 
- 
and also, using
the Cartographer extension,
 
- 
you can embed these kinds of maps.
 
- 
And I even took it one step further.
 
- 
I also have some Python skills,
and some Leaflet skills--
 
- 
so, I created, and I'm quite
proud of it, actually.
 
- 
I created this library heat map,
which is fully interactive.
 
- 
You can zoom in to it,
and you can see all the libraries,
 
- 
and you can also run it off-wiki.
 
- 
So, you can just embed it
in your own website,
 
- 
and it fully runs interactively.
 
- 
So, now going back to my big scary table.
 
- 
There is one column
on the right, which is blank.
 
- 
And no surprise, it will be Wikidata.
 
- 
Let's see how it scores there.
 
- 
(cheering)
 
- 
So, I actually think
of printing this on a T-shirt.
 
- 
(laughter)
 
- 
So, just to summarize this in words,
 
- 
thanks to my project, now,
 
- 
there is public free linked open data
available for Dutch public libraries.
 
- 
And who can benefit from my effort?
 
- 
Well, all kinds of parties--
 
- 
you see Wikipedia,
because you can generate lists
 
- 
and overviews and articles,
 
- 
for instance, directly from Wikidata;
 
- 
for our National Library;
 
- 
IFLA also has an inventory
of worldwide libraries,
 
- 
they can also reuse the data.
 
- 
And especially for Sandra,
 
- 
it's also important for the Ministry--
Dutch Ministry of Culture--
 
- 
because Sandra is going
to have a talk about Wikidata
 
- 
with the Ministry this Monday,
next Monday.
 
- 
And also, on the righthand side, 
for instance,
 
- 
Amazon with Alexa, the assistant,
 
- 
they're also using Wikidata,
 
- 
so you can imagine that,
 
- 
if you're looking for public
library information,
 
- 
they can also use Wikidata for that.
 
- 
Because one year ago,
Simon Cobb inspired me
 
- 
to do this project,
I would like to call upon you,
 
- 
if you have time available,
 
- 
and if you have data from your own country
about public libraries,
 
- 
make the coverage better,
add more red dots,
 
- 
and of course, I'm willing
to help you with that.
 
- 
And Simon is also willing
to help with this.
 
- 
And so, I hope next year, somebody else
 
- 
will be at this conference
or another conference
 
- 
and there will be more
red dots on the map.
 
- 
Thank you very much.
 
- 
(applause)
 
- 
Thank you, Olaf.
 
- 
Next we have Ursula Oberst
and Heleen Smits
 
- 
presenting how can a small
research library benefit from Wikidata:
 
- 
enhancing library products using Wikidata.
 
- 
Okay. Good morning.
My name is Heleen Smits.
 
- 
And my colleague,
Ursula Oberst--where are you?
 
- 
(laughter)
 
- 
And I work at the Library
of the African Studies Center
 
- 
in Leiden, in the Netherlands.
 
- 
And the African Studies Center
is a center devoted--
 
- 
is an academic institution
devoted entirely to the study of Africa,
 
- 
focusing on Humanities and Social Studies.
 
- 
We used to be an independent
research organization,
 
- 
but in 2016, we became part
of Leiden University,
 
- 
and our catalog was integrated
into the larger university catalog.
 
- 
Though it remained possible
to do a search in the African Studies part
 
- 
of the Leiden catalog alone,
 
- 
we remained independent in some respects.
 
- 
For example, with respect
to our thesaurus.
 
- 
And also with respect
to the products we make for our users,
 
- 
such as acquisition lists
and web dossiers.
 
- 
And it is in the field of the web dossiers
 
- 
that we have been looking
 
- 
for possible ways to apply Wikidata,
 
- 
and that's the part where Ursula
will in the second part of this talk
 
- 
show you a bit
what we've been doing there.
 
- 
The web dossiers are collections
 
- 
of titles from our catalog
that we compile
 
- 
around a theme usually connected
to, for example, a conference,
 
- 
or to a special event, and actually,
the most recent web dossier we made
 
- 
was connected to the year
of indigenous languages,
 
- 
and that was around proverbs
in African languages.
 
- 
Our first steps--
 
- 
next slide--our first steps
on the Wiki path as a library,
 
- 
were in 2013, when we were one
of 12 GLAM institutions
 
- 
in the Netherlands,
 
- 
part of the project
of Wikipedians in Residence,
 
- 
and we had for two months,
a Wikipedian in the house,
 
- 
and he gave us training
in adding articles to Wikipedia,
 
- 
and also, we made a start with uploading
photo collections to Commons,
 
- 
which always remained a little bit
dependent on funding, as well,
 
- 
whether we would be able to digitize them,
 
- 
and on mostly having
a student assistant to do this.
 
- 
But it was actually a great addition
to what we could offer
 
- 
as an academic library.
 
- 
In May 2018--so that is Ursula,
my colleague Ursula--
 
- 
she started to really explore--
dive into Wikidata
 
- 
and see what we as a small
and not very experienced library
 
- 
in these fields could do with that.
 
- 
So, I mentioned, we have
our own thesaurus.
 
- 
And this is where we started.
 
- 
This is a thesaurus of 13,000 terms,
 
- 
all in the field of African studies.
 
- 
It contains a lot of African languages,
 
- 
names of ethnic groups in Africa,
 
- 
and other proper names,
 
- 
which are perhaps especially 
interesting for Wikidata.
 
- 
So, it is a real authority-controlled
 
- 
vocabulary
with 5,000 preferred terms.
 
- 
So, we submitted the request to Wikidata,
 
- 
and that was actually very quickly
met with a positive response,
 
- 
which was very encouraging for us.
 
- 
Our thesaurus was loaded into Mix-n-Match,
 
- 
and by now, 75% of the terms
 
- 
have been manually matched with Wikidata.
 
- 
So, it means, well, that we are now--
 
- 
we are added as an identifier--
 
- 
for example, if you click
on Swahili language,
 
- 
what happens then is that the Wikidata item--
the number that connects to our term--
 
- 
we enter into our thesaurus,
 
- 
and from there, you can do a search
directly in the catalog
 
- 
by clicking the button again.
 
- 
It means, also, that Wikidata
is not really integrated
 
- 
into our catalog.
 
- 
But that's also more difficult.
 
- 
Okay, we have to give the floor
 
- 
to Ursula for the next part.
 
- 
(Ursula) Thank you very much, Heleen.
 
- 
So, I will talk about our experiences
 
- 
with incorporating Wikidata elements
 
- 
into our web dossiers.
 
- 
A web dossier is--oh, sorry, yeah, sorry.
 
- 
A web dossier, or a classical web dossier,
consists of three parts:
 
- 
an introduction to the subject,
 
- 
mostly written by one of our researchers;
 
- 
a selection of titles, both books
and articles from our collection;
 
- 
and the third part, an annotated list
 
- 
with links to electronic resources.
 
- 
And this year, we added a fourth part
to our web dossiers,
 
- 
which is the Wikidata elements.
 
- 
And it all started last year,
 
- 
and my story is similar
to the story of Olaf, actually.
 
- 
Last year, when I had no clue
about Wikidata,
 
- 
I discovered this wonderful
article by Alex Stinson
 
- 
on how to write a query in Wikidata.
 
- 
And he chose a subject--
a very appealing subject to me.
 
- 
Namely, "Discovering Women Writers
from North Africa."
 
- 
I can really recommend this article,
 
- 
because it's very instructive.
 
- 
And I thought I will be--
I'm going to work on this query,
 
- 
and try to change it to:
"Southern African Women Writers,"
 
- 
and try to add a link
to their work in our catalog.
 
- 
And on the right-hand side,
you see the SPARQL query
 
- 
which searches for
"Southern African Women Writers."
 
- 
If you click on the button,
on the blue button on the lefthand side,
 
- 
the search result will appear beneath.
 
- 
The search result can have
different formats.
 
- 
In my case, the search result is a map.
 
- 
And the nice thing about Wikidata
 
- 
is that you can embed
this search result
 
- 
into your own webpage,
 
- 
and that's what we are now doing
with our web dossiers.
 
- 
So, this was the very first one
on Southern African women writers,
 
- 
with the classical three elements,
 
- 
plus this map on the lefthand side,
 
- 
which gives extra information--
 
- 
a link to the Southern African
women writer--
 
- 
a link to her works in our catalog,
 
- 
and a link to the Wikidata record
of her birth place, and her name,
 
- 
her personal record, plus a photo,
if it's available on Wikidata.
 
- 
And to retrieve a nice map
 
- 
with a lot of red dots
on the African continent,
 
- 
you need nice data in Wikidata--
complete, sufficient data.
 
- 
So, with our second web dossier
on public art in Africa,
 
- 
we also started to enhance
the data in Wikidata.
 
- 
In this case, for public art,
we added geo-locations--
 
- 
geo-locations to Wikidata.
 
- 
And we also searched for works
of public art in commons,
 
- 
and if they don't have
a record on Wikidata yet,
 
- 
we added a record to Wikidata.
 
- 
And the third thing we do,
 
- 
because when we prepare a web dossier,
 
- 
we download the titles from our catalog,
 
- 
and the titles are in MARC 21,
 
- 
so we have to convert them to a format
that is presentable on the website,
 
- 
and it doesn't take much time and effort
to convert the same set of titles
 
- 
to Wikidata QuickStatements,
 
- 
and then, we also upload
a title set to Wikidata,
 
- 
and you can see the titles we uploaded
 
- 
from our latest web dossier
 
- 
on African proverbs in Scholia.
 
- 
Scholia is a really nice tool
that visualizes publications
 
- 
present in Wikidata.
 
- 
And, one second--when it is possible,
we add a Scholia template
 
- 
to our web dossier's topic.
 
- 
Thank you very much.
 
- 
(applause)
 
- 
Thank you, Heleen and Ursula.
 
- 
Next we have Adrian Pohl
presenting using Wikidata
 
- 
to improve spatial subject indexing
in a regional bibliography.
 
- 
Okay, hello everybody.
 
- 
I'm going right into the topic.
 
- 
I only have ten minutes to present
a three-year project.
 
- 
It wasn't full time. (laughs)
 
- 
Okay, what's the NWBib?
 
- 
It's an acronym for North-Rhine
Westphalian Bibliography.
 
- 
It's a regional bibliography
that records literature
 
- 
about people and places
in North Rhine-Westphalia.
 
- 
And there are monographs in it--
 
- 
there are a lot of articles in it,
and most of them are quite unique,
 
- 
so, that's the interesting thing
about this bibliography--
 
- 
because it's often
quite obscure stuff--
 
- 
local people writing
about their traditions,
 
- 
and something like this.
 
- 
And there's over 400,000 entries in there.
 
- 
And the bibliography started in 1983,
 
- 
and so we only have titles
from this publication year onwards.
 
- 
If you want to take a look at it,
it's at nwbib.de,
 
- 
that's the web application.
 
- 
It's based on our service,
lobid.org, the API.
 
- 
Because it's cataloged as part
of the hbz union catalog,
 
- 
which comprises around 20 million records,
 
- 
it's an [inaudible] Aleph system--
we get the data out of there,
 
- 
and make RDF out of it,
 
- 
and provide it as JSON
via the HTTP API.
 
- 
So, the initial status in 2017
 
- 
was we had nearly 9,000 distinct strings
 
- 
about places--referring to places,
in North Rhine-Westphalia.
 
- 
Mostly, those were administrative areas,
like towns and districts,
 
- 
but also monasteries, principalities,
or natural regions.
 
- 
And we already used Wikidata in 2017,
 
- 
and matched those strings
to Wikidata entries with the Wikidata API
 
- 
quite naively to get
the geo-coordinates from there,
 
- 
and do some geo-based
discovery stuff with it.
 
- 
But this had some drawbacks.
 
- 
And so, the matching was really poor,
 
- 
and there were a lot of false positives,
 
- 
and we still had no hierarchy
in those places,
 
- 
and we still had a lot
of non-unique names.
 
- 
So, this is an example here.
 
- 
Does this work?
 
- 
Yeah, as you can see,
for one place, Brauweiler,
 
- 
there are four different strings in there.
 
- 
So, we all know how this happens.
 
- 
If there's no authority file,
you end up with this data.
 
- 
But we want to improve on that.
 
- 
And you can also see
why the matching didn't work--
 
- 
so you have this name of the place
 
- 
and there's often the name 
of the superior administrative area,
 
- 
and even on the second level,
a superior administrative area
 
- 
often in the name
 
- 
to identify the place successfully.
 
- 
So, the goal was to build a full-fledged
spatial classification based on this data,
 
- 
with a hierarchical view of places,
 
- 
with one entry or ID for each place.
 
- 
And we got this mock-up
by NWBib editors in 2016, made in Excel,
 
- 
to get a feeling of what
they would like to have.
 
- 
There you have the--
Regierungsbezirk--
 
- 
that's the most superior
administrative area--
 
- 
we have in there some towns
or districts--rural districts--
 
- 
and then, it's going down
to the parts of towns,
 
- 
even to this level.
 
- 
And we chose Wikidata for this task.
 
- 
We also looked at the GND,
the Integrated Authority File,
 
- 
and GeoNames--but Wikidata
had the best coverage,
 
- 
and the best infrastructure.
 
- 
The coverage for the places
and the geo-coordinates we need,
 
- 
and the hierarchical 
information, for example.
 
- 
There were a lot of places, 
also, in the GND,
 
- 
but there was no hierarchical
information in there.
 
- 
And also, Wikidata provides
the infrastructure
 
- 
for editing and versioning.
 
- 
And there's also a community
that helps maintaining the data,
 
- 
which was quite good.
 
- 
Okay, but there was a requirement
by the NWBib editors.
 
- 
They did not want to directly
rely on Wikidata,
 
- 
which was understandable.
 
- 
We don't have those servers
under our control,
 
- 
and we don't know what's going on there.
 
- 
There might be some unwelcome edits
that destroy the classification,
 
- 
or parts of it, or vandalism.
 
- 
So, we decided to put
an intermediate SKOS file in between,
 
- 
on which the application would rely--
which would be generated from Wikidata.
 
- 
And SKOS is the Simple Knowledge
Organization System--
 
- 
it's the standard way to model
 
- 
a classification in the linked data world.
 
- 
So, how did we do it? Five steps.
 
- 
I will come to each
of the steps in more detail.
 
- 
We matched the strings to Wikidata
with a better approach than before.
 
- 
Created the classification based
on Wikidata, added it,
 
- 
then added back-links
from Wikidata to NWBib
 
- 
with a custom property.
 
- 
And now, we are in the process
of establishing a good process
 
- 
for updating the classification
in Wikidata.
 
- 
Seeing--having a diff
of the changes,
 
- 
and then publishing it to the SKOS file.
 
- 
I will come to the details.
 
- 
So, the matching approach--
 
- 
as the API wasn't sufficient,
 
- 
and because we have those
different levels in the strings,
 
- 
we built a custom Elasticsearch
index for our task.
 
- 
I think by now, you could probably,
as well, use OpenRefine for doing this,
 
- 
but at that point in time,
it wasn't available for Wikidata.
 
- 
And we built this index based
on a SPARQL query,
 
- 
for entities in NRW,
and with specific types.
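A minimal sketch of the kind of query that could feed such a matching index, assuming Q1198 is North Rhine-Westphalia; it pulls, for each place, the label, aliases, type, immediate broader area, and coordinates:

```sparql
# Hedged sketch: places in North Rhine-Westphalia with the fields a matching index needs.
SELECT ?place ?placeLabel ?alias ?type ?broader ?coord WHERE {
  ?place wdt:P131+ wd:Q1198 ;   # transitively located in North Rhine-Westphalia (assumed Q-id)
         wdt:P31 ?type .        # type information helps disambiguation during matching
  OPTIONAL { ?place wdt:P131 ?broader }   # immediate superior administrative area
  OPTIONAL { ?place wdt:P625 ?coord }     # geo-coordinates
  OPTIONAL { ?place skos:altLabel ?alias FILTER(LANG(?alias) = "de") }  # aliases used for matching
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
}
```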
 
- 
And the query evolved over time a lot.
 
- 
And we have a few versions--
you can see the history on GitHub.
 
- 
So, what we put in the matching index,
 
- 
in the spatial object, 
is what we need in our data.
 
- 
It's the label and the ID
or the link to Wikidata,
 
- 
the geo-coordinates, and the type
from Wikidata [inaudible], as well.
 
- 
But also very important for the matching:
the aliases and the broader entity--
 
- 
and this is also an example where the name
of the broader entity
 
- 
and the district itself are very similar.
 
- 
So, it's important to have
some type information, as well,
 
- 
for the matching.
 
- 
So, the nationwide results
were very good.
 
- 
We could automatically match
more than 99% of records
 
- 
with this approach.
 
- 
These were only 92% of the strings.
 
- 
So, obviously, the results--
 
- 
those strings that only occurred
one or two times
 
- 
often didn't appear in Wikidata.
 
- 
And so, we had to do a lot of work
with that [long tail].
 
- 
And for around 1,000 strings,
the matching was incorrect.
 
- 
But the catalogers did a lot of work
in the Aleph catalog,
 
- 
but also in Wikidata, they made
more than 6,000 manual edits to Wikidata
 
- 
to reach 100% coverage by adding
aliases and type information,
 
- 
creating new entries.
 
- 
Okay, so, I have to speed up.
 
- 
We created the classification based on this,
on the hierarchical statements.
 
- 
P131 is the main property there.
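As an illustration only, not the project's actual pipeline, a CONSTRUCT query of roughly this shape could derive SKOS prefLabel and broader triples from the P131 hierarchy for such an intermediate SKOS file:

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# Hedged sketch: derive SKOS labels and broader relations from Wikidata's P131 hierarchy.
CONSTRUCT {
  ?place skos:prefLabel ?label ;
         skos:broader ?broader .
}
WHERE {
  ?place wdt:P131 ?broader ;
         rdfs:label ?label .
  FILTER(LANG(?label) = "de")
  ?broader wdt:P131* wd:Q1198 .   # keep only places under North Rhine-Westphalia (assumed Q-id)
}
```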
 
- 
We added the information to our data.
 
- 
So, we now have this
in our data spatial object--
 
- 
and we focus this--the link to Wikidata,
and the types are there,
 
- 
and here's the ID
from the SKOS classification
 
- 
we built based on Wikidata.
 
- 
And you can see there
are Q identifiers in there.
 
- 
Now, you can basically query our API
 
- 
with such a query using Wikidata URIs,
 
- 
and get literature, in this example,
about Cologne back.
 
- 
Then we created a Wikidata property
for NWBib and added those links
 
- 
from Wikidata to the classification--
batch load them with QuickStatements.
 
- 
And there's also a nice--
 
- 
also a move to using a qualifier
on this property
 
- 
to add the broader information there.
 
- 
So, I think people won't mess around
with this,
 
- 
as they might with the P131 statements.
 
- 
So, this is what it looks like.
 
- 
This will go to the classification
where you can then start a query.
 
- 
Now, we have to build this
update and review process,
 
- 
and we will add those data like this,
 
- 
with a $0 sub-field, to Aleph,
 
- 
and the catalogers will start
using those Wikidata-based IDs,
 
- 
or URIs, for cataloging, for spatial indexing.
 
- 
So, by now, there are more than 400,000
NWBib entries with links to Wikidata,
 
- 
and more than 4,400 Wikidata entries
with links to NWBib.
 
- 
Thank you.
 
- 
(applause)
 
- 
Thank you, Adrian.
 
- 
I got it. Thank you.
 
- 
So, as you've seen me before,
I'm Hilary Thorsen.
 
- 
I'm Wikimedian in residence
 
- 
with the Linked Data
for Production Project.
 
- 
I am based at Stanford,
 
- 
and I'm here today
with my colleague, Lena Denis,
 
- 
who is Cartographic Assistant
at Harvard Library.
 
- 
And Christine Fernsebner Eslao
is here in spirit.
 
- 
She is currently back in Boston,
but supporting us from afar.
 
- 
So, we'll be talking
about Wikidata and Libraries
 
- 
as partners in data production,
organization, and project inspiration.
 
- 
And our work is part of the Linked Data
for Production Project.
 
- 
So, Linked Data for Production
is in its second phase,
 
- 
called Pathway for Implementation.
 
- 
And it's an Andrew W. Mellon
Foundation grant,
 
- 
involving the partnership
of several universities,
 
- 
with the goal of constructing a pathway
for shifting the cataloging community
 
- 
to begin describing library
resources with linked data.
 
- 
And it builds upon a previous grant,
 
- 
but this iteration is focused
on the practical aspects
 
- 
of the transition.
 
- 
One of these pathways of investigation
 
- 
has been integrating
library metadata with Wikidata.
 
- 
We have a lot of questions,
 
- 
but some of the ones
we're most interested in
 
- 
are how we can integrate
library metadata with Wikidata,
 
- 
and make contribution
a part of our cataloging workflows,
 
- 
how Wikidata can help us improve
our library discovery environment,
 
- 
how it can help us reveal
more relationships
 
- 
and connections within our data
and with external data sets,
 
- 
and if we have connections in our own data
that can be added to Wikidata,
 
- 
how libraries can help
fill in gaps in Wikidata,
 
- 
and how libraries can work
with local communities
 
- 
to describe library
and archival resources.
 
- 
Finding answers to these questions
has focused on the mutual benefit
 
- 
for the library and Wikidata communities.
 
- 
We've learned through starting to work
on our different Wikidata projects,
 
- 
that many of the issues
libraries grapple with,
 
- 
like data modeling, identity management,
data maintenance, documentation,
 
- 
and instruction on linked data,
 
- 
are ones the Wikidata
community works on too.
 
- 
I'm going to turn things over to Lena
 
- 
to talk about what
she's been working on now.
 
- 
Hi, so, as Hilary briefly mentioned,
I work as a map librarian at Harvard,
 
- 
where I process maps, atlases,
and archives for our online catalog.
 
- 
And while processing two-dimensional
cartographic works
 
- 
is relatively straightforward,
cataloging archival collections
 
- 
so that their cartographic resources
can be made discoverable,
 
- 
has always been more difficult.
 
- 
So, my use case for Wikidata
is visually modeling relationships
 
- 
between archival collections
and the individual items within them,
 
- 
as well as between archival drafts
in published works.
 
- 
So, I used Wikidata to highlight the work
of our cartographer named Erwin Raisz,
 
- 
who worked at Harvard
in the early 20th-century.
 
- 
He was known for his vividly detailed
and artistic land forms,
 
- 
like this one on the screen--
 
- 
but also for inventing
the armadillo projection,
 
- 
writing the first cartography
textbook in English
 
- 
and various other
important contributions
 
- 
to the field of geography.
 
- 
And at the Harvard Map Collection,
 
- 
we have a 66-item collection
of Raisz's field notebooks,
 
- 
which begin when he was a student
and end just before his death.
 
- 
So, this is the collection-level record
that I made for them,
 
- 
which merely gives an overview,
 
- 
but his notebooks are full of information
 
- 
that he used in later atlases,
maps, and textbooks.
 
- 
But researchers don't know how to find
that trajectory information,
 
- 
and the system
is not designed to show them.
 
- 
So, I felt that with Wikidata,
and other Wikimedia platforms,
 
- 
I'd be able to take advantage
 
- 
of information that already exists
about him on the open web,
 
- 
along with library records
and a notebook inventory
 
- 
that I had made in an Excel spreadsheet
 
- 
to show relationships and influences
between his works.
 
- 
So here, you can see how I edited
and reconciled library data
 
- 
in OpenRefine.
 
- 
And then, I used QuickStatements
to batch import my results.
 
- 
So, now, I was ready
to create knowledge graphs
 
- 
with SPARQL queries
to show patterns of influence.
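A hedged sketch of such a query; #defaultView:Graph draws the results as a graph, the cartographer is looked up by label rather than by an assumed Q-id, and P170 (creator) and P361 (part of) are assumptions about how the items were modeled:

```sparql
#defaultView:Graph
# Hedged sketch: works attributed to Erwin Raisz and what they are part of.
SELECT ?work ?workLabel ?related ?relatedLabel ?image WHERE {
  ?creator rdfs:label "Erwin Raisz"@en .   # look the cartographer up by English label
  ?work wdt:P170 ?creator .                # creator: works attributed to him (assumed modeling)
  OPTIONAL { ?work wdt:P361 ?related }     # part of: e.g. a notebook within a collection
  OPTIONAL { ?work wdt:P18 ?image }        # image from Wikimedia Commons, if any
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```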
 
- 
The examples here show
how I leveraged Wikimedia Commons images
 
- 
that I connected to him.
 
- 
And the hierarchy of some of his works
 
- 
that were contributing
factors to other works.
 
- 
So, modeling Raisz's works on Wikidata
allowed me to encompass in a single image,
 
- 
or in this case, in two images,
the connections that require many pages
 
- 
of bibliographic data to reveal.
 
- 
So, this video is going to load.
 
- 
Yes! Alright.
 
- 
This video is a minute and a half long
screencast I made,
 
- 
that I'm going to narrate as you watch.
 
- 
It shows the process of inputting
and then running a SPARQL query,
 
- 
showing hierarchical relationships
between notebooks, an atlas, and a map
 
- 
that Raisz created about Cuba.
 
- 
He worked there before the revolution,
 
- 
so he had the unique position
of having support
 
- 
from both the American
and the Cuban governments.
 
- 
So, I made this query as an example
to show people who work on Raisz,
 
- 
and who are interested in narrowing down
what materials they'd like to request
 
- 
when they come to us for research.
 
- 
To make the approach replicable
for other archival collections,
 
- 
I hope that Harvard and other institutions
will prioritize Wikidata look-ups
 
- 
as they move to linked data
cataloging production,
 
- 
which my co-presenters
can speak to the progress on
 
- 
better than I can.
 
- 
But my work has brought me--
has brought to mind a particular issue
 
- 
that I see as a future opportunity,
which is that of archival modeling.
 
- 
So, to an archivist, an item
is a discrete archival material
 
- 
within a larger collection
of archival materials
 
- 
that is not a physical location.
 
- 
So an archivist from the American National
Archives and Records Administration,
 
- 
who is also a Wikidata enthusiast,
 
- 
advised me when I was trying
to determine how to express this
 
- 
using an example item,
 
- 
that I'm going to show
as soon as this video is finally over.
 
- 
Alright. Great.
 
- 
Nope, that's not what I wanted.
 
- 
Here we go.
 
- 
It's doing that.
 
- 
(humming)
 
- 
Nope. Sorry. Sorry.
 
- 
Alright, I don't know why
it's not going full screen again.
 
- 
I can't get it to do anything.
 
- 
But this is the-- oh, my gosh.
 
- 
Stop that. Alright.
 
- 
So, this is the item that I mentioned.
 
- 
So, this was what the archivist
 
- 
from the National Archives
and Records Administration
 
- 
showed me as an example.
 
- 
And he recommended this compromise,
which is to use the "part of" property
 
- 
to connect a lower level description
to a higher level of description,
 
- 
which allows the relationships
between different hierarchical levels
 
- 
to be asserted as statements
and qualifiers.
 
- 
So, in this example that's on screen,
 
- 
the relationship between an item,
a series, a collection, and a record group
 
- 
are thus contained and described
within a Wikidata item entity.
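A small sketch of how that "part of" chain can then be walked upward with a query; the label used to pick the starting item is purely a hypothetical placeholder:

```sparql
# Hedged sketch: climb from one archival item up through every broader level via "part of".
SELECT ?level ?levelLabel WHERE {
  ?item rdfs:label "Erwin Raisz field notebooks"@en .  # hypothetical placeholder label; substitute your item
  ?item wdt:P361+ ?level .                             # part of, followed transitively
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```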
 
- 
So, I followed this model
in my work on Raisz.
 
- 
And one of my images is missing.
 
- 
No, it's not. It's right there. I'm sorry.
 
- 
And so, I followed this model
on my work on Raisz,
 
- 
but I look forward
to further standardization.
 
- 
So, another archival project
Harvard is working on
 
- 
is the Arthur Freedman collection
of more than 2,000 hours
 
- 
of punk rock performances
from the 1970s to early 2000s
 
- 
in the Boston and Cambridge,
Massachusetts areas.
 
- 
It includes many bands and venues
that no longer exist.
 
- 
So far, work has been done in OpenRefine
on reconciliation of the bands and venues
 
- 
to see which need an item
created in Wikidata.
 
- 
Basic items will be created
via a batch process next spring,
 
- 
and then, an edit-a-thon will be 
held in conjunction
 
- 
with the New England Music Library
Association's meeting in Boston
 
- 
to focus on adding more statements
to the batch-created items,
 
- 
by drawing on local music
community knowledge.
 
- 
We're interested in learning more
about models for pairing librarians
 
- 
and Wiki enthusiasts with new contributors
who have domain knowledge.
 
- 
Items will eventually be linked
to digitized video
 
- 
in Harvard's digital collection platform
 
- 
once rights have
been cleared with artists,
 
- 
which will likely be a slow process.
 
- 
There's also a great amount of interest
 
- 
in moving away from manual cataloging
and creation of authority data
 
- 
towards identity management,
 
- 
where descriptions
can be created in batches.
 
- 
An additional project that focused on
 
- 
creating international standard
name identifiers, or ISNIs,
 
- 
for avant-garde and women filmmakers
 
- 
can be adapted for creating Wikidata items
for these filmmakers, as well.
 
- 
Spreadsheets with the ISNIs,
filmmaker names, and other details
 
- 
can be reconciled in OpenRefine,
and uploaded with QuickStatements.
 
- 
Once people and organizations
have been described,
 
- 
we'll move toward describing
the films in Wikidata,
 
- 
which will likely present
some additional modeling challenges.
 
- 
A library presentation
wouldn't be complete
 
- 
without a MARC record.
 
- 
Here, you can see the record
for Karen Aqua's taxonomy film,
 
- 
where her ISNI and Wikidata Q number
 
- 
have been added to the 100 field.
 
- 
The ISNIs and Wikidata Q numbers
that have been created
 
- 
can then be batch added
back into MARC records via MarcEdit.
 
- 
You might be asking why I'm showing you
this ugly MARC record,
 
- 
instead of some beautiful
linked data statements.
 
- 
And that's because our libraries
will be working in a hybrid environment
 
- 
for some time.
 
- 
Our library catalogs still rely
on MARC records,
 
- 
so by adding in these URIs,
 
- 
we can try to take advantage
of linked data,
 
- 
while our systems still use MARC.
 
- 
Adding URIs into MARC records
makes an additional aspect
 
- 
of our project possible.
 
- 
Work has been done at Stanford
and Cornell to bring data
 
- 
from Wikidata into our library catalog
using URIs already in our MARC records.
 
- 
You can see an example
of a knowledge panel,
 
- 
where all the data is sourced
from Wikidata,
 
- 
and links back to the item itself,
along with an invitation to contribute.
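A hedged sketch of the kind of lookup that could feed such a panel; Karen Aqua, mentioned on the MARC slide, is looked up by label here, and the selected fields are just examples of what a panel might show:

```sparql
# Hedged sketch: pull a few panel-style facts for a person item.
SELECT ?person ?personLabel ?description ?image ?birth ?death WHERE {
  ?person rdfs:label "Karen Aqua"@en .    # look the filmmaker up by English label
  OPTIONAL { ?person schema:description ?description FILTER(LANG(?description) = "en") }
  OPTIONAL { ?person wdt:P18 ?image }     # image
  OPTIONAL { ?person wdt:P569 ?birth }    # date of birth
  OPTIONAL { ?person wdt:P570 ?death }    # date of death
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```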
 
- 
This is currently in a test environment,
not in production in our catalog.
 
- 
Ideally, eventually,
these will be generated
 
- 
from linked data descriptions
of library resources
 
- 
created using Sinopia,
our linked data editor
 
- 
developed for cataloging.
 
- 
We found that adding a look-up
to Wikidata in Sinopia is difficult.
 
- 
The scale and modeling of Wikidata
make it hard to partition the data
 
- 
to be able to look up typed entities,
 
- 
and we've run into the problem
 
- 
of SPARQL not being good
for keyword search,
 
- 
but wanting our keyword APIs
to return SPARQL-like RDF descriptions.
 
- 
So, as you can see, we still have
quite a bit of work to do.
 
- 
This round of the grant
runs until June 2020,
 
- 
so, we'll be continuing our exploration.
 
- 
And I just wanted to invite anyone
 
- 
who has a continuing interest in talking
about Wikidata and libraries,
 
- 
I lead a Wikidata Affinity Group
that's open to anyone to join.
 
- 
We meet every two weeks,
 
- 
and our next call is Tuesday,
November the 5th,
 
- 
so if you're interested
in continuing discussions,
 
- 
I would love to talk with you further.
 
- 
Thank you, everyone.
 
- 
And thank you to the other presenters
 
- 
for talking about all
of their wonderful projects.
 
- 
(applause)