-
I work as a teacher
at the University of Alicante,
-
where I recently obtained my PhD
on data libraries and linked open data.
-
And I'm also a software developer
-
at the Biblioteca Virtual
Miguel de Cervantes.
-
And today, I'm going to talk
about data quality.
-
Well, those are my colleagues
at the university.
-
And as you may know, many organizations
are publishing their data
-
or linked open data--
-
for example,
the National Library of France,
-
the National Library of Spain,
us, which is Cervantes Virtual,
-
the British National Bibliography,
-
the Library of Congress and Europeana.
-
All of them provide a SPARQL endpoint,
-
which is useful in order
to retrieve the data.
-
And if I'm not wrong,
-
the Library of Congress only provides
the data as a dump that you can't use.
-
When we published our repository
as linked open data,
-
the idea was for it to be reused
by other institutions.
-
But what if I'm an institution
that wants to enrich its data
-
with data from other data libraries?
-
Which data set should I use?
-
Which data set is better
in terms of quality?
-
The benefits of the evaluation
of data quality in libraries are many.
-
For example, methodologies can be improved
to include new criteria
-
for assessing the quality.
-
And also, organizations can benefit
from best practices and guidelines
-
in order to publish their data
as linked open data.
-
What do we need
in order to assess the quality?
-
Well, obviously, a set of candidates
and a set of features.
-
For example, do they have
a SPARQL endpoint,
-
do they have a web interface,
how many publications do they have,
-
how many vocabularies do they use,
and how many Wikidata properties do they have?
-
And where can I get those candidates?
-
I used the LOD Cloud--
-
but when I was doing this slide,
I thought about using Wikidata
-
in order to retrieve those candidates.
-
For example, getting entities
of type data library
-
that have a SPARQL endpoint.
-
You have here the link.
-
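A minimal sketch of such a query for the Wikidata Query Service
(assumptions to verify: Q212805 as the "digital library" class,
and P5305 as the SPARQL endpoint URL property):
-
SELECT ?library ?libraryLabel ?endpoint WHERE {
  ?library wdt:P31 wd:Q212805 ;   # instance of: digital library (placeholder class, check it)
           wdt:P5305 ?endpoint .  # SPARQL endpoint URL (assumed property, check it)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
-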
And I come up with those data libraries.
-
The first one uses the Bibliographic Ontology
as its main vocabulary,
-
and the others are based,
more or less, on FRBR,
-
which is a conceptual model published by IFLA.
-
And this is just an example
of how we could compare
-
data libraries using
bubble charts on Wikidata.
-
And this is just an example comparing
how many Wikidata properties
-
there are per data library.
-
Well, how can we measure quality?
-
There are different methodologies,
-
for example, FRBR 1,
-
which provides a set of criteria
grouped by dimensions,
-
and those in green
are the ones that I found--
-
that I could assess by means of Wikidata.
-
And we also found that we
could define new criteria,
-
for example, a new one to evaluate
the number of duplicates in Wikidata.
-
We used those properties.
-
And this is an example of a SPARQL query
-
to count the number of duplicate
values of a property.
-
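A sketch of what such a duplicate check can look like on the Wikidata Query Service,
using the BnF identifier (P268) only as an illustrative property--
in practice, you would swap in the identifier property of the data library being assessed:
-
SELECT ?id (COUNT(?item) AS ?items) WHERE {
  ?item wdt:P268 ?id .            # BnF ID, as an example identifier property
}
GROUP BY ?id
HAVING (COUNT(?item) > 1)         # keep only identifier values shared by more than one item
ORDER BY DESC(?items)
-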
And about the results:
at the moment of doing this study--
-
not the slides--there was no Wikidata property
for the British National Bibliography.
-
They don't provide provenance information,
-
which could be useful
for metadata enrichment.
-
And they don't allow
editing of the information.
-
So, we've been talking
about Wikibase the whole weekend,
-
and maybe we should try to adopt
Wikibase as an interface.
-
And they are focused on their own content,
-
and this is just the SPARQL query,
based on Wikidata,
-
used to assess population completeness.
-
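As a rough, hedged sketch, a population check of this kind can simply count
how many Wikidata items carry a given library's identifier (again with P268 only as the example):
-
SELECT (COUNT(DISTINCT ?item) AS ?population) WHERE {
  ?item wdt:P268 ?id .            # items linked to the BnF; swap in the identifier property of interest
}
-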
And the BnF provides labels
in multiple languages,
-
and they all use self-describing URIs,
-
which means that the URI
includes the type of entity,
-
which allows a human reader
to understand what they are looking at.
-
And more results: they provide
different output formats,
-
they use external vocabularies.
-
Only the British National Bibliography
-
provides machine-readable
licensing information.
-
And up to one-third of the instances
are connected to external repositories,
-
which is really nice.
-
And this study, this work,
has been done in our Labs team.
-
A lab in a GLAM is a group of people
-
who want to explore new ways
-
of reusing data collections.
-
And there's a community
led by the British Library,
-
and in particular, Mahendra Mahey,
-
and we had a first event in London,
-
and another one in Copenhagen,
-
and we're going to have a new one in May
-
at the Library of Congress in Washington.
-
And we are now 250 people.
-
And I'm so glad that I found
somebody here at the WikidataCon
-
who has just joined us--
-
Sylvia from [inaudible], Mexico.
-
And I'd like to invite you
to our community,
-
since you may be part
of a GLAM institution.
-
So, we can talk later
if you want to know about this.
-
And this--it's all about people.
-
This is me, people
from the British Library,
-
the Library of Congress, universities,
and national libraries in Europe.
-
And there's a link here
in case you want to know more.
-
And, well, last month,
we decided to meet in Doha
-
in order to write a book
about how to create a lab in our GLAM.
-
And they chose 15 people,
and I was so lucky to be there.
-
And the book follows
the Booksprint methodology,
-
which means that nothing
is prepared beforehand.
-
All is done there in a week.
-
And believe me, it was really hard work
-
to have the whole book
done in that week.
-
And I'd like to introduce you to the book,
which will be published--
-
it was supposed to be published this week,
-
but it will be next week.
-
And it will be published openly,
so you can have it,
-
and I can show you
a little bit later if you want.
-
And those are the authors.
-
I'm here-- I'm so happy, too.
-
And those are the institutions--
-
Library of Congress, British Library--
and this is the title.
-
And now, I'd like to show you
-
a map that I'm working on.
-
We are launching a website
for our community,
-
and I'm in charge of creating a map
with our institutions there.
-
This is not finished.
-
But this is just SPARQL, and below,
-
we see the map.
-
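The exact query isn't shown in the transcript, but a map like this generally follows
the same pattern: select items with a coordinate location (P625) and ask for the Map view.
A sketch, using libraries in general as a stand-in for the community's member institutions:
-
#defaultView:Map
SELECT ?institution ?institutionLabel ?coord WHERE {
  ?institution wdt:P31 wd:Q7075 ;   # instance of: library (stand-in; the real map uses the GLAM Labs member list)
               wdt:P625 ?coord .    # coordinate location, which drives the map view
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 500
-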
And we see here
the new people that I found, here,
-
at the WikidataCon--
I'm so happy for this.
-
And we have here my data library
of my university,
-
and many other institutions.
-
Also, from Australia--
-
if I can do it.
-
Well, here, we have some links.
-
There you go.
-
Okay, this is not finished.
-
We are still working on this,
and that's all.
-
Thank you very much for your attention.
-
(applause)
-
[inaudible]
-
Good morning, everybody.
-
I'm Olaf Janssen.
-
I'm the Wikimedia coordinator
-
at the National Library
of the Netherlands.
-
And I would like to share my work,
-
which I'm doing about creating
Linked Open Data
-
for Dutch Public Libraries using Wikidata.
-
And my story starts roughly a year ago
-
when I was at the GLAM Wiki conference
in Tel Aviv, in Israel.
-
And there are two men
with very similar shirts,
-
and equally similar hairdos, [Matt]...
-
(laughter)
-
And on the left, that's me.
-
And a year ago, I didn't have
any practical knowledge and skills
-
about Wikidata.
-
I looked at Wikidata,
and I looked at the items,
-
and I played with it.
-
But I wasn't able to make a SPARQL query
-
or to do data modeling
with the right shape expression.
-
That's a year ago.
-
And on the lefthand side,
that's Simon Cobb, user: Sic19.
-
And I was talking to him,
because, just before,
-
he had given a presentation
-
about improving the coverage
of public libraries in Wikidata.
-
And I was very inspired by his talk.
-
And basically, he was talking
about adding basic data
-
about public libraries.
-
So, the name of the library, if available,
the photo of the building,
-
the address data of the library,
-
the geo-coordinates--
latitude and longitude--
-
and some other things,
-
all with source references.
-
And what I was very impressed
about a year ago was this map.
-
This is a map about
public libraries in the U.K.
-
with all the colors.
-
And you can see that all the libraries
are layered by library organizations.
-
And when he showed this,
I was really, "Wow, that's cool."
-
So, then, one minute later, I thought,
-
"Well, let's do it
for my own country."
-
(laughter)
-
And something about public libraries
in the Netherlands--
-
there are about 1,300 library
branches in our country,
-
grouped into 160 library organizations.
-
And you might wonder why
do I want to do this project?
-
Well, first of all,
for the common good, for society,
-
because I think using Wikidata,
-
and from there,
creating Wikipedia articles,
-
and opening it up
via the linked open data cloud--
-
it's improving visibility and reusability
of public libraries in the Netherlands.
-
And my second goal was actually
a more personal one,
-
because a year ago, I had this
yearly evaluation with my manager,
-
and we decided it was a good idea
that I got more practical skills
-
on linked open data, data modeling,
and also on Wikidata.
-
And of course, I wanted to be able to make
these kinds of maps myself.
-
(laughter)
-
Then you might wonder
why do I want to do this?
-
Isn't there already enough basic
library data out there in the Netherlands
-
to have a good coverage?
-
So, let me show you some of the websites
-
that are available to discover
address and location information
-
about Dutch public libraries.
-
And the first one is this one--
Gidsvoornederland.nl--
-
and that's the official
public library inventory
-
maintained by my library,
the National Library.
-
And you can look up addresses
and geo-coordinates on that website.
-
Then there is this site,
Bibliotheekinzicht--
-
this is also an official website
maintained by my National Library.
-
And this is about
public library statistics.
-
Then there is another one,
debibliotheken.nl--
-
as you can see there is also
address information
-
about library organizations,
not about individual branches.
-
And there's even this one,
which also has address information.
-
And of course, there's something
like Google Maps,
-
which also has all the names
and the locations and the addresses.
-
And this one, the International
Library of Technology,
-
which has a worldwide
inventory of libraries,
-
including the Netherlands.
-
And I even discovered there is a data set
-
you can buy for 50 euros or so
to download.
-
And there also seems to be--
I didn't download it,
-
but there seems to be address
information available.
-
You might wonder is this kind of data
good enough for the purposes I had?
-
So, this is my birthday list
for my ideal public library data set.
-
And what's on my list?
-
First of all, the data I want to have
must be up-to-date-ish--
-
it must be fairly up-to-date.
-
So, it doesn't have to be real time,
-
but let's say, a couple
of months, or half a year,
-
delayed from official publication--
that's okay for my purposes.
-
And I want to have it for both
library branches
-
and library organizations.
-
Then I want my data to be structured,
because it has to be machine-readable.
-
It has to be in open file format,
such as CSV or JSON or RDF.
-
It has to be linked
to other resources preferably.
-
And the license on the data
needs to be explicitly public domain or CC0.
-
Then, I would like my data to have an API,
-
which must be public, free,
and preferably also anonymous
-
so you don't have to use an API key
or register an account.
-
And I also want to have
a SPARQL interface.
-
So, now, these are all the sites
I just showed you.
-
And I'm going to make a big grid.
-
And then, this is about
the evaluation I did.
-
I'm not going into it,
but there is no single column
-
which has all green check marks.
-
That's the important thing to take away.
-
And so, in summary, there was no
public, free linked open data
-
for Dutch public libraries available
before I started my project.
-
So, this was the ideal motivation
to actually work on it.
-
So, that's what I've been doing
for a year now.
-
And I've been adding libraries bit by bit,
organization by organization to Wikidata.
-
I created also a project website on it.
-
It's still rather messy,
but it has all the information,
-
and I try to keep it
as up-to-date as possible.
-
And also all the SPARQL queries
you can see are linked from here.
-
And I'm just adding
really basic information.
-
You see the instances,
images if available,
-
addresses, locations, et cetera,
municipalities.
-
And where possible, I also try to link
the libraries to external identifiers.
-
And then, you can really easily--
as we all know--
-
generate some Listeria lists
with public libraries grouped
-
by organization, for instance.
-
Or using SPARQL queries,
you can also do aggregation on data--
-
let's say, give me all
the municipalities in the Netherlands
-
and the number of library branches
in all the municipalities.
-
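A sketch of that aggregation (assuming Q28564 for "public library" and Q55 for the Netherlands,
both of which should be double-checked):
-
SELECT ?municipality ?municipalityLabel (COUNT(?library) AS ?branches) WHERE {
  ?library wdt:P31/wdt:P279* wd:Q28564 ;  # instance of (a subclass of) public library
           wdt:P17 wd:Q55 ;               # country: Netherlands
           wdt:P131 ?municipality .       # located in the administrative territorial entity
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
}
GROUP BY ?municipality ?municipalityLabel
ORDER BY DESC(?branches)
-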
With one click, you can make
these kinds of photo galleries.
-
And what I set out to do first,
-
you can really create these kinds of maps.
-
And you might wonder,
"Are there any libraries here or there?"
-
There are--they are not yet in Wikidata.
-
We're still working on that.
-
And actually, last week,
I spoke with a volunteer,
-
who's helping now
with entering the libraries.
-
You can really make cool maps in Wikidata,
-
and also, by using
the Cartographer extension,
-
you can use these kinds of maps.
-
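A sketch of the kind of layered map query behind this--one dot per branch, colored by its
operating library organization (using P137 "operator" for the grouping is an assumption):
-
#defaultView:Map{"layer": "?orgLabel"}
SELECT ?library ?libraryLabel ?coord ?org ?orgLabel WHERE {
  ?library wdt:P31/wdt:P279* wd:Q28564 ;   # public library (class assumed, as above)
           wdt:P17 wd:Q55 ;                # country: Netherlands
           wdt:P625 ?coord .               # coordinate location
  OPTIONAL { ?library wdt:P137 ?org . }    # operator, used here as the map layer
  SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
}
-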
And I even took it one step further.
-
I also have some Python skills,
and some Leaflet skills--
-
so, I created, and I'm quite
proud of it, actually.
-
I created this library heat map,
which is fully interactive.
-
You can zoom in to it,
and you can see all the libraries,
-
and you can also run it off-wiki.
-
So, you can just embed it
in your own website,
-
and it fully runs interactively.
-
So, now going back to my big scary table.
-
There is one column
on the right, which is blank.
-
And no surprise, it will be Wikidata.
-
Let's see how it scores there.
-
(cheering)
-
So, I actually think
of printing this on a T-shirt.
-
(laughter)
-
So, just to summarize this in words,
-
thanks to my project, now,
-
there is public free linked open data
available for Dutch public libraries.
-
And who can benefit from my effort?
-
Well, all kinds of parties--
-
you see Wikipedia,
because you can generate lists
-
and overviews and articles
-
from Wikidata, for instance;
-
our National Library;
-
and IFLA, which also has an inventory
of worldwide libraries--
-
they can also reuse the data.
-
And especially for Sandra,
-
it's also important for the Ministry--
Dutch Ministry of Culture--
-
because Sandra is going
to have a talk about Wikidata
-
with the Ministry this Monday,
next Monday.
-
And also, on the righthand side,
for instance,
-
Amazon with Alexa, the assistant,
-
they're also using Wikidata,
-
so you can imagine that,
-
if you're looking for public
library information,
-
they can also use Wikidata for that.
-
Because one year ago,
Simon Cobb inspired me
-
to do this project,
I would like to call upon you,
-
if you have time available,
-
and if you have data from your own country
about public libraries,
-
make the coverage better,
add more red dots,
-
and of course, I'm willing
to help you with that.
-
And Simon is also willing
to help with this.
-
And so, I hope next year, somebody else
-
will be at this conference
or another conference
-
and there will be more
red dots on the map.
-
Thank you very much.
-
(applause)
-
Thank you, Olaf.
-
Next we have Ursula Oberst
and Heleen Smits
-
presenting how can a small
research library benefit from Wikidata:
-
enhancing library products using Wikidata.
-
Okay. Good morning.
My name is Heleen Smits.
-
And my colleague,
Ursula Oberst--where are you?
-
(laughter)
-
And I work at the Library
of the African Studies Center
-
in Leiden, in the Netherlands.
-
And the African Studies Center
is a center devoted--
-
is an academic institution
devoted entirely to the study of Africa,
-
focusing on the Humanities and Social Sciences.
-
We used to be an independent
research organization,
-
but in 2016, we became part
of Leiden University,
-
and our catalog was integrated
into the larger university catalog.
-
Though it remained possible
to do a search in the part of the Leiden--
-
of the African Studies Catalog, alone,
-
we remained independent in some respects.
-
For example, with respect
to our thesaurus.
-
And also with respect
to the products we make for our users,
-
such as acquisition lists
and web dossiers.
-
And it is in the field of the web dossiers
-
that we have been looking
-
for possible ways to apply Wikidata,
-
and that's the part where Ursula,
in the second part of this talk,
-
will show you a bit
of what we've been doing there.
-
The web dossiers are collections
of titles from our catalog
of titles from our catalog
that we compile
-
around a theme usually connected
to, for example, a conference,
-
or to a special event, and actually,
the most recent web dossier we made
-
was connected to the year
of indigenous languages,
-
and that was around proverbs
in African languages.
-
Our first steps--
-
next slide--our first steps
on the Wiki path as a library,
-
were in 2013, when we were one
of 12 GLAM institutions
-
in the Netherlands,
-
part of the project
of Wikipedians in Residence,
-
and for two months, we had
a Wikipedian in the house,
-
and he gave us training
in adding articles to Wikipedia,
-
and also, we made a start with uploading
photo collections to Commons,
-
which always remained a little bit
dependent on funding, as well,
-
whether we would be able to digitize them,
-
and to mostly have
a student assistant to do this.
-
But it was actually a great addition
to what we could offer
-
as an academic library.
-
In May 2018, my colleague Ursula--
that is her--
-
started to really explore,
dive into Wikidata,
-
and see what we, as a small
and not very experienced library
-
in these fields, could do with that.
-
So, I mentioned, we have
our own thesaurus.
-
And this is where we started.
-
This is a thesaurus of 13,000 terms,
-
all in the field of African studies.
-
It contains a lot of African languages,
-
names of ethnic groups in Africa,
-
and other proper names,
-
which are perhaps especially
interesting for Wikidata.
-
So, it is a real authority-controlled
-
vocabulary
with 5,000 preferred terms.
-
So, we submitted the request to Wikidata,
-
and that was actually very quickly
met with a positive response,
-
which was very encouraging for us.
-
Our thesaurus was loaded into Mix-n-Match,
-
and by now, 75% of the terms
-
have been manually matched with Wikidata.
-
So, it means, well, that we are now
-
added as an identifier--
-
for example, if you click
on Swahili language,
-
what happens then in Wikidata is that
the identifier that connects our term
-
to the Wikidata item
-
takes us into our thesaurus,
-
and from there, you can do a search
directly in the catalog
-
by clicking the button again.
-
It also means that Wikidata
is not really integrated
-
into our catalog.
-
But that's also more difficult.
-
Okay, we have to give the floor
-
to Ursula for the next part.
-
(Ursula) Thank you very much, Heleen.
-
So, I will talk about our experiences
-
with incorporating Wikidata elements
-
into our web dossiers.
-
A web dossier is--oh, sorry, yeah, sorry.
-
A web dossier, or a classical web dossier,
consists of three parts:
-
an introduction to the subject,
-
mostly written by one of our researchers;
-
a selection of titles, both books
and articles from our collection;
-
and the third part, an annotated list
-
with links to electronic resources.
-
And this year, we added a fourth part
to our web dossiers,
-
which is the Wikidata elements.
-
And it all started last year,
-
and my story is similar
to the story of Olaf, actually.
-
Last year, I had no clue
about Wikidata,
-
and I discovered this wonderful
article by Alex Stinson
-
on how to write a query in Wikidata.
-
And he chose a subject--
a very appealing subject to me.
-
Namely, "Discovering Women Writers
from North Africa."
-
I can really recommend this article,
-
because it's very instructive.
-
And I thought I will be--
I'm going to work on this query,
-
and try to change it to:
"Southern African Women Writers,"
-
and try to add a link
to their work in our catalog.
-
And on the right-hand side,
you see the SPARQL query
-
which searches for
"Southern African Women Writers."
-
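The query on the slide isn't reproduced here, but a hedged sketch of the idea looks roughly
like this--female writers from a few Southern African countries, plotted by birthplace
(the VALUES list and the property choices are illustrative, not the exact query from the talk):
-
#defaultView:Map
SELECT ?writer ?writerLabel ?birthplace ?birthplaceLabel ?coord ?image WHERE {
  VALUES ?country { wd:Q258 wd:Q954 wd:Q1030 }  # e.g. South Africa, Zimbabwe, Namibia (illustrative list)
  ?writer wdt:P106 wd:Q36180 ;     # occupation: writer
          wdt:P21 wd:Q6581072 ;    # sex or gender: female
          wdt:P27 ?country ;       # country of citizenship
          wdt:P19 ?birthplace .    # place of birth
  ?birthplace wdt:P625 ?coord .    # coordinates of the birthplace, for the map
  OPTIONAL { ?writer wdt:P18 ?image . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
-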
If you click on the button,
on the blue button on the lefthand side,
-
the search result will appear beneath.
-
The search result can have
different formats.
-
In my case, the search result is a map.
-
And the nice thing about Wikidata
-
is that you can embed
this search result
-
into your own webpage,
-
and that's what we are now doing
with our web dossiers.
-
So, this was the very first one
on Southern African women writers,
-
listing the classical three elements,
-
plus this map on the lefthand side,
-
which gives extra information--
-
a link to the Southern African
woman writer--
-
a link to her works in our catalog,
-
and a link to the Wikidata record
of her birthplace, and her name,
-
her personal record, plus a photo,
if it's available on Wikidata.
-
And to retrieve a nice map
-
with a lot of red dots
on the African continent,
-
you need nice data in Wikidata--
complete, sufficient data.
-
So, with our second web dossier
on public art in Africa,
-
we also started to enhance
the data in Wikidata.
-
In this case, for the public art,
we added geo-locations--
-
geo-locations to Wikidata.
-
And we also searched for works
of public art on Commons,
-
and if they didn't have
a record on Wikidata yet,
-
we added a record to Wikidata.
-
And the third thing we do:
-
when we prepare a web dossier,
-
we download the titles from our catalog,
-
and the titles are in MARC 21,
-
so we have to convert them to a format
that is presentable on the website,
-
and it doesn't take much time and effort
to convert the same set of titles
-
to Wikidata QuickStatements,
-
and then, we also upload
a title set to Wikidata,
-
and you can see the titles we uploaded
-
from our latest web dossier
-
on African proverbs in Scholia.
-
It's a really nice tool
that visualizes scholarly publications
-
present in Wikidata.
-
And, one second--when it is possible,
we add a Scholia template
-
to our web dossier's topic.
-
Thank you very much.
-
(applause)
-
Thank you, Heleen and Ursula.
-
Next we have Adrian Pohl
presenting using Wikidata
-
to improve spatial subject indexing
and regional bibliography.
-
Okay, hello everybody.
-
I'm going right into the topic.
-
I only have ten minutes to present
a three-year project.
-
It wasn't full time. (laughs)
-
Okay, what's the NWBib?
-
It's an acronym for the North Rhine-Westphalian
Bibliography.
-
It's a regional bibliography
that records literature
-
about people and places
in North Rhine-Westphalia.
-
And there are monographs in it--
-
there are a lot of articles in it,
and most of them are quite unique,
-
so, that's the interesting thing
about this bibliography--
-
because it's often
quite obscure stuff--
-
local people writing
about their local traditions,
-
and things like this.
-
And there's over 400,000 entries in there.
-
And the bibliography started in 1983,
-
and so we only have titles
from this publication year onwards.
-
If you want to take a look at it,
it's at nwbib.de,
-
that's the web application.
-
It's based on our service,
lobid.org, the API.
-
Because it's cataloged as part
of the hbz union catalog,
-
which comprises around 20 million records,
-
it's an [inaudible] Aleph system--
we get the data out of there,
-
and make RDF out of it,
-
and provide it as JSON
via the HTTP API.
-
So, the initial status in 2017
-
was that we had nearly 9,000 distinct strings
-
referring to places
in North Rhine-Westphalia.
-
Mostly, those were administrative areas,
like towns and districts,
-
but also monasteries, principalities,
or natural regions.
-
And we already used Wikidata in 2017,
-
and matched those strings
to Wikidata entries with the Wikidata API,
-
quite naively, to get
the geo-coordinates from there,
-
and do some geo-based
discovery stuff with it.
-
But this had some drawbacks.
-
And so, the matching was really poor,
-
and there were a lot of false positives,
-
and we still had no hierarchy
in those places,
-
and we still had a lot
of non-unique names.
-
So, this is an example here.
-
Does this work?
-
Yeah, as you can see,
for one place, Brauweiler,
-
there are four different strings in there.
-
So, we all know how this happens.
-
If there's no authority file,
you end up with this data.
-
But we want to improve on that.
-
And as you can also see
why the matching didn't work--
-
you have the name of the place,
-
and there's often the name
of the superior administrative area,
-
and even, on the second level,
a superior administrative area,
-
in the name,
-
to identify the place successfully.
-
So, the goal was to build a full-fledged
spatial classification based on this data,
-
with a hierarchical view of places,
-
with one entry or ID for each place.
-
And we got this mock-up
from the NWBib editors in 2016, made in Excel,
-
to get a feeling of what
they would like to have.
-
There you have the--
Regierungsbezirk--
-
that's the highest-level
administrative area--
-
we have in there some towns
or districts--rural districts--
-
and then, it's going down
to the parts of towns,
-
even to this level.
-
And we chose Wikidata for this task.
-
We also looked at the GND,
the Integrated Authority File,
-
and GeoNames--but Wikidata
had the best coverage,
-
and the best infrastructure.
-
The coverage for the places
and the geo-coordinates we need,
-
and the hierarchical
information, for example.
-
There were a lot of places,
also, in the GND,
-
but there was no hierarchical
information in there.
-
And also, Wikidata provides
the infrastructure
-
for editing and versioning.
-
And there's also a community
that helps maintain the data,
-
which was quite good.
-
Okay, but there was a requirement
by the NWBib editors.
-
They did not want to directly
rely on Wikidata,
-
which was understandable.
-
We don't have those servers
under our control,
-
and we don't know what's going on there.
-
There might be some unwelcome edits
that destroy the classification,
-
or parts of it, or vandalism.
-
So, we decided to put
an intermediate SKOS file in between,
-
on which the application would rely,
and which would be generated from Wikidata.
-
And SKOS is the Simple Knowledge
Organization System--
-
it's the standard way to model
-
a classification in the linked data world.
-
So, how did we do it? Five steps.
-
I will come to each
of the steps in more detail.
-
We matched the strings to Wikidata
with a better approach than before.
-
Created the classification based
on Wikidata,
-
then added the links back
from Wikidata to NWBib
-
with a custom property.
-
And now, we are in the process
of establishing a good process
-
for updating the classification
in Wikidata.
-
Seeing--having a diff
of the changes,
-
and then publishing it to the SKOS file.
-
I will come to the details.
-
So, the matching approach--
-
as the Wikidata API wasn't sufficient,
-
and because we have those
different levels in the strings,
-
we built a custom Elasticsearch
index for our task.
-
I think by now, you could probably
use OpenRefine for doing this, as well,
-
but at that point in time,
it wasn't available for Wikidata.
-
And we built this index based
on a SPARQL query
-
for entities in NRW
with a specific type.
-
And the query evolved over time a lot.
-
And it went through a few revisions--
you can see the history on GitHub.
-
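The real query is on GitHub and evolved over time; a simplified sketch of what it gathers
could look like this (Q1198 = North Rhine-Westphalia, and the fields mirror what is described next):
-
SELECT ?place ?placeLabel ?type ?coord ?broader ?alias WHERE {
  ?place wdt:P131* wd:Q1198 ;              # located (transitively) in North Rhine-Westphalia
         wdt:P31 ?type .                   # type, needed to tell similarly named places apart
  OPTIONAL { ?place wdt:P625 ?coord . }    # geo-coordinates
  OPTIONAL { ?place wdt:P131 ?broader . }  # direct superior administrative area
  OPTIONAL { ?place skos:altLabel ?alias . FILTER(LANG(?alias) = "de") }  # aliases for matching
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
}
-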
So, what we put in the matching index,
-
in the spatial object,
is what we need in our data.
-
It's the label and the ID,
or the link to Wikidata,
-
the geo-coordinates, and the type
from Wikidata [inaudible], as well.
-
But also, for the matching, it's very important
to have the aliases and the broader entity--
-
and this is also an example where the name
of the broader entity
-
and the district itself are very similar.
-
So, it's important to have
some type information, as well,
-
for the matching.
-
So, the matching results
were very good.
-
We could automatically match
more than 99% of records
-
with this approach.
-
But these were only 92% of the strings.
-
So, obviously,
-
those strings that only occurred
one or two times
-
often didn't appear in Wikidata.
-
And so, we had to do a lot of work
on that [long tail].
-
And for around 1,000 strings,
the matching was incorrect.
-
The catalogers did a lot of work
in the Aleph catalog,
-
but also in Wikidata: they made
more than 6,000 manual edits to Wikidata
-
to reach 100% coverage, by adding
aliases and type information,
-
and creating new entries.
-
Okay, so, I have to speed up.
-
We created the classification based on this,
on the hierarchical statements.
-
P131 is the main property there.
-
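Just as an illustration of the idea (not their actual tooling), a SKOS hierarchy can be derived
from those P131 statements with a CONSTRUCT query along these lines:
-
CONSTRUCT {
  ?place a skos:Concept ;
         skos:prefLabel ?label ;
         skos:broader ?parent .              # the direct superior area becomes skos:broader
} WHERE {
  ?place wdt:P131* wd:Q1198 ;                # places in North Rhine-Westphalia
         wdt:P131 ?parent ;
         rdfs:label ?label .
  FILTER(LANG(?label) = "de")
}
-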
We added the information to our data.
-
So, we now have this
in our data, in the spatial object--
-
the link to Wikidata
and the types are there,
-
and here's the ID
from the SKOS classification
-
we built based on Wikidata.
-
And you can see there
are Q identifiers in there.
-
Now, you can basically query our API
-
with such a query using Wikidata URIs,
-
and get literature, in this example,
about Cologne back.
-
Then we created a Wikidata property
for NWBib and added those links
-
from Wikidata to the classification--
batch loaded them with QuickStatements.
-
And there's also a nice--
-
we also moved to using a qualifier
on this property
-
to add the broader information there.
-
So, I think people won't mess around
with this as much
-
as with the P131 statement.
-
So, this is what it looks like.
-
This will go to the classification
where you can then start a query.
-
Now, we have to build this
update and review process,
-
and we will add those data, like this,
-
with a $0 subfield, to Aleph,
-
and the catalogers will start
using those Wikidata-based IDs,
-
URIs, for cataloging, for spatial indexing.
-
So, by now, there are more than 400,000
NWBib entries with links to Wikidata,
-
and more than 4,400 Wikidata entries
with links to NWBib.
-
Thank you.
-
(applause)
-
Thank you, Adrian.
-
I got it. Thank you.
-
So, as you've seen me before,
I'm Hilary Thorsen.
-
I'm Wikimedian in residence
-
with the Linked Data
for Production Project.
-
I am based at Stanford,
-
and I'm here today
with my colleague, Lena Denis,
-
who is Cartographic Assistant
at Harvard Library.
-
And Christine Fernsebner Eslao
is here in spirit.
-
She is currently back in Boston,
but supporting us from afar.
-
So, we'll be talking
about Wikidata and Libraries
-
as partners in data production,
organization, and project inspiration.
-
And our work is part of the Linked Data
for Production Project.
-
So, Linked Data for Production
is in its second phase,
-
called Pathway for Implementation.
-
And it's an Andrew W. Mellon
Foundation grant,
-
involving the partnership
of several universities,
-
with the goal of constructing a pathway
for shifting the cataloging community
-
to begin describing library
resources with linked data.
-
And it builds upon a previous grant,
-
but this iteration is focused
on the practical aspects
-
of the transition.
-
One of these pathways of investigation
-
has been integrating
library metadata with Wikidata.
-
We have a lot of questions,
-
but some of the ones
we're most interested in
-
are how we can integrate
library metadata with Wikidata,
-
and make contribution
a part of our cataloging workflows,
-
how Wikidata can help us improve
our library discovery environment,
-
how it can help us reveal
more relationships
-
and connections within our data
and with external data sets,
-
and if we have connections in our own data
that can be added to Wikidata,
-
how libraries can help
fill in gaps in Wikidata,
-
and how libraries can work
with local communities
-
to describe library
and archival resources.
-
Finding answers to these questions
has focused on the mutual benefit
-
for the library and Wikidata communities.
-
We've learned through starting to work
on our different Wikidata projects,
-
that many of the issues
libraries grapple with,
-
like data modeling, identity management,
data maintenance, documentation,
-
and instruction on linked data,
-
are ones the Wikidata
community works on too.
-
I'm going to turn things over to Lena
-
to talk about what
she's been working on now.
-
Hi, so, as Hilary briefly mentioned,
I work as a map librarian at Harvard,
-
where I process maps, atlases,
and archives for our online catalog.
-
And while processing two-dimensional
cartographic works
-
is relatively straightforward,
cataloging archival collections
-
so that their cartographic resources
can be made discoverable,
-
has always been more difficult.
-
So, my use case for Wikidata
is visually modeling relationships
-
between archival collections
and the individual items within them,
-
as well as between archival drafts
in published works.
-
So, I used Wikidata to highlight the work
of our cartographer named Erwin Raisz,
-
who worked at Harvard
in the early 20th century.
-
He was known for his vividly detailed
and artistic landforms,
-
like this one on the screen--
-
but also for inventing
the armadillo projection,
-
writing the first cartography
textbook in English
-
and various other
important contributions
-
to the field of geography.
-
And at the Harvard Map Collection,
-
we have a 66-item collection
of Raisz's field notebooks,
-
which begin when he was a student
and end just before his death.
-
So, this is the collection-level record
that I made for them,
-
which merely gives an overview,
-
but his notebooks are full of information
-
that he used in later atlases,
maps, and textbooks.
-
But researchers don't know how to find
that trajectory information,
-
and the system
is not designed to show them.
-
So, I felt that with Wikidata,
and other Wikimedia platforms,
-
I'd be able to take advantage
-
of information that already exists
about him on the open web,
-
along with library records
and a notebook inventory
-
that I had made in an Excel spreadsheet
-
to show relationships and influences
between his works.
-
So here, you can see how I edited
and reconciled library data
-
in OpenRefine.
-
And then, I used QuickStatements
to batch import my results.
-
So, now, I was ready
to create knowledge graphs
-
with SPARQL queries
to show patterns of influence.
-
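A hedged sketch of such an influence query, using the Graph view--whether her model
uses "based on" (P144) or other properties for the links between works is an assumption here:
-
#defaultView:Graph
SELECT ?work ?workLabel ?source ?sourceLabel WHERE {
  ?creator rdfs:label "Erwin Raisz"@en .        # find the item by its English label
  ?work wdt:P170 ?creator .                     # creator: Erwin Raisz
  OPTIONAL { ?work wdt:P144 ?source . }         # based on: e.g. an earlier notebook or draft
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
-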
The examples here show
how I leveraged Wikimedia Commons images
-
that I connected to him.
-
And the hierarchy of some of his works
-
that were contributing
factors to other works.
-
So, modeling Raisz's works on Wikidata
allowed me to encompass in a single image,
-
or in this case, in two images,
the connections that require many pages
-
of bibliographic data to reveal.
-
So, this video is going to load.
-
Yes! Alright.
-
This video is a minute and a half long
screencast I made,
-
that I'm going to narrate as you watch.
-
It shows the process of inputting
and then running a SPARQL query,
-
showing hierarchical relationships
between notebooks, an atlas, and a map
-
that Raisz created about Cuba.
-
He worked there before the revolution,
-
so he had the unique position
of having support
-
from both the American
and the Cuban governments.
-
So, I made this query as an example
to show people who work on Raisz,
-
and who are interested in narrowing down
what materials they'd like to request
-
when they come to us for research.
-
To make the approach replicable
for other archival collections,
-
I hope that Harvard and other institutions
will prioritize Wikidata look-ups
-
as they move to linked data
cataloging production,
-
which my co-presenters
can speak to the progress on
-
better than I can.
-
But my work has brought me--
has brought to mind a particular issue
-
that I see as a future opportunity,
which is that of archival modeling.
-
So, to an archivist, an item
is a discrete archival material
-
within a larger collection
of archival materials
-
that is not a physical location.
-
So an archivist from the American National
Archives and Records Administration,
-
who is also a Wikidata enthusiast,
-
advised me when I was trying
to determine how to express this
-
using an example item,
-
that I'm going to show
as soon as this video is finally over.
-
Alright. Great.
-
Nope, that's not what I wanted.
-
Here we go.
-
It's doing that.
-
(humming)
-
Nope. Sorry. Sorry.
-
Alright, I don't know why
it's not going full screen again.
-
I can't get it to do anything.
-
But this is the-- oh, my gosh.
-
Stop that. Alright.
-
So, this is the item that I mentioned.
-
So, this was what the archivist
-
from the National Archives
and Records Administration
-
showed me as an example.
-
And he recommended this compromise,
which is to use the "part of" property
-
to connect a lower-level description
to a higher-level description,
-
which allows the relationships
between different hierarchical levels
-
to be asserted as statements
and qualifiers.
-
So, in this example that's on screen,
-
the relationships between an item,
a series, a collection, and a record group
-
are thus contained and described
within a Wikidata item entity.
-
So, I followed this model
in my work on Raisz.
-
And one of my images is missing.
-
No, it's not. It's right there. I'm sorry.
-
And so, I followed this model
in my work on Raisz,
-
but I look forward
to further standardization.
-
So, another archival project
Harvard is working on
-
is the Arthur Freedman collection
of more than 2,000 hours
-
of punk rock performances
from the 1970s to early 2000s
-
in the Boston and Cambridge,
Massachusetts areas.
-
It includes many bands and venues
that no longer exist.
-
So far, work has been done in OpenRefine
on reconciliation of the bands and venues
-
to see which need an item
created in Wikidata.
-
A basic item will be created
via batch process next spring,
-
and then, an edit-a-thon will be
held in conjunction
-
with the New England Music Library
Association's meeting in Boston
-
to focus on adding more statements
to the batch-created items,
-
by drawing on local music
community knowledge.
-
We're interested in learning more
about models for pairing librarians
-
and Wiki enthusiasts with new contributors
who have domain knowledge.
-
Items will eventually be linked
to digitized video
-
in Harvard's digital collection platform
-
once rights have
been cleared with artists,
-
which will likely be a slow process.
-
There's also a great amount of interest
-
in moving away from manual cataloging
and creation of authority data
-
towards identity management,
-
where descriptions
can be created in batches.
-
An additional project that focused on
-
creating international standard
name identifiers, or ISNIs,
-
for avant-garde and women filmmakers
-
can be adapted for creating Wikidata items
for these filmmakers, as well.
-
Spreadsheets with the ISNIs,
filmmaker names, and other details
-
can be reconciled in OpenRefine,
and uploaded with QuickStatements.
-
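For instance, a quick check of which women filmmakers already have both a Wikidata item
and an ISNI can be sketched like this (Q2526255 "film director" stands in for "filmmaker" here):
-
SELECT ?filmmaker ?filmmakerLabel ?isni WHERE {
  ?filmmaker wdt:P106 wd:Q2526255 ;   # occupation: film director (stand-in for filmmaker)
             wdt:P21 wd:Q6581072 ;    # sex or gender: female
             wdt:P213 ?isni .         # ISNI
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 200
-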
Once people and organizations
have been described,
-
we'll move toward describing
the films in Wikidata,
-
which will likely present
some additional modeling challenges.
-
A library presentation
wouldn't be complete
-
without a MARC record.
-
Here, you can see the record
for Karen Aqua's taxonomy film,
-
where her ISNI and Wikidata Q number
-
have been added to the 100 field.
-
The ISNIs and Wikidata Q numbers
that have been created
-
can then be batch added
back into MARC records via MarcEdit.
-
You might be asking why I'm showing you
this ugly MARC record,
-
instead of some beautiful
linked data statements.
-
And that's because our libraries
will be working in a hybrid environment
-
for some time.
-
Our library catalogs still rely
on MARC records,
-
so by adding in these URIs,
-
we can try to take advantage
of linked data,
-
while our systems still use MARC.
-
Adding URIs into MARC records
makes an additional aspect
-
of our project possible.
-
Work has been done at Stanford
and Cornell to bring data
-
from Wikidata into our library catalog
using URIs already in our MARC records.
-
You can see an example
of a knowledge panel,
-
where all the data is sourced
from Wikidata,
-
and links back to the item itself,
along with an invitation to contribute.
-
This is currently in a test environment,
not in production in our catalog.
-
Ideally, eventually,
these will be generated
-
from linked data descriptions
of library resources
-
created using Sinopia,
our linked data editor
-
developed for cataloging.
-
We found that adding a look-up
to Wikidata in Sinopia is difficult.
-
The scale and modeling of Wikidata
make it hard to partition the data
-
to be able to look up typed entities,
-
and we've run into the problem
-
of SPARQL not being good
for keyword search,
-
but wanting our keyword APIs
to return SPARQL-like RDF descriptions.
-
So, as you can see, we still have
quite a bit of work to do.
-
This round of the grant
runs until June 2020,
-
so, we'll be continuing our exploration.
-
And I just wanted to invite anyone
-
who has a continuing interest in talking
about Wikidata and libraries:
-
I lead a Wikidata Affinity Group
that's open to anyone to join.
-
We meet every two weeks,
-
and our next call is Tuesday,
November the 5th,
-
so if you're interested
in continuing discussions,
-
I would love to talk with you further.
-
Thank you, everyone.
-
And thank you to the other presenters
-
for talking about all
of their wonderful projects.
-
(applause)