I work as a teacher
at the University of Alicante,
where I recently obtained my PhD
on data libraries and linked open data.
And I'm also a software developer
at the Biblioteca Virtual
Miguel de Cervantes.
And today, I'm going to talk
about data quality.
Well, those are my colleagues
at the university.
And as you may know, many organizations
are publishing their data
as linked open data--
for example,
the National Library of France,
the National Library of Spain,
us, which is Cervantes Virtual,
the British National Bibliography,
the Library of Congress and Europeana.
All of them provide a SPARQL endpoint,
which is useful in order
to retrieve the data.
And if I'm not wrong,
the Library of Congress only provides
the data as a dump, which you can't query.
When we published our repository
as linked open data,
my idea was that it would be reused
by other institutions.
But what if I'm an institution
that wants to enrich its data
with data from other data libraries?
Which data set should I use?
Which data set is better
in terms of quality?
The benefits of the evaluation
of data quality in libraries are many.
For example, methodologies can be improved
to include new criteria
for assessing quality.
And also, organizations can benefit
from best practices and guidelines
in order to publish their data
as linked open data.
What do we need
in order to assess the quality?
Well, obviously, a set of candidates
and a set of features.
For example, do they have
a SPARQL endpoint,
do they have a web interface,
how many publications do they have,
how many vocabularies do they use,
and how many Wikidata properties do they have?
And where can I get those candidates?
I used the LOD Cloud--
but when I was doing this slide,
I thought about using Wikidata
in order to retrieve those candidates--
for example, getting entities
of type data library
which have a SPARQL endpoint.
You have here the link.
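(A minimal sketch of such a query,
assuming the entities are modeled as
instances of digital library, Q212805,
and that P5305 holds the SPARQL endpoint URL:)

    # Data libraries on Wikidata that declare a SPARQL endpoint.
    SELECT ?library ?libraryLabel ?endpoint WHERE {
      ?library wdt:P31/wdt:P279* wd:Q212805 ;  # instance of (a subclass of) digital library
               wdt:P5305 ?endpoint .           # SPARQL endpoint URL (assumed property)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }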
And I came up with these data libraries.
The first one uses the Bibliographic Ontology
as its main vocabulary,
and the others are based,
more or less, on FRBR,
which is a vocabulary published by IFLA.
And this is just an example
of how we could compare
data libraries using
bubble charts on Wikidata.
And this is just an example comparing
how many Wikidata properties
there are per data library.
Well, how can we measure quality?
There are different methodologies,
for example, FRBR 1,
which provides a set of criteria
grouped by dimensions,
and those in green
are the ones that I found--
that I could assess by means of Wikidata.
And we also found that we
could define new criteria--
for example, a new one to evaluate
the number of duplicates in Wikidata,
using those properties.
And this is an example of a SPARQL query
to count the number
of duplicates per property.
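(A minimal sketch of such a duplicate count,
assuming duplicates are detected as identifier
values shared by more than one item; the BnF ID,
P268, is only an example property here:)

    # Count BnF ID values attached to more than one Wikidata item.
    SELECT (COUNT(?id) AS ?duplicatedIds) WHERE {
      {
        SELECT ?id WHERE {
          ?item wdt:P268 ?id .
        }
        GROUP BY ?id
        HAVING (COUNT(?item) > 1)
      }
    }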
And about the results:
at the moment of doing this study--
not the slides--there was no property
for the British National Bibliography.
They don't provide provenance information,
which could be useful
for metadata enrichment.
And they don't allow users
to edit the information.
So, we've been talking
about Wikibase the whole weekend,
and maybe we should try to adopt
Wikibase as an interface.
And they are focused on their own content,
and this is just the SPARQL query
on Wikidata
used to assess the population criterion.
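(A hedged sketch of that kind of population query,
counting the items that carry a given library's
identifier; again, the BnF ID, P268, is only an example:)

    # Population: how many Wikidata items carry this library's identifier.
    SELECT (COUNT(DISTINCT ?item) AS ?population) WHERE {
      ?item wdt:P268 ?id .
    }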
And the BnF provides labels
in multiple languages,
and they all use self-describing URIs,
which means that the URI
contains the type of entity,
which allows the human reader
to understand what they are looking at.
And more results: they provide
different output formats,
and they use external vocabularies.
Only the British National Bibliography
provides machine-readable
licensing information.
And up to one-third of the instances
are connected to external repositories,
which is really nice.
And, well, this study, this work
has been done in our Labs team.
A lab in a GLAM is a group of people
who want to explore new ways
of reusing data collections.
And there's a community
led by the British Library,
and in particular, Mahendra Mahey,
and we had a first event in London,
and another one in Copenhagen,
and we're going to have a new one in May
at the Library of Congress in Washington.
And we are now 250 people.
And I'm so glad that I found
somebody here at the WikidataCon
who has just joined us--
Sylvia from [inaudible], Mexico.
And I'd like to invite you
to our community,
since you may be part
of a GLAM institution.
So, we can talk later
if you want to know about this.
And this--it's all about people.
This is me, and people
from the British Library,
the Library of Congress, universities,
and national libraries in Europe.
And there's a link here
in case you want to know more.
And, well, last month,
we decided to meet in Doha
in order to write a book
about how to create a lab in our GLAM.
And they chose 15 people,
and I was so lucky to be there.
And the book follows
the Booksprint methodology,
which means that nothing
is prepared beforehand.
All is done there in a week.
And believe me, it was really hard work
to have the whole book
done in that week.
And I'd like to introduce you to the book,
which will be published--
it was supposed to be published this week,
but it will be next week.
And it will be published openly,
so you can have it,
and I can show you
a little bit later if you want.
And those are the authors.
I'm here-- I'm so happy, too.
And those are the institutions--
Library of Congress, British Library--
and this is the title.
And now, I'd like to show you--
a map that I'm doing.
We are launching a website
for our community,
and I'm in charge of creating a map
with our institutions there.
This is not finished.
But this is just SPARQL, and below,
we see the map.
And we see here
the new people that I found, here,
at the WikidataCon--
I'm so happy for this.
And we have here my data library
of my university,
and many other institutions.
Also, from Australia--
if I can do it.
Well, here, we have some links.
There you go.
Okay, this is not finished.
We are still working on this,
and that's all.
Thank you very much for your attention.
(applause)
[inaudible]
Good morning, everybody.
I'm Olaf Janssen.
I'm the Wikimedia coordinator
at the National Library
of the Netherlands.
And I would like to share the work
I'm doing on creating
linked open data
for Dutch public libraries using Wikidata.
And my story starts roughly a year ago
when I was at the GLAM Wiki conference
in Tel Aviv, in Israel.
And there were two men
with very similar shirts,
and equally similar hairdos, [Matt]...
(laughter)
And on the left, that's me.
And a year ago, I didn't have
any practical knowledge and skills
about Wikidata.
I looked at Wikidata,
and I looked at the items,
and I played with it.
But I wasn't able to make a SPARQL query
or to do data modeling
with the right shape expression.
That's a year ago.
And on the lefthand side,
that's Simon Cobb, user: Sic19.
And I was talking to him,
because, just before,
he had given a presentation
about improving the coverage
of public libraries in Wikidata.
And I was very inspired by his talk.
And basically, he was talking
about adding basic data
about public libraries.
So, the name of the library,
if available, the photo of the building,
the address data of the library,
the geo-coordinates--
latitude and longitude--
and some other things,
all with source references.
And what I was very impressed
by a year ago was this map.
This is a map about
public libraries in the U.K.
with all the colors.
And you can see that all the libraries
are layered by library organizations.
And when he showed this,
I was really, "Wow, that's cool."
So, then, one minute later, I thought,
"Well, let's do that
for my own country."
(laughter)
And something about public libraries
in the Netherlands--
there are about 1,300 library
branches in our country,
grouped into 160 library organizations.
And you might wonder why
I want to do this project.
Well, first of all,
for the common good, for society,
because I think using Wikidata,
creating Wikipedia articles from there,
and opening it up
via the linked open data cloud
improves the visibility and reusability
of public libraries in the Netherlands.
And my second goal was actually
a more personal one,
because a year ago, I had this
yearly evaluation with my manager,
and we decided it was a good idea
that I got more practical skills
on linked open data, data modeling,
and also on Wikidata.
And of course, I wanted to be able to make
these kinds of maps myself.
(laughter)
Then you might wonder,
why do I want to do this?
Isn't there already enough basic
library data out there in the Netherlands
to have good coverage?
So, let me show you some of the websites
that are available to discover
address and location information
about Dutch public libraries.
And the first one is this one--
Gidsvoornederland.nl--
and that's the official
public library inventory
maintained by my library,
the National Library.
And you can look up addresses
and geo-coordinates on that website.
Then there is this site,
Bibliotheekinzicht--
this is also an official website
maintained by my National Library.
And this is about
public library statistics.
Then there is another one,
debibliotheken.nl--
as you can see there is also
address information
about library organizations,
not about individual branches.
And there's even this one,
which also has address information.
And of course, there's something
like Google Maps,
which also has all the names
and the locations and the addresses.
And this one, the International
Library of Technology,
which has a worldwide
inventory of libraries,
including the Netherlands.
And I even discovered there is a data set
you can buy for 50 euros or so
and download.
I didn't download it,
but there seems to be address
information available in it.
You might wonder: is this kind of data
good enough for the purposes I had?
So, this is my birthday list
for my ideal public library data set.
And what's on my list?
First of all, the data I want to have
must be up-to-date-ish--
it must be fairly up-to-date.
So, it doesn't have to be real time,
but let's say, a couple
of months, or half a year,
behind the official publication--
that's okay for my purposes.
And I want to have both
the library branches
and the library organizations.
Then I want my data to be structured,
because it has to be machine-readable.
It has to be in an open file format,
such as CSV or JSON or RDF.
It has to be linked
to other resources, preferably.
And the license on the data
needs to be explicitly public domain or CC0.
Then, I would like my data to have an API,
which must be public, free,
and preferably also anonymous,
so you don't have to use an API key
or register an account.
And I also want to have
a SPARQL interface.
So, now, these are all the sites
I just showed you.
And I'm going to make a big grid.
And then, this is about
the evaluation I did.
I'm not going into it,
but there is no single column
which has all green check marks.
That's the important thing to take away.
And so, in summary, there was no
public, free linked open data
for Dutch public libraries available
before I started my project.
So, this was the ideal motivation
to actually work on it.
So, that's what I've been doing
for a year now.
And I've been adding libraries bit by bit,
organization by organization, to Wikidata.
I also created a project website for it.
It's still rather messy,
but it has all the information,
and I try to keep it
as up-to-date as possible.
And also all the SPARQL queries
you can see are linked from here.
And I'm just adding
really basic information.
You see the instances,
images if available,
addresses, locations,
municipalities, et cetera.
And where possible, I also try to link
the libraries to external identifiers.
And then, you can really easily--
as we all know--
generate some Listeria lists
with public libraries grouped
by organization, for instance.
Or, using SPARQL queries,
you can also do aggregations on the data--
let's say, give me all
the municipalities in the Netherlands
and the number of library branches
in each municipality.
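(A minimal sketch of that kind of aggregation,
assuming the branches are modeled as instances
of public library, Q28564, in the Netherlands, Q55:)

    # Public library branches per Dutch municipality, via P131.
    SELECT ?municipality ?municipalityLabel (COUNT(?library) AS ?branches) WHERE {
      ?library wdt:P31/wdt:P279* wd:Q28564 ;   # instance of (a subclass of) public library
               wdt:P17 wd:Q55 ;                # country: Netherlands
               wdt:P131 ?municipality .        # located in this municipality
      SERVICE wikibase:label { bd:serviceParam wikibase:language "nl,en". }
    }
    GROUP BY ?municipality ?municipalityLabel
    ORDER BY DESC(?branches)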
With one click, you can make
these kinds of photo galleries.
And what I set out to do first,
you can really create these kinds of maps.
And you might wonder,
"Are there any libraries here or there?"
There are, but they are not yet in Wikidata.
We're still working on that.
And actually, last week,
I spoke with a volunteer,
who's helping now
with entering the libraries.
You can really make cool maps in Wikidata,
and also, using
the Kartographer extension,
you can make these kinds of maps.
And I even took it one step further.
I also have some Python skills,
and some Leaflet skills--
so, I created this library heat map,
and I'm quite proud of it, actually.
It's fully interactive:
you can zoom in to it,
you can see all the libraries,
and you can also run it off-wiki.
So, you can just embed it
in your own website,
and it runs fully interactively.
So, now going back to my big scary table.
There is one column
on the right, which is blank.
And no surprise, it will be Wikidata.
Let's see how it scores there.
(cheering)
So, I actually think
of printing this on a T-shirt.
(laughter)
So, just to summarize this in words:
thanks to my project, there is now
public, free linked open data
available for Dutch public libraries.
And who can benefit from my effort?
Well, all kinds of parties--
you see Wikipedia,
because you can generate lists
and overviews and articles
from Wikidata, for instance;
our own National Library;
and IFLA, which has an inventory
of worldwide libraries--
they can also reuse the data.
And especially for Sandra,
it's also important for the Ministry--
the Dutch Ministry of Culture--
because Sandra is going
to have a talk about Wikidata
with the Ministry next Monday.
And also, on the righthand side,
for instance,
Amazon with Alexa, the assistant--
they're also using Wikidata,
so you can imagine that
if you're looking for public
library information,
they can also use Wikidata for that.
Because one year ago,
Simon Cobb inspired me
to do this project,
I would like to call upon you,
if you have time available,
and if you have data from your own country
about public libraries,
make the coverage better,
add more red dots,
and of course, I'm willing
to help you with that.
And Simon is also willing
to help with this.
And so, I hope next year, somebody else
will be at this conference
or another conference
and there will be more
red dots on the map.
Thank you very much.
(applause)
Thank you, Olaf.
Next we have Ursula Oberst
and Heleen Smits
presenting how can a small
research library benefit from Wikidata:
enhancing library products using Wikidata.
Okay. Good morning.
My name is Heleen Smits.
And my colleague,
Ursula Oberst--where are you?
(laughter)
And I work at the Library
of the African Studies Center
in Leiden, in the Netherlands.
And the African Studies Center
is a center devoted--
is an academic institution
devoted entirely to the study of Africa,
focusing on Humanities and Social Studies.
We used to be an independent
research organization,
but in 2016, we became part
of Leiden University,
and our catalog was integrated
into the larger university catalog.
Though it remained possible
to do a search in the African Studies part
of the Leiden catalog alone,
we remained independent in some respects.
For example, with respect
to our thesaurus.
And also with respect
to the products we make for our users,
such as acquisition lists
and web dossiers.
And it is in the field of the web dossiers
that we have been looking
for possible ways to apply Wikidata,
and that's the part where Ursula,
in the second part of this talk,
will show you a bit of
what we've been doing there.
The web dossiers are collections
of titles from our catalog
that we compile
around a theme, usually connected
to, for example, a conference
or a special event. Actually,
the most recent web dossier we made
was connected to the Year
of Indigenous Languages,
and it was about proverbs
in African languages.
Our first steps--
next slide--our first steps
on the Wiki path as a library
were in 2013, when we were one
of 12 GLAM institutions
in the Netherlands
taking part in the project
of Wikipedians in Residence.
We had a Wikipedian in the house
for two months,
and he gave us training
in adding articles to Wikipedia,
and also, we made a start with uploading
photo collections to Commons,
which always remained a little bit
dependent on funding, as well--
whether we would be able to digitize them
and have
a student assistant to do this.
But it was actually a great addition
to what we could offer
as an academic library.
In May 2018, my colleague Ursula--
so that is my Ursula--
started to really explore,
to dive into Wikidata
and see what we, as a small
and not very experienced library
in these fields, could do with that.
So, I mentioned, we have
our own thesaurus.
And this is where we started.
This is a thesaurus of 13,000 terms,
all in the field of African studies.
It contains a lot of African languages,
names of ethnic groups in Africa,
and other proper names,
which are perhaps especially
interesting for Wikidata.
So, it is a real authority-controlled
vocabulary
with 5,000 preferred terms.
So, we submitted the request to Wikidata,
and that was actually very quickly
met with a positive response,
which was very encouraging for us.
Our thesaurus was loaded into Mix-n-Match,
and by now, 75% of the terms
have been manually matched with Wikidata.
So, it means that we are now
added as an identifier.
For example, if you click
on Swahili language,
what happens then in Wikidata
is that you see the number
that connects to our term;
that Wikidata item, in turn,
we enter into our thesaurus,
and from there, you can do a search
directly in the catalog
by clicking the button again.
It means, also, that Wikidata
is not really integrated
into our catalog yet.
But that's also more difficult.
Okay, we have to give the floor
to Ursula for the next part.
(Ursula) Thank you very much, Heleen.
So, I will talk about our experiences
with incorporating Wikidata elements
into our web dossiers.
A web dossier is--oh, sorry, yeah, sorry.
A web dossier, or a classical web dossier,
consists of three parts:
an introduction to the subject,
mostly written by one of our researchers;
a selection of titles, both books
and articles from our collection;
and the third part, an annotated list
with links to electronic resources.
And this year, we added a fourth part
to our web dossiers,
which is the Wikidata elements.
And it all started last year,
and my story is similar
to the story of Olaf, actually.
Last year, I had no clue
about Wikidata,
and I discovered this wonderful
article by Alex Stinson
on how to write a query in Wikidata.
And he chose a subject--
a very appealing subject to me.
Namely, "Discovering Women Writers
from North Africa."
I can really recommend this article,
because it's very instructive.
And I thought,
I'm going to work on this query,
and try to change it to
"Southern African Women Writers,"
and try to add a link
to their works in our catalog.
And on the right-hand side,
you see the SPARQL query
which searches for
"Southern African Women Writers."
If you click on the button,
on the blue button on the lefthand side,
the search result will appear beneath.
The search result can have
different formats.
In my case, the search result is a map.
And the nice thing about Wikidata
is that you can embed
this search result
into your own webpage,
and that's what we are now doing
with our web dossiers.
So, this was the very first one,
on Southern African women writers,
with the classical three elements,
plus this map on the lefthand side,
which gives extra information--
a link to the Southern African
woman writer,
a link to her works in our catalog,
and a link to the Wikidata record
of her birthplace, and her name,
her personal record, plus a photo,
if it's available on Wikidata.
And to retrieve a nice map
with a lot of red dots
on the African continent,
you need nice data in Wikidata--
complete, sufficient data.
So, with our second web dossier
on public art in Africa,
we also started to enhance
the data in Wikidata.
In this case, for public art,
we added geo-locations to Wikidata.
And we also searched for works
of public art in Commons,
and if they didn't have
a record on Wikidata yet,
we added the record to Wikidata.
And the third thing we do:
when we prepare a web dossier,
we download the titles from our catalog,
and the titles are in MARC 21,
so we have to convert them to a format
that is presentable on the website,
and it takes not much time and effort
to convert the same set of titles
to Wikidata QuickStatements.
So, we also upload
the title set to Wikidata,
and you can see the titles we uploaded
from our latest web dossier
on African proverbs in Scholia,
a really nice tool
that visualizes publications
present in Wikidata.
And, one second--when it is possible,
we add a Scholia template
to our web dossier's topic.
Thank you very much.
(applause)
Thank you, Heleen and Ursula.
Next we have Adrian Pohl
presenting using Wikidata
to improve spatial subject indexing
and regional bibliography.
Okay, hello everybody.
I'm going right into the topic.
I only have ten minutes to present
a three-year project.
It wasn't full time. (laughs)
Okay, what's the NWBib?
It's an acronym for North-Rhine
Westphalian Bibliography.
It's a regional bibliography
that records literature
about people and places
in North Rhine-Westphalia.
And there are monographs in it--
there are a lot of articles in it,
and most of them are quite unique,
so, that's the interesting thing
about this bibliography--
because it's often
quite obscure stuff--
local people writing
about local traditions,
and things like this.
And there are over 400,000 entries in there.
And the bibliography started in 1983,
and so we only have titles
from this publication year onwards.
If you want to take a look at it,
it's at nwbib.de,
that's the web application.
It's based on our service,
lobid.org, the API.
It's cataloged as part
of the hbz union catalog,
which comprises around 20 million records,
in an [inaudible] Aleph system;
we get the data out of there,
make RDF out of it,
and provide it as JSON
via the HTTP API.
So, the initial status in 2017
was that we had nearly 9,000 distinct strings
referring to places
in North Rhine-Westphalia.
Mostly, those were administrative areas,
like towns and districts,
but also monasteries, principalities,
or natural regions.
And we already used Wikidata in 2017,
and matched those strings
to Wikidata entries with the Wikidata API,
quite naively, to get
the geo-coordinates from there
and do some geo-based
discovery stuff with it.
But this had some drawbacks.
And so, the matching was really poor,
and there were a lot of false positives,
and we still had no hierarchy
in those places,
and we still had a lot
of non-unique names.
So, this is an example here.
Does this work?
Yeah, as you can see,
for one place, Brauweiler,
there are four different strings in there.
So, we all know how this happens.
If there's no authority file,
you end up with this data.
But we want to improve on that.
And as you can also see
why the matching didn't work--
you have the name of the place,
and there's often the name
of the superior administrative area,
and even, on the second level,
another superior administrative area,
included in the name
to identify the place successfully.
So, the goal was to build a full-fledged
spatial classification based on this data,
with a hierarchical view of places,
with one entry or ID for each place.
And we got this mock-up
from the NWBib editors in 2016, made in Excel,
to get a feeling of what
they would like to have.
There you have the
Regierungsbezirk--
that's the topmost
administrative area--
then, in there, some towns
or districts--rural districts--
and then, it goes down
to the parts of towns,
even to this level.
And we chose Wikidata for this task.
We also looked at the GND,
the Integrated Authority File,
and GeoNames--but Wikidata
had the best coverage,
and the best infrastructure.
The coverage for the places
and the geo-coordinates we need,
and the hierarchical
information, for example.
There were a lot of places,
also, in the GND,
but there was no hierarchical
information in there.
And also, Wikidata provides
the infrastructure
for editing and versioning.
And there's also a community
that helps maintain the data,
which was quite good.
Okay, but there was a requirement
by the NWBib editors.
They did not want to directly
rely on Wikidata,
which was understandable.
We don't have those servers
under our control,
and we don't know what's going on there.
There might be some unwelcome edits
that destroy the classification,
or parts of it, or vandalism.
So, we decided to put
an intermediate SKOS file in between,
on which the application would rely,
and which would be generated from Wikidata.
And SKOS is the Simple Knowledge
Organization System--
it's the standard way to model
a classification in the linked data world.
So, how did we do it? Five steps.
I will come to each
of the steps in more detail.
We matched the strings to Wikidata
with a better approach than before,
created the classification based
on Wikidata, and added
the links back
from Wikidata to NWBib
with a custom property.
And now, we are in the process
of establishing a good process
for updating the classification
from Wikidata--
having a diff
of the changes,
and then publishing it to the SKOS file.
I will come to the details.
So, the matching approach--
as the API wasn't sufficient,
and because we have those
different levels in the strings,
we built a custom Elasticsearch
index for our task.
I think by now, you could probably
use OpenRefine for doing this as well,
but at that point in time,
it wasn't available for Wikidata.
And we built this index based
on a SPARQL query
for entities in NRW
with a specific type.
And the query evolved over time a lot,
and you can see its history on GitHub.
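(A simplified sketch of the kind of query such an
index could be built from; the real query is more
elaborate, as its GitHub history shows, and Q1198
is North Rhine-Westphalia:)

    # Places below NRW (Q1198) in the P131 hierarchy, with label, German aliases,
    # type, coordinates, and broader entity, for the matching index.
    SELECT ?place ?placeLabel ?alias ?type ?coords ?broader WHERE {
      ?place wdt:P131+ wd:Q1198 ;       # located (transitively) in NRW
             wdt:P31 ?type .            # type information, needed for disambiguation
      OPTIONAL { ?place wdt:P625 ?coords . }
      OPTIONAL { ?place skos:altLabel ?alias . FILTER(LANG(?alias) = "de") }
      OPTIONAL { ?place wdt:P131 ?broader . }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
    }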
So, what we put in the matching index,
in the spatial object,
is what we need in our data:
the label and the ID,
or the link to Wikidata,
the geo-coordinates, and the type
from Wikidata [inaudible], as well.
But also very important for the matching
are the aliases and the broader entity--
and this is also an example where the name
of the broader entity
and the district itself are very similar.
So, it's important to have
some type information, as well,
for the matching.
So, the matching results
were very good.
We could automatically match
more than 99% of records
with this approach.
But these were only 92% of the strings.
So, obviously, those strings
that only occurred
one or two times
often didn't appear in Wikidata.
And so, we had to do a lot of work
on that [long tail].
And for around 1,000 strings,
the matching was incorrect.
But the catalogers did a lot of work,
in the Aleph catalog
but also in Wikidata: they made
more than 6,000 manual edits to Wikidata
to reach 100% coverage by adding
aliases and type information
and creating new entries.
Okay, so, I have to speed up.
We created the classification based on this,
on the hierarchical statements;
P131 is the main property there.
We added the information to our data.
So, we now have this
spatial object in our data,
with the link to Wikidata,
and the types are there,
and here's the ID
from the SKOS classification
we built based on Wikidata.
And you can see there
are Q identifiers in there.
Now, you can basically query our API
with such a query using Wikidata URIs,
and get literature, in this example,
about Cologne back.
Then we created a Wikidata property
for NWBib and added those links
from Wikidata to the classification--
batch loaded them with QuickStatements.
And there's also a move
to using a qualifier
on this property
to add the broader information there.
So, I think people won't mess around
with this as much
as with the P131 statement.
So, this is what it looks like.
This will go to the classification
where you can then start a query.
Now, we have to build this
update and review process,
and we will add the data like this,
with a $0 subfield, to Aleph,
and the catalogers will start
using those Wikidata-based IDs,
the URIs, for cataloging, for spatial indexing.
So, by now, there are more than 400,000
NWBib entries with links to Wikidata,
and more than 4,400 Wikidata entries
with links to NWBib.
Thank you.
(applause)
Thank you, Adrian.
I got it. Thank you.
So, as you've seen me before,
I'm Hilary Thorsen.
I'm Wikimedian in residence
with the Linked Data
for Production Project.
I am based at Stanford,
and I'm here today
with my colleague, Lena Denis,
who is Cartographic Assistant
at Harvard Library.
And Christine Fernsebner Eslao
is here in spirit.
She is currently back in Boston,
but supporting us from afar.
So, we'll be talking
about Wikidata and Libraries
as partners in data production,
organization, and project inspiration.
And our work is part of the Linked Data
for Production Project.
So, Linked Data for Production
is in its second phase,
called Pathway for Implementation.
And it's an Andrew W. Mellon
Foundation grant,
involving the partnership
of several universities,
with the goal of constructing a pathway
for shifting the catalog community
to begin describing library
resources with linked data.
And it builds upon a previous grant,
but this iteration is focused
on the practical aspects
of the transition.
One of these pathways of investigation
has been integrating
library metadata with Wikidata.
We have a lot of questions,
but some of the ones
we're most interested in
are how we can integrate
library metadata with Wikidata,
and make contribution
a part of our cataloging workflows,
how Wikidata can help us improve
our library discovery environment,
how it can help us reveal
more relationships
and connections within our data
and with external data sets,
and if we have connections in our own data
that can be added to Wikidata,
how libraries can help
fill in gaps in Wikidata,
and how libraries can work
with local communities
to describe library
and archival resources.
Finding answers to these questions
has focused on the mutual benefit
for the library and Wikidata communities.
We've learned through starting to work
on our different Wikidata projects,
that many of the issues
libraries grapple with,
like data modeling, identity management,
data maintenance, documentation,
and instruction on linked data,
are ones the Wikidata
community works on too.
I'm going to turn things over to Lena
to talk about what
she's been working on now.
Hi, so, as Hilary briefly mentioned,
I work as a map librarian at Harvard,
where I process maps, atlases,
and archives for our online catalog.
And while processing two-dimensional
cartographic works
is relatively straightforward,
cataloging archival collections
so that their cartographic resources
can be made discoverable,
has always been more difficult.
So, my use case for Wikidata
is visually modeling relationships
between archival collections
and the individual items within them,
as well as between archival drafts
and published works.
So, I used Wikidata to highlight the work
of our cartographer named Erwin Raisz,
who worked at Harvard
in the early 20th century.
He was known for his vividly detailed
and artistic landform maps,
like this one on the screen--
but also for inventing
the armadillo projection,
writing the first cartography
textbook in English,
and various other
important contributions
to the field of geography.
And at the Harvard Map Collection,
we have a 66-item collection
of Raisz's field notebooks,
which begin when he was a student
and end just before his death.
So, this is the collection-level record
that I made for them,
which merely gives an overview,
but his notebooks are full of information
that he used in later atlases,
maps, and textbooks.
But researchers don't know how to find
that trajectory information,
and the system
is not designed to show them.
So, I felt that with Wikidata,
and other Wikimedia platforms,
I'd be able to take advantage
of information that already exists
about him on the open web,
along with library records
and a notebook inventory
that I had made in an Excel spreadsheet
to show relationships and influences
between his works.
So here, you can see how I edited
and reconciled library data
in OpenRefine.
And then, I used QuickStatements
to batch import my results.
So, now, I was ready
to create knowledge graphs
with SPARQL queries
to show patterns of influence.
The examples here show
how I leveraged Wikimedia Commons images
that I connected to him.
And the hierarchy of some of his works
that were contributing
factors to other works.
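(A hedged sketch of the kind of query behind such
a graph; Raisz's actual QID and the exact influence
property used are not known here, so the identifiers
below are placeholders:)

    # Works created by Raisz (placeholder QID), their Commons images,
    # and later works that draw on them. P170 = creator, P18 = image;
    # P144, "based on", stands in for whatever influence relation was used.
    #defaultView:Graph
    SELECT ?work ?workLabel ?image ?laterWork ?laterWorkLabel WHERE {
      VALUES ?raisz { wd:Q000000 }             # placeholder for Erwin Raisz's item
      ?work wdt:P170 ?raisz .
      OPTIONAL { ?work wdt:P18 ?image . }
      OPTIONAL { ?laterWork wdt:P144 ?work . }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }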
So, modeling Raisz's works on Wikidata
allowed me to encompass in a single image,
or in this case, in two images,
the connections that require many pages
of bibliographic data to reveal.
So, this video is going to load.
Yes! Alright.
This video is a minute and a half long
screencast I made,
that I'm going to narrate as you watch.
It shows the process of inputting
and then running a SPARQL query,
showing hierarchical relationships
between notebooks, an atlas, and a map
that Raisz created about Cuba.
He worked there before the revolution,
so he had the unique position
of having support
from both the American
and the Cuban governments.
So, I made this query as an example
to show people who work on Raisz,
and who are interested in narrowing down
what materials they'd like to request
when they come to us for research.
To make the approach replicable
for other archival collections,
I hope that Harvard and other institutions
will prioritize Wikidata look-ups
as they move to linked data
cataloging production,
which my co-presenters
can speak to the progress on
better than I can.
But my work has brought me--
has brought to mind a particular issue
that I see as a future opportunity,
which is that of archival modeling.
So, to an archivist, an item
is a discrete archival material
within a larger collection
of archival materials
that is not a physical location.
So an archivist from the American National
Archives and Records Administration,
who is also a Wikidata enthusiast,
advised me when I was trying
to determine how to express this
using an example item,
that I'm going to show
as soon as this video is finally over.
Alright. Great.
Nope, that's not what I wanted.
Here we go.
It's doing that.
(humming)
Nope. Sorry. Sorry.
Alright, I don't know why
it's not going full screen again.
I can't get it to do anything.
But this is the-- oh, my gosh.
Stop that. Alright.
So, this is the item that I mentioned.
So, this was what the archivist
from the National Archives
and Records Administration
showed me as an example.
And he recommended this compromise,
which is to use the part of property
to connect a lower level description
to a higher level of description,
which allows the relationships
between different hierarchical levels
to be asserted as statements
and qualifiers.
So, in this example that's on screen,
the relationship between an item,
a series, a collection, and a record group
are thus contained and described
within a Wikidata item entity.
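(As a small sketch, assuming the "part of" property
here is P361, the chain of levels can be read back
out with a query like this; the item QID is a placeholder:)

    # Walk "part of" (P361) upward from an item-level entity
    # to every enclosing level: series, collection, record group.
    SELECT ?level ?levelLabel WHERE {
      VALUES ?archivalItem { wd:Q000000 }   # placeholder for the item-level entity
      ?archivalItem wdt:P361+ ?level .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }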
So, I followed this model
in my work on Raisz.
And one of my images is missing.
No, it's not. It's right there. I'm sorry.
And so, I followed this model
in my work on Raisz,
but I look forward
to further standardization.
So, another archival project
Harvard is working on
is the Arthur Freedman collection
of more than 2,000 hours
of punk rock performances
from the 1970s to early 2000s
in the Boston and Cambridge,
Massachusetts areas.
It includes many bands and venues
that no longer exist.
So far, work has been done in OpenRefine
on reconciliation of the bands and venues
to see which need an item
created in Wikidata.
A basic item will be created
via batch process next spring,
and then, an edit-a-thon will be
held in conjunction
with the New England Music Library
Association's meeting in Boston
to focus on adding more statements
to the batch-created items,
by drawing on local music
community knowledge.
We're interested in learning more
about models for pairing librarians
and Wiki enthusiasts with new contributors
who have domain knowledge.
Items will eventually be linked
to digitized video
in Harvard's digital collection platform
once rights have
been cleared with artists,
which will likely be a slow process.
There's also a great amount of interest
in moving away from manual cataloging
and creation of authority data
towards identity management,
where descriptions
can be created in batches.
An additional project that focused on
creating international standard
name identifiers, or ISNIs,
for avant-garde and women filmmakers
can be adapted for creating Wikidata items
for these filmmakers, as well.
Spreadsheets with the ISNIs,
filmmaker names, and other details
can be reconciled in OpenRefine,
and uploaded with QuickStatements.
Once people in organizations
have been described,
we'll move toward describing
the films in Wikidata,
which will likely present
some additional modeling challenges.
A library presentation
wouldn't be complete
without a MARC record.
Here, you can see the record
for Karen Aqua's taxonomy film,
where her ISNI and Wikidata Q number
have been added to the 100 field.
The ISNIs and Wikidata Q numbers
that have been created
can then be batch added
back into MARC records via MarcEdit.
You might be asking why I'm showing you
this ugly MARC record,
instead of some beautiful
linked data statements.
And that's because our libraries
will be working in a hybrid environment
for some time.
Our library catalogs still rely
on MARC records,
so by adding in these URIs,
we can try to take advantage
of linked data,
while our systems still use MARC.
Adding URIs into MARC records
makes an additional aspect
of our project possible.
Work has been done at Stanford
and Cornell to bring data
from Wikidata into our library catalog
using URIs already in our MARC records.
You can see an example
of a knowledge panel,
where all the data is sourced
from Wikidata,
and links back to the item itself,
along with an invitation to contribute.
This is currently in a test environment,
not in production in our catalog.
Ideally, eventually,
these will be generated
from linked data descriptions
of library resources
created using Sinopia,
our linked data editor
developed for cataloging.
We found that adding a look-up
to Wikidata in Sinopia is difficult.
The scale and modeling of Wikidata
makes it hard to partition the data
to be able to look up typed entities,
and we've run into the problem
of SPARQL not being good
for keyword search,
but wanting our keyword APIs
to return SPARQL-like RDF descriptions.
So, as you can see, we still have
quite a bit of work to do.
This round of the grant
runs until June 2020,
so, we'll be continuing our exploration.
And I just wanted to invite anyone
who has a continuing interest in talking
about Wikidata and libraries:
I lead a Wikidata Affinity Group
that's open to anyone to join.
We meet every two weeks,
and our next call is Tuesday,
November the 5th,
so if you're interested
in continuing discussions,
I would love to talk with you further.
Thank you, everyone.
And thank you to the other presenters
for talking about all
of their wonderful projects.
(applause)