0:00:07.133,0:00:11.738
I work as a teacher[br]at the University of Alicante,
0:00:11.738,0:00:17.040
where I recently obtained my PhD[br]on data libraries and linked open data.
0:00:17.040,0:00:19.038
And I'm also a software developer
0:00:19.038,0:00:21.718
at the Biblioteca Virtual[br]Miguel de Cervantes.
0:00:21.718,0:00:24.467
And today, I'm going to talk[br]about data quality.
0:00:28.252,0:00:31.527
Well, those are my colleagues[br]at the university.
0:00:32.457,0:00:36.727
And as you may know, many organizations[br]are publishing their data
0:00:36.727,0:00:38.447
as linked open data--
0:00:38.447,0:00:41.437
for example,[br]the National Library of France,
0:00:41.437,0:00:45.947
the National Library of Spain,[br]us, which is Cervantes Virtual,
0:00:45.947,0:00:49.007
the British National Bibliography,
0:00:49.007,0:00:51.667
the Library of Congress and Europeana.
0:00:51.667,0:00:56.000
All of them provide a SPARQL endpoint,
0:00:56.000,0:00:58.875
which is useful in order[br]to retrieve the data.
0:00:59.104,0:01:00.984
And if I'm not wrong,
0:01:00.984,0:01:05.890
the Library of Congress only provides[br]the data as a dump that you can't use.
0:01:07.956,0:01:13.787
When we published our repository[br]as linked open data,
0:01:13.787,0:01:17.475
my idea was for it to be reused[br]by other institutions.
0:01:17.981,0:01:24.000
But what if I'm an institution[br]that wants to enrich its data
0:01:24.000,0:01:27.435
with data from other data libraries?
0:01:27.574,0:01:30.674
Which data set should I use?
0:01:30.674,0:01:34.314
Which data set is better[br]in terms of quality?
0:01:36.874,0:01:41.314
The benefits of the evaluation[br]of data quality in libraries are many.
0:01:41.314,0:01:47.143
For example, methodologies can be improved[br]in order to include new criteria,
0:01:47.182,0:01:49.162
in order to assess the quality.
0:01:49.162,0:01:54.592
And also, organizations can benefit[br]from best practices and guidelines
0:01:54.602,0:01:58.270
in order to publish their data[br]as linked open data.
0:02:00.012,0:02:03.462
What do we need[br]in order to assess the quality?
0:02:03.462,0:02:06.862
Well, obviously, a set of candidates[br]and a set of features.
0:02:06.862,0:02:10.077
For example, do they have[br]a SPARQL endpoint,
0:02:10.077,0:02:13.132
do they have a web interface,[br]how many publications do they have,
0:02:13.132,0:02:18.092
how many vocabularies do they use,[br]how many Wikidata properties do they have,
0:02:18.092,0:02:20.892
and where can I get those candidates?
0:02:20.892,0:02:22.472
I use LOD Cloud--
0:02:22.472,0:02:27.422
but when I was doing this slide,[br]I thought about using Wikidata
0:02:27.562,0:02:29.746
in order to retrieve those candidates.
0:02:29.746,0:02:34.295
For example, getting entities[br]of type data library,
0:02:34.295,0:02:36.473
which has a SPARQL endpoint.
0:02:36.473,0:02:38.693
You have here the link.
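A candidate lookup of this kind could be sketched as follows. This is only an illustration, not the query behind the link on the slide: it assumes that Q212805 is the Wikidata item for a digital library and that P5305 is the property for a SPARQL endpoint URL, and the function name is made up for this example.

```python
# Sketch: build a SPARQL query listing Wikidata items of a given type
# that declare a SPARQL endpoint. The IDs below are assumptions and
# are not taken from the talk.
DIGITAL_LIBRARY = "Q212805"    # assumed QID for "digital library"
SPARQL_ENDPOINT_URL = "P5305"  # assumed PID for "SPARQL endpoint URL"


def build_candidates_query(type_qid: str = DIGITAL_LIBRARY,
                           endpoint_pid: str = SPARQL_ENDPOINT_URL) -> str:
    """Return a query for items that are instances of type_qid (or of one
    of its subclasses) and have at least one value for endpoint_pid."""
    return f"""
SELECT ?library ?libraryLabel ?endpoint WHERE {{
  ?library wdt:P31/wdt:P279* wd:{type_qid} ;
           wdt:{endpoint_pid} ?endpoint .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


if __name__ == "__main__":
    # Paste the printed query into the Wikidata Query Service to run it.
    print(build_candidates_query())
```

Keeping the type and the property as parameters makes it easy to try the same lookup for other kinds of institutions.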
0:02:41.453,0:02:45.083
And I came up with these data libraries.
0:02:45.104,0:02:50.233
The first one uses the Bibliographic Ontology[br]as its main vocabulary,
0:02:50.233,0:02:54.122
and the others are based,[br]more or less, on FRBR,
0:02:54.122,0:02:57.180
which is a vocabulary published by IFLA.
0:02:57.180,0:03:00.013
And this is just an example[br]of how we could compare
0:03:00.013,0:03:04.393
data libraries using[br]bubble charts on Wikidata.
0:03:04.393,0:03:08.613
And this is just an example comparing[br]how many Wikidata properties
0:03:08.613,0:03:10.633
there are per data library.
0:03:13.483,0:03:15.980
Well, how can we measure quality?
0:03:15.928,0:03:17.972
There are different methodologies,
0:03:17.972,0:03:19.726
for example, FRBR 1,
0:03:19.726,0:03:24.337
which provides a set of criteria[br]grouped by dimensions,
0:03:24.337,0:03:27.556
and those in green[br]are the ones that I found--
0:03:27.556,0:03:30.917
that I could assess by means of Wikidata.
0:03:33.870,0:03:39.397
And we also find that we[br]could define new criteria,
0:03:39.397,0:03:44.567
for example, a new one to evaluate[br]the number of duplications in Wikidata.
0:03:45.047,0:03:47.206
We use those properties.
0:03:47.206,0:03:50.098
And this is an example of SPARQL,
0:03:50.098,0:03:54.486
in order to count the number[br]of duplicates per property.
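One way to read "duplicates" here is that the same library identifier value is attached to more than one Wikidata item. The sketch below assumes that reading, so the query on the slide may well differ. P268 (BnF ID) serves as the example property, and both helper names are invented for illustration.

```python
from collections import defaultdict


def build_duplicates_query(identifier_pid: str) -> str:
    """Return a SPARQL query listing identifier values that are shared
    by more than one item, with the number of items per value."""
    return f"""
SELECT ?value (COUNT(DISTINCT ?item) AS ?items) WHERE {{
  ?item wdt:{identifier_pid} ?value .
}}
GROUP BY ?value
HAVING (COUNT(DISTINCT ?item) > 1)
""".strip()


def count_duplicates(pairs):
    """Offline equivalent of the query above.

    pairs is an iterable of (item, identifier value) tuples; the result
    maps each value used by more than one item to its item count."""
    items_by_value = defaultdict(set)
    for item, value in pairs:
        items_by_value[value].add(item)
    return {value: len(items)
            for value, items in items_by_value.items()
            if len(items) > 1}


if __name__ == "__main__":
    print(build_duplicates_query("P268"))  # P268: BnF ID
```

The offline function is useful for checking the logic on a small export before running the query against the live service.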
0:03:57.136,0:04:00.366
And about the results,[br]well, at the moment of doing this study,
0:04:00.366,0:04:05.216
not the slides, there was no property[br]for the British National Bibliography.
0:04:05.860,0:04:08.260
They don't provide provenance information,
0:04:08.260,0:04:11.536
which could be useful[br]for metadata enrichment.
0:04:11.536,0:04:14.660
And they don't allow[br]editing the information.
0:04:14.660,0:04:17.166
So, we've been talking[br]about Wikibase the whole weekend,
0:04:17.166,0:04:21.396
and maybe we should try to adopt[br]Wikibase as an interface.
0:04:23.186,0:04:25.436
And they are focused on their own content,
0:04:25.436,0:04:28.856
and this is just the SPARQL query[br]based on Wikidata
0:04:28.856,0:04:31.411
in order to assess the population.
0:04:32.066,0:04:36.006
And the BnF provides labels[br]in multiple languages,
0:04:36.006,0:04:38.956
and they all use self-describing URIs,
0:04:38.956,0:04:43.058
which is that in the URI,[br]they have the type of entity,
0:04:43.058,0:04:48.406
which allows the human reader[br]to understand what they are using.
0:04:51.499,0:04:55.256
And more results: they provide[br]different output formats,
0:04:55.256,0:04:58.646
they use external vocabularies.
0:04:58.854,0:05:01.116
Only the British National Bibliography
0:05:01.116,0:05:03.734
provides machine-readable[br]licensing information.
0:05:03.734,0:05:09.124
And up to one-third of the instances[br]are connected to external repositories,
0:05:09.124,0:05:11.225
which is really nice.
0:05:12.604,0:05:18.290
And, well, this study, this work[br]has been done in our Labs team.
0:05:18.364,0:05:22.391
A lab in a GLAM is a group of people
0:05:22.391,0:05:27.520
who want to explore new ways
0:05:27.587,0:05:30.306
of reusing data collections.
0:05:31.039,0:05:35.054
And there's a community[br]led by the British Library,
0:05:35.054,0:05:37.366
and in particular, Mahendra Mahey,
0:05:37.366,0:05:40.610
and we had a first event in London,
0:05:40.610,0:05:42.601
and another one in Copenhagen,
0:05:42.601,0:05:45.279
and we're going to have a new one in May
0:05:45.279,0:05:48.240
at the Library of Congress in Washington.
0:05:48.528,0:05:52.481
And we are now 250 people.
0:05:52.481,0:05:56.421
And I'm so glad that I found[br]somebody here at the WikidataCon
0:05:56.421,0:05:58.860
who has just joined us--
0:05:58.860,0:06:01.160
Sylvia from [inaudible], Mexico.
0:06:01.160,0:06:04.509
And I'd like to invite you[br]to our community,
0:06:04.509,0:06:09.719
since you may be part[br]of a GLAM institution.
0:06:10.659,0:06:13.164
So, we can talk later[br]if you want to know about this.
0:06:14.589,0:06:16.719
And this--it's all about people.
0:06:16.719,0:06:19.669
This is me, people[br]from the British Library,
0:06:19.669,0:06:24.629
Library of Congress, Universities,[br]and National Libraries in Europe.
0:06:24.871,0:06:28.050
And there's a link here[br]in case you want to know more.
0:06:28.433,0:06:32.655
And, well, last month,[br]we decided to meet in Doha
0:06:32.655,0:06:37.448
in order to write a book[br]about how to create a lab in our GLAM.
0:06:38.585,0:06:43.279
And they chose 15 people,[br]and I was so lucky to be there.
0:06:45.314,0:06:48.594
And the book follows[br]the Book Sprint methodology,
0:06:48.594,0:06:51.674
which means that nothing[br]is prepared beforehand.
0:06:51.674,0:06:53.495
All is done there in a week.
0:06:53.495,0:06:55.725
And believe me, it was really hard work
0:06:55.725,0:06:58.905
to have the whole book[br]done in this week.
0:06:59.890,0:07:04.490
And I'd like to introduce you to the book,[br]which will be published--
0:07:04.490,0:07:06.455
it was supposed to be published this week,
0:07:06.455,0:07:08.274
but it will be next week.
0:07:08.974,0:07:13.014
And it will be published open,[br]so you can have it,
0:07:13.065,0:07:15.668
and I can show you[br]a little bit later if you want.
0:07:15.734,0:07:17.601
And those are the authors.
0:07:17.601,0:07:19.678
I'm here-- I'm so happy, too.
0:07:19.678,0:07:22.110
And those are the institutions--
0:07:22.110,0:07:26.722
Library of Congress, British Library--[br]and this is the title.
0:07:27.330,0:07:29.604
And now, I'd like to show you--
0:07:31.441,0:07:33.971
a map that I'm doing.
0:07:34.278,0:07:37.234
We are launching a website[br]for our community,
0:07:37.234,0:07:42.893
and I'm in charge of creating a map[br]with our institutions there.
0:07:43.097,0:07:44.860
This is not finished.
0:07:44.860,0:07:50.276
But this is just SPARQL, and below,
0:07:51.546,0:07:53.027
we see the map.
0:07:53.027,0:07:58.086
And we see here[br]the new people that I found, here,
0:07:58.086,0:08:00.486
at the WikidataCon--[br]I'm so happy for this.
0:08:00.621,0:08:05.631
And we have here my data library[br]of my university,
0:08:05.681,0:08:08.490
and many other institutions.
0:08:09.051,0:08:10.940
Also, from Australia--
0:08:11.850,0:08:13.061
if I can do it.
0:08:13.930,0:08:15.711
Well, here, we have some links.
0:08:19.586,0:08:21.088
There you go.
0:08:21.189,0:08:23.059
Okay, this is not finished.
0:08:23.539,0:08:26.049
We are still working on this,[br]and that's all.
0:08:26.057,0:08:28.170
Thank you very much for your attention.
0:08:28.858,0:08:33.683
(applause)
0:08:41.962,0:08:48.079
[inaudible]
0:08:59.490,0:09:00.870
Good morning, everybody.
0:09:00.870,0:09:01.930
I'm Olaf Janssen.
0:09:01.930,0:09:03.570
I'm the Wikimedia coordinator
0:09:03.570,0:09:06.150
at the National Library[br]of the Netherlands.
0:09:06.310,0:09:08.390
And I would like to share my work,
0:09:08.390,0:09:11.610
which I'm doing about creating[br]Linked Open Data
0:09:11.640,0:09:15.351
for Dutch Public Libraries using Wikidata.
0:09:17.600,0:09:20.850
And my story starts roughly a year ago
0:09:20.850,0:09:24.581
when I was at the GLAM Wiki conference[br]in Tel Aviv, in Israel.
0:09:25.301,0:09:27.938
And there are two men[br]with very similar shirts,
0:09:27.938,0:09:31.120
and equally similar hairdos, [Matt]...
0:09:31.120,0:09:33.440
(laughter)
0:09:33.440,0:09:35.325
And on the left, that's me.
0:09:35.325,0:09:39.065
And a year ago, I didn't have[br]any practical knowledge and skills
0:09:39.065,0:09:40.265
about Wikidata.
0:09:40.265,0:09:43.285
I looked at Wikidata,[br]and I looked at the items,
0:09:43.285,0:09:44.524
and I played with it.
0:09:44.524,0:09:47.070
But I wasn't able to make a SPARQL query
0:09:47.070,0:09:50.285
or to do data modeling[br]with the right shape expression.
0:09:51.305,0:09:52.865
That's a year ago.
0:09:53.465,0:09:57.065
And on the lefthand side,[br]that's Simon Cobb, user: Sic19.
0:09:57.304,0:10:00.265
And I was talking to him,[br]because, just before,
0:10:00.525,0:10:01.974
he had given a presentation
0:10:01.974,0:10:06.374
about improving the coverage[br]of public libraries in Wikidata.
0:10:06.757,0:10:08.934
And I was very inspired by his talk.
0:10:09.564,0:10:13.355
And basically, he was talking[br]about adding basic data
0:10:13.355,0:10:14.867
about public libraries.
0:10:14.867,0:10:19.046
So, the name of the library, if available,[br]the photo of the building,
0:10:19.046,0:10:21.497
the address data of the library,
0:10:21.497,0:10:25.120
the geo-coordinates[br]latitude and longitude,
0:10:25.120,0:10:26.367
and some other things,
0:10:26.367,0:10:29.187
including all source references.
0:10:31.317,0:10:34.557
And what I was very impressed[br]about a year ago was this map.
0:10:34.557,0:10:37.337
This is a map about[br]public libraries in the U.K.
0:10:37.337,0:10:38.577
with all the colors.
0:10:38.577,0:10:43.017
And you can see that all the libraries[br]are layered by library organizations.
0:10:43.017,0:10:46.210
And when he showed this,[br]I was really, "Wow, that's cool."
0:10:46.637,0:10:49.138
So, then, one minute later, I thought,
0:10:49.138,0:10:52.918
"Well, let's do it[br]for the country for that one."
0:10:52.918,0:10:54.850
(laughter)
0:10:57.149,0:10:59.496
And something about public libraries[br]in the Netherlands--
0:10:59.496,0:11:03.020
there are about 1,300 library[br]branches in our country,
0:11:03.020,0:11:06.710
grouped into 160 library organizations.
0:11:07.723,0:11:10.937
And you might wonder why[br]do I want to do this project?
0:11:10.997,0:11:14.137
Well, first of all,[br]for the common good, for society,
0:11:14.137,0:11:16.707
because I think using Wikidata,
0:11:16.707,0:11:20.657
and from there,[br]creating Wikipedia articles,
0:11:20.657,0:11:23.417
and opening it up[br]via the linked open data cloud--
0:11:23.417,0:11:29.006
it's improving visibility and reusability[br]of public libraries in the Netherlands.
0:11:30.110,0:11:32.197
And my second goal was actually[br]a more personal one,
0:11:32.197,0:11:36.517
because a year ago, I had this[br]yearly evaluation with my manager,
0:11:37.243,0:11:41.737
and we decided it was a good idea[br]that I got more practical skills
0:11:41.737,0:11:45.853
on linked open data, data modeling,[br]and also on Wikidata.
0:11:46.464,0:11:50.286
And of course, I wanted to be able to make[br]these kinds of maps myself.
0:11:50.286,0:11:51.396
(laughter)
0:11:54.345,0:11:57.100
Then you might wonder[br]why do I want to do this?
0:11:57.100,0:12:01.723
Isn't there already enough basic[br]library data out there in the Netherlands
0:12:02.450,0:12:04.233
to have a good coverage?
0:12:06.019,0:12:08.367
So, let me show you some of the websites
0:12:08.367,0:12:12.882
that are available to discover[br]address and location information
0:12:12.882,0:12:14.505
about Dutch public libraries.
0:12:14.505,0:12:17.722
And the first one is this one--[br]Gidsvoornederland.nl--
0:12:17.722,0:12:20.641
and that's the official[br]public library inventory
0:12:20.641,0:12:23.037
maintained by my library,[br]the National Library.
0:12:23.727,0:12:29.160
And you can look up addresses[br]and geo-coordinates on that website.
0:12:30.493,0:12:32.797
Then there is this site,[br]Bibliotheekinzicht--
0:12:32.797,0:12:36.502
this is also an official website[br]maintained by my National Library.
0:12:36.502,0:12:38.982
And this is about[br]public library statistics.
0:12:41.010,0:12:43.933
Then there is another one,[br]debibliotheken.nl--
0:12:43.933,0:12:46.005
as you can see there is also[br]address information
0:12:46.005,0:12:49.659
about library organizations,[br]not about individual branches.
0:12:51.724,0:12:55.010
And there's even this one,[br]which also has address information.
0:12:56.546,0:12:59.028
And of course, there's something[br]like Google Maps,
0:12:59.028,0:13:02.157
which also has all the names[br]and the locations and the addresses.
0:13:03.455,0:13:06.218
And this one, the International[br]Library of Technology,
0:13:06.218,0:13:09.580
which has a worldwide[br]inventory of libraries,
0:13:09.646,0:13:11.393
including the Netherlands.
0:13:13.058,0:13:15.049
And I even discovered there is a data set
0:13:15.049,0:13:18.423
you can buy for 50 euros or so[br]to download it.
0:13:18.423,0:13:21.023
And there is also--seems to be[br]I didn't download it,
0:13:21.023,0:13:23.633
but there seems to be address[br]information available.
0:13:24.273,0:13:30.180
You might wonder is this kind of data[br]good enough for the purposes I had?
0:13:32.282,0:13:37.372
So, this is my birthday list[br]for my ideal public library data list.
0:13:37.439,0:13:39.105
And what's on my list?
0:13:39.173,0:13:43.830
First of all, the data I want to have[br]must be up-to-date-ish--
0:13:43.830,0:13:45.604
it must be fairly up-to-date.
0:13:45.604,0:13:48.513
So, doesn't have to be real time,
0:13:48.513,0:13:51.323
but let's say, a couple[br]of months, or half a year,
0:13:53.284,0:13:57.354
delayed with official publication,[br]that's okay for my purposes.
0:13:58.116,0:14:00.956
And I want to have it for both[br]library branches
0:14:00.956,0:14:02.697
and the library organizations.
0:14:04.206,0:14:08.400
Then I want my data to be structured,[br]because it has to be machine-readable.
0:14:08.301,0:14:11.986
It has to be in an open file format,[br]such as CSV or JSON or RDF.
0:14:12.717,0:14:15.197
It has to be linked[br]to other resources preferably.
0:14:16.011,0:14:22.182
And the uses--the license on the data[br]needs to be manifest public domain or CC0.
0:14:23.520,0:14:26.192
Then, I would like my data to have an API,
0:14:26.599,0:14:30.548
which must be public, free,[br]and preferably also anonymous
0:14:30.548,0:14:34.900
so you don't have to use an API key[br]or register an account.
0:14:36.103,0:14:38.863
And I also want to have[br]a SPARQL interface.
0:14:41.131,0:14:43.651
So, now, these are all the sites[br]I just showed you.
0:14:43.717,0:14:46.450
And I'm going to make a big grid.
0:14:47.337,0:14:50.017
And then, this is about[br]the evaluation I did.
0:14:51.187,0:14:54.166
I'm not going into it,[br]but there is no single column
0:14:54.166,0:14:56.007
which has all green check marks.
0:14:56.007,0:14:57.997
That's the important thing to take away.
0:14:58.967,0:15:03.947
And so, in summary, there was no[br]public free linked open data
0:15:03.947,0:15:08.937
for Dutch public libraries available[br]before I started my project.
0:15:09.237,0:15:13.027
So, this was the ideal motivation[br]to actually work on it.
0:15:14.730,0:15:17.427
So, that's what I've been doing[br]for a year now.
0:15:17.717,0:15:22.977
And I've been adding libraries bit by bit,[br]organization by organization to Wikidata.
0:15:23.417,0:15:26.387
I created also a project website on it.
0:15:26.727,0:15:29.567
It's still rather messy,[br]but it has all the information,
0:15:29.567,0:15:33.240
and I try to keep it[br]as up-to-date as possible.
0:15:33.240,0:15:36.277
And also all the SPARQL queries[br]you can see are linked from here.
0:15:38.002,0:15:40.235
And I'm just adding[br]really basic information.
0:15:40.235,0:15:44.097
You see the instances,[br]images if available,
0:15:44.097,0:15:47.229
addresses, locations, et cetera,[br]municipalities.
0:15:48.534,0:15:53.276
And where possible, I also try to link[br]the libraries to external identifiers.
0:15:56.024,0:15:58.415
And then, you can really easily,[br]as we all know,
0:15:58.415,0:16:03.050
generate some Listeria lists[br]with public libraries grouped
0:16:03.050,0:16:05.060
by organizations, for instance.
0:16:05.060,0:16:08.380
Or using SPARQL queries,[br]you can also do aggregation on data--
0:16:08.380,0:16:11.060
let's say, give me all[br]the municipalities in the Netherlands
0:16:11.060,0:16:15.115
and the number of library branches[br]in all the municipalities.
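An aggregation like the one just described could look roughly like this. It is a sketch under assumptions rather than the query linked from the project page: Q28564 is taken to be the item for a public library, Q55 the Netherlands, and P131 (located in the administrative territorial entity) is assumed to point at the municipality, which may not hold for every item. The function names are hypothetical.

```python
from collections import Counter

PUBLIC_LIBRARY = "Q28564"  # assumed QID for "public library"
NETHERLANDS = "Q55"        # assumed QID for the Netherlands


def build_branches_query(type_qid: str = PUBLIC_LIBRARY,
                         country_qid: str = NETHERLANDS) -> str:
    """Return a SPARQL query counting libraries per administrative unit."""
    return f"""
SELECT ?municipality ?municipalityLabel
       (COUNT(DISTINCT ?library) AS ?branches) WHERE {{
  ?library wdt:P31/wdt:P279* wd:{type_qid} ;
           wdt:P17 wd:{country_qid} ;
           wdt:P131 ?municipality .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "nl,en". }}
}}
GROUP BY ?municipality ?municipalityLabel
ORDER BY DESC(?branches)
""".strip()


def branches_per_municipality(rows):
    """Same aggregation on already downloaded rows.

    rows is an iterable of dicts with "library" and "municipality" keys.
    Each library is counted once per municipality; the result is sorted
    by descending count, then by municipality name."""
    unique = {(row["library"], row["municipality"]) for row in rows}
    counts = Counter(municipality for _, municipality in unique)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(build_branches_query())
```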
0:16:17.025,0:16:20.228
With one click, you can make[br]these kinds of photo galleries.
0:16:22.092,0:16:23.655
And what I set out to do first,
0:16:23.655,0:16:26.036
you can really create these kinds of maps.
0:16:27.176,0:16:30.425
And you might wonder,[br]"Are there any libraries here or there?"
0:16:30.555,0:16:33.355
There are--they are not yet in Wikidata.
0:16:33.355,0:16:35.055
We're still working on that.
0:16:35.135,0:16:37.644
And actually, last week,[br]I spoke with a volunteer,
0:16:37.644,0:16:40.864
who's helping now[br]with entering the libraries.
0:16:41.644,0:16:45.394
You can really make cool--in Wikidata,
0:16:45.394,0:16:47.914
and also using[br]the Kartographer extension,
0:16:47.914,0:16:50.244
you can use these kinds of maps.
0:16:51.724,0:16:53.736
And I even took it one step further.
0:16:53.911,0:16:57.399
I also have some Python skills,[br]and some Leaflet skills--
0:16:57.399,0:16:59.971
so, I created, and I'm quite[br]proud of it, actually.
0:16:59.971,0:17:03.482
I created this library heat map,[br]which is fully interactive.
0:17:03.482,0:17:05.956
You can zoom in to it,[br]and you can see all the libraries,
0:17:06.712,0:17:08.726
and you can also run it off Wiki.
0:17:08.726,0:17:10.552
So, you can just embed it[br]in your own website,
0:17:10.552,0:17:13.412
and it fully runs interactively.
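The talk does not show the code behind the heat map. One small step such a map typically needs is turning the coordinates returned by a Wikidata query into the shape a Leaflet layer consumes. The sketch below assumes that the query service returns coordinates as WKT literals of the form `Point(longitude latitude)` and that the map layer wants `[latitude, longitude]` pairs. The function names and the sample coordinates are invented.

```python
import json
import re

# Matches WKT point literals such as "Point(4.9 52.37)": longitude first.
_POINT = re.compile(
    r"^Point\((-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)$", re.IGNORECASE
)


def wkt_point_to_latlng(wkt: str) -> list:
    """Convert 'Point(lon lat)' into a [lat, lon] pair."""
    match = _POINT.match(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return [lat, lon]


def heat_points(wkt_points):
    """Convert many WKT points, keeping their order."""
    return [wkt_point_to_latlng(point) for point in wkt_points]


if __name__ == "__main__":
    sample = ["Point(4.9 52.37)", "Point(5.12 52.09)"]  # made-up values
    # The JSON array can be handed to a heat layer on the JavaScript side.
    print(json.dumps(heat_points(sample)))
```

Swapping the axis order is the detail that most often goes wrong: a map that shows the libraries off the coast of Somalia usually means latitude and longitude were exchanged.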
0:17:15.131,0:17:17.592
So, now going back to my big scary table.
0:17:19.512,0:17:22.970
There is one column[br]on the right, which is blank.
0:17:22.970,0:17:24.940
And no surprise, it will be Wikidata.
0:17:24.940,0:17:26.448
Let's see how it scores there.
0:17:26.448,0:17:29.500
(cheering)
0:17:32.892,0:17:35.191
So, I'm actually thinking[br]of printing this on a T-shirt.
0:17:35.301,0:17:37.288
(laughter)
0:17:37.788,0:17:39.700
So, just to summarize this in words,
0:17:39.700,0:17:41.129
thanks to my project, now,
0:17:41.129,0:17:45.879
there is public free linked open data[br]available for Dutch public libraries.
0:17:47.124,0:17:49.686
And who can benefit from my effort?
0:17:50.333,0:17:52.002
Well, all kinds of parties--
0:17:52.002,0:17:54.274
you see Wikipedia,[br]because you can generate lists
0:17:54.274,0:17:56.051
and overviews and articles,
0:17:56.051,0:17:59.908
for instance, using this[br]and be able to from Wikidata
0:17:59.908,0:18:01.976
for our National Library for--
0:18:02.850,0:18:05.391
IFLA also has an inventory[br]of worldwide libraries,
0:18:05.391,0:18:07.216
they can also reuse the data.
0:18:07.650,0:18:09.497
And especially for Sandra,
0:18:09.549,0:18:13.237
it's also important for the Ministry--[br]Dutch Ministry of Culture--
0:18:13.277,0:18:15.667
because Sandra is going[br]to have a talk about Wikidata
0:18:15.667,0:18:18.287
with the Ministry this Monday,[br]next Monday.
0:18:19.922,0:18:22.277
And also, on the righthand side,[br]for instance,
0:18:23.891,0:18:27.098
Amazon with Alexa, the assistant,
0:18:27.098,0:18:28.961
they're also using Wikidata,
0:18:28.961,0:18:30.995
so you can imagine that they also use,
0:18:30.995,0:18:33.357
if you're looking for public[br]library information,
0:18:33.357,0:18:36.580
they can also use Wikidata for that.
0:18:38.955,0:18:41.680
Because one year ago,[br]Simon Cobb inspired me
0:18:41.680,0:18:44.244
to do this project,[br]I would like to call upon you,
0:18:44.244,0:18:45.664
if you have time available,
0:18:45.664,0:18:49.532
and if you have data from your own country[br]about public libraries,
0:18:51.572,0:18:54.422
make the coverage better,[br]add more red dots,
0:18:54.982,0:18:56.982
and of course, I'm willing[br]to help you with that.
0:18:56.982,0:18:59.227
And Simon is also willing[br]to help with this.
0:18:59.870,0:19:01.471
And so, I hope next year, somebody else
0:19:01.471,0:19:03.901
will be at this conference[br]or another conference
0:19:03.901,0:19:06.291
and there will be more[br]red dots on the map.
0:19:07.551,0:19:08.911
Thank you very much.
0:19:09.004,0:19:12.740
(applause)
0:19:18.336,0:19:20.086
Thank you, Olaf.
0:19:20.086,0:19:23.554
Next we have Ursula Oberst[br]and Heleen Smits
0:19:23.613,0:19:27.734
presenting how can a small[br]research library benefit from Wikidata:
0:19:27.734,0:19:31.423
enhancing library products using Wikidata.
0:19:53.717,0:19:57.637
Okay. Good morning.[br]My name is Heleen Smits.
0:19:58.680,0:20:01.753
And my colleague,[br]Ursula Oberst--where are you?
0:20:01.753,0:20:03.873
(laughter)
0:20:04.371,0:20:09.220
And I work at the Library[br]of the African Studies Center
0:20:09.220,0:20:11.086
in Leiden, in the Netherlands.
0:20:11.086,0:20:15.038
And the African Studies Center[br]is a center devoted--
0:20:15.038,0:20:21.464
is an academic institution[br]devoted entirely to the study of Africa,
0:20:21.464,0:20:23.986
focusing on Humanities and Social Studies.
0:20:24.672,0:20:28.123
We used to be an independent[br]research organization,
0:20:28.123,0:20:33.064
but in 2016, we became part[br]of Leiden University,
0:20:33.064,0:20:38.433
and our catalog was integrated[br]into the larger university catalog.
0:20:39.283,0:20:43.593
Though it remained possible[br]to do a search in the part of the Leiden--
0:20:43.593,0:20:45.894
of the African Studies Catalog, alone,
0:20:47.960,0:20:50.505
we remained independent in some respects.
0:20:50.586,0:20:53.262
For example, with respect[br]to our thesaurus.
0:20:54.921,0:20:59.883
And also with respect[br]to the products we make for our users,
0:21:01.180,0:21:04.378
such as acquisition lists[br]and web dossiers.
0:21:05.158,0:21:11.975
And it is in the field of the web dossiers
0:21:11.975,0:21:14.582
that we have been looking
0:21:14.582,0:21:19.582
for possible ways to apply Wikidata,
0:21:19.582,0:21:23.372
and that's the part where Ursula[br]will in the second part of this talk
0:21:24.212,0:21:27.184
show you a bit[br]what we've been doing there.
0:21:31.250,0:21:35.160
The web dossiers are our collections
0:21:35.160,0:21:39.000
of titles from our catalog[br]that we compile
0:21:39.000,0:21:45.591
around a theme usually connected[br]to, for example, a conference,
0:21:45.591,0:21:51.227
or to a special event, and actually,[br]the most recent web dossier we made
0:21:51.227,0:21:56.017
was connected to the year[br]of indigenous languages,
0:21:56.017,0:21:59.547
and that was around proverbs[br]in African languages.
0:22:00.780,0:22:02.327
Our first steps--
0:22:04.307,0:22:09.287
next slide--our first steps[br]on the Wiki path as a library,
0:22:10.267,0:22:15.046
were in 2013, when we were one[br]of 12 GLAM institutions
0:22:15.046,0:22:16.472
in the Netherlands,
0:22:16.472,0:22:20.952
part of the project[br]of Wikipedians in Residence,
0:22:20.952,0:22:26.443
and we had for two months,[br]a Wikipedian in the house,
0:22:27.035,0:22:32.527
and he gave us trainings[br]for adding articles to Wikipedia,
0:22:33.000,0:22:37.720
and also, we made a start with uploading[br]photo collections to Commons,
0:22:38.530,0:22:42.650
which always remained a little bit[br]dependent on funding, as well,
0:22:43.229,0:22:45.702
whether we would be able to digitize them,
0:22:45.702,0:22:50.350
and to mostly have[br]a student assistant to do this.
0:22:51.220,0:22:55.440
But it was actually a great addition[br]to what we could offer
0:22:55.440,0:22:57.560
as an academic library.
0:22:59.370,0:23:04.742
In May 2018, my colleague Ursula--
0:23:04.742,0:23:09.465
she started to really explore--[br]dive into Wikidata
0:23:09.465,0:23:14.515
and see what we as a small[br]and not very experienced library
0:23:14.515,0:23:18.175
in these fields could do with that.
0:23:25.050,0:23:26.995
So, I mentioned, we have[br]our own thesaurus.
0:23:28.210,0:23:30.689
And this is where we started.
0:23:30.689,0:23:34.502
This is a thesaurus of 13,000 terms,
0:23:34.502,0:23:37.670
all in the field of African studies.
0:23:37.670,0:23:41.457
It contains a lot of African languages,
0:23:43.417,0:23:46.360
names of ethnic groups in Africa,
0:23:47.586,0:23:49.431
and other proper names,
0:23:49.431,0:23:55.509
which are perhaps especially[br]interesting for Wikidata.
0:23:58.604,0:24:04.824
So, it is a real authority-controlled
0:24:04.824,0:24:08.370
vocabulary[br]with 5,000 preferred terms.
0:24:08.554,0:24:11.204
So, we submitted the request to Wikidata,
0:24:11.204,0:24:17.135
and that was actually very quickly[br]met with a positive response,
0:24:17.214,0:24:19.354
which was very encouraging for us.
0:24:22.884,0:24:25.574
Our thesaurus was loaded into Mix-n-Match,
0:24:25.574,0:24:31.691
and by now, 75% of the terms
0:24:31.691,0:24:36.145
have been manually matched with Wikidata.
0:24:38.061,0:24:42.081
So, it means, well, that we are now--
0:24:42.971,0:24:47.687
we are added as an identifier--
0:24:48.387,0:24:51.553
for example, if you click[br]on Swahili language,
0:24:52.463,0:24:57.152
what happens then in Wikidata[br]on the number that--
0:24:59.004,0:25:02.354
that connects our term--[br]is the Wikidata term--
0:25:02.560,0:25:05.620
we enter into our thesaurus,
0:25:05.620,0:25:10.000
and from there, you can do a search[br]directly in the catalog
0:25:10.000,0:25:12.560
by clicking the button again.
0:25:12.560,0:25:18.160
It means, also, that Wikidata[br]is not really integrated
0:25:18.160,0:25:19.572
into our catalog.
0:25:19.572,0:25:22.090
But that's also more difficult.
0:25:22.314,0:25:26.053
Okay, we have to give the floor
0:25:26.053,0:25:30.838
to Ursula for the next part.
0:25:30.838,0:25:32.554
(Ursula) Thank you very much, Heleen.
0:25:32.554,0:25:37.258
So, I will talk about our experiences
0:25:37.258,0:25:39.677
with incorporating Wikidata elements
0:25:39.677,0:25:41.356
into our web dossiers.
0:25:41.356,0:25:44.607
A web dossier is--oh, sorry, yeah, sorry.
0:25:45.447,0:25:49.646
A web dossier, or a classical web dossier,[br]consists of three parts:
0:25:50.248,0:25:53.320
an introduction to the subject,
0:25:53.320,0:25:56.060
mostly written by one of our researchers;
0:25:56.060,0:26:01.328
a selection of titles, both books[br]and articles from our collection;
0:26:01.328,0:26:06.146
and the third part, an annotated list
0:26:06.146,0:26:08.876
with links to electronic resources.
0:26:09.161,0:26:15.815
And this year, we added a fourth part[br]to our web dossiers,
0:26:15.815,0:26:18.276
which is the Wikidata elements.
0:26:19.008,0:26:22.007
And it all started last year,
0:26:22.007,0:26:25.206
and my story is similar[br]to the story of Olaf, actually.
0:26:25.352,0:26:29.570
Last year, when I had no clue[br]about Wikidata,
0:26:29.570,0:26:33.402
and I discovered this wonderful[br]article by Alex Stinson
0:26:33.402,0:26:36.932
on how to write a query in Wikidata.
0:26:37.382,0:26:41.592
And he chose a subject--[br]a very appealing subject to me.
0:26:41.592,0:26:45.902
Namely, "Discovering Women Writers[br]from North Africa."
0:26:46.402,0:26:51.162
I can really recommend this article,
0:26:51.162,0:26:52.981
because it's very instructive.
0:26:52.981,0:26:57.422
And I thought I will be--[br]I'm going to work on this query,
0:26:57.422,0:27:02.662
and try to change it to:[br]"Southern African Women Writers,"
0:27:02.662,0:27:07.034
and try to add a link[br]to their work in our catalog.
0:27:07.311,0:27:10.861
And on the right-hand side,[br]you see the SPARQL query
0:27:11.592,0:27:15.181
which searches for[br]"Southern African Women Writers."
0:27:15.181,0:27:20.686
If you click on the button,[br]on the blue button on the lefthand side,
0:27:21.526,0:27:23.971
the search result will appear beneath.
0:27:23.971,0:27:26.448
The search result can have[br]different formats.
0:27:26.448,0:27:29.871
In my case, the search result is a map.
0:27:29.871,0:27:32.850
And the nice thing about Wikidata
0:27:32.850,0:27:36.652
is that you can embed[br]this search result
0:27:36.652,0:27:38.682
into your own webpage,
0:27:38.682,0:27:42.339
and that's what we are now doing[br]with our web dossiers.
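Embedding a query result could be sketched like this. The example assumes the embed page of the Wikidata Query Service at `https://query.wikidata.org/embed.html`, which takes the URL-encoded query after the `#`. The query itself is a simplified stand-in, not the one shown on the slide: it covers only South Africa (assumed to be Q258) and leaves out the catalog links. The function names are made up.

```python
from urllib.parse import quote

EMBED_BASE = "https://query.wikidata.org/embed.html#"

# Simplified stand-in query: women writers with South African citizenship,
# plotted by place of birth. The IDs are assumptions to be verified.
WRITERS_QUERY = """
#defaultView:Map
SELECT ?writer ?writerLabel ?coords WHERE {
  ?writer wdt:P21 wd:Q6581072 ;
          wdt:P106 wd:Q36180 ;
          wdt:P27 wd:Q258 ;
          wdt:P19 ?birthplace .
  ?birthplace wdt:P625 ?coords .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
""".strip()


def embed_url(query: str) -> str:
    """Return the embed URL for a query, percent-encoding every character
    that is not an unreserved URL character."""
    return EMBED_BASE + quote(query, safe="")


def iframe_html(query: str, width: str = "100%", height: str = "400") -> str:
    """Return an iframe snippet that shows the query result on a web page."""
    return (f'<iframe src="{embed_url(query)}" '
            f'width="{width}" height="{height}"></iframe>')


if __name__ == "__main__":
    print(iframe_html(WRITERS_QUERY))
```

The `#defaultView:Map` comment on the first line of the query is what makes the result render as a map instead of a table.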
0:27:42.339,0:27:47.039
So, this was the very first one[br]on Southern African women writers,
0:27:47.039,0:27:49.649
listing the classical three elements,
0:27:49.649,0:27:53.209
plus this map on the lefthand side,
0:27:53.209,0:27:55.650
which gives extra information--
0:27:55.650,0:27:58.219
a link to the Southern African[br]women writer--
0:27:58.219,0:28:00.749
a link to her works in our catalog,
0:28:00.749,0:28:07.252
and a link to the Wikidata record[br]of her birth place, and her name,
0:28:08.219,0:28:13.099
her personal record, plus a photo,[br]if it's available on Wikidata.
0:28:16.231,0:28:20.329
And to retrieve a nice map
0:28:20.329,0:28:24.032
with a lot of red dots[br]on the African continent,
0:28:24.032,0:28:28.662
you need nice data in Wikidata,[br]complete, sufficient data.
0:28:29.042,0:28:33.442
So, with our second web dossier[br]on public art in Africa,
0:28:33.442,0:28:38.420
we also started to enhance[br]the data in Wikidata.
0:28:38.420,0:28:43.242
In this case, for public art,[br]we added geo-locations--
0:28:43.242,0:28:46.919
geo-locations to Wikidata.
0:28:46.919,0:28:51.139
And we also searched for works[br]of public art in commons,
0:28:51.139,0:28:55.165
and if they don't have[br]a record on Wikidata yet,
0:28:55.165,0:29:00.670
we added the record to Wikidata.
0:29:00.855,0:29:05.327
And the third thing we do,
0:29:05.327,0:29:09.958
because when we prepare a web dossier,
0:29:09.958,0:29:15.514
we download the titles from our catalog,
0:29:15.514,0:29:17.584
and the titles are in MARC 21,
0:29:17.584,0:29:23.226
so we have to convert them to a format[br]that is presentable on the website,
0:29:23.226,0:29:28.229
and it takes not much time and effort[br]to convert the same set of titles
0:29:28.229,0:29:30.457
to Wikidata QuickStatements,
0:29:30.457,0:29:36.999
and then, we also upload[br]a title set to Wikidata,
0:29:36.999,0:29:41.254
and you can see the titles we uploaded
0:29:41.254,0:29:44.124
from our latest web dossier
0:29:44.124,0:29:47.514
on African proverbs in Scholia.
0:29:48.546,0:29:52.294
A really nice tool[br]that visualizes scholarly publications
0:29:52.294,0:29:54.674
being present in Wikidata.
0:29:54.674,0:29:59.674
And, one second--when it is possible,[br]we add a Scholia template
0:29:59.674,0:30:01.863
to our web dossier's topic.
0:30:01.863,0:30:03.272
Thank you very much.
0:30:03.272,0:30:08.079
(applause)
0:30:09.255,0:30:11.724
Thank you, Heleen and Ursula.
0:30:12.010,0:30:16.866
Next we have Adrian Pohl[br]presenting "Using Wikidata
0:30:16.866,0:30:22.265
to improve spatial subject indexing[br]in a regional bibliography."
0:30:45.181,0:30:46.621
Okay, hello everybody.
0:30:46.621,0:30:49.630
I'm going right into the topic.
0:30:49.630,0:30:54.146
I only have ten minutes to present[br]a three-year project.
0:30:54.535,0:30:57.044
It wasn't full time. (laughs)
0:30:57.044,0:31:00.100
Okay, what's the NWBib?
0:31:00.100,0:31:04.404
It's an acronym for North Rhine-[br]Westphalian Bibliography.
0:31:04.404,0:31:07.944
It's a regional bibliography[br]that records literature
0:31:07.944,0:31:11.441
about people and places[br]in North Rhine-Westphalia.
0:31:12.534,0:31:14.103
And there are monographs in it--
0:31:15.162,0:31:19.451
there are a lot of articles in it,[br]and most of them are quite unique,
0:31:19.451,0:31:22.052
so, that's the interesting thing[br]about this bibliography--
0:31:22.052,0:31:25.472
because it's often[br]quite obscure stuff--
0:31:25.472,0:31:28.188
local people writing[br]about that tradition,
0:31:28.188,0:31:29.488
and something like this.
0:31:29.612,0:31:33.428
And there are over 400,000 entries in there.
0:31:33.428,0:31:37.689
And the bibliography started in 1983,
0:31:37.689,0:31:42.718
and so we only have titles[br]from this publication year onwards.
0:31:44.744,0:31:49.166
If you want to take a look at it,[br]it's at nwbib.de,
0:31:49.166,0:31:50.859
that's the web application.
0:31:50.859,0:31:55.389
It's based on our service,[br]lobid.org, the API.
0:31:57.148,0:32:01.220
Because it's cataloged as part[br]of the hbz union catalog,
0:32:01.220,0:32:04.988
which comprises around 20 million records,
0:32:04.988,0:32:08.869
it's an [inaudible] Aleph system[br]we get the data out of there,
0:32:08.869,0:32:11.308
and make RDF out of it,
0:32:11.308,0:32:16.408
and provide it as JSON[br]via the HTTP API.
0:32:17.129,0:32:20.507
So, the initial status in 2017
0:32:20.507,0:32:25.307
was we had nearly 9,000 distinct strings
0:32:25.307,0:32:28.727
about places--referring to places,[br]in North Rhine-Westphalia.
0:32:28.727,0:32:34.187
Mostly, those were administrative areas,[br]like towns and districts,
0:32:34.187,0:32:38.458
but also monasteries, principalities,[br]or natural regions.
0:32:38.907,0:32:43.517
And we already used Wikidata in 2017,
0:32:43.517,0:32:48.496
and matched those strings[br]with the Wikidata API to Wikidata entries
0:32:48.496,0:32:51.907
quite naively to get[br]the geo-coordinates from there,
0:32:51.907,0:32:57.210
and do some geo-based[br]discovery stuff with it.
0:32:57.326,0:32:59.910
But this had some drawbacks.
0:32:59.910,0:33:02.577
And so, the matching was really poor,
0:33:02.577,0:33:05.197
and there were a lot of false positives,
0:33:05.197,0:33:09.184
and we still had no hierarchy[br]in those places,
0:33:09.184,0:33:13.201
and we still had a lot[br]of non-unique names.
0:33:13.505,0:33:15.356
So, this is an example here.
0:33:16.616,0:33:18.378
Does this work?
0:33:18.494,0:33:22.314
Yeah, as you can see,[br]for one place, Brauweiler,
0:33:22.314,0:33:24.615
there are four different strings in there.
0:33:24.820,0:33:27.893
So, we all know how this happens.
0:33:27.893,0:33:31.994
If there's no authority file,[br]you end up with this data.
0:33:31.994,0:33:33.894
But we want to improve on that.
0:33:34.614,0:33:38.211
And you can also see[br]why the matching didn't work--
0:33:38.211,0:33:40.382
so you have this name of the place
0:33:40.382,0:33:45.170
and there's often the name[br]of the superior administrative area,
0:33:45.170,0:33:50.532
and even on the second level,[br]a superior administrative area
0:33:50.532,0:33:52.040
often in the name
0:33:52.040,0:33:58.909
to identify the place successfully.
0:33:58.909,0:34:04.679
So, the goal was to build a full-fledged[br]spatial classification based on this data,
0:34:04.679,0:34:07.109
with a hierarchical view of places,
0:34:09.079,0:34:11.389
with one entry or ID for each place.
0:34:11.518,0:34:17.488
And we got this mock-up[br]by NWBib editors in 2016, made in Excel,
0:34:18.048,0:34:23.116
to get a feeling of what[br]they would like to have.
0:34:25.006,0:34:28.198
There you have the--[br]Regierungsbezirk--
0:34:28.198,0:34:31.016
that's the most superior[br]administrative area--
0:34:31.016,0:34:34.918
we have in there some towns[br]or districts--rural districts--
0:34:34.918,0:34:39.861
and then, it's going down[br]to the parts of towns,
0:34:39.861,0:34:42.011
even to this level.
0:34:43.225,0:34:46.232
And we chose Wikidata for this task.
0:34:46.232,0:34:50.087
We also looked at the GND,[br]the Integrated Authority File,
0:34:50.087,0:34:54.918
and GeoNames--but Wikidata[br]had the best coverage,
0:34:54.918,0:34:56.902
and the best infrastructure.
0:34:58.112,0:35:02.072
The coverage for the places[br]and the geo-coordinates we need,
0:35:02.072,0:35:04.512
and the hierarchical[br]information, for example.
0:35:04.512,0:35:06.732
There were a lot of places,[br]also, in the GND,
0:35:06.732,0:35:09.694
but there was no hierarchical[br]information in there.
0:35:11.170,0:35:13.682
And also, Wikidata provides[br]the infrastructure
0:35:13.682,0:35:15.343
for editing and versioning.
0:35:15.343,0:35:20.022
And there's also a community[br]that helps maintain the data,
0:35:20.022,0:35:22.052
which was quite good.
0:35:22.950,0:35:26.882
Okay, but there was a requirement[br]by the NWBib editors.
0:35:27.682,0:35:31.447
They did not want to directly[br]rely on Wikidata,
0:35:31.447,0:35:32.972
which was understandable.
0:35:32.972,0:35:34.982
We don't have those servers[br]under our control,
0:35:34.982,0:35:38.002
and we won't know what's going on there.
0:35:38.084,0:35:41.944
There might be some unwelcome edits[br]that destroy the classification,
0:35:41.944,0:35:44.159
or parts of it, or vandalism.
0:35:44.159,0:35:50.794
So, we decided to put[br]an intermediate SKOS file in between,
0:35:50.794,0:35:55.534
on which the application would rely--[br]which should be generated from Wikidata.
0:35:57.113,0:35:59.462
And SKOS is the Simple Knowledge[br]Organization System--
0:35:59.462,0:36:03.919
it's the standard way to model
0:36:03.919,0:36:07.519
a classification in the linked data world.
0:36:07.603,0:36:09.278
So, how did we do it? Five steps.
0:36:09.278,0:36:14.037
I will come to each[br]of the steps in more detail.
0:36:14.037,0:36:18.460
We match the strings to Wikidata[br]with a better approach than before.
0:36:18.727,0:36:23.131
Created a classification based[br]on Wikidata, added
0:36:23.131,0:36:26.255
then backlinks[br]from Wikidata to NWBib
0:36:26.255,0:36:27.590
with a custom property.
0:36:27.590,0:36:32.659
And now, we are in the process[br]of establishing a good process
0:36:32.659,0:36:36.559
for updating the classification[br]in Wikidata.
0:36:36.619,0:36:38.888
Seeing--having a diff[br]of the changes,
0:36:38.888,0:36:41.158
and then publishing it to the SKOS file.
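A minimal sketch of what such a review diff could look like, assuming each snapshot of the classification is held as a mapping from concept ID to a (label, broader ID) pair. The snapshot format, the function name, and the placeholder IDs are illustrative assumptions, not the project's actual code.

```python
# Hypothetical snapshot format: concept ID -> (label, broader ID or None).
def diff_classification(old, new):
    """Return the concept IDs that were added, removed, or changed."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # A concept counts as changed if its label or its broader link differs.
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}


# Placeholder IDs, not real Wikidata items.
old = {"Q_A": ("Town A", "Q_DISTRICT"), "Q_B": ("Town B", "Q_DISTRICT")}
new = {"Q_A": ("Town A", "Q_OTHER"), "Q_C": ("Town C", "Q_DISTRICT")}
report = diff_classification(old, new)
```

An editor could review a report like this before the new snapshot is published to the SKOS file.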
0:36:42.813,0:36:44.646
I will come to the details.
0:36:44.646,0:36:46.261
So, the matching approach--
0:36:46.261,0:36:48.356
as the API wasn't sufficient,
0:36:48.356,0:36:53.585
and because we have those[br]different levels in the strings,
0:36:54.441,0:36:59.036
we built a custom Elasticsearch[br]index for our task.
0:36:59.596,0:37:04.378
I think by now, you could probably,[br]as well, use OpenRefine for doing this,
0:37:04.378,0:37:09.306
but at that point in time,[br]it wasn't available for Wikidata.
0:37:10.186,0:37:14.336
And we built this index based[br]on a SPARQL query
0:37:14.336,0:37:20.484
for entities in NRW[br]with a specific type.
0:37:20.484,0:37:25.069
And the query evolved over time a lot.
0:37:25.148,0:37:29.157
And you can see[br]the history on GitHub.
0:37:29.727,0:37:32.088
So, what we put in the matching index,
0:37:32.088,0:37:36.337
in the spatial object,[br]is what we need in our data.
0:37:36.337,0:37:39.662
It's the label and the ID[br]or the link to Wikidata,
0:37:40.222,0:37:43.874
the geo-coordinates, and the type[br]from Wikidata [inaudible], as well.
0:37:44.194,0:37:50.488
But also for the matching, very important[br]are the aliases and the broader entity--
0:37:50.488,0:37:54.138
and this is also an example where the name[br]of the broader entity
0:37:54.138,0:37:57.875
and the district itself are very similar.
0:37:57.937,0:38:03.096
So, it's important to have[br]some type information, as well,
0:38:03.096,0:38:04.606
for the matching.
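A rough sketch of why aliases, broader areas, and type help here, in plain Python rather than Elasticsearch. The `Place` record, the `match` function, and the sample entries are hypothetical; the talk does not show the real index mapping or query.

```python
from dataclasses import dataclass, field


@dataclass
class Place:
    qid: str                                      # placeholder, not a real Wikidata ID
    label: str
    type: str
    aliases: list = field(default_factory=list)
    broader: list = field(default_factory=list)   # labels of superior areas


def match(name, hints, places):
    """Return places whose label or alias equals `name`, best match first.

    Candidates are ranked by how many hints name a broader area or the type.
    """
    name_l = name.lower()
    hints_l = {h.lower() for h in hints}
    scored = []
    for p in places:
        names = {p.label.lower(), *(a.lower() for a in p.aliases)}
        if name_l not in names:
            continue
        context = {b.lower() for b in p.broader} | {p.type.lower()}
        scored.append((len(hints_l & context), p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored]


# A town and its district share almost the same name (invented example data).
places = [
    Place("Q_TOWN", "Musterstadt", "town", broader=["Kreis Musterstadt"]),
    Place("Q_KREIS", "Kreis Musterstadt", "district",
          aliases=["Musterstadt"], broader=["Regierungsbezirk Beispiel"]),
]
best_district = match("Musterstadt", ["district"], places)[0]
best_town = match("Musterstadt", ["Kreis Musterstadt"], places)[0]
```

Without the type or the broader area as a hint, both candidates would tie on the name alone.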
0:38:04.900,0:38:07.900
So, the nationwide results[br]were very good.
0:38:07.900,0:38:11.110
We could automatically match[br]more than 99% of records
0:38:11.110,0:38:12.265
with this approach.
0:38:13.885,0:38:16.356
These were only 92% of the strings.
0:38:16.540,0:38:18.140
So, obviously, the results--
0:38:18.140,0:38:20.610
those strings that only occurred[br]one or two times
0:38:20.610,0:38:22.419
often didn't appear in Wikidata.
0:38:22.419,0:38:26.309
And so, we had to do a lot of work[br]with the long tail.
0:38:27.905,0:38:32.039
And for around 1,000 strings,[br]the matching was incorrect.
0:38:32.114,0:38:34.950
But the catalogers did a lot of work[br]in the Aleph catalog,
0:38:34.950,0:38:39.869
but also in Wikidata, they made[br]more than 6,000 manual edits to Wikidata
0:38:39.869,0:38:45.019
to reach 100% coverage by adding[br]aliases, type information,
0:38:45.085,0:38:46.615
creating new entries.
0:38:46.615,0:38:49.100
Okay, so, I have to speed up.
0:38:49.546,0:38:54.295
We created the classification based on this,[br]on the hierarchical statements.
0:38:54.295,0:38:58.580
P131 is the main property there.
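As an illustration only: P131 ("located in the administrative territorial entity") gives child-to-parent pairs, and each pair can become a skos:broader link. The base URI, the `@de` language tag, and the placeholder IDs below are assumptions, not the real NWBib namespace or data.

```python
# Hypothetical base URI for the classification; not the real NWBib namespace.
BASE = "https://example.org/spatial#"


def to_skos(labels, located_in):
    """Render Turtle: one skos:Concept per place, skos:broader from P131 pairs.

    Labels are assumed not to contain double quotes or backslashes.
    """
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for qid in sorted(labels):
        lines.append(f"<{BASE}{qid}> a skos:Concept ;")
        parent = located_in.get(qid)
        if parent is not None:
            lines.append(f"    skos:broader <{BASE}{parent}> ;")
        lines.append(f'    skos:prefLabel "{labels[qid]}"@de .')
    return "\n".join(lines)


# Placeholder IDs and labels, standing in for the result of a Wikidata query.
labels = {"Q_TOWN": "Beispielstadt", "Q_DISTRICT": "Kreis Beispiel"}
located_in = {"Q_TOWN": "Q_DISTRICT"}  # child -> parent, as stated via P131
ttl = to_skos(labels, located_in)
```

Top-level concepts simply have no skos:broader line.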
0:38:59.827,0:39:02.495
We added the information to our data.
0:39:03.035,0:39:06.525
So, we now have this[br]in our data spatial object--
0:39:06.525,0:39:11.535
and we focus this--the link to Wikidata,[br]and the types are there,
0:39:12.625,0:39:17.554
and here's the ID[br]from the SKOS classification
0:39:17.554,0:39:19.234
we built based on Wikidata.
0:39:20.034,0:39:23.555
And you can see there[br]are Q identifiers in there.
0:39:26.940,0:39:29.286
Now, you can basically query our API
0:39:29.286,0:39:34.051
with such a query using Wikidata URIs,
0:39:34.316,0:39:38.627
and get literature, in this example,[br]about Cologne back.
0:39:39.724,0:39:45.675
Then we created a Wikidata property[br]for NWBib and added those links
0:39:45.675,0:39:50.995
from Wikidata to the classification--[br]batch loaded them with QuickStatements.
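A small sketch of preparing such a batch, assuming the tab-separated QuickStatements V1 format with string values in double quotes. `P_NWBIB` and the item IDs are placeholders; substitute the real property and item IDs before running anything.

```python
def quickstatements(rows, prop):
    """Build QuickStatements V1 commands: item, property, quoted string value."""
    return "\n".join(f'{qid}\t{prop}\t"{value}"' for qid, value in rows)


# (Wikidata item, classification ID) pairs; both values are placeholders.
rows = [("Q_TOWN", "Q_TOWN"), ("Q_DISTRICT", "Q_DISTRICT")]
batch = quickstatements(rows, "P_NWBIB")
```

The resulting text can be pasted into the QuickStatements batch form.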
0:39:52.105,0:39:53.634
And there's also a nice--
0:39:53.634,0:39:59.344
also a move to using a qualifier[br]on this property
0:39:59.344,0:40:02.994
to add the broader information there.
0:40:02.994,0:40:06.333
So, I think people won't mess around[br]as much with this
0:40:06.333,0:40:09.223
as with the P131 statement.
0:40:10.094,0:40:11.743
So, this is what it looks like.
0:40:12.563,0:40:16.142
This will go to the classification[br]where you can then start a query.
0:40:18.670,0:40:23.293
Now, we have to build this[br]update and review process,
0:40:23.293,0:40:28.692
and we will add those data like this,
0:40:28.692,0:40:32.452
with a zero sub-field to Aleph,
0:40:32.452,0:40:36.962
and the catalogers will start[br]using those Wikidata based IDs,
0:40:36.962,0:40:41.012
URIs, for cataloging for spatial indexing.
0:40:44.702,0:40:50.082
So, by now, there are more than 400,000[br]NWBib entries with links to Wikidata,
0:40:50.082,0:40:55.905
and more than 4,400 Wikidata entries[br]with links to NWBib.
0:40:56.617,0:40:58.042
Thank you.
0:40:58.042,0:41:03.182
(applause)
0:41:07.574,0:41:09.682
Thank you, Adrian.
0:41:13.312,0:41:15.472
I got it. Thank you.
0:41:31.122,0:41:34.402
So, as you've seen me before,[br]I'm Hilary Thorsen.
0:41:34.402,0:41:36.152
I'm Wikimedian in residence
0:41:36.152,0:41:38.382
with the Linked Data[br]for Production Project.
0:41:38.382,0:41:39.942
I am based at Stanford,
0:41:39.942,0:41:42.590
and I'm here today[br]with my colleague, Lena Denis,
0:41:42.590,0:41:45.581
who is Cartographic Assistant[br]at Harvard Library.
0:41:45.581,0:41:50.041
And Christine Fernsebner Eslao[br]is here in spirit.
0:41:50.041,0:41:53.530
She is currently back in Boston,[br]but supporting us from afar.
0:41:53.530,0:41:56.240
So, we'll be talking[br]about Wikidata and Libraries
0:41:56.240,0:42:00.350
as partners in data production,[br]organization, and project inspiration.
0:42:00.850,0:42:04.300
And our work is part of the Linked Data[br]for Production Project.
0:42:05.450,0:42:08.190
So, Linked Data for Production[br]is in its second phase,
0:42:08.190,0:42:10.450
called Pathway for Implementation.
0:42:10.450,0:42:13.291
And it's an Andrew W. Mellon[br]Foundation grant,
0:42:13.291,0:42:16.120
involving the partnership[br]of several universities,
0:42:16.120,0:42:20.280
with the goal of constructing a pathway[br]for shifting the catalog community
0:42:20.280,0:42:24.860
to begin describing library[br]resources with linked data.
0:42:24.860,0:42:26.919
And it builds upon a previous grant,
0:42:26.919,0:42:30.369
but this iteration is focused[br]on the practical aspects
0:42:30.369,0:42:32.009
of the transition.
0:42:33.559,0:42:35.650
One of these pathways of investigation
0:42:35.650,0:42:39.000
has been integrating[br]library metadata with Wikidata.
0:42:39.429,0:42:41.054
We have a lot of questions,
0:42:41.054,0:42:42.999
but some of the ones[br]we're most interested in
0:42:42.999,0:42:46.180
are how we can integrate[br]library metadata with Wikidata,
0:42:46.180,0:42:49.580
and make contribution[br]a part of our cataloging workflows,
0:42:49.580,0:42:53.589
how Wikidata can help us improve[br]our library discovery environment,
0:42:53.589,0:42:55.929
how it can help us reveal[br]more relationships
0:42:55.929,0:42:59.629
and connections within our data[br]and with external data sets,
0:42:59.629,0:43:04.370
and if we have connections in our own data[br]that can be added to Wikidata,
0:43:04.370,0:43:07.480
how libraries can help[br]fill in gaps in Wikidata,
0:43:07.480,0:43:09.969
and how libraries can work[br]with local communities
0:43:09.969,0:43:13.070
to describe library[br]and archival resources.
0:43:14.010,0:43:17.129
Finding answers to these questions[br]has focused on the mutual benefit
0:43:17.129,0:43:19.649
for the library and Wikidata communities.
0:43:19.649,0:43:22.949
We've learned through starting to work[br]on our different Wikidata projects,
0:43:22.949,0:43:25.279
that many of the issues[br]libraries grapple with,
0:43:25.279,0:43:29.451
like data modeling, identity management,[br]data maintenance, documentation,
0:43:29.451,0:43:31.289
and instruction on linked data,
0:43:31.289,0:43:33.970
are ones the Wikidata[br]community works on too.
0:43:34.370,0:43:36.099
I'm going to turn things over to Lena
0:43:36.099,0:43:39.640
to talk about what[br]she's been working on now.
0:43:46.550,0:43:51.040
Hi, so, as Hilary briefly mentioned,[br]I work as a map librarian at Harvard,
0:43:51.040,0:43:54.180
where I process maps, atlases,[br]and archives for our online catalog.
0:43:54.180,0:43:56.580
And while processing two-dimensional[br]cartographic works
0:43:56.580,0:43:59.572
is relatively straightforward,[br]cataloging archival collections
0:43:59.572,0:44:02.429
so that their cartographic resources[br]can be made discoverable,
0:44:02.429,0:44:04.119
has always been more difficult.
0:44:04.119,0:44:06.989
So, my use case for Wikidata[br]is visually modeling relationships
0:44:06.989,0:44:10.389
between archival collections[br]and the individual items within them,
0:44:10.389,0:44:13.210
as well as between archival drafts[br]and published works.
0:44:13.359,0:44:17.329
So, I used Wikidata to highlight the work[br]of a cartographer named Erwin Raisz,
0:44:17.329,0:44:19.890
who worked at Harvard[br]in the early 20th century.
0:44:19.890,0:44:22.539
He was known for his vividly detailed[br]and artistic land forms,
0:44:22.539,0:44:23.939
like this one on the screen--
0:44:23.939,0:44:26.294
but also for inventing[br]the armadillo projection,
0:44:26.294,0:44:29.020
writing the first cartography[br]textbook in English
0:44:29.020,0:44:31.318
and other various[br]important contributions
0:44:31.318,0:44:32.919
to the field of geography.
0:44:32.919,0:44:34.609
And at the Harvard Map Collection,
0:44:34.609,0:44:38.509
we have a 66-item collection[br]of Raisz's field notebooks,
0:44:38.509,0:44:41.359
which begin when he was a student[br]and end just before his death.
0:44:43.679,0:44:46.229
So, this is the collection-level record[br]that I made for them,
0:44:46.229,0:44:47.994
which merely gives an overview,
0:44:47.994,0:44:50.513
but his notebooks are full of information
0:44:50.513,0:44:53.351
that he used in later atlases,[br]maps, and textbooks.
0:44:53.351,0:44:56.313
But researchers don't know how to find[br]that trajectory information,
0:44:56.313,0:44:58.665
and the system[br]is not designed to show them.
0:45:01.030,0:45:03.734
So, I felt that with Wikidata,[br]and other Wikimedia platforms,
0:45:03.734,0:45:05.154
I'd be able to take advantage
0:45:05.154,0:45:08.075
of information that already exists[br]about him on the open web,
0:45:08.075,0:45:10.629
along with library records[br]and a notebook inventory
0:45:10.629,0:45:12.574
that I had made in an Excel spreadsheet
0:45:12.574,0:45:15.416
to show relationships and influences[br]between his works.
0:45:15.574,0:45:18.594
So here, you can see how I edited[br]and reconciled library data
0:45:18.594,0:45:20.165
in OpenRefine.
0:45:20.165,0:45:23.164
And then, I used QuickStatements[br]to batch import my results.
0:45:23.304,0:45:25.244
So, now, I was ready[br]to create knowledge graphs
0:45:25.244,0:45:27.864
with SPARQL queries[br]to show patterns of influence.
0:45:30.084,0:45:33.304
The examples here show[br]how I leveraged Wikimedia Commons images
0:45:33.304,0:45:34.664
that I connected to him.
0:45:34.664,0:45:36.459
And the hierarchy of some of his works
0:45:36.459,0:45:38.604
that were contributing[br]factors to other works.
0:45:38.604,0:45:42.354
So, modeling Raisz's works on Wikidata[br]allowed me to encompass in a single image,
0:45:42.354,0:45:45.890
or in this case, in two images,[br]the connections that require many pages
0:45:45.890,0:45:47.864
of bibliographic data to reveal.
0:45:51.684,0:45:55.544
So, this video is going to load.
0:45:55.563,0:45:57.233
Yes! Alright.
0:45:57.233,0:46:00.113
This video is a minute and a half long[br]screencast I made,
0:46:00.113,0:46:02.033
that I'm going to narrate as you watch.
0:46:02.033,0:46:05.423
It shows the process of inputting[br]and then running a SPARQL query,
0:46:05.423,0:46:09.283
showing hierarchical relationships[br]between notebooks, an atlas, and a map
0:46:09.283,0:46:11.033
that Raisz created about Cuba.
0:46:11.033,0:46:12.603
He worked there before the revolution,
0:46:12.603,0:46:14.633
so he had the unique position[br]of having support
0:46:14.633,0:46:17.013
from both the American[br]and the Cuban governments.
0:46:17.334,0:46:20.583
So, I made this query as an example[br]to show people who work on Raisz,
0:46:20.583,0:46:24.134
and who are interested in narrowing down[br]what materials they'd like to request
0:46:24.134,0:46:26.154
when they come to us for research.
0:46:26.154,0:46:29.684
To make the approach replicable[br]for other archival collections,
0:46:29.684,0:46:33.105
I hope that Harvard and other institutions[br]will prioritize Wikidata look-ups
0:46:33.105,0:46:35.414
as they move to linked data[br]cataloging production,
0:46:35.414,0:46:37.520
which my co-presenters[br]can speak to the progress on
0:46:37.520,0:46:38.854
better than I can.
0:46:38.854,0:46:41.543
But my work has brought me--[br]has brought to mind a particular issue
0:46:41.543,0:46:46.580
that I see as a future opportunity,[br]which is that of archival modeling.
0:46:47.369,0:46:52.302
So, to an archivist, an item[br]is a discrete archival material
0:46:52.302,0:46:55.000
within a larger collection[br]of archival materials
0:46:55.000,0:46:56.884
that is not a physical location.
0:46:56.884,0:47:00.663
So an archivist from the American National[br]Archives and Records Administration,
0:47:00.663,0:47:02.943
who is also a Wikidata enthusiast,
0:47:02.943,0:47:05.742
advised me when I was trying[br]to determine how to express this
0:47:05.742,0:47:07.734
using an example item,
0:47:07.734,0:47:10.456
that I'm going to show[br]as soon as this video is finally over.
0:47:11.433,0:47:14.391
Alright. Great.
0:47:20.437,0:47:22.100
Nope, that's not what I wanted.
0:47:22.135,0:47:23.536
Here we go.
0:47:31.190,0:47:32.280
It's doing that.
0:47:32.280,0:47:34.154
(humming)
0:47:34.208,0:47:37.418
Nope. Sorry. Sorry.
0:47:40.444,0:47:43.045
Alright, I don't know why[br]it's not going full screen again.
0:47:43.045,0:47:44.329
I can't get it to do anything.
0:47:44.329,0:47:46.880
But this is the-- oh, my gosh.
0:47:46.880,0:47:48.235
Stop that. Alright.
0:47:48.235,0:47:51.195
So, this is the item that I mentioned.
0:47:51.575,0:47:53.655
So, this was what the archivist
0:47:53.655,0:47:55.964
from the National Archives[br]and Records Administration
0:47:55.964,0:47:57.414
showed me as an example.
0:47:57.414,0:48:02.414
And he recommended this compromise,[br]which is to use the "part of" property
0:48:02.414,0:48:05.614
to connect a lower level description[br]to a higher level of description,
0:48:05.614,0:48:08.534
which allows the relationships[br]between different hierarchical levels
0:48:08.534,0:48:10.840
to be asserted as statements[br]and qualifiers.
0:48:10.840,0:48:12.884
So, in this example that's on screen,
0:48:12.884,0:48:16.294
the relationship between an item,[br]a series, a collection, and a record group
0:48:16.294,0:48:19.655
are thus contained and described[br]within a Wikidata item entity.
0:48:19.655,0:48:22.024
So, I followed this model[br]in my work on Raisz.
0:48:22.704,0:48:26.024
And one of my images is missing.
0:48:26.024,0:48:27.971
No, it's not. It's right there. I'm sorry.
0:48:28.210,0:48:30.613
And so, I followed this model[br]on my work on Raisz,
0:48:30.613,0:48:33.103
but I look forward[br]to further standardization.
0:48:38.983,0:48:41.352
So, another archival project[br]Harvard is working on
0:48:41.352,0:48:44.632
is the Arthur Freedman collection[br]of more than 2,000 hours
0:48:44.632,0:48:48.702
of punk rock performances[br]from the 1970s to early 2000s
0:48:48.702,0:48:51.970
in the Boston and Cambridge,[br]Massachusetts areas.
0:48:51.970,0:48:55.145
It includes many bands and venues[br]that no longer exist.
0:48:55.604,0:48:59.505
So far, work has been done in OpenRefine[br]on reconciliation of the bands and venues
0:48:59.505,0:49:02.324
to see which need an item[br]created in Wikidata.
0:49:02.886,0:49:05.964
A basic item will be created[br]via batch process next spring,
0:49:05.964,0:49:08.697
and then, an edit-a-thon will be[br]held in conjunction
0:49:08.697,0:49:12.254
with the New England Music Library[br]Association's meeting in Boston
0:49:12.254,0:49:15.866
to focus on adding more statements[br]to the batch-created items,
0:49:15.866,0:49:18.937
by drawing on local music[br]community knowledge.
0:49:18.937,0:49:22.086
We're interested in learning more[br]about models for pairing librarians
0:49:22.086,0:49:26.310
and Wiki enthusiasts with new contributors[br]who have domain knowledge.
0:49:26.297,0:49:29.293
Items will eventually be linked[br]to digitized video
0:49:29.293,0:49:31.387
in Harvard's digital collection platform
0:49:31.387,0:49:33.167
once rights have[br]been cleared with artists,
0:49:33.167,0:49:35.147
which will likely be a slow process.
0:49:36.327,0:49:38.030
There's also a great amount of interest
0:49:38.030,0:49:41.680
in moving away from manual cataloging[br]and creation of authority data
0:49:41.680,0:49:43.247
towards identity management,
0:49:43.247,0:49:45.667
where descriptions[br]can be created in batches.
0:49:45.667,0:49:48.057
An additional project that focused on
0:49:48.057,0:49:51.297
creating international standard[br]name identifiers, or ISNIs,
0:49:51.297,0:49:53.477
for avant-garde and women filmmakers
0:49:53.477,0:49:57.657
can be adapted for creating Wikidata items[br]for these filmmakers, as well.
0:49:57.657,0:50:01.076
Spreadsheets with the ISNIs,[br]filmmaker names, and other details
0:50:01.076,0:50:04.697
can be reconciled in OpenRefine,[br]and uploaded with QuickStatements.
0:50:04.910,0:50:06.940
Once people and organizations[br]have been described,
0:50:06.940,0:50:09.316
we'll move toward describing[br]the films in Wikidata,
0:50:09.316,0:50:12.526
which will likely present[br]some additional modeling challenges.
0:50:13.446,0:50:15.486
A library presentation[br]wouldn't be complete
0:50:15.486,0:50:16.882
without a MARC record.
0:50:16.882,0:50:19.916
Here, you can see the record[br]for Karen Aqua's film "Taxonomy,"
0:50:19.916,0:50:22.096
where her ISNI and Wikidata Q number
0:50:22.096,0:50:24.176
have been added to the 100 field.
0:50:24.176,0:50:26.636
The ISNIs and Wikidata Q numbers[br]that have been created
0:50:26.636,0:50:30.066
can then be batch added[br]back into MARC records via MarcEdit.
0:50:30.066,0:50:33.236
You might be asking why I'm showing you[br]this ugly MARC record,
0:50:33.236,0:50:35.596
instead of some beautiful[br]linked data statements.
0:50:35.596,0:50:38.576
And that's because our libraries[br]will be working in a hybrid environment
0:50:38.576,0:50:39.896
for some time.
0:50:39.896,0:50:42.326
Our library catalog still relies[br]on MARC records,
0:50:42.326,0:50:44.076
so by adding in these URIs,
0:50:44.076,0:50:46.366
we can try to take advantage[br]of linked data,
0:50:46.366,0:50:48.346
while our systems still use MARC.
0:50:49.496,0:50:52.950
Adding URIs into MARC records[br]makes an additional aspect
0:50:52.950,0:50:54.335
of our project possible.
0:50:54.335,0:50:56.894
Work has been done at Stanford[br]and Cornell to bring data
0:50:56.894,0:51:01.873
from Wikidata into our library catalog[br]using URIs already in our MARC records.
0:51:02.334,0:51:05.090
You can see an example[br]of a knowledge panel,
0:51:05.090,0:51:06.984
where all the data is sourced[br]from Wikidata,
0:51:06.984,0:51:11.004
and links back to the item itself,[br]along with an invitation to contribute.
0:51:11.403,0:51:15.130
This is currently in a test environment,[br]not in production in our catalog.
0:51:15.130,0:51:17.444
Ideally, eventually,[br]these will be generated
0:51:17.444,0:51:19.916
from linked data descriptions[br]of library resources
0:51:19.916,0:51:22.954
created using Sinopia,[br]our linked data editor
0:51:22.954,0:51:24.563
developed for cataloging.
0:51:24.563,0:51:27.994
We found that adding a look-up[br]to Wikidata in Sinopia is difficult.
0:51:27.994,0:51:31.514
The scale and modeling of Wikidata[br]makes it hard to partition the data
0:51:31.514,0:51:33.544
to be able to look up typed entities,
0:51:33.544,0:51:34.900
and we've run into the problem
0:51:34.900,0:51:37.493
of SPARQL not being good[br]for keyword search,
0:51:37.493,0:51:41.883
but wanting our keyword APIs[br]to return SPARQL-like RDF descriptions.
0:51:41.883,0:51:45.043
So, as you can see, we still have[br]quite a bit of work to do.
0:51:45.043,0:51:47.937
This round of the grant[br]runs until June 2020,
0:51:47.937,0:51:50.163
so, we'll be continuing our exploration.
0:51:50.163,0:51:53.113
And I just wanted to invite anyone
0:51:53.113,0:51:57.573
who has a continued interest in talking[br]about Wikidata and libraries,
0:51:57.573,0:52:01.454
I lead a Wikidata Affinity Group[br]that's open to anyone to join.
0:52:01.454,0:52:03.013
We meet every two weeks,
0:52:03.013,0:52:05.513
and our next call is Tuesday,[br]November the 5th,
0:52:05.513,0:52:08.073
so if you're interested[br]in continuing discussions,
0:52:08.073,0:52:10.393
I would love to talk with you further.
0:52:10.393,0:52:11.890
Thank you, everyone.
0:52:11.890,0:52:13.623
And thank you to the other presenters
0:52:13.623,0:52:16.893
for talking about all[br]of their wonderful projects.
0:52:16.893,0:52:21.283
(applause)