0:00:07.133,0:00:11.738 I work as a teacher[br]at the University of Alicante, 0:00:11.738,0:00:17.040 where I recently obtained my PhD[br]on data libraries and linked open data. 0:00:17.040,0:00:19.038 And I'm also a software developer 0:00:19.038,0:00:21.718 at the Biblioteca Virtual[br]Miguel de Cervantes. 0:00:21.718,0:00:24.467 And today, I'm going to talk[br]about data quality. 0:00:28.252,0:00:31.527 Well, those are my colleagues[br]at the university. 0:00:32.457,0:00:36.727 And as you may know, many organizations[br]are publishing their data 0:00:36.727,0:00:38.447 or linked open data-- 0:00:38.447,0:00:41.437 for example,[br]the National Library of France, 0:00:41.437,0:00:45.947 the National Library of Spain,[br]us, which is Cervantes Virtual, 0:00:45.947,0:00:49.007 the British National Bibliography, 0:00:49.007,0:00:51.667 the Library of Congress and Europeana. 0:00:51.667,0:00:56.000 All of them provide a SPARQL endpoint, 0:00:56.000,0:00:58.875 which is useful in order[br]to retrieve the data. 0:00:59.104,0:01:00.984 And if I'm not wrong, 0:01:00.984,0:01:05.890 the Library of Congress only provide[br]the data as a dump that you can't use. 0:01:07.956,0:01:13.787 When we publish our repository[br]as linked open data, 0:01:13.787,0:01:17.475 my idea was to be reused[br]by other institutions. 0:01:17.981,0:01:24.000 But what about if I'm an institution[br]who wants to enrich their data 0:01:24.000,0:01:27.435 with any data from other data libraries. 0:01:27.574,0:01:30.674 Which data set should I use? 0:01:30.674,0:01:34.314 Which data set is better[br]in terms of quality? 0:01:36.874,0:01:41.314 The benefits of the evaluation[br]of data quality in libraries are many. 0:01:41.314,0:01:47.143 For example, methodologies can be improved[br]in order to include new criteria, 0:01:47.182,0:01:49.162 in order to assess the quality. 
0:01:49.162,0:01:54.592 And also, organizations can benefit[br]from best practices and guidelines 0:01:54.602,0:01:58.270 in order to publish their data[br]as linked open data. 0:02:00.012,0:02:03.462 What do we need[br]in order to assess the quality? 0:02:03.462,0:02:06.862 Well, obviously, a set of candidates[br]and a set of features. 0:02:06.862,0:02:10.077 For example, do they have[br]a SPARQL endpoint, 0:02:10.077,0:02:13.132 do they have a web interface,[br]how many publications do they have, 0:02:13.132,0:02:18.092 how many vocabularies do they use,[br]how many Wikidata properties do they have, 0:02:18.092,0:02:20.892 and where can I get those candidates? 0:02:20.892,0:02:22.472 I use LOD Cloud-- 0:02:22.472,0:02:27.422 but when I was doing this slide,[br]I thought about using Wikidata 0:02:27.562,0:02:29.746 in order to retrieve those candidates. 0:02:29.746,0:02:34.295 For example, getting entities[br]of type data library, 0:02:34.295,0:02:36.473 which has a SPARQL endpoint. 0:02:36.473,0:02:38.693 You have here the link. 0:02:41.453,0:02:45.083 And I come up with those data libraries. 0:02:45.104,0:02:50.233 The first one uses bibliographic ontology[br]as main vocabulary, 0:02:50.233,0:02:54.122 and the others are based,[br]more or less, on FRBR, 0:02:54.122,0:02:57.180 which is a vocabulary published by IFLA. 0:02:57.180,0:03:00.013 And this is just an example[br]of how we could compare 0:03:00.013,0:03:04.393 data libraries using[br]bubble charts on Wikidata. 0:03:04.393,0:03:08.613 And this is just an example comparing[br]how many Wikidata properties 0:03:08.613,0:03:10.633 are per data library. 0:03:13.483,0:03:15.980 Well, how can we measure quality? 
0:03:15.928,0:03:17.972 There are different methodologies, 0:03:17.972,0:03:19.726 for example, FRBR 1, 0:03:19.726,0:03:24.337 which provides a set of criteria[br]grouped by dimensions, 0:03:24.337,0:03:27.556 and those in green[br]are the ones that I found-- 0:03:27.556,0:03:30.917 that I could assess by means of Wikidata. 0:03:33.870,0:03:39.397 And we also find that we[br]could define new criteria, 0:03:39.397,0:03:44.567 for example, a new one to evaluate[br]the number of duplications in Wikidata. 0:03:45.047,0:03:47.206 We use those properties. 0:03:47.206,0:03:50.098 And this is an example of SPARQL, 0:03:50.098,0:03:54.486 in order to count the number[br]of duplicates property. 0:03:57.136,0:04:00.366 And about the results,[br]while at the moment of doing this study, 0:04:00.366,0:04:05.216 not the slides, there was no property[br]for the British National Bibliography. 0:04:05.860,0:04:08.260 They don't provide provenance information, 0:04:08.260,0:04:11.536 which could be useful[br]for metadata enrichment. 0:04:11.536,0:04:14.660 And they don't allow[br]to edit the information. 0:04:14.660,0:04:17.166 So, we've been talking[br]about Wikibase the whole weekend, 0:04:17.166,0:04:21.396 and maybe we should try to adopt[br]Wikibase as an interface. 0:04:23.186,0:04:25.436 And they are focused on their own content, 0:04:25.436,0:04:28.856 and this is just the SPARQL query[br]based on Wikidata 0:04:28.856,0:04:31.411 in order to assess the population. 0:04:32.066,0:04:36.006 And the BnF provides labels[br]in multiple languages, 0:04:36.006,0:04:38.956 and they all use self-describing URIs, 0:04:38.956,0:04:43.058 which is that in the URI,[br]they have the type of entity, 0:04:43.058,0:04:48.406 which allows the human reader[br]to understand what they are using. 0:04:51.499,0:04:55.256 And more results, they provide[br]different output format, 0:04:55.256,0:04:58.646 they use external vocabularies. 
0:04:58.854,0:05:01.116 Only the British National Bibliography 0:05:01.116,0:05:03.734 provides machine-readable[br]licensing information. 0:05:03.734,0:05:09.124 And up to one-third of the instances[br]are connected to external repositories, 0:05:09.124,0:05:11.225 which is really nice. 0:05:12.604,0:05:18.290 And while this study, this work[br]has been done in our Labs team, 0:05:18.364,0:05:22.391 a lab in a GLAM is a group of people 0:05:22.391,0:05:27.520 who want to explore new ways 0:05:27.587,0:05:30.306 of reusing data collections. 0:05:31.039,0:05:35.054 And there's a community[br]led by the British Library, 0:05:35.054,0:05:37.366 and in particular, Mahendra Mahey, 0:05:37.366,0:05:40.610 and we had a first event in London, 0:05:40.610,0:05:42.601 and another one in Copenhagen, 0:05:42.601,0:05:45.279 and we're going to have a new one in May 0:05:45.279,0:05:48.240 at the Library of Congress in Washington. 0:05:48.528,0:05:52.481 And we are now 250 people. 0:05:52.481,0:05:56.421 And I'm so glad that I found[br]somebody here at the WikidataCon 0:05:56.421,0:05:58.860 who has just joined us-- 0:05:58.860,0:06:01.160 Sylvia from [inaudible], Mexico. 0:06:01.160,0:06:04.509 And I'd like to invite you[br]to our community, 0:06:04.509,0:06:09.719 since you may be part[br]of a GLAM institution. 0:06:10.659,0:06:13.164 So, we can talk later[br]if you want to know about this. 0:06:14.589,0:06:16.719 And this--it's all about people. 0:06:16.719,0:06:19.669 This is me, people[br]from the British Library, 0:06:19.669,0:06:24.629 Library of Congress, Universities,[br]and National Libraries in Europe 0:06:24.871,0:06:28.050 And there's a link here[br]in case you want to know more. 0:06:28.433,0:06:32.655 And, well, last month,[br]we decided to meet in Doha 0:06:32.655,0:06:37.448 in order to write a book[br]about how to create a lab in our GLAM. 0:06:38.585,0:06:43.279 And they choose 15 people,[br]and I was so lucky to be there. 
0:06:45.314,0:06:48.594 And the book follows[br]the Booksprint methodology, 0:06:48.594,0:06:51.674 which means that nothing[br]is prepared beforehand. 0:06:51.674,0:06:53.495 All is done there in a week. 0:06:53.495,0:06:55.725 And believe me, it was really hard work 0:06:55.725,0:06:58.905 to have their whole book[br]done in this week. 0:06:59.890,0:07:04.490 And I'd like to introduce you to the book,[br]which will be published-- 0:07:04.490,0:07:06.455 it was supposed to be published this week, 0:07:06.455,0:07:08.274 but it will be next week. 0:07:08.974,0:07:13.014 And it will be published open,[br]so you can have it, 0:07:13.065,0:07:15.668 and I can show you[br]a little bit later if you want. 0:07:15.734,0:07:17.601 And those are the authors. 0:07:17.601,0:07:19.678 I'm here-- I'm so happy, too. 0:07:19.678,0:07:22.110 And those are the institutions-- 0:07:22.110,0:07:26.722 Library of Congress, British Library--[br]and this is the title. 0:07:27.330,0:07:29.604 And now, I'd like to show you-- 0:07:31.441,0:07:33.971 a map that I'm doing. 0:07:34.278,0:07:37.234 We are launching a website[br]for our community, 0:07:37.234,0:07:42.893 and I'm in charge of creating a map[br]with our institutions there. 0:07:43.097,0:07:44.860 This is not finished. 0:07:44.860,0:07:50.276 But this is just SPARQL, and below, 0:07:51.546,0:07:53.027 we see the map. 0:07:53.027,0:07:58.086 And we see here[br]the new people that I found, here, 0:07:58.086,0:08:00.486 at the WikidataCon--[br]I'm so happy for this. 0:08:00.621,0:08:05.631 And we have here my data library[br]of my university, 0:08:05.681,0:08:08.490 and many other institutions. 0:08:09.051,0:08:10.940 Also, from Australia-- 0:08:11.850,0:08:13.061 if I can do it. 0:08:13.930,0:08:15.711 Well, here, we have some links. 0:08:19.586,0:08:21.088 There you go. 0:08:21.189,0:08:23.059 Okay, this is not finished. 0:08:23.539,0:08:26.049 We are still working on this,[br]and that's all. 
0:08:26.057,0:08:28.170 Thank you very much for your attention. 0:08:28.858,0:08:33.683 (applause) 0:08:41.962,0:08:48.079 [inaudible] 0:08:59.490,0:09:00.870 Good morning, everybody. 0:09:00.870,0:09:01.930 I'm Olaf Janssen. 0:09:01.930,0:09:03.570 I'm the Wikimedia coordinator 0:09:03.570,0:09:06.150 at the National Library[br]of the Netherlands. 0:09:06.310,0:09:08.390 And I would like to share my work, 0:09:08.390,0:09:11.610 which I'm doing about creating[br]Linked Open Data 0:09:11.640,0:09:15.351 for Dutch Public Libraries using Wikidata. 0:09:17.600,0:09:20.850 And my story starts roughly a year ago 0:09:20.850,0:09:24.581 when I was at the GLAM Wiki conference[br]in Tel Aviv, in Israel. 0:09:25.301,0:09:27.938 And there are two men[br]with very similar shirts, 0:09:27.938,0:09:31.120 and equally similar hairdos, [Matt]... 0:09:31.120,0:09:33.440 (laughter) 0:09:33.440,0:09:35.325 And on the left, that's me. 0:09:35.325,0:09:39.065 And a year ago, I didn't have[br]any practical knowledge and skills 0:09:39.065,0:09:40.265 about Wikidata. 0:09:40.265,0:09:43.285 I looked at Wikidata,[br]and I looked at the items, 0:09:43.285,0:09:44.524 and I played with it. 0:09:44.524,0:09:47.070 But I wasn't able to make a SPARQL query 0:09:47.070,0:09:50.285 or to do data modeling[br]with the right shape expression. 0:09:51.305,0:09:52.865 That's a year ago. 0:09:53.465,0:09:57.065 And on the lefthand side,[br]that's Simon Cobb, user: Sic19. 0:09:57.304,0:10:00.265 And I was talking to him,[br]because, just before, 0:10:00.525,0:10:01.974 he had given a presentation 0:10:01.974,0:10:06.374 about improving the coverage[br]of public libraries in Wikidata. 0:10:06.757,0:10:08.934 And I was very inspired by his talk. 0:10:09.564,0:10:13.355 And basically, he was talking[br]about adding basic data 0:10:13.355,0:10:14.867 about public libraries. 
0:10:14.867,0:10:19.046 So, the name of the library, if available,[br]the photo of the building, 0:10:19.046,0:10:21.497 the address data of the library, 0:10:21.497,0:10:25.120 the geo-coordinates[br]latitude and longitude, 0:10:25.120,0:10:26.367 and some other things, 0:10:26.367,0:10:29.187 including with all source references. 0:10:31.317,0:10:34.557 And what I was very impressed[br]about a year ago was this map. 0:10:34.557,0:10:37.337 This is a map about[br]public libraries in the U.K. 0:10:37.337,0:10:38.577 with all the colors. 0:10:38.577,0:10:43.017 And you can see that all the libraries[br]are layered by library organizations. 0:10:43.017,0:10:46.210 And when he showed this,[br]I was really, "Wow, that's cool." 0:10:46.637,0:10:49.138 So, then, one minute later, I thought, 0:10:49.138,0:10:52.918 "Well, let's do it[br]for the country for that one." 0:10:52.918,0:10:54.850 (laughter) 0:10:57.149,0:10:59.496 And something about public libraries[br]in the Netherlands-- 0:10:59.496,0:11:03.020 there are about 1,300 library[br]branches in our country, 0:11:03.020,0:11:06.710 grouped into 160 library organizations. 0:11:07.723,0:11:10.937 And you might wonder why[br]do I want to do this project? 0:11:10.997,0:11:14.137 Well, first of all, because[br]for the common good, for society, 0:11:14.137,0:11:16.707 because I think using Wikidata, 0:11:16.707,0:11:20.657 and from there,[br]creating Wikipedia articles, 0:11:20.657,0:11:23.417 and opening it up[br]via the linked open data cloud-- 0:11:23.417,0:11:29.006 it's improving visibility and reusability[br]of public libraries in the Netherlands. 0:11:30.110,0:11:32.197 And my second goal was actually[br]a more personal one, 0:11:32.197,0:11:36.517 because a year ago, I had this[br]yearly evaluation with my manager, 0:11:37.243,0:11:41.737 and we decided it was a good idea[br]that I got more practical skills 0:11:41.737,0:11:45.853 on linked open data, data modeling,[br]and also on Wikidata. 
0:11:46.464,0:11:50.286 And of course, I wanted to be able to make[br]these kinds of maps myself. 0:11:50.286,0:11:51.396 (laughter) 0:11:54.345,0:11:57.100 Then you might wonder[br]why do I want to do this? 0:11:57.100,0:12:01.723 Isn't there already enough basic[br]library data out there in the Netherlands 0:12:02.450,0:12:04.233 to have a good coverage? 0:12:06.019,0:12:08.367 So, let me show you some of the websites 0:12:08.367,0:12:12.882 that are available to discover[br]address and location information 0:12:12.882,0:12:14.505 about Dutch public libraries. 0:12:14.505,0:12:17.722 And the first one is this one--[br]Gidsvoornederland.nl-- 0:12:17.722,0:12:20.641 and that's the official[br]public library inventory 0:12:20.641,0:12:23.037 maintained by my library,[br]the National Library. 0:12:23.727,0:12:29.160 And you can look up addresses[br]and geo-coordinates on that website. 0:12:30.493,0:12:32.797 Then there is this site,[br]Bibliotheekinzicht-- 0:12:32.797,0:12:36.502 this is also an official website[br]maintained by my National Library. 0:12:36.502,0:12:38.982 And this is about[br]public library statistics. 0:12:41.010,0:12:43.933 Then there is another one,[br]debibliotheken.nl-- 0:12:43.933,0:12:46.005 as you can see there is also[br]address information 0:12:46.005,0:12:49.659 about library organizations,[br]not about individual branches. 0:12:51.724,0:12:55.010 And there's even this one,[br]which also has address information. 0:12:56.546,0:12:59.028 And of course, there's something[br]like Google Maps, 0:12:59.028,0:13:02.157 which also has all the names[br]and the locations and the addresses. 0:13:03.455,0:13:06.218 And this one, the International[br]Library of Technology, 0:13:06.218,0:13:09.580 which has a worldwide[br]inventory of libraries, 0:13:09.646,0:13:11.393 including the Netherlands. 0:13:13.058,0:13:15.049 And I even discovered there is a data set 0:13:15.049,0:13:18.423 you can buy for 50 euros or so[br]to download it. 
0:13:18.423,0:13:21.023 And there is also--seems to be[br]I didn't download it, 0:13:21.023,0:13:23.633 but there seems to be address[br]information available. 0:13:24.273,0:13:30.180 You might wonder is this kind of data[br]good enough for the purposes I had? 0:13:32.282,0:13:37.372 So, this is my birthday list[br]for my ideal public library data list. 0:13:37.439,0:13:39.105 And what's on my list? 0:13:39.173,0:13:43.830 First of all, the data I want to have[br]must be up-to-date-ish-- 0:13:43.830,0:13:45.604 it must be fairly up-to-date. 0:13:45.604,0:13:48.513 So, doesn't have to be real time, 0:13:48.513,0:13:51.323 but let's say, a couple[br]of months, or half a year, 0:13:53.284,0:13:57.354 delayed with official publication,[br]that's okay for my purposes. 0:13:58.116,0:14:00.956 And I want to have it both[br]library branches 0:14:00.956,0:14:02.697 and the library organizations. 0:14:04.206,0:14:08.400 Then I want my data to be structured,[br]because it has to be machine-readable. 0:14:08.301,0:14:11.986 It has to be in open file format,[br]such as CSV or JSON or RDF. 0:14:12.717,0:14:15.197 It has to be linked[br]to other resources preferably. 0:14:16.011,0:14:22.182 And the uses--the license on the data[br]needs to be manifest public domain or CC0. 0:14:23.520,0:14:26.192 Then, I would like my data to have an API, 0:14:26.599,0:14:30.548 which must be public, free,[br]and preferably also anonymous 0:14:30.548,0:14:34.900 so you don't have to use an API key,[br]or you have to register an account. 0:14:36.103,0:14:38.863 And I also want to have[br]a SPARQL interface. 0:14:41.131,0:14:43.651 So, now, these are all the sites[br]I just showed you. 0:14:43.717,0:14:46.450 And I'm going to make a big grid. 0:14:47.337,0:14:50.017 And then, this is about[br]the evaluation I did. 0:14:51.187,0:14:54.166 I'm not going into it,[br]but there is no single column 0:14:54.166,0:14:56.007 which has all green check marks. 
0:14:56.007,0:14:57.997 That's the important thing to take away. 0:14:58.967,0:15:03.947 And so, in summary, there was no[br]linked public free linked open data 0:15:03.947,0:15:08.937 for Dutch public libraries available[br]before I started my project. 0:15:09.237,0:15:13.027 So, this was the ideal motivation[br]to actually work on it. 0:15:14.730,0:15:17.427 So, that's what I've been doing[br]for a year now. 0:15:17.717,0:15:22.977 And I've been adding libraries bit by bit,[br]organization by organization to Wikidata. 0:15:23.417,0:15:26.387 I created also a project website on it. 0:15:26.727,0:15:29.567 It's still rather messy,[br]but it has all the information, 0:15:29.567,0:15:33.240 and I try to keep it[br]as up-to-date as possible. 0:15:33.240,0:15:36.277 And also all the SPARQL queries[br]you can see are linked from here. 0:15:38.002,0:15:40.235 And I'm just adding[br]really basic information. 0:15:40.235,0:15:44.097 You see the instances,[br]images if available, 0:15:44.097,0:15:47.229 addresses, locations, et cetera,[br]municipalities. 0:15:48.534,0:15:53.276 And where possible, I also try to link[br]the libraries to external identifiers. 0:15:56.024,0:15:58.415 And then, you can really easily--[br]we all know, 0:15:58.415,0:16:03.050 generating some Listeria lists[br]with public libraries grouped 0:16:03.050,0:16:05.060 by organizations, for instance. 0:16:05.060,0:16:08.380 Or using SPARQL queries,[br]you can also do aggregation on data-- 0:16:08.380,0:16:11.060 let's say, give me all[br]the municipalities in the Netherlands 0:16:11.060,0:16:15.115 and the number of library branches[br]in all the municipalities. 0:16:17.025,0:16:20.228 With one click, you can make[br]these kinds of photo galleries. 0:16:22.092,0:16:23.655 And what I set out to do first, 0:16:23.655,0:16:26.036 you can really create these kinds of maps. 0:16:27.176,0:16:30.425 And you might wonder,[br]"Are there any libraries here or there?" 
0:16:30.555,0:16:33.355 There are--they are not yet in Wikidata. 0:16:33.355,0:16:35.055 We're still working on that. 0:16:35.135,0:16:37.644 And actually, last week,[br]I spoke with a volunteer, 0:16:37.644,0:16:40.864 who's helping now[br]with entering the libraries. 0:16:41.644,0:16:45.394 You can really make cool--in Wikidata, 0:16:45.394,0:16:47.914 and also with using[br]the Cartographer extension, 0:16:47.914,0:16:50.244 you can use these kinds of maps. 0:16:51.724,0:16:53.736 And I even took it one step further. 0:16:53.911,0:16:57.399 I also have some Python skills,[br]and some Leaflet things skills-- 0:16:57.399,0:16:59.971 so, I created, and I'm quite[br]proud of it, actually. 0:16:59.971,0:17:03.482 I created this library heat map,[br]which is fully interactive. 0:17:03.482,0:17:05.956 You can zoom in to it,[br]and you can see all the libraries, 0:17:06.712,0:17:08.726 and you can also run it off Wiki. 0:17:08.726,0:17:10.552 So, you can just embed it[br]in your own website, 0:17:10.552,0:17:13.412 and it fully runs interactively. 0:17:15.131,0:17:17.592 So, now going back to my big scary table. 0:17:19.512,0:17:22.970 There is one column[br]on the right, which is blank. 0:17:22.970,0:17:24.940 And no surprise, it will be Wikidata. 0:17:24.940,0:17:26.448 Let's see how it scores there. 0:17:26.448,0:17:29.500 (cheering) 0:17:32.892,0:17:35.191 So, I actually think[br]of printing this on a T-shirt. 0:17:35.301,0:17:37.288 (laughter) 0:17:37.788,0:17:39.700 So, just to summarize this in words, 0:17:39.700,0:17:41.129 thanks to my project, now, 0:17:41.129,0:17:45.879 there is public free linked open data[br]available for Dutch public libraries. 0:17:47.124,0:17:49.686 And who can benefit from my effort? 
0:17:50.333,0:17:52.002 Well, all kinds of parties-- 0:17:52.002,0:17:54.274 you see Wikipedia,[br]because you can generate lists 0:17:54.274,0:17:56.051 and overviews and articles, 0:17:56.051,0:17:59.908 for instance, using this[br]and be able to from Wikidata 0:17:59.908,0:18:01.976 for our National Library for-- 0:18:02.850,0:18:05.391 IFLA also has an inventory[br]of worldwide libraries, 0:18:05.391,0:18:07.216 they can also reuse the data. 0:18:07.650,0:18:09.497 And especially for Sandra, 0:18:09.549,0:18:13.237 it's also important for the Ministry--[br]Dutch Ministry of Culture-- 0:18:13.277,0:18:15.667 because Sandra is going[br]to have a talk about Wikidata 0:18:15.667,0:18:18.287 with the Ministry this Monday,[br]next Monday. 0:18:19.922,0:18:22.277 And also, on the righthand side, [br]for instance, 0:18:23.891,0:18:27.098 Amazon with Alexa, the assistant, 0:18:27.098,0:18:28.961 they're also using Wikidata, 0:18:28.961,0:18:30.995 so you can imagine that they also use, 0:18:30.995,0:18:33.357 if you're looking for public[br]library information, 0:18:33.357,0:18:36.580 they can also use Wikidata for that. 0:18:38.955,0:18:41.680 Because one year ago,[br]Simon Cobb inspired me 0:18:41.680,0:18:44.244 to do this project,[br]I would like to call upon you, 0:18:44.244,0:18:45.664 if you have time available, 0:18:45.664,0:18:49.532 and if you have data from your own country[br]about public libraries, 0:18:51.572,0:18:54.422 make the coverage better,[br]add more red dots, 0:18:54.982,0:18:56.982 and of course, I'm willing[br]to help you with that. 0:18:56.982,0:18:59.227 And Simon is also willing[br]to help with this. 0:18:59.870,0:19:01.471 And so, I hope next year, somebody else 0:19:01.471,0:19:03.901 will be at this conference[br]or another conference 0:19:03.901,0:19:06.291 and there will be more[br]red dots on the map. 0:19:07.551,0:19:08.911 Thank you very much. 0:19:09.004,0:19:12.740 (applause) 0:19:18.336,0:19:20.086 Thank you, Olaf. 
0:19:20.086,0:19:23.554 Next we have Ursula Oberst[br]and Heleen Smits 0:19:23.613,0:19:27.734 presenting how can a small[br]research library benefit from Wikidata: 0:19:27.734,0:19:31.423 enhancing library products using Wikidata. 0:19:53.717,0:19:57.637 Okay. Good morning.[br]My name is Heleen Smits. 0:19:58.680,0:20:01.753 And my colleague,[br]Ursula Oberst--where are you? 0:20:01.753,0:20:03.873 (laughter) 0:20:04.371,0:20:09.220 And I work at the Library[br]of the African Studies Center 0:20:09.220,0:20:11.086 in Leiden, in the Netherlands. 0:20:11.086,0:20:15.038 And the African Studies Center[br]is a center devoted-- 0:20:15.038,0:20:21.464 is an academic institution[br]devoted entirely to the study of Africa, 0:20:21.464,0:20:23.986 focusing on Humanities and Social Studies. 0:20:24.672,0:20:28.123 We used to be an independent[br]research organization, 0:20:28.123,0:20:33.064 but in 2016, we became part[br]of Leiden University, 0:20:33.064,0:20:38.433 and our catalog was integrated[br]into the larger university catalog. 0:20:39.283,0:20:43.593 Though it remained possible[br]to do a search in the part of the Leiden-- 0:20:43.593,0:20:45.894 of the African Studies Catalog, alone, 0:20:47.960,0:20:50.505 we remained independent in some respects. 0:20:50.586,0:20:53.262 For example, with respect[br]to our thesaurus. 0:20:54.921,0:20:59.883 And also with respect[br]to the products we make for our users, 0:21:01.180,0:21:04.378 such as acquisition lists[br]and web dossiers. 0:21:05.158,0:21:11.975 And it is in the field of the web dossiers 0:21:11.975,0:21:14.582 that we have been looking 0:21:14.582,0:21:19.582 for possible ways to apply Wikidata, 0:21:19.582,0:21:23.372 and that's the part where Ursula[br]will, in the second part of this talk, 0:21:24.212,0:21:27.184 show you a bit[br]of what we've been doing there.
0:21:31.250,0:21:35.160 The web dossiers are our collections 0:21:35.160,0:21:39.000 of titles from our catalog[br]that we compile 0:21:39.000,0:21:45.591 around a theme usually connected[br]to, for example, a conference, 0:21:45.591,0:21:51.227 or to a special event, and actually,[br]the most recent web dossier we made 0:21:51.227,0:21:56.017 was connected to the year[br]of indigenous languages, 0:21:56.017,0:21:59.547 and that was around proverbs[br]in African languages. 0:22:00.780,0:22:02.327 Our first steps-- 0:22:04.307,0:22:09.287 next slide--our first steps[br]on the Wiki path as a library, 0:22:10.267,0:22:15.046 were in 2013, when we were one[br]of 12 GLAM institutions 0:22:15.046,0:22:16.472 in the Netherlands, 0:22:16.472,0:22:20.952 part of the project[br]of Wikipedians in Residence, 0:22:20.952,0:22:26.443 and we had, for two months,[br]a Wikipedian in the house, 0:22:27.035,0:22:32.527 and he gave us trainings[br]for adding articles to Wikipedia, 0:22:33.000,0:22:37.720 and also, we made a start with uploading[br]photo collections to Commons, 0:22:38.530,0:22:42.650 which always remained a little bit[br]dependent on funding, as well, 0:22:43.229,0:22:45.702 whether we would be able to digitize them, 0:22:45.702,0:22:50.350 and to mostly have[br]a student assistant to do this. 0:22:51.220,0:22:55.440 But it was actually a great addition[br]to what we could offer 0:22:55.440,0:22:57.560 as an academic library. 0:22:59.370,0:23:04.742 In May 2018, that is when[br]my colleague Ursula-- 0:23:04.742,0:23:09.465 she started to really explore--[br]dive into Wikidata 0:23:09.465,0:23:14.515 and see what we as a small[br]and not very experienced library 0:23:14.515,0:23:18.175 in these fields could do with that. 0:23:25.050,0:23:26.995 So, I mentioned, we have[br]our own thesaurus. 0:23:28.210,0:23:30.689 And this is where we started.
0:23:30.689,0:23:34.502 This is a thesaurus of 13,000 terms, 0:23:34.502,0:23:37.670 all in the field of African studies. 0:23:37.670,0:23:41.457 It contains a lot of African languages, 0:23:43.417,0:23:46.360 names of ethnic groups in Africa, 0:23:47.586,0:23:49.431 and other proper names, 0:23:49.431,0:23:55.509 which are perhaps especially[br]interesting for Wikidata. 0:23:58.604,0:24:04.824 So, it is a real authority-controlled 0:24:04.824,0:24:08.370 vocabulary[br]with 5,000 preferred terms. 0:24:08.554,0:24:11.204 So, we submitted the request to Wikidata, 0:24:11.204,0:24:17.135 and that was actually very quickly[br]met with a positive response, 0:24:17.214,0:24:19.354 which was very encouraging for us. 0:24:22.884,0:24:25.574 Our thesaurus was loaded into Mix-n-Match, 0:24:25.574,0:24:31.691 and by now, 75% of the terms 0:24:31.691,0:24:36.145 have been manually matched with Wikidata. 0:24:38.061,0:24:42.081 So, it means, well, that we are now-- 0:24:42.971,0:24:47.687 we are added as an identifier-- 0:24:48.387,0:24:51.553 for example, if you click[br]on Swahili language, 0:24:52.463,0:24:57.152 what happens then in Wikidata[br]on the number that-- 0:24:59.004,0:25:02.354 that connects our term--[br]is the Wikidata term-- 0:25:02.560,0:25:05.620 we enter into our thesaurus, 0:25:05.620,0:25:10.000 and from there, you can do a search[br]directly in the catalog 0:25:10.000,0:25:12.560 by clicking the button again. 0:25:12.560,0:25:18.160 It means, also, that Wikidata[br]is not really integrated 0:25:18.160,0:25:19.572 into our catalog. 0:25:19.572,0:25:22.090 But that's also more difficult. 0:25:22.314,0:25:26.053 Okay, we have to give the floor 0:25:26.053,0:25:30.838 to Ursula for the next part. 0:25:30.838,0:25:32.554 (Ursula) Thank you very much, Heleen. 0:25:32.554,0:25:37.258 So, I will talk about our experiences 0:25:37.258,0:25:39.677 with incorporating Wikidata elements 0:25:39.677,0:25:41.356 into our web dossiers.
0:25:41.356,0:25:44.607 A web dossier is--oh, sorry, yeah, sorry. 0:25:45.447,0:25:49.646 A web dossier, or a classical web dossier,[br]consists of three parts: 0:25:50.248,0:25:53.320 an introduction to the subject, 0:25:53.320,0:25:56.060 mostly written by one of our researchers; 0:25:56.060,0:26:01.328 a selection of titles, both books[br]and articles from our collection; 0:26:01.328,0:26:06.146 and the third part, an annotated list 0:26:06.146,0:26:08.876 with links to electronic resources. 0:26:09.161,0:26:15.815 And this year, we added a fourth part[br]to our web dossiers, 0:26:15.815,0:26:18.276 which is the Wikidata elements. 0:26:19.008,0:26:22.007 And it all started last year, 0:26:22.007,0:26:25.206 and my story is similar[br]to the story of Olaf, actually. 0:26:25.352,0:26:29.570 Last year, when I had no clue[br]about Wikidata, 0:26:29.570,0:26:33.402 and I discovered this wonderful[br]article by Alex Stinson 0:26:33.402,0:26:36.932 on how to write a query in Wikidata. 0:26:37.382,0:26:41.592 And he chose a subject--[br]a very appealing subject to me. 0:26:41.592,0:26:45.902 Namely, "Discovering Women Writers[br]from North Africa." 0:26:46.402,0:26:51.162 I can really recommend this article, 0:26:51.162,0:26:52.981 because it's very instructive. 0:26:52.981,0:26:57.422 And I thought I will be--[br]I'm going to work on this query, 0:26:57.422,0:27:02.662 and try to change it to:[br]"Southern African Women Writers," 0:27:02.662,0:27:07.034 and try to add a link[br]to their work in our catalog. 0:27:07.311,0:27:10.861 And on the right-hand side,[br]you see the SPARQL query 0:27:11.592,0:27:15.181 which searches for[br]"Southern African Women Writers." 0:27:15.181,0:27:20.686 If you click on the button,[br]on the blue button on the lefthand side, 0:27:21.526,0:27:23.971 the search result will appear beneath. 0:27:23.971,0:27:26.448 The search result can have[br]different formats. 0:27:26.448,0:27:29.871 In my case, the search result is a map. 
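The query itself is only shown on the slide, so the following is a minimal sketch of what a map query of this kind might look like, not the query used in the talk. It assumes the Wikidata properties P21 (sex or gender), P106 (occupation), P27 (country of citizenship), P19 (place of birth) and P625 (coordinate location), and it takes the list of countries as a parameter; Q258 (South Africa) is used purely as an example. The helper names `build_writers_query` and `embed_url` are hypothetical. The second function builds the kind of embeddable link the Wikidata Query Service offers for a query result.

```python
from urllib.parse import quote


def build_writers_query(country_qids):
    """Return a SPARQL query for women writers from the given countries.

    The selection criteria are an assumption for illustration; the talk
    does not spell out the exact triple patterns it used.
    """
    values = " ".join(f"wd:{qid}" for qid in country_qids)
    return f"""#defaultView:Map
SELECT ?writer ?writerLabel ?birthplaceLabel ?coord WHERE {{
  VALUES ?country {{ {values} }}
  ?writer wdt:P21 wd:Q6581072 ;   # sex or gender: female
          wdt:P106 wd:Q36180 ;    # occupation: writer
          wdt:P27 ?country ;      # country of citizenship
          wdt:P19 ?birthplace .   # place of birth
  ?birthplace wdt:P625 ?coord .   # coordinates, needed for the map view
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def embed_url(query):
    """Build a Wikidata Query Service embed link for a query."""
    return "https://query.wikidata.org/embed.html#" + quote(query, safe="")


# Example: one country only; extend the list to cover a whole region.
query = build_writers_query(["Q258"])
print(embed_url(query))
```

The printed link can be placed in an iframe on a web page. Linking each writer to her works in a library catalog, as the dossier does, would need an extra step that this sketch leaves out.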
0:27:29.871,0:27:32.850 And the nice thing about Wikidata 0:27:32.850,0:27:36.652 is that you can embed[br]this search result 0:27:36.652,0:27:38.682 into your own webpage, 0:27:38.682,0:27:42.339 and that's what we are now doing[br]with our web dossiers. 0:27:42.339,0:27:47.039 So, this was the very first one[br]on Southern African women writers, 0:27:47.039,0:27:49.649 with the classical three elements, 0:27:49.649,0:27:53.209 plus this map on the lefthand side, 0:27:53.209,0:27:55.650 which gives extra information-- 0:27:55.650,0:27:58.219 a link to the Southern African[br]women writer-- 0:27:58.219,0:28:00.749 a link to her works in our catalog, 0:28:00.749,0:28:07.252 and a link to the Wikidata record[br]of her birth place, and her name, 0:28:08.219,0:28:13.099 her personal record, plus a photo,[br]if it's available on Wikidata. 0:28:16.231,0:28:20.329 And to retrieve a nice map 0:28:20.329,0:28:24.032 with a lot of red dots[br]on the African continent, 0:28:24.032,0:28:28.662 you need nice data in Wikidata,[br]complete, sufficient data. 0:28:29.042,0:28:33.442 So, with our second web dossier[br]on public art in Africa, 0:28:33.442,0:28:38.420 we also started to enhance[br]the data in Wikidata. 0:28:38.420,0:28:43.242 In this case, for public art--[br]we added geo-locations-- 0:28:43.242,0:28:46.919 geo-locations to Wikidata. 0:28:46.919,0:28:51.139 And we also searched for works[br]of public art in Commons, 0:28:51.139,0:28:55.165 and if they don't have[br]a record on Wikidata yet, 0:28:55.165,0:29:00.670 we added the record to Wikidata.
0:29:00.855,0:29:05.327 And the third thing we do, 0:29:05.327,0:29:09.958 because when we prepare a web dossier, 0:29:09.958,0:29:15.514 we download the titles from our catalog, 0:29:15.514,0:29:17.584 and the titles are in MARC 21, 0:29:17.584,0:29:23.226 so we have to convert them to a format[br]that is presentable on the website, 0:29:23.226,0:29:28.229 and it does not take much time and effort[br]to convert the same set of titles 0:29:28.229,0:29:30.457 to Wikidata QuickStatements, 0:29:30.457,0:29:36.999 and then, we also upload[br]a title set to Wikidata, 0:29:36.999,0:29:41.254 and you can see the titles we uploaded 0:29:41.254,0:29:44.124 from our latest web dossier 0:29:44.124,0:29:47.514 on African proverbs in Scholia. 0:29:48.546,0:29:52.294 A really nice tool[br]that visualizes scholarly publications 0:29:52.294,0:29:54.674 present in Wikidata. 0:29:54.674,0:29:59.674 And, one second--when it is possible,[br]we add a Scholia template 0:29:59.674,0:30:01.863 to our web dossier's topic. 0:30:01.863,0:30:03.272 Thank you very much. 0:30:03.272,0:30:08.079 (applause) 0:30:09.255,0:30:11.724 Thank you, Heleen and Ursula. 0:30:12.010,0:30:16.866 Next we have Adrian Pohl[br]presenting "Using Wikidata 0:30:16.866,0:30:22.265 to improve spatial subject indexing[br]in a regional bibliography." 0:30:45.181,0:30:46.621 Okay, hello everybody. 0:30:46.621,0:30:49.630 I'm going right into the topic. 0:30:49.630,0:30:54.146 I only have ten minutes to present[br]a three-year project. 0:30:54.535,0:30:57.044 It wasn't full time. (laughs) 0:30:57.044,0:31:00.100 Okay, what's the NWBib? 0:31:00.100,0:31:04.404 It's an acronym for North Rhine-[br]Westphalian Bibliography. 0:31:04.404,0:31:07.944 It's a regional bibliography[br]that records literature 0:31:07.944,0:31:11.441 about people and places[br]in North Rhine-Westphalia.
0:31:12.534,0:31:14.103 And there are monographs in it-- 0:31:15.162,0:31:19.451 there are a lot of articles in it,[br]and most of them are quite unique, 0:31:19.451,0:31:22.052 so, that's the interesting thing[br]about this bibliography-- 0:31:22.052,0:31:25.472 because it's often[br]quite obscure stuff-- 0:31:25.472,0:31:28.188 local people writing[br]about that tradition, 0:31:28.188,0:31:29.488 and something like this. 0:31:29.612,0:31:33.428 And there are over 400,000 entries in there. 0:31:33.428,0:31:37.689 And the bibliography started in 1983, 0:31:37.689,0:31:42.718 and so we only have titles[br]from this publication year onwards. 0:31:44.744,0:31:49.166 If you want to take a look at it,[br]it's at nwbib.de, 0:31:49.166,0:31:50.859 that's the web application. 0:31:50.859,0:31:55.389 It's based on our service,[br]lobid.org, the API. 0:31:57.148,0:32:01.220 Because it's cataloged as part[br]of the hbz union catalog, 0:32:01.220,0:32:04.988 which comprises around 20 million records, 0:32:04.988,0:32:08.869 it's an [inaudible] Aleph system[br]we get the data out of there, 0:32:08.869,0:32:11.308 and make RDF out of it, 0:32:11.308,0:32:16.408 and provide it as JSON[br]via the HTTP API. 0:32:17.129,0:32:20.507 So, the initial status in 2017 0:32:20.507,0:32:25.307 was we had nearly 9,000 distinct strings 0:32:25.307,0:32:28.727 about places--referring to places,[br]in North Rhine-Westphalia. 0:32:28.727,0:32:34.187 Mostly, those were administrative areas,[br]like towns and districts, 0:32:34.187,0:32:38.458 but also monasteries, principalities,[br]or natural regions. 0:32:38.907,0:32:43.517 And we already used Wikidata in 2017, 0:32:43.517,0:32:48.496 and matched those strings[br]with the Wikidata API to Wikidata entries 0:32:48.496,0:32:51.907 quite naively to get[br]the geo-coordinates from there, 0:32:51.907,0:32:57.210 and do some geo-based[br]discovery stuff with it. 0:32:57.326,0:32:59.910 But this had some drawbacks.
0:32:59.910,0:33:02.577 And so, the matching was really poor, 0:33:02.577,0:33:05.197 and there were a lot of false positives, 0:33:05.197,0:33:09.184 and we still had no hierarchy[br]in those places, 0:33:09.184,0:33:13.201 and we still had a lot[br]of non-unique names. 0:33:13.505,0:33:15.356 So, this is an example here. 0:33:16.616,0:33:18.378 Does this work? 0:33:18.494,0:33:22.314 Yeah, as you can see,[br]for one place, Brauweiler, 0:33:22.314,0:33:24.615 there are four different strings in there. 0:33:24.820,0:33:27.893 So, we all know how this happens. 0:33:27.893,0:33:31.994 If there's no authority file,[br]you end up with this data. 0:33:31.994,0:33:33.894 But we want to improve on that. 0:33:34.614,0:33:38.211 And you can also see[br]why the matching didn't work-- 0:33:38.211,0:33:40.382 so you have this name of the place 0:33:40.382,0:33:45.170 and there's often the name[br]of the superior administrative area, 0:33:45.170,0:33:50.532 and even on the second level,[br]a superior administrative area 0:33:50.532,0:33:52.040 often in the name 0:33:52.040,0:33:58.909 to identify the place successfully. 0:33:58.909,0:34:04.679 So, the goal was to build a full-fledged[br]spatial classification based on this data, 0:34:04.679,0:34:07.109 with a hierarchical view of places, 0:34:09.079,0:34:11.389 with one entry or ID for each place. 0:34:11.518,0:34:17.488 And we got this mock-up[br]by NWBib editors in 2016, made in Excel, 0:34:18.048,0:34:23.116 to get a feeling of what[br]they would like to have. 0:34:25.006,0:34:28.198 There you have the--[br]Regierungsbezirk-- 0:34:28.198,0:34:31.016 that's the most superior[br]administrative area-- 0:34:31.016,0:34:34.918 we have in there some towns[br]or districts--rural districts-- 0:34:34.918,0:34:39.861 and then, it's going down[br]to the parts of towns, 0:34:39.861,0:34:42.011 even to this level. 0:34:43.225,0:34:46.232 And we chose Wikidata for this task.
0:34:46.232,0:34:50.087 We also looked at the GND,[br]the Integrated Authority File, 0:34:50.087,0:34:54.918 and GeoNames--but Wikidata[br]had the best coverage, 0:34:54.918,0:34:56.902 and the best infrastructure. 0:34:58.112,0:35:02.072 The coverage for the places[br]and the geo-coordinates we need, 0:35:02.072,0:35:04.512 and the hierarchical[br]information, for example. 0:35:04.512,0:35:06.732 There were a lot of places,[br]also, in the GND, 0:35:06.732,0:35:09.694 but there was no hierarchical[br]information in there. 0:35:11.170,0:35:13.682 And also, Wikidata provides[br]the infrastructure 0:35:13.682,0:35:15.343 for editing and versioning. 0:35:15.343,0:35:20.022 And there's also a community[br]that helps maintain the data, 0:35:20.022,0:35:22.052 which was quite good. 0:35:22.950,0:35:26.882 Okay, but there was a requirement[br]by the NWBib editors. 0:35:27.682,0:35:31.447 They did not want to directly[br]rely on Wikidata, 0:35:31.447,0:35:32.972 which was understandable. 0:35:32.972,0:35:34.982 We don't have those servers[br]under our control, 0:35:34.982,0:35:38.002 and we won't know what's going on there. 0:35:38.084,0:35:41.944 There might be some unwelcome edits[br]that destroy the classification, 0:35:41.944,0:35:44.159 or parts of it, or vandalism. 0:35:44.159,0:35:50.794 So, we decided to put[br]an intermediate SKOS file in between, 0:35:50.794,0:35:55.534 on which the application would--[br]which should be generated from Wikidata. 0:35:57.113,0:35:59.462 And SKOS is the Simple Knowledge[br]Organization System-- 0:35:59.462,0:36:03.919 it's the standard way to model 0:36:03.919,0:36:07.519 a classification in the linked data world. 0:36:07.603,0:36:09.278 So, how did we do it? Five steps. 0:36:09.278,0:36:14.037 I will come to each[br]of the steps in more detail. 0:36:14.037,0:36:18.460 We matched the strings to Wikidata[br]with a better approach than before.
0:36:18.727,0:36:23.131 Created a classification based[br]on Wikidata, added 0:36:23.131,0:36:26.255 then backlinks[br]from Wikidata to NWBib 0:36:26.255,0:36:27.590 with a custom property. 0:36:27.590,0:36:32.659 And now, we are in the process[br]of establishing a good process 0:36:32.659,0:36:36.559 for updating the classification[br]in Wikidata. 0:36:36.619,0:36:38.888 Seeing--having a diff[br]of the changes, 0:36:38.888,0:36:41.158 and then publishing it to the SKOS file. 0:36:42.813,0:36:44.646 I will come to the details. 0:36:44.646,0:36:46.261 So, the matching approach-- 0:36:46.261,0:36:48.356 as the API wasn't sufficient, 0:36:48.356,0:36:53.585 and because we have those[br]different levels in the strings, 0:36:54.441,0:36:59.036 we built a custom Elasticsearch[br]index for our task. 0:36:59.596,0:37:04.378 I think by now, you could probably,[br]as well, use OpenRefine for doing this, 0:37:04.378,0:37:09.306 but at that point in time,[br]it wasn't available for Wikidata. 0:37:10.186,0:37:14.336 And we built this index based[br]on a SPARQL query, 0:37:14.336,0:37:20.484 for entities in NRW,[br]and with a specific type. 0:37:20.484,0:37:25.069 And the query evolved over time a lot. 0:37:25.148,0:37:29.157 And we have a few entries[br]that you can see the history on GitHub. 0:37:29.727,0:37:32.088 So, what we put in the matching index, 0:37:32.088,0:37:36.337 in the spatial object,[br]is what we need in our data. 0:37:36.337,0:37:39.662 It's the label and the ID[br]or the link to Wikidata, 0:37:40.222,0:37:43.874 the geo-coordinates, and the type[br]from Wikidata [inaudible], as well. 0:37:44.194,0:37:50.488 But also for the matching, very important[br]are the aliases and the broader thing-- 0:37:50.488,0:37:54.138 and this is also an example where the name[br]of the broader entity 0:37:54.138,0:37:57.875 and the district itself are very similar.
0:37:57.937,0:38:03.096 So, it's important to have[br]some type information, as well, 0:38:03.096,0:38:04.606 for the matching. 0:38:04.900,0:38:07.900 So, the nationwide results[br]were very good. 0:38:07.900,0:38:11.110 We could automatically match[br]more than 99% of records 0:38:11.110,0:38:12.265 with this approach. 0:38:13.885,0:38:16.356 These were only 92% of the strings. 0:38:16.540,0:38:18.140 So, obviously, the results-- 0:38:18.140,0:38:20.610 those strings that only occurred[br]one or two times 0:38:20.610,0:38:22.419 often didn't appear in Wikidata. 0:38:22.419,0:38:26.309 And so, we had to do a lot of work[br]with the [long tail]. 0:38:27.905,0:38:32.039 And for around 1,000 strings,[br]the matching was incorrect. 0:38:32.114,0:38:34.950 But the catalogers did a lot of work[br]in the Aleph catalog, 0:38:34.950,0:38:39.869 but also in Wikidata, they made[br]more than 6,000 manual edits to Wikidata 0:38:39.869,0:38:45.019 to reach 100% coverage by adding[br]aliases, type information, 0:38:45.085,0:38:46.615 creating new entries. 0:38:46.615,0:38:49.100 Okay, so, I have to speed up. 0:38:49.546,0:38:54.295 We created the classification based on this,[br]on the hierarchical statements. 0:38:54.295,0:38:58.580 P131 is the main property there. 0:38:59.827,0:39:02.495 We added the information to our data. 0:39:03.035,0:39:06.525 So, we now have this[br]in our data spatial object-- 0:39:06.525,0:39:11.535 and we focus this--the link to Wikidata,[br]and the types are there, 0:39:12.625,0:39:17.554 and here's the ID[br]from the SKOS classification 0:39:17.554,0:39:19.234 we built based on Wikidata. 0:39:20.034,0:39:23.555 And you can see there[br]are Q identifiers in there. 0:39:26.940,0:39:29.286 Now, you can basically query our API 0:39:29.286,0:39:34.051 with such a query using Wikidata URIs, 0:39:34.316,0:39:38.627 and get literature, in this example,[br]about Cologne back.
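The classification step described above, deriving a hierarchy from P131 ("located in the administrative territorial entity") statements and writing it to an intermediate SKOS file, might look roughly like the sketch below. The function name `to_skos_turtle`, the placeholder namespace, and the two sample rows are assumptions for illustration. The real hierarchy has intermediate levels between a town and the state, and the project's actual code is not shown in the talk.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BASE = "https://example.org/spatial#"  # placeholder namespace, not the real one


def to_skos_turtle(places):
    """Serialize places as SKOS concepts in Turtle.

    places: iterable of (qid, label, parent_qid_or_None), where the parent
    is assumed to come from a P131 statement on the Wikidata item.
    """
    lines = [f"@prefix skos: <{SKOS}> .", f"@prefix : <{BASE}> .", ""]
    for qid, label, parent in places:
        # Escape backslashes and quotes so the label is a valid Turtle string.
        safe = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f":{qid} a skos:Concept ;")
        if parent is not None:
            lines.append(f"    skos:broader :{parent} ;")
        lines.append(f'    skos:prefLabel "{safe}"@de .')
    return "\n".join(lines)


# Simplified sample: Cologne (Q365) placed directly under
# North Rhine-Westphalia (Q1198), skipping the levels in between.
sample = [
    ("Q1198", "Nordrhein-Westfalen", None),
    ("Q365", "Köln", "Q1198"),
]
ttl = to_skos_turtle(sample)
print(ttl)
```

Keeping the Q identifier as the local name of each concept is what lets a Wikidata URI be mapped to a classification entry without a separate lookup table.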
0:39:39.724,0:39:45.675 Then we created a Wikidata property[br]for NWBib and added those links 0:39:45.675,0:39:50.995 from Wikidata to the classification--[br]batch loaded them with QuickStatements. 0:39:52.105,0:39:53.634 And there's also a nice-- 0:39:53.634,0:39:59.344 also a move to using a qualifier[br]on this property 0:39:59.344,0:40:02.994 to add the broader information there. 0:40:02.994,0:40:06.333 So, I think people won't mess around[br]as much with this 0:40:06.333,0:40:09.223 as with the P131 statement. 0:40:10.094,0:40:11.743 So, this is what it looks like. 0:40:12.563,0:40:16.142 This will go to the classification[br]where you can then start a query. 0:40:18.670,0:40:23.293 Now, we have to build this[br]update and review process, 0:40:23.293,0:40:28.692 and we will add those data like this, 0:40:28.692,0:40:32.452 with a zero sub-field to Aleph, 0:40:32.452,0:40:36.962 and the catalogers will start[br]using those Wikidata-based IDs, 0:40:36.962,0:40:41.012 URIs, for cataloging for spatial indexing. 0:40:44.702,0:40:50.082 So, by now, there are more than 400,000[br]NWBib entries with links to Wikidata, 0:40:50.082,0:40:55.905 and more than 4,400 Wikidata entries[br]with links to NWBib. 0:40:56.617,0:40:58.042 Thank you. 0:40:58.042,0:41:03.182 (applause) 0:41:07.574,0:41:09.682 Thank you, Adrian. 0:41:13.312,0:41:15.472 I got it. Thank you. 0:41:31.122,0:41:34.402 So, as you've seen me before,[br]I'm Hilary Thorsen. 0:41:34.402,0:41:36.152 I'm Wikimedian in residence 0:41:36.152,0:41:38.382 with the Linked Data[br]for Production Project. 0:41:38.382,0:41:39.942 I am based at Stanford, 0:41:39.942,0:41:42.590 and I'm here today[br]with my colleague, Lena Denis, 0:41:42.590,0:41:45.581 who is Cartographic Assistant[br]at Harvard Library. 0:41:45.581,0:41:50.041 And Christine Fernsebner Eslao[br]is here in spirit. 0:41:50.041,0:41:53.530 She is currently back in Boston,[br]but supporting us from afar.
0:41:53.530,0:41:56.240 So, we'll be talking[br]about Wikidata and Libraries 0:41:56.240,0:42:00.350 as partners in data production,[br]organization, and project inspiration. 0:42:00.850,0:42:04.300 And our work is part of the Linked Data[br]for Production Project. 0:42:05.450,0:42:08.190 So, Linked Data for Production[br]is in its second phase, 0:42:08.190,0:42:10.450 called Pathway for Implementation. 0:42:10.450,0:42:13.291 And it's an Andrew W. Mellon[br]Foundation grant, 0:42:13.291,0:42:16.120 involving the partnership[br]of several universities, 0:42:16.120,0:42:20.280 with the goal of constructing a pathway[br]for shifting the catalog community 0:42:20.280,0:42:24.860 to begin describing library[br]resources with linked data. 0:42:24.860,0:42:26.919 And it builds upon a previous grant, 0:42:26.919,0:42:30.369 but this iteration is focused[br]on the practical aspects 0:42:30.369,0:42:32.009 of the transition. 0:42:33.559,0:42:35.650 One of these pathways of investigation 0:42:35.650,0:42:39.000 has been integrating[br]library metadata with Wikidata. 0:42:39.429,0:42:41.054 We have a lot of questions, 0:42:41.054,0:42:42.999 but some of the ones[br]we're most interested in 0:42:42.999,0:42:46.180 are how we can integrate[br]library metadata with Wikidata, 0:42:46.180,0:42:49.580 and make contribution[br]a part of our cataloging workflows, 0:42:49.580,0:42:53.589 how Wikidata can help us improve[br]our library discovery environment, 0:42:53.589,0:42:55.929 how it can help us reveal[br]more relationships 0:42:55.929,0:42:59.629 and connections within our data[br]and with external data sets, 0:42:59.629,0:43:04.370 and if we have connections in our own data[br]that can be added to Wikidata, 0:43:04.370,0:43:07.480 how libraries can help[br]fill in gaps in Wikidata, 0:43:07.480,0:43:09.969 and how libraries can work[br]with local communities 0:43:09.969,0:43:13.070 to describe library[br]and archival resources. 
0:43:14.010,0:43:17.129 Finding answers to these questions[br]has focused on the mutual benefit 0:43:17.129,0:43:19.649 for the library and Wikidata communities. 0:43:19.649,0:43:22.949 We've learned through starting to work[br]on our different Wikidata projects, 0:43:22.949,0:43:25.279 that many of the issues[br]libraries grapple with, 0:43:25.279,0:43:29.451 like data modeling, identity management,[br]data maintenance, documentation, 0:43:29.451,0:43:31.289 and instruction on linked data, 0:43:31.289,0:43:33.970 are ones the Wikidata[br]community works on too. 0:43:34.370,0:43:36.099 I'm going to turn things over to Lena 0:43:36.099,0:43:39.640 to talk about what[br]she's been working on now. 0:43:46.550,0:43:51.040 Hi, so, as Hilary briefly mentioned,[br]I work as a map librarian at Harvard, 0:43:51.040,0:43:54.180 where I process maps, atlases,[br]and archives for our online catalog. 0:43:54.180,0:43:56.580 And while processing two-dimensional[br]cartographic works 0:43:56.580,0:43:59.572 is relatively straightforward,[br]cataloging archival collections 0:43:59.572,0:44:02.429 so that their cartographic resources[br]can be made discoverable, 0:44:02.429,0:44:04.119 has always been more difficult. 0:44:04.119,0:44:06.989 So, my use case for Wikidata[br]is visually modeling relationships 0:44:06.989,0:44:10.389 between archival collections[br]and the individual items within them, 0:44:10.389,0:44:13.210 as well as between archival drafts[br]and published works. 0:44:13.359,0:44:17.329 So, I used Wikidata to highlight the work[br]of a cartographer named Erwin Raisz, 0:44:17.329,0:44:19.890 who worked at Harvard[br]in the early 20th century.
0:44:19.890,0:44:22.539 He was known for his vividly detailed[br]and artistic land forms, 0:44:22.539,0:44:23.939 like this one on the screen-- 0:44:23.939,0:44:26.294 but also for inventing[br]the armadillo projection, 0:44:26.294,0:44:29.020 writing the first cartography[br]textbook in English 0:44:29.020,0:44:31.318 and other various[br]important contributions 0:44:31.318,0:44:32.919 to the field of geography. 0:44:32.919,0:44:34.609 And at the Harvard Map Collection, 0:44:34.609,0:44:38.509 we have a 66-item collection[br]of Raisz's field notebooks, 0:44:38.509,0:44:41.359 which begin when he was a student[br]and end just before his death. 0:44:43.679,0:44:46.229 So, this is the collection-level record[br]that I made for them, 0:44:46.229,0:44:47.994 which merely gives an overview, 0:44:47.994,0:44:50.513 but his notebooks are full of information 0:44:50.513,0:44:53.351 that he used in later atlases,[br]maps, and textbooks. 0:44:53.351,0:44:56.313 But researchers don't know how to find[br]that trajectory information, 0:44:56.313,0:44:58.665 and the system[br]is not designed to show them. 0:45:01.030,0:45:03.734 So, I felt that with Wikidata,[br]and other Wikimedia platforms, 0:45:03.734,0:45:05.154 I'd be able to take advantage 0:45:05.154,0:45:08.075 of information that already exists[br]about him on the open web, 0:45:08.075,0:45:10.629 along with library records[br]and a notebook inventory 0:45:10.629,0:45:12.574 that I had made in an Excel spreadsheet 0:45:12.574,0:45:15.416 to show relationships and influences[br]between his works. 0:45:15.574,0:45:18.594 So here, you can see how I edited[br]and reconciled library data 0:45:18.594,0:45:20.165 in OpenRefine. 0:45:20.165,0:45:23.164 And then, I used QuickStatements[br]to batch import my results. 0:45:23.304,0:45:25.244 So, now, I was ready[br]to create knowledge graphs 0:45:25.244,0:45:27.864 with SPARQL queries[br]to show patterns of influence. 
0:45:30.084,0:45:33.304 The examples here show[br]how I leveraged Wikimedia Commons images 0:45:33.304,0:45:34.664 that I connected to him. 0:45:34.664,0:45:36.459 And the hierarchy of some of his works 0:45:36.459,0:45:38.604 that were contributing[br]factors to other works. 0:45:38.604,0:45:42.354 So, modeling Raisz's works on Wikidata[br]allowed me to encompass in a single image, 0:45:42.354,0:45:45.890 or in this case, in two images,[br]the connections that require many pages 0:45:45.890,0:45:47.864 of bibliographic data to reveal. 0:45:51.684,0:45:55.544 So, this video is going to load. 0:45:55.563,0:45:57.233 Yes! Alright. 0:45:57.233,0:46:00.113 This video is a minute and a half long[br]screencast I made, 0:46:00.113,0:46:02.033 that I'm going to narrate as you watch. 0:46:02.033,0:46:05.423 It shows the process of inputting[br]and then running a SPARQL query, 0:46:05.423,0:46:09.283 showing hierarchical relationships[br]between notebooks, an atlas, and a map 0:46:09.283,0:46:11.033 that Raisz created about Cuba. 0:46:11.033,0:46:12.603 He worked there before the revolution, 0:46:12.603,0:46:14.633 so he had the unique position[br]of having support 0:46:14.633,0:46:17.013 from both the American[br]and the Cuban governments. 0:46:17.334,0:46:20.583 So, I made this query as an example[br]to show people who work on Raisz, 0:46:20.583,0:46:24.134 and who are interested in narrowing down[br]what materials they'd like to request 0:46:24.134,0:46:26.154 when they come to us for research. 0:46:26.154,0:46:29.684 To make the approach replicable[br]for other archival collections, 0:46:29.684,0:46:33.105 I hope that Harvard and other institutions[br]will prioritize Wikidata look-ups 0:46:33.105,0:46:35.414 as they move to linked data[br]cataloging production, 0:46:35.414,0:46:37.520 which my co-presenters[br]can speak to the progress on 0:46:37.520,0:46:38.854 better than I can. 
0:46:38.854,0:46:41.543 But my work has brought me--[br]has brought to mind a particular issue 0:46:41.543,0:46:46.580 that I see as a future opportunity,[br]which is that of archival modeling. 0:46:47.369,0:46:52.302 So, to an archivist, an item[br]is a discrete archival material 0:46:52.302,0:46:55.000 within a larger collection[br]of archival materials 0:46:55.000,0:46:56.884 that is not a physical location. 0:46:56.884,0:47:00.663 So an archivist from the American National[br]Archives and Records Administration, 0:47:00.663,0:47:02.943 who is also a Wikidata enthusiast, 0:47:02.943,0:47:05.742 advised me when I was trying[br]to determine how to express this 0:47:05.742,0:47:07.734 using an example item, 0:47:07.734,0:47:10.456 that I'm going to show[br]as soon as this video is finally over. 0:47:11.433,0:47:14.391 Alright. Great. 0:47:20.437,0:47:22.100 Nope, that's not what I wanted. 0:47:22.135,0:47:23.536 Here we go. 0:47:31.190,0:47:32.280 It's doing that. 0:47:32.280,0:47:34.154 (humming) 0:47:34.208,0:47:37.418 Nope. Sorry. Sorry. 0:47:40.444,0:47:43.045 Alright, I don't know why[br]it's not going full screen again. 0:47:43.045,0:47:44.329 I can't get it to do anything. 0:47:44.329,0:47:46.880 But this is the-- oh, my gosh. 0:47:46.880,0:47:48.235 Stop that. Alright. 0:47:48.235,0:47:51.195 So, this is the item that I mentioned. 0:47:51.575,0:47:53.655 So, this was what the archivist 0:47:53.655,0:47:55.964 from the National Archives[br]and Records Administration 0:47:55.964,0:47:57.414 showed me as an example. 0:47:57.414,0:48:02.414 And he recommended this compromise,[br]which is to use the "part of" property 0:48:02.414,0:48:05.614 to connect a lower level description[br]to a higher level of description, 0:48:05.614,0:48:08.534 which allows the relationships[br]between different hierarchical levels 0:48:08.534,0:48:10.840 to be asserted as statements[br]and qualifiers.
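As a small sketch of that recommendation, the chain from item up to record group can be written as "part of" (P361) statements in QuickStatements V1 form, one tab-separated triple per line. The helper name `part_of_statements` and the four identifiers are hypothetical placeholders that would need to be replaced with real QIDs before any upload. Qualifiers, which the speaker also mentions, are left out.

```python
PART_OF = "P361"  # Wikidata property "part of"


def part_of_statements(chain):
    """Build QuickStatements V1 lines linking each level to the next one up.

    chain: identifiers ordered from the lowest level (item) to the
    highest (record group).
    """
    # Pair each entry with its successor: (item, series), (series, collection), ...
    return [
        f"{child}\t{PART_OF}\t{parent}"
        for child, parent in zip(chain, chain[1:])
    ]


# Placeholder identifiers only; these are not real Wikidata QIDs.
levels = ["Q_ITEM", "Q_SERIES", "Q_COLLECTION", "Q_RECORD_GROUP"]
statements = part_of_statements(levels)
print("\n".join(statements))
```

Each level points only to its immediate parent, so the full path from a notebook to its record group is recovered by following the property repeatedly rather than by storing it on every item.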
0:48:10.840,0:48:12.884 So, in this example that's on screen, 0:48:12.884,0:48:16.294 the relationship between an item,[br]a series, a collection, and a record group 0:48:16.294,0:48:19.655 are thus contained and described[br]within a Wikidata item entity. 0:48:19.655,0:48:22.024 So, I followed this model[br]in my work on Raisz. 0:48:22.704,0:48:26.024 And one of my images is missing. 0:48:26.024,0:48:27.971 No, it's not. It's right there. I'm sorry. 0:48:28.210,0:48:30.613 And so, I followed this model[br]on my work on Raisz, 0:48:30.613,0:48:33.103 but I look forward[br]to further standardization. 0:48:38.983,0:48:41.352 So, another archival project[br]Harvard is working on 0:48:41.352,0:48:44.632 is the Arthur Freedman collection[br]of more than 2,000 hours 0:48:44.632,0:48:48.702 of punk rock performances[br]from the 1970s to early 2000s 0:48:48.702,0:48:51.970 in the Boston and Cambridge,[br]Massachusetts areas. 0:48:51.970,0:48:55.145 It includes many bands and venues[br]that no longer exist. 0:48:55.604,0:48:59.505 So far, work has been done in OpenRefine[br]on reconciliation of the bands and venues 0:48:59.505,0:49:02.324 to see which need an item[br]created in Wikidata. 0:49:02.886,0:49:05.964 A basic item will be created[br]via batch process next spring, 0:49:05.964,0:49:08.697 and then, an edit-a-thon will be[br]held in conjunction 0:49:08.697,0:49:12.254 with the New England Music Library[br]Association's meeting in Boston 0:49:12.254,0:49:15.866 to focus on adding more statements[br]to the batch-created items, 0:49:15.866,0:49:18.937 by drawing on local music[br]community knowledge. 0:49:18.937,0:49:22.086 We're interested in learning more[br]about models for pairing librarians 0:49:22.086,0:49:26.310 and Wiki enthusiasts with new contributors[br]who have domain knowledge.
0:49:26.297,0:49:29.293 Items will eventually be linked[br]to digitized video 0:49:29.293,0:49:31.387 in Harvard's digital collection platform 0:49:31.387,0:49:33.167 once rights have[br]been cleared with artists, 0:49:33.167,0:49:35.147 which will likely be a slow process. 0:49:36.327,0:49:38.030 There's also a great amount of interest 0:49:38.030,0:49:41.680 in moving away from manual cataloging[br]and creation of authority data 0:49:41.680,0:49:43.247 towards identity management, 0:49:43.247,0:49:45.667 where descriptions[br]can be created in batches. 0:49:45.667,0:49:48.057 An additional project that focused on 0:49:48.057,0:49:51.297 creating international standard[br]name identifiers, or ISNIs, 0:49:51.297,0:49:53.477 for avant-garde and women filmmakers 0:49:53.477,0:49:57.657 can be adapted for creating Wikidata items[br]for these filmmakers, as well. 0:49:57.657,0:50:01.076 Spreadsheets with the ISNIs,[br]filmmaker names, and other details 0:50:01.076,0:50:04.697 can be reconciled in OpenRefine,[br]and uploaded with QuickStatements. 0:50:04.910,0:50:06.940 Once people and organizations[br]have been described, 0:50:06.940,0:50:09.316 we'll move toward describing[br]the films in Wikidata, 0:50:09.316,0:50:12.526 which will likely present[br]some additional modeling challenges. 0:50:13.446,0:50:15.486 A library presentation[br]wouldn't be complete 0:50:15.486,0:50:16.882 without a MARC record. 0:50:16.882,0:50:19.916 Here, you can see the record[br]for Karen Aqua's "Taxonomy" film, 0:50:19.916,0:50:22.096 where her ISNI and Wikidata Q number 0:50:22.096,0:50:24.176 have been added to the 100 field. 0:50:24.176,0:50:26.636 The ISNIs and Wikidata Q numbers[br]that have been created 0:50:26.636,0:50:30.066 can then be batch added[br]back into MARC records via MarcEdit. 0:50:30.066,0:50:33.236 You might be asking why I'm showing you[br]this ugly MARC record, 0:50:33.236,0:50:35.596 instead of some beautiful[br]linked data statements.
0:50:35.596,0:50:38.576 And that's because our libraries[br]will be working in a hybrid environment 0:50:38.576,0:50:39.896 for some time. 0:50:39.896,0:50:42.326 Our library catalog still relies[br]on MARC records, 0:50:42.326,0:50:44.076 so by adding in these URIs, 0:50:44.076,0:50:46.366 we can try to take advantage[br]of linked data, 0:50:46.366,0:50:48.346 while our systems still use MARC. 0:50:49.496,0:50:52.950 Adding URIs into MARC records[br]makes an additional aspect 0:50:52.950,0:50:54.335 of our project possible. 0:50:54.335,0:50:56.894 Work has been done at Stanford[br]and Cornell to bring data 0:50:56.894,0:51:01.873 from Wikidata into our library catalog[br]using URIs already in our MARC records. 0:51:02.334,0:51:05.090 You can see an example[br]of a knowledge panel, 0:51:05.090,0:51:06.984 where all the data is sourced[br]from Wikidata, 0:51:06.984,0:51:11.004 and links back to the item itself,[br]along with an invitation to contribute. 0:51:11.403,0:51:15.130 This is currently in a test environment,[br]not in production in our catalog. 0:51:15.130,0:51:17.444 Ideally, eventually,[br]these will be generated 0:51:17.444,0:51:19.916 from linked data descriptions[br]of library resources 0:51:19.916,0:51:22.954 created using Sinopia,[br]our linked data editor 0:51:22.954,0:51:24.563 developed for cataloging. 0:51:24.563,0:51:27.994 We found that adding a look-up[br]to Wikidata in Sinopia is difficult. 0:51:27.994,0:51:31.514 The scale and modeling of Wikidata[br]makes it hard to partition the data 0:51:31.514,0:51:33.544 to be able to look up typed entities, 0:51:33.544,0:51:34.900 and we've run into the problem 0:51:34.900,0:51:37.493 of SPARQL not being good[br]for keyword search, 0:51:37.493,0:51:41.883 but wanting our keyword APIs[br]to return SPARQL-like RDF descriptions. 0:51:41.883,0:51:45.043 So, as you can see, we still have[br]quite a bit of work to do.
0:51:45.043,0:51:47.937 This round of the grant[br]runs until June 2020, 0:51:47.937,0:51:50.163 so, we'll be continuing our exploration. 0:51:50.163,0:51:53.113 And I just wanted to invite anyone 0:51:53.113,0:51:57.573 who has a continued interest in talking[br]about Wikidata and libraries-- 0:51:57.573,0:52:01.454 I lead a Wikidata Affinity Group[br]that's open to anyone to join. 0:52:01.454,0:52:03.013 We meet every two weeks, 0:52:03.013,0:52:05.513 and our next call is Tuesday,[br]November the 5th, 0:52:05.513,0:52:08.073 so if you're interested[br]in continuing discussions, 0:52:08.073,0:52:10.393 I would love to talk with you further. 0:52:10.393,0:52:11.890 Thank you, everyone. 0:52:11.890,0:52:13.623 And thank you to the other presenters 0:52:13.623,0:52:16.893 for talking about all[br]of their wonderful projects. 0:52:16.893,0:52:21.283 (applause)