WEBVTT 00:00:05.973 --> 00:00:07.908 Hi, guys! Can everybody hear me? 00:00:09.170 --> 00:00:11.898 So, hi! Nice to meet you all. I'm Erica Azzellini. 00:00:11.898 --> 00:00:14.606 I'm one of the Wikimovement Brazil's Liaison, 00:00:14.606 --> 00:00:17.829 and this is my first international Wikimedia event, 00:00:17.829 --> 00:00:21.023 so I'm super excited to be here and I, hopefully, 00:00:21.023 --> 00:00:24.311 will share something interesting with you all here in this lightning talk. 00:00:25.247 --> 00:00:30.441 So this work starts with research that I was developing in Brazil, 00:00:30.441 --> 00:00:34.219 Computational Journalism and Structured Narratives with Wikidata. 00:00:34.276 --> 00:00:35.958 So in journalism, 00:00:35.958 --> 00:00:39.616 they're using some natural language generation software 00:00:39.616 --> 00:00:41.418 for automating news 00:00:41.418 --> 00:00:46.535 for news that have quite similar narrative structure. 00:00:46.535 --> 00:00:51.600 And we developed this concept here of structured narratives, 00:00:51.600 --> 00:00:54.548 thinking about this practice on computational journalism, 00:00:54.548 --> 00:00:58.361 that is the development of verbal text, understandable by humans, 00:00:58.361 --> 00:01:01.274 automated from predetermined arrangements that process information 00:01:01.274 --> 00:01:05.395 from structured databases, which looks like that, 00:01:05.395 --> 00:01:10.043 the Wikimedia universe and on this tool that we developed. 00:01:10.043 --> 00:01:13.555 So, when I'm talking about verbal text understandable by humans, 00:01:13.555 --> 00:01:15.808 I'm talking about Wikipedia entries. 00:01:15.808 --> 00:01:17.778 When I'm talking about structured databases, 00:01:17.778 --> 00:01:20.017 of course, I'm talking about Wikidata here. 00:01:20.017 --> 00:01:22.777 And predetermined arrangement, I'm talking about Mbabel, 00:01:22.777 --> 00:01:24.271 that is this tool.
00:01:25.467 --> 00:01:31.216 The Mbabel tool was inspired by a template by user Pharos, right here in front of me, 00:01:31.279 --> 00:01:33.356 thank you very much, 00:01:33.356 --> 00:01:39.114 and it was developed with Ederporto that is right here too, 00:01:39.114 --> 00:01:40.974 the brilliant Ederporto. 00:01:42.599 --> 00:01:44.498 We developed this tool 00:01:44.498 --> 00:01:47.780 that automatically generates Wikipedia entries 00:01:47.780 --> 00:01:50.600 based on information from Wikidata. 00:01:53.189 --> 00:01:58.130 We actually do some thematic templates 00:01:58.130 --> 00:02:01.152 that are created on the Wikidata module, 00:02:01.573 --> 00:02:03.716 WikidataIB Module, 00:02:03.716 --> 00:02:07.835 and these templates are pre-determined, generic and editable templates 00:02:07.835 --> 00:02:09.677 for various article themes. 00:02:09.677 --> 00:02:15.411 We realized that many Wikipedia entries had a quite similar structured narrative 00:02:15.411 --> 00:02:18.922 so we could create a tool that automatically generates that 00:02:18.922 --> 00:02:21.598 for many Wikidata items. 00:02:24.207 --> 00:02:28.571 Until now we have templates for museums, works of art, books, films, 00:02:28.571 --> 00:02:31.265 journals, earthquakes, libraries, archives, 00:02:31.265 --> 00:02:34.855 and Brazilian municipal and state elections, and growing. 00:02:34.855 --> 00:02:38.984 So, everybody here is able to contribute and create new templates. 00:02:38.984 --> 00:02:43.508 Each narrative template includes an introduction, Wikidata infobox, 00:02:43.508 --> 00:02:46.158 section suggestions for the users, 00:02:46.158 --> 00:02:50.499 content tables or lists with Listeria, depending on the case, 00:02:50.499 --> 00:02:53.713 references and categories, and of course the sentences, 00:02:53.713 --> 00:02:55.776 that are created with the Wikidata information. 00:02:55.776 --> 00:02:58.642 I'm gonna show you in a sec an example of that. 
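The idea of sentences built from Wikidata information can be sketched roughly as follows. This is only an illustration of template-based text generation, not Mbabel's actual code: the template list, the property names (`instance_of`, `location`, `inception`) and the `render_intro` function are all hypothetical, and a real item would be keyed by Wikidata property IDs.

```python
# Hypothetical narrative template: each entry pairs a property name with the
# sentence fragment it contributes. Names are illustrative, not Mbabel's.
MUSEUM_TEMPLATE = [
    ("instance_of", "{label} is a {instance_of}"),
    ("location", "located in {location}"),
    ("inception", "founded in {inception}"),
]


def render_intro(item: dict) -> str:
    """Build an introduction sentence from whichever properties are filled in."""
    fragments = []
    for prop, pattern in MUSEUM_TEMPLATE:
        # Skip fragments whose property is missing or empty on the item.
        if item.get(prop):
            fragments.append(pattern.format(**item))
    if not fragments:
        return f"{item['label']}."
    # If the opening clause is missing, start from the bare label instead.
    if not item.get("instance_of"):
        fragments.insert(0, item["label"])
    return ", ".join(fragments) + "."


example = {
    "label": "Example Museum",
    "instance_of": "museum",
    "location": "Example City",
    "inception": "1900",
}
print(render_intro(example))
```

An item with fewer properties filled in simply yields a shorter sentence, which is why a sparse item gives a sparse stub.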
00:03:00.137 --> 00:03:05.749 It's an integration with Wikipedia, integration with Wikidata, 00:03:05.749 --> 00:03:08.760 so the more properties properly filled on Wikidata, 00:03:08.760 --> 00:03:12.311 the more text entries you'll get on your article stub. 00:03:12.857 --> 00:03:15.623 That's very important to highlight here. 00:03:16.343 --> 00:03:18.969 Structuring this Wikidata can get more complex 00:03:18.969 --> 00:03:22.017 as I'm going to show you on the election projects that we've made. 00:03:22.017 --> 00:03:26.552 So I'm going to leave here this Wikidata Lab XIV for you 00:03:26.552 --> 00:03:29.471 after this lightning talk 00:03:29.471 --> 00:03:32.259 that is very brief, so you'll be able to choose 00:03:32.259 --> 00:03:34.554 on the work that we've been doing on structuring Wikidata 00:03:34.554 --> 00:03:36.005 for this purpose too. 00:03:37.272 --> 00:03:39.725 We have this challenge to build a narrative template 00:03:39.725 --> 00:03:44.383 that is generic enough to cover different Wikidata items 00:03:44.383 --> 00:03:46.347 and to surpass the gender 00:03:46.347 --> 00:03:50.359 and the number difficulties of languages, 00:03:52.054 --> 00:03:54.252 and still sounding natural for the user 00:03:54.252 --> 00:03:59.252 because we don't want to sound like it doesn't click for the user 00:03:59.252 --> 00:04:00.546 to edit after that. 00:04:01.956 --> 00:04:07.625 This is how the Mbabel looks like on the bottom form. 00:04:07.625 --> 00:04:14.507 You just have to insert the item number there and call the desired template 00:04:14.507 --> 00:04:21.673 and then you have an article to edit and expand, and everything. 00:04:22.135 --> 00:04:26.856 So, more importantly, why we did it? Not because it's cool to develop 00:04:26.856 --> 00:04:30.922 things here in Wikidata, we know, we all here know about it.
00:04:30.922 --> 00:04:36.178 But we are experimenting this integration from Wikidata to Wikipedia 00:04:36.178 --> 00:04:39.226 and we want to focus on meaningful individual contributions. 00:04:39.226 --> 00:04:42.608 So we've been working on education programs 00:04:42.608 --> 00:04:45.067 and we want the students to feel the value 00:04:45.067 --> 00:04:47.280 of their entries too, but not only-- 00:04:47.280 --> 00:04:49.405 Oh, five minutes only, Geez, I'm gonna rush here. 00:04:49.405 --> 00:04:50.599 (laughing) 00:04:50.794 --> 00:04:54.160 And we want you all to make tasks for users in general, 00:04:54.270 --> 00:04:57.801 especially on tables and this kind of content 00:04:57.801 --> 00:04:59.988 that it's a bit of a rush to do. 00:05:02.456 --> 00:05:05.523 And we're working on this concept of abstract Wikipedia. 00:05:05.523 --> 00:05:09.269 Denny Vrandečić wrote an article super interesting about it 00:05:09.269 --> 00:05:11.500 so I linked here too. 00:05:11.500 --> 00:05:14.792 And we also want to now support small language communities 00:05:14.792 --> 00:05:17.845 to fill the lack of content there. 00:05:18.784 --> 00:05:23.885 This is an example of how we've been using this Mbabel tool for GLAM 00:05:23.885 --> 00:05:25.748 and education programs, 00:05:25.748 --> 00:05:29.861 and I showed you earlier the bottom form of the Mbabel tool 00:05:29.861 --> 00:05:34.264 but also we can make red links that aren't exactly empty. 00:05:34.264 --> 00:05:35.931 So you click on this red link 00:05:35.931 --> 00:05:38.862 and you automatically have this article draft 00:05:38.862 --> 00:05:41.660 on your user page to edit. 00:05:42.964 --> 00:05:48.762 And I'm going to briefly talk about it because I only have some minutes more. 00:05:50.009 --> 00:05:51.356 On educational projects, 00:05:51.356 --> 00:05:56.799 we've been doing this with elections in Brazil for journalism students. 
00:05:56.799 --> 00:06:01.993 We have the experience with the [inaudible] students 00:06:02.087 --> 00:06:05.314 with user Joalpe-- he's not here right now, 00:06:05.314 --> 00:06:07.867 but we all know him, I think. 00:06:07.867 --> 00:06:11.930 And we realize that we have the data about Brazilian elections 00:06:11.930 --> 00:06:14.748 but we don't have media cover on it. 00:06:15.049 --> 00:06:18.249 So we were lacking also Wikipedia entries on it. 00:06:19.029 --> 00:06:23.000 How do we insert this meaningful information on Wikipedia 00:06:23.000 --> 00:06:24.672 that people really access? 00:06:24.672 --> 00:06:27.989 Next year we're going to have some election, 00:06:27.989 --> 00:06:30.710 people are going to look for this kind of information on Wikipedia 00:06:30.710 --> 00:06:32.433 and they simply won't find it. 00:06:32.433 --> 00:06:35.726 So this tool looks quite useful for this purpose 00:06:35.726 --> 00:06:40.214 and the students were introduced, not only to Wikipedia, 00:06:40.214 --> 00:06:42.701 but also to Wikidata. 00:06:42.701 --> 00:06:46.575 Actually, they were introduced to Wikipedia with Wikidata, 00:06:46.575 --> 00:06:50.675 which is an experience super interesting and we had a lot of fun, 00:06:50.675 --> 00:06:52.823 and it was quite challenging to organize all that. 00:06:52.823 --> 00:06:54.513 We can talk about it later too. 00:06:54.979 --> 00:06:58.582 And they also added the background and the analysis sections 00:06:58.582 --> 00:07:01.663 on these elections articles, 00:07:01.663 --> 00:07:05.336 because we don't want them to just simply automate the content there. 00:07:05.336 --> 00:07:06.660 We can do better. 00:07:06.660 --> 00:07:09.247 So this is the example I'm going to show you. 00:07:09.247 --> 00:07:13.106 This is from a municipal election in Brazil. 00:07:15.603 --> 00:07:17.121 Two minutes... oh my! 00:07:18.577 --> 00:07:23.268 This example here was entirely created with the Mbabel tool. 
00:07:23.268 --> 00:07:29.496 You have here this introduction text. It really sounds natural for the reader. 00:07:29.496 --> 00:07:32.165 The Wikidata infobox here-- 00:07:32.165 --> 00:07:34.907 it's a masterpiece of Ederporto right there. 00:07:34.907 --> 00:07:36.769 (laughter) 00:07:37.438 --> 00:07:42.456 And we have here the tables with the election results for each position. 00:07:42.456 --> 00:07:46.415 And we also have these results here on the textual form too, 00:07:46.415 --> 00:07:51.767 so it really looks like an article that was made, that was handcrafted. 00:07:53.893 --> 00:07:57.814 The references here were also made with the Mbabel tool 00:07:57.814 --> 00:08:01.393 and we used identifiers to build these references here 00:08:01.393 --> 00:08:03.167 and the categories too. 00:08:10.726 --> 00:08:14.999 So, to wrap things up here, it is still a work in progress, 00:08:14.999 --> 00:08:19.326 and we have some challenges on outreach and technical 00:08:19.326 --> 00:08:22.999 to bring Mbabel to other language communities, 00:08:22.999 --> 00:08:24.844 especially the smaller ones, 00:08:24.844 --> 00:08:27.210 and how do we support those tools 00:08:27.210 --> 00:08:29.819 on lower resource language communities too. 00:08:29.819 --> 00:08:33.991 And finally, is it possible to create an Mbabel 00:08:33.991 --> 00:08:36.261 that overcomes language barriers? 00:08:36.261 --> 00:08:39.740 I think that's a question very interesting for the conference 00:08:39.740 --> 00:08:43.835 and hopefully we can figure that out together. 00:08:44.818 --> 00:08:49.799 So, thank you very much, and look for the Mbabel poster downstairs 00:08:49.799 --> 00:08:53.615 if you like to have all this information wrapped up, okay? 00:08:53.615 --> 00:08:55.038 Thank you. 
00:08:55.288 --> 00:08:57.564 (audience clapping) 00:09:00.311 --> 00:09:02.778 (moderator) I'm afraid we're a little too short for questions 00:09:02.778 --> 00:09:05.783 but yes, Erica, as she said, has a poster and is very friendly. 00:09:05.783 --> 00:09:07.518 So I'm sure you can talk to her afterwards, 00:09:07.518 --> 00:09:09.389 and if there's time at the end, I'll allow it. 00:09:09.389 --> 00:09:12.131 But in the meantime, I'd like to bring up our next speaker... 00:09:12.237 --> 00:09:13.611 Thank you. 00:09:15.549 --> 00:09:17.140 (audience chattering) 00:09:23.058 --> 00:09:27.016 Next we've got Yolanda Gil, talking about Wikidata and Geosciences. 00:09:27.908 --> 00:09:29.031 Thank you. 00:09:29.031 --> 00:09:31.624 I come from the University of Southern California 00:09:31.624 --> 00:09:35.164 and I've been working with Semantic Technologies for a long time. 00:09:35.164 --> 00:09:37.894 I want to talk about geosciences in particular, 00:09:37.894 --> 00:09:41.225 where this idea of crowd-sourcing from the community is very important. 00:09:41.791 --> 00:09:45.033 So I'll give you a sense that individual scientists, 00:09:45.033 --> 00:09:47.070 most of them in colleges, 00:09:47.070 --> 00:09:50.085 collect their own data for their particular project. 00:09:50.085 --> 00:09:51.932 They describe it in their own way. 00:09:51.932 --> 00:09:55.352 They use their own properties, their own metadata characteristics. 00:09:55.352 --> 00:09:58.560 This is an example of some collaborators of mine 00:09:58.560 --> 00:10:00.124 that collect data from a river. 00:10:00.124 --> 00:10:02.091 They have their own sensors, their own robots, 00:10:02.091 --> 00:10:05.339 and they study the water quality. 00:10:05.339 --> 00:10:11.423 I'm going to talk today about an effort that we did to crowdsource metadata 00:10:11.423 --> 00:10:14.712 for a community that works in paleoclimate. 
00:10:14.712 --> 00:10:17.747 The article just came out so it's in the slides if you're curious, 00:10:17.747 --> 00:10:20.619 but it's a pretty large community that work together 00:10:20.619 --> 00:10:24.042 to integrate data more efficiently through crowdsourcing. 00:10:24.042 --> 00:10:28.631 So, if you've heard of the hockey stick graphics for climate, 00:10:28.631 --> 00:10:31.680 this is the community that does this. 00:10:31.680 --> 00:10:34.520 This is a study for climate in the last 200 years, 00:10:34.520 --> 00:10:38.188 and it takes them literally many years to look at data 00:10:38.188 --> 00:10:39.618 from different parts of the globe. 00:10:39.618 --> 00:10:42.607 Each dataset is collected by a different investigator. 00:10:42.699 --> 00:10:44.433 The data is very, very different, 00:10:44.433 --> 00:10:47.017 so it takes them a long time to put together 00:10:47.017 --> 00:10:49.230 these global studies of climate, 00:10:49.230 --> 00:10:51.665 and our goal is to make that more efficient. 00:10:51.665 --> 00:10:53.690 So, I've done a lot of work over the years. 00:10:53.690 --> 00:10:56.585 Going back to 2005, we used to call it, 00:10:56.585 --> 00:10:59.615 "Knowledge Collection from Web Volunteers" 00:10:59.615 --> 00:11:02.236 or from netizens at that time. 00:11:02.236 --> 00:11:04.267 We had a system called "Learner." 00:11:04.267 --> 00:11:07.048 It collected 700,000 common sense, 00:11:07.048 --> 00:11:09.368 common knowledge statements about the world. 00:11:09.368 --> 00:11:11.367 We did a lot of different techniques. 00:11:11.367 --> 00:11:15.333 The forms that we did to extract knowledge from volunteers 00:11:15.333 --> 00:11:19.136 really fit the knowledge models, the data models that we used 00:11:19.136 --> 00:11:21.381 and the properties that we wanted to use. 
00:11:21.381 --> 00:11:25.051 I worked with Denny in the system called "Shortipedia" 00:11:25.051 --> 00:11:27.259 when he was a postdoc at ISI, 00:11:27.259 --> 00:11:31.946 looking at keeping track of the provenance of the assertions, 00:11:31.946 --> 00:11:35.129 and we started to build on Semantic MediaWiki software. 00:11:35.129 --> 00:11:37.113 So everything that I'm going to describe today 00:11:37.113 --> 00:11:38.936 builds on that software, 00:11:38.936 --> 00:11:41.117 but I think that now we have Wikibase, 00:11:41.117 --> 00:11:43.676 we'll be starting to work more on Wikibase. 00:11:43.676 --> 00:11:48.935 So the LinkedEarth is the project where we work with paleoclimate scientists 00:11:48.935 --> 00:11:50.636 to crowdsource the metadata, 00:11:50.636 --> 00:11:54.328 and you see in the title that we say "controlled crowdsourcing." 00:11:54.328 --> 00:11:57.101 So we found a nice niche 00:11:57.101 --> 00:12:00.538 where we could let them create new properties 00:12:00.538 --> 00:12:02.599 but we had an editorial process for it. 00:12:02.599 --> 00:12:04.444 So I'll describe to you how it works. 00:12:04.444 --> 00:12:10.055 For them, if you're looking at a sample from lake sediments from 200 years ago, 00:12:10.055 --> 00:12:12.622 you use different properties to describe it 00:12:12.622 --> 00:12:15.692 than if you have coral sediments that you're looking at 00:12:15.692 --> 00:12:18.979 or coral samples that you're looking at that you extract from the ocean. 00:12:18.979 --> 00:12:23.532 Palmyra is a coral atoll in the Pacific. 00:12:23.532 --> 00:12:27.918 So if you have coral, you care about the species and the genus, 00:12:27.918 --> 00:12:31.691 but if you're just looking at lake sand, you don't have that. 00:12:31.691 --> 00:12:35.313 So each type of sample has very different properties. 00:12:35.313 --> 00:12:38.798 In LinkedEarth, they're able to see in a map 00:12:38.798 --> 00:12:40.264 where the datasets are.
00:12:40.264 --> 00:12:45.500 They actually annotate their own datasets or the datasets of other researchers 00:12:45.500 --> 00:12:46.787 when they're using it. 00:12:46.787 --> 00:12:50.254 So they have a reason why they want certain properties 00:12:50.254 --> 00:12:52.289 to describe those datasets. 00:12:52.289 --> 00:12:56.683 Whenever there are disagreements, or whenever there are agreements, 00:12:56.683 --> 00:12:58.595 there's community discussions about them 00:12:58.595 --> 00:13:02.894 and there are also polls to decide on what properties to settle. 00:13:02.894 --> 00:13:05.659 So it's a nice ecosystem. I'll give you examples. 00:13:05.659 --> 00:13:11.322 You look at a particular dataset, in this case it's a lake in Africa. 00:13:11.322 --> 00:13:14.241 So you have the category of the page; it can be a dataset, 00:13:14.241 --> 00:13:15.491 it can be other things. 00:13:15.491 --> 00:13:21.181 You can download the dataset itself and you have kind of canonical properties 00:13:21.181 --> 00:13:23.737 that they have all agreed to have for datasets, 00:13:23.737 --> 00:13:25.992 and then under Extra Information, 00:13:25.992 --> 00:13:29.369 those are properties that the person describing this dataset, 00:13:29.369 --> 00:13:31.007 added on their own accord. 00:13:31.007 --> 00:13:32.628 So these can be new properties. 00:13:32.628 --> 00:13:36.730 We call them "crowd properties," rather than "core properties." 00:13:37.291 --> 00:13:41.319 And then when you're describing your dataset, 00:13:41.319 --> 00:13:43.774 in this case it's an ice core that you got 00:13:43.774 --> 00:13:45.716 from a glacier dataset, 00:13:45.765 --> 00:13:49.178 and you're adding a dataset you want to talk about measurements, 00:13:49.178 --> 00:13:54.073 you have an offering of all the existing properties 00:13:54.073 --> 00:13:55.278 that match what you're saying. 00:13:55.278 --> 00:13:58.409 So we do this search completion so that you can adopt that.
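The search completion described here can be sketched along these lines. It is a minimal illustration, assuming a simple substring match that lists core properties before crowd properties; the function name `suggest_properties` and the property names in the example are hypothetical and are not taken from LinkedEarth.

```python
from typing import List


def suggest_properties(typed: str, core: List[str], crowd: List[str],
                       limit: int = 5) -> List[str]:
    """Return existing property names containing the typed text, core first."""
    needle = typed.strip().lower()
    if not needle:
        return []
    # Core properties are offered first, so contributors adopt them when they fit.
    matches = [name for name in core if needle in name.lower()]
    # Crowd properties follow, so an existing extension is reused, not duplicated.
    matches += [name for name in crowd
                if needle in name.lower() and name not in matches]
    return matches[:limit]


# Hypothetical property names, for illustration only.
core_properties = ["measured variable", "collected by"]
crowd_properties = ["measurement depth", "coral species"]
print(suggest_properties("meas", core_properties, crowd_properties))
```

Offering the existing names before a new one can be typed is what nudges annotators toward a shared vocabulary.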
00:13:58.409 --> 00:14:00.140 That promotes normalization. 00:14:00.140 --> 00:14:04.260 The core of the properties has been agreed by the community 00:14:04.260 --> 00:14:06.220 so we're really extending that core. 00:14:06.220 --> 00:14:08.795 And that core is very important because it gives structure 00:14:08.795 --> 00:14:10.735 to all the extensions. 00:14:10.735 --> 00:14:14.382 We engage the community through many different ways. 00:14:14.382 --> 00:14:17.260 We had one face-to-face meeting at the beginning 00:14:17.260 --> 00:14:21.611 and after about a year and a half, we do have a new standard, 00:14:21.611 --> 00:14:25.154 and a new way for them to continue to evolve that standard. 00:14:25.154 --> 00:14:30.569 They have editors, very much in the Wikipedia style 00:14:30.569 --> 00:14:31.582 of editorial boards. 00:14:31.582 --> 00:14:34.098 They have working groups for different types of data. 00:14:34.098 --> 00:14:36.090 They do polls with the community, 00:14:36.090 --> 00:14:40.879 and they have pretty nice engagement of the community at large, 00:14:40.879 --> 00:14:43.706 even if they've never visited our Wiki. 00:14:43.706 --> 00:14:46.183 The metadata evolves 00:14:46.183 --> 00:14:48.775 so what we do is that people annotate their datasets, 00:14:48.775 --> 00:14:52.321 then the schema evolves, the properties evolve 00:14:52.321 --> 00:14:55.379 and we have an entire infrastructure and mechanisms 00:14:55.379 --> 00:15:00.336 to re-annotate the datasets with the new structure of the ontology 00:15:00.336 --> 00:15:01.711 and the new properties. 00:15:01.711 --> 00:15:05.210 This is described in the paper. I won't go into the details. 00:15:05.210 --> 00:15:07.583 But I think that having that kind of capability 00:15:07.583 --> 00:15:10.342 in Wikibase would be really interesting. 00:15:10.342 --> 00:15:14.041 We basically extended Semantic MediaWiki and MediaWiki 00:15:14.041 --> 00:15:15.722 to create our own infrastructure.
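Re-annotating datasets after the schema changes could look roughly like the sketch below. The actual mechanism is the one described in the paper; this example only assumes that an evolution step can be expressed as a rename map plus an updated set of core property names, and every name in it (`reannotate`, the dataset and property names) is hypothetical.

```python
from typing import Dict, Set

Annotations = Dict[str, Dict[str, str]]


def reannotate(annotations: Annotations, renames: Dict[str, str],
               core: Set[str]) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Apply property renames, then sort each dataset's properties into core and crowd."""
    updated = {}
    for dataset, props in annotations.items():
        sections: Dict[str, Dict[str, str]] = {"core": {}, "crowd": {}}
        for name, value in props.items():
            # Use the new name if the property was renamed, else keep the old one.
            new_name = renames.get(name, name)
            section = "core" if new_name in core else "crowd"
            sections[section][new_name] = value
        updated[dataset] = sections
    return updated


# Hypothetical annotations made before the schema changed.
old_annotations = {
    "lake_dataset": {"sample kind": "lake sediment", "depth unit": "cm"},
}
print(reannotate(old_annotations, {"sample kind": "archive type"}, {"archive type"}))
```

Keeping the rename map separate from the annotations means the same step can be replayed over every dataset each time a new core is rolled out.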
00:15:15.722 --> 00:15:18.855 I think a lot of this is now something that we find in Wikibase, 00:15:18.961 --> 00:15:20.615 but this is older than that. 00:15:20.615 --> 00:15:24.999 And in general, we have many projects where we look at crowdsourcing 00:15:24.999 --> 00:15:29.885 not just descriptions of datasets but also descriptions of hydrology models, 00:15:29.885 --> 00:15:33.563 descriptions of multi-step data analytic workflows 00:15:33.563 --> 00:15:36.080 and many other things in the sciences. 00:15:36.080 --> 00:15:42.833 So we are also interested in including in Wikidata additional things 00:15:42.833 --> 00:15:46.250 that are not just datasets or entities 00:15:46.250 --> 00:15:48.512 but also other things that have to do with science. 00:15:48.512 --> 00:15:53.770 I think Geosciences are more complex in this sense than Biology, for example. 00:15:54.923 --> 00:15:56.233 That's it. 00:15:56.513 --> 00:15:57.885 Thank you. (audience clapping) 00:16:01.640 --> 00:16:03.772 - Do I have time for questions? - Yes. 00:16:03.772 --> 00:16:06.871 (moderator) We have time for just a couple of short questions. 00:16:07.751 --> 00:16:11.342 When answering, can you go back to the microphone? 00:16:12.529 --> 00:16:14.520 - Yes. - Hopefully, yeah. 00:16:21.314 --> 00:16:25.002 (audience 1) Does the structure allow tabular datasets to be described 00:16:25.002 --> 00:16:26.988 and can you talk a bit about that? 00:16:27.225 --> 00:16:32.667 Yes. So the properties of the datasets talk more about who collected them, 00:16:32.667 --> 00:16:36.759 what kind of data was collected, what kind of sample it was, 00:16:36.759 --> 00:16:39.790 and then there's a separate standard which is called "LiPD" 00:16:39.790 --> 00:16:43.065 that's complementary and mapped to the properties 00:16:43.065 --> 00:16:46.994 that describes the format of the actual files 00:16:47.075 --> 00:16:49.343 and the actual structure of the data.
00:16:49.343 --> 00:16:53.631 So, you're right that there's both, "how do I find data about x" 00:16:53.631 --> 00:16:55.557 but also, "Now, how do I use it? 00:16:55.557 --> 00:17:00.211 How do I know where the temperature that I'm looking for 00:17:00.211 --> 00:17:03.013 is actually in the file?" 00:17:03.656 --> 00:17:05.394 (moderator) This will be the last. 00:17:06.887 --> 00:17:09.034 (audience 2) I'll have to make it relevant. 00:17:09.504 --> 00:17:15.667 So, you have shown this process of how users can suggest 00:17:15.667 --> 00:17:18.985 or like actually already put in properties, 00:17:18.985 --> 00:17:22.705 and I didn't fully understand how this thing works, 00:17:22.705 --> 00:17:24.027 or what's the process behind it. 00:17:24.027 --> 00:17:28.045 Is there some kind of folksonomy approach--obviously-- 00:17:28.045 --> 00:17:33.387 but how is it promoted into the core vocabulary 00:17:33.387 --> 00:17:36.255 if something is promoted? 00:17:36.255 --> 00:17:37.882 Yes, yes. It is. 00:17:37.882 --> 00:17:42.202 So what we do is we have a core ontology and the initial one was actually 00:17:42.202 --> 00:17:45.618 very thoughtfully put together through a lot of discussion 00:17:45.618 --> 00:17:47.964 by very few people. 00:17:47.964 --> 00:17:51.052 And then the idea was the whole community can extend that 00:17:51.052 --> 00:17:52.971 or propose changes to that. 00:17:52.971 --> 00:17:56.919 So, as they are describing datasets, they can add new properties 00:17:56.919 --> 00:17:59.526 and those become "crowd properties." 00:17:59.526 --> 00:18:02.941 And every now and then, the Editorial Committee 00:18:02.941 --> 00:18:04.367 looks at all of those properties, 00:18:04.367 --> 00:18:07.795 the working groups look at all of those crowd properties, 00:18:07.795 --> 00:18:11.714 and decide whether to incorporate them into the main ontology. 00:18:11.714 --> 00:18:15.804 So it could be because they're used for a lot of dataset descriptions. 
00:18:15.804 --> 00:18:18.920 It could be because they are proposed by somebody 00:18:18.920 --> 00:18:23.339 and they're found to be really interesting or key, or uncontroversial. 00:18:23.339 --> 00:18:30.267 So there's an entire editorial process to incorporate those new crowd properties 00:18:30.267 --> 00:18:32.188 or the folksonomy part of it, 00:18:32.188 --> 00:18:36.308 but they are really built around the core of the ontology. 00:18:36.404 --> 00:18:40.280 The core ontology then grows with more crowd properties 00:18:40.280 --> 00:18:44.311 and then people propose additional crowd properties again. 00:18:44.311 --> 00:18:46.979 So we've gone through a couple of these iterations 00:18:46.979 --> 00:18:51.386 of rolling out a new core, and then extending it, 00:18:51.386 --> 00:18:55.570 and then rolling out a new core and then extending it. 00:18:55.570 --> 00:18:57.779 - (audience 2) Great. Thank you. - Thanks. 00:18:57.779 --> 00:19:00.437 (moderator) Thank you. (audience applauding) 00:19:02.295 --> 00:19:03.777 (moderator) Thank you, Yolanda. 00:19:03.777 --> 00:19:07.494 And now we have Adam Shorland with "Something About Wikibase," 00:19:07.599 --> 00:19:09.299 according to the title. 00:19:09.708 --> 00:19:12.956 Uh... where's the internet? There it is. 00:19:13.245 --> 00:19:18.925 So, I'm going to do a live demo, which is probably a bad idea 00:19:18.925 --> 00:19:21.362 but I'm going to try and do it as the birthday present later 00:19:21.362 --> 00:19:24.268 so I figure I might as well try it here. 00:19:24.292 --> 00:19:27.304 And I also have some notes on my phone because I have no slides. 00:19:29.349 --> 00:19:32.248 So, two years ago, I made these Wikibase Docker images 00:19:32.248 --> 00:19:34.052 that quite a few people have tried out, 00:19:34.052 --> 00:19:38.087 and even before then, I was working on another project, 00:19:38.087 --> 00:19:42.363 which is kind of ready now, and here it is.
00:19:43.690 --> 00:19:46.832 It's a website that allows you to instantly create a Wikibase 00:19:46.900 --> 00:19:48.930 with a query service and quick statements, 00:19:48.930 --> 00:19:51.616 without needing to know about any of the technical details, 00:19:51.616 --> 00:19:54.295 without needing to manage any of them either. 00:19:54.295 --> 00:19:57.054 There are still lots of features to go and there's still some bugs, 00:19:57.054 --> 00:19:59.348 but here goes the demo. 00:19:59.348 --> 00:20:02.628 Let me get my emails up ready... because I need them too... 00:20:03.315 --> 00:20:06.514 Da da da... Stopwatch. 00:20:07.272 --> 00:20:08.488 Okay. 00:20:08.829 --> 00:20:14.253 So it's as simple as... at the moment it's locked down behind... 00:20:14.337 --> 00:20:16.495 Oh no! German keyboard! 00:20:16.495 --> 00:20:18.703 (audience laughing) 00:20:22.556 --> 00:20:23.923 Foiled... okay. 00:20:24.955 --> 00:20:26.214 Okay. 00:20:26.634 --> 00:20:28.417 (audience continues to laugh) 00:20:30.434 --> 00:20:31.989 Aha! Okay. 00:20:32.950 --> 00:20:35.335 I'll remember that for later. (laughs) 00:20:36.911 --> 00:20:38.119 Yes. 00:20:39.438 --> 00:20:40.855 ♪ (humming) ♪ 00:20:40.961 --> 00:20:44.932 Oh my god... now it's American. 00:20:53.871 --> 00:20:56.131 All you have to do is create an account... 00:20:58.570 --> 00:21:00.007 da da da... 00:21:00.566 --> 00:21:02.432 Click this button up here... 00:21:02.478 --> 00:21:05.512 Come up with a name for Wiki-- "Demo1" 00:21:05.862 --> 00:21:07.299 "Demo1" 00:21:07.568 --> 00:21:09.135 "Demo user" 00:21:09.203 --> 00:21:11.864 Agree to the terms which don't really exist yet. 00:21:12.298 --> 00:21:14.247 (audience laughing) 00:21:15.264 --> 00:21:17.698 Click on this thing which isn't a link. 00:21:21.519 --> 00:21:23.886 And then you have your Wikibase. 00:21:23.886 --> 00:21:26.602 (audience cheers and claps) 00:21:28.554 --> 00:21:30.421 Anmelden in German. 00:21:30.421 --> 00:21:35.126 Demo... oh god!
I'm learning lots about my demo later. 00:21:35.569 --> 00:21:40.069 1-6-1-4-S-G... 00:21:40.166 --> 00:21:42.567 - (audience 3) Y... - (Adam) It's random. 00:21:43.016 --> 00:21:44.567 (audience laughing) 00:21:46.237 --> 00:21:47.958 Oh, come on.... (audience laughing) 00:21:48.001 --> 00:21:50.543 Oh no. It's because this is a capital U... 00:21:51.333 --> 00:21:53.283 (audience chattering) 00:21:54.453 --> 00:21:56.545 6-1-4.... 00:21:57.465 --> 00:22:01.248 S-G-ENJ... 00:22:01.623 --> 00:22:03.794 Is J... oh no. That's... oh yeah. Okay. 00:22:03.843 --> 00:22:06.242 I'm really... I'm gonna have to look at the laptop 00:22:06.242 --> 00:22:07.836 that I'm doing this on later. 00:22:07.836 --> 00:22:09.129 Cool... 00:22:11.046 --> 00:22:13.709 Da da da da da... 00:22:14.687 --> 00:22:17.040 Maybe I should have some things in my clipboard ready. 00:22:17.539 --> 00:22:19.093 Okay, so now I'm logged in. 00:22:22.631 --> 00:22:25.065 Oh... keyboards. 00:22:28.083 --> 00:22:30.012 So you can go and create an item... 00:22:36.194 --> 00:22:38.508 Yeah, maybe I should make a video. It might be easier. 00:22:38.927 --> 00:22:42.207 So, yeah. You can make items, you have quick statements here 00:22:42.207 --> 00:22:43.901 that have... oh... it is all in German. 00:22:43.901 --> 00:22:45.088 (audience laughing) 00:22:45.088 --> 00:22:46.297 (sighs) 00:22:46.926 --> 00:22:49.021 Oh, log in? Log in? 00:22:50.348 --> 00:22:52.088 It has... Oh, set up ready. 00:22:52.088 --> 00:22:53.482 Da da da... 00:22:55.965 --> 00:22:57.850 It's as easy as... 00:22:58.966 --> 00:23:01.350 I learned how to use Quick Statements yesterday... 00:23:01.350 --> 00:23:03.245 that's what I know how to do. 00:23:04.657 --> 00:23:07.089 I can then go back to the Wiki... 00:23:08.008 --> 00:23:09.804 We can go and see in Recent Changes 00:23:09.804 --> 00:23:11.942 that there are now two items, the one that I made 00:23:11.942 --> 00:23:13.759 and the one from Quick Statements... 
00:23:13.759 --> 00:23:14.881 and then you go to Quick... 00:23:14.881 --> 00:23:16.511 ♪ (hums a tune) ♪ 00:23:17.637 --> 00:23:18.770 Stop...no... 00:23:18.927 --> 00:23:20.120 No... 00:23:20.454 --> 00:23:22.437 (audience laughing) 00:23:28.394 --> 00:23:30.006 Oh god... 00:23:30.061 --> 00:23:32.012 I'm glad I tried this out in advance. 00:23:33.464 --> 00:23:35.678 There you go. And the query service is updated. 00:23:35.830 --> 00:23:37.763 (audience clapping) 00:23:42.357 --> 00:23:45.359 And the idea of this is it'll allow people to try out Wikibases. 00:23:45.359 --> 00:23:48.493 Hopefully, it'll even be able to allow people to... 00:23:49.110 --> 00:23:50.945 have their real Wikibases here. 00:23:50.945 --> 00:23:53.783 At the moment you can create as many as you want 00:23:53.783 --> 00:23:55.653 and they all just appear in this lovely list. 00:23:55.653 --> 00:23:59.182 As I said, there's lots of bugs but it's all super quick. 00:23:59.914 --> 00:24:03.392 Exactly how this is going to continue in the future, we don't know yet 00:24:03.392 --> 00:24:05.757 because I only finished writing this in the last few days. 00:24:05.757 --> 00:24:09.286 It's currently behind an invitation code so that if you want to come try it out, 00:24:09.286 --> 00:24:10.888 come and talk to me. 00:24:11.645 --> 00:24:15.730 And if you have any other comments or thoughts, let me know. 00:24:15.861 --> 00:24:19.711 Oh, three minutes...40. That's... That's not that bad. 00:24:19.986 --> 00:24:21.022 Thanks. 00:24:21.022 --> 00:24:22.622 (audience clapping) 00:24:28.435 --> 00:24:30.006 Any questions? 00:24:31.020 --> 00:24:35.553 (audience 5) Does the Quick Statements and the Query Service 00:24:35.553 --> 00:24:38.602 are automatically updated? 00:24:39.553 --> 00:24:42.345 Yes. 
So the idea is that there will be somebody, 00:24:42.345 --> 00:24:43.500 at the moment, me, 00:24:43.500 --> 00:24:45.144 maintaining all of the horrible stuff 00:24:45.144 --> 00:24:47.290 that you don't have to behind the scenes. 00:24:47.657 --> 00:24:50.157 So kind of think of it like GitHub.com, 00:24:50.157 --> 00:24:54.058 but you don't have to know anything about Git to use it. It's just all there. 00:24:55.241 --> 00:24:56.886 - [inaudible] - Yeah, we'll get that. 00:24:56.886 --> 00:25:00.247 But any of those big hosted solution things. 00:25:00.833 --> 00:25:03.263 - (audience 6) A feature request. - Yes. 00:25:03.263 --> 00:25:05.479 Is there any-- In Scope 00:25:05.479 --> 00:25:09.799 do you have plans on making it so you can easily import existing... 00:25:09.799 --> 00:25:12.549 - Wikidata... - I have loads of plans. 00:25:12.549 --> 00:25:14.909 Like I want there to be a button where you can just import 00:25:14.909 --> 00:25:17.348 another whole Wikibase and all of--yeah. 00:25:17.436 --> 00:25:20.723 There will, in the future list that's really long. Yeah. 00:25:24.454 --> 00:25:28.406 (audience 7) I understand that it's... you want to make it user-friendly 00:25:28.406 --> 00:25:32.242 but if I want to access to the machine itself, can I do that? 00:25:32.242 --> 00:25:34.673 Nope. (audience laughing) 00:25:37.006 --> 00:25:40.863 So again, like, in the longer term future, there are possib... 00:25:40.863 --> 00:25:43.810 Everything's possible, but at the moment, no. 00:25:45.156 --> 00:25:49.743 (audience 8) Two questions. Is there a plan to have export tools 00:25:49.743 --> 00:25:52.791 so that you can export it to your own Wikibase maybe at some point? 00:25:52.791 --> 00:25:53.824 - Yes. - Great. 00:25:53.824 --> 00:25:55.565 And is this a business? 00:25:56.003 --> 00:25:58.164 I have no idea. (audience laughing) 00:26:00.015 --> 00:26:01.545 Not currently. 
00:26:05.754 --> 00:26:08.451 (audience 9) What if I stop using it tomorrow, 00:26:08.451 --> 00:26:11.096 how long will the data be there? 00:26:11.181 --> 00:26:14.632 So my plan was at the end of WikidataCon I was going to delete all of the data 00:26:14.632 --> 00:26:18.060 and there's a Wikibase Workshop on a Sunday, 00:26:18.060 --> 00:26:21.671 and we will maybe be using this for the Wikibase workshop 00:26:21.671 --> 00:26:23.801 so that everyone can have their own Wikibase. 00:26:23.801 --> 00:26:27.366 And then, from that point, I probably won't be deleting the data 00:26:27.366 --> 00:26:29.008 so it will all just stay there. 00:26:31.763 --> 00:26:32.923 (moderator) Question. 00:26:34.524 --> 00:26:36.114 (audience 10) It's two minutes... 00:26:36.175 --> 00:26:39.505 Alright, fine. I'll allow two more questions if you talk quickly. 00:26:39.505 --> 00:26:41.550 (audience laughing) 00:26:47.370 --> 00:26:49.999 - Alright, good people. - Thank you, Adam. 00:26:49.999 --> 00:26:52.418 Thank you for letting me test my demo... I mean... 00:26:52.418 --> 00:26:54.640 I'm going to do it different. (audience clapping) 00:26:59.512 --> 00:27:00.753 (moderator) Thank you. 00:27:00.753 --> 00:27:03.869 Now we have Dennis Diefenbach presenting QAnswer. 00:27:04.489 --> 00:27:08.129 Hello, I'm Dennis Diefenbach, I would like to present QAnswer 00:27:08.129 --> 00:27:11.392 which is a question-answering system on top of Wikidata. 00:27:11.392 --> 00:27:16.203 So, what we need are some questions and this is the interface of QAnswer. 00:27:16.203 --> 00:27:23.460 For example, where is WikidataCon? 00:27:23.901 --> 00:27:25.975 Alright, I think it's written like this. 00:27:27.432 --> 00:27:32.432 2019... And we get this response which is Berlin. 00:27:32.458 --> 00:27:38.425 So, other questions. For example, "When did Wikidata start?" 00:27:38.430 --> 00:27:42.383 It started on 30 October 2012, so its birthday is approaching.
00:27:44.079 --> 00:27:48.014 It is 6 years old, so it will be its 7th birthday. 00:27:49.133 --> 00:27:51.583 Who is developing Wikidata? 00:27:51.583 --> 00:27:54.371 The Wikimedia Foundation and Wikimedia Deutschland, 00:27:54.371 --> 00:27:55.988 so thank you very much to them. 00:27:57.013 --> 00:28:02.947 Something like museums in Berlin... I don't know why this is not so... 00:28:05.494 --> 00:28:07.737 Only one museum... no, yeah, a few more. 00:28:09.167 --> 00:28:10.995 So, when you ask something like this, 00:28:10.995 --> 00:28:14.178 we allow the user to explore the information 00:28:14.178 --> 00:28:16.308 with different aggregations. 00:28:16.308 --> 00:28:18.953 For example, if there are many geo coordinates 00:28:18.953 --> 00:28:21.476 attached to the entities, we will display a map. 00:28:21.476 --> 00:28:26.357 If there are many images attached to them, we will display the images, 00:28:26.357 --> 00:28:29.057 and otherwise there is a list where you can explore 00:28:29.057 --> 00:28:30.855 the different entities. 00:28:33.236 --> 00:28:35.605 You can ask something like "Who is the mayor of Berlin," 00:28:36.643 --> 00:28:40.201 "Give me politicians born in Berlin," and things like this. 00:28:40.201 --> 00:28:44.428 So you can ask both keyword questions and full natural language questions. 00:28:45.171 --> 00:28:48.604 The whole data is coming from Wikidata 00:28:48.604 --> 00:28:55.346 so all entities which are in Wikidata are queryable by this service. 00:28:55.869 --> 00:28:59.244 And the data is really all from Wikidata 00:28:59.244 --> 00:29:01.207 in the sense, there are some Wikipedia snippets, 00:29:01.207 --> 00:29:04.851 there are images from Wikimedia Commons, 00:29:04.851 --> 00:29:07.644 but the rest is all Wikidata data. 00:29:08.760 --> 00:29:11.678 We can do this in several languages. This is now in Chinese. 00:29:11.678 --> 00:29:15.441 I don't know what is written there so do not ask me.
00:29:15.441 --> 00:29:19.893 We are currently supporting these languages with more or less good quality 00:29:19.893 --> 00:29:22.094 because... yeah. 00:29:23.332 --> 00:29:27.563 So, how can this be useful for the Wikidata community? 00:29:27.968 --> 00:29:30.052 I think there are different reasons. 00:29:30.052 --> 00:29:33.786 First of all, this thing helps you to generate SPARQL queries 00:29:33.786 --> 00:29:37.043 and I know there are even some workshops about how to use SPARQL. 00:29:37.043 --> 00:29:39.444 It's not a language that everyone speaks. 00:29:39.444 --> 00:29:45.147 So, if you ask something like "a philosopher born before 1908," 00:29:45.147 --> 00:29:48.697 to figure out, to construct a SPARQL query like this could be tricky. 00:29:50.001 --> 00:29:54.257 In fact when you ask a question, we generate many SPARQL queries 00:29:54.301 --> 00:29:57.486 and the first one is always the thing, the SPARQL query where we think 00:29:57.486 --> 00:29:59.008 this is the good one. 00:29:59.017 --> 00:30:02.651 So, if you ask your question and then you go to the SPARQL list, 00:30:02.691 --> 00:30:06.468 then there is this button for the Wikidata query service 00:30:06.468 --> 00:30:11.811 and you have the SPARQL query right there and you will get the same result 00:30:11.811 --> 00:30:15.184 as you would get in the interface. 00:30:16.906 --> 00:30:19.289 Another thing it could be useful for 00:30:19.289 --> 00:30:23.468 is finding missing contextual information. 00:30:23.468 --> 00:30:27.057 For example, if you ask for actors in "The Lord of the Rings," 00:30:27.057 --> 00:30:30.776 most of these entities will have an associated image 00:30:30.776 --> 00:30:32.490 but not all of them. 00:30:32.490 --> 00:30:37.861 So here there is some missing metadata that could be added. 00:30:37.861 --> 00:30:40.376 You could go to this entity, add an image 00:30:40.376 --> 00:30:45.462 and then see first that there is an image missing and so on.
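As a rough illustration of why the "philosopher born before 1908" question is tricky to write by hand, here is a minimal Python sketch that builds such a query. It is not the query QAnswer itself generates; it assumes the usual Wikidata modeling (occupation `P106` = philosopher `Q4964182`, date of birth `P569`), and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Public endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def philosophers_born_before(year: int, limit: int = 50) -> str:
    """Build a SPARQL query for people with occupation 'philosopher'
    whose date of birth is earlier than 1 January of `year`.

    Assumed modeling: P106 (occupation), Q4964182 (philosopher),
    P569 (date of birth). The prefixes wd:, wdt:, xsd:, wikibase: and bd:
    are predefined on the Wikidata Query Service.
    """
    return (
        "SELECT ?person ?personLabel ?birth WHERE {\n"
        "  ?person wdt:P106 wd:Q4964182 ;\n"
        "          wdt:P569 ?birth .\n"
        f'  FILTER(?birth < "{year:04d}-01-01T00:00:00Z"^^xsd:dateTime)\n'
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def query_url(sparql: str) -> str:
    """Return a GET URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    print(philosophers_born_before(1908))
```

The sketch only builds the query text and URL; sending the request and handling rate limits is left out on purpose.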
00:30:46.457 --> 00:30:52.047 Another thing is that you could find schema issues. 00:30:52.047 --> 00:30:55.424 For example, if you ask "books by Andrea Camilleri," 00:30:55.428 --> 00:30:57.711 who is a famous Italian writer, 00:30:57.711 --> 00:30:59.981 you would currently get these three books. 00:30:59.981 --> 00:31:02.681 But he wrote many more. He wrote more than 50. 00:31:02.681 --> 00:31:05.701 And so the question is, are they not in Wikidata 00:31:05.701 --> 00:31:09.704 or is the knowledge maybe not modeled correctly. 00:31:09.704 --> 00:31:12.804 And in this case, I know there is another book from him, 00:31:12.804 --> 00:31:14.737 which is "Un mese con Montalbano." 00:31:14.737 --> 00:31:18.207 It has only an Italian label so you can only search it in Italian. 00:31:18.207 --> 00:31:22.103 And if you go to this entity, you will see that he has written it. 00:31:22.103 --> 00:31:27.504 It's a short story by Andrea Camilleri and it's an instance of literary work, 00:31:27.504 --> 00:31:29.220 but it's not instance of book 00:31:29.220 --> 00:31:31.338 so that's the reason why it doesn't appear. 00:31:31.338 --> 00:31:35.904 This is a way to track where things are missing 00:31:35.904 --> 00:31:37.499 or where the Wikidata model is 00:31:37.499 --> 00:31:39.539 not as you would expect. 00:31:40.794 --> 00:31:42.968 Another reason is just to have fun. 00:31:43.588 --> 00:31:47.546 I imagine that many of you added many Wikidata entities 00:31:47.546 --> 00:31:50.776 so just search for the ones that you care most about 00:31:50.776 --> 00:31:52.529 or you have edited yourself. 00:31:52.529 --> 00:31:56.893 So in this case, who developed QAnswer, and that's it. 00:31:56.893 --> 00:32:00.226 For any other questions, go to www.QAnswer.eu/qa 00:32:00.226 --> 00:32:03.575 and hopefully we'll find an answer for you. 00:32:03.782 --> 00:32:05.649 (audience clapping) 00:32:13.994 --> 00:32:17.040 - Sorry. - I'm just the dumbest person here.
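The "instance of literary work, but not instance of book" issue from the talk can be made concrete with a small sketch. The Python below builds a query that accepts several classes, and their subclasses, at once. It assumes `P50` (author), `P31` (instance of), `P279` (subclass of), `Q571` (book) and `Q7725634` (literary work); the author QID is a parameter because the talk does not give it, and the function name is hypothetical.

```python
from __future__ import annotations


def works_by_author(author_qid: str, class_qids: list[str]) -> str:
    """Build a SPARQL query for works by `author_qid` that are an instance
    of any class in `class_qids`, or of a subclass of one of them.

    Querying only for Q571 (book) would miss items typed as
    Q7725634 (literary work), which is the gap described in the talk.
    """
    values = " ".join(f"wd:{qid}" for qid in class_qids)
    return (
        "SELECT DISTINCT ?work ?workLabel WHERE {\n"
        f"  VALUES ?class {{ {values} }}\n"
        f"  ?work wdt:P50 wd:{author_qid} ;\n"
        "        wdt:P31/wdt:P279* ?class .\n"
        # Italian first, since some items only carry an Italian label.
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "it,en". }\n'
        "}"
    )


if __name__ == "__main__":
    # "Q0" is a placeholder, not a real author item.
    print(works_by_author("Q0", ["Q571", "Q7725634"]))
```

Comparing the result for `["Q571"]` alone with the result for both classes is one way to spot items that are typed differently from what a question like "books by ..." expects.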
00:32:17.530 --> 00:32:22.722 (audience 11) So I want to know how is this kind of agnostic 00:32:22.752 --> 00:32:25.104 to Wikibase instance, 00:32:25.104 --> 00:32:29.020 or has it been tied to the exact like property numbers 00:32:29.020 --> 00:32:31.054 and things in Wikidata? 00:32:31.054 --> 00:32:33.442 Has it learned in some way or how was it set up? 00:32:33.442 --> 00:32:36.456 There is training data and we rely on training data 00:32:36.456 --> 00:32:40.585 and this is also, in most cases, why you will not get good results. 00:32:40.585 --> 00:32:44.881 But we're training the system by the simple yes and no answer. 00:32:44.881 --> 00:32:48.936 When you ask a question, and we ask always for feedback, yes or no, 00:32:48.936 --> 00:32:51.899 and this feedback is used by the machine learning algorithm. 00:32:51.899 --> 00:32:54.124 This is where machine learning comes into play. 00:32:54.124 --> 00:32:58.600 But basically, we put up separate Wikibase instances 00:32:58.600 --> 00:33:00.482 and we can plug this in. 00:33:00.482 --> 00:33:04.249 In fact, the system is agnostic in the sense that it only wants RDF. 00:33:04.249 --> 00:33:06.618 And RDF, you have in each Wikibase, 00:33:06.618 --> 00:33:08.059 there are a few configurations 00:33:08.059 --> 00:33:10.432 but you can have this on top of any Wikibase. 00:33:11.654 --> 00:33:13.039 (audience 11) Awesome. 00:33:23.573 --> 00:33:27.004 (audience 12) You mentioned that it's being trained by yes/no answers. 00:33:27.073 --> 00:33:32.662 So I guess this is assuming that the Wikidata instance is free of errors 00:33:32.722 --> 00:33:34.356 or is it also...? 00:33:34.356 --> 00:33:37.140 You assume that the Wikidata instances... 00:33:37.140 --> 00:33:40.731 (audience 12) I guess I'm asking, like, are you distinguishing 00:33:40.731 --> 00:33:46.289 between source level errors or misunderstanding the question 00:33:46.289 --> 00:33:50.856 versus a bad mapping, etc.?
00:33:51.706 --> 00:33:55.474 Generally, we assume that the data in Wikidata is true. 00:33:55.474 --> 00:33:59.172 So if you click "no" and the data in Wikidata would be false, 00:33:59.172 --> 00:34:03.023 then yeah... we would not catch this difference. 00:34:03.023 --> 00:34:05.081 But sincerely, Wikidata quality is very good, 00:34:05.081 --> 00:34:08.231 so I rarely have had this problem. 00:34:16.592 --> 00:34:22.068 (audience 12) Is this data available as a dataset by any chance, sir? 00:34:22.209 --> 00:34:27.218 - What is... direct service? - The... dataset of... 00:34:27.218 --> 00:34:30.803 "is this answer correct versus the query versus the answer?" 00:34:30.872 --> 00:34:33.340 Is that something you're publishing as part of this? 00:34:33.340 --> 00:34:38.040 - The training data that you've... - We published the training data. 00:34:38.040 --> 00:34:43.423 We published some old training data but no, just a-- 00:34:44.573 --> 00:34:47.313 There is a question there. I don't know if we still have time. 00:34:51.215 --> 00:34:55.104 (audience 13) Maybe I just missed this but is it running on a live, 00:34:55.104 --> 00:34:57.080 like the Live Query Service, 00:34:57.080 --> 00:34:59.393 or is it running on some static dump you loaded 00:34:59.393 --> 00:35:01.690 or where is the data source for Wikidata? 00:35:01.784 --> 00:35:07.014 Yes. The problem is to apply this technology, 00:35:07.014 --> 00:35:08.414 you need a local dump. 00:35:08.414 --> 00:35:10.673 Because we do not rely only on the SPARQL endpoint, 00:35:10.673 --> 00:35:12.873 we rely on special indexes. 00:35:12.873 --> 00:35:16.192 So, we are currently loading the Wikidata dump. 00:35:16.192 --> 00:35:18.699 We are updating this every two weeks. 00:35:18.699 --> 00:35:20.756 We would like to do it more often, 00:35:20.756 --> 00:35:23.823 in fact we would like to get the diffs for each day, for example, 00:35:23.823 --> 00:35:25.271 to put them in our index.
00:35:25.271 --> 00:35:28.719 But unfortunately, right now, the Wikidata dumps are released 00:35:28.719 --> 00:35:31.753 only once every week. 00:35:31.753 --> 00:35:35.150 So, we cannot be faster than that and we also need some time 00:35:35.150 --> 00:35:39.073 to re-index the data, so it takes one or two days. 00:35:39.073 --> 00:35:41.833 So we are always behind. Yeah. 00:35:48.202 --> 00:35:49.780 (moderator) Any more? 00:35:50.430 --> 00:35:53.268 - Okay, thank you very much. - Thank you all very much. 00:35:53.547 --> 00:35:54.966 (audience clapping) 00:35:57.266 --> 00:36:00.165 (moderator) And now last, we have Eugene Alvin Villar, 00:36:00.165 --> 00:36:02.049 talking about Panandâ. 00:36:10.630 --> 00:36:12.637 Good afternoon, my name is Eugene Alvin Villar 00:36:12.637 --> 00:36:15.297 and I'm from the Philippines, and I'll be talking about Panandâ: 00:36:15.297 --> 00:36:18.185 a mobile app powered by Wikidata. 00:36:18.862 --> 00:36:21.678 This is a follow-up to my lightning talk that I presented two years ago 00:36:21.678 --> 00:36:25.004 at WikidataCon 2017 together with Carlo Moskito. 00:36:25.004 --> 00:36:26.557 You can download the slides 00:36:26.557 --> 00:36:28.727 and there's a link to that presentation there. 00:36:28.727 --> 00:36:30.868 I'll give you a bit of a background. 00:36:30.868 --> 00:36:33.471 Wiki Society of the Philippines, formerly, Wikimedia Philippines, 00:36:33.471 --> 00:36:37.477 had a series of projects related to Philippine heritage and history. 00:36:37.477 --> 00:36:41.705 So we have the usual photo contests, Wikipedia Takes Manila, 00:36:41.705 --> 00:36:43.238 Wiki Loves Monuments, 00:36:43.238 --> 00:36:46.657 and then our media project was Cultural Heritage Mapping Project 00:36:46.657 --> 00:36:49.094 back in 2014-2015. 00:36:50.044 --> 00:36:53.039 In that project, we trained volunteers to edit articles 00:36:53.039 --> 00:36:54.389 related to cultural heritage. 
00:36:54.914 --> 00:36:59.032 This is our biggest and most successful project that we had. 00:36:59.032 --> 00:37:03.037 794 articles were created or improved, including 37 "Did You Knows" 00:37:03.037 --> 00:37:05.238 and 4 "Good Articles," 00:37:05.308 --> 00:37:08.688 and more than 5,000 images were uploaded to Commons. 00:37:08.688 --> 00:37:11.039 As a result of that, we then launched 00:37:11.039 --> 00:37:13.689 the Encyclopedia of Philippine Heritage program 00:37:13.689 --> 00:37:18.444 in order to expand the scope and also include Wikidata in the scope. 00:37:18.444 --> 00:37:21.695 Here's the Core Team: myself, Carlo and Roel. 00:37:21.695 --> 00:37:26.870 Our first pilot project was to document the country's historical markers 00:37:26.870 --> 00:37:29.153 in Wikidata and Commons, 00:37:29.153 --> 00:37:34.053 starting with those created by our national historical agency, NHCP. 00:37:34.053 --> 00:37:38.904 For example, they installed a marker for our national hero, here in Berlin, 00:37:38.904 --> 00:37:41.421 so there's now a Wikidata page for that marker 00:37:41.421 --> 00:37:45.102 and a collection of photos of that marker in Commons. 00:37:46.166 --> 00:37:50.397 Unfortunately, the government agency does not keep a good, 00:37:50.397 --> 00:37:53.480 up-to-date, or complete database of their markers, 00:37:53.480 --> 00:37:58.004 so we have to painstakingly input these to Wikidata manually. 00:37:58.004 --> 00:38:02.772 After careful research and confirmation, here's a graph of the number of markers 00:38:02.772 --> 00:38:07.466 that we've added to Wikidata over time, over the past three years. 00:38:07.466 --> 00:38:11.230 And we've developed this Historical Markers Map web app 00:38:11.230 --> 00:38:15.289 that lets users view these markers on a map, 00:38:15.289 --> 00:38:21.051 so we can browse it as a list, view a good visualization of the markers 00:38:21.051 --> 00:38:23.253 with information and inscriptions.
00:38:23.253 --> 00:38:28.885 All of this is powered by Live Query from Wikidata Query Service. 00:38:29.732 --> 00:38:32.005 There's the link if you want to play around with it. 00:38:33.349 --> 00:38:37.428 And so we developed a mobile app for this one. 00:38:37.428 --> 00:38:42.117 To better publicize our project, I developed Panandâ, 00:38:42.117 --> 00:38:45.434 which is Tagalog for "marker", as an Android app, 00:38:45.434 --> 00:38:48.393 that was published back in 2018, 00:38:48.393 --> 00:38:53.934 and I'll publish the iOS version sometime in the future, hopefully. 00:38:54.868 --> 00:38:57.892 I'd like to demo the app but we have no time, 00:38:57.892 --> 00:39:00.935 so here are some of the features of the app. 00:39:00.935 --> 00:39:04.586 There's a Map and a List view, with text search, 00:39:04.586 --> 00:39:07.452 so you can drill down as needed. 00:39:07.452 --> 00:39:10.169 You can filter by region or by distance, 00:39:10.169 --> 00:39:12.193 and whether you have marked these markers, 00:39:12.193 --> 00:39:15.499 as either you have visited them or you'd like to bookmark them 00:39:15.499 --> 00:39:16.949 for future visits. 00:39:16.949 --> 00:39:19.482 Then you can use your GPS on your mobile phone 00:39:19.482 --> 00:39:21.860 to use for distance filtering. 00:39:21.860 --> 00:39:26.765 For example, if I want markers that are near me, you can do that. 00:39:26.765 --> 00:39:30.918 And when you click on the Details page, you can see the same thing, 00:39:30.918 --> 00:39:35.850 photos from Commons, inscription about the marker, 00:39:35.850 --> 00:39:40.484 how to find the marker, its location and address, etc.
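The distance filtering mentioned above ("markers that are near me") can be sketched with the haversine formula. This is a minimal Python illustration, not the app's actual code (the app is described as a JavaScript web app packaged with Cordova); the marker dictionaries with `lat` and `lon` keys are a hypothetical data shape.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def markers_within(markers, lat, lon, radius_km):
    """Return the markers within `radius_km` of (lat, lon), nearest first.

    Each marker is assumed to be a dict with "lat" and "lon" keys.
    """
    with_distance = [
        (haversine_km(lat, lon, m["lat"], m["lon"]), m) for m in markers
    ]
    with_distance.sort(key=lambda pair: pair[0])
    return [m for distance, m in with_distance if distance <= radius_km]


if __name__ == "__main__":
    sample = [
        {"name": "Marker near Manila", "lat": 14.5995, "lon": 120.9842},
        {"name": "Marker near Cebu", "lat": 10.3157, "lon": 123.8854},
    ]
    print(markers_within(sample, 14.60, 120.98, radius_km=50))
```

For a few thousand markers, a linear scan like this is usually fast enough on a phone, so no spatial index is assumed here.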
00:39:41.601 --> 00:39:45.993 And one thing that's unique for this app is you can, again, visit 00:39:46.011 --> 00:39:50.407 or put a bookmark of these, so on the map or on the list, 00:39:50.407 --> 00:39:51.692 or on the Details page, 00:39:51.692 --> 00:39:54.891 you can just tap on those buttons and say that you've visited them, 00:39:54.891 --> 00:39:58.520 or you'd like to bookmark them for future visits. 00:39:58.520 --> 00:40:03.527 And my app has been covered by the press and given recognition, 00:40:03.527 --> 00:40:06.743 so plenty of local press articles. 00:40:06.743 --> 00:40:11.281 Recently, it was selected as one of the Top 5 finalists 00:40:11.281 --> 00:40:15.247 for the Android Masters competition in the App for Social Good category. 00:40:15.247 --> 00:40:17.351 The final event will be next month. 00:40:17.351 --> 00:40:18.999 Hopefully, we'll win. 00:40:20.380 --> 00:40:22.378 Okay, so some behind the scenes. 00:40:22.378 --> 00:40:25.477 How did I develop this app? 00:40:25.477 --> 00:40:28.578 Panandâ is actually a hybrid app, it's not native. 00:40:28.578 --> 00:40:30.745 Basically it's just a web app packaged as a mobile app 00:40:30.745 --> 00:40:32.518 using Apache Cordova. 00:40:32.518 --> 00:40:34.026 That reduces development time 00:40:34.026 --> 00:40:36.181 because I don't have to learn a different language. 00:40:36.181 --> 00:40:37.769 I know JavaScript, HTML. 00:40:37.879 --> 00:40:42.131 It's cross-platform, allows code reuse from the Historical Markers Map. 00:40:42.385 --> 00:40:46.311 And the app is also free and open source, under the MIT license. 00:40:46.311 --> 00:40:49.429 So there's the GitHub repository over there. 00:40:50.469 --> 00:40:53.624 The challenge is the app's data is not live.
00:40:54.750 --> 00:40:56.820 Because if you query the data live, 00:40:56.843 --> 00:41:00.638 it means you're pulling around half a megabyte of compressed JSON every time 00:41:00.638 --> 00:41:03.594 which is not friendly for those on mobile data, 00:41:03.594 --> 00:41:06.723 incurs too much delay when starting the app, 00:41:06.723 --> 00:41:13.097 and if there are any errors in Wikidata, that may result in poor user experience. 00:41:14.253 --> 00:41:18.046 So instead, what I did was the app is updated every few months 00:41:18.046 --> 00:41:20.468 with fresh data, compiled using a Perl script 00:41:20.468 --> 00:41:23.037 that queries Wikidata Query Service, 00:41:23.037 --> 00:41:25.678 and this script also does some data validation 00:41:25.678 --> 00:41:30.944 to highlight consistency or schema errors, so that allows fixes before updates 00:41:30.944 --> 00:41:34.735 in order to provide a good experience for the mobile user. 00:41:35.174 --> 00:41:39.274 And here's the... if you're tech-oriented, here's the more or less, 00:41:39.274 --> 00:41:41.644 the technologies that I'm using. 00:41:41.644 --> 00:41:43.976 So a bunch of JavaScript libraries. 00:41:43.976 --> 00:41:46.287 Here's the Perl script that queries Wikidata, 00:41:46.287 --> 00:41:48.598 some Cordova plug-ins, 00:41:48.598 --> 00:41:53.035 and building it using Cordova and then publishing this app. 00:41:53.763 --> 00:41:55.586 And that's it. 00:41:55.748 --> 00:41:58.164 (audience clapping) 00:42:01.800 --> 00:42:04.072 (moderator) I hope you win. Alright, questions. 00:42:16.286 --> 00:42:17.990 (audience 14) Sorry if I missed this. 00:42:17.990 --> 00:42:21.317 Are you opening your code so the people can adapt your app 00:42:21.317 --> 00:42:24.501 and do it for other cities? 00:42:24.501 --> 00:42:28.516 Yes, as I've mentioned, the app is free and open source, 00:42:28.516 --> 00:42:31.095 - (audience 14) But where is it? - There's the GitHub repository.
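The "validate before bundling" step described in the talk might look roughly like the sketch below. The speaker's real script is written in Perl and its checks are not shown in the talk, so everything here is an assumption for illustration: the record fields (`qid`, `label`, `lat`, `lon`, `inscription`) and both function names are hypothetical.

```python
def validate_marker(record: dict) -> list:
    """Return a list of human-readable problems found in one marker record.

    An empty list means the record passed these (illustrative) checks.
    """
    problems = []
    if not record.get("label"):
        problems.append("missing label")
    lat, lon = record.get("lat"), record.get("lon")
    if lat is None or lon is None:
        problems.append("missing coordinates")
    elif not (-90 <= lat <= 90 and -180 <= lon <= 180):
        problems.append("coordinates out of range")
    if not record.get("inscription"):
        problems.append("missing inscription")
    return problems


def split_valid(records):
    """Separate clean records from problematic ones.

    Returns (valid_records, report), where report maps an item ID to its
    list of problems so they can be fixed on Wikidata before the next
    app update.
    """
    valid, report = [], {}
    for record in records:
        problems = validate_marker(record)
        if problems:
            report[record.get("qid", "?")] = problems
        else:
            valid.append(record)
    return valid, report


if __name__ == "__main__":
    # Placeholder IDs, not real Wikidata items.
    records = [
        {"qid": "Q1", "label": "Marker A", "lat": 14.6, "lon": 121.0,
         "inscription": "..."},
        {"qid": "Q2", "label": "", "lat": 95.0, "lon": 121.0},
    ]
    print(split_valid(records))
```

Running checks like these at build time, rather than in the app, matches the stated goal of keeping schema errors away from the mobile user.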
00:42:31.095 --> 00:42:33.610 You can download the slides, and there's a link 00:42:33.610 --> 00:42:36.841 in one of the previous slides to the repository. 00:42:36.841 --> 00:42:38.732 (audience 14) Okay. Can you put it? 00:42:42.392 --> 00:42:43.747 Yeah, at the bottom. 00:42:46.577 --> 00:42:49.222 (audience 15) Hi. Sorry, maybe I also missed this, 00:42:49.222 --> 00:42:51.628 but how do you check for schema errors? 00:42:53.055 --> 00:42:56.007 Basically, we have a Wikiproject on Wikidata, 00:42:56.106 --> 00:43:02.425 so we try to put together guidelines on how to model these markers correctly. 00:43:02.425 --> 00:43:05.190 Although it's not updated right now. 00:43:06.197 --> 00:43:09.023 As far as I know, we're the only country 00:43:09.023 --> 00:43:12.874 that's currently modeling these in Wikidata. 00:43:13.930 --> 00:43:20.152 There's also an effort to add [inaudible] 00:43:20.161 --> 00:43:22.411 in Wikidata, 00:43:22.474 --> 00:43:25.705 but I think that's a different thing altogether. 00:43:34.056 --> 00:43:35.895 (audience 16) So I guess this may be part 00:43:35.895 --> 00:43:37.725 of this Wikiproject you just described, 00:43:37.725 --> 00:43:42.800 but for the consistency checks, have you considered moving those 00:43:42.800 --> 00:43:46.743 into like complex schema constraints that then can be flagged 00:43:46.743 --> 00:43:50.583 on the Wikidata side for what there is to fix on there? 00:43:52.930 --> 00:43:55.547 I'm actually interested in seeing if I can do, for example, 00:43:55.598 --> 00:44:00.296 shape expressions, so that, yeah, we can do those things. 00:44:04.256 --> 00:44:06.776 (moderator) At this point, we have quite a few minutes left.
00:44:06.776 --> 00:44:09.026 The speakers did very well, so if Erica is okay with it, 00:44:09.026 --> 00:44:11.238 I'm also going to allow some time for questions, 00:44:11.238 --> 00:44:13.407 still about this presentation, but also about Mbabel, 00:44:13.407 --> 00:44:15.498 if anyone wants to jump in with something there, 00:44:15.498 --> 00:44:17.318 either presentation is fair game. 00:44:22.790 --> 00:44:25.639 Unless like me, you're all so dazzled that you just want to go to snacks 00:44:25.639 --> 00:44:27.955 and think about it. (audience giggles) 00:44:29.308 --> 00:44:31.179 - (moderator) You know... - Yeah. 00:44:31.953 --> 00:44:34.491 (audience 17) I will always have questions about everything. 00:44:34.491 --> 00:44:37.642 So, I came in late for the Mbabel tool. 00:44:37.642 --> 00:44:40.350 But I was looking through and I saw there's a number of templates, 00:44:40.350 --> 00:44:43.232 and I was wondering if there's a place to contribute 00:44:43.232 --> 00:44:45.564 to adding more templates for different types 00:44:45.564 --> 00:44:47.620 or different languages and the like? 00:44:50.497 --> 00:44:53.683 (Erica) So for now, we're developing those narrative templates 00:44:53.683 --> 00:44:55.566 on Portuguese Wikipedia. 00:44:55.566 --> 00:44:57.856 I can show you if you like. 00:44:57.856 --> 00:45:02.051 We're inserting those templates on English Wikipedia too. 00:45:02.051 --> 00:45:07.017 It's not complicated to do but we have to expand for other languages. 00:45:07.017 --> 00:45:08.236 - French? - French. 00:45:08.236 --> 00:45:10.465 - Yes. - French and German already have. 00:45:10.465 --> 00:45:11.465 (laughing) 00:45:12.002 --> 00:45:13.018 Yeah. 00:45:15.755 --> 00:45:18.287 (inaudible chatter) 00:45:21.756 --> 00:45:24.446 (audience 18) I also have a question about Mbabel, 00:45:24.446 --> 00:45:27.676 which is, is this really just templates? 00:45:27.676 --> 00:45:33.893 Is this based on the Lua scripting? Is that all? Wow. Okay.
00:45:33.956 --> 00:45:37.404 Yeah, so it's very deployable. Okay. Cool. 00:45:38.102 --> 00:45:40.199 (moderator) Just to catch that for the live stream, 00:45:40.199 --> 00:45:42.745 the answer was an emphatic nod of the head, and a yes. 00:45:42.915 --> 00:45:44.648 (audience laughing) 00:45:44.754 --> 00:45:47.203 - (Erica) Super simple. - (moderator) Super simple. 00:45:47.745 --> 00:45:49.819 (audience 19) Yeah. I would also like to ask. 00:45:49.819 --> 00:45:53.386 Sorry I haven't delved into Mbabel earlier. 00:45:53.386 --> 00:45:57.018 I'm wondering, you're working also with the links, the red links. 00:45:57.018 --> 00:46:00.052 Are you adding some code there? 00:46:03.987 --> 00:46:07.970 - (Erica) For the lists? - Wherever the link comes from... 00:46:07.970 --> 00:46:11.595 (audience 19) The architecture. Maybe I will have to look into it. 00:46:11.595 --> 00:46:13.355 (Erica) I'll show you later. 00:46:20.506 --> 00:46:23.221 (moderator) Alright. You're all ready for snack break, I can tell. 00:46:23.221 --> 00:46:24.456 So let's wrap it up. 00:46:24.456 --> 00:46:26.429 But our kind speakers, I'm sure will stick around 00:46:26.429 --> 00:46:27.958 if you have questions for them. 00:46:27.958 --> 00:46:31.179 Please join me in giving... first of all we didn't give a round of applause yet. 00:46:31.179 --> 00:46:33.221 I can tell you're interested in doing so. 00:46:33.221 --> 00:46:34.886 (audience clapping)