1 00:00:05,973 --> 00:00:07,908 Hi, guys! Can everybody hear me? 2 00:00:09,170 --> 00:00:11,898 So, hi! Nice to meet you all. I'm Erica Azzellini. 3 00:00:11,898 --> 00:00:14,606 I'm one of the Wiki Movement Brazil liaisons, 4 00:00:14,606 --> 00:00:17,829 and this is my first international Wikimedia event, 5 00:00:17,829 --> 00:00:21,023 so I'm super excited to be here and hopefully 6 00:00:21,023 --> 00:00:24,311 I will share something interesting with you all here in this lightning talk. 7 00:00:25,247 --> 00:00:30,441 So this work starts with research that I was developing in Brazil, 8 00:00:30,441 --> 00:00:34,219 Computational Journalism and Structured Narratives with Wikidata. 9 00:00:34,276 --> 00:00:35,958 So in journalism, 10 00:00:35,958 --> 00:00:39,616 they're using some natural language generation software 11 00:00:39,616 --> 00:00:41,418 for automating news 12 00:00:41,418 --> 00:00:46,535 for news stories that have quite similar narrative structures. 13 00:00:46,535 --> 00:00:51,600 And we developed this concept here of structured narratives, 14 00:00:51,600 --> 00:00:54,548 thinking about this practice in computational journalism, 15 00:00:54,548 --> 00:00:58,361 that is the development of verbal text, understandable by humans, 16 00:00:58,361 --> 00:01:01,274 automated from predetermined arrangements that process information 17 00:01:01,274 --> 00:01:05,395 from structured databases, which looks a lot like 18 00:01:05,395 --> 00:01:10,043 the Wikimedia universe and this tool that we developed. 19 00:01:10,043 --> 00:01:13,555 So, when I'm talking about verbal text understandable by humans, 20 00:01:13,555 --> 00:01:15,808 I'm talking about Wikipedia entries. 21 00:01:15,808 --> 00:01:17,778 When I'm talking about structured databases, 22 00:01:17,778 --> 00:01:20,017 of course, I'm talking about Wikidata here. 23 00:01:20,017 --> 00:01:22,777 And by predetermined arrangements, I'm talking about Mbabel, 24 00:01:22,777 --> 00:01:24,271 that is this tool.
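The "structured narrative" idea she defines here, a predetermined arrangement filled from a structured database, can be sketched in a few lines of Python. The item data and sentence template below are invented for illustration; they are not Mbabel's actual templates.

```python
# Minimal sketch of a "structured narrative": a predetermined sentence
# template filled from a structured record, the way computational
# journalism tools (and Mbabel) turn database statements into prose.
# The museum record and TEMPLATE here are hypothetical examples.

def generate(template: str, statements: dict) -> str:
    """Fill a predetermined sentence template from structured statements."""
    return template.format(**statements)

museum = {
    "label": "Museu Paulista",
    "instance_of": "history museum",
    "location": "São Paulo, Brazil",
    "inception": 1895,
}

TEMPLATE = (
    "{label} is a {instance_of} located in {location}, "
    "founded in {inception}."
)

print(generate(TEMPLATE, museum))
```

The point of the concept is that the template, not the writer, carries the narrative structure, so any item with the same shape of data yields a readable sentence.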
25 00:01:25,467 --> 00:01:31,216 The Mbabel tool was inspired by a template by user Pharos, right here in front of me, 26 00:01:31,279 --> 00:01:33,356 thank you very much, 27 00:01:33,356 --> 00:01:39,114 and it was developed with Ederporto, that is right here too, 28 00:01:39,114 --> 00:01:40,974 the brilliant Ederporto. 29 00:01:42,599 --> 00:01:44,498 We developed this tool 30 00:01:44,498 --> 00:01:47,780 that automatically generates Wikipedia entries 31 00:01:47,780 --> 00:01:50,600 based on information from Wikidata. 32 00:01:53,189 --> 00:01:58,130 We actually make some thematic templates 33 00:01:58,130 --> 00:02:01,152 that are created with the Wikidata module, 34 00:02:01,573 --> 00:02:03,716 the WikidataIB module, 35 00:02:03,716 --> 00:02:07,835 and these templates are pre-determined, generic and editable templates 36 00:02:07,835 --> 00:02:09,677 for various article themes. 37 00:02:09,677 --> 00:02:15,411 We realized that many Wikipedia entries had quite similar structured narratives, 38 00:02:15,411 --> 00:02:18,922 so we could create a tool that automatically generates them 39 00:02:18,922 --> 00:02:21,598 for many Wikidata items. 40 00:02:24,207 --> 00:02:28,571 Until now we have templates for museums, works of art, books, films, 41 00:02:28,571 --> 00:02:31,265 journals, earthquakes, libraries, archives, 42 00:02:31,265 --> 00:02:34,855 and Brazilian municipal and state elections, and growing. 43 00:02:34,855 --> 00:02:38,984 So, everybody here is able to contribute and create new templates. 44 00:02:38,984 --> 00:02:43,508 Each narrative template includes an introduction, a Wikidata infobox, 45 00:02:43,508 --> 00:02:46,158 section suggestions for the users, 46 00:02:46,158 --> 00:02:50,499 content tables or lists with Listeria, depending on the case, 47 00:02:50,499 --> 00:02:53,713 references and categories, and of course the sentences 48 00:02:53,713 --> 00:02:55,776 that are created with the Wikidata information.
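The behaviour she describes next, sentences created from whatever Wikidata information exists, can be sketched as conditional sentence emission: each optional sentence appears only when its statement is present, so richer items yield longer stubs. The property names and sentence fragments below are illustrative assumptions, not Mbabel's real templates.

```python
# Sketch of sentence generation driven by available statements:
# a sentence is emitted only when its property is filled, so the more
# properties an item has, the more text the stub gets.
# SENTENCES and the example items are invented for illustration.

SENTENCES = [
    ("inception", "It was created in {inception}."),
    ("founder",   "It was founded by {founder}."),
    ("visitors",  "It received {visitors} visitors per year."),
]

def stub(statements: dict) -> str:
    parts = ["{label} is a {instance_of}.".format(**statements)]
    for prop, sentence in SENTENCES:
        if prop in statements:  # skip sentences whose data is missing
            parts.append(sentence.format(**statements))
    return " ".join(parts)

sparse = {"label": "Museu do Ipiranga", "instance_of": "museum"}
rich = dict(sparse, inception=1895, founder="São Paulo state")

print(stub(sparse))
print(stub(rich))
```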
49 00:02:55,776 --> 00:02:58,642 I'm gonna show you in a sec an example of that. 50 00:03:00,137 --> 00:03:05,749 It's an integration with Wikipedia, integration with Wikidata, 51 00:03:05,749 --> 00:03:08,760 so the more properties properly filled on Wikidata, 52 00:03:08,760 --> 00:03:12,311 the more text entries you'll get on your article stub. 53 00:03:12,857 --> 00:03:15,623 That's very important to highlight here. 54 00:03:16,343 --> 00:03:18,969 Structuring this data on Wikidata can get more complex, 55 00:03:18,969 --> 00:03:22,017 as I'm going to show you on the election projects that we've made. 56 00:03:22,017 --> 00:03:26,552 So I'm going to leave you here this Wikidata Lab XIV 57 00:03:26,552 --> 00:03:29,471 for after this lightning talk 58 00:03:29,471 --> 00:03:32,259 that is very brief, so you'll be able to check 59 00:03:32,259 --> 00:03:34,554 on the work that we've been doing on structuring Wikidata 60 00:03:34,554 --> 00:03:36,005 for this purpose too. 61 00:03:37,272 --> 00:03:39,725 We have this challenge to build a narrative template 62 00:03:39,725 --> 00:03:44,383 that is generic enough to cover different Wikidata items 63 00:03:44,383 --> 00:03:46,347 and to overcome the gender 64 00:03:46,347 --> 00:03:50,359 and number difficulties of languages, 65 00:03:52,054 --> 00:03:54,252 and still sound natural for the user, 66 00:03:54,252 --> 00:03:59,252 because if it sounds off, it doesn't click for the user 67 00:03:59,252 --> 00:04:00,546 to edit after that. 68 00:04:01,956 --> 00:04:07,625 This is how the Mbabel tool looks in the bottom form. 69 00:04:07,625 --> 00:04:14,507 You just have to insert the item number there and call the desired template, 70 00:04:14,507 --> 00:04:21,673 and then you have an article to edit and expand, and everything. 71 00:04:22,135 --> 00:04:26,856 So, more importantly, why did we do it?
Not because it's cool to develop 72 00:04:26,856 --> 00:04:30,922 things here in Wikidata, we know, we all know about it. 73 00:04:30,922 --> 00:04:36,178 But we are experimenting with this integration from Wikidata to Wikipedia, 74 00:04:36,178 --> 00:04:39,226 and we want to focus on meaningful individual contributions. 75 00:04:39,226 --> 00:04:42,608 So we've been working on education programs 76 00:04:42,608 --> 00:04:45,067 and we want the students to feel the value 77 00:04:45,067 --> 00:04:47,280 of their entries too, but not only-- 78 00:04:47,280 --> 00:04:49,405 Oh, five minutes only, geez, I'm gonna rush here. 79 00:04:49,405 --> 00:04:50,599 (laughing) 80 00:04:50,794 --> 00:04:54,160 And we also want to ease tasks for users in general, 81 00:04:54,270 --> 00:04:57,801 especially on tables and this kind of content 82 00:04:57,801 --> 00:04:59,988 that is a bit of a rush to do. 83 00:05:02,456 --> 00:05:05,523 And we're working on this concept of abstract Wikipedia. 84 00:05:05,523 --> 00:05:09,269 Denny Vrandečić wrote a super interesting article about it, 85 00:05:09,269 --> 00:05:11,500 so I linked it here too. 86 00:05:11,500 --> 00:05:14,792 And we also want now to support small language communities 87 00:05:14,792 --> 00:05:17,845 to fill the lack of content there. 88 00:05:18,784 --> 00:05:23,885 This is an example of how we've been using this Mbabel tool for GLAM 89 00:05:23,885 --> 00:05:25,748 and education programs, 90 00:05:25,748 --> 00:05:29,861 and I showed you earlier the bottom form of the Mbabel tool, 91 00:05:29,861 --> 00:05:34,264 but also we can make red links that aren't exactly empty. 92 00:05:34,264 --> 00:05:35,931 So you click on this red link 93 00:05:35,931 --> 00:05:38,862 and you automatically have this article draft 94 00:05:38,862 --> 00:05:41,660 on your user page to edit. 95 00:05:42,964 --> 00:05:48,762 And I'm going to briefly talk about it because I only have some minutes more.
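One common way a red link can open a ready-made draft, as she describes, is MediaWiki's preload parameter on edit URLs, which pre-fills the edit box with the wikitext of another page. Whether Mbabel uses exactly this mechanism isn't stated in the talk, and the host and page names below are hypothetical.

```python
# Sketch of a "red link that isn't exactly empty": a MediaWiki edit URL
# whose preload parameter names a page that seeds the edit box with a
# generated draft. Host, title and preload page are invented examples.
from urllib.parse import urlencode

def draft_link(host: str, title: str, preload_page: str) -> str:
    query = urlencode({
        "title": title,
        "action": "edit",
        "preload": preload_page,  # page whose wikitext pre-fills the editor
    })
    return f"https://{host}/w/index.php?{query}"

print(draft_link("pt.wikipedia.org",
                 "Usuária:Example/Museu Exemplo",
                 "Predefinição:Mbabel/Museu"))
```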
96 00:05:50,009 --> 00:05:51,356 On educational projects, 97 00:05:51,356 --> 00:05:56,799 we've been doing this with elections in Brazil for journalism students. 98 00:05:56,799 --> 00:06:01,993 We have the experience with the [inaudible] students 99 00:06:02,087 --> 00:06:05,314 with user Joalpe-- he's not here right now, 100 00:06:05,314 --> 00:06:07,867 but we all know him, I think. 101 00:06:07,867 --> 00:06:11,930 And we realized that we have the data about Brazilian elections 102 00:06:11,930 --> 00:06:14,748 but we don't have media coverage of it. 103 00:06:15,049 --> 00:06:18,249 So we were also lacking Wikipedia entries on it. 104 00:06:19,029 --> 00:06:23,000 How do we insert this meaningful information on Wikipedia, 105 00:06:23,000 --> 00:06:24,672 which people really access? 106 00:06:24,672 --> 00:06:27,989 Next year we're going to have some elections; 107 00:06:27,989 --> 00:06:30,710 people are going to look for this kind of information on Wikipedia 108 00:06:30,710 --> 00:06:32,433 and they simply won't find it. 109 00:06:32,433 --> 00:06:35,726 So this tool looks quite useful for this purpose, 110 00:06:35,726 --> 00:06:40,214 and the students were introduced, not only to Wikipedia, 111 00:06:40,214 --> 00:06:42,701 but also to Wikidata. 112 00:06:42,701 --> 00:06:46,575 Actually, they were introduced to Wikipedia with Wikidata, 113 00:06:46,575 --> 00:06:50,675 which is a super interesting experience, and we had a lot of fun, 114 00:06:50,675 --> 00:06:52,823 and it was quite challenging to organize all that. 115 00:06:52,823 --> 00:06:54,513 We can talk about it later too. 116 00:06:54,979 --> 00:06:58,582 And they also added the background and the analysis sections 117 00:06:58,582 --> 00:07:01,663 on these election articles, 118 00:07:01,663 --> 00:07:05,336 because we don't want them to just simply automate the content there. 119 00:07:05,336 --> 00:07:06,660 We can do better.
120 00:07:06,660 --> 00:07:09,247 So this is the example I'm going to show you. 121 00:07:09,247 --> 00:07:13,106 This is from a municipal election in Brazil. 122 00:07:15,603 --> 00:07:17,121 Two minutes... oh my! 123 00:07:18,577 --> 00:07:23,268 This example here was entirely created with the Mbabel tool. 124 00:07:23,268 --> 00:07:29,496 You have here this introduction text. It really sounds natural for the reader. 125 00:07:29,496 --> 00:07:32,165 The Wikidata infobox here-- 126 00:07:32,165 --> 00:07:34,907 it's a masterpiece of Ederporto right there. 127 00:07:34,907 --> 00:07:36,769 (laughter) 128 00:07:37,438 --> 00:07:42,456 And we have here the tables with the election results for each position. 129 00:07:42,456 --> 00:07:46,415 And we also have these results here in textual form too, 130 00:07:46,415 --> 00:07:51,767 so it really looks like an article that was made, that was handcrafted. 131 00:07:53,893 --> 00:07:57,814 The references here were also made with the Mbabel tool, 132 00:07:57,814 --> 00:08:01,393 and we used identifiers to build these references here, 133 00:08:01,393 --> 00:08:03,167 and the categories too. 134 00:08:10,726 --> 00:08:14,999 So, to wrap things up here, it is still a work in progress, 135 00:08:14,999 --> 00:08:19,326 and we have some outreach and technical challenges 136 00:08:19,326 --> 00:08:22,999 to bring Mbabel to other language communities, 137 00:08:22,999 --> 00:08:24,844 especially the smaller ones, 138 00:08:24,844 --> 00:08:27,210 and how do we support those tools 139 00:08:27,210 --> 00:08:29,819 in lower-resource language communities too. 140 00:08:29,819 --> 00:08:33,991 And finally, is it possible to create an Mbabel 141 00:08:33,991 --> 00:08:36,261 that overcomes language barriers? 142 00:08:36,261 --> 00:08:39,740 I think that's a very interesting question for the conference, 143 00:08:39,740 --> 00:08:43,835 and hopefully we can figure that out together.
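The results tables she shows in the election article can be pictured as structured rows rendered into wikitext table markup. The candidate data below is invented, and Mbabel builds its tables inside wikitext/Lua templates rather than Python; this is only a sketch of the transformation.

```python
# Sketch of turning structured election results into a wikitext table
# like the one in the generated article. The rows are invented examples.

results = [
    ("Mayor",      "Candidate A", 53412),
    ("Mayor",      "Candidate B", 41080),
    ("Councillor", "Candidate C", 12955),
]

def wikitable(rows) -> str:
    lines = ['{| class="wikitable"',
             "! Position !! Candidate !! Votes"]
    for position, candidate, votes in rows:
        lines.append("|-")
        lines.append(f"| {position} || {candidate} || {votes}")
    lines.append("|}")
    return "\n".join(lines)

print(wikitable(results))
```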
144 00:08:44,818 --> 00:08:49,799 So, thank you very much, and look for the Mbabel poster downstairs 145 00:08:49,799 --> 00:08:53,615 if you'd like to have all this information wrapped up, okay? 146 00:08:53,615 --> 00:08:55,038 Thank you. 147 00:08:55,288 --> 00:08:57,564 (audience clapping) 148 00:09:00,311 --> 00:09:02,778 (moderator) I'm afraid we're a little too short for questions, 149 00:09:02,778 --> 00:09:05,783 but yes, Erica, as she said, has a poster and is very friendly. 150 00:09:05,783 --> 00:09:07,518 So I'm sure you can talk to her afterwards, 151 00:09:07,518 --> 00:09:09,389 and if there's time at the end, I'll allow it. 152 00:09:09,389 --> 00:09:12,131 But in the meantime, I'd like to bring up our next speaker... 153 00:09:12,237 --> 00:09:13,611 Thank you. 154 00:09:15,549 --> 00:09:17,140 (audience chattering) 155 00:09:23,058 --> 00:09:27,016 Next we've got Yolanda Gil, talking about Wikidata and Geosciences. 156 00:09:27,908 --> 00:09:29,031 Thank you. 157 00:09:29,031 --> 00:09:31,624 I come from the University of Southern California, 158 00:09:31,624 --> 00:09:35,164 and I've been working with semantic technologies for a long time. 159 00:09:35,164 --> 00:09:37,894 I want to talk about geosciences in particular, 160 00:09:37,894 --> 00:09:41,225 where this idea of crowd-sourcing from the community is very important. 161 00:09:41,791 --> 00:09:45,033 So I'll give you a sense: individual scientists, 162 00:09:45,033 --> 00:09:47,070 most of them in colleges, 163 00:09:47,070 --> 00:09:50,085 collect their own data for their particular project. 164 00:09:50,085 --> 00:09:51,932 They describe it in their own way. 165 00:09:51,932 --> 00:09:55,352 They use their own properties, their own metadata characteristics. 166 00:09:55,352 --> 00:09:58,560 This is an example of some collaborators of mine 167 00:09:58,560 --> 00:10:00,124 that collect data from a river.
168 00:10:00,124 --> 00:10:02,091 They have their own sensors, their own robots, 169 00:10:02,091 --> 00:10:05,339 and they study the water quality. 170 00:10:05,339 --> 00:10:11,423 I'm going to talk today about an effort that we did to crowdsource metadata 171 00:10:11,423 --> 00:10:14,712 for a community that works in paleoclimate. 172 00:10:14,712 --> 00:10:17,747 The article just came out, so it's in the slides if you're curious, 173 00:10:17,747 --> 00:10:20,619 but it's a pretty large community that works together 174 00:10:20,619 --> 00:10:24,042 to integrate data more efficiently through crowdsourcing. 175 00:10:24,042 --> 00:10:28,631 So, if you've heard of the hockey stick graphic for climate, 176 00:10:28,631 --> 00:10:31,680 this is the community that does this. 177 00:10:31,680 --> 00:10:34,520 This is a study of climate in the last 2,000 years, 178 00:10:34,520 --> 00:10:38,188 and it takes them literally many years to look at data 179 00:10:38,188 --> 00:10:39,618 from different parts of the globe. 180 00:10:39,618 --> 00:10:42,607 Each dataset is collected by a different investigator. 181 00:10:42,699 --> 00:10:44,433 The data is very, very different, 182 00:10:44,433 --> 00:10:47,017 so it takes them a long time to put together 183 00:10:47,017 --> 00:10:49,230 these global studies of climate, 184 00:10:49,230 --> 00:10:51,665 and our goal is to make that more efficient. 185 00:10:51,665 --> 00:10:53,690 So, I've done a lot of work over the years. 186 00:10:53,690 --> 00:10:56,585 Going back to 2005, we used to call it 187 00:10:56,585 --> 00:10:59,615 "Knowledge Collection from Web Volunteers," 188 00:10:59,615 --> 00:11:02,236 or from netizens at that time. 189 00:11:02,236 --> 00:11:04,267 We had a system called "Learner." 190 00:11:04,267 --> 00:11:07,048 It collected 700,000 common sense, 191 00:11:07,048 --> 00:11:09,368 common knowledge statements about the world. 192 00:11:09,368 --> 00:11:11,367 We did a lot of different techniques.
193 00:11:11,367 --> 00:11:15,333 The forms that we did to extract knowledge from volunteers 194 00:11:15,333 --> 00:11:19,136 really fit the knowledge models, the data models that we used 195 00:11:19,136 --> 00:11:21,381 and the properties that we wanted to use. 196 00:11:21,381 --> 00:11:25,051 I worked with Denny on a system called "Shortipedia" 197 00:11:25,051 --> 00:11:27,259 when he was a postdoc at ISI, 198 00:11:27,259 --> 00:11:31,946 looking at keeping track of the provenance of the assertions, 199 00:11:31,946 --> 00:11:35,129 and we started to build on Semantic MediaWiki software. 200 00:11:35,129 --> 00:11:37,113 So everything that I'm going to describe today 201 00:11:37,113 --> 00:11:38,936 builds on that software, 202 00:11:38,936 --> 00:11:41,117 but I think that now we have Wikibase, 203 00:11:41,117 --> 00:11:43,676 we'll be starting to work more on Wikibase. 204 00:11:43,676 --> 00:11:48,935 So LinkedEarth is the project where we work with paleoclimate scientists 205 00:11:48,935 --> 00:11:50,636 to crowdsource the metadata, 206 00:11:50,636 --> 00:11:54,328 and you see in the title that we said "controlled crowdsourcing." 207 00:11:54,328 --> 00:11:57,101 So we found a nice niche 208 00:11:57,101 --> 00:12:00,538 where we could let them create new properties, 209 00:12:00,538 --> 00:12:02,599 but we had an editorial process for it. 210 00:12:02,599 --> 00:12:04,444 So I'll describe to you how it works. 211 00:12:04,444 --> 00:12:10,055 For them, if you're looking at a sample from lake sediments from 200 years ago, 212 00:12:10,055 --> 00:12:12,622 you use different properties to describe it 213 00:12:12,622 --> 00:12:15,692 than if you have coral sediments that you're looking at, 214 00:12:15,692 --> 00:12:18,979 or coral samples that you're looking at that you extract from the ocean. 215 00:12:18,979 --> 00:12:23,532 Palmyra is a coral atoll in the Pacific.
216 00:12:23,532 --> 00:12:27,918 So if you have coral, you care about the species and the genus, 217 00:12:27,918 --> 00:12:31,691 but if you're just looking at lake sand, you don't have that. 218 00:12:31,691 --> 00:12:35,313 So each type of sample has very different properties. 219 00:12:35,313 --> 00:12:38,798 In LinkedEarth, they're able to see in a map 220 00:12:38,798 --> 00:12:40,264 where the datasets are. 221 00:12:40,264 --> 00:12:45,500 They actually annotate their own datasets or the datasets of other researchers 222 00:12:45,500 --> 00:12:46,787 when they're using them. 223 00:12:46,787 --> 00:12:50,254 So they have a reason why they want certain properties 224 00:12:50,254 --> 00:12:52,289 to describe those datasets. 225 00:12:52,289 --> 00:12:56,683 Whenever there are disagreements, or whenever there are agreements, 226 00:12:56,683 --> 00:12:58,595 there are community discussions about them, 227 00:12:58,595 --> 00:13:02,894 and there are also polls to decide what properties to settle on. 228 00:13:02,894 --> 00:13:05,659 So it's a nice ecosystem. I'll give you examples. 229 00:13:05,659 --> 00:13:11,322 You look at a particular dataset; in this case it's a lake in Africa. 230 00:13:11,322 --> 00:13:14,241 So you have the category of the page; it can be a dataset, 231 00:13:14,241 --> 00:13:15,491 it can be other things. 232 00:13:15,491 --> 00:13:21,181 You can download the dataset itself and you have kind of canonical properties 233 00:13:21,181 --> 00:13:23,737 that they have all agreed to have for datasets, 234 00:13:23,737 --> 00:13:25,992 and then under Extra Information, 235 00:13:25,992 --> 00:13:29,369 those are properties that the person describing this dataset 236 00:13:29,369 --> 00:13:31,007 added on their own accord. 237 00:13:31,007 --> 00:13:32,628 So these can be new properties. 238 00:13:32,628 --> 00:13:36,730 We call them "crowd properties," rather than "core properties."
239 00:13:37,291 --> 00:13:41,319 And then when you're describing your dataset, 240 00:13:41,319 --> 00:13:43,774 in this case it's an ice core that you got 241 00:13:43,774 --> 00:13:45,716 from a glacier dataset, 242 00:13:45,765 --> 00:13:49,178 and you're adding a dataset and you want to talk about measurements, 243 00:13:49,178 --> 00:13:54,073 you have an offering of all the existing properties 244 00:13:54,073 --> 00:13:55,278 that match what you're saying. 245 00:13:55,278 --> 00:13:58,409 So we do this search completion so that you can adopt that. 246 00:13:58,409 --> 00:14:00,140 That promotes normalization. 247 00:14:00,140 --> 00:14:04,260 The core of the properties has been agreed on by the community, 248 00:14:04,260 --> 00:14:06,220 so we're really extending that core. 249 00:14:06,220 --> 00:14:08,795 And that core is very important because it gives structure 250 00:14:08,795 --> 00:14:10,735 to all the extensions. 251 00:14:10,735 --> 00:14:14,382 We engage the community in many different ways. 252 00:14:14,382 --> 00:14:17,260 We had one face-to-face meeting at the beginning, 253 00:14:17,260 --> 00:14:21,611 and after about a year and a half, we do have a new standard 254 00:14:21,611 --> 00:14:25,154 and a new way for them to continue to evolve that standard. 255 00:14:25,154 --> 00:14:30,569 They have editors, very much in the Wikipedia style 256 00:14:30,569 --> 00:14:31,582 of editorial boards. 257 00:14:31,582 --> 00:14:34,098 They have working groups for different types of data. 258 00:14:34,098 --> 00:14:36,090 They do polls with the community, 259 00:14:36,090 --> 00:14:40,879 and they have pretty nice engagement of the community at large, 260 00:14:40,879 --> 00:14:43,706 even if they've never visited our wiki.
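The search-completion step she describes, offering existing properties that match what the annotator types so that shared vocabulary wins out, can be sketched as a prefix match over the core and crowd vocabularies. The property lists below are illustrative, not LinkedEarth's real ontology.

```python
# Sketch of property search completion: matching core properties are
# offered before matching crowd properties, nudging annotators toward
# the agreed vocabulary. Both lists are invented examples.

CORE_PROPERTIES = ["archiveType", "collectedBy", "measuredVariable"]
CROWD_PROPERTIES = ["measurementDepth", "measurementStandard"]

def suggest(prefix: str):
    """Return properties matching the typed prefix, core ones first."""
    prefix = prefix.lower()
    core = [p for p in CORE_PROPERTIES if p.lower().startswith(prefix)]
    crowd = [p for p in CROWD_PROPERTIES if p.lower().startswith(prefix)]
    return core + crowd

print(suggest("meas"))
```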
261 00:14:43,706 --> 00:14:46,183 The metadata evolves, 262 00:14:46,183 --> 00:14:48,775 so what we do is that people annotate their datasets, 263 00:14:48,775 --> 00:14:52,321 then the schema evolves, the properties evolve, 264 00:14:52,321 --> 00:14:55,379 and we have an entire infrastructure and mechanisms 265 00:14:55,379 --> 00:15:00,336 to re-annotate the datasets with the new structure of the ontology 266 00:15:00,336 --> 00:15:01,711 and the new properties. 267 00:15:01,711 --> 00:15:05,210 This is described in the paper. I won't go into the details. 268 00:15:05,210 --> 00:15:07,583 But I think that having that kind of capability 269 00:15:07,583 --> 00:15:10,342 in Wikibase would be really interesting. 270 00:15:10,342 --> 00:15:14,041 We basically extended Semantic MediaWiki and MediaWiki 271 00:15:14,041 --> 00:15:15,722 to create our own infrastructure. 272 00:15:15,722 --> 00:15:18,855 I think a lot of this is now something that we find in Wikibase, 273 00:15:18,961 --> 00:15:20,615 but this is older than that. 274 00:15:20,615 --> 00:15:24,999 And in general, we have many projects where we look at crowdsourcing 275 00:15:24,999 --> 00:15:29,885 not just descriptions of datasets but also descriptions of hydrology models, 276 00:15:29,885 --> 00:15:33,563 descriptions of multi-step data analytic workflows, 277 00:15:33,563 --> 00:15:36,080 and many other things in the sciences. 278 00:15:36,080 --> 00:15:42,833 So we are also interested in including in Wikidata additional things 279 00:15:42,833 --> 00:15:46,250 that are not just datasets or entities 280 00:15:46,250 --> 00:15:48,512 but also other things that have to do with science. 281 00:15:48,512 --> 00:15:53,770 I think the geosciences are more complex in this sense than biology, for example. 282 00:15:54,923 --> 00:15:56,233 That's it. 283 00:15:56,513 --> 00:15:57,885 Thank you. (audience clapping) 284 00:16:01,640 --> 00:16:03,772 - Do I have time for questions? - Yes.
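The re-annotation mechanism she mentions, migrating stored annotations when the schema evolves, can be sketched in its simplest form as a rename map applied to every dataset's properties. The rename map and records below are invented examples; the actual LinkedEarth mechanism is described in their paper.

```python
# Sketch of re-annotating datasets under an evolved schema: properties
# renamed in the new ontology release are rewritten in stored records.
# The rename map and the example record are hypothetical.

RENAMES = {"proxyType": "archiveType"}  # old property name -> new name

def reannotate(dataset: dict) -> dict:
    """Rewrite a dataset's annotations using the current schema names."""
    return {RENAMES.get(prop, prop): value
            for prop, value in dataset.items()}

old = {"proxyType": "lake sediment", "collectedBy": "Example Lab"}
print(reannotate(old))
```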
285 00:16:03,772 --> 00:16:06,871 (moderator) We have time for just a couple of short questions. 286 00:16:07,751 --> 00:16:11,342 When answering, can you go back to the microphone? 287 00:16:12,529 --> 00:16:14,520 - Yes. - Hopefully, yeah. 288 00:16:21,314 --> 00:16:25,002 (audience 1) Does the structure allow tabular datasets to be described, 289 00:16:25,002 --> 00:16:26,988 and can you talk a bit about that? 290 00:16:27,225 --> 00:16:32,667 Yes. So the properties of the datasets talk more about who collected them, 291 00:16:32,667 --> 00:16:36,759 what kind of data was collected, what kind of sample it was, 292 00:16:36,759 --> 00:16:39,790 and then there's a separate standard, which is called "LiPD," 293 00:16:39,790 --> 00:16:43,065 that's complementary and mapped to the properties, 294 00:16:43,065 --> 00:16:46,994 that describes the format of the actual files 295 00:16:47,075 --> 00:16:49,343 and the actual structure of the data. 296 00:16:49,343 --> 00:16:53,631 So, you're right that there's both, "How do I find data about x?" 297 00:16:53,631 --> 00:16:55,557 but also, "Now, how do I use it? 298 00:16:55,557 --> 00:17:00,211 How do I know where the temperature that I'm looking for 299 00:17:00,211 --> 00:17:03,013 is actually in the file?" 300 00:17:03,656 --> 00:17:05,394 (moderator) This will be the last. 301 00:17:06,887 --> 00:17:09,034 (audience 2) I'll have to make it relevant. 302 00:17:09,504 --> 00:17:15,667 So, you have shown this process of how users can suggest 303 00:17:15,667 --> 00:17:18,985 or, like, actually already put in properties, 304 00:17:18,985 --> 00:17:22,705 and I didn't fully understand how this thing works, 305 00:17:22,705 --> 00:17:24,027 or what's the process behind it. 306 00:17:24,027 --> 00:17:28,045 Is there some kind of folksonomy approach--obviously-- 307 00:17:28,045 --> 00:17:33,387 but how is it promoted into the core vocabulary, 308 00:17:33,387 --> 00:17:36,255 if something is promoted?
309 00:17:36,255 --> 00:17:37,882 Yes, yes. It is. 310 00:17:37,882 --> 00:17:42,202 So what we do is we have a core ontology and the initial one was actually 311 00:17:42,202 --> 00:17:45,618 very thoughtfully put together through a lot of discussion 312 00:17:45,618 --> 00:17:47,964 by very few people. 313 00:17:47,964 --> 00:17:51,052 And then the idea was the whole community can extend that 314 00:17:51,052 --> 00:17:52,971 or propose changes to that. 315 00:17:52,971 --> 00:17:56,919 So, as they are describing datasets, they can add new properties 316 00:17:56,919 --> 00:17:59,526 and those become "crowd properties." 317 00:17:59,526 --> 00:18:02,941 And every now and then, the Editorial Committee 318 00:18:02,941 --> 00:18:04,367 looks at all of those properties, 319 00:18:04,367 --> 00:18:07,795 the working groups look at all of those crowd properties, 320 00:18:07,795 --> 00:18:11,714 and decide whether to incorporate them into the main ontology. 321 00:18:11,714 --> 00:18:15,804 So it could be because they're used for a lot of dataset descriptions. 322 00:18:15,804 --> 00:18:18,920 It could be because they are proposed by somebody 323 00:18:18,920 --> 00:18:23,339 and they're found to be really interesting or key, or uncontroversial. 324 00:18:23,339 --> 00:18:30,267 So there's an entire editorial process to incorporate those new crowd properties 325 00:18:30,267 --> 00:18:32,188 or the folksonomy part of it, 326 00:18:32,188 --> 00:18:36,308 but they are really built around the core of the ontology. 327 00:18:36,404 --> 00:18:40,280 The core ontology then grows with more crowd properties 328 00:18:40,280 --> 00:18:44,311 and then people propose additional crowd properties again. 329 00:18:44,311 --> 00:18:46,979 So we've gone through a couple of these iterations 330 00:18:46,979 --> 00:18:51,386 of rolling out a new core, and then extending it, 331 00:18:51,386 --> 00:18:55,570 and then rolling out a new core and then extending it. 
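The promotion loop she outlines, crowd properties reviewed every now and then and folded into the next core release, can be sketched as a usage count that surfaces candidates for the editorial committee. The threshold, property names, and annotation counts below are invented for illustration.

```python
# Sketch of the editorial promotion step: crowd properties used in many
# dataset descriptions become candidates for the next core ontology.
# Editors still decide; this only surfaces candidates. All data invented.
from collections import Counter

def promotion_candidates(annotations, core, min_uses=3):
    """Crowd properties used at least min_uses times, for editor review."""
    usage = Counter(prop for dataset in annotations for prop in dataset)
    return sorted(p for p, n in usage.items()
                  if p not in core and n >= min_uses)

core = {"archiveType", "collectedBy"}
annotations = [
    {"archiveType", "measurementDepth"},
    {"archiveType", "measurementDepth", "labName"},
    {"measurementDepth", "collectedBy"},
]

print(promotion_candidates(annotations, core))
```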
332 00:18:55,570 --> 00:18:57,779 - (audience 2) Great. Thank you. - Thanks. 333 00:18:57,779 --> 00:19:00,437 (moderator) Thank you. (audience applauding) 334 00:19:02,295 --> 00:19:03,777 (moderator) Thank you, Yolanda. 335 00:19:03,777 --> 00:19:07,494 And now we have Adam Shorland with "Something About Wikibase," 336 00:19:07,599 --> 00:19:09,299 according to the title. 337 00:19:09,708 --> 00:19:12,956 Uh... where's the internet? There it is. 338 00:19:13,245 --> 00:19:18,925 So, I'm going to do a live demo, which is probably a bad idea, 339 00:19:18,925 --> 00:19:21,362 but I'm going to try and do it as the birthday present later, 340 00:19:21,362 --> 00:19:24,268 so I figure I might as well try it here. 341 00:19:24,292 --> 00:19:27,304 And I also have some notes on my phone because I have no slides. 342 00:19:29,349 --> 00:19:32,248 So, two years ago, I made these Wikibase Docker images 343 00:19:32,248 --> 00:19:34,052 that quite a few people have tried out, 344 00:19:34,052 --> 00:19:38,087 and even before then, I was working on another project, 345 00:19:38,087 --> 00:19:42,363 which is kind of ready now, and here it is. 346 00:19:43,690 --> 00:19:46,832 It's a website that allows you to instantly create a Wikibase 347 00:19:46,900 --> 00:19:48,930 with a query service and QuickStatements, 348 00:19:48,930 --> 00:19:51,616 without needing to know about any of the technical details, 349 00:19:51,616 --> 00:19:54,295 and without needing to manage any of them either. 350 00:19:54,295 --> 00:19:57,054 There are still lots of features to go and there are still some bugs, 351 00:19:57,054 --> 00:19:59,348 but here goes the demo. 352 00:19:59,348 --> 00:20:02,628 Let me get my emails up ready... because I need them too... 353 00:20:03,315 --> 00:20:06,514 Da da da... Stopwatch. 354 00:20:07,272 --> 00:20:08,488 Okay. 355 00:20:08,829 --> 00:20:14,253 So it's as simple as... at the moment it's locked down behind... 356 00:20:14,337 --> 00:20:16,495 Oh no! German keyboard!
357 00:20:16,495 --> 00:20:18,703 (audience laughing) 358 00:20:22,556 --> 00:20:23,923 Foiled... okay. 359 00:20:24,955 --> 00:20:26,214 Okay. 360 00:20:26,634 --> 00:20:28,417 (audience continues to laugh) 361 00:20:30,434 --> 00:20:31,989 Aha! Okay. 362 00:20:32,950 --> 00:20:35,335 I'll remember that for later. (laughs) 363 00:20:36,911 --> 00:20:38,119 Yes. 364 00:20:39,438 --> 00:20:40,855 ♪ (humming) ♪ 365 00:20:40,961 --> 00:20:44,932 Oh my god... now it's American. 366 00:20:53,871 --> 00:20:56,131 All you have to do is create an account... 367 00:20:58,570 --> 00:21:00,007 da da da... 368 00:21:00,566 --> 00:21:02,432 Click this button up here... 369 00:21:02,478 --> 00:21:05,512 Come up with a name for Wiki-- "Demo1" 370 00:21:05,862 --> 00:21:07,299 "Demo1" 371 00:21:07,568 --> 00:21:09,135 "Demo user" 372 00:21:09,203 --> 00:21:11,864 Agree to the terms which don't really exist yet. 373 00:21:12,298 --> 00:21:14,247 (audience laughing) 374 00:21:15,264 --> 00:21:17,698 Click on this thing which isn't a link. 375 00:21:21,519 --> 00:21:23,886 And then you have your Wikibase. 376 00:21:23,886 --> 00:21:26,602 (audience cheers and claps) 377 00:21:28,554 --> 00:21:30,421 Anmelden in German. 378 00:21:30,421 --> 00:21:35,126 Demo... oh god! I'm learning lots about my demo later. 379 00:21:35,569 --> 00:21:40,069 1-6-1-4-S-G... 380 00:21:40,166 --> 00:21:42,567 - (audience 3) Y... - (Adam) It's random. 381 00:21:43,016 --> 00:21:44,567 (audience laughing) 382 00:21:46,237 --> 00:21:47,958 Oh, come on.... (audience laughing) 383 00:21:48,001 --> 00:21:50,543 Oh no. It's because this is a capital U... 384 00:21:51,333 --> 00:21:53,283 (audience chattering) 385 00:21:54,453 --> 00:21:56,545 6-1-4.... 386 00:21:57,465 --> 00:22:01,248 S-G-ENJ... 387 00:22:01,623 --> 00:22:03,794 Is J... oh no. That's... oh yeah. Okay. 388 00:22:03,843 --> 00:22:06,242 I'm really... 
I'm gonna have to look at the laptop 389 00:22:06,242 --> 00:22:07,836 that I'm doing this on later. 390 00:22:07,836 --> 00:22:09,129 Cool... 391 00:22:11,046 --> 00:22:13,709 Da da da da da... 392 00:22:14,687 --> 00:22:17,040 Maybe I should have some things in my clipboard ready. 393 00:22:17,539 --> 00:22:19,093 Okay, so now I'm logged in. 394 00:22:22,631 --> 00:22:25,065 Oh... keyboards. 395 00:22:28,083 --> 00:22:30,012 So you can go and create an item... 396 00:22:36,194 --> 00:22:38,508 Yeah, maybe I should make a video. It might be easier. 397 00:22:38,927 --> 00:22:42,207 So, yeah. You can make items, you have quick statements here 398 00:22:42,207 --> 00:22:43,901 that have... oh... it is all in German. 399 00:22:43,901 --> 00:22:45,088 (audience laughing) 400 00:22:45,088 --> 00:22:46,297 (sighs) 401 00:22:46,926 --> 00:22:49,021 Oh, log in? Log in? 402 00:22:50,348 --> 00:22:52,088 It has... Oh, set up ready. 403 00:22:52,088 --> 00:22:53,482 Da da da... 404 00:22:55,965 --> 00:22:57,850 It's as easy as... 405 00:22:58,966 --> 00:23:01,350 I learned how to use Quick Statements yesterday... 406 00:23:01,350 --> 00:23:03,245 that's what I know how to do. 407 00:23:04,657 --> 00:23:07,089 I can then go back to the Wiki... 408 00:23:08,008 --> 00:23:09,804 We can go and see in Recent Changes 409 00:23:09,804 --> 00:23:11,942 that there are now two items, the one that I made 410 00:23:11,942 --> 00:23:13,759 and the one from Quick Statements... 411 00:23:13,759 --> 00:23:14,881 and then you go to Quick... 412 00:23:14,881 --> 00:23:16,511 ♪ (hums a tune) ♪ 413 00:23:17,637 --> 00:23:18,770 Stop...no... 414 00:23:18,927 --> 00:23:20,120 No... 415 00:23:20,454 --> 00:23:22,437 (audience laughing) 416 00:23:28,394 --> 00:23:30,006 Oh god... 417 00:23:30,061 --> 00:23:32,012 I'm glad I tried this out in advance. 418 00:23:33,464 --> 00:23:35,678 There you go. And the query service is updated. 
419 00:23:35,830 --> 00:23:37,763 (audience clapping) 420 00:23:42,357 --> 00:23:45,359 And the idea of this is it'll allow people to try out Wikibases. 421 00:23:45,359 --> 00:23:48,493 Hopefully, it'll even be able to allow people to... 422 00:23:49,110 --> 00:23:50,945 have their real Wikibases here. 423 00:23:50,945 --> 00:23:53,783 At the moment you can create as many as you want 424 00:23:53,783 --> 00:23:55,653 and they all just appear in this lovely list. 425 00:23:55,653 --> 00:23:59,182 As I said, there are lots of bugs but it's all super quick. 426 00:23:59,914 --> 00:24:03,392 Exactly how this is going to continue in the future, we don't know yet 427 00:24:03,392 --> 00:24:05,757 because I only finished writing this in the last few days. 428 00:24:05,757 --> 00:24:09,286 It's currently behind an invitation code, so if you want to come try it out, 429 00:24:09,286 --> 00:24:10,888 come and talk to me. 430 00:24:11,645 --> 00:24:15,730 And if you have any other comments or thoughts, let me know. 431 00:24:15,861 --> 00:24:19,711 Oh, three minutes...40. That's... That's not that bad. 432 00:24:19,986 --> 00:24:21,022 Thanks. 433 00:24:21,022 --> 00:24:22,622 (audience clapping) 434 00:24:28,435 --> 00:24:30,006 Any questions? 435 00:24:31,020 --> 00:24:35,553 (audience 5) Are Quick Statements and the Query Service 436 00:24:35,553 --> 00:24:38,602 automatically updated? 437 00:24:39,553 --> 00:24:42,345 Yes. So the idea is that there will be somebody, 438 00:24:42,345 --> 00:24:43,500 at the moment, me, 439 00:24:43,500 --> 00:24:45,144 maintaining all of the horrible stuff 440 00:24:45,144 --> 00:24:47,290 that you don't have to, behind the scenes. 441 00:24:47,657 --> 00:24:50,157 So kind of think of it like GitHub.com, 442 00:24:50,157 --> 00:24:54,058 but you don't have to know anything about Git to use it. It's just all there. 443 00:24:55,241 --> 00:24:56,886 - [inaudible] - Yeah, we'll get that.
444 00:24:56,886 --> 00:25:00,247 But any of those big hosted solution things. 445 00:25:00,833 --> 00:25:03,263 - (audience 6) A feature request. - Yes. 446 00:25:03,263 --> 00:25:05,479 Is there any-- in scope, 447 00:25:05,479 --> 00:25:09,799 do you have plans to make it so you can easily import existing... 448 00:25:09,799 --> 00:25:12,549 - Wikidata... - I have loads of plans. 449 00:25:12,549 --> 00:25:14,909 Like I want there to be a button where you can just import 450 00:25:14,909 --> 00:25:17,348 another whole Wikibase and all of--yeah. 451 00:25:17,436 --> 00:25:20,723 There will be, in the future... the list is really long. Yeah. 452 00:25:24,454 --> 00:25:28,406 (audience 7) I understand that it's... you want to make it user-friendly, 453 00:25:28,406 --> 00:25:32,242 but if I want access to the machine itself, can I do that? 454 00:25:32,242 --> 00:25:34,673 Nope. (audience laughing) 455 00:25:37,006 --> 00:25:40,863 So again, like, in the longer-term future, there are possib... 456 00:25:40,863 --> 00:25:43,810 Everything's possible, but at the moment, no. 457 00:25:45,156 --> 00:25:49,743 (audience 8) Two questions. Is there a plan to have export tools, 458 00:25:49,743 --> 00:25:52,791 so that you can export it to your own Wikibase maybe at some point? 459 00:25:52,791 --> 00:25:53,824 - Yes. - Great. 460 00:25:53,824 --> 00:25:55,565 And is this a business? 461 00:25:56,003 --> 00:25:58,164 I have no idea. (audience laughing) 462 00:26:00,015 --> 00:26:01,545 Not currently. 463 00:26:05,754 --> 00:26:08,451 (audience 9) What if I stop using it tomorrow, 464 00:26:08,451 --> 00:26:11,096 how long will the data be there?
465 00:26:11,181 --> 00:26:14,632 So my plan was at the end of WikidataCon I was going to delete all of the data 466 00:26:14,632 --> 00:26:18,060 and there's a Wikibase workshop on Sunday, 467 00:26:18,060 --> 00:26:21,671 and we will maybe be using this for the Wikibase workshop 468 00:26:21,671 --> 00:26:23,801 so that everyone can have their own Wikibase. 469 00:26:23,801 --> 00:26:27,366 And then, from that point, I probably won't be deleting the data 470 00:26:27,366 --> 00:26:29,008 so it will all just stay there. 471 00:26:31,763 --> 00:26:32,923 (moderator) Question. 472 00:26:34,524 --> 00:26:36,114 (audience 10) It's two minutes... 473 00:26:36,175 --> 00:26:39,505 Alright, fine. I'll allow two more questions if you talk quickly. 474 00:26:39,505 --> 00:26:41,550 (audience laughing) 475 00:26:47,370 --> 00:26:49,999 - Alright, good people. - Thank you, Adam. 476 00:26:49,999 --> 00:26:52,418 Thank you for letting me test my demo... I mean... 477 00:26:52,418 --> 00:26:54,640 I'm going to do it differently. (audience clapping) 478 00:26:59,512 --> 00:27:00,753 (moderator) Thank you. 479 00:27:00,753 --> 00:27:03,869 Now we have Dennis Diefenbach presenting QAnswer. 480 00:27:04,489 --> 00:27:08,129 Hello, I'm Dennis Diefenbach. I would like to present QAnswer, 481 00:27:08,129 --> 00:27:11,392 which is a question-answering system on top of Wikidata. 482 00:27:11,392 --> 00:27:16,203 So, what we need are some questions, and this is the interface of QAnswer. 483 00:27:16,203 --> 00:27:23,460 For example, where is WikidataCon? 484 00:27:23,901 --> 00:27:25,975 Alright, I think it's written like this. 485 00:27:27,432 --> 00:27:32,432 2019... And we get this response, which is Berlin. 486 00:27:32,458 --> 00:27:38,425 So, other questions. For example, "When did Wikidata start?" 487 00:27:38,430 --> 00:27:42,383 It started on 30 October 2012, so its birthday is approaching. 488 00:27:44,079 --> 00:27:48,014 It is 6 years old, so it will be its 7th birthday.
489 00:27:49,133 --> 00:27:51,583 Who is developing Wikidata? 490 00:27:51,583 --> 00:27:54,371 The Wikimedia Foundation and Wikimedia Deutschland, 491 00:27:54,371 --> 00:27:55,988 so thank you very much to them. 492 00:27:57,013 --> 00:28:02,947 Something like museums in Berlin... I don't know why this is not so... 493 00:28:05,494 --> 00:28:07,737 Only one museum... no, yeah, a few more. 494 00:28:09,167 --> 00:28:10,995 So, when you ask something like this, 495 00:28:10,995 --> 00:28:14,178 we allow the user to explore the information 496 00:28:14,178 --> 00:28:16,308 with different aggregations. 497 00:28:16,308 --> 00:28:18,953 For example, if there are many geo coordinates 498 00:28:18,953 --> 00:28:21,476 attached to the entities, we will display a map. 499 00:28:21,476 --> 00:28:26,357 If there are many images attached to them, we will display the images, 500 00:28:26,357 --> 00:28:29,057 and otherwise there is a list where you can explore 501 00:28:29,057 --> 00:28:30,855 the different entities. 502 00:28:33,236 --> 00:28:35,605 You can ask something like "Who is the mayor of Berlin," 503 00:28:36,643 --> 00:28:40,201 "Give me politicians born in Berlin," and things like this. 504 00:28:40,201 --> 00:28:44,428 So you can ask both keyword questions and full natural language questions. 505 00:28:45,171 --> 00:28:48,604 The whole data is coming from Wikidata, 506 00:28:48,604 --> 00:28:55,346 so all entities which are in Wikidata are queryable by this service. 507 00:28:55,869 --> 00:28:59,244 And the data is really all from Wikidata 508 00:28:59,244 --> 00:29:01,207 in the sense that there are some Wikipedia snippets, 509 00:29:01,207 --> 00:29:04,851 there are images from Wikimedia Commons, 510 00:29:04,851 --> 00:29:07,644 but the rest is all Wikidata data. 511 00:29:08,760 --> 00:29:11,678 We can do this in several languages. This is now in Chinese. 512 00:29:11,678 --> 00:29:15,441 I don't know what is written there so do not ask me.
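The result-view logic Dennis describes (a map when many results carry geo coordinates, an image gallery when many carry images, otherwise a plain entity list) can be sketched as a small chooser. This is a minimal illustration, not QAnswer's actual code; the row format and the majority threshold are assumptions:

```python
def choose_view(rows, threshold=0.5):
    """Pick a result view the way the QAnswer interface is described:
    a map when many rows carry coordinates, an image gallery when
    many carry images, otherwise a plain entity list."""
    if not rows:
        return "list"
    n = len(rows)
    with_coords = sum(1 for r in rows if r.get("coord") is not None)
    with_images = sum(1 for r in rows if r.get("image") is not None)
    if with_coords / n >= threshold:
        return "map"
    if with_images / n >= threshold:
        return "gallery"
    return "list"
```

For a "museums in Berlin" result set, most rows would have a coordinate, so the map view wins; for a film-cast query, images would dominate instead.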
513 00:29:15,441 --> 00:29:19,893 We are currently supporting these languages with more or less good quality 514 00:29:19,893 --> 00:29:22,094 because... yeah. 515 00:29:23,332 --> 00:29:27,563 So, how can this be useful for the Wikidata community? 516 00:29:27,968 --> 00:29:30,052 I think there are different reasons. 517 00:29:30,052 --> 00:29:33,786 First of all, this thing helps you to generate SPARQL queries, 518 00:29:33,786 --> 00:29:37,043 and I know there are even some workshops about how to use SPARQL. 519 00:29:37,043 --> 00:29:39,444 It's not a language that everyone speaks. 520 00:29:39,444 --> 00:29:45,147 So, if you ask something like "a philosopher born before 1908," 521 00:29:45,147 --> 00:29:48,697 to figure out how to construct a SPARQL query like this could be tricky. 522 00:29:50,001 --> 00:29:54,257 In fact, when you ask a question, we generate many SPARQL queries, 523 00:29:54,301 --> 00:29:57,486 and the first one is always the SPARQL query where we think 524 00:29:57,486 --> 00:29:59,008 this is the good one. 525 00:29:59,017 --> 00:30:02,651 So, if you ask your question and then you go on the SPARQL list, 526 00:30:02,691 --> 00:30:06,468 then there is this button for the Wikidata Query Service, 527 00:30:06,468 --> 00:30:11,811 and you have the SPARQL query right there and you will get the same result 528 00:30:11,811 --> 00:30:15,184 as you would get in the interface. 529 00:30:16,906 --> 00:30:19,289 Another thing it could be useful for 530 00:30:19,289 --> 00:30:23,468 is finding missing contextual information. 531 00:30:23,468 --> 00:30:27,057 For example, if you ask for actors in "The Lord of the Rings," 532 00:30:27,057 --> 00:30:30,776 most of these entities will have an associated image 533 00:30:30,776 --> 00:30:32,490 but not all of them. 534 00:30:32,490 --> 00:30:37,861 So here there is some missing metadata that could be added.
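For the "philosopher born before 1908" example, the kind of SPARQL that ends up behind the query-service button might look like the following. This is a hedged sketch built in Python so the string can be pasted into the Wikidata Query Service; QAnswer's actual generated query may differ, though P106 (occupation), P569 (date of birth), and Q4964182 (philosopher) are the standard Wikidata identifiers:

```python
def philosophers_before(year: int) -> str:
    """Build a Wikidata Query Service query for philosophers born
    before a given year. Illustrative only: QAnswer generates its
    candidate queries internally."""
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P106 wd:Q4964182 ;   # occupation: philosopher
          wdt:P569 ?born .         # date of birth
  FILTER(YEAR(?born) < {year})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT 100
"""
```

Pasting the returned string into query.wikidata.org should reproduce the kind of result list the interface shows.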
535 00:30:37,861 --> 00:30:40,376 You could go to this entity, add an image, 536 00:30:40,376 --> 00:30:45,462 and then see first that there is an image missing and so on. 537 00:30:46,457 --> 00:30:52,047 Another thing is that you could find schema issues. 538 00:30:52,047 --> 00:30:55,424 For example, if you ask "books by Andrea Camilleri," 539 00:30:55,428 --> 00:30:57,711 who is a famous Italian writer, 540 00:30:57,711 --> 00:30:59,981 you would currently get these three books. 541 00:30:59,981 --> 00:31:02,681 But he wrote many more. He wrote more than 50. 542 00:31:02,681 --> 00:31:05,701 And so the question is, are they not in Wikidata, 543 00:31:05,701 --> 00:31:09,704 or is the knowledge maybe not currently modeled correctly? 544 00:31:09,704 --> 00:31:12,804 And in this case, I know there is another book from him, 545 00:31:12,804 --> 00:31:14,737 which is "Un mese con Montalbano." 546 00:31:14,737 --> 00:31:18,207 It has only an Italian label so you can only search it in Italian. 547 00:31:18,207 --> 00:31:22,103 And if you go to this entity, you will see that he has written it. 548 00:31:22,103 --> 00:31:27,504 It's a short story by Andrea Camilleri and it's an instance of literary work, 549 00:31:27,504 --> 00:31:29,220 but it's not an instance of book, 550 00:31:29,220 --> 00:31:31,338 so that's the reason why it doesn't appear. 551 00:31:31,338 --> 00:31:35,904 This is a way to track where things are missing 552 00:31:35,904 --> 00:31:37,499 in the Wikidata model 553 00:31:37,499 --> 00:31:39,539 or modeled not as you would expect. 554 00:31:40,794 --> 00:31:42,968 Another reason is just to have fun. 555 00:31:43,588 --> 00:31:47,546 I imagine that many of you added many Wikidata entities, 556 00:31:47,546 --> 00:31:50,776 so just search for the ones that you care about most 557 00:31:50,776 --> 00:31:52,529 or have edited yourself. 558 00:31:52,529 --> 00:31:56,893 So in this case, who developed QAnswer, and that's it.
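The Camilleri case is the classic instance-of/subclass-of pitfall: the missing item is an instance of literary work rather than of book, so a query matching only `wdt:P31 wd:Q571` (book) skips it. One common workaround is to match the class through the subclass-of path, or to query a broader class such as literary work (Q7725634). A minimal sketch; the author Q-ID used below is a placeholder, not Camilleri's real item, and class Q-IDs should be double-checked against current Wikidata:

```python
def works_by(author_qid: str, class_qid: str) -> str:
    """Build a Wikidata Query Service query for works of a given
    class by an author. The path wdt:P31/wdt:P279* matches items
    whose instance-of class is the given class OR any subclass of
    it, so querying a broad class catches items modeled as
    subclasses instead of plain 'book'."""
    return f"""
SELECT ?work ?workLabel WHERE {{
  ?work wdt:P31/wdt:P279* wd:{class_qid} ;  # instance of class or a subclass
        wdt:P50 wd:{author_qid} .           # author (placeholder Q-ID)
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,it". }}
}}
"""
```

Comparing the result counts of the narrow and broad variants is exactly the kind of schema-issue check Dennis describes.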
559 00:31:56,893 --> 00:32:00,226 For any other questions, go to www.QAnswer.eu/qa 560 00:32:00,226 --> 00:32:03,575 and hopefully we'll find an answer for you. 561 00:32:03,782 --> 00:32:05,649 (audience clapping) 562 00:32:13,994 --> 00:32:17,040 - Sorry. - I'm just the dumbest person here. 563 00:32:17,530 --> 00:32:22,722 (audience 11) So I want to know, how is this kind of agnostic 564 00:32:22,752 --> 00:32:25,104 to the Wikibase instance, 565 00:32:25,104 --> 00:32:29,020 or has it been tied to the exact, like, property numbers 566 00:32:29,020 --> 00:32:31,054 and things in Wikidata? 567 00:32:31,054 --> 00:32:33,442 Has it learned in some way, or how was it set up? 568 00:32:33,442 --> 00:32:36,456 There is training data and we rely on training data, 569 00:32:36,456 --> 00:32:40,585 and this is also, in most cases, why you will not get good results. 570 00:32:40,585 --> 00:32:44,881 But we're training the system by simple yes and no answers. 571 00:32:44,881 --> 00:32:48,936 When you ask a question, we always ask for feedback, yes or no, 572 00:32:48,936 --> 00:32:51,899 and this feedback is used by the machine learning algorithm. 573 00:32:51,899 --> 00:32:54,124 This is where machine learning comes into play. 574 00:32:54,124 --> 00:32:58,600 But basically, we put up separate Wikibase instances 575 00:32:58,600 --> 00:33:00,482 and we can plug this in. 576 00:33:00,482 --> 00:33:04,249 In fact, the system is agnostic in the sense that it only wants RDF. 577 00:33:04,249 --> 00:33:06,618 And RDF you have in each Wikibase; 578 00:33:06,618 --> 00:33:08,059 there are a few configurations, 579 00:33:08,059 --> 00:33:10,432 but you can have this on top of any Wikibase. 580 00:33:11,654 --> 00:33:13,039 (audience 11) Awesome. 581 00:33:23,573 --> 00:33:27,004 (audience 12) You mentioned that it's being trained by yes/no answers.
582 00:33:27,073 --> 00:33:32,662 So I guess this is assuming that the Wikidata instance is free of errors 583 00:33:32,722 --> 00:33:34,356 or is it also...? 584 00:33:34,356 --> 00:33:37,140 You assume that the Wikidata instances... 585 00:33:37,140 --> 00:33:40,731 (audience 12) I guess I'm asking, like, are you distinguishing 586 00:33:40,731 --> 00:33:46,289 between source level errors or misunderstanding the question 587 00:33:46,289 --> 00:33:50,856 versus a bad mapping, etc.? 588 00:33:51,706 --> 00:33:55,474 Generally, we assume that the data in Wikidata is true. 589 00:33:55,474 --> 00:33:59,172 So if you click "no" and the data in Wikidata would be false, 590 00:33:59,172 --> 00:34:03,023 then yeah... we would not catch this difference. 591 00:34:03,023 --> 00:34:05,081 But sincerely, Wikidata quality is very good, 592 00:34:05,081 --> 00:34:08,231 so I rarely have had this problem. 593 00:34:16,592 --> 00:34:22,068 (audience 12) Is this data available as a dataset by any chance, sir? 594 00:34:22,209 --> 00:34:27,218 - What is... direct service? - The... dataset of... 595 00:34:27,218 --> 00:34:30,803 "is this answer correct versus the query versus the answer?" 596 00:34:30,872 --> 00:34:33,340 Is that something you're publishing as part of this? 597 00:34:33,340 --> 00:34:38,040 - The training data that you've... - We published the training data. 598 00:34:38,040 --> 00:34:43,423 We published some old training data but no, just a-- 599 00:34:44,573 --> 00:34:47,313 There is a question there. I don't know if we have still time. 600 00:34:51,215 --> 00:34:55,104 (audience 13) Maybe I just missed this but is it running on a live, 601 00:34:55,104 --> 00:34:57,080 like the Live Query Service, 602 00:34:57,080 --> 00:34:59,393 or is it running on some static dump you loaded 603 00:34:59,393 --> 00:35:01,690 or where is the data source for Wikidata? 604 00:35:01,784 --> 00:35:07,014 Yes. 
The problem is that to apply this technology, 605 00:35:07,014 --> 00:35:08,414 you need a local dump. 606 00:35:08,414 --> 00:35:10,673 Because we do not rely only on the SPARQL endpoint, 607 00:35:10,673 --> 00:35:12,873 we rely on special indexes. 608 00:35:12,873 --> 00:35:16,192 So, we are currently loading the Wikidata dump. 609 00:35:16,192 --> 00:35:18,699 We are updating this every two weeks. 610 00:35:18,699 --> 00:35:20,756 We would like to do it more often, 611 00:35:20,756 --> 00:35:23,823 in fact we would like to get the diffs for each day, for example, 612 00:35:23,823 --> 00:35:25,271 to put them in our index. 613 00:35:25,271 --> 00:35:28,719 But unfortunately, right now, the Wikidata dumps are released 614 00:35:28,719 --> 00:35:31,753 only once every week. 615 00:35:31,753 --> 00:35:35,150 So, we cannot be faster than that, and we also need some time 616 00:35:35,150 --> 00:35:39,073 to re-index the data, so it takes one or two days. 617 00:35:39,073 --> 00:35:41,833 So we are always behind. Yeah. 618 00:35:48,202 --> 00:35:49,780 (moderator) Any more? 619 00:35:50,430 --> 00:35:53,268 - Okay, thank you very much. - Thank you all very much. 620 00:35:53,547 --> 00:35:54,966 (audience clapping) 621 00:35:57,266 --> 00:36:00,165 (moderator) And now last, we have Eugene Alvin Villar, 622 00:36:00,165 --> 00:36:02,049 talking about Panandâ. 623 00:36:10,630 --> 00:36:12,637 Good afternoon, my name is Eugene Alvin Villar 624 00:36:12,637 --> 00:36:15,297 and I'm from the Philippines, and I'll be talking about Panandâ: 625 00:36:15,297 --> 00:36:18,185 a mobile app powered by Wikidata. 626 00:36:18,862 --> 00:36:21,678 This is a follow-up to my lightning talk that I presented two years ago 627 00:36:21,678 --> 00:36:25,004 at WikidataCon 2017 together with Carlo Moskito. 628 00:36:25,004 --> 00:36:26,557 You can download the slides 629 00:36:26,557 --> 00:36:28,727 and there's a link to that presentation there.
630 00:36:28,727 --> 00:36:30,868 I'll give you a bit of a background. 631 00:36:30,868 --> 00:36:33,471 Wiki Society of the Philippines, formerly Wikimedia Philippines, 632 00:36:33,471 --> 00:36:37,477 had a series of projects related to Philippine heritage and history. 633 00:36:37,477 --> 00:36:41,705 So we have the usual photo contests, Wikipedia Takes Manila, 634 00:36:41,705 --> 00:36:43,238 Wiki Loves Monuments, 635 00:36:43,238 --> 00:36:46,657 and then our major project was the Cultural Heritage Mapping Project 636 00:36:46,657 --> 00:36:49,094 back in 2014-2015. 637 00:36:50,044 --> 00:36:53,039 In that project, we trained volunteers to edit articles 638 00:36:53,039 --> 00:36:54,389 related to cultural heritage. 639 00:36:54,914 --> 00:36:59,032 This is the biggest and most successful project we've had. 640 00:36:59,032 --> 00:37:03,037 794 articles were created or improved, including 37 "Did You Knows" 641 00:37:03,037 --> 00:37:05,238 and 4 "Good Articles," 642 00:37:05,308 --> 00:37:08,688 and more than 5,000 images were uploaded to Commons. 643 00:37:08,688 --> 00:37:11,039 As a result of that, we then launched 644 00:37:11,039 --> 00:37:13,689 the Encyclopedia of Philippine Heritage program 645 00:37:13,689 --> 00:37:18,444 in order to expand the scope and also include Wikidata. 646 00:37:18,444 --> 00:37:21,695 Here's the Core Team: myself, Carlo and Roel. 647 00:37:21,695 --> 00:37:26,870 Our first pilot project was to document the country's historical markers 648 00:37:26,870 --> 00:37:29,153 in Wikidata and Commons, 649 00:37:29,153 --> 00:37:34,053 starting with those created by our national historical agency, the NHCP. 650 00:37:34,053 --> 00:37:38,904 For example, they installed a marker for our national hero here in Berlin, 651 00:37:38,904 --> 00:37:41,421 so there's now a Wikidata page for that marker 652 00:37:41,421 --> 00:37:45,102 and a collection of photos of that marker in Commons.
653 00:37:46,166 --> 00:37:50,397 Unfortunately, the government agency does not keep a good database 654 00:37:50,397 --> 00:37:53,480 of their markers that is up-to-date or complete, 655 00:37:53,480 --> 00:37:58,004 so we have to painstakingly input these to Wikidata manually. 656 00:37:58,004 --> 00:38:02,772 After careful research and confirmation, here's a graph of the number of markers 657 00:38:02,772 --> 00:38:07,466 that we've added to Wikidata over time, over the past three years. 658 00:38:07,466 --> 00:38:11,230 And we've developed this Historical Markers Map web app 659 00:38:11,230 --> 00:38:15,289 that lets users view these markers on a map, 660 00:38:15,289 --> 00:38:21,051 so we can browse it as a list, view a good visualization of the markers 661 00:38:21,051 --> 00:38:23,253 with information and inscriptions. 662 00:38:23,253 --> 00:38:28,885 All of this is powered by live queries to the Wikidata Query Service. 663 00:38:29,732 --> 00:38:32,005 There's the link if you want to play around with it. 664 00:38:33,349 --> 00:38:37,428 And so we developed a mobile app for this one. 665 00:38:37,428 --> 00:38:42,117 To better publicize our project, I developed Panandâ, 666 00:38:42,117 --> 00:38:45,434 which is Tagalog for "marker", as an Android app 667 00:38:45,434 --> 00:38:48,393 that was published back in 2018, 668 00:38:48,393 --> 00:38:53,934 and I'll publish the iOS version sometime in the future, hopefully. 669 00:38:54,868 --> 00:38:57,892 I'd like to demo the app but we have no time, 670 00:38:57,892 --> 00:39:00,935 so here are some of the features of the app. 671 00:39:00,935 --> 00:39:04,586 There's a Map and a List view, with text search, 672 00:39:04,586 --> 00:39:07,452 so you can drill down as needed.
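The live query behind a marker map like this is typically of the following shape: every item of the marker class, with its coordinate location (P625) and, if present, a Commons image (P18). This is a sketch only; the marker-class Q-ID below is a placeholder, since the talk doesn't spell out the actual Wikidata item:

```python
MARKER_CLASS = "Q100000000"  # placeholder: the historical-marker class Q-ID

def markers_query(marker_class: str = MARKER_CLASS) -> str:
    """Wikidata Query Service query of the shape a marker-map app
    would run live: every marker item with its coordinates and,
    optionally, a photo from Commons."""
    return f"""
SELECT ?marker ?markerLabel ?coord ?image WHERE {{
  ?marker wdt:P31 wd:{marker_class} ;    # instance of: historical marker
          wdt:P625 ?coord .              # coordinate location
  OPTIONAL {{ ?marker wdt:P18 ?image }}  # image, if one exists on Commons
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,tl". }}
}}
"""
```

The `OPTIONAL` clause matters here: markers without photos still appear on the map, which is also how the app can surface items whose image is missing.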
673 00:39:07,452 --> 00:39:10,169 You can filter by region or by distance, 674 00:39:10,169 --> 00:39:12,193 and by whether you have marked these markers 675 00:39:12,193 --> 00:39:15,499 as either visited or as bookmarked 676 00:39:15,499 --> 00:39:16,949 for future visits. 677 00:39:16,949 --> 00:39:19,482 Then you can use the GPS on your mobile phone 678 00:39:19,482 --> 00:39:21,860 for distance filtering. 679 00:39:21,860 --> 00:39:26,765 For example, if you want markers that are near you, you can do that. 680 00:39:26,765 --> 00:39:30,918 And when you click on the Details page, you can see the same thing: 681 00:39:30,918 --> 00:39:35,850 photos from Commons, the inscription on the marker, 682 00:39:35,850 --> 00:39:40,484 how to find the marker, its location and address, etc. 683 00:39:41,601 --> 00:39:45,993 And one thing that's unique to this app is you can, again, mark a visit 684 00:39:46,011 --> 00:39:50,407 or put a bookmark on these, so on the map or on the list, 685 00:39:50,407 --> 00:39:51,692 or on the Details page, 686 00:39:51,692 --> 00:39:54,891 you can just tap on those buttons and say that you've visited them 687 00:39:54,891 --> 00:39:58,520 or you'd like to bookmark them for future visits. 688 00:39:58,520 --> 00:40:03,527 And my app has been covered by the press and given recognition, 689 00:40:03,527 --> 00:40:06,743 with plenty of local press articles. 690 00:40:06,743 --> 00:40:11,281 Recently, it was selected as one of the Top 5 finalists 691 00:40:11,281 --> 00:40:15,247 for the Android Masters competition in the App for Social Good category. 692 00:40:15,247 --> 00:40:17,351 The final event will be next month. 693 00:40:17,351 --> 00:40:18,999 Hopefully, we'll win. 694 00:40:20,380 --> 00:40:22,378 Okay, so some behind the scenes. 695 00:40:22,378 --> 00:40:25,477 How did I develop this app? 696 00:40:25,477 --> 00:40:28,578 Panandâ is actually a hybrid app; it's not native.
697 00:40:28,578 --> 00:40:30,745 Basically it's just a web app packaged as a mobile app 698 00:40:30,745 --> 00:40:32,518 using Apache Cordova. 699 00:40:32,518 --> 00:40:34,026 That reduces development time 700 00:40:34,026 --> 00:40:36,181 because I don't have to learn a different language. 701 00:40:36,181 --> 00:40:37,769 I know JavaScript and HTML. 702 00:40:37,879 --> 00:40:42,131 It's cross-platform and allows code reuse from the Historical Markers Map. 703 00:40:42,385 --> 00:40:46,311 And the app is also free and open source, under the MIT license. 704 00:40:46,311 --> 00:40:49,429 So there's the GitHub repository over there. 705 00:40:50,469 --> 00:40:53,624 The challenge is that the app's data is not live. 706 00:40:54,750 --> 00:40:56,820 Because if you query the data live, 707 00:40:56,843 --> 00:41:00,638 it means you're pulling around half a megabyte of compressed JSON every time, 708 00:41:00,638 --> 00:41:03,594 which is not friendly for those on mobile data, 709 00:41:03,594 --> 00:41:06,723 incurs too much delay when starting the app, 710 00:41:06,723 --> 00:41:13,097 and if there are any errors in Wikidata, that may result in a poor user experience. 711 00:41:14,253 --> 00:41:18,046 So instead, what I did was the app is updated every few months 712 00:41:18,046 --> 00:41:20,468 with fresh data, compiled using a Perl script 713 00:41:20,468 --> 00:41:23,037 that queries the Wikidata Query Service, 714 00:41:23,037 --> 00:41:25,678 and this script also does some data validation 715 00:41:25,678 --> 00:41:30,944 to highlight consistency or schema errors, which allows fixes before updates 716 00:41:30,944 --> 00:41:34,735 in order to provide a good experience for the mobile user. 717 00:41:35,174 --> 00:41:39,274 And here's... if you're tech-oriented, here are, more or less, 718 00:41:39,274 --> 00:41:41,644 the technologies that I'm using. 719 00:41:41,644 --> 00:41:43,976 So a bunch of JavaScript libraries.
720 00:41:43,976 --> 00:41:46,287 Here's the Perl script that queries Wikidata, 721 00:41:46,287 --> 00:41:48,598 some Cordova plug-ins, 722 00:41:48,598 --> 00:41:53,035 and building it using Cordova and then publishing this app. 723 00:41:53,763 --> 00:41:55,586 And that's it. 724 00:41:55,748 --> 00:41:58,164 (audience clapping) 725 00:42:01,800 --> 00:42:04,072 (moderator) I hope you win. Alright, questions. 726 00:42:16,286 --> 00:42:17,990 (audience 14) Sorry if I missed this. 727 00:42:17,990 --> 00:42:21,317 Are you opening your code so that people can adapt your app 728 00:42:21,317 --> 00:42:24,501 and do it for other cities? 729 00:42:24,501 --> 00:42:28,516 Yes, as I've mentioned, the app is free and open source. 730 00:42:28,516 --> 00:42:31,095 - (audience 14) But where is it? - There's the GitHub repository. 731 00:42:31,095 --> 00:42:33,610 You can download the slides, and there's a link 732 00:42:33,610 --> 00:42:36,841 in one of the previous slides to the repository. 733 00:42:36,841 --> 00:42:38,732 (audience 14) Okay. Can you put it? 734 00:42:42,392 --> 00:42:43,747 Yeah, at the bottom. 735 00:42:46,577 --> 00:42:49,222 (audience 15) Hi. Sorry, maybe I also missed this, 736 00:42:49,222 --> 00:42:51,628 but how do you check for schema errors? 737 00:42:53,055 --> 00:42:56,007 Basically, we have a WikiProject on Wikidata, 738 00:42:56,106 --> 00:43:02,425 so we try to put together guidelines on how to model these markers correctly, 739 00:43:02,425 --> 00:43:05,190 although it's not updated right now. 740 00:43:06,197 --> 00:43:09,023 As far as I know, we're the only country 741 00:43:09,023 --> 00:43:12,874 that's currently modeling these in Wikidata. 742 00:43:13,930 --> 00:43:20,152 There's also an effort to add [inaudible] 743 00:43:20,161 --> 00:43:22,411 in Wikidata, 744 00:43:22,474 --> 00:43:25,705 but I think that's a different thing altogether.
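The validation step Eugene describes — pull rows from the Query Service, flag consistency or schema problems, and only bundle clean records into the app — can be sketched like this. The field names and the specific checks are invented for illustration; the real Perl script's rules aren't shown in the talk:

```python
def validate_markers(rows):
    """Flag rows that would make for a poor in-app experience:
    a missing or duplicate Q-ID, missing coordinates, or missing
    inscription text. Returns (clean_rows, error_messages) so
    problems can be fixed in Wikidata before the next app update."""
    errors, clean, seen = [], [], set()
    for row in rows:
        qid = row.get("qid")
        if not qid or qid in seen:
            errors.append(f"duplicate or missing Q-ID: {qid!r}")
            continue
        seen.add(qid)
        if row.get("coord") is None:
            errors.append(f"{qid}: no coordinate location (P625)")
            continue
        if not row.get("inscription"):
            errors.append(f"{qid}: no inscription text")
            continue
        clean.append(row)
    return clean, errors
```

Running a gate like this at build time, rather than at app start, is exactly why the bundled-data approach tolerates errors in Wikidata better than live querying would.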
745 00:43:34,056 --> 00:43:35,895 (audience 16) So I guess this may be part 746 00:43:35,895 --> 00:43:37,725 of this Wikiproject you just described, 747 00:43:37,725 --> 00:43:42,800 but for the consistency checks, have you considered moving those 748 00:43:42,800 --> 00:43:46,743 into like complex schema constraints that then can be flagged 749 00:43:46,743 --> 00:43:50,583 on the Wikidata side for what there is to fix on there? 750 00:43:52,930 --> 00:43:55,547 I'm actually interested in seeing if I can do, for example, 751 00:43:55,598 --> 00:44:00,296 shape expressions, so that, yeah, we can do those things. 752 00:44:04,256 --> 00:44:06,776 (moderator) At this point, we have quite a few minutes left. 753 00:44:06,776 --> 00:44:09,026 The speakers did very well, so if Erica is okay with it, 754 00:44:09,026 --> 00:44:11,238 I'm also going to allow some time for questions, 755 00:44:11,238 --> 00:44:13,407 still about this presentation, but also about Mbabel, 756 00:44:13,407 --> 00:44:15,498 if anyone wants to jump in with something there, 757 00:44:15,498 --> 00:44:17,318 either presentation is fair game. 758 00:44:22,790 --> 00:44:25,639 Unless like me, you're all so dazzled that you just want to go to snacks 759 00:44:25,639 --> 00:44:27,955 and think about it. (audience giggles) 760 00:44:29,308 --> 00:44:31,179 - (moderator) You know... - Yeah. 761 00:44:31,953 --> 00:44:34,491 (audience 17) I will always have questions about everything. 762 00:44:34,491 --> 00:44:37,642 So, I came in late for the Mbabel tool. 763 00:44:37,642 --> 00:44:40,350 But I was looking through and I saw there's a number of templates, 764 00:44:40,350 --> 00:44:43,232 and I was wondering if there's a place to contribute 765 00:44:43,232 --> 00:44:45,564 to adding more templates for different types 766 00:44:45,564 --> 00:44:47,620 or different languages and the like? 
767 00:44:50,497 --> 00:44:53,683 (Erica) So for now, we're developing those narrative templates 768 00:44:53,683 --> 00:44:55,566 on Portuguese Wikipedia. 769 00:44:55,566 --> 00:44:57,856 I can show you if you like. 770 00:44:57,856 --> 00:45:02,051 We're inserting those templates on English Wikipedia too. 771 00:45:02,051 --> 00:45:07,017 It's not complicated to do, but we have to expand to other languages. 772 00:45:07,017 --> 00:45:08,236 - French? - French. 773 00:45:08,236 --> 00:45:10,465 - Yes. - French and German already have. 774 00:45:10,465 --> 00:45:11,465 (laughing) 775 00:45:12,002 --> 00:45:13,018 Yeah. 776 00:45:15,755 --> 00:45:18,287 (inaudible chatter) 777 00:45:21,756 --> 00:45:24,446 (audience 18) I also have a question about Mbabel, 778 00:45:24,446 --> 00:45:27,676 which is, is this really just templates? 779 00:45:27,676 --> 00:45:33,893 Is this based on Lua scripting? Is that all? Wow. Okay. 780 00:45:33,956 --> 00:45:37,404 Yeah, so it's very deployable. Okay. Cool. 781 00:45:38,102 --> 00:45:40,199 (moderator) Just to catch that for the live stream, 782 00:45:40,199 --> 00:45:42,745 the answer was an emphatic nod of the head, and a yes. 783 00:45:42,915 --> 00:45:44,648 (audience laughing) 784 00:45:44,754 --> 00:45:47,203 - (Erica) Super simple. - (moderator) Super simple. 785 00:45:47,745 --> 00:45:49,819 (audience 19) Yeah. I would also like to ask. 786 00:45:49,819 --> 00:45:53,386 Sorry I haven't delved into Mbabel earlier. 787 00:45:53,386 --> 00:45:57,018 I'm wondering, you're also working with the links, the red links. 788 00:45:57,018 --> 00:46:00,052 Are you adding some code there? 789 00:46:03,987 --> 00:46:07,970 - (Erica) For the lists? - Wherever the link comes from... 790 00:46:07,970 --> 00:46:11,595 (audience 19) The architecture. Maybe I will have to look into it. 791 00:46:11,595 --> 00:46:13,355 (Erica) I'll show you later. 792 00:46:20,506 --> 00:46:23,221 (moderator) Alright.
You're all ready for snack break, I can tell. 793 00:46:23,221 --> 00:46:24,456 So let's wrap it up. 794 00:46:24,456 --> 00:46:26,429 But our kind speakers, I'm sure will stick around 795 00:46:26,429 --> 00:46:27,958 if you have questions for them. 796 00:46:27,958 --> 00:46:31,179 Please join me in giving... first of all we didn't give a round of applause yet. 797 00:46:31,179 --> 00:46:33,221 I can tell you're interested in doing so. 798 00:46:33,221 --> 00:46:34,886 (audience clapping)