1 00:00:02,651 --> 00:00:03,900 Asaf Bartov: Testing, testing. 2 00:00:10,036 --> 00:00:12,640 Is this heard in the room? 3 00:00:15,190 --> 00:00:15,690 Testing. 4 00:00:22,620 --> 00:00:24,930 Hello, everyone. 5 00:00:24,930 --> 00:00:29,460 This is a gentle introduction to Wikidata 6 00:00:29,460 --> 00:00:31,922 for absolute beginners. 7 00:00:31,922 --> 00:00:34,130 If you're an absolute beginner, if you've never heard 8 00:00:34,130 --> 00:00:38,210 of Wikidata, or if you've heard of Wikidata but don't quite get 9 00:00:38,210 --> 00:00:41,360 it, don't know what it's good for, have only used it 10 00:00:41,360 --> 00:00:43,880 for inter-wiki links-- 11 00:00:43,880 --> 00:00:46,247 if you're anywhere on this range, 12 00:00:46,247 --> 00:00:47,330 you're in the right place. 13 00:00:50,990 --> 00:00:52,040 My name is Asaf Bartov. 14 00:00:52,040 --> 00:00:54,590 I work for the Wikimedia Foundation, 15 00:00:54,590 --> 00:00:59,790 and I am a Wikidata enthusiast. 16 00:00:59,790 --> 00:01:05,620 So the first thing I want to say is that you are lucky. 17 00:01:05,620 --> 00:01:10,540 You are lucky because Wikidata is already 18 00:01:10,540 --> 00:01:15,415 and is quickly becoming even more of an important research 19 00:01:15,415 --> 00:01:21,730 tool for anyone who's trying to ask questions 20 00:01:21,730 --> 00:01:25,030 about large amounts of information. 21 00:01:25,030 --> 00:01:29,770 It will become more and more used across the humanities, 22 00:01:29,770 --> 00:01:33,460 in particular, because of the things that it's able to do, 23 00:01:33,460 --> 00:01:37,090 some of which we will demonstrate shortly. 24 00:01:37,090 --> 00:01:40,750 And you are lucky because you get to find out about it now 25 00:01:40,750 --> 00:01:43,400 before most of the world. 26 00:01:43,400 --> 00:01:49,120 So by the end of this talk, you will be a Wikidata hipster 27 00:01:49,120 --> 00:01:51,250 because you'll be able to say, oh yeah. 
28 00:01:51,250 --> 00:01:53,470 I knew about Wikidata before it was cool. 29 00:01:56,090 --> 00:02:00,370 So before we actually visit Wikidata, 30 00:02:00,370 --> 00:02:08,620 I want to share two key problems that Wikidata seeks to solve 31 00:02:08,620 --> 00:02:12,940 and which would help us understand why it exists. 32 00:02:12,940 --> 00:02:17,640 The first problem is that of dated data, that 33 00:02:17,640 --> 00:02:20,880 is data that is out of date. 34 00:02:20,880 --> 00:02:23,960 And this is apparent on Wikipedia 35 00:02:23,960 --> 00:02:27,870 across our free knowledge encyclopedias. 36 00:02:27,870 --> 00:02:32,160 Data on Wikipedia is not always up to date. 37 00:02:32,160 --> 00:02:37,470 And the more obscure it is, the more likely 38 00:02:37,470 --> 00:02:40,280 it is not to be up to date. 39 00:02:40,280 --> 00:02:49,360 So the Polish Wikipedia may have an article about a small town 40 00:02:49,360 --> 00:02:55,480 in Argentina, and that article will include information 41 00:02:55,480 --> 00:03:00,910 about that town like population size, name of the mayor. 42 00:03:00,910 --> 00:03:04,580 And that information, ideally, was 43 00:03:04,580 --> 00:03:08,540 correct at the time the article was created on the Polish 44 00:03:08,540 --> 00:03:10,370 Wikipedia-- 45 00:03:10,370 --> 00:03:13,760 maybe translated from another wiki. 46 00:03:13,760 --> 00:03:17,900 But then how likely is it to be kept up to date? 47 00:03:17,900 --> 00:03:20,960 How likely is it that the Polish Wikipedia would give us 48 00:03:20,960 --> 00:03:25,880 the correct and latest numbers or data about the population 49 00:03:25,880 --> 00:03:28,370 size of that town or the mayor, right? 50 00:03:28,370 --> 00:03:31,720 So this is the kind of data that does go out of date, right? 51 00:03:31,720 --> 00:03:34,250 Every few years-- five, 10 years-- 52 00:03:34,250 --> 00:03:37,850 there is a census, and now there are new population figures. 
53 00:03:37,850 --> 00:03:42,440 Now the census in Argentina will be made available in Argentina 54 00:03:42,440 --> 00:03:45,500 in Spanish, probably, which brings us 55 00:03:45,500 --> 00:03:48,710 to another component of the problem of dated data, which 56 00:03:48,710 --> 00:03:53,810 is there are no obvious triggers for updating the data. 57 00:03:53,810 --> 00:03:58,520 So the Polish Wikipedian is not sent an email 58 00:03:58,520 --> 00:04:00,680 by the Argentinean government saying, hey, 59 00:04:00,680 --> 00:04:01,820 we have a new census. 60 00:04:01,820 --> 00:04:05,420 There are new population numbers for you to update on Wikipedia. 61 00:04:05,420 --> 00:04:07,550 No such email is sent. 62 00:04:07,550 --> 00:04:10,146 So it's kind of hard to notice when. 63 00:04:10,146 --> 00:04:12,770 And of course, multiply that by all the different jurisdictions 64 00:04:12,770 --> 00:04:14,670 around the world. 65 00:04:14,670 --> 00:04:16,610 There's no easy way to notice when 66 00:04:16,610 --> 00:04:17,790 your data goes out of date. 67 00:04:20,620 --> 00:04:24,070 So that's difficult to keep up to date. 68 00:04:24,070 --> 00:04:27,940 And even if we were to receive some kind of indication-- 69 00:04:27,940 --> 00:04:31,310 oh, there's a new census in Argentina, 70 00:04:31,310 --> 00:04:33,100 so a whole bunch of population figures 71 00:04:33,100 --> 00:04:34,960 have now gone out of date. 72 00:04:34,960 --> 00:04:37,240 Updating it on the Polish Wikipedia 73 00:04:37,240 --> 00:04:40,090 and the French Wikipedia and the Indonesian Wikipedia 74 00:04:40,090 --> 00:04:44,920 and the Arabic Wikipedia is a whole bunch of repetitive work 75 00:04:44,920 --> 00:04:46,540 that a lot of different volunteers 76 00:04:46,540 --> 00:04:49,900 will need to do just for that one updated piece 77 00:04:49,900 --> 00:04:54,810 of information about Argentina. 
78 00:04:54,810 --> 00:04:57,720 So I hope this is clear and resonates 79 00:04:57,720 --> 00:05:01,920 with some of your experience editing Wikipedia-- 80 00:05:01,920 --> 00:05:04,170 data that is out of date or that needs 81 00:05:04,170 --> 00:05:08,640 to be updated manually, menially, 82 00:05:08,640 --> 00:05:16,190 on a fairly frequent schedule across the different countries 83 00:05:16,190 --> 00:05:18,410 and data sources. 84 00:05:18,410 --> 00:05:22,340 The other-- and I think maybe more interesting-- 85 00:05:22,340 --> 00:05:26,210 shortcoming or problem that I want to discuss 86 00:05:26,210 --> 00:05:30,260 is what I call the inflexible ways 87 00:05:30,260 --> 00:05:36,020 of lateral queries, crosscutting queries of knowledge. 88 00:05:36,020 --> 00:05:43,980 So if I want an answer to the question, what countries 89 00:05:43,980 --> 00:05:48,740 in the world export rubber-- 90 00:05:52,300 --> 00:05:54,790 that's a reasonable question, right? 91 00:05:54,790 --> 00:05:57,460 That information is on Wikipedia. 92 00:05:57,460 --> 00:05:58,630 Do you agree? 93 00:05:58,630 --> 00:06:00,640 If you go to Wikipedia and read up 94 00:06:00,640 --> 00:06:05,560 about Brazil, about Peru, about Germany, somewhere in there-- 95 00:06:05,560 --> 00:06:09,010 maybe a sub-article called Economics of Brazil-- 96 00:06:09,010 --> 00:06:13,600 you will find the main exports of that country. 97 00:06:13,600 --> 00:06:15,400 And you can find out whether or not 98 00:06:15,400 --> 00:06:16,930 that country exports rubber. 99 00:06:16,930 --> 00:06:19,994 But what if I don't want to go country by country 100 00:06:19,994 --> 00:06:21,160 looking for the word rubber? 101 00:06:21,160 --> 00:06:22,090 I just want an answer. 102 00:06:22,090 --> 00:06:25,540 What are the countries that export rubber? 103 00:06:25,540 --> 00:06:28,360 Even though that information is in Wikipedia, 104 00:06:28,360 --> 00:06:29,680 it's hard to get at. 
105 00:06:29,680 --> 00:06:31,680 It's hard to query. 106 00:06:31,680 --> 00:06:35,770 Now, you may say, well, that's what we have categories for, 107 00:06:35,770 --> 00:06:36,270 right? 108 00:06:36,270 --> 00:06:39,820 Categories are a way to cut across Wikipedia. 109 00:06:39,820 --> 00:06:45,110 So if someone made a category called rubber 110 00:06:45,110 --> 00:06:48,380 exporting countries, then you can go to that category 111 00:06:48,380 --> 00:06:51,560 and see a list of countries that export rubber. 112 00:06:51,560 --> 00:06:53,390 And if nobody has made it yet, well, you 113 00:06:53,390 --> 00:06:56,990 can create that category and, with a kind of one-time effort, 114 00:06:56,990 --> 00:06:59,730 populate that category, and you're done. 115 00:06:59,730 --> 00:07:01,970 Well, yes. 116 00:07:01,970 --> 00:07:04,250 That's still not very convenient. 117 00:07:04,250 --> 00:07:06,980 But also, it's still very, very limited, 118 00:07:06,980 --> 00:07:12,380 because what if I only want countries that export rubber 119 00:07:12,380 --> 00:07:15,950 and have a democratic system of government, 120 00:07:15,950 --> 00:07:18,770 or any other kind of additional condition 121 00:07:18,770 --> 00:07:20,510 that I would like to add to this? 122 00:07:20,510 --> 00:07:22,230 Or take a completely different example. 123 00:07:22,230 --> 00:07:26,750 What if I want to know which Flemish town had 124 00:07:26,750 --> 00:07:31,510 the most painters born in it? 125 00:07:31,510 --> 00:07:34,480 There's a ton of Flemish painters. 126 00:07:34,480 --> 00:07:37,870 Most of them were born somewhere. 127 00:07:37,870 --> 00:07:39,685 We could theoretically, just you know, 128 00:07:39,685 --> 00:07:43,900 look up all the birthplaces of all the Flemish painters 129 00:07:43,900 --> 00:07:46,900 and tally up the numbers and figure out 130 00:07:46,900 --> 00:07:51,610 what is the place where the most Flemish painters come from? 
131 00:07:51,610 --> 00:07:53,050 I don't know the answer to that. 132 00:07:53,050 --> 00:07:55,420 It would be nice to be able to get that answer. 133 00:07:55,420 --> 00:07:57,610 Again, the data is in Wikipedia. 134 00:07:57,610 --> 00:08:00,400 Those birthplaces are listed in the articles 135 00:08:00,400 --> 00:08:01,636 about those painters. 136 00:08:01,636 --> 00:08:05,710 But there's no easy way to get that information. 137 00:08:05,710 --> 00:08:13,420 What if I want to ask, who are some painters whose father was 138 00:08:13,420 --> 00:08:14,245 also a painter? 139 00:08:16,840 --> 00:08:18,500 That's a thing that exists, right? 140 00:08:18,500 --> 00:08:22,630 Some painters are sons of painters. 141 00:08:22,630 --> 00:08:26,560 You know, Bruegel comes to mind as an obvious example. 142 00:08:26,560 --> 00:08:28,240 But there's a bunch of others, right? 143 00:08:28,240 --> 00:08:29,380 So who are those people? 144 00:08:29,380 --> 00:08:30,930 What if I want to ask that question? 145 00:08:30,930 --> 00:08:33,400 That's the kind of question that not only Wikipedia 146 00:08:33,400 --> 00:08:34,600 doesn't answer today. 147 00:08:34,600 --> 00:08:41,500 If you walk to your friendly university library reference 148 00:08:41,500 --> 00:08:45,010 desk and say, hello, I would like 149 00:08:45,010 --> 00:08:49,290 a list of painters whose father was also a painter, 150 00:08:49,290 --> 00:08:52,820 how would that librarian help you? 151 00:08:52,820 --> 00:08:57,960 There's no easy way to get an answer to a question like that. 152 00:08:57,960 --> 00:09:01,100 What if you only want a list of painters 153 00:09:01,100 --> 00:09:05,870 who were immigrants, painters who lived somewhere else 154 00:09:05,870 --> 00:09:08,240 than where they were born? 155 00:09:08,240 --> 00:09:09,770 There's no book. 
156 00:09:09,770 --> 00:09:11,720 I guess maybe there is, but you know, 157 00:09:11,720 --> 00:09:15,590 it's not obvious that there's a ready resource that says, list 158 00:09:15,590 --> 00:09:17,840 of painters who are immigrants. 159 00:09:17,840 --> 00:09:19,910 And the librarian would probably refer you 160 00:09:19,910 --> 00:09:22,760 to a book on the shelf called, I don't know, 161 00:09:22,760 --> 00:09:24,200 The Complete Dictionary of Flemish 162 00:09:24,200 --> 00:09:26,300 Painters and go, look up the index, 163 00:09:26,300 --> 00:09:28,520 you know, and if you see a similar surname, 164 00:09:28,520 --> 00:09:29,910 maybe they're father and son. 165 00:09:29,910 --> 00:09:35,000 And kind of cobble together the answer on your own. 166 00:09:35,000 --> 00:09:37,100 The reason I'm comparing this to a library 167 00:09:37,100 --> 00:09:42,170 is to show you that this is a kind of question that is not 168 00:09:42,170 --> 00:09:46,760 readily satisfiable today. 169 00:09:46,760 --> 00:09:50,240 Now, these questions may sound contrived to you. 170 00:09:50,240 --> 00:09:52,460 You may say to yourself, well, you 171 00:09:52,460 --> 00:09:54,860 know, painters who are also sons of painters, yeah. 172 00:09:54,860 --> 00:09:57,680 You know, that never occurred to me 173 00:09:57,680 --> 00:09:59,610 as a question I might care about. 174 00:09:59,610 --> 00:10:01,850 But I want to invite you to consider 175 00:10:01,850 --> 00:10:06,380 that this kind of question, questions like that question, 176 00:10:06,380 --> 00:10:09,260 may well be questions you do care about. 177 00:10:09,260 --> 00:10:12,740 And I also want to suggest that the fact it is so nearly 178 00:10:12,740 --> 00:10:16,250 impossible, the fact that there's no obvious way 179 00:10:16,250 --> 00:10:19,250 to ask that kind of question today, 180 00:10:19,250 --> 00:10:21,200 is partly responsible for your not 181 00:10:21,200 --> 00:10:22,970 coming up with those questions, right? 
182 00:10:22,970 --> 00:10:25,850 We tend to be limited by the possible. 183 00:10:25,850 --> 00:10:30,080 You know, until human flight was made possible, 184 00:10:30,080 --> 00:10:32,840 it did not occur to anyone to say, oh yeah, by this time 185 00:10:32,840 --> 00:10:34,430 next week I will be in Australia, 186 00:10:34,430 --> 00:10:36,630 because that was just impossible. 187 00:10:36,630 --> 00:10:38,587 But when flight is possible, there's 188 00:10:38,587 --> 00:10:40,670 all kinds of things that suddenly become possible, 189 00:10:40,670 --> 00:10:42,740 and there's all kinds of needs that 190 00:10:42,740 --> 00:10:46,430 arise based on the availability of resources 191 00:10:46,430 --> 00:10:48,600 to fulfill those needs. 192 00:10:48,600 --> 00:10:54,120 So many of these research questions, compound lateral 193 00:10:54,120 --> 00:10:58,520 cross-cutting queries, are not being asked because people have 194 00:10:58,520 --> 00:11:00,410 internalized the fact that there is no way 195 00:11:00,410 --> 00:11:05,750 to get an answer to questions like, 196 00:11:05,750 --> 00:11:13,270 what is the most popular first name among British politicians? 197 00:11:13,270 --> 00:11:14,520 I just made that up, you know? 198 00:11:14,520 --> 00:11:15,340 Is it John? 199 00:11:15,340 --> 00:11:16,510 Maybe. 200 00:11:16,510 --> 00:11:19,030 Maybe it's William, for whatever reason. 201 00:11:19,030 --> 00:11:22,030 You know, these are the kinds of questions we don't routinely 202 00:11:22,030 --> 00:11:25,855 ask because we know that it's like, who are you going to ask? 203 00:11:25,855 --> 00:11:28,330 How are you going to get an answer to that? 204 00:11:28,330 --> 00:11:36,040 So this problem of not having very flexible ways of querying 205 00:11:36,040 --> 00:11:38,220 the data that we already have-- 206 00:11:38,220 --> 00:11:41,230 in Wikipedia, in Wikisource, elsewhere-- 207 00:11:41,230 --> 00:11:45,060 is a significant limitation. 
208 00:11:45,060 --> 00:11:50,880 So these two key problems have one solution. 209 00:11:50,880 --> 00:11:55,500 And that is an editable, central storage 210 00:11:55,500 --> 00:12:00,510 for structured and linked data on a wiki, 211 00:12:00,510 --> 00:12:05,160 under a free license, which is a very long way of saying 212 00:12:05,160 --> 00:12:07,290 Wikidata. 213 00:12:07,290 --> 00:12:08,470 That is Wikidata. 214 00:12:08,470 --> 00:12:11,190 Wikidata is an editable, central storage 215 00:12:11,190 --> 00:12:15,840 for structured and linked data on a wiki, 216 00:12:15,840 --> 00:12:17,700 under a free license. 217 00:12:17,700 --> 00:12:22,590 So let's take this apart and unpack it. 218 00:12:22,590 --> 00:12:24,820 First of all, it's a central storage. 219 00:12:24,820 --> 00:12:27,660 This relates to the first problem, right? 220 00:12:27,660 --> 00:12:34,370 If we had one place containing data like population size, 221 00:12:34,370 --> 00:12:38,270 we would be able to update that one place and then have 222 00:12:38,270 --> 00:12:42,260 all of the different Wikipedias draw the data from that one 223 00:12:42,260 --> 00:12:45,320 place so that we wouldn't have to manually, 224 00:12:45,320 --> 00:12:49,980 repetitively update it across our hundreds of projects. 225 00:12:49,980 --> 00:12:53,690 So having central storage makes, I hope, kind 226 00:12:53,690 --> 00:12:57,230 of immediate, intuitive sense. 227 00:12:57,230 --> 00:13:02,840 But what do I mean by structured and linked data? 228 00:13:02,840 --> 00:13:10,120 So structured data means that each datum, each piece-- 229 00:13:10,120 --> 00:13:15,880 individual piece-- of data is managed on its own, 230 00:13:15,880 --> 00:13:19,660 is identified and defined on its own, 231 00:13:19,660 --> 00:13:21,040 as distinct from Wikipedia. 232 00:13:21,040 --> 00:13:22,990 Wikipedia has articles. 
233 00:13:22,990 --> 00:13:27,190 The article about Brazil includes a ton of data, 234 00:13:27,190 --> 00:13:31,570 all kinds of information, and it's presented as text, 235 00:13:31,570 --> 00:13:34,270 as several paragraphs-- several pages-- 236 00:13:34,270 --> 00:13:36,540 of text, right? 237 00:13:36,540 --> 00:13:41,460 Now, we do have an approximation of structured data 238 00:13:41,460 --> 00:13:43,580 on Wikipedia. 239 00:13:43,580 --> 00:13:45,300 If you've browsed Wikipedia a little, 240 00:13:45,300 --> 00:13:49,100 you've noticed that we often have an info box, what we 241 00:13:49,100 --> 00:13:50,750 call an info box on Wikipedia. 242 00:13:50,750 --> 00:13:55,220 That's the table on the right side if it's a left to right 243 00:13:55,220 --> 00:13:57,200 language, the table on the right side 244 00:13:57,200 --> 00:14:02,270 that has information that is easy to tabulate, right? 245 00:14:02,270 --> 00:14:08,210 So you know, birth date, birth place, death date, death place, 246 00:14:08,210 --> 00:14:09,710 nationality-- 247 00:14:09,710 --> 00:14:16,670 or if it's about a country, area, population, anthem, 248 00:14:16,670 --> 00:14:20,090 type of government, whatever you are likely to find. 249 00:14:20,090 --> 00:14:23,150 If it's a movie, then you know, starring, 250 00:14:23,150 --> 00:14:27,350 genre, box office receipts, whatever pieces of data 251 00:14:27,350 --> 00:14:29,900 are relevant to an article about a movie. 252 00:14:29,900 --> 00:14:34,940 So we do already kind of group pieces of information 253 00:14:34,940 --> 00:14:40,160 on Wikipedia into this kind of structured format. 254 00:14:40,160 --> 00:14:43,630 Those of you who have ever looked at the source, 255 00:14:43,630 --> 00:14:45,970 at what the wiki code under that looks like, 256 00:14:45,970 --> 00:14:49,640 know that it's only semi-structured. 
257 00:14:49,640 --> 00:14:52,370 It looks neat and organized in a table, 258 00:14:52,370 --> 00:14:55,660 but really, it's just a bunch of text that is put there. 259 00:14:55,660 --> 00:14:57,140 It is not centralized. 260 00:14:57,140 --> 00:15:00,100 Every Wikipedia has its own copy of that data. 261 00:15:00,100 --> 00:15:02,930 And if I go and update the population size 262 00:15:02,930 --> 00:15:07,070 on Spanish Wikipedia of that Argentinean town, 263 00:15:07,070 --> 00:15:10,190 it does not get updated automagically 264 00:15:10,190 --> 00:15:13,520 on the English Wikipedia or the Arabic Wikipedia, right? 265 00:15:13,520 --> 00:15:17,150 So the structured data that we already have on Wikipedia 266 00:15:17,150 --> 00:15:20,939 is not managed centrally. 267 00:15:20,939 --> 00:15:22,480 The other thing about structured data 268 00:15:22,480 --> 00:15:29,250 is, when you have a notion of an individual piece of data, that 269 00:15:29,250 --> 00:15:33,390 is the cornerstone of allowing the kinds of queries 270 00:15:33,390 --> 00:15:34,770 that I was talking about. 271 00:15:34,770 --> 00:15:40,440 That is what will allow me to ask questions like, 272 00:15:40,440 --> 00:15:43,470 what is the Flemish town where the most painters were born, 273 00:15:43,470 --> 00:15:46,650 or what are the world's largest cities that 274 00:15:46,650 --> 00:15:49,730 have a female mayor? 275 00:15:49,730 --> 00:15:52,430 I could come up with other examples all day long, right? 276 00:15:52,430 --> 00:15:55,280 These are all questions that you can ask, 277 00:15:55,280 --> 00:15:59,390 once you break down your data into individual pieces, each 278 00:15:59,390 --> 00:16:02,300 of which is-- 279 00:16:02,300 --> 00:16:06,950 you're able to refer to each of those programmatically. 280 00:16:06,950 --> 00:16:10,430 The computer can identify, isolate, 281 00:16:10,430 --> 00:16:14,700 and calculate based on each of those pieces of data. 
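To make the point about individual pieces of data concrete, here is a minimal Python sketch of the kind of tally the talk describes (the place where the most painters were born). The people, places, and the `birthplace_tally` helper are all made up for illustration; this is not Wikidata's actual storage format or query interface, only a rough picture of why separately addressable data makes such questions computable.

```python
from collections import Counter

# Hypothetical, invented records: each datum is stored on its own
# as an (item, property, value) tuple instead of being buried in prose.
statements = [
    ("Painter A", "occupation", "painter"),
    ("Painter A", "place of birth", "Antwerp"),
    ("Painter B", "occupation", "painter"),
    ("Painter B", "place of birth", "Antwerp"),
    ("Painter C", "occupation", "painter"),
    ("Painter C", "place of birth", "Ghent"),
    ("Writer D", "occupation", "writer"),
    ("Writer D", "place of birth", "Ghent"),
]


def birthplace_tally(statements, occupation):
    """Count birthplaces of every item that has the given occupation."""
    # First pass: find the items whose occupation matches.
    people = {
        item
        for item, prop, value in statements
        if prop == "occupation" and value == occupation
    }
    # Second pass: count the birthplaces of just those items.
    return Counter(
        value
        for item, prop, value in statements
        if prop == "place of birth" and item in people
    )


tally = birthplace_tally(statements, "painter")
print(tally.most_common(1))  # [('Antwerp', 2)]
```

Adding a further condition (say, only painters born after some year) would be one more filter over the same tuples, which is the flexibility that a fixed category page cannot offer.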
282 00:16:14,700 --> 00:16:17,060 So that's why the structure is important. 283 00:16:17,060 --> 00:16:22,520 Now, Wikidata is also a linked data repository. 284 00:16:22,520 --> 00:16:24,890 What does it mean that the data is linked? 285 00:16:24,890 --> 00:16:29,700 Well, it means that a single piece of data can point at, 286 00:16:29,700 --> 00:16:34,770 can link to another whole bag of data. 287 00:16:34,770 --> 00:16:43,360 So if we are describing, for example, a person, 288 00:16:43,360 --> 00:16:46,960 and we record the single piece of data 289 00:16:46,960 --> 00:16:54,820 that this person was born in Salem, Massachusetts, 290 00:16:54,820 --> 00:17:02,300 that single piece of data links to the item about Salem, 291 00:17:02,300 --> 00:17:04,060 Massachusetts because, of course, 292 00:17:04,060 --> 00:17:07,010 we know a lot of things about that place, Salem, 293 00:17:07,010 --> 00:17:07,869 Massachusetts. 294 00:17:07,869 --> 00:17:09,245 So it's not just the text-- 295 00:17:09,245 --> 00:17:13,450 S-A-L-E-M. It's not just, that's where they were born. 296 00:17:13,450 --> 00:17:17,170 But it's a link to all the data that we have 297 00:17:17,170 --> 00:17:19,270 about Salem, Massachusetts. 298 00:17:19,270 --> 00:17:24,940 If we say someone's nationality is French, 299 00:17:24,940 --> 00:17:26,589 that is a link to France. 300 00:17:26,589 --> 00:17:30,700 That is a link to everything we know about the country France. 301 00:17:30,700 --> 00:17:34,150 The fact that the data is linked and structured 302 00:17:34,150 --> 00:17:37,630 allows not only humans, but also computers 303 00:17:37,630 --> 00:17:41,620 to traverse information and to bring 304 00:17:41,620 --> 00:17:44,950 us different pieces of relevant information 305 00:17:44,950 --> 00:17:49,000 programmatically, automatically, based on those links. 306 00:17:49,000 --> 00:17:52,000 Because it's not just text, it's an actual link 307 00:17:52,000 --> 00:17:56,700 to another chunk of data. 
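The difference between storing the text "Salem" and storing a link to the item about Salem can be sketched in a few lines of Python. The identifiers (`item:person`, `item:salem`), the `located in` property, and the `follow` helper are assumptions invented for this illustration; real Wikidata identifiers and property names look different.

```python
# Hypothetical identifiers and properties, for illustration only.
items = {
    "item:person": {
        "label": "an imaginary person",
        # The value is a pointer to another item, not the text "Salem".
        "place of birth": "item:salem",
    },
    "item:salem": {
        "label": "Salem, Massachusetts",
        "located in": "item:massachusetts",
    },
    "item:massachusetts": {
        "label": "Massachusetts",
    },
}


def follow(items, start, *properties):
    """Walk from one item to another by following a chain of link properties."""
    current = start
    for prop in properties:
        current = items[current][prop]
    return items[current]


# One hop: everything we know about the birthplace, not just its name.
place = follow(items, "item:person", "place of birth")
print(place["label"])  # Salem, Massachusetts

# Two hops: a program can keep traversing the links.
region = follow(items, "item:person", "place of birth", "located in")
print(region["label"])  # Massachusetts
```

Because the value is a reference rather than a string, a program can hop from the person to the place and onward without guessing what a piece of text was meant to refer to.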
308 00:17:56,700 --> 00:17:58,880 If this sounds a little abstract, 309 00:17:58,880 --> 00:18:01,190 it will become much clearer in just a second 310 00:18:01,190 --> 00:18:03,230 when we see it in action. 311 00:18:03,230 --> 00:18:06,200 But the other components of this little definition are, 312 00:18:06,200 --> 00:18:09,650 of course, this central storage of structured and linked data 313 00:18:09,650 --> 00:18:12,620 needs to be editable, of course, because we 314 00:18:12,620 --> 00:18:14,370 need to keep it up to date. 315 00:18:14,370 --> 00:18:16,460 We need to correct mistakes. 316 00:18:16,460 --> 00:18:21,300 And we want it on a wiki under a free license. 317 00:18:21,300 --> 00:18:23,940 The free license is, of course, essential to enable 318 00:18:23,940 --> 00:18:30,910 reuse of that data, to enable all kinds of reuse of the data. 319 00:18:30,910 --> 00:18:34,060 And Wikidata, unlike Wikipedia, is released 320 00:18:34,060 --> 00:18:36,160 under a different free license. 321 00:18:36,160 --> 00:18:41,590 Wikidata is released under CC0 waiver. 322 00:18:41,590 --> 00:18:44,920 That means unlike Wikipedia, where 323 00:18:44,920 --> 00:18:51,160 you have to attribute Wikipedia when you reuse information 324 00:18:51,160 --> 00:18:55,150 from Wikipedia, you do not need to attribute Wikidata, 325 00:18:55,150 --> 00:18:57,040 and you do not need to share alike your work. 326 00:18:57,040 --> 00:19:02,020 It's an unencumbered license to reuse the data in any way you 327 00:19:02,020 --> 00:19:03,267 want, including commercially. 328 00:19:03,267 --> 00:19:05,350 You don't have to say that it comes from Wikidata. 329 00:19:05,350 --> 00:19:07,390 I mean, it could be nice, but you don't have to. 330 00:19:07,390 --> 00:19:09,280 You're under no obligation to do it. 
331 00:19:09,280 --> 00:19:14,080 And that is important to allow certain kinds of reuse 332 00:19:14,080 --> 00:19:17,140 where, for example, if you're building some kind of device, 333 00:19:17,140 --> 00:19:20,680 you may not have a practical way to give attribution. 334 00:19:20,680 --> 00:19:23,920 And had we required that to use Wikidata, 335 00:19:23,920 --> 00:19:27,250 we would have made Wikidata less reusable. 336 00:19:27,250 --> 00:19:32,940 So Wikidata is unencumbered by the requirement of attribution. 337 00:19:32,940 --> 00:19:35,730 And of course, because it's on a wiki, 338 00:19:35,730 --> 00:19:40,421 we get all the benefits that we are used to expect from a wiki, 339 00:19:40,421 --> 00:19:40,920 right? 340 00:19:40,920 --> 00:19:42,810 So it's a wiki, which means, yes. 341 00:19:42,810 --> 00:19:44,910 It has discussion pages. 342 00:19:44,910 --> 00:19:46,500 It has revision histories. 343 00:19:46,500 --> 00:19:47,620 It remembers everything. 344 00:19:47,620 --> 00:19:50,610 So if you screw it up, you can always go a version back. 345 00:19:50,610 --> 00:19:52,380 Or if someone else vandalized the content, 346 00:19:52,380 --> 00:19:54,610 we can always go back, just like Wikipedia. 347 00:19:54,610 --> 00:19:56,880 So we get all the benefits we're used to-- 348 00:19:56,880 --> 00:20:01,260 user talk pages, group discussion pages, watch lists, 349 00:20:01,260 --> 00:20:03,755 all the features that we expect in a wiki. 350 00:20:06,740 --> 00:20:11,170 In short, Wikidata is love. 351 00:20:11,170 --> 00:20:14,100 I hope you agree with me by the end of this talk. 352 00:20:14,100 --> 00:20:18,580 So let's zoom in and see what this structured data 353 00:20:18,580 --> 00:20:21,420 looks like. 354 00:20:21,420 --> 00:20:29,460 So structured data on Wikidata is collected in statements. 
355 00:20:29,460 --> 00:20:31,930 And statements have the general form 356 00:20:31,930 --> 00:20:39,490 of this triple, this tripartite ascription-- 357 00:20:39,490 --> 00:20:43,550 items, properties, and values. 358 00:20:43,550 --> 00:20:46,930 Now an item is the subject, is the topic 359 00:20:46,930 --> 00:20:48,820 that we are trying to describe. 360 00:20:48,820 --> 00:20:52,164 It can be any topic that Wikipedia can cover, 361 00:20:52,164 --> 00:20:53,830 and many others that Wikipedia wouldn't. 362 00:20:53,830 --> 00:20:57,490 So the topic, the item can be Germany, 363 00:20:57,490 --> 00:21:00,520 or it can be Salem, Massachusetts, 364 00:21:00,520 --> 00:21:03,340 or it can be the concept of redemption. 365 00:21:03,340 --> 00:21:04,610 It can be anything at all. 366 00:21:04,610 --> 00:21:10,000 Anything you can imagine describing in any way with data 367 00:21:10,000 --> 00:21:11,990 can be the item. 368 00:21:11,990 --> 00:21:15,430 So the item, consider it like the title 369 00:21:15,430 --> 00:21:17,480 of the rest of the data. 370 00:21:17,480 --> 00:21:20,860 And then what do we say about Salem, Massachusetts 371 00:21:20,860 --> 00:21:22,330 or about Germany? 372 00:21:22,330 --> 00:21:26,770 Well, that's a series of properties and values, 373 00:21:26,770 --> 00:21:28,450 properties and values. 374 00:21:28,450 --> 00:21:32,680 The property is the kind of datum, 375 00:21:32,680 --> 00:21:39,770 like birth date or language spoken or manner of death. 376 00:21:39,770 --> 00:21:42,640 These are all real properties. 377 00:21:42,640 --> 00:21:46,030 Or national anthem, if I'm trying to describe a country-- 378 00:21:46,030 --> 00:21:47,830 these are properties. 379 00:21:47,830 --> 00:21:49,880 And then they have values, right? 
380 00:21:49,880 --> 00:21:55,740 So this person, this imaginary person's place 381 00:21:55,740 --> 00:21:59,640 of birth, the value of the property place of birth 382 00:21:59,640 --> 00:22:02,430 is Salem, Massachusetts. 383 00:22:02,430 --> 00:22:06,690 So you can think about it as like a government form-- 384 00:22:06,690 --> 00:22:09,540 or not government, just any form that you're filling out-- 385 00:22:09,540 --> 00:22:12,420 where there are field names, and then empty spaces for you 386 00:22:12,420 --> 00:22:13,110 to fill out. 387 00:22:13,110 --> 00:22:14,460 That's the value, OK? 388 00:22:14,460 --> 00:22:18,150 So the field names or the categories 389 00:22:18,150 --> 00:22:19,350 are the properties, right? 390 00:22:19,350 --> 00:22:22,960 So name, language, occupation, date of birth-- 391 00:22:22,960 --> 00:22:24,420 these are all properties. 392 00:22:24,420 --> 00:22:26,640 And the values are the actual piece 393 00:22:26,640 --> 00:22:31,391 of data, the actual information that we have. 394 00:22:31,391 --> 00:22:33,870 And of course, different kinds of data 395 00:22:33,870 --> 00:22:40,170 are relevant for describing different kinds of items. 396 00:22:40,170 --> 00:22:45,030 And the key in the value is it can be either a literal value-- 397 00:22:45,030 --> 00:22:50,370 like if we're describing the height of a mountain, 398 00:22:50,370 --> 00:22:55,826 we might say just the number 8,848. 399 00:22:55,826 --> 00:22:57,325 That's the height of which mountain? 400 00:23:01,990 --> 00:23:04,070 Not everyone at once. 401 00:23:04,070 --> 00:23:07,430 Oh, because it's meters, the metric system. 402 00:23:07,430 --> 00:23:08,270 Yeah, Mt. 403 00:23:08,270 --> 00:23:12,390 Everest is 8,848 meters. 404 00:23:12,390 --> 00:23:14,160 Yes. 405 00:23:14,160 --> 00:23:15,780 Get with it, America. 406 00:23:15,780 --> 00:23:17,630 The metric system. 
407 00:23:17,630 --> 00:23:20,930 All right, so that can be a literal value 408 00:23:20,930 --> 00:23:22,580 like an actual number. 409 00:23:22,580 --> 00:23:28,280 Or it can be a link to an item, pointing at another item. 410 00:23:28,280 --> 00:23:30,890 But in this statement, it is the value. 411 00:23:30,890 --> 00:23:35,150 So if I'm talking about Germany, the item is Germany. 412 00:23:35,150 --> 00:23:39,680 And the property capital city has the value Berlin. 413 00:23:39,680 --> 00:23:43,130 But the value is not B-E-R-L-I-N. 414 00:23:43,130 --> 00:23:48,740 The value is a pointer to the item Berlin, right? 415 00:23:48,740 --> 00:23:51,410 That's the link. 416 00:23:51,410 --> 00:23:56,671 So a single item is described by a series of such statements, 417 00:23:56,671 --> 00:23:57,170 right? 418 00:23:57,170 --> 00:24:01,400 There's hundreds and hundreds of things I can say about Germany. 419 00:24:01,400 --> 00:24:04,280 There's hundreds of things I can say about a person. 420 00:24:04,280 --> 00:24:06,350 And these will generally take the form 421 00:24:06,350 --> 00:24:08,330 of a property and a value. 422 00:24:08,330 --> 00:24:11,720 By the way, some properties may have more than one value. 423 00:24:11,720 --> 00:24:15,920 Consider the property languages spoken. 424 00:24:15,920 --> 00:24:18,050 People can speak more than one language, right? 425 00:24:18,050 --> 00:24:20,330 So if I'm describing myself, 426 00:24:20,330 --> 00:24:22,400 we can say languages spoken-- 427 00:24:22,400 --> 00:24:26,000 English, Hebrew, Latin, whatever. 428 00:24:26,000 --> 00:24:27,860 So a property can have more than one value. 429 00:24:30,970 --> 00:24:34,010 So if the item is about a country, 430 00:24:34,010 --> 00:24:38,890 it would have statements about properties like population, 431 00:24:38,890 --> 00:24:43,180 land area, official languages, borders with, anthem, 432 00:24:43,180 --> 00:24:45,070 capital city. 
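The item, property, value shape described here can be written down as a small Python sketch. The class names (`ItemRef`, `Statement`), the property label `height in meters`, and the `values_of` helper are assumptions made up for this example and are not Wikidata's real data model or API. The sketch only shows the two kinds of value (a literal versus a pointer to another item) and a property that carries more than one value.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ItemRef:
    """A value that points at another item instead of holding plain text."""
    name: str


# A value is either a literal (number or text) or a link to an item.
Value = Union[int, float, str, ItemRef]


@dataclass(frozen=True)
class Statement:
    item: str    # the subject being described
    prop: str    # the kind of datum
    value: Value # the datum itself


statements = [
    # The value is a pointer to the item Berlin, not the letters B-E-R-L-I-N.
    Statement("Germany", "capital city", ItemRef("Berlin")),
    # A literal value: just a number.
    Statement("Mount Everest", "height in meters", 8848),
    # One property, several values: one statement per value.
    Statement("Speaker", "languages spoken", ItemRef("English")),
    Statement("Speaker", "languages spoken", ItemRef("Hebrew")),
    Statement("Speaker", "languages spoken", ItemRef("Latin")),
]


def values_of(statements, item, prop):
    """Return every value recorded for one property of one item."""
    return [s.value for s in statements if s.item == item and s.prop == prop]


print(values_of(statements, "Germany", "capital city"))
# [ItemRef(name='Berlin')]
print([v.name for v in values_of(statements, "Speaker", "languages spoken")])
# ['English', 'Hebrew', 'Latin']
```

Modeling a multi-valued property as several statements, rather than one statement holding a list, is a design choice of this sketch; it keeps every statement the same three-part shape.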
433 00:24:45,070 --> 00:24:48,580 If I'm describing a person, I have a whole mostly different 434 00:24:48,580 --> 00:24:51,220 set of properties that are relevant, right? 435 00:24:51,220 --> 00:24:54,160 Date of birth, place of birth, citizenship, occupation, 436 00:24:54,160 --> 00:24:56,950 father, mother, religion, notable works-- 437 00:24:56,950 --> 00:24:59,780 now, are all of these relevant for all people? 438 00:24:59,780 --> 00:25:00,970 No, of course not. 439 00:25:00,970 --> 00:25:02,140 It depends. 440 00:25:02,140 --> 00:25:05,220 And different items about different people 441 00:25:05,220 --> 00:25:08,920 will either have or not have these fields, right? 442 00:25:08,920 --> 00:25:12,640 So we wouldn't record religion for absolutely every person. 443 00:25:12,640 --> 00:25:14,200 Some people manage to do without. 444 00:25:14,200 --> 00:25:17,710 And also, it's not relevant for a lot of people, like, 445 00:25:17,710 --> 00:25:20,320 what their religion happens to be. 446 00:25:20,320 --> 00:25:22,840 Date of birth is generally relevant for most people 447 00:25:22,840 --> 00:25:24,060 that we're documenting. 448 00:25:24,060 --> 00:25:29,390 So some properties kind of crop up more commonly than others. 449 00:25:29,390 --> 00:25:33,220 A person's height, for example, is not generally 450 00:25:33,220 --> 00:25:35,596 considered of encyclopedic value, right? 451 00:25:35,596 --> 00:25:36,970 We don't, for example, if we have 452 00:25:36,970 --> 00:25:40,840 an article about even a really well-documented person 453 00:25:40,840 --> 00:25:45,610 like Winston Churchill, does Wikipedia mention his height? 454 00:25:45,610 --> 00:25:47,620 I don't think it does. 455 00:25:47,620 --> 00:25:50,320 Even though I'm sure we could probably 456 00:25:50,320 --> 00:25:52,810 find a source somewhere that lists his height, 457 00:25:52,810 --> 00:25:55,570 it's just not a very relevant piece 458 00:25:55,570 --> 00:25:57,506 of information about Churchill. 
459 00:25:57,506 --> 00:25:59,380 With everything else that's written about him 460 00:25:59,380 --> 00:26:00,796 and that we know about him that we 461 00:26:00,796 --> 00:26:03,460 want to include in the article, a person's height 462 00:26:03,460 --> 00:26:08,180 is not really something of great value most of the time. 463 00:26:08,180 --> 00:26:14,420 But if we are describing Michael Jordan, it is relevant. 464 00:26:14,420 --> 00:26:15,430 I'm dating myself. 465 00:26:15,430 --> 00:26:19,230 People still know Michael Jordan, right? 466 00:26:19,230 --> 00:26:21,600 You know, a basketball player, that's 467 00:26:21,600 --> 00:26:24,204 when height is very relevant, right? 468 00:26:24,204 --> 00:26:25,620 That's one of the first things you 469 00:26:25,620 --> 00:26:28,020 say when you're describing a basketball player, 470 00:26:28,020 --> 00:26:31,380 is list their height. 471 00:26:31,380 --> 00:26:33,690 So even within the class of person, 472 00:26:33,690 --> 00:26:36,480 some properties may be more or less relevant, 473 00:26:36,480 --> 00:26:38,320 depending on the context. 474 00:26:38,320 --> 00:26:40,090 So let's look at some examples. 475 00:26:40,090 --> 00:26:42,870 These are examples of statements. 476 00:26:42,870 --> 00:26:45,400 Each line is a statement. 477 00:26:45,400 --> 00:26:47,130 So here's the first one. 478 00:26:47,130 --> 00:26:53,270 I want to state, about the item Earth, our planet. 479 00:26:53,270 --> 00:26:55,760 And what I want to say about Earth 480 00:26:55,760 --> 00:27:00,980 is that the property highest point on Earth 481 00:27:00,980 --> 00:27:03,310 has the value Mt. 482 00:27:03,310 --> 00:27:04,817 Everest. 483 00:27:04,817 --> 00:27:05,900 Would you agree with that? 484 00:27:05,900 --> 00:27:09,580 That is the highest point on Earth. 485 00:27:09,580 --> 00:27:11,100 That's a statement. 
486 00:27:11,100 --> 00:27:14,020 It says something specific, one piece 487 00:27:14,020 --> 00:27:15,517 of information about Earth. 488 00:27:15,517 --> 00:27:17,350 Now of course, there's a lot of other things 489 00:27:17,350 --> 00:27:18,820 we want to say about Earth-- 490 00:27:18,820 --> 00:27:21,165 circumference, average temperature, 491 00:27:21,165 --> 00:27:22,540 I don't know, all kinds of things 492 00:27:22,540 --> 00:27:26,750 we can describe the planet with, density, the galaxy 493 00:27:26,750 --> 00:27:28,250 it belongs to, all that. 494 00:27:28,250 --> 00:27:30,400 But here's one piece of information, 495 00:27:30,400 --> 00:27:37,370 one very specific field in the detailed form about Earth. 496 00:27:37,370 --> 00:27:38,990 The highest point is Mt. 497 00:27:38,990 --> 00:27:39,590 Everest. 498 00:27:39,590 --> 00:27:41,570 Now here's a second statement. 499 00:27:41,570 --> 00:27:42,920 This time Mt. 500 00:27:42,920 --> 00:27:46,690 Everest itself is the item that I'm describing, right? 501 00:27:46,690 --> 00:27:48,590 The topic has changed. 502 00:27:48,590 --> 00:27:50,120 Now I'm saying something about Mt. 503 00:27:50,120 --> 00:27:52,340 Everest, and what I'm saying about Mt. 504 00:27:52,340 --> 00:27:56,860 Everest is elevation above sea level. 505 00:27:56,860 --> 00:28:01,190 Sounds the same but it isn't, because the highest 506 00:28:01,190 --> 00:28:04,670 point on Earth answers the question where, 507 00:28:04,670 --> 00:28:08,090 like on the planet, what is the highest point? 508 00:28:08,090 --> 00:28:08,720 It's Mt. 509 00:28:08,720 --> 00:28:09,630 Everest. 510 00:28:09,630 --> 00:28:12,911 But how high is that highest point is a different piece 511 00:28:12,911 --> 00:28:13,535 of information. 512 00:28:13,535 --> 00:28:14,710 Do you agree? 513 00:28:14,710 --> 00:28:16,790 It's the actual altitude. 514 00:28:16,790 --> 00:28:19,600 It's not where on the planet it is.
515 00:28:19,600 --> 00:28:21,680 So it may sound similar, but these are actually 516 00:28:21,680 --> 00:28:24,030 very different pieces of information. 517 00:28:24,030 --> 00:28:27,800 So that highest point, how high is it? 518 00:28:27,800 --> 00:28:31,790 Well, it's 8,848 meters high. 519 00:28:31,790 --> 00:28:36,550 Now the third statement gives another piece of information 520 00:28:36,550 --> 00:28:37,960 about the first item. 521 00:28:37,960 --> 00:28:40,870 Same item-- I could have grouped them together. 522 00:28:40,870 --> 00:28:42,400 Another thing I know about the Earth 523 00:28:42,400 --> 00:28:46,480 is that the deepest point on the planet 524 00:28:46,480 --> 00:28:53,050 is the Challenger Deep, part of the so-called Mariana 525 00:28:53,050 --> 00:28:54,760 Trench in the ocean. 526 00:28:54,760 --> 00:28:56,530 So that is the deepest point. 527 00:28:56,530 --> 00:28:58,180 And how deep is it? 528 00:28:58,180 --> 00:29:01,384 I again use the elevation above sea level. 529 00:29:01,384 --> 00:29:03,550 That's the name of the property even though it's not 530 00:29:03,550 --> 00:29:04,750 above sea level. 531 00:29:04,750 --> 00:29:08,260 I have a negative value because the elevation of the Challenger 532 00:29:08,260 --> 00:29:13,700 Deep is minus 11 kilometers, more or less. 533 00:29:13,700 --> 00:29:14,200 All right? 534 00:29:14,200 --> 00:29:15,620 So these are statements. 535 00:29:15,620 --> 00:29:18,820 These are four individual pieces of data. 536 00:29:18,820 --> 00:29:21,160 And I could also look at it this way. 537 00:29:21,160 --> 00:29:25,210 Maybe that's closer to the government form example 538 00:29:25,210 --> 00:29:26,620 that I was giving, right? 539 00:29:26,620 --> 00:29:29,190 So I want to say something about Earth. 540 00:29:29,190 --> 00:29:30,760 What do I want to say? 541 00:29:30,760 --> 00:29:33,580 Two things-- highest point. 
542 00:29:33,580 --> 00:29:36,760 That's the field, that's the property, 543 00:29:36,760 --> 00:29:37,780 and this is the value. 544 00:29:37,780 --> 00:29:39,190 The highest point is Mt. 545 00:29:39,190 --> 00:29:40,240 Everest. 546 00:29:40,240 --> 00:29:42,880 The deepest point is Challenger Deep. 547 00:29:42,880 --> 00:29:46,450 And then I have things to say about Challenger Deep-- 548 00:29:46,450 --> 00:29:49,630 the property of elevation above sea level, the value 549 00:29:49,630 --> 00:29:52,280 is minus 11 kilometers. 550 00:29:55,900 --> 00:30:00,600 Now here's yet another view of the same data 551 00:30:00,600 --> 00:30:04,530 once more, with numeric IDs. 552 00:30:04,530 --> 00:30:08,150 So this is the same information, the same four statements. 553 00:30:08,150 --> 00:30:13,020 But this time, in addition to using words, 554 00:30:13,020 --> 00:30:21,270 I'm also including weird numbers following either Q or P. 555 00:30:21,270 --> 00:30:25,890 So P stands for property. 556 00:30:25,890 --> 00:30:30,330 So the highest point property is P610. 557 00:30:30,330 --> 00:30:34,216 And the deepest point property is P1589. 558 00:30:34,216 --> 00:30:35,340 What do these numbers mean? 559 00:30:35,340 --> 00:30:36,985 They don't mean anything at all. 560 00:30:36,985 --> 00:30:37,860 They're just numbers. 561 00:30:37,860 --> 00:30:39,760 They're just sequential numbers. 562 00:30:39,760 --> 00:30:42,600 And if I create a new Wikidata item right now, 563 00:30:42,600 --> 00:30:46,020 it'll get just the next available number. 564 00:30:46,020 --> 00:30:47,790 So they're just numbers. 565 00:30:47,790 --> 00:30:49,080 So P stands for property. 566 00:30:49,080 --> 00:30:51,480 What does Q stand for? 567 00:30:51,480 --> 00:30:53,460 Does anyone know? 568 00:30:53,460 --> 00:30:58,500 It's a trick question because it's hard to guess. 
569 00:30:58,500 --> 00:31:01,896 But the principal architect of Wikidata, 570 00:31:01,896 --> 00:31:07,860 a Wikipedian named Danny [INAUDIBLE] and data scientist, 571 00:31:07,860 --> 00:31:10,950 is married to a lovely lady named [INAUDIBLE] 572 00:31:10,950 --> 00:31:16,320 spelled with a Q. And this is a loving tribute. 573 00:31:16,320 --> 00:31:21,780 And she's also a Wikipedian and an admin of Uzbek Wikipedia. 574 00:31:21,780 --> 00:31:31,650 So Q2 is just the numeric identifier of the item Earth. 575 00:31:31,650 --> 00:31:36,190 And Q513 is the identifier of Mt. 576 00:31:36,190 --> 00:31:37,310 Everest. 577 00:31:37,310 --> 00:31:42,950 You notice that we use that ID across the statement, right? 578 00:31:42,950 --> 00:31:48,520 So from Wikidata's perspective, this 579 00:31:48,520 --> 00:31:53,290 is actually what the database actually contains. 580 00:31:53,290 --> 00:31:55,030 What we were saying with words-- 581 00:31:55,030 --> 00:31:57,650 the Earth, highest point, whatever-- 582 00:31:57,650 --> 00:31:58,540 never mind that. 583 00:31:58,540 --> 00:32:03,250 Q2 has P610 with a value Q513. 584 00:32:03,250 --> 00:32:06,190 That's what Wikidata cares about, OK? 585 00:32:06,190 --> 00:32:09,770 Now that, you'll agree, is a little inaccessible. 586 00:32:09,770 --> 00:32:13,120 Just these lists of numbers, that's a little hard. 587 00:32:13,120 --> 00:32:16,240 So Wikidata understands and allows 588 00:32:16,240 --> 00:32:19,690 us to continue using our words. 589 00:32:19,690 --> 00:32:23,650 But actually, it gets translated into numeric IDs. 590 00:32:23,650 --> 00:32:25,050 Now why is this a good idea? 591 00:32:30,070 --> 00:32:33,070 Why can't we just say Earth or Mt. 592 00:32:33,070 --> 00:32:35,120 Everest? 593 00:32:35,120 --> 00:32:36,170 Any thoughts? 594 00:32:36,170 --> 00:32:39,530 This is an open question. 
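To make the "Q2 has P610 with a value Q513" reading concrete, here is a minimal Python sketch. It uses only the identifiers mentioned in the talk (Q2, Q513, P610, P1589); the dictionary `LABELS_EN` and the function `render` are assumptions made for illustration and do not reflect Wikidata's actual storage format.

```python
# Labels exist only for display; the statement itself is stored as IDs.
LABELS_EN = {
    "Q2": "Earth",
    "Q513": "Mount Everest",
    "P610": "highest point",
    "P1589": "deepest point",
}

# (item, property, value): Q2 has P610 with the value Q513.
triples = [("Q2", "P610", "Q513")]


def render(triple, labels):
    """Turn an ID triple into readable words, falling back to the raw ID."""
    item, prop, value = triple
    return " -> ".join(labels.get(part, part) for part in (item, prop, value))


for t in triples:
    print(t, "=", render(t, LABELS_EN))
```

The fallback to the raw ID matters: an item with no label in a given language is still a perfectly valid item, it just displays as its number.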
595 00:32:39,530 --> 00:32:41,540 Why is this a good idea to use numbers 596 00:32:41,540 --> 00:32:43,260 instead of the names of things? 597 00:32:47,000 --> 00:32:51,750 Yes, because more than one thing can have the same name. 598 00:32:51,750 --> 00:32:52,590 What do you mean? 599 00:32:52,590 --> 00:32:53,460 There's only one Mt. 600 00:32:53,460 --> 00:32:54,480 Everest. 601 00:32:54,480 --> 00:32:55,510 Well, yeah. 602 00:32:55,510 --> 00:32:58,710 But there's also a movie called-- and probably 603 00:32:58,710 --> 00:33:00,000 more than one-- called Mt. 604 00:33:00,000 --> 00:33:04,080 Everest, or a TV documentary literally called Mt. 605 00:33:04,080 --> 00:33:06,590 Everest. 606 00:33:06,590 --> 00:33:09,960 And of course, if I'm describing a person named 607 00:33:09,960 --> 00:33:14,930 Frank Johnson, not the only Frank Johnson on the planet, 608 00:33:14,930 --> 00:33:16,180 right? 609 00:33:16,180 --> 00:33:17,760 But wait, you say. 610 00:33:17,760 --> 00:33:20,640 On Wikipedia we deal with that problem, right? 611 00:33:20,640 --> 00:33:23,490 How do we deal with that problem on Wikipedia? 612 00:33:23,490 --> 00:33:26,270 Does anyone in the audience know? 613 00:33:26,270 --> 00:33:27,969 The standard way to deal with the fact 614 00:33:27,969 --> 00:33:30,260 that there is more than one Frank Johnson in the world, 615 00:33:30,260 --> 00:33:35,600 on Wikipedia, is to use parentheses after the name. 616 00:33:35,600 --> 00:33:39,200 So there is Frank Johnson (actor) 617 00:33:39,200 --> 00:33:42,620 and Frank Johnson (politician), for example, 618 00:33:42,620 --> 00:33:44,700 if that's the distinction we need to make. 619 00:33:44,700 --> 00:33:48,140 So you put in parentheses kind of the minimal amount 620 00:33:48,140 --> 00:33:51,840 of information you need to tell apart these Frank Johnsons. 621 00:33:51,840 --> 00:33:54,530 What if there's two politician Frank Johnsons?
622 00:33:54,530 --> 00:33:58,880 Well, then you would say Frank Johnson, (Delaware politician) 623 00:33:58,880 --> 00:34:01,960 versus Frank Johnson (California politician), right? 624 00:34:01,960 --> 00:34:05,210 You just put in that bit of context to tell them apart. 625 00:34:05,210 --> 00:34:07,640 So that's the solution that Wikipedians came up 626 00:34:07,640 --> 00:34:12,469 with years and years ago because they did need 627 00:34:12,469 --> 00:34:15,560 a unique name for the article. 628 00:34:15,560 --> 00:34:18,170 You can't have two articles literally called 629 00:34:18,170 --> 00:34:20,790 Frank Johnson on Wikipedia. 630 00:34:20,790 --> 00:34:23,570 So that's the solution on Wikipedia. 631 00:34:23,570 --> 00:34:28,429 But Wikidata was designed much later, more than a decade 632 00:34:28,429 --> 00:34:31,340 after Wikipedia, and was able to kind of learn 633 00:34:31,340 --> 00:34:34,520 from the experience of Wikipedia, which 634 00:34:34,520 --> 00:34:39,380 has tremendous experience with multilingualism, much 635 00:34:39,380 --> 00:34:42,870 more than most sites and projects, as we know. 636 00:34:42,870 --> 00:34:44,659 And so the Wikidata team understood 637 00:34:44,659 --> 00:34:47,840 from the get go that this will be an issue, 638 00:34:47,840 --> 00:34:50,989 and it's better to use numbers that are unequivocally 639 00:34:50,989 --> 00:34:54,800 different from each other instead of labels, 640 00:34:54,800 --> 00:34:57,290 instead of the actual name, the actual text, 641 00:34:57,290 --> 00:34:59,630 because names are not unique. 642 00:34:59,630 --> 00:35:03,260 Names can change, right? 643 00:35:03,260 --> 00:35:08,960 Just last year, there was a big naming reform in Ukraine 644 00:35:08,960 --> 00:35:13,610 and a whole bunch of towns and districts were renamed. 
645 00:35:13,610 --> 00:35:17,330 Does that mean we should change all the data that we have, like 646 00:35:17,330 --> 00:35:19,550 lose all the data that we have about the old name? 647 00:35:19,550 --> 00:35:22,130 No, we ideally just want to change the name 648 00:35:22,130 --> 00:35:24,020 without breaking links. 649 00:35:24,020 --> 00:35:28,550 So having the links actually refer to the numbers 650 00:35:28,550 --> 00:35:32,090 is one way to ensure the integrity of the data, 651 00:35:32,090 --> 00:35:35,360 of the links, when renaming happens. 652 00:35:35,360 --> 00:35:39,230 Another reason is well, even if the name doesn't change, 653 00:35:39,230 --> 00:35:42,230 not all humans call everything the same, right? 654 00:35:42,230 --> 00:35:46,180 So Earth is Earth in English, but it's 655 00:35:46,180 --> 00:35:48,210 [SPEAKING ARABIC] in Arabic. 656 00:35:48,210 --> 00:35:49,585 It's [SPEAKING HEBREW] in Hebrew. 657 00:35:53,480 --> 00:35:56,570 So obviously, Earth-- even that is not 658 00:35:56,570 --> 00:36:01,920 as unambiguous or unequivocal as you might think. 659 00:36:01,920 --> 00:36:03,500 And so that is the reason Wikidata, 660 00:36:03,500 --> 00:36:07,640 which is built to be multilingual from the start, 661 00:36:07,640 --> 00:36:11,230 talks about numbers rather than labels. 662 00:36:11,230 --> 00:36:12,150 OK. 663 00:36:12,150 --> 00:36:15,370 Ha, I had a whole slide about that and I forgot. 664 00:36:15,370 --> 00:36:17,830 Yes, so even London, again, is not 665 00:36:17,830 --> 00:36:20,710 just London, England, which is what you were thinking about. 666 00:36:20,710 --> 00:36:22,030 It's also a city in Canada. 667 00:36:22,030 --> 00:36:26,260 And it's also a family name, like Jack London. 668 00:36:26,260 --> 00:36:27,430 It's also a movie company. 669 00:36:27,430 --> 00:36:32,230 There must be some hotel named London somewhere. 
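The two arguments for numeric IDs, multilingual labels and painless renaming, can be sketched in Python as follows. Q2 is the Earth identifier from the talk; `Q_TOWN` and `Q_PERSON` are deliberately fake placeholder IDs, the German and French labels are my own examples, and the `label` helper with its fallback order is an assumption for illustration only.

```python
# Labels keyed by item ID, then by language code.
labels = {
    "Q2": {"en": "Earth", "de": "Erde", "fr": "Terre"},
    "Q_TOWN": {"en": "Old Town Name"},  # placeholder ID, not a real item
}

# A statement elsewhere points at the town by ID, never by its name.
statements = [("Q_PERSON", "place of birth", "Q_TOWN")]  # placeholder IDs


def label(item_id, lang):
    """Look up a display name, falling back to English, then to the ID."""
    names = labels.get(item_id, {})
    return names.get(lang) or names.get("en") or item_id


print(label("Q2", "de"))

# A rename touches exactly one label. The statement is left alone,
# so the link keeps working and simply displays the new name.
labels["Q_TOWN"]["en"] = "New Town Name"
print(label(statements[0][2], "en"))
```

Had the statement stored the text "Old Town Name" instead of the ID, every statement mentioning the town would have needed editing after the rename.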
670 00:36:32,230 --> 00:36:36,070 This is a good opportunity to remind everyone 671 00:36:36,070 --> 00:36:41,110 that the vast majority of humankind 672 00:36:41,110 --> 00:36:45,700 does not speak a word of English. 673 00:36:45,700 --> 00:36:48,790 That's a statistic worth remembering. 674 00:36:48,790 --> 00:36:55,240 The vast majority of the planet does not speak English at all. 675 00:36:55,240 --> 00:36:57,070 That does not contradict the datum 676 00:36:57,070 --> 00:37:00,070 that English is the most widely spoken language. 677 00:37:00,070 --> 00:37:02,860 And yet, in aggregate, a majority of people 678 00:37:02,860 --> 00:37:07,180 speak other languages, and not English at all. 679 00:37:07,180 --> 00:37:13,150 So moving swiftly on, this is a pause for questions 680 00:37:13,150 --> 00:37:15,610 about what I've covered so far. 681 00:37:15,610 --> 00:37:17,390 Any questions in the audience? 682 00:37:17,390 --> 00:37:19,450 If not, we move to IRC. 683 00:37:19,450 --> 00:37:21,042 If there are any questions-- 684 00:37:23,880 --> 00:37:26,891 Any questions? 685 00:37:26,891 --> 00:37:27,390 No? 686 00:37:27,390 --> 00:37:28,305 IRC? 687 00:37:28,305 --> 00:37:29,490 Any questions? 688 00:37:33,580 --> 00:37:34,180 OK. 689 00:37:34,180 --> 00:37:38,170 We will have additional pauses for questions later. 690 00:37:38,170 --> 00:37:41,470 But enough of my hand-waving. 691 00:37:41,470 --> 00:37:44,590 Let's go explore Wikidata. 692 00:37:44,590 --> 00:37:49,730 So Wikidata lives at wikidata.org. 693 00:37:49,730 --> 00:37:59,570 And Wikidata already has more than 25 million items. 694 00:37:59,570 --> 00:38:05,570 That is, it collects statements about more than 25 695 00:38:05,570 --> 00:38:08,270 million topics. 696 00:38:08,270 --> 00:38:12,170 It has many, many more than 25 million statements 697 00:38:12,170 --> 00:38:14,660 because many of these items have dozens or hundreds 698 00:38:14,660 --> 00:38:16,370 of statements.
699 00:38:16,370 --> 00:38:20,720 So it documents 25 million things-- 700 00:38:20,720 --> 00:38:23,153 people, books, rivers, whatever. 701 00:38:26,010 --> 00:38:28,800 Just to give us a sense of how big that number is, 702 00:38:28,800 --> 00:38:32,430 how many articles do we have on English Wikipedia? 703 00:38:32,430 --> 00:38:35,610 More than-- yes, more than 5 million articles. 704 00:38:35,610 --> 00:38:37,990 And that's the largest Wikipedia. 705 00:38:37,990 --> 00:38:41,100 So Wikidata is already describing 706 00:38:41,100 --> 00:38:45,450 more than five times, or about five times as many items 707 00:38:45,450 --> 00:38:48,460 as even our largest Wikipedia. 708 00:38:48,460 --> 00:38:50,840 So obviously, Wikidata contains data 709 00:38:50,840 --> 00:38:56,900 about things that have no article on any Wikipedia. 710 00:38:56,900 --> 00:39:01,980 It is a much, much larger, more comprehensive project. 711 00:39:01,980 --> 00:39:04,250 All right, the second thing we might notice 712 00:39:04,250 --> 00:39:07,610 is, well, this looks kind of like Wikipedia, right? 713 00:39:07,610 --> 00:39:11,210 If we've never visited, it looks kind of like Wikipedia. 714 00:39:11,210 --> 00:39:13,490 It has this sidebar. 715 00:39:13,490 --> 00:39:15,290 It has these buttons at the top. 716 00:39:15,290 --> 00:39:17,810 It looks like it's from the '90s. 717 00:39:17,810 --> 00:39:18,770 Yeah. 718 00:39:18,770 --> 00:39:20,900 So the reason it looks like Wikipedia 719 00:39:20,900 --> 00:39:24,410 is that it is a wiki running on MediaWiki software. 720 00:39:24,410 --> 00:39:28,430 It is running on software very much like Wikipedia. 721 00:39:28,430 --> 00:39:32,180 But it is running on a kind of modification 722 00:39:32,180 --> 00:39:34,010 of the standard wiki software.
723 00:39:34,010 --> 00:39:36,170 It has an additional, very important component 724 00:39:36,170 --> 00:39:38,630 named Wikibase, which gives it all 725 00:39:38,630 --> 00:39:42,700 of its structured and linked data power. 726 00:39:42,700 --> 00:39:46,763 So let's start exploring Wikidata. 727 00:39:52,830 --> 00:39:55,770 Let's take something local-- 728 00:39:55,770 --> 00:39:57,530 Harvey Milk. 729 00:39:57,530 --> 00:40:00,190 Harvey Milk. 730 00:40:00,190 --> 00:40:03,460 What does Wikidata know about Harvey Milk? 731 00:40:03,460 --> 00:40:06,730 For those on YouTube who may not be local, 732 00:40:06,730 --> 00:40:15,580 he's a San Francisco politician and gay rights activist 733 00:40:15,580 --> 00:40:18,380 who was murdered in the '70s. 734 00:40:18,380 --> 00:40:21,280 It was very significant in the history of those struggles 735 00:40:21,280 --> 00:40:22,710 in this country. 736 00:40:22,710 --> 00:40:27,220 So what does Wikidata tell us about Harvey Milk? 737 00:40:27,220 --> 00:40:29,770 Well, the first thing is it knows 738 00:40:29,770 --> 00:40:34,562 that Harvey Milk is Q17141. 739 00:40:34,562 --> 00:40:36,520 That's the most important piece of information, 740 00:40:36,520 --> 00:40:38,770 is first of all, that is the identifier. 741 00:40:38,770 --> 00:40:42,490 That is the item number of all the data 742 00:40:42,490 --> 00:40:46,150 that we will collect about Harvey Milk. 743 00:40:46,150 --> 00:40:50,020 The second thing you see right under the title 744 00:40:50,020 --> 00:40:54,730 is this line, this very, very brief summary, right? 745 00:40:54,730 --> 00:40:59,620 "American politician who became a martyr in the gay community." 746 00:40:59,620 --> 00:41:02,080 This line is the description line. 747 00:41:02,080 --> 00:41:04,640 So the name of the item-- 748 00:41:04,640 --> 00:41:05,980 this is the label. 749 00:41:05,980 --> 00:41:07,450 We call it label on Wikidata. 750 00:41:07,450 --> 00:41:08,740 That's the label. 
751 00:41:08,740 --> 00:41:10,990 And this line is the description. 752 00:41:10,990 --> 00:41:13,480 Now why is this description important? 753 00:41:13,480 --> 00:41:16,990 This is the description that helps us tell this Harvey 754 00:41:16,990 --> 00:41:23,230 Milk from any other Harvey Milk that may exist, all right? 755 00:41:23,230 --> 00:41:26,530 So again, this would be useful if I'm 756 00:41:26,530 --> 00:41:30,190 looking up someone with a slightly more generic name. 757 00:41:30,190 --> 00:41:33,910 That line will help me tell apart the item about Harvey 758 00:41:33,910 --> 00:41:38,860 Milk the gay activist rather than Harvey Milk the film 759 00:41:38,860 --> 00:41:41,750 actor, OK? 760 00:41:41,750 --> 00:41:43,100 And where is it coming from? 761 00:41:43,100 --> 00:41:48,690 Well, Wikidata has this whole table, 762 00:41:48,690 --> 00:41:52,790 as you can see, with descriptions and labels 763 00:41:52,790 --> 00:41:54,750 in other languages. 764 00:41:54,750 --> 00:41:59,600 So Wikidata is able to refer to Harvey Milk in Arabic which, 765 00:41:59,600 --> 00:42:04,010 don't panic, is written from right to left. 766 00:42:04,010 --> 00:42:07,730 It also knows what to call him in Bulgarian. 767 00:42:07,730 --> 00:42:11,030 I mean, it's the same name, but it's in a different script. 768 00:42:11,030 --> 00:42:13,640 In French, in Hebrew, and that's it? 769 00:42:13,640 --> 00:42:17,960 Does it not know a name for Harvey Milk in Italian? 770 00:42:17,960 --> 00:42:19,760 Of course it does. 771 00:42:19,760 --> 00:42:22,250 It actually has labels for this person 772 00:42:22,250 --> 00:42:24,435 in many, many, many languages. 773 00:42:24,435 --> 00:42:30,080 It doesn't have descriptions in every language, as you can see. 774 00:42:30,080 --> 00:42:30,800 OK? 775 00:42:30,800 --> 00:42:36,240 So why was Wikidata showing me these languages and not others? 
776 00:42:36,240 --> 00:42:39,260 I mean, why this somewhat arbitrary collection-- 777 00:42:39,260 --> 00:42:42,860 English, Arabic, Bulgarian, German, French, and Hebrew? 778 00:42:42,860 --> 00:42:45,300 Because I told it to. 779 00:42:45,300 --> 00:42:50,390 So if we briefly click over to my user page-- 780 00:42:50,390 --> 00:42:52,730 again, like every wiki, you have user accounts. 781 00:42:52,730 --> 00:42:53,960 You have user pages. 782 00:42:53,960 --> 00:42:55,380 This is my user page. 783 00:42:55,380 --> 00:42:59,750 And as you can see, there's this little user 784 00:42:59,750 --> 00:43:03,230 information box here called a Babel box by Wikipedians, 785 00:43:03,230 --> 00:43:06,610 where I list the languages that I speak. 786 00:43:06,610 --> 00:43:11,000 And Wikidata uses this box just to kind of helpfully 787 00:43:11,000 --> 00:43:12,944 show me these languages. 788 00:43:12,944 --> 00:43:14,360 Of course, all the other languages 789 00:43:14,360 --> 00:43:19,580 are still available, as you saw, by clicking the more languages. 790 00:43:19,580 --> 00:43:22,940 But this is just a useful little way 791 00:43:22,940 --> 00:43:27,590 of getting the languages I care about up there first. 792 00:43:27,590 --> 00:43:29,060 By the way, this is a lie. 793 00:43:29,060 --> 00:43:31,170 I don't actually speak Bulgarian. 794 00:43:31,170 --> 00:43:33,740 That stayed on my user page because I was demonstrating 795 00:43:33,740 --> 00:43:37,010 this in Bulgaria and I wanted that label to show up there 796 00:43:37,010 --> 00:43:38,420 during the talk-- 797 00:43:38,420 --> 00:43:40,250 just in case you were going to tell me 798 00:43:40,250 --> 00:43:43,840 a really good Bulgarian joke. 799 00:43:43,840 --> 00:43:48,470 OK so for example, Hebrew is my mother tongue. 800 00:43:48,470 --> 00:43:51,730 And we have a Hebrew label for Harvey Milk. 801 00:43:51,730 --> 00:43:53,810 But we don't have a description. 
802 00:43:53,810 --> 00:44:00,950 So let's fix that right now by clicking the edit button right 803 00:44:00,950 --> 00:44:01,960 here. 804 00:44:01,960 --> 00:44:05,930 I click edit, and this table became editable. 805 00:44:05,930 --> 00:44:09,661 And now I can very briefly type a description. 806 00:44:22,899 --> 00:44:24,440 AUDIENCE: Online in about 20 seconds. 807 00:44:24,440 --> 00:44:25,400 But can we hold it? 808 00:44:25,400 --> 00:44:26,066 ASAF BARTOV: OK. 809 00:44:28,454 --> 00:44:30,430 That was good timing for the screen to crash. 810 00:44:53,642 --> 00:44:54,142 OK? 811 00:44:59,082 --> 00:45:01,800 Are we back? 812 00:45:01,800 --> 00:45:02,850 OK. 813 00:45:02,850 --> 00:45:03,690 Sorry about that. 814 00:45:03,690 --> 00:45:07,500 So this was all about what to call him in different languages 815 00:45:07,500 --> 00:45:09,930 and scripts and how to tell this person apart 816 00:45:09,930 --> 00:45:13,590 from other people with potentially the same name. 817 00:45:13,590 --> 00:45:17,930 Let's scroll down and see what else does Wikidata 818 00:45:17,930 --> 00:45:19,680 know about this person? 819 00:45:19,680 --> 00:45:24,060 So as you can see, this is a list of statements, right? 820 00:45:24,060 --> 00:45:25,500 This is a list of statements. 821 00:45:25,500 --> 00:45:27,900 And the properties are on the left, 822 00:45:27,900 --> 00:45:30,340 the values are on the right. 823 00:45:30,340 --> 00:45:33,870 So the first thing Wikidata knows about Harvey Milk 824 00:45:33,870 --> 00:45:38,520 is a very important property called instance of. 825 00:45:38,520 --> 00:45:39,910 Instance of. 826 00:45:39,910 --> 00:45:44,690 And the property instance of answers the very basic question 827 00:45:44,690 --> 00:45:49,460 what kind of thing is this that I'm describing? 828 00:45:49,460 --> 00:45:50,870 Is it a book? 829 00:45:50,870 --> 00:45:51,980 Is it a poem? 830 00:45:51,980 --> 00:45:53,570 Is it a mountain? 
831 00:45:53,570 --> 00:45:55,520 Is it a theological concept? 832 00:45:55,520 --> 00:45:57,800 No, it's a human. 833 00:45:57,800 --> 00:46:00,020 It's a person, OK? 834 00:46:00,020 --> 00:46:01,880 The item about Mt. 835 00:46:01,880 --> 00:46:07,070 Everest will say instance of mountain, OK? 836 00:46:07,070 --> 00:46:10,790 This is a very important property. 837 00:46:10,790 --> 00:46:12,500 Why is it important? 838 00:46:12,500 --> 00:46:14,630 Wouldn't anyone looking at this know that this is 839 00:46:14,630 --> 00:46:15,550 a human being? 840 00:46:15,550 --> 00:46:16,310 Yes. 841 00:46:16,310 --> 00:46:18,720 Anyone looking at this will know. 842 00:46:18,720 --> 00:46:23,780 But if I want a computer to be able to pull information 843 00:46:23,780 --> 00:46:28,160 about people, I want to be able to easily exclude 844 00:46:28,160 --> 00:46:30,680 all the mountains and poems and other things that 845 00:46:30,680 --> 00:46:33,440 are not people from my query. 846 00:46:33,440 --> 00:46:37,400 So this single datum, this single piece of data, 847 00:46:37,400 --> 00:46:41,720 is what tells computers and algorithms very clearly, 848 00:46:41,720 --> 00:46:42,890 this is a human. 849 00:46:42,890 --> 00:46:47,340 Things that aren't instance of human are other things. 850 00:46:47,340 --> 00:46:48,230 OK? 851 00:46:48,230 --> 00:46:50,145 So it may sound very trivial, but it's not. 852 00:46:50,145 --> 00:46:51,770 It's very important to have an instance 853 00:46:51,770 --> 00:46:54,077 of field for Wikidata items. 854 00:46:54,077 --> 00:46:55,410 All right, what else do we know? 855 00:46:55,410 --> 00:46:59,360 Well, Wikidata knows about an image for Harvey Milk. 856 00:46:59,360 --> 00:47:02,982 Again, we can find a ton of images-- or maybe not a ton, 857 00:47:02,982 --> 00:47:04,940 but we can find dozens of images of Harvey Milk 858 00:47:04,940 --> 00:47:10,430 on Commons, on our Wikimedia multimedia repository. 
859 00:47:10,430 --> 00:47:13,430 So why should we have a single image here on Wikidata? 860 00:47:13,430 --> 00:47:16,280 Again, this is mostly for reusers. 861 00:47:16,280 --> 00:47:18,920 If I'm building some kind of tool that pulls information 862 00:47:18,920 --> 00:47:21,680 from Wikidata, it's nice if there's 863 00:47:21,680 --> 00:47:24,680 at least one representative image to kind of use 864 00:47:24,680 --> 00:47:30,300 as the default or immediate image for Harvey Milk 865 00:47:30,300 --> 00:47:33,120 in some other reused context. 866 00:47:33,120 --> 00:47:34,770 All right, sex or gender-- 867 00:47:34,770 --> 00:47:35,670 male. 868 00:47:35,670 --> 00:47:38,790 Country of citizenship-- United States of America. 869 00:47:38,790 --> 00:47:39,910 Given name is Harvey. 870 00:47:39,910 --> 00:47:41,580 The date of birth is so and so. 871 00:47:41,580 --> 00:47:44,340 The place of birth is Woodmere. 872 00:47:44,340 --> 00:47:45,870 The place of death is San Francisco. 873 00:47:45,870 --> 00:47:48,640 The manner of death is homicide. 874 00:47:48,640 --> 00:47:50,930 Wikidata knows that. 875 00:47:50,930 --> 00:47:55,700 Now again, every little datum like that 876 00:47:55,700 --> 00:48:02,210 is the basis for later querying and answering questions. 877 00:48:02,210 --> 00:48:07,390 So the fact that we record the manner of death of people-- 878 00:48:07,390 --> 00:48:09,230 or at least of some people-- 879 00:48:09,230 --> 00:48:11,900 will allow us later to go, you know, 880 00:48:11,900 --> 00:48:17,120 who are some people from Belgium who died by homicide? 881 00:48:17,120 --> 00:48:24,650 That's a question Wikidata can answer, thanks to this field. 882 00:48:24,650 --> 00:48:27,680 The other thing I mentioned is that things are links. 883 00:48:27,680 --> 00:48:29,680 So the place of birth is Woodmere. 884 00:48:29,680 --> 00:48:31,900 I don't know where Woodmere is, but I 885 00:48:31,900 --> 00:48:34,390 can click that and find out. 
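Why "instance of" and "manner of death" matter for later questions can be shown with a small Python sketch. The records below are a tiny in-memory stand-in: "Example Person" is a made-up record, and the `find` helper is hypothetical. The real service answers such questions through a query language, which this sketch does not attempt to reproduce; it only mimics the filtering logic.

```python
# A handful of items, each a flat mapping of property -> value.
items = [
    {"label": "Harvey Milk", "instance of": "human",
     "country of citizenship": "United States of America",
     "manner of death": "homicide"},
    {"label": "Mount Everest", "instance of": "mountain"},
    # Made-up record so the Belgium question has something to return.
    {"label": "Example Person", "instance of": "human",
     "country of citizenship": "Belgium", "manner of death": "homicide"},
]


def find(records, conditions):
    """Return labels of the records that match every given condition."""
    return [
        r["label"]
        for r in records
        if all(r.get(prop) == value for prop, value in conditions.items())
    ]


# "instance of" lets a program exclude mountains, poems, and so on.
print(find(items, {"instance of": "human"}))

# Each extra recorded field becomes one more thing we can ask about.
print(find(items, {"instance of": "human",
                   "country of citizenship": "Belgium",
                   "manner of death": "homicide"}))
```

Items that simply lack a field, such as the mountain having no manner of death, drop out of any query that asks about that field.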
886 00:48:34,390 --> 00:48:38,270 Here is the Wikidata item about Woodmere, right? 887 00:48:38,270 --> 00:48:41,230 It was the value in the statement about Harvey Milk, 888 00:48:41,230 --> 00:48:43,900 but now I'm looking at the item about Woodmere. 889 00:48:43,900 --> 00:48:48,047 And it turns out it's in Nassau County, New York, right? 890 00:48:48,047 --> 00:48:50,380 And of course, Wikidata has a whole bunch of information 891 00:48:50,380 --> 00:48:55,450 for me about Woodmere-- 892 00:48:55,450 --> 00:48:59,720 what country it's in and the coordinates and the population 893 00:48:59,720 --> 00:49:06,230 and the area, all the things you would expect about a place, OK? 894 00:49:06,230 --> 00:49:07,512 Let's get back to Harvey Milk. 895 00:49:10,370 --> 00:49:13,260 So the manner of death, the cause of death-- 896 00:49:13,260 --> 00:49:16,880 now here, Wikidata gives us excellent information. 897 00:49:16,880 --> 00:49:20,390 The actual cause of death is ballistic trauma. 898 00:49:20,390 --> 00:49:22,160 That's a professional term. 899 00:49:22,160 --> 00:49:27,560 And this statement has qualifiers. 900 00:49:27,560 --> 00:49:30,650 So until now, I was talking about triples, right? 901 00:49:30,650 --> 00:49:33,260 The item has a property with a certain value. 902 00:49:33,260 --> 00:49:35,270 Actually, each statement can also 903 00:49:35,270 --> 00:49:38,030 have a number of qualifiers which 904 00:49:38,030 --> 00:49:45,424 add aspects of information, still about that one question 905 00:49:45,424 --> 00:49:46,590 that we're answering, right? 906 00:49:46,590 --> 00:49:49,904 So if this property answers cause of death, 907 00:49:49,904 --> 00:49:51,320 it's not discussing anything else. 908 00:49:51,320 --> 00:49:52,880 It's not discussing languages. 909 00:49:52,880 --> 00:49:54,920 It's not discussing date of birth, right? 910 00:49:54,920 --> 00:49:56,930 It's talking about the cause of death. 
911 00:49:56,930 --> 00:49:59,300 But we're not just saying ballistic trauma. 912 00:49:59,300 --> 00:50:04,550 We're saying ballistic trauma with the quantity attribute 913 00:50:04,550 --> 00:50:05,660 being five. 914 00:50:05,660 --> 00:50:07,550 What does that mean? 915 00:50:07,550 --> 00:50:08,870 Five bullets, right? 916 00:50:08,870 --> 00:50:12,780 There are five ballistic traumas. 917 00:50:12,780 --> 00:50:15,300 He was shot five times. 918 00:50:15,300 --> 00:50:18,210 And he was shot by this person named Dan White. 919 00:50:18,210 --> 00:50:25,020 And this ballistic trauma, like this actual shooting, 920 00:50:25,020 --> 00:50:28,420 is itself the subject of this other thing. 921 00:50:28,420 --> 00:50:31,440 This is a link to a whole other Wikidata 922 00:50:31,440 --> 00:50:35,510 item about the Moscone-Milk assassinations. 923 00:50:35,510 --> 00:50:38,610 Moscone was the San Francisco mayor at the time. 924 00:50:43,540 --> 00:50:47,510 We'll see slightly better or easier to understand examples 925 00:50:47,510 --> 00:50:49,460 of qualifiers in a bit. 926 00:50:49,460 --> 00:50:54,440 So if this was confusing, hang on. 927 00:50:54,440 --> 00:50:55,970 So he was killed by Dan White. 928 00:50:55,970 --> 00:50:57,800 He spoke English. 929 00:50:57,800 --> 00:50:59,960 His occupation-- here's an example 930 00:50:59,960 --> 00:51:03,140 of a property with more than one value, right? 931 00:51:03,140 --> 00:51:06,260 So Milk was a politician. 932 00:51:06,260 --> 00:51:09,710 But he was also a Navy officer, at least for a while. 933 00:51:09,710 --> 00:51:12,980 That was another thing that he did during his life. 934 00:51:12,980 --> 00:51:15,350 And he was a human rights activist, right? 935 00:51:15,350 --> 00:51:20,600 So some people are writers and translators. 936 00:51:20,600 --> 00:51:22,610 So people can have more than one occupation. 937 00:51:22,610 --> 00:51:26,310 People can speak more than one language.
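The shape described above (an item, a property, a value, plus optional qualifiers, and possibly several values per property) can be sketched as a small data structure. This is a simplified illustration, not Wikidata's actual data model: the class names `Statement` and `Item` and the qualifier labels are assumptions chosen for readability, and plain text labels stand in for real item and property IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    """One property/value pair about an item, with optional qualifiers."""
    prop: str
    value: str
    qualifiers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Item:
    """An item with a label and any number of statements."""
    label: str
    statements: List[Statement] = field(default_factory=list)

    def values(self, prop: str) -> List[str]:
        """Return every value recorded for a property, in order."""
        return [s.value for s in self.statements if s.prop == prop]


milk = Item("Harvey Milk", [
    # Qualifiers refine this one statement; the labels here are illustrative.
    Statement("cause of death", "ballistic trauma",
              {"quantity": "5", "shot by": "Dan White"}),
    # One property, several values: each value is its own statement.
    Statement("occupation", "politician"),
    Statement("occupation", "Navy officer"),
    Statement("occupation", "human rights activist"),
])

print(milk.values("occupation"))
print(milk.statements[0].qualifiers)
```

The qualifiers live inside one statement rather than on the item, which is why the quantity of five is tied to the cause of death and nothing else.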
938 00:51:26,310 --> 00:51:29,130 Here's a better example of a qualifier. 939 00:51:29,130 --> 00:51:35,090 So the property award received has the value Presidential 940 00:51:35,090 --> 00:51:37,560 Medal of Freedom. 941 00:51:37,560 --> 00:51:42,570 And that award has an attribute called point in time, 942 00:51:42,570 --> 00:51:44,070 like when was this? 943 00:51:44,070 --> 00:51:46,580 This was in 2009. 944 00:51:46,580 --> 00:51:50,510 Do you see that this piece of data-- 945 00:51:50,510 --> 00:52:04,780 2009-- is a sub-statement or is subordinate 946 00:52:04,780 --> 00:52:09,621 to the context of this award, which was the Presidential Medal 947 00:52:09,621 --> 00:52:10,120 of Freedom? 948 00:52:10,120 --> 00:52:13,430 It can't just kind of free float in the article. 949 00:52:13,430 --> 00:52:17,650 It's not that 2009 is itself a meaningful thing, right? 950 00:52:17,650 --> 00:52:21,550 This medal was awarded in 2009. 951 00:52:21,550 --> 00:52:22,170 If 952 00:52:22,170 --> 00:52:24,070 Wikidata doesn't tell us, for example, 953 00:52:24,070 --> 00:52:27,130 when he was a Navy officer, OK? 954 00:52:27,130 --> 00:52:30,100 But if we were, for example, to look that up right now 955 00:52:30,100 --> 00:52:33,820 and find out that Milk was a Navy officer between 1962 956 00:52:33,820 --> 00:52:39,542 and 1964, we could go back here to the Navy officer bit 957 00:52:39,542 --> 00:52:41,010 and click edit. 958 00:52:41,010 --> 00:52:44,190 This is how I edit this particular little piece 959 00:52:44,190 --> 00:52:45,360 of information. 960 00:52:45,360 --> 00:52:49,350 And I add a qualifier like this. 961 00:52:49,350 --> 00:52:51,300 I click Add Qualifier. 962 00:52:51,300 --> 00:52:57,660 And I could pick start time and end time, right? 963 00:52:57,660 --> 00:53:04,990 And then I could type 1962 to 1964, 964 00:53:04,990 --> 00:53:08,000 and that would be teaching Wikidata. 965 00:53:08,000 --> 00:53:10,660 Oh, I'm sorry, I meant to do that for Navy officer.
966 00:53:10,660 --> 00:53:11,230 OK. 967 00:53:11,230 --> 00:53:14,800 But, you know, that is the exact-- 968 00:53:14,800 --> 00:53:18,400 the accurate time span of that statement. 969 00:53:18,400 --> 00:53:22,850 So it's true to say about a person, he was a Navy officer, 970 00:53:22,850 --> 00:53:25,990 even if of course he wasn't a Navy officer his entire life. 971 00:53:25,990 --> 00:53:28,120 But it's better and it's more accurate 972 00:53:28,120 --> 00:53:32,260 to say he was a Navy officer between 1962 and 1964. 973 00:53:32,260 --> 00:53:35,380 Don't worry, I'm not saving this. 974 00:53:35,380 --> 00:53:39,150 No vandalizing of Wikidata in this session. 975 00:53:39,150 --> 00:53:40,450 OK. 976 00:53:40,450 --> 00:53:41,140 Moving on. 977 00:53:41,140 --> 00:53:42,430 What else does Wikidata know? 978 00:53:42,430 --> 00:53:43,960 He was educated at this university. 979 00:53:43,960 --> 00:53:46,970 He was a member of this political party. 980 00:53:46,970 --> 00:53:47,470 Right? 981 00:53:47,470 --> 00:53:49,428 That's of course a very relevant property 982 00:53:49,428 --> 00:53:52,270 for a politician. 983 00:53:52,270 --> 00:53:56,500 Religion, military branch, the category on Commons 984 00:53:56,500 --> 00:53:58,720 that covers this item, are all things 985 00:53:58,720 --> 00:54:00,790 that Wikidata can tell us. 986 00:54:00,790 --> 00:54:02,200 And that's it. 987 00:54:02,200 --> 00:54:04,570 Now, is that everything that we could possibly 988 00:54:04,570 --> 00:54:07,780 say in a structured way about Harvey Milk? 989 00:54:07,780 --> 00:54:08,680 No. 990 00:54:08,680 --> 00:54:13,570 We could probably find at least a few more things to say. 991 00:54:13,570 --> 00:54:17,170 We will see how to contribute new information to Wikidata 992 00:54:17,170 --> 00:54:19,990 in just a minute with a different example. 993 00:54:19,990 --> 00:54:23,360 But this-- all this was a set of statements. 994 00:54:23,360 --> 00:54:23,860 Right?
995 00:54:23,860 --> 00:54:25,927 This was under the title Statements here. 996 00:54:28,840 --> 00:54:31,160 But at the bottom of the list of statements is 997 00:54:31,160 --> 00:54:34,300 another section called identifiers. 998 00:54:34,300 --> 00:54:36,960 And I want to spend a minute talking about what that is. 999 00:54:36,960 --> 00:54:43,630 So identifiers is a collection of keys. 1000 00:54:43,630 --> 00:54:47,980 A collection of IDs, or codes, that 1001 00:54:47,980 --> 00:54:52,890 are keys to other information sources. 1002 00:54:52,890 --> 00:54:58,560 And a lot of Wikidata items have a whole series of keys 1003 00:54:58,560 --> 00:55:03,030 to other databases, other sites, other repositories, 1004 00:55:03,030 --> 00:55:08,340 that help you or a computer be able to access not just 1005 00:55:08,340 --> 00:55:12,240 some database and look for information about Harvey Milk, 1006 00:55:12,240 --> 00:55:16,950 but access the exact record relevant to Harvey Milk. 1007 00:55:16,950 --> 00:55:20,280 And again, if you imagine someone named John Smith, 1008 00:55:20,280 --> 00:55:21,690 that is really valuable, right? 1009 00:55:21,690 --> 00:55:23,250 If you're just told, oh yeah, 1010 00:55:23,250 --> 00:55:24,875 you can look at the Library of Congress 1011 00:55:24,875 --> 00:55:27,840 for John Smith, good luck with that. 1012 00:55:27,840 --> 00:55:30,240 But if I tell you, go to the Library of Congress 1013 00:55:30,240 --> 00:55:35,810 to this record for this John Smith, you see the difference. 1014 00:55:35,810 --> 00:55:42,080 So Wikidata points us to VIAF, which is the Virtual 1015 00:55:42,080 --> 00:55:44,570 International Authority File. 1016 00:55:44,570 --> 00:55:50,140 It's an aggregated master index built by bibliographers, 1017 00:55:50,140 --> 00:55:52,831 by librarians, of people. 1018 00:55:52,831 --> 00:55:53,330 Right?
1019 00:55:53,330 --> 00:55:56,720 It tries to kind of aggregate information about people 1020 00:55:56,720 --> 00:55:59,270 across library catalogs everywhere. 1021 00:55:59,270 --> 00:56:05,120 So the VIAF ID for Harvey Milk is this number. 1022 00:56:05,120 --> 00:56:07,340 And conveniently, if I click that, 1023 00:56:07,340 --> 00:56:10,160 I'm not taken to some Wikidata item. 1024 00:56:10,160 --> 00:56:13,010 I'm actually taken to the relevant site. 1025 00:56:13,010 --> 00:56:16,760 So this took me right to viaf.org, the Virtual 1026 00:56:16,760 --> 00:56:21,770 International Authority File, directly to their record 1027 00:56:21,770 --> 00:56:23,310 about Harvey Milk. 1028 00:56:23,310 --> 00:56:23,810 All right? 1029 00:56:23,810 --> 00:56:27,290 And that itself leads me to national catalogs 1030 00:56:27,290 --> 00:56:29,630 of national libraries all over the world. 1031 00:56:29,630 --> 00:56:32,360 We won't get into the things you can do with VIAF. 1032 00:56:32,360 --> 00:56:37,220 The point is Wikidata contained the piece of thread 1033 00:56:37,220 --> 00:56:40,820 that I could tug on to arrive directly 1034 00:56:40,820 --> 00:56:44,840 at that information in other databases. 1035 00:56:44,840 --> 00:56:45,680 Yes. 1036 00:56:45,680 --> 00:56:49,670 And it has that for many, many kinds of databases. 1037 00:56:49,670 --> 00:56:53,150 The BNF, for example, that's the National Library of France. 1038 00:56:53,150 --> 00:56:56,270 And that will take me to that index card. 1039 00:56:56,270 --> 00:56:57,320 IMDB. 1040 00:56:57,320 --> 00:56:58,620 We all know IMDB, right? 1041 00:56:58,620 --> 00:57:03,320 So here I have the key to Harvey Milk in IMDB. 1042 00:57:03,320 --> 00:57:05,810 And this is what IMDB says about Harvey Milk, right? 1043 00:57:05,810 --> 00:57:08,480 They have their own piece of information about him, 1044 00:57:08,480 --> 00:57:11,590 of course, with filmography and everything else.
1045 00:57:11,590 --> 00:57:15,140 And see, I did not have to search IMDB for it. 1046 00:57:15,140 --> 00:57:19,070 I just had the key right there waiting for me. 1047 00:57:19,070 --> 00:57:21,080 Now, again, this is very convenient for me, 1048 00:57:21,080 --> 00:57:24,590 as I just showed you the human use case for this. 1049 00:57:24,590 --> 00:57:27,530 But it's even more powerful in aggregate 1050 00:57:27,530 --> 00:57:35,450 when we allow computers to traverse this network of links 1051 00:57:35,450 --> 00:57:36,110 between-- 1052 00:57:36,110 --> 00:57:41,690 not just within Wikidata, but between data storage facilities 1053 00:57:41,690 --> 00:57:43,850 and repositories. 1054 00:57:43,850 --> 00:57:49,790 This is sometimes referred to as the linked open data cloud. 1055 00:57:49,790 --> 00:57:52,670 Cloud, because it's multiple different repositories 1056 00:57:52,670 --> 00:57:54,740 that are interlinked. 1057 00:57:54,740 --> 00:58:02,210 And Wikidata is already, and to a growing extent, the nexus, 1058 00:58:02,210 --> 00:58:04,460 the connection point between a lot 1059 00:58:04,460 --> 00:58:06,780 of these different databases. 1060 00:58:06,780 --> 00:58:09,230 So IMDB, for example, it's a good example 1061 00:58:09,230 --> 00:58:11,300 because it's a site almost everyone knows, 1062 00:58:11,300 --> 00:58:14,000 IMDB has information about Harvey Milk. 1063 00:58:14,000 --> 00:58:16,670 But that information does not include a link 1064 00:58:16,670 --> 00:58:19,140 to the French National Library. 1065 00:58:19,140 --> 00:58:19,645 Right? 1066 00:58:19,645 --> 00:58:20,770 Do you see what I'm saying? 1067 00:58:20,770 --> 00:58:25,550 So IMDB is a data repository with IDs and allows linking. 1068 00:58:25,550 --> 00:58:28,100 But it does not give you what Wikidata gives you, which 1069 00:58:28,100 --> 00:58:32,850 is this kind of collection of-- 1070 00:58:32,850 --> 00:58:36,330 it's like a junction of all these different data sources.
1071 00:58:36,330 --> 00:58:37,910 So Wikidata is the place where you 1072 00:58:37,910 --> 00:58:40,730 can document these interrelationships 1073 00:58:40,730 --> 00:58:41,640 or equivalencies. 1074 00:58:41,640 --> 00:58:42,140 Right? 1075 00:58:42,140 --> 00:58:48,770 So ID, you know, 587548 on IMDB is discussing the same topic 1076 00:58:48,770 --> 00:58:52,260 as French National Library ID whatever. 1077 00:58:52,260 --> 00:58:55,210 Wikidata contains that piece of information, 1078 00:58:55,210 --> 00:58:59,090 that this ID in this database is about the same person 1079 00:58:59,090 --> 00:59:04,050 as that ID in that database. 1080 00:59:04,050 --> 00:59:05,290 OK. 1081 00:59:05,290 --> 00:59:07,420 So that's what identifiers are about. 1082 00:59:07,420 --> 00:59:11,320 Still scrolling down the Wikidata item about Harvey 1083 00:59:11,320 --> 00:59:15,500 Milk, we have the site links. 1084 00:59:15,500 --> 00:59:20,840 The site links are links to Wikimedia projects 1085 00:59:20,840 --> 00:59:22,770 that are related to this item. 1086 00:59:22,770 --> 00:59:25,250 So of course there are Wikipedia articles 1087 00:59:25,250 --> 00:59:28,880 about Harvey Milk in many, many different Wikipedias. 1088 00:59:28,880 --> 00:59:31,700 Quite a few language versions. 1089 00:59:31,700 --> 00:59:34,960 And there are pages on Wikiquote, 1090 00:59:34,960 --> 00:59:36,680 one of the sister projects. 1091 00:59:36,680 --> 00:59:38,630 There are pages on Wikiquote with some quotes 1092 00:59:38,630 --> 00:59:40,130 from Harvey Milk. 1093 00:59:40,130 --> 00:59:45,060 And there is even a page for Harvey Milk on Wikisource. 1094 00:59:45,060 --> 00:59:45,560 Right? 1095 00:59:45,560 --> 00:59:47,840 So this is a collection of those links.
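The identifiers idea described above, one item holding keys into several outside databases, can be sketched roughly like this. Everything here is illustrative: the function name `external_links` is hypothetical, the URL patterns are assumptions about how each site addresses a record, and the ID values are placeholders rather than Harvey Milk's real identifiers.

```python
from typing import Dict

# Assumed URL patterns; each site may address its records differently.
URL_PATTERNS: Dict[str, str] = {
    "VIAF": "https://viaf.org/viaf/{id}",
    "IMDb": "https://www.imdb.com/name/{id}/",
}


def external_links(identifiers: Dict[str, str]) -> Dict[str, str]:
    """Turn each known identifier into a direct link to that database's record."""
    return {
        db: URL_PATTERNS[db].format(id=value)
        for db, value in identifiers.items()
        if db in URL_PATTERNS  # skip databases we have no pattern for
    }


# Placeholder values only, standing in for the real keys stored on the item.
identifiers = {
    "VIAF": "VIAF-ID-PLACEHOLDER",
    "IMDb": "IMDB-ID-PLACEHOLDER",
    "BnF": "BNF-ID-PLACEHOLDER",
}

links = external_links(identifiers)
for database, url in links.items():
    print(database, url)
```

Because all the keys sit on the same item, the dictionary itself records the equivalence: this VIAF record and this IMDb record are about the same person.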
1096 00:59:47,840 --> 00:59:52,760 And those of you who have maybe only dealt with Wikidata 1097 00:59:52,760 --> 00:59:57,290 for inter-wiki links, which we used to do in the old days 1098 00:59:57,290 --> 00:59:59,600 manually within the article text, 1099 00:59:59,600 --> 01:00:01,716 now we do it through Wikidata, so maybe 1100 01:00:01,716 --> 01:00:03,590 the only thing you did know about Wikidata 1101 01:00:03,590 --> 01:00:10,130 is how to update these inter-wiki tables on Wikidata. 1102 01:00:10,130 --> 01:00:11,430 All right. 1103 01:00:11,430 --> 01:00:14,090 So that concludes our little tour 1104 01:00:14,090 --> 01:00:18,560 of the anatomy of a Wikidata page. 1105 01:00:18,560 --> 01:00:22,370 I will just remind you that it's a wiki page, which 1106 01:00:22,370 --> 01:00:26,120 means it has a discussion page, a talk page. 1107 01:00:26,120 --> 01:00:27,960 This one happens to be empty. 1108 01:00:27,960 --> 01:00:30,092 But, you know, if we have concerns or arguments 1109 01:00:30,092 --> 01:00:31,550 about some of the data here, that is 1110 01:00:31,550 --> 01:00:33,290 what we would use to discuss this 1111 01:00:33,290 --> 01:00:36,830 and to arrive at consensus. 1112 01:00:36,830 --> 01:00:41,760 It also has a history view just like every Wikipedia article. 1113 01:00:41,760 --> 01:00:47,402 So you can see here a list of edits. 1114 01:00:47,402 --> 01:00:48,860 Maybe some of you have never looked 1115 01:00:48,860 --> 01:00:51,710 at a history page on Wikipedia, so this looks overwhelming. 1116 01:00:51,710 --> 01:00:55,040 But every line here, every entry here, 1117 01:00:55,040 --> 01:00:58,240 is a single edit, a single revision, a single change 1118 01:00:58,240 --> 01:01:00,440 to this Wikidata item. 1119 01:01:00,440 --> 01:01:01,670 Just Harvey Milk.
1120 01:01:01,670 --> 01:01:04,250 And you can see at the very top this edit that I just 1121 01:01:04,250 --> 01:01:06,680 made-- this is my volunteer account 1122 01:01:06,680 --> 01:01:09,650 and I just made this edit, and in parentheses you 1123 01:01:09,650 --> 01:01:10,790 can see what I did. 1124 01:01:10,790 --> 01:01:14,640 I added an HE, Hebrew, description. 1125 01:01:14,640 --> 01:01:16,930 And this is the text that I added in Hebrew. 1126 01:01:16,930 --> 01:01:17,430 Right? 1127 01:01:17,430 --> 01:01:21,470 So we can see who added what to the Wikidata item, 1128 01:01:21,470 --> 01:01:24,960 just like we can do the same on Wikipedia. 1129 01:01:24,960 --> 01:01:26,390 So we have the revision history. 1130 01:01:26,390 --> 01:01:27,560 We can undo edits. 1131 01:01:27,560 --> 01:01:30,320 We can revert, just like on Wikipedia. 1132 01:01:34,420 --> 01:01:36,940 And what else did I want to show here? 1133 01:01:36,940 --> 01:01:40,930 We can add an item to our watchlist using the star, 1134 01:01:40,930 --> 01:01:42,020 just like on Wikipedia. 1135 01:01:42,020 --> 01:01:46,670 So we have all these standard wiki features 1136 01:01:46,670 --> 01:01:47,878 that we would come to expect. 1137 01:01:50,440 --> 01:01:54,270 Let's pause for questions. 1138 01:01:54,270 --> 01:01:58,412 Any questions about what we've covered so far? 1139 01:02:02,573 --> 01:02:03,073 Yes. 1140 01:02:06,950 --> 01:02:11,345 Are attributes of statements preset for the specific value? 1141 01:02:16,640 --> 01:02:19,830 No, they're not preset. 1142 01:02:19,830 --> 01:02:29,760 And generally Wikidata does not enforce logic by default. 1143 01:02:29,760 --> 01:02:32,130 So, I mean, there's nothing to prevent you 1144 01:02:32,130 --> 01:02:38,700 from editing the item about Brazil, 1145 01:02:38,700 --> 01:02:42,990 and adding the property height. 1146 01:02:46,690 --> 01:02:50,430 Now height is not a relevant property for a country. 1147 01:02:50,430 --> 01:02:50,970 Right?
1148 01:02:50,970 --> 01:02:53,880 I mean, maybe average elevation, maybe. 1149 01:02:53,880 --> 01:02:56,400 But not just height, which is used for humans 1150 01:02:56,400 --> 01:02:59,040 or for physical things. 1151 01:02:59,040 --> 01:03:02,400 So you could add that property to Brazil and save it 1152 01:03:02,400 --> 01:03:04,650 and the wiki would not complain. 1153 01:03:04,650 --> 01:03:07,590 Now in the background there are kind 1154 01:03:07,590 --> 01:03:13,020 of extra-wiki, outside-the-wiki processes for constraint 1155 01:03:13,020 --> 01:03:13,710 validation. 1156 01:03:13,710 --> 01:03:16,050 So there are bots and other processes that 1157 01:03:16,050 --> 01:03:17,940 run, and occasionally, for example, 1158 01:03:17,940 --> 01:03:26,570 identify non-living things with a date of birth field. 1159 01:03:26,570 --> 01:03:27,720 That's nonsensical. 1160 01:03:27,720 --> 01:03:29,010 That should not exist. 1161 01:03:29,010 --> 01:03:31,710 If someone mistakenly added that, there are processes 1162 01:03:31,710 --> 01:03:34,350 that would flag that to be fixed. 1163 01:03:34,350 --> 01:03:36,690 But the wiki itself, Wikidata, will not 1164 01:03:36,690 --> 01:03:38,550 prevent you from adding that. 1165 01:03:38,550 --> 01:03:41,940 And that is by design to keep things flexible. 1166 01:03:41,940 --> 01:03:43,930 So that people don't run into, oh wait, 1167 01:03:43,930 --> 01:03:46,560 but I can't add this because nobody thought 1168 01:03:46,560 --> 01:03:49,830 that I would need this, maybe. 1169 01:03:49,830 --> 01:03:54,530 I hope that answers your question. 1170 01:03:54,530 --> 01:03:57,290 You say helpful answer, question mark. 1171 01:03:57,290 --> 01:03:59,510 So was it a helpful answer, or? 1172 01:04:03,940 --> 01:04:04,440 OK. 1173 01:04:04,440 --> 01:04:05,426 Yes, Eleanor. 1174 01:04:05,426 --> 01:04:10,707 AUDIENCE: [INAUDIBLE] 1175 01:04:10,707 --> 01:04:12,040 ASAF BARTOV: Excellent question.
1176 01:04:12,040 --> 01:04:13,030 I'll repeat it. 1177 01:04:13,030 --> 01:04:16,180 You ask how do I find the Wikidata item 1178 01:04:16,180 --> 01:04:18,370 number from Wikipedia. 1179 01:04:18,370 --> 01:04:21,580 If I'm reading about Harvey Milk and I want to look at the data, 1180 01:04:21,580 --> 01:04:23,600 how do I do that? 1181 01:04:23,600 --> 01:04:27,400 That is an excellent question and let's skip to Wikipedia. 1182 01:04:27,400 --> 01:04:32,030 Conveniently I have the link right here on English. 1183 01:04:32,030 --> 01:04:35,600 So this is the Wikipedia article about Harvey Milk 1184 01:04:35,600 --> 01:04:42,740 and every article on Wikipedia should have a Wikidata 1185 01:04:42,740 --> 01:04:47,660 item associated with it, but it doesn't happen automatically. 1186 01:04:47,660 --> 01:04:51,470 So if I just created a page on Wikipedia 1187 01:04:51,470 --> 01:04:55,010 I also need to create a Wikidata entity for it 1188 01:04:55,010 --> 01:04:57,170 if it doesn't already exist. 1189 01:04:57,170 --> 01:04:59,420 It could already exist because it was already 1190 01:04:59,420 --> 01:05:01,970 covered in a different language, for example. 1191 01:05:01,970 --> 01:05:05,390 So that was parenthetical. 1192 01:05:05,390 --> 01:05:09,020 But every article on Wikipedia should have, here on the side, 1193 01:05:09,020 --> 01:05:14,270 on the side under Tools, a link called Wikidata item. 1194 01:05:14,270 --> 01:05:15,450 Right here. 1195 01:05:15,450 --> 01:05:16,160 OK. 1196 01:05:16,160 --> 01:05:18,110 That Wikidata item link is a link 1197 01:05:18,110 --> 01:05:21,710 that takes you to Wikidata, to the entity, 1198 01:05:21,710 --> 01:05:23,510 and there you find the number. 1199 01:05:23,510 --> 01:05:25,370 You can-- you don't even have to click it. 1200 01:05:25,370 --> 01:05:27,830 I mean, the URL itself tells you the number. 1201 01:05:27,830 --> 01:05:34,620 The number, you see, it's wikidata.org/wiki/Q17141.
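Since the item number is right there in the URL, pulling it out can be automated. The following is a minimal sketch, assuming item URLs end in /wiki/ followed by Q and digits as in the example just given; the function name `item_id_from_url` is made up for illustration.

```python
import re
from typing import Optional

# Matches a trailing "/wiki/Q<digits>" segment, in either letter case.
_ITEM_ID = re.compile(r"/wiki/(Q\d+)$", re.IGNORECASE)


def item_id_from_url(url: str) -> Optional[str]:
    """Return the item ID (for example "Q17141") from an item URL, or None if absent."""
    match = _ITEM_ID.search(url.strip().rstrip("/"))
    return match.group(1).upper() if match else None


print(item_id_from_url("https://www.wikidata.org/wiki/Q17141"))
print(item_id_from_url("https://en.wikipedia.org/wiki/Harvey_Milk"))
```

A Wikipedia article URL carries a title rather than an item number, so the function returns None for it; the number only appears once you follow the Wikidata item link.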
1202 01:05:34,620 --> 01:05:35,444 OK. 1203 01:05:35,444 --> 01:05:36,860 So that was an excellent question. 1204 01:05:36,860 --> 01:05:37,686 Other questions? 1205 01:05:37,686 --> 01:05:38,185 Yes. 1206 01:05:41,470 --> 01:05:44,430 Yeah, about the additional attributes, the qualifiers. 1207 01:05:44,430 --> 01:05:46,920 So, yes, I answered more generically. 1208 01:05:46,920 --> 01:05:49,370 But just like the properties themselves 1209 01:05:49,370 --> 01:05:53,390 are not limited per item, the qualifiers per statement 1210 01:05:53,390 --> 01:05:57,750 are also not entirely preordained. 1211 01:05:57,750 --> 01:05:59,570 But there is some structure to it. 1212 01:05:59,570 --> 01:06:03,140 I don't want to go into it at great length right now. 1213 01:06:03,140 --> 01:06:06,320 If we have time in the end we can get back to that. 1214 01:06:06,320 --> 01:06:09,590 But some qualifiers are again relevant for some things, 1215 01:06:09,590 --> 01:06:13,180 start time, end time, and others won't be. 1216 01:06:13,180 --> 01:06:16,280 Wikidata does try to offer you-- 1217 01:06:16,280 --> 01:06:18,710 you may remember when I clicked add qualifier, 1218 01:06:18,710 --> 01:06:22,170 it gave me kind of drop down of some relevant qualifiers. 1219 01:06:22,170 --> 01:06:24,475 So it does try to help you in that way. 1220 01:06:27,280 --> 01:06:28,160 Other question? 1221 01:06:28,160 --> 01:06:31,180 Are the values for instance of already 1222 01:06:31,180 --> 01:06:33,310 mappable to external ontologies? 1223 01:06:36,500 --> 01:06:41,310 That is a complicated question. 1224 01:06:41,310 --> 01:06:43,490 I'll help people understand the question first. 1225 01:06:43,490 --> 01:06:48,570 So an ontology is a structure, some kind 1226 01:06:48,570 --> 01:06:52,350 of hierarchy or cloud, of entities 1227 01:06:52,350 --> 01:06:54,510 and their interrelationships. 
1228 01:06:54,510 --> 01:06:56,920 An ontology would say, for example, 1229 01:06:56,920 --> 01:06:58,710 a person is a living thing. 1230 01:06:58,710 --> 01:06:59,670 So is a dog. 1231 01:06:59,670 --> 01:07:02,340 They're both living things, but they're different things. 1232 01:07:02,340 --> 01:07:09,910 And then, you know, say things about those entities 1233 01:07:09,910 --> 01:07:11,350 and their interrelationships. 1234 01:07:11,350 --> 01:07:13,300 Now there are many, many competing, 1235 01:07:13,300 --> 01:07:17,230 or coexisting, models of ontologies. 1236 01:07:17,230 --> 01:07:19,840 Many of them were created for specific needs. 1237 01:07:19,840 --> 01:07:25,170 Many of them want to be a universal ontology. 1238 01:07:25,170 --> 01:07:27,790 But of course it's impossible to quite 1239 01:07:27,790 --> 01:07:32,150 agree on one complete and simple ontology. 1240 01:07:32,150 --> 01:07:34,240 And so there are many ontologies. 1241 01:07:34,240 --> 01:07:38,520 Which brings up your question, can we map across ontologies? 1242 01:07:38,520 --> 01:07:43,840 Can we say that when Wikidata says instance of book, that 1243 01:07:43,840 --> 01:07:47,260 is equivalent to some other ontology saying instance 1244 01:07:47,260 --> 01:07:49,940 of bibliographic record? 1245 01:07:49,940 --> 01:07:50,860 And the answer is yes. 1246 01:07:50,860 --> 01:07:52,360 There are some such mappings. 1247 01:07:52,360 --> 01:07:54,420 They are incomplete. 1248 01:07:54,420 --> 01:07:58,240 And there's no kind of automagic thing happening 1249 01:07:58,240 --> 01:08:01,180 in the wiki vis-a-vis those other ontologies. 1250 01:08:01,180 --> 01:08:03,250 That's kind of left as an exercise 1251 01:08:03,250 --> 01:08:06,280 for those dealing with those other ontologies, and for tool 1252 01:08:06,280 --> 01:08:09,880 builders and other platform improvements 1253 01:08:09,880 --> 01:08:13,050 beyond Wikidata itself. 1254 01:08:13,050 --> 01:08:13,750 OK.
1255 01:08:13,750 --> 01:08:15,190 Other questions? 1256 01:08:15,190 --> 01:08:17,430 Yeah, we have one from the YouTube stream. 1257 01:08:17,430 --> 01:08:21,160 Someone asked, why can't I link Howard Carter's occupation 1258 01:08:21,160 --> 01:08:26,439 to archeologists when I use an info box that fetches info 1259 01:08:26,439 --> 01:08:28,960 from Wikidata? 1260 01:08:28,960 --> 01:08:33,160 Why can't I link it from the info box? 1261 01:08:33,160 --> 01:08:35,500 So, someone on the stream answered 1262 01:08:35,500 --> 01:08:37,659 saying, because it's an improper connection, 1263 01:08:37,659 --> 01:08:39,700 because the target is not about the subject only. 1264 01:08:43,020 --> 01:08:46,710 The target is not about the subject? 1265 01:08:46,710 --> 01:08:48,479 If I understand the question correctly, 1266 01:08:48,479 --> 01:08:53,130 what you would want to be able to do is from within Wikipedia 1267 01:08:53,130 --> 01:08:59,130 be able to say occupation and link to a Wikidata entry 1268 01:08:59,130 --> 01:09:01,050 about archeology. 1269 01:09:01,050 --> 01:09:03,569 That doesn't quite work that way. 1270 01:09:03,569 --> 01:09:05,430 We will get to a little discussion 1271 01:09:05,430 --> 01:09:08,460 of that in an upcoming section of this talk. 1272 01:09:08,460 --> 01:09:13,260 So I will defer the rest of my answer to then. 1273 01:09:13,260 --> 01:09:15,319 OK. 1274 01:09:15,319 --> 01:09:19,160 So we're done with questions for this phase, 1275 01:09:19,160 --> 01:09:22,850 and my browser got tired of waiting for me. 1276 01:09:22,850 --> 01:09:26,551 So, yes. 1277 01:09:26,551 --> 01:09:27,050 All right. 1278 01:09:27,050 --> 01:09:36,850 So we took a look at Wikidata, and we took questions. 1279 01:09:36,850 --> 01:09:41,020 So now, let's teach Wikidata some new things. 1280 01:09:41,020 --> 01:09:44,020 Some things it doesn't already know. 1281 01:09:44,020 --> 01:09:47,109 Let's look at this item here. 
1282 01:09:47,109 --> 01:09:50,950 So this item is about one of my favorite writers, 1283 01:09:50,950 --> 01:09:53,840 an American writer named Helen Dewitt. 1284 01:09:53,840 --> 01:10:01,570 Wikidata, of course, fondly refers to her as q54674, 1285 01:10:01,570 --> 01:10:03,070 but we can call her Helen Dewitt. 1286 01:10:03,070 --> 01:10:05,740 And what can we contribute here? 1287 01:10:05,740 --> 01:10:10,600 So Wikidata has far less information about Helen Dewitt. 1288 01:10:10,600 --> 01:10:13,144 Most of you probably haven't heard of her, that's OK. 1289 01:10:13,144 --> 01:10:14,560 What does Wikidata know about her? 1290 01:10:14,560 --> 01:10:16,450 Well instance of human. 1291 01:10:16,450 --> 01:10:17,800 We have a photo of her. 1292 01:10:17,800 --> 01:10:18,780 She's female. 1293 01:10:18,780 --> 01:10:20,530 She's an American. 1294 01:10:20,530 --> 01:10:21,790 Her name is Helen. 1295 01:10:21,790 --> 01:10:22,630 Date of birth. 1296 01:10:22,630 --> 01:10:23,650 Place of birth. 1297 01:10:23,650 --> 01:10:25,970 She's an author, a novelist, a writer. 1298 01:10:25,970 --> 01:10:28,840 She was educated at the University of Oxford. 1299 01:10:28,840 --> 01:10:33,160 And Wikidata knows what her official website is. 1300 01:10:33,160 --> 01:10:35,780 That's useful, but that's it. 1301 01:10:35,780 --> 01:10:37,780 Now we can contribute information here. 1302 01:10:37,780 --> 01:10:43,120 For example, she's an American author writing in English. 1303 01:10:43,120 --> 01:10:45,550 So we could add that information. 1304 01:10:45,550 --> 01:10:48,430 We could click the Add button here. 1305 01:10:48,430 --> 01:10:50,200 And this is a good moment to acknowledge 1306 01:10:50,200 --> 01:10:54,830 that the user interface of Wikidata is a work in progress. 1307 01:10:54,830 --> 01:10:56,740 It's not as intuitive as it might be. 
1308 01:10:56,740 --> 01:10:58,570 So you need to understand that-- 1309 01:10:58,570 --> 01:11:01,630 to add a completely new property, 1310 01:11:01,630 --> 01:11:04,060 you need to click this Add button. 1311 01:11:04,060 --> 01:11:08,020 If you want to add an additional value to the property official 1312 01:11:08,020 --> 01:11:11,530 website, you need to click this Add button. 1313 01:11:11,530 --> 01:11:13,780 It makes a kind of sense with a shaded box. 1314 01:11:13,780 --> 01:11:15,880 But, you know, you need to kind of pay attention, 1315 01:11:15,880 --> 01:11:18,901 and it's not as friendly as it might be. 1316 01:11:18,901 --> 01:11:20,650 [COUGHING] Excuse me. 1317 01:11:20,650 --> 01:11:23,380 So, let's add a property here. 1318 01:11:23,380 --> 01:11:25,690 Click the Add button. 1319 01:11:25,690 --> 01:11:29,740 Again, Wikidata tries to be useful by suggesting 1320 01:11:29,740 --> 01:11:32,760 some relevant properties for humans. 1321 01:11:32,760 --> 01:11:36,640 A bit more morbidly it suggests, how about date of death? 1322 01:11:36,640 --> 01:11:38,700 That's not cool, Wikidata. 1323 01:11:38,700 --> 01:11:40,480 Helen Dewitt is still alive. 1324 01:11:40,480 --> 01:11:42,700 So I will not add date of death, but I 1325 01:11:42,700 --> 01:11:46,140 can add languages spoken, written, or signed. 1326 01:11:46,140 --> 01:11:48,370 OK, so I click that. 1327 01:11:48,370 --> 01:11:51,670 And she writes in English. 1328 01:11:51,670 --> 01:11:54,450 I just type English-- whoops. 1329 01:11:54,450 --> 01:11:56,750 Not in Hebrew. 1330 01:11:56,750 --> 01:11:58,380 Don't panic. 1331 01:11:58,380 --> 01:12:01,010 I type English here. 1332 01:12:01,010 --> 01:12:04,250 And, oh, and of course Wikidata has auto-complete, right? 1333 01:12:04,250 --> 01:12:06,080 So it tries to help me along. 1334 01:12:06,080 --> 01:12:10,100 But you will notice that it has all kinds of things 1335 01:12:10,100 --> 01:12:10,940 called English.
1336 01:12:10,940 --> 01:12:14,030 I mean, it turns out that there is a place in Indiana 1337 01:12:14,030 --> 01:12:16,370 called English, Indiana. 1338 01:12:16,370 --> 01:12:17,150 Did I mean that? 1339 01:12:17,150 --> 01:12:20,210 No, of course I didn't mean that she writes her books 1340 01:12:20,210 --> 01:12:21,961 in English, Indiana. 1341 01:12:21,961 --> 01:12:22,460 Right? 1342 01:12:22,460 --> 01:12:26,180 But, you know, Wikidata gives me the option of linking to that. 1343 01:12:26,180 --> 01:12:30,530 I also don't mean the botanist Carl Schwartz English. 1344 01:12:30,530 --> 01:12:32,870 No, no I mean the west Germanic language 1345 01:12:32,870 --> 01:12:34,029 originating in England. 1346 01:12:34,029 --> 01:12:34,820 That's what I mean. 1347 01:12:34,820 --> 01:12:36,110 So I click that. 1348 01:12:36,110 --> 01:12:37,760 And I click Save. 1349 01:12:37,760 --> 01:12:38,450 And that's it. 1350 01:12:38,450 --> 01:12:41,780 Again I have just made an edit to Wikidata. 1351 01:12:41,780 --> 01:12:47,750 I have just taught Wikidata that this author speaks English. 1352 01:12:47,750 --> 01:12:50,370 Now, again, this may be very obvious. 1353 01:12:50,370 --> 01:12:52,280 She's American. 1354 01:12:52,280 --> 01:12:54,560 Of course not all Americans write in English. 1355 01:12:54,560 --> 01:12:56,930 It may be obvious if you look at her books. 1356 01:12:56,930 --> 01:12:59,060 The important thing is that now Wikidata 1357 01:12:59,060 --> 01:13:02,090 knows this as a piece of data. 1358 01:13:02,090 --> 01:13:04,610 And, again, think ahead to queries, which we will 1359 01:13:04,610 --> 01:13:06,980 demonstrate in a little bit. 
1360 01:13:06,980 --> 01:13:09,000 Without this piece of information 1361 01:13:09,000 --> 01:13:14,060 that I just added, if I had asked Wikidata five minutes ago, 1362 01:13:14,060 --> 01:13:19,760 give me a list of novelists writing in English, OK, 1363 01:13:19,760 --> 01:13:22,730 Wikidata would have returned thousands of results. 1364 01:13:22,730 --> 01:13:27,600 But Helen Dewitt would not have been among them. 1365 01:13:27,600 --> 01:13:32,000 Because up until two minutes ago Wikidata 1366 01:13:32,000 --> 01:13:35,640 didn't know that Helen Dewitt writes in English and not 1367 01:13:35,640 --> 01:13:37,520 in Spanish. 1368 01:13:37,520 --> 01:13:38,730 Do you see? 1369 01:13:38,730 --> 01:13:42,570 It is this explicit statement that will now 1370 01:13:42,570 --> 01:13:46,560 make her be included in any future query that asks, 1371 01:13:46,560 --> 01:13:48,700 who are novelists writing in English? 1372 01:13:53,250 --> 01:13:54,500 OK. 1373 01:13:54,500 --> 01:13:58,560 By the way, she has a PhD in Classics. 1374 01:13:58,560 --> 01:14:05,590 She speaks-- or at least reads and writes Latin and Greek, 1375 01:14:05,590 --> 01:14:07,270 ancient Greek, and I could-- 1376 01:14:07,270 --> 01:14:09,610 I can-- I mean, I happen to know that. 1377 01:14:09,610 --> 01:14:12,420 But wait, wait, wait, wait, wait, you say. 1378 01:14:12,420 --> 01:14:14,130 What about original research? 1379 01:14:14,130 --> 01:14:18,890 I mean, you can't just add stuff like that to Wikidata. 1380 01:14:18,890 --> 01:14:19,920 Don't you need sources? 1381 01:14:19,920 --> 01:14:22,860 Citations? 1382 01:14:22,860 --> 01:14:23,890 Of course I do. 1383 01:14:23,890 --> 01:14:25,020 Yes. 1384 01:14:25,020 --> 01:14:27,720 Let's add some sources to this. 1385 01:14:27,720 --> 01:14:31,410 So on Wikidata, just like Wikipedia, 1386 01:14:31,410 --> 01:14:34,980 things should generally be supported by citations, 1387 01:14:34,980 --> 01:14:36,990 by references.
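The point made above about explicit statements and queries can be sketched in a few lines of Python. This is only a toy, in-memory stand-in for Wikidata, not the real query service: the property and value names are plain strings chosen for readability (Wikidata itself uses P and Q identifiers), and "Example Author" is a made-up second item.

```python
# Toy stand-in for Wikidata: each item maps property names to sets of values.
# All names are illustrative; real Wikidata uses Q/P identifiers instead.
LANG = "languages spoken, written, or signed"

items = {
    "Helen Dewitt": {
        "instance of": {"human"},
        "occupation": {"novelist", "writer"},
    },
    "Example Author": {  # hypothetical second item, for contrast
        "instance of": {"human"},
        "occupation": {"novelist"},
        LANG: {"English"},
    },
}


def novelists_writing_in(language, data):
    """Return names of items that explicitly state both facts."""
    return sorted(
        name
        for name, props in data.items()
        if "novelist" in props.get("occupation", set())
        and language in props.get(LANG, set())
    )


# Before the edit, the query cannot find her: the statement is missing.
before = novelists_writing_in("English", items)

# The edit made in the talk: add the missing language statement.
items["Helen Dewitt"].setdefault(LANG, set()).add("English")

after = novelists_writing_in("English", items)
print(before)
print(after)
```

The query logic never changes between the two calls; only the data does, which is the sense in which the edit "teaches" Wikidata something.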
1388 01:14:36,990 --> 01:14:43,290 And just like Wikipedia, they aren't always supported 1389 01:14:43,290 --> 01:14:44,650 in that way. 1390 01:14:44,650 --> 01:14:48,870 OK so, I mean, I can just add it to Wikidata. 1391 01:14:48,870 --> 01:14:49,442 Watch me. 1392 01:14:49,442 --> 01:14:50,400 I just did that, right? 1393 01:14:50,400 --> 01:14:54,450 I just added English and Latin without any citation, 1394 01:14:54,450 --> 01:14:56,850 and I will not be arrested for it. 1395 01:14:56,850 --> 01:14:59,520 Just like I could edit a Wikipedia article 1396 01:14:59,520 --> 01:15:02,610 and add some information without a citation. 1397 01:15:02,610 --> 01:15:03,600 It may stick. 1398 01:15:03,600 --> 01:15:06,810 It may stay in the article, or it may be reverted. 1399 01:15:06,810 --> 01:15:11,010 It depends on the kind of information I'm adding. 1400 01:15:11,010 --> 01:15:13,740 It depends how many people are paying attention 1401 01:15:13,740 --> 01:15:15,060 to the article on Wikipedia. 1402 01:15:15,060 --> 01:15:18,420 And it works the same way on Wikidata. 1403 01:15:18,420 --> 01:15:21,780 OK, so, you can add some things without references. 1404 01:15:21,780 --> 01:15:23,970 Ideally, when you add information, you 1405 01:15:23,970 --> 01:15:25,570 should include references. 1406 01:15:25,570 --> 01:15:30,990 So let's be good Wikidata citizens and add a source. 1407 01:15:30,990 --> 01:15:34,395 Here is an article that I prepared in advance. 1408 01:15:38,100 --> 01:15:39,370 This is Helen Dewitt. 1409 01:15:39,370 --> 01:15:44,450 And in this article, somewhere, it actually 1410 01:15:44,450 --> 01:15:51,770 says right at the bottom here, see, 1411 01:15:51,770 --> 01:15:54,990 Dewitt knows, in descending order of proficiency, Latin, 1412 01:15:54,990 --> 01:15:57,010 ancient Greek, French, German, Spanish, 1413 01:15:57,010 --> 01:15:59,460 and Portuguese, Dutch, Danish, Norwegian, Swedish, Arabic, 1414 01:15:59,460 --> 01:16:01,680 Hebrew and Japanese.
1415 01:16:01,680 --> 01:16:04,770 This may sound excessive, but it's true. 1416 01:16:04,770 --> 01:16:06,330 I met this woman. 1417 01:16:06,330 --> 01:16:09,670 So anyway, we don't have to include all of that. 1418 01:16:09,670 --> 01:16:13,050 The point is this article from a reasonably reliable source, 1419 01:16:13,050 --> 01:16:15,840 this magazine, this interview, can 1420 01:16:15,840 --> 01:16:19,270 count as a source for the languages she speaks. 1421 01:16:19,270 --> 01:16:20,700 So I copy the URL. 1422 01:16:20,700 --> 01:16:23,130 I just copy it off my browser. 1423 01:16:23,130 --> 01:16:27,530 And, whoops-- that's not-- 1424 01:16:27,530 --> 01:16:28,580 here we go. 1425 01:16:28,580 --> 01:16:31,610 And I can just add a reference here 1426 01:16:31,610 --> 01:16:34,670 to the information that I just added to Wikidata, right? 1427 01:16:34,670 --> 01:16:38,300 I can click Add Reference. 1428 01:16:38,300 --> 01:16:45,800 And then just say the reference URL is, and I just paste. 1429 01:16:45,800 --> 01:16:48,840 I paste this URL. 1430 01:16:48,840 --> 01:16:50,160 Hit Enter. 1431 01:16:50,160 --> 01:16:51,060 And that's it. 1432 01:16:51,060 --> 01:16:55,380 And now the fact that she speaks Latin has a reference. 1433 01:16:55,380 --> 01:16:58,320 If you look at the other things here on Wikidata, 1434 01:16:58,320 --> 01:17:02,660 you can see that these IDs, for example, have references, too. 1435 01:17:02,660 --> 01:17:03,420 Right? 1436 01:17:03,420 --> 01:17:06,570 In this case, the reference just says, excuse me-- 1437 01:17:14,760 --> 01:17:18,600 In this case it just says imported from English Wikipedia. 1438 01:17:18,600 --> 01:17:24,970 But wait, you say, can Wikipedia be a source? 1439 01:17:24,970 --> 01:17:26,620 Not properly, no. 1440 01:17:26,620 --> 01:17:30,100 I mean, just like Wikipedia itself doesn't cite itself.
1441 01:17:30,100 --> 01:17:33,790 We don't say, this person was born in this city 1442 01:17:33,790 --> 01:17:34,870 how do we know? 1443 01:17:34,870 --> 01:17:37,210 We read it on Wikipedia in another language. 1444 01:17:37,210 --> 01:17:39,610 That's not a good citation. 1445 01:17:39,610 --> 01:17:41,400 It's not a good citation for Wikidata 1446 01:17:41,400 --> 01:17:45,040 either so why do we put it here? 1447 01:17:45,040 --> 01:17:49,240 Well you can see the qualifier here is different, right? 1448 01:17:49,240 --> 01:17:53,535 It's not reference URL, which is what I put in for Latin here. 1449 01:18:17,020 --> 01:18:20,320 It's not reference URL here, it's a different qualifier. 1450 01:18:20,320 --> 01:18:23,020 It says-- saying, imported from. 1451 01:18:23,020 --> 01:18:25,960 So this is not an actual reference that 1452 01:18:25,960 --> 01:18:27,610 supports this piece of data. 1453 01:18:27,610 --> 01:18:30,730 It just shows where did this data come from. 1454 01:18:30,730 --> 01:18:33,670 It's a slightly different thing, because this data was 1455 01:18:33,670 --> 01:18:37,210 mass imported into Wikidata. 1456 01:18:37,210 --> 01:18:40,960 So it wasn't input by hand by some volunteer. 1457 01:18:40,960 --> 01:18:44,770 It was imported into Wikidata en masse by a script, 1458 01:18:44,770 --> 01:18:46,180 by a program. 1459 01:18:46,180 --> 01:18:49,820 And we want to know, where did this number come from? 1460 01:18:49,820 --> 01:18:51,440 Well it came from English Wikipedia. 1461 01:18:51,440 --> 01:18:54,130 So again, that's not a proper reference 1462 01:18:54,130 --> 01:18:56,200 for the validity of the information, 1463 01:18:56,200 --> 01:18:59,200 but it does at least tell us it came from English Wikipedia. 1464 01:18:59,200 --> 01:19:03,460 We can click and look on English Wikipedia and find out. 1465 01:19:03,460 --> 01:19:05,230 Maybe there's a footnote there that 1466 01:19:05,230 --> 01:19:08,970 says where it did come from. 
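The difference between a supporting reference and a mere record of where data was imported from can be sketched as follows. This is a simplified model, not Wikidata's actual data format: statements are plain dictionaries, the property IDs P854 ("reference URL") and P143 ("imported from") are given as I recall them and should be treated as assumptions, and the URL and the identifier value are placeholders.

```python
# Property IDs as recalled from Wikidata; treat them as assumptions.
REFERENCE_URL = "P854"  # "reference URL"
IMPORTED_FROM = "P143"  # "imported from"


def has_supporting_reference(statement):
    """True if at least one reference does more than record an import."""
    for reference in statement.get("references", []):
        # A reference is modelled as a dict of property -> value.
        if any(prop != IMPORTED_FROM for prop in reference):
            return True
    return False


# Sourced by hand with a URL (placeholder address).
latin = {
    "property": "languages spoken, written, or signed",
    "value": "Latin",
    "references": [{REFERENCE_URL: "https://example.org/interview"}],
}

# Mass-imported by a script: we know where it came from, nothing more.
identifier = {
    "property": "some external ID",
    "value": "12345",
    "references": [{IMPORTED_FROM: "English Wikipedia"}],
}

# Added with no reference at all.
english = {
    "property": "languages spoken, written, or signed",
    "value": "English",
}

for s in (latin, identifier, english):
    print(s["value"], has_supporting_reference(s))
```

A check like this is one way a tool could list statements that still need a proper source, which is the situation described for the imported IDs.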
1467 01:19:08,970 --> 01:19:11,000 OK. 1468 01:19:11,000 --> 01:19:15,320 So this was an example of teaching Wikidata something 1469 01:19:15,320 --> 01:19:16,910 that it didn't know. 1470 01:19:16,910 --> 01:19:18,512 Something about the languages. 1471 01:19:18,512 --> 01:19:20,720 And of course I could add this reference for English. 1472 01:19:20,720 --> 01:19:23,210 I could add all the other languages that she speaks. 1473 01:19:23,210 --> 01:19:26,060 And I won't bore you with that, but that is basically 1474 01:19:26,060 --> 01:19:27,050 how it's done. 1475 01:19:27,050 --> 01:19:29,720 So you click this Add to add a completely new-- 1476 01:19:32,650 --> 01:19:34,030 completely new statement. 1477 01:19:34,030 --> 01:19:36,250 Now, by the way, the fact that these are the only two 1478 01:19:36,250 --> 01:19:39,220 suggestions that Wikidata can think of, 1479 01:19:39,220 --> 01:19:42,100 doesn't mean these are the only options. 1480 01:19:42,100 --> 01:19:46,750 OK, you can just type anything that may be relevant. 1481 01:19:46,750 --> 01:19:50,950 We could add, for example, award. 1482 01:19:50,950 --> 01:19:52,570 Just start typing award. 1483 01:19:52,570 --> 01:19:54,910 And here I have a bunch of properties 1484 01:19:54,910 --> 01:19:56,510 that are relevant for awards. 1485 01:19:56,510 --> 01:20:00,100 Awards received, together with, conferred by, right? 1486 01:20:00,100 --> 01:20:05,790 There's all kinds of properties that I could rely on. 1487 01:20:05,790 --> 01:20:09,600 And of course there is a list of all the properties of Wikidata. 1488 01:20:09,600 --> 01:20:11,580 And that list is also sorted by type. 1489 01:20:11,580 --> 01:20:15,480 So yes, there is a list of properties relevant to people 1490 01:20:15,480 --> 01:20:17,130 so that you don't have to guess.
1491 01:20:17,130 --> 01:20:18,660 But a surprising amount of the time 1492 01:20:18,660 --> 01:20:22,760 you can just start typing and get the right properties 1493 01:20:22,760 --> 01:20:25,340 suggested to you. 1494 01:20:25,340 --> 01:20:27,230 OK. 1495 01:20:27,230 --> 01:20:33,050 So we taught Wikidata something new, 1496 01:20:33,050 --> 01:20:38,980 and now let's teach Wikidata something completely new. 1497 01:20:38,980 --> 01:20:39,480 Right? 1498 01:20:39,480 --> 01:20:42,480 So how do we create a new Wikidata item? 1499 01:20:42,480 --> 01:20:46,880 So, like I said, if I created a Wikipedia article 1500 01:20:46,880 --> 01:20:49,520 about something that was not previously covered 1501 01:20:49,520 --> 01:20:53,540 on any other Wikipedia, chances are 1502 01:20:53,540 --> 01:20:57,170 there would not be an already existing Wikidata item. 1503 01:20:57,170 --> 01:21:03,190 Sometimes there might be, because Wikidata 1504 01:21:03,190 --> 01:21:06,857 does have 25 million entities. 1505 01:21:06,857 --> 01:21:08,190 But sometimes there wouldn't be. 1506 01:21:08,190 --> 01:21:10,148 So, first of all, I could search for it, right? 1507 01:21:10,148 --> 01:21:14,210 So I could go to Wikidata to the search box 1508 01:21:14,210 --> 01:21:17,390 here and just start typing, and search for what I want, right? 1509 01:21:17,390 --> 01:21:20,690 So if I'm searching for Helen Dewitt I just say Helen, 1510 01:21:20,690 --> 01:21:25,590 and I can see whether or not it exists. 1511 01:21:25,590 --> 01:21:29,240 And there's a detailed search results page, et cetera, 1512 01:21:29,240 --> 01:21:33,074 where I can find out if the item does exist or not. 1513 01:21:33,074 --> 01:21:35,240 Excuse me, this reminds me of a very important thing 1514 01:21:35,240 --> 01:21:36,620 I wanted to demonstrate, and that 1515 01:21:36,620 --> 01:21:42,710 is the multilingualism of Wikidata. 1516 01:21:42,710 --> 01:21:49,340 So remember all these labels in other languages.
1517 01:21:49,340 --> 01:21:54,390 Wikidata knows what to call Helen Dewitt in Hebrew. 1518 01:21:54,390 --> 01:22:00,800 And it will show it to Wikidata users whose language is Hebrew. 1519 01:22:00,800 --> 01:22:04,220 Mine is set to English, for your sake. 1520 01:22:04,220 --> 01:22:08,830 But if I change this I go to Preferences here and change 1521 01:22:08,830 --> 01:22:09,740 my language. 1522 01:22:09,740 --> 01:22:15,475 [INAUDIBLE] All right, and I hit Save. 1523 01:22:15,475 --> 01:22:20,350 Wikidata will start talking to me in Hebrew. 1524 01:22:20,350 --> 01:22:23,090 Now brace yourselves. 1525 01:22:23,090 --> 01:22:24,620 Are you ready? 1526 01:22:24,620 --> 01:22:28,430 Don't panic, it's right to left. 1527 01:22:28,430 --> 01:22:32,630 Oh my god everything is topsy-turvy. 1528 01:22:32,630 --> 01:22:36,590 So this is the same article in Hebrew. 1529 01:22:36,590 --> 01:22:39,290 So the sidebar has switched direction, 1530 01:22:39,290 --> 01:22:41,300 and I know most of you cannot read it. 1531 01:22:41,300 --> 01:22:42,480 Bear with me. 1532 01:22:42,480 --> 01:22:44,750 This is the label that we previously 1533 01:22:44,750 --> 01:22:46,840 saw in the label box. 1534 01:22:46,840 --> 01:22:49,580 This is how you spell Helen Dewitt in Hebrew. 1535 01:22:49,580 --> 01:22:52,550 And here is the description in Hebrew. 1536 01:22:52,550 --> 01:22:54,980 It's not the description in English, this description, 1537 01:22:54,980 --> 01:22:57,380 American writer, which I was shown previously. 1538 01:22:57,380 --> 01:23:00,740 Now I'm shown the Hebrew description, appropriately. 1539 01:23:00,740 --> 01:23:03,500 But more interestingly, oh my god! 1540 01:23:03,500 --> 01:23:07,640 All these statements are suddenly in Hebrew. 1541 01:23:07,640 --> 01:23:08,940 How did that happen? 
1542 01:23:11,570 --> 01:23:15,560 Well this tiny word here is the very concise way 1543 01:23:15,560 --> 01:23:22,450 to say in Hebrew, instance of, and this word here means human. 1544 01:23:22,450 --> 01:23:25,960 So these are links to the same things, right? 1545 01:23:25,960 --> 01:23:28,100 It still links to Q5. 1546 01:23:28,100 --> 01:23:31,780 Q5 is the Wikidata entity for human. 1547 01:23:31,780 --> 01:23:33,370 These are still the same things. 1548 01:23:33,370 --> 01:23:37,600 But because Wikidata has multiple labels for everything, 1549 01:23:37,600 --> 01:23:39,580 it has multiple labels for items. 1550 01:23:39,580 --> 01:23:42,760 And it also has multiple labels for property names. 1551 01:23:42,760 --> 01:23:46,450 So Wikidata knows how to say, instance of, 1552 01:23:46,450 --> 01:23:50,140 and award received, in other languages. 1553 01:23:50,140 --> 01:23:54,490 That is why it is able to show me all this data in Hebrew 1554 01:23:54,490 --> 01:23:59,890 even if none of that data was actually input into Wikidata 1555 01:23:59,890 --> 01:24:01,870 by a Hebrew speaker. 1556 01:24:01,870 --> 01:24:04,900 That data could have been input by English speakers, 1557 01:24:04,900 --> 01:24:08,230 but thanks to the fact that someone once 1558 01:24:08,230 --> 01:24:12,760 translated the word photo into Hebrew, 1559 01:24:12,760 --> 01:24:14,830 I can see this field in Hebrew. 1560 01:24:17,750 --> 01:24:21,230 So one of the things you can do to help Wikidata, 1561 01:24:21,230 --> 01:24:23,600 right now, without any special knowledge 1562 01:24:23,600 --> 01:24:26,210 is to help translate those labels. 1563 01:24:26,210 --> 01:24:29,030 Every label only needs to be translated just once. 1564 01:24:29,030 --> 01:24:31,310 So you can see that all of these properties, date 1565 01:24:31,310 --> 01:24:34,720 of birth, name et cetera, they all have Hebrew labels. 1566 01:24:34,720 --> 01:24:36,760 Maybe one of these would not. 
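The mechanism described above, where statements are stored once as identifiers and only the labels are translated, can be sketched like this. It is a simplified illustration, not Wikidata's implementation: the Hebrew strings are sample values for the sketch rather than labels copied from Wikidata, "P9999999" is a made-up property, and falling back to English when a label is missing is an assumption of this example.

```python
# Illustrative label tables; the Hebrew strings are sample values for this
# sketch, not copied from Wikidata.
labels = {
    "P31": {"en": "instance of", "he": "הוא"},
    "Q5": {"en": "human", "he": "אדם"},
    "P9999999": {"en": "made-up property"},  # hypothetical, no Hebrew label
}


def label(entity_id, language, fallback="en"):
    """Pick the label in the user's language, else the fallback, else the ID."""
    names = labels.get(entity_id, {})
    return names.get(language) or names.get(fallback) or entity_id


def render(statement, language):
    """Show a (property, value) pair using labels in the given language."""
    prop, value = statement
    return f"{label(prop, language)}: {label(value, language)}"


# The statement itself is language-neutral: just two identifiers.
statement = ("P31", "Q5")

print(render(statement, "en"))
print(render(statement, "he"))
```

Because the statement holds only identifiers, translating one label once changes how every statement using that property or item is displayed in that language.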
1567 01:24:36,760 --> 01:24:38,361 No, they all have Hebrew labels. 1568 01:24:38,361 --> 01:24:39,110 Doing pretty good. 1569 01:24:42,960 --> 01:24:45,810 And I'm able to search in my own language. 1570 01:24:45,810 --> 01:24:48,210 I'm able to click Add. 1571 01:24:48,210 --> 01:24:49,890 This word is Add, so I click this, 1572 01:24:49,890 --> 01:24:51,780 and now I have the Add screen. 1573 01:24:51,780 --> 01:24:55,860 It all speaks my language, and it's awesome. 1574 01:24:55,860 --> 01:25:00,330 And now for your sake I will switch back to English, 1575 01:25:00,330 --> 01:25:03,090 but it is important to know you can 1576 01:25:03,090 --> 01:25:05,740 edit Wikidata in any language. 1577 01:25:05,740 --> 01:25:09,050 And it is far more multi-lingual and multi-lingual friendly 1578 01:25:09,050 --> 01:25:13,260 than, for example, Commons, which is also a project we all share. 1579 01:25:13,260 --> 01:25:17,730 But Commons has some limitations on how multi-lingual it is. 1580 01:25:17,730 --> 01:25:21,410 For example, the category names, et cetera. 1581 01:25:21,410 --> 01:25:23,270 OK. 1582 01:25:23,270 --> 01:25:25,670 So we were beginning to discuss creating 1583 01:25:25,670 --> 01:25:27,140 something completely new. 1584 01:25:27,140 --> 01:25:29,360 AUDIENCE: Quick questions, if that's OK? 1585 01:25:29,360 --> 01:25:30,980 So there's two questions on IRC. 1586 01:25:30,980 --> 01:25:33,890 The first one is, can you show search for something 1587 01:25:33,890 --> 01:25:35,420 like getting the list of things? 1588 01:25:35,420 --> 01:25:38,360 I want to learn how to search for something properly like, 1589 01:25:38,360 --> 01:25:43,705 show me all the items with this value of this property. 1590 01:25:43,705 --> 01:25:45,080 ASAF BARTOV: Yes. 1591 01:25:45,080 --> 01:25:47,540 That is part of this talk, but I'll 1592 01:25:47,540 --> 01:25:49,250 get to that in a little bit later.
1593 01:25:49,250 --> 01:25:52,010 There's a whole section where I will demonstrate the very, very 1594 01:25:52,010 --> 01:25:55,190 powerful query system of Wikidata 1595 01:25:55,190 --> 01:25:57,170 where I will cash that check that I gave 1596 01:25:57,170 --> 01:25:59,090 at the beginning of all these painters 1597 01:25:59,090 --> 01:26:01,029 who are sons of painters queries, et cetera. 1598 01:26:01,029 --> 01:26:02,570 So I will demonstrate how to do that. 1599 01:26:02,570 --> 01:26:04,190 AUDIENCE: Other question. 1600 01:26:04,190 --> 01:26:07,250 How does Wikidata deal with link rot, and other issues 1601 01:26:07,250 --> 01:26:09,680 stemming from their URL refs? 1602 01:26:13,528 --> 01:26:16,290 ASAF BARTOV: URLs break. 1603 01:26:16,290 --> 01:26:18,730 We call that link rot. 1604 01:26:18,730 --> 01:26:22,470 Wikidata doesn't have any particular magic 1605 01:26:22,470 --> 01:26:24,730 around link rot, just like Wikipedia. 1606 01:26:24,730 --> 01:26:29,100 So if you do use a bare URL it may well rot. 1607 01:26:29,100 --> 01:26:34,230 But you can add qualifiers with backup URLs elsewhere, 1608 01:26:34,230 --> 01:26:37,680 on the Internet Archive, or another mirroring service. 1609 01:26:37,680 --> 01:26:42,780 And potentially that could be a software feature for Wikidata 1610 01:26:42,780 --> 01:26:46,590 to automatically save or ensure that something 1611 01:26:46,590 --> 01:26:48,660 is saved on Internet Archive, but I don't 1612 01:26:48,660 --> 01:26:50,670 know that it is doing so now. 1613 01:26:50,670 --> 01:26:56,040 So, just like Wikipedia, if it is a bare URL it may rot. 1614 01:26:56,040 --> 01:27:00,240 And may need to be replaced, possibly by bot. 1615 01:27:00,240 --> 01:27:01,390 Other questions? 1616 01:27:09,840 --> 01:27:12,650 All right, so let's talk about how you 1617 01:27:12,650 --> 01:27:15,090 create a completely new item. 1618 01:27:15,090 --> 01:27:16,300 It's very simple.
1619 01:27:16,300 --> 01:27:21,810 You go to Wikidata and you click here on the side. 1620 01:27:21,810 --> 01:27:30,180 There's a link, create new item, which gives you this screen. 1621 01:27:30,180 --> 01:27:35,030 And let's create an item about a book 1622 01:27:35,030 --> 01:27:39,500 that I'm reading right now by this Bulgarian writer. 1623 01:27:39,500 --> 01:27:43,950 So we have an article about this writer guy named Deyan Enev. 1624 01:27:43,950 --> 01:27:48,530 But we don't have an article or a Wikidata item 1625 01:27:48,530 --> 01:28:07,980 about one of his famous books called Circus Bulgaria. 1626 01:28:07,980 --> 01:28:10,050 That's the book I'm reading, his first collection 1627 01:28:10,050 --> 01:28:11,216 of short stories in English. 1628 01:28:11,216 --> 01:28:14,280 Circus Bulgaria came out in 2010, Portobello Books, 1629 01:28:14,280 --> 01:28:17,099 translated by Kapka Kassabova. 1630 01:28:17,099 --> 01:28:18,390 So that's the book I'm reading. 1631 01:28:18,390 --> 01:28:20,520 As you can see it's not a link on Wikipedia. 1632 01:28:20,520 --> 01:28:23,370 There's no article about it, and there's not even 1633 01:28:23,370 --> 01:28:26,310 a Wikidata entity item about it. 1634 01:28:26,310 --> 01:28:32,220 But we can totally create it, even without a Wikipedia 1635 01:28:32,220 --> 01:28:33,090 article. 1636 01:28:33,090 --> 01:28:34,980 So let's create this new item. 1637 01:28:34,980 --> 01:28:37,260 Let's create it in English for the purposes 1638 01:28:37,260 --> 01:28:38,880 of our demonstration. 1639 01:28:38,880 --> 01:28:44,910 The name of the item is Circus Bulgaria. 1640 01:28:44,910 --> 01:28:47,520 Circus Bulgaria, that's the name. 1641 01:28:47,520 --> 01:28:50,670 Not Circus Bulgaria parentheses book, 1642 01:28:50,670 --> 01:28:53,520 or anything you may be used to from Wikipedia. 
1643 01:28:53,520 --> 01:28:56,520 It's the actual name of the book, 1644 01:28:56,520 --> 01:29:00,450 and the description, again, remember, 1645 01:29:00,450 --> 01:29:03,270 the description field is just to kind of help 1646 01:29:03,270 --> 01:29:08,681 tell apart this Circus Bulgaria from any other potential Circus 1647 01:29:08,681 --> 01:29:09,180 Bulgaria. 1648 01:29:09,180 --> 01:29:11,280 Maybe there's a film or something. 1649 01:29:11,280 --> 01:29:20,480 So it's enough to just say something like short story 1650 01:29:20,480 --> 01:29:23,270 collection. 1651 01:29:23,270 --> 01:29:27,830 I might add by Deyan Enev, just in case, again, 1652 01:29:27,830 --> 01:29:31,910 some future other short story collection by some other author 1653 01:29:31,910 --> 01:29:33,560 happens to have that same name. 1654 01:29:33,560 --> 01:29:36,391 That should be disambiguating enough. 1655 01:29:36,391 --> 01:29:36,890 OK. 1656 01:29:36,890 --> 01:29:39,770 Short story collection by Deyan Enev. 1657 01:29:39,770 --> 01:29:42,050 I could have aliases for this. 1658 01:29:42,050 --> 01:29:47,240 The aliases assist find-ability. 1659 01:29:47,240 --> 01:29:51,020 This particular book has just this one name, so that's fine. 1660 01:29:51,020 --> 01:29:52,260 And I click Create. 1661 01:29:52,260 --> 01:29:52,760 That's it. 1662 01:29:52,760 --> 01:29:55,990 I just start with a label, and a description. 1663 01:29:55,990 --> 01:29:58,740 I click Create. 1664 01:29:58,740 --> 01:30:03,890 I have a brand new Q number for my new Wikidata item. 1665 01:30:03,890 --> 01:30:05,960 And Wikidata knows what to call it. 1666 01:30:05,960 --> 01:30:09,320 And a description in one language at least. 1667 01:30:09,320 --> 01:30:11,930 And that's it, and I can start populating it. 1668 01:30:11,930 --> 01:30:15,050 As you can see, it has no site links, 1669 01:30:15,050 --> 01:30:17,450 but it's ready to be taught.
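What the Create screen collects, a label, a description, and optional aliases in one language, can be sketched as a small data structure. The layout below is modelled on the JSON that Wikibase uses for entities as I understand it, so treat the exact shape as an assumption; the helper name `new_item_payload` is made up for this example, and nothing is sent to Wikidata. The Q number is not part of the payload because Wikidata assigns it on creation.

```python
import json


def new_item_payload(label, description, language, aliases=()):
    """Build the label/description/alias part of a new item as a dict.

    Hypothetical helper for illustration; it only builds data locally.
    """
    payload = {
        "labels": {language: {"language": language, "value": label}},
        "descriptions": {
            language: {"language": language, "value": description}
        },
    }
    if aliases:
        payload["aliases"] = {
            language: [{"language": language, "value": a} for a in aliases]
        }
    return payload


# The label is the plain name of the book, with no "(book)" suffix;
# the description does the disambiguating.
payload = new_item_payload(
    "Circus Bulgaria", "short story collection by Deyan Enev", "en"
)
print(json.dumps(payload, indent=2))
```

Adding the Hebrew or Bulgarian label later amounts to adding another language key under "labels", which is why the same item can be shown in any language.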
1670 01:30:17,450 --> 01:30:20,450 So, for example, I can start by teaching 1671 01:30:20,450 --> 01:30:24,610 it the name of the book in another language 1672 01:30:24,610 --> 01:30:25,870 that I happen to speak. 1673 01:30:29,050 --> 01:30:31,720 Now it has two labels in English and Hebrew. 1674 01:30:31,720 --> 01:30:36,880 I could also look up the book Areon, 1675 01:30:36,880 --> 01:30:39,510 the original Bulgarian label for this book. 1676 01:30:39,510 --> 01:30:41,550 Seems relevant. 1677 01:30:41,550 --> 01:30:43,320 Again, I do not speak Bulgarian. 1678 01:30:43,320 --> 01:30:49,860 But I can go to the Bulgarian Wikipedia through interwiki. 1679 01:30:49,860 --> 01:30:51,510 This is this gentleman. 1680 01:30:51,510 --> 01:30:54,510 And I could find-- 1681 01:30:54,510 --> 01:30:59,190 I can read Cyrillic so I could easily find-- 1682 01:30:59,190 --> 01:31:00,030 when I say easily-- 1683 01:31:02,940 --> 01:31:05,710 when I say easily-- 1684 01:31:05,710 --> 01:31:12,731 maybe not so easy, but I can search for it. 1685 01:31:21,070 --> 01:31:22,180 Here we go. 1686 01:31:22,180 --> 01:31:25,190 Tsirk Bulgaria. 1687 01:31:25,190 --> 01:31:27,510 That is the name of the book. 1688 01:31:27,510 --> 01:31:28,910 Tsirk, as in circus. 1689 01:31:28,910 --> 01:31:30,440 No problem. 1690 01:31:30,440 --> 01:31:32,725 So I just copy this right here. 1691 01:31:35,240 --> 01:31:38,090 And I go back to my new item. 1692 01:31:38,090 --> 01:31:45,725 My new item, which is here, and I edit the Bulgarian field. 1693 01:31:48,260 --> 01:31:49,950 And here it is. 1694 01:31:49,950 --> 01:31:50,720 Awesome. 1695 01:31:50,720 --> 01:31:51,220 All right. 1696 01:31:51,220 --> 01:31:55,420 But I still haven't told Wikidata anything about this. 1697 01:31:55,420 --> 01:31:56,920 I know I'm talking about a book. 1698 01:31:56,920 --> 01:31:59,110 Wikidata doesn't know that yet. 1699 01:31:59,110 --> 01:32:02,630 So let's start by adding some statements.
1700 01:32:02,630 --> 01:32:05,390 First of all, I click Add. 1701 01:32:05,390 --> 01:32:07,190 Wikidata sensibly says, how about we 1702 01:32:07,190 --> 01:32:08,630 start with instance of. 1703 01:32:08,630 --> 01:32:11,090 Tell me what kind of animal-- no, not kind of animal. 1704 01:32:11,090 --> 01:32:13,940 What kind of thing are you trying to describe here? 1705 01:32:13,940 --> 01:32:18,130 Well it's an instance of a book. 1706 01:32:18,130 --> 01:32:20,930 Not in Hebrew, please. 1707 01:32:20,930 --> 01:32:22,180 So it's an instance of a book. 1708 01:32:22,180 --> 01:32:23,763 I could even be a little more specific 1709 01:32:23,763 --> 01:32:31,920 and say it's an instance of a short story collection. 1710 01:32:31,920 --> 01:32:34,620 There we go, short story collection. 1711 01:32:34,620 --> 01:32:36,800 I hit Save. 1712 01:32:36,800 --> 01:32:37,430 Awesome. 1713 01:32:37,430 --> 01:32:39,680 So now we know what kind of thing it is. 1714 01:32:39,680 --> 01:32:42,860 It's not a human, it's not a mountain, it's not a concept. 1715 01:32:42,860 --> 01:32:44,760 It's a short story collection. 1716 01:32:44,760 --> 01:32:46,400 Now I can add some other things. 1717 01:32:46,400 --> 01:32:48,770 See, Wikidata is already working for me. 1718 01:32:48,770 --> 01:32:51,020 Because it's a short story collection 1719 01:32:51,020 --> 01:32:53,960 it's offering me to populate these properties, and not 1720 01:32:53,960 --> 01:32:54,890 other ones. 1721 01:32:54,890 --> 01:32:56,990 Publication date, original language, 1722 01:32:56,990 --> 01:33:00,350 genre, country of origin, these are all relevant, right? 1723 01:33:00,350 --> 01:33:04,220 So let's start with original language of the work 1724 01:33:04,220 --> 01:33:07,410 is Bulgarian. 1725 01:33:07,410 --> 01:33:09,810 Not Bulgaria, Bulgarian. 1726 01:33:09,810 --> 01:33:12,040 This is the item I want to link. 1727 01:33:12,040 --> 01:33:21,570 Hit Save, and whatever. 
1728 01:33:21,570 --> 01:33:22,890 Author. 1729 01:33:22,890 --> 01:33:26,540 Let's identify the author. 1730 01:33:26,540 --> 01:33:29,350 So the author, the main creator of the work, 1731 01:33:29,350 --> 01:33:32,470 is that gentleman Deyan Enev. 1732 01:33:32,470 --> 01:33:34,750 And remember, he has a Wikipedia article. 1733 01:33:34,750 --> 01:33:37,210 He also has a Wikidata entity. 1734 01:33:37,210 --> 01:33:39,640 So Wikidata does know about him. 1735 01:33:39,640 --> 01:33:48,930 So I hit Save, and I can add something about the translator. 1736 01:33:52,530 --> 01:33:54,390 And what was that lady's name? 1737 01:33:57,990 --> 01:34:00,120 Kapka Kassabova. 1738 01:34:00,120 --> 01:34:05,430 Now it so happens that Wikidata already knows about this lady. 1739 01:34:08,330 --> 01:34:08,840 See? 1740 01:34:08,840 --> 01:34:12,290 So I can just start typing and then just link to it. 1741 01:34:12,290 --> 01:34:12,840 Awesome. 1742 01:34:12,840 --> 01:34:13,824 But what if it didn't? 1743 01:34:13,824 --> 01:34:15,740 What if it was translated by someone who isn't 1744 01:34:15,740 --> 01:34:17,690 already covered on Wikidata? 1745 01:34:17,690 --> 01:34:22,190 Well I could just type the name as a string, 1746 01:34:22,190 --> 01:34:25,760 but ideally I could create a Wikidata entity 1747 01:34:25,760 --> 01:34:28,940 about this translator so that there is a possibility 1748 01:34:28,940 --> 01:34:30,350 to link to her. 1749 01:34:33,560 --> 01:34:36,920 Now I might actually add a qualifier here 1750 01:34:36,920 --> 01:34:40,310 because, she's not the translator of the book, right? 1751 01:34:40,310 --> 01:34:43,620 She's the translator of the book into English. 1752 01:34:43,620 --> 01:34:44,440 Right. 1753 01:34:44,440 --> 01:34:50,151 So the language that she translated into is English. 1754 01:34:50,151 --> 01:34:50,650 Right? 1755 01:34:50,650 --> 01:34:53,620 This book-- remember I'm describing the book. 
1756 01:34:53,620 --> 01:34:55,376 The item is about the book. 1757 01:34:55,376 --> 01:34:57,250 So the book would have a different translator 1758 01:34:57,250 --> 01:34:58,510 into Polish. 1759 01:34:58,510 --> 01:35:02,320 So this is an example of a property or a statement 1760 01:35:02,320 --> 01:35:06,430 that doesn't make sense without one of those qualifiers. 1761 01:35:06,430 --> 01:35:08,140 It's just not correct. 1762 01:35:08,140 --> 01:35:11,320 It doesn't make sense to say that translator is. 1763 01:35:11,320 --> 01:35:14,950 The English translator, or even this English translator. 1764 01:35:14,950 --> 01:35:17,770 In 50 years maybe there would be an additional English 1765 01:35:17,770 --> 01:35:18,940 translation. 1766 01:35:18,940 --> 01:35:24,774 So that's an example of needing that qualifier. 1767 01:35:24,774 --> 01:35:27,190 And of course I could go on and populate the other fields. 1768 01:35:27,190 --> 01:35:29,710 We don't have to do that right now. 1769 01:35:29,710 --> 01:35:32,960 Publication date, country of origin, et cetera. 1770 01:35:32,960 --> 01:35:35,440 So this is already beginning to look like all those items 1771 01:35:35,440 --> 01:35:38,440 that we already saw, but just a moment ago it didn't exist. 1772 01:35:38,440 --> 01:35:43,920 Just a moment ago Wikidata had no concept of this work. 1773 01:35:43,920 --> 01:35:46,500 This happens to be one of his notable works. 1774 01:35:46,500 --> 01:35:52,080 So I could actually go to the item about Deyan Enev which 1775 01:35:52,080 --> 01:35:56,190 has all this information already, occupation, languages, 1776 01:35:56,190 --> 01:35:59,170 and add a property. 1777 01:35:59,170 --> 01:36:01,050 Remember, I'm not limited to these. 1778 01:36:01,050 --> 01:36:06,180 I can add a property called notable works, 1779 01:36:06,180 --> 01:36:08,670 and mention my new item. 1780 01:36:08,670 --> 01:36:12,120 Circus Bulgaria. 1781 01:36:12,120 --> 01:36:12,750 See? 
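The role of the qualifier on the translator statement can be sketched with a small model. This is only an illustration under stated assumptions: property names are plain strings rather than Wikidata P identifiers, the `Statement` class and `translators_into` function are made up for this example, and the Polish translator is a hypothetical entry added to show why the qualifier is needed.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One property/value pair, optionally narrowed by qualifiers."""
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


# Statements on the item about the book (not about the author).
book = [
    Statement("author", "Deyan Enev"),
    Statement("original language", "Bulgarian"),
    Statement("translator", "Kapka Kassabova", {"language": "English"}),
    # Hypothetical second translation, to show why the qualifier matters.
    Statement("translator", "Example Translator", {"language": "Polish"}),
]


def translators_into(language, statements):
    """Translators whose statement is qualified with the given language."""
    return [
        s.value
        for s in statements
        if s.prop == "translator"
        and s.qualifiers.get("language") == language
    ]


print(translators_into("English", book))
print(translators_into("Polish", book))
```

Without the qualifier, the two translator statements would be indistinguishable, and "who translated this book into English?" could not be answered from the data.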
1782 01:36:12,750 --> 01:36:15,180 My new item is showing up, and thanks 1783 01:36:15,180 --> 01:36:18,660 to this description that I wrote, short story collection, 1784 01:36:18,660 --> 01:36:22,650 it's already appearing here in the dropdown very conveniently. 1785 01:36:22,650 --> 01:36:24,270 So I linked to this. 1786 01:36:24,270 --> 01:36:25,154 I hit Save. 1787 01:36:28,680 --> 01:36:32,310 Ideally again I should find some references showing 1788 01:36:32,310 --> 01:36:34,620 that this is a notable work by him, 1789 01:36:34,620 --> 01:36:37,000 but we won't spend time on that right now. 1790 01:36:37,000 --> 01:36:39,010 But the point is we created a new item. 1791 01:36:39,010 --> 01:36:40,410 We populated it a little bit. 1792 01:36:40,410 --> 01:36:44,400 We linked to it so that it's more discoverable by mentioning 1793 01:36:44,400 --> 01:36:47,760 it in the author name, and of course the book item 1794 01:36:47,760 --> 01:36:50,710 itself mentions the author and links to the author. 1795 01:36:50,710 --> 01:36:52,770 So that's all good. 1796 01:36:52,770 --> 01:36:57,780 One last thing we shall do is give it some useful identifier 1797 01:36:57,780 --> 01:37:02,880 so let's add, say, the Library of Congress record 1798 01:37:02,880 --> 01:37:03,940 for this book. 1799 01:37:03,940 --> 01:37:04,440 OK. 1800 01:37:04,440 --> 01:37:07,710 So I have prepared this in advance. 1801 01:37:07,710 --> 01:37:08,760 Ooh. 1802 01:37:08,760 --> 01:37:12,720 Just in time, with 80 seconds to go before it's giving up on me. 1803 01:37:12,720 --> 01:37:14,310 Oh it has already given up on me. 1804 01:37:14,310 --> 01:37:15,490 That is very unfortunate. 1805 01:37:23,300 --> 01:37:29,110 So I go to the Library of Congress and I find this book. 1806 01:37:29,110 --> 01:37:33,050 I find this entry, right? 1807 01:37:33,050 --> 01:37:37,320 In the Library of Congress database about this book. 1808 01:37:37,320 --> 01:37:39,120 And it has a permalink. 
1809 01:37:39,120 --> 01:37:42,570 It has a kind of guaranteed to be permanent link. 1810 01:37:42,570 --> 01:37:47,950 I can just copy that link, go back to my little book, 1811 01:37:47,950 --> 01:37:55,770 and say the Library of Congress. 1812 01:37:55,770 --> 01:38:01,070 Yeah, LCCN, that's what they call their IDs, the call 1813 01:38:01,070 --> 01:38:02,120 number. 1814 01:38:02,120 --> 01:38:06,502 And I paste it here. 1815 01:38:06,502 --> 01:38:08,210 I actually don't need the URL. 1816 01:38:08,210 --> 01:38:09,136 I need just a number. 1817 01:38:12,440 --> 01:38:13,520 And there we go. 1818 01:38:13,520 --> 01:38:16,550 I have added it, and now Wikidata 1819 01:38:16,550 --> 01:38:20,630 knows how to find bibliographic information about this book. 1820 01:38:20,630 --> 01:38:24,710 And any re-user of Wikidata, some program, 1821 01:38:24,710 --> 01:38:28,950 some tool that connects books to authors 1822 01:38:28,950 --> 01:38:32,870 or does statistical analysis or whatever, some future yet to be 1823 01:38:32,870 --> 01:38:35,090 imagined tool could automatically 1824 01:38:35,090 --> 01:38:39,170 find additional metadata on the Library of Congress site thanks 1825 01:38:39,170 --> 01:38:41,840 to this connection that I just made. 1826 01:38:41,840 --> 01:38:44,150 And of course I could add many other IDs 1827 01:38:44,150 --> 01:38:46,460 to other catalogs around the world, 1828 01:38:46,460 --> 01:38:48,150 and we won't do that right now. 1829 01:38:48,150 --> 01:38:51,840 You can see that it's now showing up under identifiers. 1830 01:38:51,840 --> 01:38:56,330 So this is how we created a brand new piece of data. 1831 01:38:56,330 --> 01:38:59,632 Questions about this, about creating new items? 1832 01:39:18,100 --> 01:39:19,180 Yeah, all right. 1833 01:39:19,180 --> 01:39:25,510 So we've seen how to contribute to Wikidata on our own, 1834 01:39:25,510 --> 01:39:26,350 kind of through-- 1835 01:39:26,350 --> 01:39:27,840 directly through Wikidata. 
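As a rough recap of the item just built, the sketch below models it as a plain Python dictionary. It is only an illustration, not Wikidata's actual storage format: the field names, the `build_book_item` helper, the `lccn_from_permalink` helper, and the permalink (including its made-up number and assumed URL shape) are all assumptions introduced here. It shows the two ideas from the demo: a translator statement that carries a language qualifier, and an identifier stored as a bare number rather than a full URL.

```python
from urllib.parse import urlparse


def lccn_from_permalink(url: str) -> str:
    """Return the last path segment of a permalink, assumed to be the bare ID."""
    return urlparse(url).path.rstrip("/").split("/")[-1]


def build_book_item(permalink: str) -> dict:
    """Assemble a simplified, illustrative record for the book item."""
    return {
        "label": "Circus Bulgaria",
        "description": "short story collection",
        "statements": [
            {"property": "author", "value": "Deyan Enev"},
            {
                "property": "translator",
                "value": "Kapka Kassabova",
                # The qualifier narrows the claim: translator *into English*.
                "qualifiers": [{"property": "language", "value": "English"}],
            },
        ],
        # Only the number is kept, not the whole URL.
        "identifiers": {"LCCN": lccn_from_permalink(permalink)},
    }


# Hypothetical permalink; the number is made up for illustration.
item = build_book_item("https://lccn.loc.gov/0000000000")
print(item["identifiers"]["LCCN"])
```

Keeping the qualifier next to the value is what lets a second translator, say into Polish, sit alongside the first without contradiction.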
1836 01:39:30,680 --> 01:39:35,220 Now you may be thinking, but Asaf, this 1837 01:39:35,220 --> 01:39:39,880 sounds like a ton of work, recording 1838 01:39:39,880 --> 01:39:44,500 all of these little tiny bits of information about every person 1839 01:39:44,500 --> 01:39:47,410 and every book and every town. 1840 01:39:47,410 --> 01:39:50,520 And if you think that, you would be correct. 1841 01:39:50,520 --> 01:39:52,730 That is a ton of work. 1842 01:39:52,730 --> 01:39:54,600 It's a lot of work. 1843 01:39:54,600 --> 01:39:59,930 However, it is centralized, so it is reusable on other wikis 1844 01:39:59,930 --> 01:40:03,860 and we will show in just a moment how we pull information 1845 01:40:03,860 --> 01:40:07,296 from Wikidata into Wikipedia or other projects. 1846 01:40:10,860 --> 01:40:13,780 We will show that in just a moment. 1847 01:40:13,780 --> 01:40:18,660 But here's an awesome little game 1848 01:40:18,660 --> 01:40:23,205 that a Wikidata volunteer, Magnus Manske, 1849 01:40:23,205 --> 01:40:30,900 has authored called the Wikidata game, in which he 1850 01:40:30,900 --> 01:40:31,920 tricks people-- 1851 01:40:31,920 --> 01:40:35,730 sorry, helps people make contributions 1852 01:40:35,730 --> 01:40:41,500 to Wikidata in a very, very easy and pleasant way. 1853 01:40:41,500 --> 01:40:44,410 Let's look at the Wikidata game. 1854 01:40:44,410 --> 01:40:47,840 So the first thing you need to do in that Wikidata game 1855 01:40:47,840 --> 01:40:50,660 is to log in, because the Wikidata 1856 01:40:50,660 --> 01:40:53,150 game makes edits in your name. 1857 01:40:53,150 --> 01:40:54,980 So we need to authorize it. 1858 01:40:54,980 --> 01:40:57,250 It's perfectly safe. 1859 01:40:57,250 --> 01:41:01,090 And after you do that you can go to the Wikidata game. 1860 01:41:01,090 --> 01:41:02,020 So this is the game. 1861 01:41:02,020 --> 01:41:03,520 Now I'm logged in.
1862 01:41:03,520 --> 01:41:05,230 And the Wikidata game actually includes 1863 01:41:05,230 --> 01:41:06,970 a number of different games. 1864 01:41:06,970 --> 01:41:09,310 Let's start with the person game. 1865 01:41:09,310 --> 01:41:14,170 So Wikidata shows you-- 1866 01:41:14,170 --> 01:41:20,800 shows you an item, and asks you a very simple question. 1867 01:41:20,800 --> 01:41:23,200 Person, or not a person? 1868 01:41:26,410 --> 01:41:30,550 So Wikidata goes through Wikidata entities 1869 01:41:30,550 --> 01:41:35,540 that don't even have the instance of property. 1870 01:41:35,540 --> 01:41:37,520 Which is why Wikidata doesn't know, 1871 01:41:37,520 --> 01:41:41,120 literally doesn't know, if this is a person, or a mountain, 1872 01:41:41,120 --> 01:41:44,390 or a city, or a country, or anything else. 1873 01:41:44,390 --> 01:41:47,150 So it asks you, because this is the kind of question that 1874 01:41:47,150 --> 01:41:50,300 Wikidata cannot decide on its own, 1875 01:41:50,300 --> 01:41:54,800 but for us humans it's generally trivial to be able to say 1876 01:41:54,800 --> 01:41:58,220 whether something that we're looking at is a person or not. 1877 01:41:58,220 --> 01:42:03,590 It gets slightly trickier when the information is in Javanese, 1878 01:42:03,590 --> 01:42:06,470 as it is here, rather than English. 1879 01:42:06,470 --> 01:42:10,010 So this item happens to be described in Javanese. 1880 01:42:10,010 --> 01:42:14,360 My Javanese, spoken in Indonesia, is very weak. 1881 01:42:14,360 --> 01:42:19,620 However, I can tell that this is not a person. 1882 01:42:19,620 --> 01:42:20,730 How can I tell? 1883 01:42:20,730 --> 01:42:23,220 Without understanding a word of Javanese 1884 01:42:23,220 --> 01:42:25,950 I see that it mentions 1000 kilometers 1885 01:42:25,950 --> 01:42:28,860 and square kilometers, see?
1886 01:42:28,860 --> 01:42:32,520 So this is about a place, or an area, 1887 01:42:32,520 --> 01:42:36,090 or a region, or whatever, but not a person. 1888 01:42:36,090 --> 01:42:39,060 So this is an example of how even 1889 01:42:39,060 --> 01:42:41,100 without understanding language you can sometimes 1890 01:42:41,100 --> 01:42:42,400 make a determination. 1891 01:42:42,400 --> 01:42:45,030 However, of course, you should be sure. 1892 01:42:45,030 --> 01:42:47,700 This is definitely not what the Wikipedia article 1893 01:42:47,700 --> 01:42:49,150 about a person looks like. 1894 01:42:49,150 --> 01:42:50,430 So this is not a person. 1895 01:42:50,430 --> 01:42:52,780 I just click it and I'm shown the next item. 1896 01:42:56,600 --> 01:42:59,660 This item is in another language I do not speak, 1897 01:42:59,660 --> 01:43:00,950 and I just don't know. 1898 01:43:00,950 --> 01:43:03,740 I do not know if this is about a person or not. 1899 01:43:03,740 --> 01:43:07,350 So I click Not Sure. 1900 01:43:07,350 --> 01:43:11,190 This is in Swedish, and it's about Sulawesi, still 1901 01:43:11,190 --> 01:43:13,770 Indonesia. 1902 01:43:13,770 --> 01:43:16,530 And it is not about a person. 1903 01:43:16,530 --> 01:43:18,150 I have enough Swedish for that. 1904 01:43:18,150 --> 01:43:21,750 So I click not a person. 1905 01:43:21,750 --> 01:43:24,420 Now, you may say, well, do I really 1906 01:43:24,420 --> 01:43:28,350 have to deal with all these languages that I don't speak? 1907 01:43:28,350 --> 01:43:29,190 The answer is no. 1908 01:43:29,190 --> 01:43:30,630 You don't have to. 1909 01:43:30,630 --> 01:43:32,580 Here at the bottom of the Wikidata game 1910 01:43:32,580 --> 01:43:33,840 there are settings. 1911 01:43:33,840 --> 01:43:38,270 You can click that and tell Wikidata, 1912 01:43:38,270 --> 01:43:41,840 I cannot even read Chinese or Japanese, 1913 01:43:41,840 --> 01:43:44,600 so please don't show me items in those languages. 
1914 01:43:44,600 --> 01:43:47,060 Because I wouldn't even be able to guess. 1915 01:43:47,060 --> 01:43:50,000 I prefer these languages in which I can relatively easily 1916 01:43:50,000 --> 01:43:51,380 make determinations. 1917 01:43:51,380 --> 01:43:54,601 And I can even tell Wikidata to only show me these languages. 1918 01:43:54,601 --> 01:43:55,100 You see? 1919 01:43:55,100 --> 01:43:57,350 This was not selected, which is why I 1920 01:43:57,350 --> 01:44:00,600 was shown some other languages. 1921 01:44:00,600 --> 01:44:04,240 I could say, only use these languages, and save. 1922 01:44:04,240 --> 01:44:06,100 And now I can try this game again. 1923 01:44:06,100 --> 01:44:07,980 However, that can slow it down a little. 1924 01:44:07,980 --> 01:44:09,000 So here we go. 1925 01:44:09,000 --> 01:44:11,640 Here's a Spanish-- which is one of the languages I 1926 01:44:11,640 --> 01:44:14,640 told the Wikidata game it can use. 1927 01:44:14,640 --> 01:44:16,480 This is a Spanish item. 1928 01:44:16,480 --> 01:44:19,265 Now is it about a person or not? 1929 01:44:22,120 --> 01:44:23,230 It is not about a person. 1930 01:44:25,906 --> 01:44:26,780 Is it about a person? 1931 01:44:29,155 --> 01:44:29,655 No. 1932 01:44:32,900 --> 01:44:35,180 Yes, it is, right? 1933 01:44:35,180 --> 01:44:38,550 Monk Cistercian, Pedro de Oviedo Falconi. 1934 01:44:38,550 --> 01:44:40,890 That sounds like a person. 1935 01:44:40,890 --> 01:44:42,680 Frau Pedro Nasser. 1936 01:44:42,680 --> 01:44:44,960 Yeah, he was born in Madrid in 1577. 1937 01:44:44,960 --> 01:44:46,280 This is a person. 1938 01:44:46,280 --> 01:44:47,060 OK. 1939 01:44:47,060 --> 01:44:49,730 So I click person. 1940 01:44:49,730 --> 01:44:52,100 Again, if you're not sure, click not sure.
1941 01:44:52,100 --> 01:44:55,100 The point is, just by clicking person-- and as you can see 1942 01:44:55,100 --> 01:44:57,780 this would work very well on mobile, 1943 01:44:57,780 --> 01:45:01,430 which is why I said you can contribute on your commute. 1944 01:45:01,430 --> 01:45:04,100 You can just hold your phone or tablet or whatever, 1945 01:45:04,100 --> 01:45:05,840 and just tap. 1946 01:45:05,840 --> 01:45:07,040 Person, not a person. 1947 01:45:07,040 --> 01:45:08,900 Person, not a person. 1948 01:45:08,900 --> 01:45:12,500 The amazing thing is that just tapping person has actually 1949 01:45:12,500 --> 01:45:15,830 made an edit to Wikidata on my behalf, which 1950 01:45:15,830 --> 01:45:21,560 I can find out, like on every wiki, by clicking contributions. 1951 01:45:21,560 --> 01:45:24,200 And as you can see in addition to the stuff about Circus 1952 01:45:24,200 --> 01:45:28,340 Bulgaria, my latest edit is in fact about this Pedro de Oviedo 1953 01:45:28,340 --> 01:45:30,130 Falconi person. 1954 01:45:30,130 --> 01:45:32,000 And the edit was, you can-- 1955 01:45:32,000 --> 01:45:38,030 I hope you can see this, created the claim instance of human. 1956 01:45:38,030 --> 01:45:39,110 So I added-- 1957 01:45:39,110 --> 01:45:43,100 I mean Wikidata game added for me the statement 1958 01:45:43,100 --> 01:45:44,180 instance of human. 1959 01:45:44,180 --> 01:45:47,780 Now, the awesome thing is that it was super easy to do. 1960 01:45:47,780 --> 01:45:51,890 I didn't have to go into that entity, click the Add button, 1961 01:45:51,890 --> 01:45:57,080 choose the instance of property, choose human, hit Save. 1962 01:45:57,080 --> 01:45:59,210 Instead of all these operations I just 1963 01:45:59,210 --> 01:46:04,250 tapped on my screen, person, not a person. 1964 01:46:04,250 --> 01:46:10,280 And I can do hundreds of edits during my daily commute. 1965 01:46:10,280 --> 01:46:12,410 There are other games, like the gender game.
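To make the "created the claim instance of human" edit a little more concrete, here is a hedged sketch of the request parameters a tool could send to the Wikibase `wbcreateclaim` API action on the user's behalf. How the Wikidata game itself is implemented is not shown in the talk, so this is an assumption about the general shape of such an edit; the item ID and token below are placeholders, and nothing is actually sent over the network.

```python
import json


def instance_of_human_params(item_id: str, csrf_token: str) -> dict:
    """Build parameters for adding the statement 'instance of: human'."""
    return {
        "action": "wbcreateclaim",
        "entity": item_id,
        "property": "P31",  # P31 is "instance of"
        "snaktype": "value",
        # Q5 is "human"; the value is passed as a JSON string.
        "value": json.dumps({"entity-type": "item", "numeric-id": 5}),
        "token": csrf_token,
        "format": "json",
    }


# Placeholder item ID and token, for illustration only.
params = instance_of_human_params("Q00000000", "PLACEHOLDER-TOKEN")
print(params["property"], params["value"])
```

One tap in the game stands in for all of this: picking the item, the property, the value, and saving.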
1966 01:46:12,410 --> 01:46:14,810 So this is about-- 1967 01:46:14,810 --> 01:46:17,240 this is when Wikidata already knows 1968 01:46:17,240 --> 01:46:19,760 that this item is a person, but it doesn't 1969 01:46:19,760 --> 01:46:21,710 know the gender of this person. 1970 01:46:21,710 --> 01:46:25,340 Which is another one of the more basic items. 1971 01:46:25,340 --> 01:46:27,770 And this is taking a long time because of the language 1972 01:46:27,770 --> 01:46:29,870 limitations that I set on it. 1973 01:46:29,870 --> 01:46:32,660 I guess the less exotic languages have already 1974 01:46:32,660 --> 01:46:35,130 been exhausted in the game. 1975 01:46:35,130 --> 01:46:36,880 We don't have to wait all this time. 1976 01:46:40,280 --> 01:46:44,970 We can try something else. 1977 01:46:44,970 --> 01:46:45,950 How about occupation? 1978 01:46:45,950 --> 01:46:46,850 The occupation game. 1979 01:46:46,850 --> 01:46:49,400 Here we go, this is in Russian. 1980 01:46:49,400 --> 01:46:55,540 And what is the occupation of this gentleman? 1981 01:46:55,540 --> 01:46:58,630 Well he is an [INAUDIBLE]. 1982 01:46:58,630 --> 01:47:00,700 He's a church person. 1983 01:47:00,700 --> 01:47:04,300 However, so the occupation game is 1984 01:47:04,300 --> 01:47:06,490 where Wikidata game will automatically 1985 01:47:06,490 --> 01:47:10,990 pull likely occupations from the article text 1986 01:47:10,990 --> 01:47:13,810 and ask for confirmation. 1987 01:47:13,810 --> 01:47:16,840 So if he-- if this person really is a deacon, 1988 01:47:16,840 --> 01:47:17,770 I should click that. 1989 01:47:17,770 --> 01:47:19,990 But I'm not sure. 1990 01:47:19,990 --> 01:47:24,950 I'm not clear on the Russian church's distinctions between-- 1991 01:47:24,950 --> 01:47:26,620 I mean [INAUDIBLE] is pretty senior, 1992 01:47:26,620 --> 01:47:28,690 but I don't know if that automatically also means 1993 01:47:28,690 --> 01:47:30,100 he's a deacon or not. 
1994 01:47:30,100 --> 01:47:32,720 And [INAUDIBLE] is not listed here. 1995 01:47:32,720 --> 01:47:36,380 So I will click not listed. 1996 01:47:36,380 --> 01:47:39,540 Also, these guesses are not always correct. 1997 01:47:39,540 --> 01:47:42,680 So, this guy for example, is in Russian. 1998 01:47:42,680 --> 01:47:43,430 I can read this. 1999 01:47:43,430 --> 01:47:44,470 He's a philologist. 2000 01:47:44,470 --> 01:47:45,380 He's a linguist. 2001 01:47:45,380 --> 01:47:48,510 So I can confirm it and click linguist. 2002 01:47:48,510 --> 01:47:49,010 All right? 2003 01:47:49,010 --> 01:47:51,950 And again, if we look at my contributions 2004 01:47:51,950 --> 01:47:55,700 we can see the Wikidata game on my behalf 2005 01:47:55,700 --> 01:47:59,930 created occupation linguist. 2006 01:47:59,930 --> 01:48:02,450 OK. 2007 01:48:02,450 --> 01:48:04,370 Just by typing linguist there. 2008 01:48:04,370 --> 01:48:07,040 Now if it's taken from the article, 2009 01:48:07,040 --> 01:48:09,860 why would it ever be wrong? 2010 01:48:09,860 --> 01:48:15,970 Well Jesus was the son of a carpenter. 2011 01:48:15,970 --> 01:48:18,870 The word carpenter appears in the text. 2012 01:48:18,870 --> 01:48:22,840 That doesn't mean it's correct to say Jesus was a carpenter. 2013 01:48:22,840 --> 01:48:23,340 OK? 2014 01:48:23,340 --> 01:48:24,660 Just a trivial example, right? 2015 01:48:24,660 --> 01:48:30,250 So many, many articles will say, you know, born to a physician. 2016 01:48:30,250 --> 01:48:32,850 And so the word physician could be guessed, 2017 01:48:32,850 --> 01:48:36,030 but it wouldn't be correct unless the son is also 2018 01:48:36,030 --> 01:48:38,090 a physician. 2019 01:48:38,090 --> 01:48:43,540 So I hope it gives you the gist of it. 2020 01:48:43,540 --> 01:48:47,500 There is also a distributed Wikidata game, 2021 01:48:47,500 --> 01:48:48,774 which is pretty awesome. 2022 01:48:51,450 --> 01:48:54,320 Here we go, which has additional games. 
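The carpenter example can be sketched in a few lines. This is a deliberately naive keyword matcher, written here only to illustrate why such guesses need human confirmation; it is an assumption and not how the occupation game actually extracts its suggestions. The occupation list and the function name are made up for the example.

```python
import re

# A tiny, made-up vocabulary of occupations to look for.
KNOWN_OCCUPATIONS = ["carpenter", "physician", "linguist", "deacon"]


def candidate_occupations(article_text: str) -> list[str]:
    """Return known occupation words that occur anywhere in the text."""
    words = set(re.findall(r"[a-z]+", article_text.lower()))
    return [occ for occ in KNOWN_OCCUPATIONS if occ in words]


# The word appears, but it describes the father, not the subject.
print(candidate_occupations("Jesus was the son of a carpenter."))
# Two candidates: one belongs to the parent, one to the subject.
print(candidate_occupations("He was born to a physician and became a linguist."))
```

The matcher cannot tell whose occupation a word describes, which is exactly the decision left to the person playing.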
2023 01:48:54,320 --> 01:49:02,610 So, for example, the Kian game gives you, 2024 01:49:02,610 --> 01:49:06,940 maybe it gives you, some items to play with. 2025 01:49:16,610 --> 01:49:17,110 Yes? 2026 01:49:17,110 --> 01:49:17,610 No? 2027 01:49:17,610 --> 01:49:18,430 OK. 2028 01:49:18,430 --> 01:49:20,830 So it gives you this little card, 2029 01:49:20,830 --> 01:49:27,940 and asks you to confirm: is this an instance of human settlement? 2030 01:49:27,940 --> 01:49:30,480 That is, is it a village, town, city, whatever. 2031 01:49:30,480 --> 01:49:33,310 Is it a kind of human settlement or not? 2032 01:49:33,310 --> 01:49:34,340 Or maybe it's a book. 2033 01:49:34,340 --> 01:49:35,540 Maybe it's a poem. 2034 01:49:35,540 --> 01:49:38,980 Again, so, is it a human settlement? 2035 01:49:38,980 --> 01:49:41,500 And you can click the languages here to see the information. 2036 01:49:41,500 --> 01:49:43,270 So I can click English. 2037 01:49:43,270 --> 01:49:44,572 And indeed the article-- 2038 01:49:44,572 --> 01:49:46,030 I mean the actual Wikipedia article 2039 01:49:46,030 --> 01:49:49,360 says Camigji is a town and territory 2040 01:49:49,360 --> 01:49:51,370 in this district in the Congo. 2041 01:49:51,370 --> 01:49:54,640 So yes, this is an instance of human settlement. 2042 01:49:54,640 --> 01:49:57,580 So I clicked yes. 2043 01:49:57,580 --> 01:50:00,460 And just clicking yes again went to that item, 2044 01:50:00,460 --> 01:50:02,740 and added property of human settlement. 2045 01:50:02,740 --> 01:50:05,560 Now the point of all these games is 2046 01:50:05,560 --> 01:50:08,140 these are tools, written by programmers, 2047 01:50:08,140 --> 01:50:12,490 making kind of semi-educated guesses about these fairly 2048 01:50:12,490 --> 01:50:14,120 basic properties. 2049 01:50:14,120 --> 01:50:17,770 And they are meant to semi-automate, to assist, 2050 01:50:17,770 --> 01:50:23,730 in the accumulation of all these important pieces of data.
2051 01:50:23,730 --> 01:50:26,640 Now every single click here helps 2052 01:50:26,640 --> 01:50:31,000 Wikidata give better results, richer results 2053 01:50:31,000 --> 01:50:32,380 in future queries. 2054 01:50:32,380 --> 01:50:38,130 Again, as of right now Wikidata can include Camigji 2055 01:50:38,130 --> 01:50:42,690 if I ask it, you know, what are some towns in Congo? 2056 01:50:42,690 --> 01:50:44,220 Until now it could not. 2057 01:50:44,220 --> 01:50:46,830 Because it literally didn't know. 2058 01:50:46,830 --> 01:50:51,950 So every time we click male, female, person, not a person, 2059 01:50:51,950 --> 01:50:56,640 make these decisions, we help improve Wikidata 2060 01:50:56,640 --> 01:51:01,560 and enrich the results that we could receive. 2061 01:51:01,560 --> 01:51:04,590 Any questions about this, about kind of micro contributions 2062 01:51:04,590 --> 01:51:07,010 through the Wikidata game? 2063 01:51:07,010 --> 01:51:09,890 If that looks appealing I encourage 2064 01:51:09,890 --> 01:51:12,860 you to go and visit the Wikidata game 2065 01:51:12,860 --> 01:51:15,205 and start contributing in that way. 2066 01:51:19,580 --> 01:51:21,650 There is a question here. 2067 01:51:21,650 --> 01:51:24,650 If I make an article about Circus Bulgaria how should 2068 01:51:24,650 --> 01:51:26,630 I correctly connect them? 2069 01:51:26,630 --> 01:51:28,740 That is an excellent question. 2070 01:51:28,740 --> 01:51:33,090 So once-- so now there is a Wikidata item about that book, 2071 01:51:33,090 --> 01:51:37,650 but there is no Wikipedia article anywhere. 2072 01:51:37,650 --> 01:51:41,460 Now suppose I write one in, Bulgarian maybe, 2073 01:51:41,460 --> 01:51:42,870 you go to Wikidata. 2074 01:51:42,870 --> 01:51:45,180 You find the item by searching. 2075 01:51:45,180 --> 01:51:49,170 You find the item, and then the empty site links section 2076 01:51:49,170 --> 01:51:50,850 right at the bottom there-- 2077 01:51:50,850 --> 01:51:52,020 where are we? 
2078 01:51:52,020 --> 01:51:53,100 We have this? 2079 01:51:53,100 --> 01:51:55,050 Circus Bulgaria. 2080 01:51:55,050 --> 01:51:56,010 Let's demonstrate this. 2081 01:51:56,010 --> 01:51:58,000 So here is the item about the book. 2082 01:51:58,000 --> 01:52:01,030 Let's say that now there is an article 2083 01:52:01,030 --> 01:52:03,670 because I just created it. 2084 01:52:03,670 --> 01:52:07,450 I can go here to the empty Wikipedia link section, 2085 01:52:07,450 --> 01:52:11,760 click Edit, type the name of the wiki, 2086 01:52:11,760 --> 01:52:16,430 let's say English, and then type the name of the page 2087 01:52:16,430 --> 01:52:18,230 that I just created. 2088 01:52:18,230 --> 01:52:20,790 Circus-- right? 2089 01:52:20,790 --> 01:52:23,400 And again, it offers me auto-complete 2090 01:52:23,400 --> 01:52:25,080 for my convenience. 2091 01:52:25,080 --> 01:52:28,260 Now we don't actually have the article created, 2092 01:52:28,260 --> 01:52:30,480 but I could let's just say this was the article. 2093 01:52:30,480 --> 01:52:33,330 I can just click this, hit Save, and that 2094 01:52:33,330 --> 01:52:36,450 would associate the new Wikipedia article 2095 01:52:36,450 --> 01:52:38,130 with this Wikidata item. 2096 01:52:38,130 --> 01:52:41,940 That is the beginning of the inter-wiki list for this item. 2097 01:52:41,940 --> 01:52:43,620 I will not click Save Now, because we 2098 01:52:43,620 --> 01:52:45,289 didn't have the article yet. 2099 01:52:45,289 --> 01:52:46,830 So I hope that answers that question. 2100 01:52:46,830 --> 01:52:50,340 Was there another question that I missed here? 2101 01:52:50,340 --> 01:52:51,450 No. 2102 01:52:51,450 --> 01:52:53,170 OK. 2103 01:52:53,170 --> 01:52:55,300 Any questions about the Wikidata game? 2104 01:52:55,300 --> 01:53:00,740 About this idea of micro contributions? 
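The connection between an item and its Wikipedia articles can be pictured as a small mapping from wiki to page title. The sketch below is a simplified model with a hypothetical `add_sitelink` helper; it does not talk to Wikidata. The site identifiers `enwiki` and `bgwiki` follow the usual naming for the English and Bulgarian Wikipedias, and the Bulgarian title is a placeholder.

```python
def add_sitelink(item: dict, site: str, title: str) -> dict:
    """Return a copy of the item with one sitelink added or replaced."""
    sitelinks = dict(item.get("sitelinks", {}))
    sitelinks[site] = {"site": site, "title": title}
    return {**item, "sitelinks": sitelinks}


# The new book item starts with an empty sitelinks section.
book = {"label": "Circus Bulgaria", "sitelinks": {}}

# Linking a (hypothetical) English article, then a Bulgarian one.
book = add_sitelink(book, "enwiki", "Circus Bulgaria")
book = add_sitelink(book, "bgwiki", "Hypothetical Bulgarian title")

print(sorted(book["sitelinks"]))
```

Each wiki appears at most once per item, which is why the mapping is keyed by site: the set of entries is the inter-wiki list for that item.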
2105 01:53:00,740 --> 01:53:05,330 If not then we can move on to embedding data, 2106 01:53:05,330 --> 01:53:07,490 and after that we can discuss queries, 2107 01:53:07,490 --> 01:53:12,000 how to get at all this data from Wikidata. 2108 01:53:12,000 --> 01:53:16,500 So the short version of how to embed data from Wikidata 2109 01:53:16,500 --> 01:53:19,920 is that there is this little magic incantation. 2110 01:53:19,920 --> 01:53:25,410 Curly brace, curly brace, hash mark, property. 2111 01:53:25,410 --> 01:53:29,820 It looks like a template, but it isn't because of that hash. 2112 01:53:29,820 --> 01:53:31,320 And that is magic. 2113 01:53:31,320 --> 01:53:34,170 Take a look at this little demo that I prepared. 2114 01:53:34,170 --> 01:53:37,950 This page, which is off my user page on meta, 2115 01:53:37,950 --> 01:53:40,110 but it could be on any wiki. 2116 01:53:40,110 --> 01:53:42,490 OK. 2117 01:53:42,490 --> 01:53:49,420 Says, since San Francisco is item Q62 in Wikidata, 2118 01:53:49,420 --> 01:53:55,240 and since population is property P1082, I can tell you 2119 01:53:55,240 --> 01:53:58,840 that according to Wikidata the population of San Francisco 2120 01:53:58,840 --> 01:54:02,180 is this. 2121 01:54:02,180 --> 01:54:08,420 And this bolded number here was produced with this incantation. 2122 01:54:08,420 --> 01:54:14,420 Curly brace, curly brace, hash mark, property P1082, 2123 01:54:14,420 --> 01:54:18,751 that's population, type from what item? 2124 01:54:18,751 --> 01:54:19,250 Right? 2125 01:54:19,250 --> 01:54:21,650 Cause I'm pulling an arbitrary number. 2126 01:54:21,650 --> 01:54:23,570 I could put any property in any item 2127 01:54:23,570 --> 01:54:27,020 here, and kind of include it, embedded, into my text. 2128 01:54:27,020 --> 01:54:29,630 This isn't even about-- you notice this is my user page. 2129 01:54:29,630 --> 01:54:32,480 This isn't even the article about San Francisco. 
2130 01:54:32,480 --> 01:54:35,210 I just want to pull that number into this thing 2131 01:54:35,210 --> 01:54:36,410 that I'm writing. 2132 01:54:36,410 --> 01:54:38,820 So it's fairly simple. 2133 01:54:38,820 --> 01:54:40,970 I identify the property. 2134 01:54:40,970 --> 01:54:43,440 I identify the item to take it from. 2135 01:54:43,440 --> 01:54:47,120 And Wikidata will, I mean Wikipedia, 2136 01:54:47,120 --> 01:54:50,480 or the wiki I'm on, in this case meta, will go to Wikidata 2137 01:54:50,480 --> 01:54:52,820 and fetch it for me. 2138 01:54:52,820 --> 01:54:56,480 Likewise, since Denny Vrandecic, the designer of Wikidata, 2139 01:54:56,480 --> 01:55:01,370 is item 18618629, right? 2140 01:55:01,370 --> 01:55:04,790 I mean, he's a notable person, so he has a Wikidata entity. 2141 01:55:04,790 --> 01:55:09,160 And since occupation is property 106, and date of birth is 569, 2142 01:55:09,160 --> 01:55:12,290 and place of birth is 19, because 2143 01:55:12,290 --> 01:55:14,720 of all that I can tell you that Vrandecic was born 2144 01:55:14,720 --> 01:55:19,130 in Stuttgart, on this date, and is a researcher, programmer, 2145 01:55:19,130 --> 01:55:20,850 and computer scientist. 2146 01:55:20,850 --> 01:55:25,010 If you look at the source for this page, click Edit Source, 2147 01:55:25,010 --> 01:55:28,700 you can see that the word Stuttgart does not appear here, 2148 01:55:28,700 --> 01:55:30,530 because it came from Wikidata. 2149 01:55:30,530 --> 01:55:34,171 I did not write this into my little demo page here. 2150 01:55:34,171 --> 01:55:34,670 See? 2151 01:55:34,670 --> 01:55:37,380 Place of birth is-- 2152 01:55:37,380 --> 01:55:37,880 where is it? 2153 01:55:37,880 --> 01:55:38,380 Here. 2154 01:55:38,380 --> 01:55:43,790 Born in property 19 from Q number so-and-so. 2155 01:55:43,790 --> 01:55:46,970 That is how easy it is to pull stuff 2156 01:55:46,970 --> 01:55:51,890 into a wiki from Wikidata.
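Spelled out, the incantation is a short piece of wikitext. The sketch below simply assembles that text in Python so the pattern is easy to see; the `property_call` helper is hypothetical, while the property and item numbers are the ones named in the demo (P1082 for population, Q62 for San Francisco, P19 for place of birth, Q18618629 for Denny Vrandecic).

```python
def property_call(property_id: str, item_id: str) -> str:
    """Return the wikitext that embeds one property of one item."""
    return "{{#property:" + property_id + "|from=" + item_id + "}}"


# Population of San Francisco, as in the demo page.
population = property_call("P1082", "Q62")
print(population)

# The same pattern inside a sentence of wikitext.
sentence = "Born in " + property_call("P19", "Q18618629") + "."
print(sentence)
```

The value itself never appears in the page source; only the pointer to the property and the item does, so the rendered text follows whatever Wikidata currently holds.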
2157 01:55:51,890 --> 01:55:55,280 OK, now there's some nuance to it. 2158 01:55:55,280 --> 01:55:57,470 And there are some additional parameters 2159 01:55:57,470 --> 01:55:58,130 you can give. 2160 01:55:58,130 --> 01:56:00,230 And you can ask Wikidata to give you 2161 01:56:00,230 --> 01:56:03,635 not just the text of the values, but actually make them links. 2162 01:56:06,750 --> 01:56:14,825 So, for example, if I change this from property to values-- 2163 01:56:25,950 --> 01:56:29,142 No, that did not work at all. 2164 01:56:29,142 --> 01:56:29,850 Wasn't it values? 2165 01:56:29,850 --> 01:56:30,350 What was it? 2166 01:56:33,370 --> 01:56:34,614 Values and then-- 2167 01:57:19,265 --> 01:57:19,890 Oh, statements. 2168 01:57:19,890 --> 01:57:20,710 My bad, sorry. 2169 01:57:20,710 --> 01:57:22,980 The magic word is statements. 2170 01:57:22,980 --> 01:57:24,010 Statements. 2171 01:57:24,010 --> 01:57:28,680 So going back here. 2172 01:57:28,680 --> 01:57:35,385 If I change the word property to the word statements 2173 01:57:35,385 --> 01:57:40,890 here then this same value-- 2174 01:57:40,890 --> 01:57:43,300 that did not work at all. 2175 01:57:43,300 --> 01:57:46,690 Oh, because I'm on meta. 2176 01:57:46,690 --> 01:57:48,670 So because I'm on meta, meta doesn't 2177 01:57:48,670 --> 01:57:52,230 have an article named researcher, programmer, 2178 01:57:52,230 --> 01:57:53,500 or computer scientist. 2179 01:57:53,500 --> 01:57:55,120 But Wikipedia does. 2180 01:57:55,120 --> 01:58:00,210 If I included this same syntax in Wikipedia, 2181 01:58:00,210 --> 01:58:02,950 like English Wikipedia, for example-- 2182 01:58:02,950 --> 01:58:04,855 So let's go there right now. 2183 01:58:11,240 --> 01:58:13,480 And go-- go to my-- 2184 01:58:18,550 --> 01:58:19,345 Go to my sandbox. 2185 01:58:23,090 --> 01:58:27,982 If I just brutally paste this on my sandbox here-- 2186 01:58:32,690 --> 01:58:35,810 So, see, these became links.
2187 01:58:35,810 --> 01:58:39,740 Because Wikipedia has an article called programmer and computer 2188 01:58:39,740 --> 01:58:40,910 scientist. 2189 01:58:40,910 --> 01:58:43,460 So, like I said, there's some additional nuance 2190 01:58:43,460 --> 01:58:44,840 to the embedding. 2191 01:58:44,840 --> 01:58:47,030 The important thing is that this is 2192 01:58:47,030 --> 01:58:51,470 the key to delivering on that first problem that I mentioned. 2193 01:58:51,470 --> 01:58:55,970 How to get data from a central location 2194 01:58:55,970 --> 01:58:58,850 onto your wiki in your language. 2195 01:58:58,850 --> 01:59:04,460 Basically using property and statements magic incantations. 2196 01:59:04,460 --> 01:59:07,100 And of course, usually, this would be 2197 01:59:07,100 --> 01:59:10,010 in the context of an info box. 2198 01:59:10,010 --> 01:59:14,180 Some wikis-- English Wikipedia is not leading the way there. 2199 01:59:14,180 --> 01:59:16,490 Some smaller wikis are more advanced 2200 01:59:16,490 --> 01:59:22,070 actually in integrating Wikidata embeddings like this 2201 01:59:22,070 --> 01:59:24,620 into their info boxes. 2202 01:59:24,620 --> 01:59:26,300 So that instead of the info box just 2203 01:59:26,300 --> 01:59:30,620 being a template on the wiki with field equals value, 2204 01:59:30,620 --> 01:59:31,685 field equals value. 2205 01:59:31,685 --> 01:59:35,700 That template of the info box on the wiki 2206 01:59:35,700 --> 01:59:40,160 pulls the values, the birthdate, the languages, et cetera, 2207 01:59:40,160 --> 01:59:44,210 pulls them from Wikidata. 2208 01:59:44,210 --> 01:59:49,820 So basically just-- I just demonstrated single calls 2209 01:59:49,820 --> 01:59:52,550 to this, but of course an info box template 2210 01:59:52,550 --> 01:59:56,270 would include maybe 20 or 40 such embeds, 2211 01:59:56,270 --> 01:59:57,710 and that is not a problem. 
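To picture an info box that pulls its values from Wikidata, the sketch below generates the kind of field lines such a template might contain, one `#statements` call per field. The field names and the `infobox_rows` helper are made up for illustration and do not correspond to any real wiki's template; the property numbers are the ones mentioned in the talk. Omitting the item here assumes the call is used on a page already connected to the item it describes.

```python
# Hypothetical field names mapped to the properties named in the talk.
FIELDS = {
    "birth_date": "P569",
    "birth_place": "P19",
    "occupation": "P106",
}


def infobox_rows(fields: dict) -> str:
    """Return one wikitext line per field, each pulling from Wikidata."""
    rows = []
    for name, property_id in fields.items():
        rows.append("| " + name + " = {{#statements:" + property_id + "}}")
    return "\n".join(rows)


print(infobox_rows(FIELDS))
```

A real info box would have many more rows, but each one follows the same pattern, which is why 20 or 40 such embeds in one template is unremarkable.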
2212 01:59:57,710 --> 02:00:01,460 Of course, before you go and edit the English Wikipedia's 2213 02:00:01,460 --> 02:00:06,050 info box person and replace it all with Wikidata embeds, 2214 02:00:06,050 --> 02:00:09,050 you should discuss it with the English Wikipedia community. 2215 02:00:09,050 --> 02:00:12,000 These discussions have already been taking place. 2216 02:00:12,000 --> 02:00:13,640 There are some concerns about how 2217 02:00:13,640 --> 02:00:17,150 to patrol this, how to keep it newbie friendly, et cetera. 2218 02:00:17,150 --> 02:00:20,690 So there are legitimate concerns with just moving everything 2219 02:00:20,690 --> 02:00:22,910 to be embedded from Wikidata. 2220 02:00:22,910 --> 02:00:26,450 But the communities are gradually handling this. 2221 02:00:26,450 --> 02:00:29,390 I mean this ability to embed from Wikidata is not very old. 2222 02:00:29,390 --> 02:00:31,550 It's been around for about a year. 2223 02:00:31,550 --> 02:00:35,150 So communities are still working on kind 2224 02:00:35,150 --> 02:00:37,560 of integrating that technology. 2225 02:00:37,560 --> 02:00:40,190 But that is kind of just the basics of how 2226 02:00:40,190 --> 02:00:44,210 to pull data, individual bits of data. That's not querying, 2227 02:00:44,210 --> 02:00:47,330 that's not asking those sweeping questions that I was talking 2228 02:00:47,330 --> 02:00:48,850 about yet. 2229 02:00:48,850 --> 02:00:50,720 We'll get to that. Right now this is 2230 02:00:50,720 --> 02:00:55,310 how to pull a specific datum, a specific piece of data, 2231 02:00:55,310 --> 02:00:57,395 from Wikidata. 2232 02:01:01,530 --> 02:01:02,530 OK. 2233 02:01:02,530 --> 02:01:07,080 So here's another quick thing to demonstrate 2234 02:01:07,080 --> 02:01:09,880 before we go to queries, and that 2235 02:01:09,880 --> 02:01:12,010 is the article placeholder.
2236 02:01:12,010 --> 02:01:15,010 The article placeholder is a feature 2237 02:01:15,010 --> 02:01:19,660 that is being tested on the Esperanto Wikipedia, and maybe 2238 02:01:19,660 --> 02:01:22,180 another wiki, I don't remember. 2239 02:01:22,180 --> 02:01:28,490 And it is using the potential of Wikidata 2240 02:01:28,490 --> 02:01:32,690 to offer a placeholder for an article. 2241 02:01:32,690 --> 02:01:37,940 An automatically generated Wikidata powered replacement 2242 02:01:37,940 --> 02:01:41,720 placeholder for an article for articles that don't yet 2243 02:01:41,720 --> 02:01:45,950 exist on Esperanto. 2244 02:01:45,950 --> 02:01:50,440 So let's go to the Esperanto Wikipedia. 2245 02:01:50,440 --> 02:01:52,440 I don't speak Esperanto. 2246 02:01:52,440 --> 02:01:56,760 But let's look for Helen Dewitt, our friend, 2247 02:01:56,760 --> 02:01:58,170 in Esperanto Wikipedia. 2248 02:01:58,170 --> 02:02:00,270 Now Esperanto is not one of the Wikipedias 2249 02:02:00,270 --> 02:02:03,060 that have an article about Helen Dewitt. 2250 02:02:03,060 --> 02:02:04,890 And so it tells me that, right? 2251 02:02:04,890 --> 02:02:06,570 There is no Helen Dewitt. 2252 02:02:06,570 --> 02:02:08,670 Maybe you were looking for Helena Dewitt. 2253 02:02:08,670 --> 02:02:10,200 No, I was not. 2254 02:02:10,200 --> 02:02:13,650 You can start an article about Helen Dewitt. 2255 02:02:13,650 --> 02:02:15,390 You can search. 2256 02:02:15,390 --> 02:02:17,820 You know, there's all this stuff. 2257 02:02:17,820 --> 02:02:24,180 But there is also this little option here, hiding, 2258 02:02:24,180 --> 02:02:30,640 which tells me that the Esperanto Wikipedia is-- 2259 02:02:30,640 --> 02:02:31,580 what's happening here? 2260 02:02:35,140 --> 02:02:35,890 Yes. 2261 02:02:35,890 --> 02:02:40,520 The Esperanto Wikipedia is ready to give me this page. 
2262 02:02:40,520 --> 02:02:44,020 This page, as you can see, it's on the Esperanto Wikipedia, 2263 02:02:44,020 --> 02:02:46,090 but it's not an article. 2264 02:02:46,090 --> 02:02:47,480 See, it's a special page. 2265 02:02:47,480 --> 02:02:49,700 It's machine generated. 2266 02:02:49,700 --> 02:02:52,150 You can see the URL as well. 2267 02:02:52,150 --> 02:02:54,410 It's not, you know, slash Helen Dewitt. 2268 02:02:54,410 --> 02:02:58,450 It's slash Specialaĵo, AboutTopic, 2269 02:02:58,450 --> 02:03:01,570 and then the Wikidata ID of Helen Dewitt. 2270 02:03:01,570 --> 02:03:03,760 And what I get here-- 2271 02:03:03,760 --> 02:03:05,860 I get an English description, by the way, 2272 02:03:05,860 --> 02:03:08,300 because there is no Esperanto description. 2273 02:03:08,300 --> 02:03:10,420 Wikidata can't make it up. 2274 02:03:10,420 --> 02:03:13,600 But what it can do is offer me these pieces 2275 02:03:13,600 --> 02:03:16,960 of data in my language, in this case Esperanto. 2276 02:03:16,960 --> 02:03:18,921 I'm on the Esperanto Wikipedia. 2277 02:03:18,921 --> 02:03:19,420 OK. 2278 02:03:19,420 --> 02:03:23,380 So it tells me that she's American, for example, 2279 02:03:23,380 --> 02:03:26,090 and it tells me that in Esperanto. 2280 02:03:26,090 --> 02:03:29,350 OK, and it tells me that she speaks Latin. 2281 02:03:29,350 --> 02:03:32,410 Remember we taught Wikidata that? 2282 02:03:32,410 --> 02:03:35,800 It tells me that she was educated in Oxford, 2283 02:03:35,800 --> 02:03:38,050 you know, and gives me the references to the extent 2284 02:03:38,050 --> 02:03:39,130 that they exist. 2285 02:03:39,130 --> 02:03:41,560 I mean, this is not an article. 2286 02:03:41,560 --> 02:03:46,650 It's not, you know, paragraphs of fluent Esperanto text. 2287 02:03:46,650 --> 02:03:50,190 But it is information that I can understand 2288 02:03:50,190 --> 02:03:51,960 if I speak this language. 2289 02:03:51,960 --> 02:03:55,380 And it's better than nothing.
2290 02:03:55,380 --> 02:04:00,120 And remember Helen Dewitt was not a very detailed article. 2291 02:04:00,120 --> 02:04:03,690 If I were to ask about, I don't know, some politician, 2292 02:04:03,690 --> 02:04:08,340 or popular singer that has more data in Wikidata, 2293 02:04:08,340 --> 02:04:12,690 then this machine generated thing would have been richer. 2294 02:04:12,690 --> 02:04:16,320 So this feature is available and is under beta testing 2295 02:04:16,320 --> 02:04:19,530 right now, but generally if this sounds interesting for you, 2296 02:04:19,530 --> 02:04:21,600 especially if you come from a smaller wiki that 2297 02:04:21,600 --> 02:04:25,230 is missing a lot of articles that people may want to learn 2298 02:04:25,230 --> 02:04:28,320 about, you can contact the Wikimedia Foundation 2299 02:04:28,320 --> 02:04:33,486 and ask for article placeholder to be enabled on your wiki. 2300 02:04:33,486 --> 02:04:34,860 And again, this is a placeholder. 2301 02:04:34,860 --> 02:04:37,890 Of course, it exists only until someone actually 2302 02:04:37,890 --> 02:04:43,290 writes a proper Esperanto article about Helen Dewitt. 2303 02:04:43,290 --> 02:04:45,060 So I hope this is clear. 2304 02:04:45,060 --> 02:04:50,810 This is all coming from Wikidata on the fly. 2305 02:04:50,810 --> 02:04:51,470 In real time. 2306 02:04:51,470 --> 02:04:57,500 As you can see, it includes my latest edits to Helen Dewitt. 2307 02:04:57,500 --> 02:04:58,940 OK. 2308 02:04:58,940 --> 02:05:05,250 Questions about the-- questions about the article placeholder? 2309 02:05:05,250 --> 02:05:09,580 If there are, try and put them on the channel. 2310 02:05:09,580 --> 02:05:13,300 And this brings us to one of the main courses of this talk, 2311 02:05:13,300 --> 02:05:15,270 which is querying Wikidata. 2312 02:05:15,270 --> 02:05:18,660 So I've explained how Wikidata works. 2313 02:05:18,660 --> 02:05:19,680 We've walked through it. 2314 02:05:19,680 --> 02:05:20,850 We've added to it.
2315 02:05:20,850 --> 02:05:22,800 We've created a new item. 2316 02:05:22,800 --> 02:05:26,360 We learned how to contribute during our commutes. 2317 02:05:26,360 --> 02:05:30,150 And all this was-- you kept promising us, 2318 02:05:30,150 --> 02:05:32,050 Asaf, that this would be-- 2319 02:05:32,050 --> 02:05:34,690 this would enable these amazing queries. 2320 02:05:34,690 --> 02:05:37,960 So time to make good on that. 2321 02:05:37,960 --> 02:05:42,880 The URL you need to remember is query.wikidata.org. 2322 02:05:42,880 --> 02:05:49,390 And that will take you to a query system that 2323 02:05:49,390 --> 02:05:52,510 uses a language called SPARQL. 2324 02:05:52,510 --> 02:05:58,150 SPARQL, spelt with a Q. This language 2325 02:05:58,150 --> 02:06:01,690 is not a Wikimedia creation. 2326 02:06:01,690 --> 02:06:06,010 It's a standardized language used for querying linked data 2327 02:06:06,010 --> 02:06:07,540 sources. 2328 02:06:07,540 --> 02:06:10,720 And because of that there 2329 02:06:10,720 --> 02:06:14,590 are certain usability prices that we pay for using SPARQL, 2330 02:06:14,590 --> 02:06:16,010 for using a standard language. 2331 02:06:16,010 --> 02:06:19,570 It's not completely custom made for querying Wikidata, 2332 02:06:19,570 --> 02:06:21,740 and we'll see that in just a moment. 2333 02:06:21,740 --> 02:06:23,530 The principle to remember about Wikidata 2334 02:06:23,530 --> 02:06:27,880 query is that Wikidata will tell you everything it knows, 2335 02:06:27,880 --> 02:06:29,470 but no more. 2336 02:06:29,470 --> 02:06:32,440 I have anticipated this several times already, right? 2337 02:06:32,440 --> 02:06:35,980 Until this moment when we taught Wikidata 2338 02:06:35,980 --> 02:06:38,590 that Helen Dewitt speaks Latin, she 2339 02:06:38,590 --> 02:06:41,500 would not have appeared in query results 2340 02:06:41,500 --> 02:06:45,974 asking who are American writers who speak Latin?
2341 02:06:45,974 --> 02:06:47,140 She would not have appeared. 2342 02:06:47,140 --> 02:06:49,090 But as of this afternoon, she will 2343 02:06:49,090 --> 02:06:52,950 appear because I've added that piece of information. 2344 02:06:52,950 --> 02:07:01,380 So a result of that principle is that you can never say, 2345 02:07:01,380 --> 02:07:05,950 well I ran a Wikidata query and this 2346 02:07:05,950 --> 02:07:11,510 is the list of Flemish painters who are sons of painters. 2347 02:07:11,510 --> 02:07:12,310 The list. 2348 02:07:12,310 --> 02:07:14,110 That these are all the Flemish painters 2349 02:07:14,110 --> 02:07:15,220 who are sons of painters. 2350 02:07:15,220 --> 02:07:19,390 That is never something you can say based on a Wikidata query, 2351 02:07:19,390 --> 02:07:22,390 because of course, maybe not all the Flemish painters 2352 02:07:22,390 --> 02:07:26,020 who are sons of painters have been expressed in Wikidata 2353 02:07:26,020 --> 02:07:26,760 yet. 2354 02:07:26,760 --> 02:07:28,840 Wikidata doesn't know about some of them, 2355 02:07:28,840 --> 02:07:30,340 or maybe it knows about all of them 2356 02:07:30,340 --> 02:07:32,500 but doesn't know the important fact 2357 02:07:32,500 --> 02:07:35,200 that this person is the son of that person, 2358 02:07:35,200 --> 02:07:38,740 because those properties have not been added. 2359 02:07:38,740 --> 02:07:40,940 And so they cannot be included in the results. 2360 02:07:40,940 --> 02:07:42,550 So the results of a Wikidata query 2361 02:07:42,550 --> 02:07:46,870 are never the definitive set. 2362 02:07:46,870 --> 02:07:49,600 What you can say about a Wikidata query is here 2363 02:07:49,600 --> 02:07:52,840 are some Flemish painters who are sons of painters. 2364 02:07:52,840 --> 02:07:56,260 Here are some cities with female mayors.
2365 02:07:56,260 --> 02:07:58,270 Whatever it is you're querying about 2366 02:07:58,270 --> 02:08:01,030 is never guaranteed to be complete 2367 02:08:01,030 --> 02:08:03,580 because Wikidata, like Wikipedia, is 2368 02:08:03,580 --> 02:08:05,530 a work in progress. 2369 02:08:05,530 --> 02:08:13,240 And of course, the more we teach Wikidata the 2370 02:08:13,240 --> 02:08:16,240 more useful it becomes. 2371 02:08:16,240 --> 02:08:22,520 OK, so let's go and see those queries. 2372 02:08:22,520 --> 02:08:25,990 So this is query.wikidata.org. 2373 02:08:25,990 --> 02:08:29,000 It's not the wiki. 2374 02:08:29,000 --> 02:08:29,500 All right? 2375 02:08:29,500 --> 02:08:32,530 So this isn't like some page on the wiki itself. 2376 02:08:32,530 --> 02:08:35,099 This is kind of an external system. 2377 02:08:35,099 --> 02:08:35,890 So it's not a wiki. 2378 02:08:35,890 --> 02:08:37,960 You can see I don't have a user page here. 2379 02:08:37,960 --> 02:08:39,520 I don't have a history tab. 2380 02:08:39,520 --> 02:08:40,960 This isn't a wiki page. 2381 02:08:40,960 --> 02:08:44,560 This is a special kind of tool or system. 2382 02:08:44,560 --> 02:08:51,330 And it invites me to input a SPARQL query. 2383 02:08:51,330 --> 02:08:55,060 Now most of us do not speak SPARQL. 2384 02:08:55,060 --> 02:08:59,800 It's a technical language. 2385 02:08:59,800 --> 02:09:01,720 It's a query language. 2386 02:09:01,720 --> 02:09:06,760 Some of you may be thinking about SQL, the database query 2387 02:09:06,760 --> 02:09:08,500 language. 2388 02:09:08,500 --> 02:09:13,330 SPARQL is named with kind of a wink, or a nod, to SQL. 2389 02:09:13,330 --> 02:09:17,440 But, I warn you, if you are comfortable in 2390 02:09:17,440 --> 02:09:22,750 SQL, don't expect to carry over your knowledge of SQL 2391 02:09:22,750 --> 02:09:23,550 into SPARQL. 2392 02:09:23,550 --> 02:09:26,140 They're not the same. 2393 02:09:26,140 --> 02:09:27,940 They are superficially similar.
2394 02:09:27,940 --> 02:09:28,440 Right? 2395 02:09:28,440 --> 02:09:31,530 So they both use the keyword select, 2396 02:09:31,530 --> 02:09:35,010 and they use the word where, and they use things like limit, 2397 02:09:35,010 --> 02:09:35,770 and order. 2398 02:09:35,770 --> 02:09:38,190 So again, if you know this already from SQL, 2399 02:09:38,190 --> 02:09:40,500 those mean roughly the same things, 2400 02:09:40,500 --> 02:09:44,550 but don't expect it to behave just like SQL. 2401 02:09:44,550 --> 02:09:49,800 You do need to spend some time understanding how SPARQL works. 2402 02:09:49,800 --> 02:09:52,560 So, by all means, I invite you to go and read 2403 02:09:52,560 --> 02:09:55,680 one of the many fine SPARQL tutorials that 2404 02:09:55,680 --> 02:09:59,590 are out there on the web, or to click the Help button here, 2405 02:09:59,590 --> 02:10:03,930 which also includes help about SPARQL. 2406 02:10:03,930 --> 02:10:08,440 But I also know that most of us, when 2407 02:10:08,440 --> 02:10:12,580 we want to do some advanced formatting on wiki, 2408 02:10:12,580 --> 02:10:16,090 for example, we don't go and read the help page 2409 02:10:16,090 --> 02:10:18,220 on templates, right? 2410 02:10:18,220 --> 02:10:21,460 We go to a page that already does what we want to do, 2411 02:10:21,460 --> 02:10:27,430 and adopt and adapt the code from that other page, right? 2412 02:10:27,430 --> 02:10:30,610 So we just take something that does roughly what we want, 2413 02:10:30,610 --> 02:10:33,280 and just copy it over and change what we need to change. 2414 02:10:33,280 --> 02:10:35,620 That is a very pragmatic and reasonable way 2415 02:10:35,620 --> 02:10:37,420 to do things which is why-- 2416 02:10:37,420 --> 02:10:39,850 and the Wikidata engineers know this, 2417 02:10:39,850 --> 02:10:43,300 which is why they prepared this very handy button for us 2418 02:10:43,300 --> 02:10:45,580 called examples. 2419 02:10:45,580 --> 02:10:47,710 We click the examples button.
2420 02:10:47,710 --> 02:10:52,390 And, oh my god, there is a ton of-- well there's 312 example 2421 02:10:52,390 --> 02:10:55,582 queries for us to choose from. 2422 02:10:55,582 --> 02:10:57,040 And we can just pick something that 2423 02:10:57,040 --> 02:11:00,310 is roughly like what we're trying to find out, 2424 02:11:00,310 --> 02:11:02,740 and then just change what needs changing. 2425 02:11:02,740 --> 02:11:05,410 So let's take a very simple one. 2426 02:11:05,410 --> 02:11:07,020 The cats query. 2427 02:11:07,020 --> 02:11:10,270 Maybe one of the simplest you could possibly have. 2428 02:11:10,270 --> 02:11:13,510 And let's run it first and then I'll kind of 2429 02:11:13,510 --> 02:11:16,420 walk you through it. 2430 02:11:16,420 --> 02:11:18,460 The goal here is not to teach you SPARQL, 2431 02:11:18,460 --> 02:11:20,860 but to get you to be kind of literate in SPARQL. 2432 02:11:20,860 --> 02:11:23,980 To kind of understand why this does what it does. 2433 02:11:23,980 --> 02:11:25,730 So let's run this query first. 2434 02:11:25,730 --> 02:11:31,390 We click Run and here I have results at the bottom. 2435 02:11:31,390 --> 02:11:34,060 The item, which is just a Wikidata item, 2436 02:11:34,060 --> 02:11:35,290 which of course is a number. 2437 02:11:35,290 --> 02:11:38,860 Remember, Wikidata thinks of items as Q numbers. 2438 02:11:38,860 --> 02:11:40,900 And the label, because we're humans 2439 02:11:40,900 --> 02:11:43,190 and we prefer words to numbers. 2440 02:11:43,190 --> 02:11:49,870 So these 114 results are all the cats 2441 02:11:49,870 --> 02:11:53,310 that Wikidata knows about. 2442 02:11:53,310 --> 02:11:55,380 Is this all the cats in the world? 2443 02:11:55,380 --> 02:11:57,320 No, of course not, remember? 2444 02:11:57,320 --> 02:11:59,730 It's all the cats Wikidata knows about, which 2445 02:11:59,730 --> 02:12:01,410 means they're somehow notable.
2446 02:12:01,410 --> 02:12:05,130 I mean someone bothered to describe them on Wikidata. 2447 02:12:05,130 --> 02:12:12,570 And Wikidata was told this item is an instance of cat. 2448 02:12:12,570 --> 02:12:13,620 Right? 2449 02:12:13,620 --> 02:12:17,040 So these are those cats. 2450 02:12:17,040 --> 02:12:18,540 And we can click any of them. 2451 02:12:18,540 --> 02:12:20,190 I don't know, Pixel, for example. 2452 02:12:20,190 --> 02:12:21,780 Click the Wikidata item. 2453 02:12:21,780 --> 02:12:24,090 And here is the Wikidata item about Pixel 2454 02:12:24,090 --> 02:12:25,860 with the Q number. 2455 02:12:25,860 --> 02:12:28,980 And he is a tortoiseshell cat. 2456 02:12:28,980 --> 02:12:32,640 And as you can see, instance of cat. 2457 02:12:32,640 --> 02:12:33,610 OK. 2458 02:12:33,610 --> 02:12:37,220 And he is five inches high. 2459 02:12:37,220 --> 02:12:41,780 And he is apparently documented in Indonesian, in Bahasa. 2460 02:12:41,780 --> 02:12:45,080 Right here this is Pixel. 2461 02:12:45,080 --> 02:12:50,060 And he is apparently somehow related to the Guinness World 2462 02:12:50,060 --> 02:12:52,160 Records book. 2463 02:12:52,160 --> 02:12:54,650 I don't speak Bahasa, so I don't know exactly why 2464 02:12:54,650 --> 02:12:56,120 this cat is so notable. 2465 02:12:56,120 --> 02:12:58,889 But, of course, cats can become notable 2466 02:12:58,889 --> 02:12:59,930 for all kinds of reasons. 2467 02:12:59,930 --> 02:13:02,204 Maybe they're a YouTube sensation, 2468 02:13:02,204 --> 02:13:03,620 you know, maybe they were involved 2469 02:13:03,620 --> 02:13:05,330 in some historical event. 2470 02:13:05,330 --> 02:13:09,410 I like this cat named Gladstone. 2471 02:13:09,410 --> 02:13:16,590 This cat named Gladstone is-- 2472 02:13:16,590 --> 02:13:19,950 he has position held Chief Mouser 2473 02:13:19,950 --> 02:13:22,320 to Her Majesty's Treasury. 2474 02:13:22,320 --> 02:13:25,230 This is an official cat with a job.
2475 02:13:25,230 --> 02:13:29,190 And he has been holding this job, mind you, since the 28th 2476 02:13:29,190 --> 02:13:31,570 of June this past year. 2477 02:13:31,570 --> 02:13:32,970 That's the start time. 2478 02:13:32,970 --> 02:13:35,760 And there is no end time which means he currently 2479 02:13:35,760 --> 02:13:38,850 holds the position of Chief Mouser 2480 02:13:38,850 --> 02:13:40,470 to her Majesty's Treasury. 2481 02:13:40,470 --> 02:13:42,750 His employer is Her Majesty's Treasury. 2482 02:13:42,750 --> 02:13:44,290 He's a male creature. 2483 02:13:44,290 --> 02:13:46,650 And Wikidata knows that this cat is 2484 02:13:46,650 --> 02:13:53,127 named after William Gladstone, the Victorian prime minister. 2485 02:13:53,127 --> 02:13:54,960 Of course if I don't know who this person is 2486 02:13:54,960 --> 02:13:57,540 I can click through and learn that he 2487 02:13:57,540 --> 02:14:01,860 was a liberal politician and prime minister, right? 2488 02:14:01,860 --> 02:14:03,390 He even has a Twitter account. 2489 02:14:03,390 --> 02:14:05,910 And Wikidata sends me right to it. 2490 02:14:05,910 --> 02:14:08,040 The treasury cat Twitter account. 2491 02:14:08,040 --> 02:14:11,010 And he has articles in German, and English, 2492 02:14:11,010 --> 02:14:15,520 and of course Japanese, because he's a cat. 2493 02:14:15,520 --> 02:14:16,020 All right. 2494 02:14:16,020 --> 02:14:19,500 So this was a very simple query. 2495 02:14:19,500 --> 02:14:21,400 Let's find out why it works. 2496 02:14:21,400 --> 02:14:21,900 OK. 2497 02:14:21,900 --> 02:14:25,800 So what did we actually tell Wikidata to do for us? 2498 02:14:25,800 --> 02:14:31,650 We said, please select some items for us 2499 02:14:31,650 --> 02:14:33,580 along with their labels. 2500 02:14:33,580 --> 02:14:34,080 OK? 
2501 02:14:34,080 --> 02:14:36,180 Along with their human readable labels 2502 02:14:36,180 --> 02:14:42,010 because if I remove this label what I get is, see, 2503 02:14:42,010 --> 02:14:44,200 just a list of item numbers. 2504 02:14:44,200 --> 02:14:45,280 That's not as fun. 2505 02:14:45,280 --> 02:14:46,930 So that's what this little bit did. 2506 02:14:46,930 --> 02:14:49,630 I just said, give me the items, but also their 2507 02:14:49,630 --> 02:14:52,330 human readable label. 2508 02:14:52,330 --> 02:14:54,620 And I want you to select a bunch of items, 2509 02:14:54,620 --> 02:14:56,770 but not just any random bunch of items, 2510 02:14:56,770 --> 02:15:01,210 I want to select items where a certain condition holds. 2511 02:15:01,210 --> 02:15:02,790 What is the condition? 2512 02:15:02,790 --> 02:15:06,430 The condition is that the item that I want you to select 2513 02:15:06,430 --> 02:15:14,360 needs to have property 31 with a value of Q146. 2514 02:15:14,360 --> 02:15:15,670 Well, that's helpful. 2515 02:15:15,670 --> 02:15:18,070 If I hover over these numbers-- 2516 02:15:18,070 --> 02:15:19,750 Again, I get the human readable version. 2517 02:15:19,750 --> 02:15:23,530 So I'm looking for items that have property 2518 02:15:23,530 --> 02:15:28,841 instance of with the value cat. 2519 02:15:28,841 --> 02:15:29,340 Right? 2520 02:15:29,340 --> 02:15:31,173 Because that's literally what I want, right? 2521 02:15:31,173 --> 02:15:33,960 I want all the items that have a property, a statement, that 2522 02:15:33,960 --> 02:15:36,840 says instance of cat. 2523 02:15:36,840 --> 02:15:37,950 That's the condition. 2524 02:15:37,950 --> 02:15:41,640 I'm not interested in items that are instance of book, 2525 02:15:41,640 --> 02:15:43,200 or instance of human. 2526 02:15:43,200 --> 02:15:46,290 I'm interested in instance of cat. 2527 02:15:46,290 --> 02:15:51,090 That is the only condition here in this query.
2528 02:15:51,090 --> 02:15:55,800 This complicated line I ask you to basically ignore. 2529 02:15:55,800 --> 02:15:57,510 This is one of those sacrifices that we 2530 02:15:57,510 --> 02:16:00,720 make for using a standard language like SPARQL. 2531 02:16:00,720 --> 02:16:02,820 But the role of this complicated line 2532 02:16:02,820 --> 02:16:04,920 is to basically ensure that we get 2533 02:16:04,920 --> 02:16:07,860 the English label for that cat. 2534 02:16:07,860 --> 02:16:08,817 OK? 2535 02:16:08,817 --> 02:16:09,900 So don't worry about that. 2536 02:16:09,900 --> 02:16:11,550 Just leave it there. 2537 02:16:11,550 --> 02:16:13,320 And we run the query and we get the list 2538 02:16:13,320 --> 02:16:17,330 of cats with their English labels, and that is awesome. 2539 02:16:17,330 --> 02:16:21,510 By the way, if I change EN, without really understanding 2540 02:16:21,510 --> 02:16:27,260 this line, if I change EN to HE, for Hebrew, 2541 02:16:27,260 --> 02:16:30,160 I get the same results with a Hebrew label. 2542 02:16:30,160 --> 02:16:33,670 Of course, these cats, nobody bothered to give them 2543 02:16:33,670 --> 02:16:35,709 Hebrew labels unfortunately. 2544 02:16:35,709 --> 02:16:37,570 So I get the Q number. 2545 02:16:37,570 --> 02:16:42,874 But if I changed it to Japanese, JA, 2546 02:16:42,874 --> 02:16:45,290 I would get still a bunch of Q numbers for where there 2547 02:16:45,290 --> 02:16:47,389 isn't a Japanese label, but I would get the labels 2548 02:16:47,389 --> 02:16:48,781 in Japanese. 2549 02:16:48,781 --> 02:16:49,280 OK? 2550 02:16:49,280 --> 02:16:51,260 So this is an example of how you don't even 2551 02:16:51,260 --> 02:16:54,620 need to understand all the syntax of this query 2552 02:16:54,620 --> 02:16:56,100 to adapt it to your needs.
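For readers following along without the screen, the cats query walked through above can be written out roughly as follows. This is a minimal Python sketch that only builds the query text. The function name `cats_query` is hypothetical, the wording of the query is reconstructed from the talk's description (property 31, value Q146, plus the label line with a language code) and may differ from the on-screen example, and nothing is sent to query.wikidata.org.

```python
def cats_query(language="en"):
    """Return the text of a 'cats' query with labels in the given language.

    Reconstructed from the talk's description; not copied from the screen.
    """
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        # The single condition: the item has "instance of" (P31) with value "cat" (Q146).
        "  ?item wdt:P31 wd:Q146 .\n"
        # The "complicated line": it asks the label service for labels in one language.
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}\n'
        "}"
    )


# Changing only the language code switches the labels, as in the demonstration.
print(cats_query("en"))
print(cats_query("ja"))
```

The printed text can be pasted into the query box; only the two-letter code differs between the two versions.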
2553 02:16:56,100 --> 02:16:58,070 If you want this query as is, but you 2554 02:16:58,070 --> 02:17:00,320 want the labels in Japanese, you can just 2555 02:17:00,320 --> 02:17:03,190 change the language code here. 2556 02:17:03,190 --> 02:17:06,559 OK so that is all this query does. 2557 02:17:06,559 --> 02:17:08,870 Again, just give me the items that 2558 02:17:08,870 --> 02:17:17,590 have property 31, instance of, with a value 146, which is cat. 2559 02:17:17,590 --> 02:17:20,379 Let's take a question just about this very simple query 2560 02:17:20,379 --> 02:17:25,809 before we advance to more complicated queries. 2561 02:17:25,809 --> 02:17:29,200 Any questions just about this? 2562 02:17:29,200 --> 02:17:32,850 Like, did anyone kind of really lose me talking 2563 02:17:32,850 --> 02:17:35,010 about this simple query? 2564 02:17:35,010 --> 02:17:39,389 Again, this query just tells Wikidata, get me all the items 2565 02:17:39,389 --> 02:17:41,280 that somewhere among their statements 2566 02:17:41,280 --> 02:17:44,219 have instance of cat. 2567 02:17:44,219 --> 02:17:46,670 That's the only condition. 2568 02:17:46,670 --> 02:17:47,740 No questions. 2569 02:17:47,740 --> 02:17:49,959 OK, feel free to ask if you'd come up with one. 2570 02:17:49,959 --> 02:17:54,709 So let's complicate things a little. 2571 02:17:54,709 --> 02:17:59,365 Let's ask only for male cats. 2572 02:18:02,080 --> 02:18:03,070 OK. 2573 02:18:03,070 --> 02:18:07,330 Remember this cat Gladstone is male, 2574 02:18:07,330 --> 02:18:09,850 and we know this because he has a property called 2575 02:18:09,850 --> 02:18:14,320 sex or gender, and the value is male creature, right? 2576 02:18:14,320 --> 02:18:17,950 So let's add another condition right here 2577 02:18:17,950 --> 02:18:19,860 under the first condition. 2578 02:18:19,860 --> 02:18:20,870 OK? 2579 02:18:20,870 --> 02:18:22,750 This is a new line. 2580 02:18:22,750 --> 02:18:24,940 And I'm adding a new condition to the query. 
2581 02:18:24,940 --> 02:18:30,520 I'm saying, not only do I want this item that you return 2582 02:18:30,520 --> 02:18:35,469 to be instance of cat, I also want this same item 2583 02:18:35,469 --> 02:18:39,280 to have another property, the property sex or gender. 2584 02:18:39,280 --> 02:18:40,299 Right? 2585 02:18:40,299 --> 02:18:43,480 And I need to refer to the property by number. 2586 02:18:43,480 --> 02:18:45,760 But don't worry, Wikidata will help you. 2587 02:18:45,760 --> 02:18:49,500 So you start with this prefix, Wikidata-- WDT. 2588 02:18:52,520 --> 02:18:54,980 Again, just ignore that prefix. It's 2589 02:18:54,980 --> 02:18:58,940 one of the features of SPARQL that we need to respect. 2590 02:18:58,940 --> 02:19:02,715 WDT colon, and then I can just type control space 2591 02:19:02,715 --> 02:19:04,340 to do a search, to do an auto complete. 2592 02:19:04,340 --> 02:19:08,090 So I can just type sex and Wikidata helpfully 2593 02:19:08,090 --> 02:19:11,760 offers me a drop down with relevant properties. 2594 02:19:11,760 --> 02:19:15,200 So I click property 21, which is the sex or gender property. 2595 02:19:15,200 --> 02:19:17,629 And then I say, so I want the sex or gender property 2596 02:19:17,629 --> 02:19:19,670 to have the Wikidata value. 2597 02:19:19,670 --> 02:19:21,799 Again, control space. 2598 02:19:21,799 --> 02:19:25,340 And I can just say male creature. 2599 02:19:25,340 --> 02:19:25,850 See? 2600 02:19:25,850 --> 02:19:30,950 There's a different item for male, as in human, 2601 02:19:30,950 --> 02:19:33,799 and a different one for male creature, for reasons 2602 02:19:33,799 --> 02:19:34,910 that we won't go into. 2603 02:19:34,910 --> 02:19:36,535 Let's pick male creature, because we're 2604 02:19:36,535 --> 02:19:38,040 talking about cats here. 2605 02:19:38,040 --> 02:19:38,540 All right. 2606 02:19:38,540 --> 02:19:42,080 And add a period here at the end and click Run.
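Adding the second condition can be sketched like this. It is a minimal Python sketch under stated assumptions: `build_query` is a hypothetical helper, property 21 (sex or gender) comes from the talk, and `Q44148` is assumed here to be the item the talk calls "male creature" (the talk does not read its number aloud, so check it with the auto-complete before relying on it). The code only builds the query text.

```python
def build_query(conditions, language="en"):
    """Build a query in which ?item must satisfy every (property, value) pair."""
    # Each condition becomes one line ending in a period; all lines must hold at once.
    patterns = "\n".join(
        f"  ?item wdt:{prop} wd:{value} ." for prop, value in conditions
    )
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"{patterns}\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}\n'
        "}"
    )


male_cats = build_query([
    ("P31", "Q146"),    # instance of: cat (from the talk)
    ("P21", "Q44148"),  # sex or gender: "male creature" (item number is an assumption)
])
print(male_cats)
```

Because both lines mention the same `?item`, an item is returned only when it carries both statements.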
2607 02:19:42,080 --> 02:19:48,330 And instead of 114 cats, we get, this time, we got 43 results. 2608 02:19:48,330 --> 02:19:53,360 Including our friend Gladstone who is a male creature cat. 2609 02:19:53,360 --> 02:19:58,530 So that means all the rest are female, right? 2610 02:19:58,530 --> 02:20:00,410 Wrong. 2611 02:20:00,410 --> 02:20:00,980 Wrong. 2612 02:20:00,980 --> 02:20:02,840 That does not mean that at all. 2613 02:20:02,840 --> 02:20:06,530 What it means is of the 114 items that 2614 02:20:06,530 --> 02:20:11,960 have instance of cat, only 43 have explicitly 2615 02:20:11,960 --> 02:20:14,690 sex male creature. 2616 02:20:14,690 --> 02:20:17,570 The rest of them do not. 2617 02:20:17,570 --> 02:20:21,800 Maybe because they have sex female creature, 2618 02:20:21,800 --> 02:20:25,930 but maybe because they don't have that property at all. 2619 02:20:25,930 --> 02:20:28,290 I'm emphasizing this to kind of help 2620 02:20:28,290 --> 02:20:31,770 you train yourself to correctly interpret 2621 02:20:31,770 --> 02:20:34,140 the results of queries from Wikidata. 2622 02:20:34,140 --> 02:20:36,870 Don't jump into this kind of simplistic conclusion, 2623 02:20:36,870 --> 02:20:41,820 OK there's 114 total, 43 male, therefore the rest are female. 2624 02:20:41,820 --> 02:20:43,520 That is not correct. 2625 02:20:43,520 --> 02:20:45,030 OK? 2626 02:20:45,030 --> 02:20:49,740 But 43 of those explicitly had another statement, sex 2627 02:20:49,740 --> 02:20:52,530 or gender, male creature. 2628 02:20:52,530 --> 02:20:55,020 So I just added another condition, 2629 02:20:55,020 --> 02:20:58,290 and now my query is asking two separate things 2630 02:20:58,290 --> 02:21:00,150 about the results. 2631 02:21:00,150 --> 02:21:04,472 They need to be a cat and a male creature. 2632 02:21:04,472 --> 02:21:06,270 AUDIENCE: Maybe we should see how many 2633 02:21:06,270 --> 02:21:08,100 cats have Twitter accounts. 
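The interpretation point above (a filtered count says nothing about the items that were filtered out) can be checked on a toy data set. This is a minimal sketch with four made-up cats, not real Wikidata items; the dictionary keys are hypothetical stand-ins for statements.

```python
# Four invented cats; the third has no sex-or-gender statement at all.
cats = [
    {"name": "cat A", "instance_of": "cat", "sex": "male creature"},
    {"name": "cat B", "instance_of": "cat", "sex": "female creature"},
    {"name": "cat C", "instance_of": "cat"},
    {"name": "cat D", "instance_of": "cat", "sex": "male creature"},
]

male = [c for c in cats if c.get("sex") == "male creature"]
female = [c for c in cats if c.get("sex") == "female creature"]
unknown = [c for c in cats if "sex" not in c]

# "Total minus male" is 2, but only 1 of those is explicitly female.
not_male = len(cats) - len(male)
print(len(cats), len(male), len(female), len(unknown))  # prints: 4 2 1 1
print(not_male == len(female))  # prints: False
```

The same gap between "not returned" and "known to be the opposite" applies to the 114 and 43 figures in the demonstration.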
2634 02:21:08,100 --> 02:21:11,440 But there is a question from YouTube, 2635 02:21:11,440 --> 02:21:14,220 which is will you talk about the export possibilities 2636 02:21:14,220 --> 02:21:17,280 of the result of the query? 2637 02:21:17,280 --> 02:21:18,420 ASAF BARTOV: Absolutely. 2638 02:21:18,420 --> 02:21:21,000 Absolutely I will in just a little bit. 2639 02:21:21,000 --> 02:21:23,010 I mean there is, in addition to just getting 2640 02:21:23,010 --> 02:21:28,350 this kind of table, I can get these results in other formats. 2641 02:21:28,350 --> 02:21:30,360 And I can also download these results. 2642 02:21:30,360 --> 02:21:32,820 I can click the Download button and get them 2643 02:21:32,820 --> 02:21:35,070 as a comma separated file, tab separated 2644 02:21:35,070 --> 02:21:38,910 file, a JSON file, which is useful for programmatic uses. 2645 02:21:38,910 --> 02:21:40,590 I can also get a link. 2646 02:21:40,590 --> 02:21:42,330 So I can get a link to this query. 2647 02:21:42,330 --> 02:21:45,990 I mean, I spent all this time designing this beautiful query. 2648 02:21:45,990 --> 02:21:50,280 I can get a short URL that was generated especially for me 2649 02:21:50,280 --> 02:21:52,170 right now with a tiny URL. 2650 02:21:52,170 --> 02:21:54,690 I can just paste this into Twitter and go, 2651 02:21:54,690 --> 02:21:59,280 hey people look at all the male cats that Wikidata knows about. 2652 02:21:59,280 --> 02:22:01,170 OK, this is not a very exciting query. 2653 02:22:01,170 --> 02:22:03,900 But once I get to a really complicated exciting query 2654 02:22:03,900 --> 02:22:07,650 I can totally share that very easily through this. 2655 02:22:07,650 --> 02:22:09,750 And we will get to more interesting queries 2656 02:22:09,750 --> 02:22:11,740 in just a second. 2657 02:22:11,740 --> 02:22:16,400 Any questions on this kind of basic querying so far? 2658 02:22:16,400 --> 02:22:17,940 OK. 2659 02:22:17,940 --> 02:22:25,340 So that was a very simple example. 
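For the programmatic uses mentioned above, a downloaded JSON file can be turned into plain rows with a few lines of code. This is a minimal sketch assuming the file follows the standard SPARQL JSON results layout (`head.vars` plus `results.bindings`); the sample document and its `Q_EXAMPLE_…` identifiers are invented placeholders, not real query output.

```python
import json

# Invented sample in the standard SPARQL JSON results shape.
sample = """
{
  "head": {"vars": ["item", "itemLabel"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_EXAMPLE_1"},
     "itemLabel": {"type": "literal", "xml:lang": "en", "value": "Example cat one"}},
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_EXAMPLE_2"},
     "itemLabel": {"type": "literal", "value": "Q_EXAMPLE_2"}}
  ]}
}
"""


def rows_from_sparql_json(text):
    """Flatten a SPARQL JSON result into a list of {column: value} dicts."""
    data = json.loads(text)
    columns = data["head"]["vars"]
    return [
        # A binding may omit a column when that variable has no value.
        {col: binding[col]["value"] for col in columns if col in binding}
        for binding in data["results"]["bindings"]
    ]


rows = rows_from_sparql_json(sample)
for row in rows:
    # Keep only the trailing identifier of the entity URI for display.
    print(row["item"].rsplit("/", 1)[-1], row["itemLabel"])
```

The second sample row mimics the case where no label exists in the requested language and the identifier appears in the label column instead.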
2660 02:22:25,340 --> 02:22:30,250 Let's spend a moment exploring. 2661 02:22:30,250 --> 02:22:38,920 So this cat Gladstone was named after this dude, William 2662 02:22:38,920 --> 02:22:43,550 Gladstone, who was an important British politician. 2663 02:22:43,550 --> 02:22:45,760 I'm sure he's not the only thing out there 2664 02:22:45,760 --> 02:22:48,970 in the universe that's named after Gladstone, right? 2665 02:22:48,970 --> 02:22:52,120 I mean there has got to be, I don't know, 2666 02:22:52,120 --> 02:22:54,790 park benches, planets, asteroids, 2667 02:22:54,790 --> 02:22:59,590 something other than the cat, named after this guy. 2668 02:22:59,590 --> 02:23:04,030 So we can ask Wikidata to tell us all the things 2669 02:23:04,030 --> 02:23:06,850 that, you know, without saying instance of something. 2670 02:23:06,850 --> 02:23:10,960 Like, I don't know, anything named after William Gladstone. 2671 02:23:10,960 --> 02:23:12,760 So how do I do that? 2672 02:23:12,760 --> 02:23:15,310 Same principle. 2673 02:23:15,310 --> 02:23:19,850 Instead of asking about the property instance of, property 2674 02:23:19,850 --> 02:23:25,360 31, instead of that, I will ask about the property 2675 02:23:25,360 --> 02:23:26,860 named after-- 2676 02:23:26,860 --> 02:23:29,120 sorry, named after-- 2677 02:23:29,120 --> 02:23:30,830 I don't need to remember the number. 2678 02:23:30,830 --> 02:23:32,240 I have auto-complete. 2679 02:23:32,240 --> 02:23:35,360 Named after is property 138. 2680 02:23:35,360 --> 02:23:37,430 And I want anything at all that is 2681 02:23:37,430 --> 02:23:42,080 named after this person, William Gladstone. 2682 02:23:42,080 --> 02:23:43,850 Here we go. 2683 02:23:43,850 --> 02:23:45,860 Which is 160852. 2684 02:23:45,860 --> 02:23:46,820 Whatever. 2685 02:23:46,820 --> 02:23:48,230 OK. 2686 02:23:48,230 --> 02:23:50,510 You notice I removed instance of cat. 2687 02:23:50,510 --> 02:23:52,040 I remove the male creature. 
2688 02:23:52,040 --> 02:23:55,130 I'm only asking, get me all the items 2689 02:23:55,130 --> 02:23:58,940 that are somehow named after that particular politician. 2690 02:23:58,940 --> 02:24:00,920 And I run the query, and it turns out 2691 02:24:00,920 --> 02:24:05,007 that Wikidata knows about three such things. 2692 02:24:05,007 --> 02:24:06,590 Does that mean that's the only-- these 2693 02:24:06,590 --> 02:24:08,881 are the only three things named after him in the world? 2694 02:24:08,881 --> 02:24:09,939 Of course not. 2695 02:24:09,939 --> 02:24:12,230 But these are the only three items that are in Wikidata 2696 02:24:12,230 --> 02:24:17,720 and explicitly have the property named after Gladstone. 2697 02:24:17,720 --> 02:24:20,150 For all I know, there may be a village 2698 02:24:20,150 --> 02:24:23,600 in England called Gladstone named after this person. 2699 02:24:23,600 --> 02:24:27,410 But if nobody added the property, named after, linking 2700 02:24:27,410 --> 02:24:30,950 to the person, it wouldn't show up in the results to my query. 2701 02:24:30,950 --> 02:24:33,750 So Wikidata knows about three such things. 2702 02:24:33,750 --> 02:24:36,110 One of them is something called the Gladstone Professor 2703 02:24:36,110 --> 02:24:37,360 of Government. 2704 02:24:37,360 --> 02:24:40,370 I can click through and see that it's a chair at Oxford 2705 02:24:40,370 --> 02:24:41,180 University, right? 2706 02:24:41,180 --> 02:24:43,470 So it's a position. 2707 02:24:43,470 --> 02:24:49,520 And another is the William Gladstone school number 18. 2708 02:24:49,520 --> 02:24:51,470 William Gladstone school number 18. 2709 02:24:51,470 --> 02:24:52,900 Where is that? 2710 02:24:52,900 --> 02:24:55,380 That is in Sofia, Bulgaria. 2711 02:24:55,380 --> 02:24:56,470 Again. 2712 02:24:56,470 --> 02:24:59,000 All right, so that's a particular school in Bulgaria 2713 02:24:59,000 --> 02:25:02,720 named after William Gladstone.
2714 02:25:02,720 --> 02:25:07,220 And finally, the third result is, of course, our pal 2715 02:25:07,220 --> 02:25:09,800 Gladstone the Chief Mouser. 2716 02:25:09,800 --> 02:25:12,674 If I click through, that's the cat. 2717 02:25:12,674 --> 02:25:14,090 All right, so that was an example. 2718 02:25:14,090 --> 02:25:15,700 I mean, you saw how easy it was. 2719 02:25:15,700 --> 02:25:18,980 I just named the property and the value that I care about, 2720 02:25:18,980 --> 02:25:21,420 and I get the results. 2721 02:25:21,420 --> 02:25:23,289 Again, I mean, it's kind of a silly example, 2722 02:25:23,289 --> 02:25:24,080 but think about it. 2723 02:25:24,080 --> 02:25:27,570 This is-- how else can you answer that question? 2724 02:25:27,570 --> 02:25:30,470 There's no reference desk, even at the great University 2725 02:25:30,470 --> 02:25:34,250 of Oxford, where you can walk in and say, give me 2726 02:25:34,250 --> 02:25:37,470 a list of things named after Gladstone. 2727 02:25:37,470 --> 02:25:40,590 There's no easy way to answer that unless you happen 2728 02:25:40,590 --> 02:25:44,520 to have a very large structured and linked 2729 02:25:44,520 --> 02:25:48,130 data store, like Wikidata. 2730 02:25:48,130 --> 02:25:50,560 All right, so that was a silly example. 2731 02:25:50,560 --> 02:25:51,280 Let's take some-- 2732 02:25:51,280 --> 02:25:53,113 AUDIENCE: There's a bunch of stuff on there. 2733 02:25:53,113 --> 02:25:54,446 ASAF: Oh, OK. 2734 02:25:54,446 --> 02:25:57,430 AUDIENCE: Can you show easy query on the video? 2735 02:25:57,430 --> 02:26:02,260 And somebody needs to know how to just do property 2736 02:26:02,260 --> 02:26:05,750 exists without giving a specific value. 2737 02:26:05,750 --> 02:26:11,030 And then once you show easy query you reload the page and-- 2738 02:26:11,030 --> 02:26:13,240 ASAF: I don't know easy query. 2739 02:26:13,240 --> 02:26:15,670 So is that a gadget? 2740 02:26:15,670 --> 02:26:17,110 I don't know what easy query is.
2741 02:26:17,110 --> 02:26:19,870 I don't use it. 2742 02:26:19,870 --> 02:26:24,760 So someone can maybe send a link or something? 2743 02:26:24,760 --> 02:26:26,100 Oh it is a gadget. 2744 02:26:26,100 --> 02:26:27,100 I don't have it enabled. 2745 02:26:31,610 --> 02:26:32,480 That is nice. 2746 02:26:32,480 --> 02:26:42,080 So now, what I just did by hand, by formulating the query named 2747 02:26:42,080 --> 02:26:45,200 after Gladstone-- 2748 02:26:45,200 --> 02:26:48,390 I guess this is the-- 2749 02:26:48,390 --> 02:26:48,960 Is it? 2750 02:26:53,000 --> 02:26:53,720 Yeah. 2751 02:26:53,720 --> 02:26:56,050 So this-- I just clicked the three-- 2752 02:26:56,050 --> 02:26:57,470 the ellipsis here. 2753 02:26:57,470 --> 02:26:58,460 Right after the name. 2754 02:26:58,460 --> 02:26:59,630 You see this? 2755 02:26:59,630 --> 02:27:03,050 This was just added by enabling easy query, 2756 02:27:03,050 --> 02:27:04,640 which I just learned about. 2757 02:27:04,640 --> 02:27:07,640 So you just click this and it auto-magically 2758 02:27:07,640 --> 02:27:09,620 made this kind of trivial query. 2759 02:27:09,620 --> 02:27:12,380 Of course, if I want a more complicated query like, 2760 02:27:12,380 --> 02:27:14,510 I don't know, give me all the things that 2761 02:27:14,510 --> 02:27:18,110 are named after Lincoln but are a school, 2762 02:27:18,110 --> 02:27:21,650 I will still need to kind of edit a custom query. 2763 02:27:21,650 --> 02:27:23,450 But this is a super easy and very nice 2764 02:27:23,450 --> 02:27:28,620 way of just doing a very super quick query for exactly this. 2765 02:27:28,620 --> 02:27:29,120 Right? 2766 02:27:29,120 --> 02:27:33,410 Like, what other items have exactly this property and value 2767 02:27:33,410 --> 02:27:35,720 named after William Gladstone?
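The "one property, one value" query that was typed by hand and then reproduced by the gadget can be sketched as follows. The property and item numbers (P138 for named after, Q160852 for William Gladstone) are the ones read out in the talk; the helper name `items_with_statement` and the English label service line are assumptions added for illustration.

```python
def items_with_statement(property_id: str, value_id: str) -> str:
    """Return SPARQL text for: every item whose property has exactly this value."""
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        # The single condition: some item, this property, this specific value.
        f"  ?item wdt:{property_id} wd:{value_id} .\n"
        # Assumed convenience: ask the label service for English names.
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


if __name__ == "__main__":
    # Everything that is named after (P138) William Gladstone (Q160852).
    print(items_with_statement("P138", "Q160852"))
```

Swapping in a different property or value, for example another person's Q number, gives the same kind of list for that person.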
2768 02:27:35,720 --> 02:27:38,750 So, thank you to whoever made this suggestion 2769 02:27:38,750 --> 02:27:42,140 to demonstrate that, and I'm glad I learned something 2770 02:27:42,140 --> 02:27:45,230 too today. 2771 02:27:45,230 --> 02:27:48,590 Let's move to another sample query. 2772 02:27:48,590 --> 02:27:50,360 Here's a fun example. 2773 02:27:50,360 --> 02:27:56,910 Popular surnames among fictional characters. 2774 02:27:56,910 --> 02:27:58,650 Think about that for a second. 2775 02:27:58,650 --> 02:28:03,030 Popular surnames among fictional characters. 2776 02:28:03,030 --> 02:28:06,510 So we're asking Wikidata to go through all 2777 02:28:06,510 --> 02:28:10,120 the fictional characters you know, 2778 02:28:10,120 --> 02:28:13,510 and of those look through their surnames, group 2779 02:28:13,510 --> 02:28:15,910 them so that you can count them, the repetitions 2780 02:28:15,910 --> 02:28:18,460 of the surnames, and give me the most 2781 02:28:18,460 --> 02:28:21,550 popular surnames among them. 2782 02:28:21,550 --> 02:28:26,280 Additionally, I want you to awesomely present the results 2783 02:28:26,280 --> 02:28:28,020 as a bubble chart. 2784 02:28:28,020 --> 02:28:29,220 Oh, yeah. 2785 02:28:29,220 --> 02:28:31,050 Wikidata can do that. 2786 02:28:31,050 --> 02:28:34,420 And I run the query. 2787 02:28:34,420 --> 02:28:36,750 And check it out. 2788 02:28:36,750 --> 02:28:41,130 The most popular names among fictional characters 2789 02:28:41,130 --> 02:28:45,780 that Wikidata knows about are Jones, Smith, Taylor, et cetera. 2790 02:28:45,780 --> 02:28:48,450 I mean for all we know, the most popular name 2791 02:28:48,450 --> 02:28:50,770 among fictional characters actually in the world 2792 02:28:50,770 --> 02:28:52,350 may be Wu. 2793 02:28:52,350 --> 02:28:54,790 Or something in Chinese for all we know. 2794 02:28:54,790 --> 02:28:57,930 But if that has not been modeled in Wikidata, 2795 02:28:57,930 --> 02:29:01,020 we're not going to get that.
2796 02:29:01,020 --> 02:29:03,540 So Taylor, Smith, Jones, Williams, 2797 02:29:03,540 --> 02:29:06,870 seem to be the most popular names. 2798 02:29:06,870 --> 02:29:08,400 And again, I could limit this. 2799 02:29:08,400 --> 02:29:11,520 I could make the same query but add, 2800 02:29:11,520 --> 02:29:14,250 only among works whose original language 2801 02:29:14,250 --> 02:29:19,020 was Italian, for example, to get more interesting results if I 2802 02:29:19,020 --> 02:29:21,480 only care about Italian literature. 2803 02:29:21,480 --> 02:29:24,720 But this is an example of how I got awesome bubble 2804 02:29:24,720 --> 02:29:28,170 charts for free, and I can just plug this 2805 02:29:28,170 --> 02:29:30,900 into an awesome presentation that I make. 2806 02:29:30,900 --> 02:29:34,500 Of course I can still look at the raw table. 2807 02:29:34,500 --> 02:29:37,940 So the query still resulted in a bunch of data, right? 2808 02:29:37,940 --> 02:29:42,480 So Smith repeats 41 times, Jones 38 times, Taylor 34 times, 2809 02:29:42,480 --> 02:29:43,750 et cetera, et cetera. 2810 02:29:43,750 --> 02:29:48,960 And down that list. 2811 02:29:48,960 --> 02:29:52,320 And I could, again, I could export this into a file 2812 02:29:52,320 --> 02:29:56,100 and load it up in a spreadsheet, and do additional processing 2813 02:29:56,100 --> 02:29:56,670 on it. 2814 02:29:56,670 --> 02:29:58,560 I can link to it. 2815 02:29:58,560 --> 02:30:02,530 I can do all kinds of awesome things with it. 2816 02:30:02,530 --> 02:30:05,250 So that's another awesome query. 2817 02:30:05,250 --> 02:30:08,460 We don't have to go into every line by line analysis 2818 02:30:08,460 --> 02:30:11,670 here of why this works the way it does. 2819 02:30:11,670 --> 02:30:15,840 I want to show you some other queries first. 2820 02:30:15,840 --> 02:30:22,470 Let's look at-- this is just fun, overall causes of death. 
2821 02:30:22,470 --> 02:30:24,870 Again a bubble chart just looking 2822 02:30:24,870 --> 02:30:28,260 at people who died of things, and have 2823 02:30:28,260 --> 02:30:30,760 a cause of death listed. 2824 02:30:30,760 --> 02:30:34,380 And we learn that the most commonly listed cause of death 2825 02:30:34,380 --> 02:30:40,350 is myocardial infarction, pneumonitis, cerebral vascular, 2826 02:30:40,350 --> 02:30:42,620 lung cancer, et cetera, et cetera. 2827 02:30:42,620 --> 02:30:44,850 And again, in a bubble chart. 2828 02:30:44,850 --> 02:30:49,670 And so how does that work? 2829 02:30:49,670 --> 02:30:53,050 So just very briefly, the important parts of this query 2830 02:30:53,050 --> 02:30:59,150 are I'm looking for something, for some person, who 2831 02:30:59,150 --> 02:31:04,240 is instance of 31, instance of Q5, which is human. 2832 02:31:04,240 --> 02:31:05,390 So a human. 2833 02:31:05,390 --> 02:31:07,130 Again, just to kind of limit the query. 2834 02:31:07,130 --> 02:31:11,330 I'm not interested in books or mountains. 2835 02:31:11,330 --> 02:31:14,420 I'm looking for humans who have that same person, 2836 02:31:14,420 --> 02:31:21,150 that same variable PID, should have a 509, meaning-- 2837 02:31:21,150 --> 02:31:22,412 Hello. 2838 02:31:22,412 --> 02:31:24,620 Why don't I have the-- 2839 02:31:24,620 --> 02:31:25,120 Yeah. 2840 02:31:25,120 --> 02:31:28,480 A 509, which is cause of death. 2841 02:31:28,480 --> 02:31:31,540 And that cause of death is another variable, 2842 02:31:31,540 --> 02:31:32,930 that I'm calling CID. 2843 02:31:32,930 --> 02:31:35,410 Now, previously we were saying you 2844 02:31:35,410 --> 02:31:36,850 know I want things that are named 2845 02:31:36,850 --> 02:31:39,550 after Gladstone specifically. 2846 02:31:39,550 --> 02:31:42,000 Only things that have that particular value. 2847 02:31:42,000 --> 02:31:44,320 Here I'm saying I'm looking for things 2848 02:31:44,320 --> 02:31:47,110 that have some cause of death. 
2849 02:31:47,110 --> 02:31:48,760 Not a specific one. 2850 02:31:48,760 --> 02:31:50,260 I just wanted to get everything that 2851 02:31:50,260 --> 02:31:54,880 has a statement with some value about property 509 2852 02:31:54,880 --> 02:31:56,530 cause of death. 2853 02:31:56,530 --> 02:31:57,940 OK? 2854 02:31:57,940 --> 02:32:04,410 And then this other bit of magic here, the group by, 2855 02:32:04,410 --> 02:32:07,870 tells Wikidata I'm not actually interested 2856 02:32:07,870 --> 02:32:09,100 in every individual thing. 2857 02:32:09,100 --> 02:32:12,310 I want you to group those causes, and then count them 2858 02:32:12,310 --> 02:32:14,230 and give me the top ones. 2859 02:32:14,230 --> 02:32:15,523 So that's how this query works. 2860 02:32:20,550 --> 02:32:22,320 Here's that query I promised. 2861 02:32:22,320 --> 02:32:26,460 Painters whose fathers were also painters. 2862 02:32:26,460 --> 02:32:28,630 I can only think of a couple. 2863 02:32:28,630 --> 02:32:31,890 I mean, Monet and Vogel. 2864 02:32:31,890 --> 02:32:34,800 But I'm sure Wikidata knows many more. 2865 02:32:34,800 --> 02:32:38,620 So let's run this query. 2866 02:32:38,620 --> 02:32:40,270 And I have 100 results. 2867 02:32:40,270 --> 02:32:43,120 By the way, I have limited it to 100 results just 2868 02:32:43,120 --> 02:32:44,650 to keep it kind of snappy. 2869 02:32:44,650 --> 02:32:47,530 But actually, we could maybe try removing the limit 2870 02:32:47,530 --> 02:32:50,170 and see if Wikidata could tell us 2871 02:32:50,170 --> 02:32:53,890 the total number in Wikidata. 2872 02:32:53,890 --> 02:32:55,120 Yeah, that wasn't too bad. 2873 02:32:55,120 --> 02:32:58,400 So 1,270 results. 2874 02:32:58,400 --> 02:32:59,140 OK. 2875 02:32:59,140 --> 02:33:04,150 Wikidata, already at this early date in its progress, 2876 02:33:04,150 --> 02:33:07,540 already knows about more than 1,200 painters 2877 02:33:07,540 --> 02:33:10,980 who are sons of painters.
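The cause-of-death query explained a moment ago (a human with some value for property 509, grouped and counted) might look roughly like the sketch below. The variable names `?pid` and `?cid` follow the talk; the `#defaultView:BubbleChart` directive, the English label lookup, the ordering, and the helper name `top_causes_query` are assumptions about how such a query is commonly written, not a copy of the query shown on screen.

```python
# Body of the query as plain SPARQL text. Lines starting with '#' inside the
# string are SPARQL comments; the first one is assumed to select the chart view.
_CAUSES_BODY = """\
#defaultView:BubbleChart
SELECT ?cid ?cause (COUNT(?pid) AS ?count) WHERE {
  ?pid wdt:P31 wd:Q5 .     # the item is an instance of human
  ?pid wdt:P509 ?cid .     # and has some cause of death, any value at all
  OPTIONAL {
    ?cid rdfs:label ?cause .
    FILTER (LANG(?cause) = "en")
  }
}
GROUP BY ?cid ?cause
ORDER BY DESC(?count)
"""


def top_causes_query(limit: int = 50) -> str:
    """Return the grouped cause-of-death query, capped at `limit` rows."""
    return _CAUSES_BODY + f"LIMIT {limit}\n"


if __name__ == "__main__":
    print(top_causes_query(20))
```

The key contrast with the Gladstone query is the second condition: the value position holds a variable rather than a fixed item, so any cause of death matches, and GROUP BY then collapses the people into one row per cause.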
2878 02:33:10,980 --> 02:33:16,140 Sons of male painters, like their father is a painter. 2879 02:33:16,140 --> 02:33:18,120 There may be additional painters who 2880 02:33:18,120 --> 02:33:21,390 are sons of female painters not included in this query. 2881 02:33:21,390 --> 02:33:24,990 Again, always remember what exactly you are asking. 2882 02:33:24,990 --> 02:33:27,840 In this query I was asking about the father. 2883 02:33:27,840 --> 02:33:30,330 I'm leaving out any possible painters who 2884 02:33:30,330 --> 02:33:32,720 are sons of mother painters. 2885 02:33:32,720 --> 02:33:33,390 OK? 2886 02:33:33,390 --> 02:33:35,250 So how does this work? 2887 02:33:35,250 --> 02:33:39,630 I'm asking for the painter along with the human label, 2888 02:33:39,630 --> 02:33:42,630 and the father along with the human label. 2889 02:33:42,630 --> 02:33:47,610 So Michel Monet is the son of Claude Monet. 2890 02:33:47,610 --> 02:33:54,180 And Domenico Tintoretto is the son of the famous Tintoretto 2891 02:33:54,180 --> 02:33:57,210 whose label, you know, is just Tintoretto like Michelangelo. 2892 02:33:57,210 --> 02:33:59,960 You know, you don't always have to have the full name 2893 02:33:59,960 --> 02:34:02,420 in the common label. 2894 02:34:02,420 --> 02:34:07,010 Paloma Picasso is the daughter of Pablo Picasso. 2895 02:34:07,010 --> 02:34:07,510 OK. 2896 02:34:07,510 --> 02:34:11,040 So Wikidata knows about all these results. 2897 02:34:11,040 --> 02:34:14,610 Of course Holbein the Younger son of Holbein the Elder. 2898 02:34:14,610 --> 02:34:15,760 And how did we get there? 2899 02:34:15,760 --> 02:34:20,860 Well we asked Wikidata to look for something, 2900 02:34:20,860 --> 02:34:26,820 let's call it painter, which has 106, which is occupation, 2901 02:34:26,820 --> 02:34:31,100 with a value painter. 2902 02:34:31,100 --> 02:34:31,600 Right? 2903 02:34:31,600 --> 02:34:35,310 This unwieldy number 1028181, that's painter. 
2904 02:34:35,310 --> 02:34:40,250 So I'm asking for any item that has occupation painter. 2905 02:34:40,250 --> 02:34:43,300 And let's call that item painter. 2906 02:34:43,300 --> 02:34:49,770 I also want that painter to have a property 22, which is father. 2907 02:34:49,770 --> 02:34:50,850 OK. 2908 02:34:50,850 --> 02:34:52,350 Father. 2909 02:34:52,350 --> 02:34:55,140 And I want it to have some value. 2910 02:34:55,140 --> 02:34:58,770 OK, I'm putting it into another variable called father. 2911 02:34:58,770 --> 02:35:01,320 I could have called it, you know, frog. 2912 02:35:01,320 --> 02:35:04,230 That doesn't change anything, just to be clear. 2913 02:35:04,230 --> 02:35:06,630 What matters is that this is the property father. 2914 02:35:06,630 --> 02:35:10,320 I could have called it anything I want. 2915 02:35:10,320 --> 02:35:13,590 So, and then, I have a third condition. 2916 02:35:13,590 --> 02:35:18,010 That the father, like whatever it says here in property 22, 2917 02:35:18,010 --> 02:35:22,590 I want that father to have himself a property 106 2918 02:35:22,590 --> 02:35:27,750 occupation with a value painter. 2919 02:35:27,750 --> 02:35:28,730 OK? 2920 02:35:28,730 --> 02:35:30,800 These conditions combined to give me 2921 02:35:30,800 --> 02:35:36,080 a list of people who have a father and that father 2922 02:35:36,080 --> 02:35:37,850 has occupation painter as well. 2923 02:35:37,850 --> 02:35:40,550 Of course, if I suddenly, or if you suddenly, 2924 02:35:40,550 --> 02:35:44,480 are consumed by curiosity to know 2925 02:35:44,480 --> 02:35:51,344 who are some politicians who are sons of carpenters? 2926 02:35:51,344 --> 02:35:52,760 You could just change that, right? 2927 02:35:52,760 --> 02:35:56,700 Change the first value from painter to politician. 2928 02:35:56,700 --> 02:36:02,624 Change the third line's value from painter to carpenter. 
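The three conditions just described, and the idea of swapping the occupation values or the label language, can be sketched as a small query builder. P106 (occupation), P22 (father) and Q1028181 (painter) are the numbers read out in the talk; the function name, the parameter names and the label service line are hypothetical choices for this illustration.

```python
def child_and_father_query(child_occupation: str,
                           father_occupation: str,
                           language: str = "en",
                           limit: int = 100) -> str:
    """Return SPARQL text: people with one occupation whose father has another."""
    lines = [
        "SELECT ?painter ?painterLabel ?father ?fatherLabel WHERE {",
        # Condition 1: the item has occupation (P106) with the first value.
        f"  ?painter wdt:P106 wd:{child_occupation} .",
        # Condition 2: it has a father (P22); the variable name is arbitrary.
        "  ?painter wdt:P22 ?father .",
        # Condition 3: that father has occupation (P106) with the second value.
        f"  ?father wdt:P106 wd:{father_occupation} .",
        # Labels in the requested language.
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}',
        "}",
        f"LIMIT {limit}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Painters whose fathers were painters, labels in English.
    print(child_and_father_query("Q1028181", "Q1028181"))
```

Changing the first argument to the item for politician and the second to the item for carpenter would give the politicians-and-carpenters variant; those Q numbers are not given in the talk and would need to be looked up.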
2929 02:36:02,624 --> 02:36:04,040 Maybe that list will be very short 2930 02:36:04,040 --> 02:36:06,680 because carpenters don't tend to be notable, 2931 02:36:06,680 --> 02:36:08,910 so they wouldn't be represented on Wikidata. 2932 02:36:08,910 --> 02:36:11,990 That's why this works relatively well with painters, right? 2933 02:36:11,990 --> 02:36:14,420 Because most of them are notable. 2934 02:36:14,420 --> 02:36:16,370 But generally you could do that, right? 2935 02:36:16,370 --> 02:36:18,500 That's an example of how you can take a query 2936 02:36:18,500 --> 02:36:22,340 and just replace one of those values, or even the language. 2937 02:36:22,340 --> 02:36:26,840 So again, I could ask for these same painters. 2938 02:36:26,840 --> 02:36:27,650 It's limited again. 2939 02:36:27,650 --> 02:36:31,190 These same painters, but with Arabic labels. 2940 02:36:31,190 --> 02:36:34,880 Same query, but I have Arabic labels for these painters. 2941 02:36:34,880 --> 02:36:37,250 And of course where there is no Arabic label 2942 02:36:37,250 --> 02:36:40,360 I get the Q number. 2943 02:36:40,360 --> 02:36:40,860 OK? 2944 02:36:40,860 --> 02:36:43,650 So that's that query that I promised you, 2945 02:36:43,650 --> 02:36:47,670 painters who are sons of painters, can be done by Wikidata 2946 02:36:47,670 --> 02:36:49,830 in under one second. 2947 02:36:49,830 --> 02:36:51,480 How awesome is that? 2948 02:36:51,480 --> 02:36:52,950 We can also get some statistics. 2949 02:36:52,950 --> 02:36:55,920 So how about counting total articles 2950 02:36:55,920 --> 02:36:59,740 in a given wiki by gender. 2951 02:36:59,740 --> 02:37:02,070 This is what we call the content gender 2952 02:37:02,070 --> 02:37:06,900 gap, as distinct from the participation gender gap. 2953 02:37:06,900 --> 02:37:10,276 This is the gender gap in what we cover on Wikipedia. 2954 02:37:10,276 --> 02:37:11,400 So let's take one of these. 2955 02:37:16,380 --> 02:37:17,630 So this is a query.
2956 02:37:17,630 --> 02:37:23,130 Articles about women in some given Wikipedia. 2957 02:37:23,130 --> 02:37:23,660 All right. 2958 02:37:23,660 --> 02:37:25,799 So let's take-- 2959 02:37:25,799 --> 02:37:26,340 I don't know. 2960 02:37:26,340 --> 02:37:30,240 Let's take the Tamil Wikipedia. 2961 02:37:30,240 --> 02:37:32,460 That's language code TA. 2962 02:37:32,460 --> 02:37:34,950 So I just put TA here. 2963 02:37:34,950 --> 02:37:38,850 And I click Run, and I get this count. 2964 02:37:38,850 --> 02:37:39,960 That's all I wanted. 2965 02:37:39,960 --> 02:37:41,720 I'm not actually interested in the items, 2966 02:37:41,720 --> 02:37:44,962 like in the list of women on the Tamil Wikipedia. 2967 02:37:44,962 --> 02:37:45,920 I just want the number. 2968 02:37:45,920 --> 02:37:48,510 So I selected the count here. 2969 02:37:48,510 --> 02:37:52,610 And this number turns out to be 2,159. 2970 02:37:52,610 --> 02:37:57,300 So there are about 2,000 articles about women 2971 02:37:57,300 --> 02:38:02,350 in the Tamil Wikipedia that Wikidata knows to be female. 2972 02:38:02,350 --> 02:38:02,850 Right? 2973 02:38:02,850 --> 02:38:05,730 I'm asking about the gender field, property 21 again. 2974 02:38:05,730 --> 02:38:08,900 Remember, if there's some article about a woman in Tamil 2975 02:38:08,900 --> 02:38:12,090 Wikipedia, but Wikidata doesn't have 2976 02:38:12,090 --> 02:38:14,460 a statement about the gender, that person 2977 02:38:14,460 --> 02:38:15,640 will not be counted here. 2978 02:38:15,640 --> 02:38:18,240 So again, be careful about kind of stating 2979 02:38:18,240 --> 02:38:22,800 that is exactly the number of women articles on Tamil 2980 02:38:22,800 --> 02:38:23,340 Wikipedia. 2981 02:38:23,340 --> 02:38:24,600 That's probably not true. 2982 02:38:24,600 --> 02:38:27,560 I'm sure some of those articles are missing 2983 02:38:27,560 --> 02:38:30,740 a sex or gender property.
2984 02:38:30,740 --> 02:38:33,150 But for raw statistics, that's probably good, 2985 02:38:33,150 --> 02:38:35,700 because some men are also missing the sex or gender 2986 02:38:35,700 --> 02:38:37,620 property. 2987 02:38:37,620 --> 02:38:41,820 So we could take the same query for men. 2988 02:38:41,820 --> 02:38:43,170 It's essentially the exact same. 2989 02:38:43,170 --> 02:38:48,840 It just has this unwieldy number for males, 6581097. 2990 02:38:48,840 --> 02:38:52,710 I can change this language code again to TA for Tamil. 2991 02:38:52,710 --> 02:38:58,880 And how many men are covered on Tamil Wikipedia? 14,649. 2992 02:38:58,880 --> 02:38:59,610 OK. 2993 02:38:59,610 --> 02:39:06,880 So women, 2,100, men, about seven times as many. 2994 02:39:06,880 --> 02:39:07,380 Right? 2995 02:39:07,380 --> 02:39:12,300 So that's the approximate size of the content gender 2996 02:39:12,300 --> 02:39:14,610 gap on Tamil Wikipedia. 2997 02:39:14,610 --> 02:39:18,850 And again, I can complicate this query as much as I want. 2998 02:39:18,850 --> 02:39:21,390 For example, I can try and find out 2999 02:39:21,390 --> 02:39:30,390 if this gender gap is wider or narrower among musicians, 3000 02:39:30,390 --> 02:39:31,350 just as an example. 3001 02:39:31,350 --> 02:39:35,850 I could just add a line here that says occupation musician, 3002 02:39:35,850 --> 02:39:37,890 and then I'm only counting articles 3003 02:39:37,890 --> 02:39:41,190 on Tamil Wikipedia about musicians who are female 3004 02:39:41,190 --> 02:39:43,190 versus articles on Tamil Wikipedia 3005 02:39:43,190 --> 02:39:45,030 about musicians who are male. 3006 02:39:45,030 --> 02:39:47,890 And I can kind of compare the gender-- 3007 02:39:47,890 --> 02:39:53,820 the content gender gap across occupations on Tamil Wikipedia. 3008 02:39:53,820 --> 02:39:56,030 Do you see the important point here? 3009 02:39:56,030 --> 02:39:58,490 It's that this is not just kind of a one-purpose query.
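A hedged sketch of the counting query described above, with the optional extra occupation line. P21 (sex or gender) and Q6581097 (male) are the numbers mentioned in the talk; Q6581072 is assumed to be the item for female, and the `schema:about` / `schema:isPartOf` pattern is assumed to be how the query checks that an item has an article on a given Wikipedia. The function name and parameters are hypothetical.

```python
from typing import Optional

FEMALE = "Q6581072"  # assumption: the Wikidata item for female
MALE = "Q6581097"    # the number read out in the talk for male


def count_by_gender_query(gender_id: str,
                          wiki_lang: str,
                          occupation_id: Optional[str] = None) -> str:
    """Return SPARQL text counting humans of one gender with an article on one wiki."""
    lines = [
        "SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {",
        "  ?person wdt:P31 wd:Q5 .",
        f"  ?person wdt:P21 wd:{gender_id} .",
    ]
    if occupation_id is not None:
        # The single additional condition that narrows the count by occupation.
        lines.append(f"  ?person wdt:P106 wd:{occupation_id} .")
    lines += [
        # The person must have an article on the chosen language's Wikipedia.
        "  ?article schema:about ?person .",
        f"  ?article schema:isPartOf <https://{wiki_lang}.wikipedia.org/> .",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(count_by_gender_query(FEMALE, "ta"))
    print(count_by_gender_query(MALE, "ta"))
```

Running both variants and dividing the counts gives the rough ratio quoted in the talk; passing an occupation item as the third argument restricts both counts to that occupation.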
3010 02:39:58,490 --> 02:40:01,250 I can just with a single additional conditional suddenly 3011 02:40:01,250 --> 02:40:04,370 make it a much more interesting query, because I break it down 3012 02:40:04,370 --> 02:40:05,540 by occupation. 3013 02:40:05,540 --> 02:40:07,810 Or I break it down by century. 3014 02:40:07,810 --> 02:40:12,530 Do we have more of the coverage gap in 19th century people 3015 02:40:12,530 --> 02:40:13,940 than in 21st century people? 3016 02:40:13,940 --> 02:40:15,560 I mean, I sure hope so, right? 3017 02:40:15,560 --> 02:40:18,480 The patriarchy is weakening somewhat. 3018 02:40:18,480 --> 02:40:21,830 So I wouldn't be surprised if there are many more notable men 3019 02:40:21,830 --> 02:40:23,430 covered about the 19th century. 3020 02:40:23,430 --> 02:40:25,784 But if we are also covering-- 3021 02:40:25,784 --> 02:40:27,200 I mean, if the gender gap is just 3022 02:40:27,200 --> 02:40:29,540 as wide for 21st century people, that would 3023 02:40:29,540 --> 02:40:30,800 be a little disappointing. 3024 02:40:30,800 --> 02:40:35,870 Again that's something I can fairly easily find out 3025 02:40:35,870 --> 02:40:38,980 on Wikidata query. 3026 02:40:38,980 --> 02:40:41,500 Any questions so far, or are you just sharing links? 3027 02:40:41,500 --> 02:40:43,160 AUDIENCE: Yep there is one. 3028 02:40:43,160 --> 02:40:47,480 So somebody is wondering if you can demonstrate, or at least 3029 02:40:47,480 --> 02:40:50,420 give a short answer of the latter of this question. 3030 02:40:50,420 --> 02:40:52,530 Is it possible using Wikidata SPARQL 3031 02:40:52,530 --> 02:40:55,520 to find specific Wikipedia articles, e.g. 3032 02:40:55,520 --> 02:40:59,060 featured articles, of a certain language which do not 3033 02:40:59,060 --> 02:41:01,160 exist in another language. 3034 02:41:01,160 --> 02:41:03,770 I know it is possible to find category based 3035 02:41:03,770 --> 02:41:05,820 results using the PetScan tool.
3036 02:41:05,820 --> 02:41:09,110 But can we specify that by selecting e.g. 3037 02:41:09,110 --> 02:41:10,055 featured articles? 3038 02:41:10,055 --> 02:41:11,390 ASAF BARTOV: Yes. 3039 02:41:11,390 --> 02:41:12,600 Excellent question. 3040 02:41:12,600 --> 02:41:14,120 It is possible, indeed. 3041 02:41:14,120 --> 02:41:17,570 And I will demonstrate one such query. 3042 02:41:17,570 --> 02:41:19,190 Another query that I already mentioned 3043 02:41:19,190 --> 02:41:24,840 largest cities in the world with a female mayor. 3044 02:41:24,840 --> 02:41:29,190 This query-- let's close some of these tabs 3045 02:41:29,190 --> 02:41:30,315 before my browser chokes. 3046 02:41:33,600 --> 02:41:36,840 So this query lists the major world cities 3047 02:41:36,840 --> 02:41:39,120 run by women currently. 3048 02:41:39,120 --> 02:41:45,650 And the answer is Mumbai, Mexico City, Tokyo, bunch of others. 3049 02:41:49,470 --> 02:41:52,371 And wait-- that's not it at all. 3050 02:41:52,371 --> 02:41:53,370 I clicked the wrong one. 3051 02:41:53,370 --> 02:41:55,050 That's the map of paintings. 3052 02:41:55,050 --> 02:41:55,800 OK. 3053 02:41:55,800 --> 02:41:57,370 Let's demonstrate that for a second. 3054 02:41:57,370 --> 02:41:59,520 So this is the map of all paintings 3055 02:41:59,520 --> 02:42:03,870 for which we know a location with the count per location. 3056 02:42:03,870 --> 02:42:07,770 And the results are awesomely presented on a map. 3057 02:42:07,770 --> 02:42:08,830 OK. 3058 02:42:08,830 --> 02:42:12,420 Again, under the hood this is a table, of course, of results. 3059 02:42:12,420 --> 02:42:15,660 But, awesomely, I can browse it as a map. 3060 02:42:15,660 --> 02:42:20,320 So here is a map of the world with all the paintings 3061 02:42:20,320 --> 02:42:22,060 that Wikidata knows about. 3062 02:42:22,060 --> 02:42:23,920 Not just knows about the paintings, 3063 02:42:23,920 --> 02:42:28,180 but knows about their location in a museum. 
3064 02:42:28,180 --> 02:42:30,670 Not surprisingly Europe is much better 3065 02:42:30,670 --> 02:42:35,540 covered than Russia or Africa. 3066 02:42:35,540 --> 02:42:40,150 There is a huge gap in contribution to Wikidata 3067 02:42:40,150 --> 02:42:41,740 from these countries. 3068 02:42:41,740 --> 02:42:43,780 And some of it can be fixed. 3069 02:42:43,780 --> 02:42:47,740 And of course there is much more documentation, and much more 3070 02:42:47,740 --> 02:42:50,260 art in Europe. 3071 02:42:50,260 --> 02:42:54,280 But if we zoom in, I don't know, Rome probably 3072 02:42:54,280 --> 02:42:55,900 has a few paintings. 3073 02:42:55,900 --> 02:42:56,400 Right? 3074 02:43:00,080 --> 02:43:02,288 Hello. 3075 02:43:02,288 --> 02:43:04,200 Sorry. 3076 02:43:04,200 --> 02:43:09,780 It's-- Yes. 3077 02:43:09,780 --> 02:43:13,290 Vatican City sounds like a good bet, right? 3078 02:43:13,290 --> 02:43:14,290 I can zoom in here. 3079 02:43:14,290 --> 02:43:16,290 And I can just click one of these dots 3080 02:43:16,290 --> 02:43:21,400 and see in this point there are two paintings. 3081 02:43:21,400 --> 02:43:25,270 And in this one there is one and it's the Archbasilica 3082 02:43:25,270 --> 02:43:27,460 of St. John Lateran. 3083 02:43:27,460 --> 02:43:31,060 Let's see, this is the actual St. Peter, right? 3084 02:43:31,060 --> 02:43:33,650 Sistine Chapel has 23 paintings. 3085 02:43:33,650 --> 02:43:34,330 What? 3086 02:43:34,330 --> 02:43:36,670 The Sistine Chapel has way more than 23 paintings. 3087 02:43:36,670 --> 02:43:40,330 Correct, but 23 of them are documented on Wikidata. 3088 02:43:40,330 --> 02:43:43,330 Have their own item for the painting, not 3089 02:43:43,330 --> 02:43:45,280 the Sistine Chapel, the painting has 3090 02:43:45,280 --> 02:43:49,540 an item that lists its being in the Sistine Chapel. 3091 02:43:49,540 --> 02:43:50,950 There are 23 of those. 3092 02:43:50,950 --> 02:43:52,270 OK. 
3093 02:43:52,270 --> 02:43:54,310 There is definitely room to document 3094 02:43:54,310 --> 02:43:57,040 the rest of the artworks in the Sistine Chapel. 3095 02:43:57,040 --> 02:43:59,740 So, again, this is just not the kind of query 3096 02:43:59,740 --> 02:44:03,330 you were able to make before Wikidata, 3097 02:44:03,330 --> 02:44:07,750 and it's a fairly simple query, as you can see. 3098 02:44:07,750 --> 02:44:13,020 There are examples using maps like airports within 100 3099 02:44:13,020 --> 02:44:15,040 kilometers of Berlin. 3100 02:44:15,040 --> 02:44:18,310 Again using the coordinates as a useful data point. 3101 02:44:18,310 --> 02:44:21,880 And here is a map showing me only airports within a 100 3102 02:44:21,880 --> 02:44:25,990 kilometer radius from Berlin. 3103 02:44:25,990 --> 02:44:29,140 But I wanted to show you the mayors query. 3104 02:44:29,140 --> 02:44:34,510 Let's click the-- oh I just have the wrong link here. 3105 02:44:34,510 --> 02:44:41,040 But I can still find it here by typing mayor. 3106 02:44:41,040 --> 02:44:44,590 Here we go, largest cities with female mayor. 3107 02:44:44,590 --> 02:44:47,230 So this is a slightly more complicated query. 3108 02:44:47,230 --> 02:44:53,010 But if I run it, I get the top 10, because I set limit to 10. 3109 02:44:53,010 --> 02:44:54,820 I get the top 10 cities in the world, 3110 02:44:54,820 --> 02:44:59,710 by population size, that are currently run by women. 3111 02:44:59,710 --> 02:45:03,490 Tokyo, Mumbai, Yokohama, Caracas, et cetera. 3112 02:45:03,490 --> 02:45:08,080 And one interesting thing that you may want to notice here 3113 02:45:08,080 --> 02:45:10,690 is that I'm asking for cities. 3114 02:45:10,690 --> 02:45:13,660 I mean items, that are instance of city.
3115 02:45:13,660 --> 02:45:16,420 And that have a head of government, 3116 02:45:16,420 --> 02:45:18,640 that have some statement about who 3117 02:45:18,640 --> 02:45:28,440 is in charge, and that statement has sex that's listed up here 3118 02:45:28,440 --> 02:45:29,886 as female. 3119 02:45:29,886 --> 02:45:31,510 Don't worry about the syntax right now. 3120 02:45:31,510 --> 02:45:34,590 I just want to show you some specific angle here. 3121 02:45:34,590 --> 02:45:37,920 And I'm further filtering these results. 3122 02:45:37,920 --> 02:45:45,400 I only want those items where there is not the property 3123 02:45:45,400 --> 02:45:48,630 and the qualifier, end time. 3124 02:45:48,630 --> 02:45:50,390 Why is that important? 3125 02:45:50,390 --> 02:45:56,530 Because if a city once had a female mayor, 3126 02:45:56,530 --> 02:45:59,890 but that mayor is not the mayor anymore, because mayors change, 3127 02:45:59,890 --> 02:46:01,600 I don't want them in this query. 3128 02:46:01,600 --> 02:46:04,990 I want a query of cities currently having 3129 02:46:04,990 --> 02:46:05,680 a female mayor. 3130 02:46:05,680 --> 02:46:07,990 And of course Wikidata may have historical data 3131 02:46:07,990 --> 02:46:09,880 with start and end time, as we've 3132 02:46:09,880 --> 02:46:14,530 seen, that documents this person was the mayor of Tokyo 3133 02:46:14,530 --> 02:46:17,170 or San Francisco between these years. 3134 02:46:17,170 --> 02:46:18,820 But if there is no end time, that means 3135 02:46:18,820 --> 02:46:21,520 they are currently the mayor. 3136 02:46:21,520 --> 02:46:24,490 So that's an example of asking about a qualifier 3137 02:46:24,490 --> 02:46:28,180 of a statement, to again, to get the results we actually want. 3138 02:46:28,180 --> 02:46:31,630 If we want current mayors it's important to put this filter. 3139 02:46:31,630 --> 02:46:35,365 If we don't, we will get historical female mayors 3140 02:46:35,365 --> 02:46:35,865 as well.
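The qualifier filter described above can be sketched like this. The property and item numbers are assumptions not stated in the talk: P6 for head of government, P582 for end time, P1082 for population, Q515 for city and Q6581072 for female. The `p:` / `ps:` / `pq:` prefixes are the usual way to reach a statement and its qualifiers; the helper name is hypothetical, and the query on screen may differ in detail.

```python
# SPARQL text; lines starting with '#' inside the string are SPARQL comments.
_MAYORS_BODY = """\
SELECT DISTINCT ?city ?cityLabel ?mayor ?mayorLabel ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 .   # instance of city (or a subclass of it)
  ?city p:P6 ?statement .             # a head-of-government statement
  ?statement ps:P6 ?mayor .           # the person that statement names
  ?mayor wdt:P21 wd:Q6581072 .        # whose sex or gender is female
  # Keep only statements without an end time qualifier, i.e. current ones.
  FILTER NOT EXISTS { ?statement pq:P582 ?end . }
  ?city wdt:P1082 ?population .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?population)
"""


def largest_cities_with_female_mayor(limit: int = 10) -> str:
    """Return the query text, capped at `limit` cities by population."""
    return _MAYORS_BODY + f"LIMIT {limit}\n"


if __name__ == "__main__":
    print(largest_cities_with_female_mayor())
```

Deleting the FILTER NOT EXISTS line turns this into the historical version, where any city that ever had a female head of government recorded on Wikidata can appear.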
3141 02:46:39,920 --> 02:46:40,490 All right. 3142 02:46:40,490 --> 02:46:45,380 So these are some example queries. 3143 02:46:45,380 --> 02:46:49,085 Questions about that? 3144 02:46:51,620 --> 02:46:53,030 Oh, the featured article example. 3145 02:46:58,280 --> 02:47:01,700 So let's look at that. 3146 02:47:07,050 --> 02:47:12,660 So I have prepared such a query recently. 3147 02:47:12,660 --> 02:47:15,300 Here we go. 3148 02:47:15,300 --> 02:47:18,570 So this is a query. 3149 02:47:18,570 --> 02:47:20,472 I just saved it here on my user page. 3150 02:47:20,472 --> 02:47:21,930 I mean, this is not Wikidata query. 3151 02:47:21,930 --> 02:47:25,390 This is just a Meta page containing the query usefully. 3152 02:47:28,260 --> 02:47:33,800 And let's run this. 3153 02:47:33,800 --> 02:47:38,030 So this query, it's actually not very complicated. 3154 02:47:38,030 --> 02:47:40,030 It just has a long list of countries, 3155 02:47:40,030 --> 02:47:42,170 because I'm asking about African countries. 3156 02:47:42,170 --> 02:47:42,670 OK. 3157 02:47:42,670 --> 02:47:45,010 I'm looking for human females from one 3158 02:47:45,010 --> 02:47:51,060 of these countries that have an article in English. 3159 02:47:51,060 --> 02:47:53,010 That's what this line means. 3160 02:47:53,010 --> 02:47:55,620 But not in French. 3161 02:47:55,620 --> 02:47:57,570 That's what this part means. 3162 02:47:57,570 --> 02:47:59,170 OK. 3163 02:47:59,170 --> 02:48:01,720 This part, these two lines together. 3164 02:48:01,720 --> 02:48:03,190 But not in French. 3165 02:48:03,190 --> 02:48:05,920 And this is what's called a badge. 3166 02:48:05,920 --> 02:48:09,430 That's Wikidata's concept of good and featured articles. 3167 02:48:09,430 --> 02:48:10,600 It's called a badge. 3168 02:48:10,600 --> 02:48:16,500 So I want them to have some badge on English Wikipedia. 3169 02:48:16,500 --> 02:48:17,000 OK?
3170 02:48:17,000 --> 02:48:22,250 So again, this query is asking for the top 100 women 3171 02:48:22,250 --> 02:48:26,150 from Africa who are documented on English Wikipedia, 3172 02:48:26,150 --> 02:48:28,730 in a featured or good article status. 3173 02:48:28,730 --> 02:48:30,660 But not on French Wikipedia. 3174 02:48:30,660 --> 02:48:33,270 So this is a query that's a to-do query, right? 3175 02:48:33,270 --> 02:48:35,630 That's a query for French editors 3176 02:48:35,630 --> 02:48:40,100 to consider what they might usefully translate or create 3177 02:48:40,100 --> 02:48:41,180 in French. 3178 02:48:41,180 --> 02:48:48,860 And if we run this, see, we have three results. 3179 02:48:48,860 --> 02:48:50,720 I mean, we have many women from Africa 3180 02:48:50,720 --> 02:48:52,460 covered on English Wikipedia. 3181 02:48:52,460 --> 02:48:57,500 But only three articles have featured or good status 3182 02:48:57,500 --> 02:49:03,460 among those that do not have French Wikipedia coverage. 3183 02:49:03,460 --> 02:49:04,900 Let me rephrase that. 3184 02:49:04,900 --> 02:49:07,990 Among the English Wikipedia articles about African women 3185 02:49:07,990 --> 02:49:11,170 that don't have a French counterpart, 3186 02:49:11,170 --> 02:49:14,520 only three are featured or good. 3187 02:49:14,520 --> 02:49:16,960 OK? 3188 02:49:16,960 --> 02:49:17,640 Do you see this? 3189 02:49:17,640 --> 02:49:19,720 The badge is good article. 3190 02:49:19,720 --> 02:49:23,550 This little incantation here is what allows 3191 02:49:23,550 --> 02:49:25,950 you to ask about the badge. 3192 02:49:25,950 --> 02:49:28,730 This here. 3193 02:49:28,730 --> 02:49:33,420 And, by the way, the slides will be uploaded to Commons. 3194 02:49:33,420 --> 02:49:38,708 And we will-- how shall we make it available on the YouTube 3195 02:49:38,708 --> 02:49:39,710 thing as well? 3196 02:49:42,730 --> 02:49:43,230 No, no.
3197 02:49:43,230 --> 02:49:45,870 But, I mean, for people who will later watch this video. 3198 02:49:52,119 --> 02:49:54,160 Oh yeah, we can add it to the YouTube description 3199 02:49:54,160 --> 02:49:55,368 and the Commons description. 3200 02:49:55,368 --> 02:49:58,090 So in the-- if you're watching this video later, 3201 02:49:58,090 --> 02:50:00,820 in the description, we will add a link to this query 3202 02:50:00,820 --> 02:50:01,480 specifically. 3203 02:50:01,480 --> 02:50:03,340 Because it's not in the slides right now. 3204 02:50:03,340 --> 02:50:03,910 It will be. 3205 02:50:06,622 --> 02:50:07,980 OK. 3206 02:50:07,980 --> 02:50:10,260 So. 3207 02:50:10,260 --> 02:50:13,590 Questions so far? 3208 02:50:13,590 --> 02:50:14,700 We're almost done. 3209 02:50:14,700 --> 02:50:16,260 We have a few minutes left. 3210 02:50:16,260 --> 02:50:18,090 So questions about queries? 3211 02:50:18,090 --> 02:50:20,130 I mean, I'm sure there's tons of things 3212 02:50:20,130 --> 02:50:21,510 you don't know how to do yet. 3213 02:50:21,510 --> 02:50:24,720 And maybe you didn't really get the sense for SPARQL. 3214 02:50:24,720 --> 02:50:27,120 It's something you need to really do on your own 3215 02:50:27,120 --> 02:50:28,290 on your computer. 3216 02:50:28,290 --> 02:50:29,465 See how it works. 3217 02:50:29,465 --> 02:50:30,090 Fiddle with it. 3218 02:50:30,090 --> 02:50:30,900 Change something. 3219 02:50:30,900 --> 02:50:33,270 See that it breaks and complains. 3220 02:50:33,270 --> 02:50:37,470 But, very importantly-- oh, I had this in the other questions 3221 02:50:37,470 --> 02:50:38,340 slide. 3222 02:50:38,340 --> 02:50:42,480 Remember Wikidata project chat. 3223 02:50:42,480 --> 02:50:45,810 That's kind of the Wikidata equivalent of the village pump. 3224 02:50:45,810 --> 02:50:47,790 It's the page on Wikidata where you can just 3225 02:50:47,790 --> 02:50:49,830 show up and ask a question.
3226 02:50:49,830 --> 02:50:52,290 In my experience, the Wikidata community 3227 02:50:52,290 --> 02:50:55,410 is very nice, very welcoming, and very eager 3228 02:50:55,410 --> 02:51:00,100 to help newer people integrate and learn how to do things. 3229 02:51:00,100 --> 02:51:01,800 There's also an IRC channel. 3230 02:51:01,800 --> 02:51:04,260 If you know what IRC is and how to use it, by all means, 3231 02:51:04,260 --> 02:51:07,890 go to IRC channel Wikidata. 3232 02:51:07,890 --> 02:51:09,330 There's people there all the time, 3233 02:51:09,330 --> 02:51:11,040 and you can just ask a question. 3234 02:51:11,040 --> 02:51:13,245 If you're trying to do a query, and you don't quite 3235 02:51:13,245 --> 02:51:15,870 understand the syntax, or you're not sure how to get the result 3236 02:51:15,870 --> 02:51:16,680 you want, 3237 02:51:16,680 --> 02:51:20,050 there are people there who will gladly help you do that. 3238 02:51:20,050 --> 02:51:22,560 There is also a Wikidata newsletter 3239 02:51:22,560 --> 02:51:25,680 published by the Wikidata team, which is centered in Germany 3240 02:51:25,680 --> 02:51:27,330 and Wikimedia Germany. 3241 02:51:27,330 --> 02:51:31,890 And they send out a newsletter in English with Wikidata news. 3242 02:51:31,890 --> 02:51:33,570 You know, new properties, new items, 3243 02:51:33,570 --> 02:51:34,920 new things in the project. 3244 02:51:34,920 --> 02:51:36,840 But also sample queries. 3245 02:51:36,840 --> 02:51:39,300 So once a week there is kind of an awesome query 3246 02:51:39,300 --> 02:51:43,440 to learn from, if you want to learn that way instead 3247 02:51:43,440 --> 02:51:46,230 of reading like a whole manual on SPARQL. 3248 02:51:46,230 --> 02:51:48,300 So I'm just encouraging you to get help 3249 02:51:48,300 --> 02:51:49,470 in one of those channels. 3250 02:51:49,470 --> 02:51:51,000 Of course, you can write to me. 3251 02:51:51,000 --> 02:51:55,920 Just reach out to me and ask me questions as well.
3252 02:51:55,920 --> 02:51:58,860 I hope by now you agree that Wikidata is love, 3253 02:51:58,860 --> 02:52:03,150 and Wikidata data is awesome. 3254 02:52:03,150 --> 02:52:06,480 If there are no questions, we do have a tiny bit of time 3255 02:52:06,480 --> 02:52:11,510 to demonstrate one more tool, but that's-- 3256 02:52:11,510 --> 02:52:12,010 no? 3257 02:52:12,010 --> 02:52:13,170 No questions. 3258 02:52:13,170 --> 02:52:17,600 OK, so let's talk about-- 3259 02:52:17,600 --> 02:52:19,100 well, the Reasonator is kind of nice, 3260 02:52:19,100 --> 02:52:22,890 but it's a little like the article placeholder. 3261 02:52:22,890 --> 02:52:25,530 So this is not Wikidata. This is a tool, again, 3262 02:52:25,530 --> 02:52:26,805 built by Magnus Manske-- 3263 02:52:26,805 --> 02:52:29,310 AUDIENCE: There's also one final question to you in case-- 3264 02:52:29,310 --> 02:52:29,820 ASAF BARTOV: Oh, there is a question. 3265 02:52:29,820 --> 02:52:30,390 AUDIENCE: Yeah. 3266 02:52:30,390 --> 02:52:32,348 ASAF BARTOV: What advantages and disadvantages 3267 02:52:32,348 --> 02:52:35,370 are there to creating an item before an article is 3268 02:52:35,370 --> 02:52:37,920 done on English Wikipedia? 3269 02:52:37,920 --> 02:52:42,340 Well, I mean, this example that I just made, right? 3270 02:52:42,340 --> 02:52:46,960 I'm reading this book by a notable author. 3271 02:52:46,960 --> 02:52:47,810 OK. 3272 02:52:47,810 --> 02:52:51,400 I want this to exist on Wikidata, 3273 02:52:51,400 --> 02:52:53,320 and to be mentioned on Wikidata, so 3274 02:52:53,320 --> 02:52:56,950 that when people look up that author in Wikidata 3275 02:52:56,950 --> 02:52:59,170 they will know about one of his notable works. 3276 02:52:59,170 --> 02:53:02,470 But I'm not prepared to put in the time investment 3277 02:53:02,470 --> 02:53:05,670 to build a whole article on English Wikipedia.
3278 02:53:05,670 --> 02:53:07,420 Either because I don't have the time, or I 3279 02:53:07,420 --> 02:53:09,040 don't have good sources. 3280 02:53:09,040 --> 02:53:11,560 Or maybe my English is not good enough, 3281 02:53:11,560 --> 02:53:14,980 but it is good enough to just record these very basic facts 3282 02:53:14,980 --> 02:53:17,850 and point to the Library of Congress records, et cetera. 3283 02:53:17,850 --> 02:53:20,170 So that it's better than nothing. 3284 02:53:20,170 --> 02:53:23,170 So that's one reason to maybe do it. 3285 02:53:23,170 --> 02:53:26,690 Another reason is to be able to link to it. 3286 02:53:26,690 --> 02:53:30,190 So remember that translator lady already 3287 02:53:30,190 --> 02:53:33,280 had an item on Wikidata, but if she hadn't, we could have just 3288 02:53:33,280 --> 02:53:38,560 created a very, very basic rudimentary item about her just 3289 02:53:38,560 --> 02:53:41,740 saying, you know, this name is human. 3290 02:53:41,740 --> 02:53:43,060 Country, Bulgaria. 3291 02:53:43,060 --> 02:53:45,220 Occupation, translator. 3292 02:53:45,220 --> 02:53:48,580 Even just that would have been something, 3293 02:53:48,580 --> 02:53:51,610 and would have enabled me to link to this person. 3294 02:53:51,610 --> 02:53:56,860 So these are legitimate reasons to create Wikidata entities 3295 02:53:56,860 --> 02:54:01,510 without, or at least before, creating a Wikipedia article. 3296 02:54:01,510 --> 02:54:02,709 If you are going to create-- 3297 02:54:02,709 --> 02:54:04,750 I mean, if you're at an edit-a-thon or something, 3298 02:54:04,750 --> 02:54:07,690 and you have come to create Wikipedia articles, 3299 02:54:07,690 --> 02:54:10,660 by all means, first create the Wikipedia article, 3300 02:54:10,660 --> 02:54:13,982 then create the Wikidata item and link to it. 3301 02:54:17,580 --> 02:54:20,500 I hope that answers the question.
3302 02:54:20,500 --> 02:54:24,940 So the Reasonator is simply a kind 3303 02:54:24,940 --> 02:54:31,330 of prettier view of items in Wikidata. 3304 02:54:31,330 --> 02:54:35,980 So you can just type the name of an item or the number. 3305 02:54:35,980 --> 02:54:39,010 Let's pick just a random number, 42. 3306 02:54:39,010 --> 02:54:39,595 Say 42. 3307 02:54:42,770 --> 02:54:45,950 Which happens to be, maybe you've 3308 02:54:45,950 --> 02:54:51,310 heard of this guy, Douglas Adams. 3309 02:54:51,310 --> 02:54:55,490 He happened to have received the Q number 42. 3310 02:54:55,490 --> 02:54:58,760 I'm sure it's a cosmic coincidence 3311 02:54:58,760 --> 02:55:01,460 of infinite improbability. 3312 02:55:01,460 --> 02:55:03,470 And this is a view-- 3313 02:55:03,470 --> 02:55:05,690 this is a tool that is not Wikidata. 3314 02:55:05,690 --> 02:55:09,690 It's a tool built on top of Wikidata called Reasonator. 3315 02:55:09,690 --> 02:55:14,750 And it gives us the information from Q42, that is from the-- 3316 02:55:14,750 --> 02:55:18,800 this item in Wikidata, which looks like an item in Wikidata. 3317 02:55:18,800 --> 02:55:21,320 But it gives it to us in a slightly more rational kind 3318 02:55:21,320 --> 02:55:22,430 of layout. 3319 02:55:22,430 --> 02:55:24,200 It even kind of generates a little bit 3320 02:55:24,200 --> 02:55:27,620 of pseudo article text for us. 3321 02:55:27,620 --> 02:55:30,429 You know, Douglas Adams was a British writer, playwright, 3322 02:55:30,429 --> 02:55:31,970 screenwriter, bla-bla-bla, an author. 3323 02:55:31,970 --> 02:55:35,630 He was born on this date, in this place, to these people. 3324 02:55:35,630 --> 02:55:39,080 He studied at this place between these years. 3325 02:55:39,080 --> 02:55:40,670 That's all machine generated. 3326 02:55:40,670 --> 02:55:42,230 Nobody wrote this text.
3327 02:55:42,230 --> 02:55:46,330 That's all taken from those statements in Wikidata, 3328 02:55:46,330 --> 02:55:51,080 and generates this reasonable reading summary paragraph. 3329 02:55:51,080 --> 02:55:54,140 And then it gives us this little table of relatives. 3330 02:55:54,140 --> 02:55:55,610 It's all taken from Wikidata. 3331 02:55:55,610 --> 02:55:57,740 But as you can see, this is already 3332 02:55:57,740 --> 02:56:02,120 a little more accessible than the essentially arbitrary 3333 02:56:02,120 --> 02:56:05,120 ordering of statements on Wikidata. 3334 02:56:05,120 --> 02:56:06,200 And that's OK. 3335 02:56:06,200 --> 02:56:08,060 I mean, that's kind of by design. 3336 02:56:08,060 --> 02:56:10,100 Wikidata is the platform. 3337 02:56:10,100 --> 02:56:11,960 There is going to be-- there are going 3338 02:56:11,960 --> 02:56:15,680 to be many new applications, and platforms, and tools, 3339 02:56:15,680 --> 02:56:19,010 and visual interfaces on top of Wikidata 3340 02:56:19,010 --> 02:56:23,000 to browse Wikidata in more friendly or more customized 3341 02:56:23,000 --> 02:56:24,480 ways. 3342 02:56:24,480 --> 02:56:27,080 For example, one of the things that Reasonator 3343 02:56:27,080 --> 02:56:31,610 does for us is give us pictures and maps and a timeline. 3344 02:56:31,610 --> 02:56:32,960 Check this out. 3345 02:56:32,960 --> 02:56:38,990 Timeline, machine generated, just from dates and points 3346 02:56:38,990 --> 02:56:44,090 in time mentioned in the relatively rich Wikidata 3347 02:56:44,090 --> 02:56:47,200 item about Douglas Adams. 3348 02:56:47,200 --> 02:56:47,700 Right? 3349 02:56:47,700 --> 02:56:50,030 So this timeline, for example, again, completely machine 3350 02:56:50,030 --> 02:56:51,140 generated. 3351 02:56:51,140 --> 02:56:53,270 But he was educated between these years, 3352 02:56:53,270 --> 02:56:54,920 so I can put it on the timeline.
3353 02:56:54,920 --> 02:56:57,260 And this is the year he was nominated for a Hugo Award, 3354 02:56:57,260 --> 02:56:59,570 so I can put that in a timeline. 3355 02:56:59,570 --> 02:57:00,600 Et cetera. 3356 02:57:00,600 --> 02:57:03,050 So that's just a super quick demonstration 3357 02:57:03,050 --> 02:57:06,620 of that tool, the Reasonator. 3358 02:57:06,620 --> 02:57:10,310 Links are all here in the slides. 3359 02:57:10,310 --> 02:57:13,390 And the final tool I wanted to mention very quickly 3360 02:57:13,390 --> 02:57:16,220 is the Mix'n'match tool. 3361 02:57:16,220 --> 02:57:21,980 You remember my explanation about Wikidata as nexus, 3362 02:57:21,980 --> 02:57:27,320 as connection point between many databases, many data sources. 3363 02:57:27,320 --> 02:57:31,080 Those depend on these equivalencies. 3364 02:57:31,080 --> 02:57:35,300 On Wikidata being taught that this item is like that 3365 02:57:35,300 --> 02:57:37,940 ID in this other database. 3366 02:57:37,940 --> 02:57:41,810 And Mix'n'match is a tool, again, by Magnus Manske. 3367 02:57:41,810 --> 02:57:44,690 Maybe you're detecting a pattern here. 3368 02:57:44,690 --> 02:57:47,390 It's a tool by Magnus that is designed 3369 02:57:47,390 --> 02:57:50,270 to enable us to kind of take a foreign, 3370 02:57:50,270 --> 02:57:54,950 an external data set, put it alongside Wikidata, 3371 02:57:54,950 --> 02:57:56,690 and kind of try and align them. 3372 02:57:56,690 --> 02:57:59,410 So this item in this external data set, 3373 02:57:59,410 --> 02:58:01,230 is that already covered in Wikidata? 3374 02:58:01,230 --> 02:58:02,890 If so, by what Q number? 3375 02:58:02,890 --> 02:58:03,890 By what item? 3376 02:58:03,890 --> 02:58:06,170 If not, maybe we need to create a Wikidata 3377 02:58:06,170 --> 02:58:07,610 item to represent it. 3378 02:58:07,610 --> 02:58:10,010 Or maybe it's a duplicate, or something.
3379 02:58:10,010 --> 02:58:15,980 So the Mix'n'match tool has a list of external data sets, 3380 02:58:15,980 --> 02:58:18,140 as you can see. 3381 02:58:18,140 --> 02:58:21,260 The Art and Architecture Thesaurus by the Getty Research 3382 02:58:21,260 --> 02:58:22,220 Institute. 3383 02:58:22,220 --> 02:58:26,690 Or the Australian Dictionary of Biography. 3384 02:58:26,690 --> 02:58:28,880 All kinds of external data sets here. 3385 02:58:32,470 --> 02:58:40,060 Somewhere here I had a specific link to the Royal Society. 3386 02:58:40,060 --> 02:58:41,710 It can also give me some statistics. 3387 02:58:41,710 --> 02:58:47,410 So there is an external data set of all the Fellows of the Royal 3388 02:58:47,410 --> 02:58:48,001 Society. 3389 02:58:48,001 --> 02:58:48,500 Right? 3390 02:58:48,500 --> 02:58:54,970 The oldest academic learned society in England. 3391 02:58:54,970 --> 02:58:57,415 And the internet is tired. 3392 02:59:03,240 --> 02:59:04,640 Here we go. 3393 02:59:04,640 --> 02:59:07,115 Nope. 3394 02:59:07,115 --> 02:59:08,105 Did that work? 3395 02:59:12,560 --> 02:59:15,390 Fellows of the Royal Society, here we go. 3396 02:59:15,390 --> 02:59:17,970 So this one is complete. 3397 02:59:17,970 --> 02:59:21,330 I mean, people have manually gone over every single item 3398 02:59:21,330 --> 02:59:24,330 there and either matched it to Wikidata 3399 02:59:24,330 --> 02:59:27,390 or declared that it was not in scope, or a duplicate, 3400 02:59:27,390 --> 02:59:28,520 or whatever. 3401 02:59:28,520 --> 02:59:31,230 But let's look at site stats. 3402 02:59:31,230 --> 02:59:35,210 This is a fun kind of aspect of this tool. 3403 02:59:35,210 --> 02:59:38,530 But that is not working. 3404 02:59:38,530 --> 02:59:40,820 Or it's taking too long. 3405 02:59:40,820 --> 02:59:43,940 So let's just demonstrate how this works. 3406 02:59:43,940 --> 02:59:45,590 Maybe Britannica? 3407 02:59:45,590 --> 02:59:46,780 Is that done already?
3408 02:59:52,570 --> 02:59:53,990 Here we go. 3409 02:59:53,990 --> 02:59:55,330 Encyclopedia Britannica. 3410 02:59:55,330 --> 02:59:55,960 Yeah. 3411 02:59:55,960 --> 03:00:02,040 So the Encyclopedia Britannica has 3412 03:00:02,040 --> 03:00:05,940 40% of the items there are not yet processed. 3413 03:00:05,940 --> 03:00:07,830 So let's process one of them. 3414 03:00:07,830 --> 03:00:16,180 For example, there is an item in the Encyclopedia Britannica 3415 03:00:16,180 --> 03:00:19,960 called Boston, England. 3416 03:00:19,960 --> 03:00:23,050 As you know, all American place names 3417 03:00:23,050 --> 03:00:26,050 are totally stolen from elsewhere. 3418 03:00:26,050 --> 03:00:29,440 So there is a Boston in England, though it's 3419 03:00:29,440 --> 03:00:30,700 no longer the famous one. 3420 03:00:30,700 --> 03:00:36,340 And the Mix'n'match tool has automatically 3421 03:00:36,340 --> 03:00:39,610 matched it based on the label to 3422 03:00:39,610 --> 03:00:43,900 Q100, which is Boston, big city in the United States. 3423 03:00:43,900 --> 03:00:45,500 And that is incorrect, right? 3424 03:00:45,500 --> 03:00:48,910 That's kind of a naive computer going, well, this is Boston, 3425 03:00:48,910 --> 03:00:50,820 and this other thing is also Boston. 3426 03:00:50,820 --> 03:00:56,260 And it is asking me to confirm this match or not. 3427 03:00:56,260 --> 03:00:57,400 You see? 3428 03:00:57,400 --> 03:01:01,120 So this is the Boston, England from Britannica. 3429 03:01:01,120 --> 03:01:04,720 And the tool is asking me, is this the same as 3430 03:01:04,720 --> 03:01:06,910 Boston Q100 in America? 3431 03:01:06,910 --> 03:01:07,990 The answer is no. 3432 03:01:07,990 --> 03:01:10,110 I removed this. 3433 03:01:10,110 --> 03:01:11,860 I remove this match. 3434 03:01:11,860 --> 03:01:15,430 And now this Boston, England is unmatched. 3435 03:01:15,430 --> 03:01:23,230 And I can match it to the correct one in England.
3436 03:01:23,230 --> 03:01:27,370 I can do this by searching English Wikipedia, 3437 03:01:27,370 --> 03:01:28,780 or searching Wikidata. 3438 03:01:28,780 --> 03:01:32,000 I mean, it has these handy links. 3439 03:01:32,000 --> 03:01:36,910 So the English town is in Lincolnshire. 3440 03:01:36,910 --> 03:01:38,230 Boston, Lincolnshire. 3441 03:01:38,230 --> 03:01:46,030 So I can go there and then get the Wikidata item number. 3442 03:01:46,030 --> 03:01:49,810 See, this is not Q100, Boston in the States, 3443 03:01:49,810 --> 03:01:53,440 this is Q311975, town in Lincolnshire. 3444 03:01:53,440 --> 03:01:57,310 I can get this Q number, go back to the 3445 03:01:57,310 --> 03:01:58,160 Mix'n'match tool-- 3446 03:01:58,160 --> 03:01:59,110 Where was that? 3447 03:01:59,110 --> 03:02:00,180 Here we are. 3448 03:02:00,180 --> 03:02:01,510 And set Q. 3449 03:02:01,510 --> 03:02:08,650 I can tell the tool that this is the right Boston, and click OK. 3450 03:02:08,650 --> 03:02:14,550 And now this town in Lincolnshire, 3451 03:02:14,550 --> 03:02:17,100 you can see this here, this item, Q311975, 3452 03:02:17,100 --> 03:02:21,190 is linked to Britannica. 3453 03:02:21,190 --> 03:02:22,660 What does this mean? 3454 03:02:22,660 --> 03:02:23,820 Well, if we go there. 3455 03:02:23,820 --> 03:02:25,380 If we actually go to the Wikidata 3456 03:02:25,380 --> 03:02:28,890 entity, you will see that in addition 3457 03:02:28,890 --> 03:02:34,140 to the few statements that it already had, it now has, 3458 03:02:34,140 --> 03:02:38,610 thanks to my clicking, it now has another identifier here. 3459 03:02:38,610 --> 03:02:39,270 See? 3460 03:02:39,270 --> 03:02:43,950 Encyclopedia Britannica Online ID, with this link. 3461 03:02:43,950 --> 03:02:49,440 And if we click it, we will indeed reach this page 3462 03:02:49,440 --> 03:02:51,510 in the Britannica online, which is indeed 3463 03:02:51,510 --> 03:02:53,700 about this town in Lincolnshire.
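[Editor's sketch, not part of the talk] The wrong automatic match above comes from comparing labels only. Here is a minimal Python sketch of that idea, using just the two items named in the talk. The Item class and the candidates_by_label helper are hypothetical and are not how the real tool is implemented; the item descriptions are paraphrased.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    qid: str
    label: str
    description: str


# The two Wikidata items mentioned in the talk (descriptions paraphrased).
ITEMS = [
    Item("Q100", "Boston", "big city in the United States"),
    Item("Q311975", "Boston", "town in Lincolnshire, England"),
]


def candidates_by_label(external_name: str, items: list[Item]) -> list[Item]:
    """Return every item whose label equals the place-name part of the external entry."""
    name = external_name.split(",")[0].strip().lower()
    return [item for item in items if item.label.lower() == name]


# Both items share the label, so the label alone cannot settle the match.
matches = candidates_by_label("Boston, England", ITEMS)

if __name__ == "__main__":
    for item in matches:
        print(item.qid, item.description)
```

Because both candidates carry the same label, a person still has to look at the descriptions and choose, which is the confirm-or-correct step shown in the demonstration.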
3464 03:02:53,700 --> 03:02:54,510 You see? 3465 03:02:54,510 --> 03:02:58,650 So I've contributed one of those mappings, one 3466 03:02:58,650 --> 03:03:01,950 of those identifiers, into Wikidata. 3467 03:03:01,950 --> 03:03:04,860 And I didn't have to do it manually. 3468 03:03:04,860 --> 03:03:07,980 This tool kind of prompted me to either confirm-- 3469 03:03:07,980 --> 03:03:09,480 if it was correct, I could have just 3470 03:03:09,480 --> 03:03:12,150 clicked confirm. Since it wasn't correct, 3471 03:03:12,150 --> 03:03:16,920 I corrected it manually, but it made this edit on my behalf. 3472 03:03:16,920 --> 03:03:21,180 So that's another tool that encourages us to systematically 3473 03:03:21,180 --> 03:03:24,360 teach Wikidata more things. 3474 03:03:24,360 --> 03:03:25,860 And we're out of time. 3475 03:03:25,860 --> 03:03:29,430 Go edit Wikidata. Now that you have the power, 3476 03:03:29,430 --> 03:03:30,510 you know the deal. 3477 03:03:30,510 --> 03:03:32,430 Use it for good, and not for evil. 3478 03:03:32,430 --> 03:03:35,640 If you have questions, this is my email address. 3479 03:03:35,640 --> 03:03:38,640 If you're watching this video not live, the description 3480 03:03:38,640 --> 03:03:41,610 will have links to the slides, and to a bunch 3481 03:03:41,610 --> 03:03:44,610 of other useful pieces of information. 3482 03:03:44,610 --> 03:03:49,510 Any last questions on IRC? 3483 03:03:49,510 --> 03:03:53,210 If not, thank you for your attention. 3484 03:03:53,210 --> 03:03:56,470 And if you like this, and if you feel that you now get Wikidata, 3485 03:03:56,470 --> 03:03:58,330 and you get what it's good for, and you're 3486 03:03:58,330 --> 03:04:01,660 inspired to contribute, I have only one request from you. 3487 03:04:01,660 --> 03:04:04,960 I mean, in addition to using it for good, not for evil, 3488 03:04:04,960 --> 03:04:07,630 I ask that you spread the word.
3489 03:04:07,630 --> 03:04:09,550 Show this video-- share this video 3490 03:04:09,550 --> 03:04:13,180 with other people in your community, or around you. 3491 03:04:13,180 --> 03:04:16,000 Teach this yourself once you're comfortable 3492 03:04:16,000 --> 03:04:17,650 with these concepts. 3493 03:04:17,650 --> 03:04:21,330 Feel free to use my slides. 3494 03:04:21,330 --> 03:04:23,580 Yeah, and edit Wikidata. 3495 03:04:23,580 --> 03:04:27,010 Thank you very much, and goodbye.