0:00:02.651,0:00:03.900 Asaf Bartov: Testing, testing. 0:00:10.036,0:00:12.640 Is this heard in the room? 0:00:15.190,0:00:15.690 Testing. 0:00:22.620,0:00:24.930 Hello, everyone. 0:00:24.930,0:00:29.460 This is a gentle[br]introduction to Wikidata 0:00:29.460,0:00:31.922 for absolute beginners. 0:00:31.922,0:00:34.130 If you're an absolute[br]beginner, if you've never heard 0:00:34.130,0:00:38.210 of Wikidata, or if you've heard[br]of Wikidata but don't quite get 0:00:38.210,0:00:41.360 it, don't know what it's[br]good for, have only used it 0:00:41.360,0:00:43.880 for inter-wiki links-- 0:00:43.880,0:00:46.247 if you're anywhere[br]on this range, 0:00:46.247,0:00:47.330 you're in the right place. 0:00:50.990,0:00:52.040 My name is Asaf Bartov. 0:00:52.040,0:00:54.590 I work for the[br]Wikimedia Foundation, 0:00:54.590,0:00:59.790 and I am a Wikidata enthusiast. 0:00:59.790,0:01:05.620 So the first thing I want to[br]say is that you are lucky. 0:01:05.620,0:01:10.540 You are lucky because[br]Wikidata is already 0:01:10.540,0:01:15.415 and is quickly becoming even[br]more of an important research 0:01:15.415,0:01:21.730 tool for anyone who's[br]trying to ask questions 0:01:21.730,0:01:25.030 about large amounts[br]of information. 0:01:25.030,0:01:29.770 It will become more and more[br]used across the humanities, 0:01:29.770,0:01:33.460 in particular, because of the[br]things that it's able to do, 0:01:33.460,0:01:37.090 some of which we will[br]demonstrate shortly. 0:01:37.090,0:01:40.750 And you are lucky because you[br]get to find out about it now 0:01:40.750,0:01:43.400 before most of the world. 0:01:43.400,0:01:49.120 So by the end of this talk,[br]you will be a Wikidata hipster 0:01:49.120,0:01:51.250 because you'll be[br]able to say, oh yeah. 0:01:51.250,0:01:53.470 I knew about Wikidata[br]before it was cool. 
0:01:56.090,0:02:00.370 So before we actually[br]visit Wikidata, 0:02:00.370,0:02:08.620 I want to share two key problems[br]that Wikidata seeks to solve 0:02:08.620,0:02:12.940 and which would help us[br]understand why it exists. 0:02:12.940,0:02:17.640 The first problem is that[br]of dated data, that 0:02:17.640,0:02:20.880 is data that is out of date. 0:02:20.880,0:02:23.960 And this is apparent[br]on Wikipedia 0:02:23.960,0:02:27.870 across our free[br]knowledge encyclopedias. 0:02:27.870,0:02:32.160 Data on Wikipedia is[br]not always up to date. 0:02:32.160,0:02:37.470 And the more obscure[br]it is, the more likely 0:02:37.470,0:02:40.280 it is not to be up to date. 0:02:40.280,0:02:49.360 So the Polish Wikipedia may have[br]an article about a small town 0:02:49.360,0:02:55.480 in Argentina, and that article[br]will include information 0:02:55.480,0:03:00.910 about that town like population[br]size, name of the mayor. 0:03:00.910,0:03:04.580 And that information,[br]ideally, was 0:03:04.580,0:03:08.540 correct at the time the article[br]was created on the Polish 0:03:08.540,0:03:10.370 Wikipedia-- 0:03:10.370,0:03:13.760 maybe translated[br]from another wiki. 0:03:13.760,0:03:17.900 But then how likely is[br]it to be kept up to date? 0:03:17.900,0:03:20.960 How likely is it that the[br]Polish Wikipedia would give us 0:03:20.960,0:03:25.880 the correct and latest numbers[br]or data about the population 0:03:25.880,0:03:28.370 size of that town[br]or the mayor, right? 0:03:28.370,0:03:31.720 So this is the kind of data[br]that does go out of date, right? 0:03:31.720,0:03:34.250 Every few years--[br]five, 10 years-- 0:03:34.250,0:03:37.850 there is a census, and now there[br]are new population figures. 
0:03:37.850,0:03:42.440 Now the census in Argentina will[br]be made available in Argentina 0:03:42.440,0:03:45.500 in Spanish, probably,[br]which brings us 0:03:45.500,0:03:48.710 to another component of the[br]problem of dated data, which 0:03:48.710,0:03:53.810 is there are no obvious[br]triggers for updating the data. 0:03:53.810,0:03:58.520 So the Polish Wikipedian[br]is not sent an email 0:03:58.520,0:04:00.680 by the Argentinean[br]government saying, hey, 0:04:00.680,0:04:01.820 we have a new census. 0:04:01.820,0:04:05.420 There are new population numbers[br]for you to update on Wikipedia. 0:04:05.420,0:04:07.550 No such email is sent. 0:04:07.550,0:04:10.146 So it's kind of[br]hard to notice when. 0:04:10.146,0:04:12.770 And of course, multiply that by[br]all the different jurisdictions 0:04:12.770,0:04:14.670 around the world. 0:04:14.670,0:04:16.610 There's no easy[br]way to notice when 0:04:16.610,0:04:17.790 your data goes out of date. 0:04:20.620,0:04:24.070 So that's difficult[br]to keep up to date. 0:04:24.070,0:04:27.940 And even if we were to receive[br]some kind of indication-- 0:04:27.940,0:04:31.310 oh, there's a new[br]census in Argentina, 0:04:31.310,0:04:33.100 so a whole bunch of[br]population figures 0:04:33.100,0:04:34.960 have now gone out of date. 0:04:34.960,0:04:37.240 Updating it on the[br]Polish Wikipedia 0:04:37.240,0:04:40.090 and the French Wikipedia[br]and the Indonesian Wikipedia 0:04:40.090,0:04:44.920 and the Arabic Wikipedia is a[br]whole bunch of repetitive work 0:04:44.920,0:04:46.540 that a lot of[br]different volunteers 0:04:46.540,0:04:49.900 will need to do just for[br]that one updated piece 0:04:49.900,0:04:54.810 of information about Argentina. 
0:04:54.810,0:04:57.720 So I hope this is[br]clear and resonates 0:04:57.720,0:05:01.920 with some of your experience[br]editing Wikipedia-- 0:05:01.920,0:05:04.170 data that is out of[br]date or that needs 0:05:04.170,0:05:08.640 to be updated[br]manually, menially, 0:05:08.640,0:05:16.190 on a fairly frequent schedule[br]across the different countries 0:05:16.190,0:05:18.410 and data sources. 0:05:18.410,0:05:22.340 The other-- and I think[br]maybe more interesting-- 0:05:22.340,0:05:26.210 shortcoming or problem[br]that I want to discuss 0:05:26.210,0:05:30.260 is what I call the[br]inflexible ways 0:05:30.260,0:05:36.020 of lateral queries, crosscutting[br]queries of knowledge. 0:05:36.020,0:05:43.980 So if I want an answer to[br]the question, what countries 0:05:43.980,0:05:48.740 in the world export rubber-- 0:05:52.300,0:05:54.790 that's a reasonable[br]question, right? 0:05:54.790,0:05:57.460 That information[br]is on Wikipedia. 0:05:57.460,0:05:58.630 Do you agree? 0:05:58.630,0:06:00.640 If you go to[br]Wikipedia and read up 0:06:00.640,0:06:05.560 about Brazil, about Peru, about[br]Germany, somewhere in there-- 0:06:05.560,0:06:09.010 maybe a sub-article called[br]Economics of Brazil-- 0:06:09.010,0:06:13.600 you will find the main[br]exports of that country. 0:06:13.600,0:06:15.400 And you can find[br]out whether or not 0:06:15.400,0:06:16.930 that country exports rubber. 0:06:16.930,0:06:19.994 But what if I don't want[br]to go country by country 0:06:19.994,0:06:21.160 looking for the word rubber? 0:06:21.160,0:06:22.090 I just want an answer. 0:06:22.090,0:06:25.540 What are the countries[br]that export rubber? 0:06:25.540,0:06:28.360 Even though that[br]information is in Wikipedia, 0:06:28.360,0:06:29.680 it's hard to get at. 0:06:29.680,0:06:31.680 It's hard to query. 0:06:31.680,0:06:35.770 Now, you may say, well, that's[br]what we have categories for, 0:06:35.770,0:06:36.270 right? 
0:06:36.270,0:06:39.820 Categories are a way to[br]cut across Wikipedia. 0:06:39.820,0:06:45.110 So if someone made a[br]category called rubber 0:06:45.110,0:06:48.380 exporting countries, then[br]you can go to that category 0:06:48.380,0:06:51.560 and see a list of countries[br]that export rubber. 0:06:51.560,0:06:53.390 And if nobody has[br]made it yet, well, you 0:06:53.390,0:06:56.990 can create that category and,[br]with a kind of one-time effort, 0:06:56.990,0:06:59.730 populate that category,[br]and you're done. 0:06:59.730,0:07:01.970 Well, yes. 0:07:01.970,0:07:04.250 That's still not[br]very convenient. 0:07:04.250,0:07:06.980 But also, it's still[br]very, very limited, 0:07:06.980,0:07:12.380 because what if I only want[br]countries that export rubber 0:07:12.380,0:07:15.950 and have a democratic[br]system of government, 0:07:15.950,0:07:18.770 or any other kind of[br]additional condition 0:07:18.770,0:07:20.510 that I would like[br]to add to this? 0:07:20.510,0:07:22.230 Or take a completely[br]different example. 0:07:22.230,0:07:26.750 What if I want to know[br]which Flemish town had 0:07:26.750,0:07:31.510 the most painters born in it? 0:07:31.510,0:07:34.480 There's a ton of[br]Flemish painters. 0:07:34.480,0:07:37.870 Most of them were[br]born somewhere. 0:07:37.870,0:07:39.685 We could theoretically,[br]just you know, 0:07:39.685,0:07:43.900 look up all the birthplaces[br]of all the Flemish painters 0:07:43.900,0:07:46.900 and tally up the[br]numbers and figure out 0:07:46.900,0:07:51.610 what is the place where the[br]most Flemish painters come from? 0:07:51.610,0:07:53.050 I don't know the answer to that. 0:07:53.050,0:07:55.420 It would be nice to be[br]able to get that answer. 0:07:55.420,0:07:57.610 Again, the data is in Wikipedia. 0:07:57.610,0:08:00.400 Those birthplaces are[br]listed in the articles 0:08:00.400,0:08:01.636 about those painters. 0:08:01.636,0:08:05.710 But there's no easy way[br]to get that information. 
0:08:05.710,0:08:13.420 What if I want to ask, who are[br]some painters whose father was 0:08:13.420,0:08:14.245 also a painter? 0:08:16.840,0:08:18.500 That's a thing[br]that exists, right? 0:08:18.500,0:08:22.630 Some painters are[br]sons of painters. 0:08:22.630,0:08:26.560 You know, Bruegel comes to[br]mind as an obvious example. 0:08:26.560,0:08:28.240 But there's a bunch[br]of others, right? 0:08:28.240,0:08:29.380 So who are those people? 0:08:29.380,0:08:30.930 What if I want to[br]ask that question? 0:08:30.930,0:08:33.400 That's the kind of question[br]that not only Wikipedia 0:08:33.400,0:08:34.600 doesn't answer today. 0:08:34.600,0:08:41.500 If you walk to your friendly[br]university library reference 0:08:41.500,0:08:45.010 desk and say,[br]hello, I would like 0:08:45.010,0:08:49.290 a list of painters whose[br]father was also a painter, 0:08:49.290,0:08:52.820 how would that[br]librarian help you? 0:08:52.820,0:08:57.960 There's no easy way to get an[br]answer to a question like that. 0:08:57.960,0:09:01.100 What if you only want[br]a list of painters 0:09:01.100,0:09:05.870 who were immigrants, painters[br]who lived somewhere else 0:09:05.870,0:09:08.240 than where they were born? 0:09:08.240,0:09:09.770 There's no book. 0:09:09.770,0:09:11.720 I guess maybe there[br]is, but you know, 0:09:11.720,0:09:15.590 it's not obvious that there's a[br]ready resource that says, list 0:09:15.590,0:09:17.840 of painters who are immigrants. 0:09:17.840,0:09:19.910 And the librarian would[br]probably refer you 0:09:19.910,0:09:22.760 to a book on the shelf[br]called, I don't know, 0:09:22.760,0:09:24.200 The Complete[br]Dictionary of Flemish 0:09:24.200,0:09:26.300 Painters and go,[br]look up the index, 0:09:26.300,0:09:28.520 you know, and if you[br]see a similar surname, 0:09:28.520,0:09:29.910 maybe they're father and son. 0:09:29.910,0:09:35.000 And kind of cobble together[br]the answer on your own. 
0:09:35.000,0:09:37.100 The reason I'm comparing[br]this to a library 0:09:37.100,0:09:42.170 is to show you that this is a[br]kind of question that is not 0:09:42.170,0:09:46.760 readily satisfiable today. 0:09:46.760,0:09:50.240 Now, these questions may[br]sound contrived to you. 0:09:50.240,0:09:52.460 You may say to[br]yourself, well, you 0:09:52.460,0:09:54.860 know, painters who are also[br]sons of painters, yeah. 0:09:54.860,0:09:57.680 You know, that[br]never occurred to me 0:09:57.680,0:09:59.610 as a question I[br]might care about. 0:09:59.610,0:10:01.850 But I want to invite[br]you to consider 0:10:01.850,0:10:06.380 that this kind of question,[br]questions like that question, 0:10:06.380,0:10:09.260 may well be questions[br]you do care about. 0:10:09.260,0:10:12.740 And I also want to suggest[br]that the fact it is so nearly 0:10:12.740,0:10:16.250 impossible, the fact that[br]there's no obvious way 0:10:16.250,0:10:19.250 to ask that kind[br]of question today, 0:10:19.250,0:10:21.200 is partly responsible[br]to your not 0:10:21.200,0:10:22.970 coming up with those[br]questions, right? 0:10:22.970,0:10:25.850 We tend to be limited[br]by the possible. 0:10:25.850,0:10:30.080 You know, until human[br]flight was made possible, 0:10:30.080,0:10:32.840 it did not occur to anyone[br]to say, oh yeah, by this time 0:10:32.840,0:10:34.430 next week I will[br]be in Australia, 0:10:34.430,0:10:36.630 because that was[br]just impossible. 0:10:36.630,0:10:38.587 But when flight is[br]possible, there's 0:10:38.587,0:10:40.670 all kinds of things that[br]suddenly become possible, 0:10:40.670,0:10:42.740 and there's all[br]kinds of needs that 0:10:42.740,0:10:46.430 arise based on the[br]availability of resources 0:10:46.430,0:10:48.600 to fulfill those needs. 
0:10:48.600,0:10:54.120 So many of these research[br]questions, compound lateral 0:10:54.120,0:10:58.520 cross-cutting queries, are not[br]being asked because people have 0:10:58.520,0:11:00.410 internalized the fact[br]that there is no way 0:11:00.410,0:11:05.750 to get an answer[br]to questions like, 0:11:05.750,0:11:13.270 what is the most popular first[br]name among British politicians? 0:11:13.270,0:11:14.520 I just made that up, you know? 0:11:14.520,0:11:15.340 Is it John? 0:11:15.340,0:11:16.510 Maybe. 0:11:16.510,0:11:19.030 Maybe it's William,[br]for whatever reason. 0:11:19.030,0:11:22.030 You know, these are the kinds[br]of questions we don't routinely 0:11:22.030,0:11:25.855 ask because we know that it's[br]like, who are you going to ask? 0:11:25.855,0:11:28.330 How are you going to[br]get an answer to that? 0:11:28.330,0:11:36.040 So this problem of not having[br]very flexible ways of querying 0:11:36.040,0:11:38.220 the data that we already have-- 0:11:38.220,0:11:41.230 in Wikipedia, in[br]Wikisource, elsewhere-- 0:11:41.230,0:11:45.060 is a significant limitation. 0:11:45.060,0:11:50.880 So these two key problems[br]have one solution. 0:11:50.880,0:11:55.500 And that is an editable,[br]central storage 0:11:55.500,0:12:00.510 for structured and[br]linked data on a wiki, 0:12:00.510,0:12:05.160 under a free license, which[br]is a very long way of saying 0:12:05.160,0:12:07.290 Wikidata. 0:12:07.290,0:12:08.470 That is Wikidata. 0:12:08.470,0:12:11.190 Wikidata is an editable,[br]central storage 0:12:11.190,0:12:15.840 for structured and[br]linked data on a wiki, 0:12:15.840,0:12:17.700 under a free license. 0:12:17.700,0:12:22.590 So let's take this[br]apart and unpack it. 0:12:22.590,0:12:24.820 First of all, it's[br]a central storage. 0:12:24.820,0:12:27.660 This relates to the[br]first problem, right? 
0:12:27.660,0:12:34.370 If we had one place containing[br]data like population size, 0:12:34.370,0:12:38.270 we would be able to update[br]that one place and then have 0:12:38.270,0:12:42.260 all of the different Wikipedias[br]draw the data from that one 0:12:42.260,0:12:45.320 place so that we wouldn't[br]have to manually, 0:12:45.320,0:12:49.980 repetitively update it across[br]our hundreds of projects. 0:12:49.980,0:12:53.690 So having central storage[br]makes, I hope, kind 0:12:53.690,0:12:57.230 of immediate, intuitive sense. 0:12:57.230,0:13:02.840 But what do I mean by[br]structured and linked data? 0:13:02.840,0:13:10.120 So structured data means[br]that each datum, each piece-- 0:13:10.120,0:13:15.880 individual piece-- of data[br]is managed on its own, 0:13:15.880,0:13:19.660 is identified and[br]defined on its own, 0:13:19.660,0:13:21.040 as distinct from Wikipedia. 0:13:21.040,0:13:22.990 Wikipedia has articles. 0:13:22.990,0:13:27.190 The article about Brazil[br]includes a ton of data, 0:13:27.190,0:13:31.570 all kinds of information,[br]and it's presented as text, 0:13:31.570,0:13:34.270 as several paragraphs--[br]several pages-- 0:13:34.270,0:13:36.540 of text, right? 0:13:36.540,0:13:41.460 Now, we do have an[br]approximation of structured data 0:13:41.460,0:13:43.580 on Wikipedia. 0:13:43.580,0:13:45.300 If you've browsed[br]Wikipedia a little, 0:13:45.300,0:13:49.100 you've noticed that we often[br]have an info box, what we 0:13:49.100,0:13:50.750 call an info box on Wikipedia. 0:13:50.750,0:13:55.220 That's the table on the right[br]side if it's a left to right 0:13:55.220,0:13:57.200 language, the table[br]on the right side 0:13:57.200,0:14:02.270 that has information that[br]is easy to tabulate, right? 
0:14:02.270,0:14:08.210 So you know, birth date, birth[br]place, death date, death place, 0:14:08.210,0:14:09.710 nationality-- 0:14:09.710,0:14:16.670 or if it's about a country,[br]area, population, anthem, 0:14:16.670,0:14:20.090 type of government, whatever[br]you are likely to find. 0:14:20.090,0:14:23.150 If it's a movie, then[br]you know, starring, 0:14:23.150,0:14:27.350 genre, box office receipts,[br]whatever pieces of data 0:14:27.350,0:14:29.900 are relevant to an[br]article about a movie. 0:14:29.900,0:14:34.940 So we do already kind of[br]group pieces of information 0:14:34.940,0:14:40.160 on Wikipedia into this[br]kind of structured format. 0:14:40.160,0:14:43.630 Those of you who have[br]ever looked at the source, 0:14:43.630,0:14:45.970 at what the wiki code[br]under that looks like, 0:14:45.970,0:14:49.640 know that it's only[br]semi-structured. 0:14:49.640,0:14:52.370 It looks neat and[br]organized in a table, 0:14:52.370,0:14:55.660 but really, it's just a bunch[br]of text that is put there. 0:14:55.660,0:14:57.140 It is not centralized. 0:14:57.140,0:15:00.100 Every Wikipedia has its[br]own copy of that data. 0:15:00.100,0:15:02.930 And if I go and update[br]the population size 0:15:02.930,0:15:07.070 on Spanish Wikipedia of[br]that Argentinean town, 0:15:07.070,0:15:10.190 it does not get[br]updated automagically 0:15:10.190,0:15:13.520 on the English Wikipedia or[br]the Arabic Wikipedia, right? 0:15:13.520,0:15:17.150 So the structured data that[br]we already have on Wikipedia 0:15:17.150,0:15:20.939 is not managed centrally. 0:15:20.939,0:15:22.480 The other thing[br]about structured data 0:15:22.480,0:15:29.250 is, when you have a notion of an[br]individual piece of data, that 0:15:29.250,0:15:33.390 is the cornerstone of[br]allowing the kinds of queries 0:15:33.390,0:15:34.770 that I was talking about. 
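A minimal sketch of why individually addressable pieces of data make such cross-cutting queries possible. The records, names, and field names below are invented purely for illustration; they are not real Wikidata content and this is not Wikidata's API.

```python
from collections import Counter

# Hypothetical structured records: each fact is its own field,
# rather than a sentence buried in article text.
painters = [
    {"name": "Painter A", "born_in": "Town X", "father": None},
    {"name": "Painter B", "born_in": "Town X", "father": "Painter A"},
    {"name": "Painter C", "born_in": "Town Y", "father": None},
    {"name": "Painter D", "born_in": "Town X", "father": "Someone Else"},
]

painter_names = {p["name"] for p in painters}

# "Which town had the most painters born in it?"
births_per_town = Counter(p["born_in"] for p in painters)
top_town, top_count = births_per_town.most_common(1)[0]

# "Which painters had a father who was also a painter?"
second_generation = [p["name"] for p in painters if p["father"] in painter_names]

print(top_town, top_count)
print(second_generation)
```

Both questions are answered by filtering and counting fields, which is only possible because birthplace and father are stored as separate, named pieces of data.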
0:15:34.770,0:15:40.440 That is what will allow[br]me to ask questions like, 0:15:40.440,0:15:43.470 what is the Flemish town where[br]the most painters were born, 0:15:43.470,0:15:46.650 or what are the world's[br]largest cities that 0:15:46.650,0:15:49.730 have a female mayor? 0:15:49.730,0:15:52.430 I could come up with other[br]examples all day long, right? 0:15:52.430,0:15:55.280 These are all questions[br]that you can ask, 0:15:55.280,0:15:59.390 once you break down your data[br]into individual pieces, each 0:15:59.390,0:16:02.300 of which is-- 0:16:02.300,0:16:06.950 you're able to refer to each[br]of those programmatically. 0:16:06.950,0:16:10.430 The computer can[br]identify, isolate, 0:16:10.430,0:16:14.700 and calculate based on each[br]of those pieces of data. 0:16:14.700,0:16:17.060 So that's why the[br]structure is important. 0:16:17.060,0:16:22.520 Now, Wikidata is also a[br]linked data repository. 0:16:22.520,0:16:24.890 What does it mean that[br]the data is linked? 0:16:24.890,0:16:29.700 Well, it means that a single[br]piece of data can point at, 0:16:29.700,0:16:34.770 can link to another[br]whole bag of data. 0:16:34.770,0:16:43.360 So if we are describing,[br]for example, a person, 0:16:43.360,0:16:46.960 and we record the[br]single piece of data 0:16:46.960,0:16:54.820 that this person was born[br]in Salem, Massachusetts, 0:16:54.820,0:17:02.300 that single piece of data[br]links to the item about Salem, 0:17:02.300,0:17:04.060 Massachusetts[br]because, of course, 0:17:04.060,0:17:07.010 we know a lot of things[br]about that place, Salem, 0:17:07.010,0:17:07.869 Massachusetts. 0:17:07.869,0:17:09.245 So it's not just the text-- 0:17:09.245,0:17:13.450 S-A-L-E-M. It's not just,[br]that's where they were born. 0:17:13.450,0:17:17.170 But it's a link to all[br]the data that we have 0:17:17.170,0:17:19.270 about Salem, Massachusetts. 
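The idea of a value that links to another bag of data can be sketched as follows. The identifiers (`person-1`, `place-1`, `place-2`), the `located_in` property name, and the `follow` helper are hypothetical, chosen only to illustrate the idea; real Wikidata identifiers look different.

```python
# Hypothetical items keyed by made-up identifiers.
items = {
    "person-1": {"label": "Example Person", "place_of_birth": "place-1"},
    "place-1": {"label": "Salem, Massachusetts", "located_in": "place-2"},
    "place-2": {"label": "Massachusetts"},
}


def follow(item_id, *properties):
    """Start at item_id, follow each property link in turn, return the item reached."""
    current = items[item_id]
    for prop in properties:
        # The stored value is an identifier, so it can be looked up again.
        current = items[current[prop]]
    return current


birthplace = follow("person-1", "place_of_birth")
region = follow("person-1", "place_of_birth", "located_in")

print(birthplace["label"])
print(region["label"])
```

Because the birthplace is stored as a link rather than as the text S-A-L-E-M, a program can keep walking from the person to the place and on to whatever is recorded about that place.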
0:17:19.270,0:17:24.940 If we say someone's[br]nationality is French, 0:17:24.940,0:17:26.589 that is a link to France. 0:17:26.589,0:17:30.700 That is a link to everything we[br]know about the country France. 0:17:30.700,0:17:34.150 The fact that the data[br]is linked and structured 0:17:34.150,0:17:37.630 allows not only humans,[br]but also computers 0:17:37.630,0:17:41.620 to traverse information[br]and to bring 0:17:41.620,0:17:44.950 us different pieces of[br]relevant information 0:17:44.950,0:17:49.000 programmatically, automatically,[br]based on those links. 0:17:49.000,0:17:52.000 Because it's not just[br]text, it's an actual link 0:17:52.000,0:17:56.700 to another chunk of data. 0:17:56.700,0:17:58.880 If this sounds a[br]little abstract, 0:17:58.880,0:18:01.190 it will become much[br]clearer in just a second 0:18:01.190,0:18:03.230 when we see it in action. 0:18:03.230,0:18:06.200 But the other components of[br]this little definition are, 0:18:06.200,0:18:09.650 of course, this central storage[br]of structured and linked data 0:18:09.650,0:18:12.620 needs to be editable,[br]of course, because we 0:18:12.620,0:18:14.370 need to keep it up to date. 0:18:14.370,0:18:16.460 We need to correct mistakes. 0:18:16.460,0:18:21.300 And we want it on a wiki[br]under a free license. 0:18:21.300,0:18:23.940 The free license is, of[br]course, essential to enable 0:18:23.940,0:18:30.910 reuse of that data, to enable[br]all kinds of reuse of the data. 0:18:30.910,0:18:34.060 And Wikidata, unlike[br]Wikipedia, is released 0:18:34.060,0:18:36.160 under a different free license. 0:18:36.160,0:18:41.590 Wikidata is released[br]under CC0 waiver. 0:18:41.590,0:18:44.920 That means unlike[br]Wikipedia, where 0:18:44.920,0:18:51.160 you have to attribute Wikipedia[br]when you reuse information 0:18:51.160,0:18:55.150 from Wikipedia, you do not[br]need to attribute Wikidata, 0:18:55.150,0:18:57.040 and you do not need to[br]share alike your work. 
0:18:57.040,0:19:02.020 It's an unencumbered license to[br]reuse the data in any way you 0:19:02.020,0:19:03.267 want, including commercially. 0:19:03.267,0:19:05.350 You don't have to say that[br]it comes from Wikidata. 0:19:05.350,0:19:07.390 I mean, it could be nice,[br]but you don't have to. 0:19:07.390,0:19:09.280 You're under no[br]obligation to do it. 0:19:09.280,0:19:14.080 And that is important to[br]allow certain kinds of reuse 0:19:14.080,0:19:17.140 where, for example, if you're[br]building some kind of device, 0:19:17.140,0:19:20.680 you may not have a practical[br]way to give attribution. 0:19:20.680,0:19:23.920 And had we required[br]that to use Wikidata, 0:19:23.920,0:19:27.250 we would have made[br]Wikidata less reusable. 0:19:27.250,0:19:32.940 So Wikidata is unencumbered by[br]the requirement of attribution. 0:19:32.940,0:19:35.730 And of course, because[br]it's on a wiki, 0:19:35.730,0:19:40.421 we get all the benefits that we[br]are used to expect from a wiki, 0:19:40.421,0:19:40.920 right? 0:19:40.920,0:19:42.810 So it's a wiki,[br]which means, yes. 0:19:42.810,0:19:44.910 It has discussion pages. 0:19:44.910,0:19:46.500 It has revision histories. 0:19:46.500,0:19:47.620 It remembers everything. 0:19:47.620,0:19:50.610 So if you screw it up, you[br]can always go a version back. 0:19:50.610,0:19:52.380 Or if someone else[br]vandalized the content, 0:19:52.380,0:19:54.610 we can always go back,[br]just like Wikipedia. 0:19:54.610,0:19:56.880 So we get all the[br]benefits we're used to-- 0:19:56.880,0:20:01.260 user talk pages, group[br]discussion pages, watch lists, 0:20:01.260,0:20:03.755 all the features that[br]we expect in a wiki. 0:20:06.740,0:20:11.170 In short, Wikidata is love. 0:20:11.170,0:20:14.100 I hope you agree with me[br]by the end of this talk. 0:20:14.100,0:20:18.580 So let's zoom in and see[br]what this structured data 0:20:18.580,0:20:21.420 looks like. 
0:20:21.420,0:20:29.460 So structured data on Wikidata[br]is collected in statements. 0:20:29.460,0:20:31.930 And statements have[br]the general form 0:20:31.930,0:20:39.490 of this triple, this[br]tripartite ascription-- 0:20:39.490,0:20:43.550 items, properties, and values. 0:20:43.550,0:20:46.930 Now an item is the[br]subject, is the topic 0:20:46.930,0:20:48.820 that we are trying to describe. 0:20:48.820,0:20:52.164 It can be any topic that[br]Wikipedia can cover, 0:20:52.164,0:20:53.830 and many others that[br]Wikipedia wouldn't. 0:20:53.830,0:20:57.490 So the topic, the[br]item can be Germany, 0:20:57.490,0:21:00.520 or it can be Salem,[br]Massachusetts, 0:21:00.520,0:21:03.340 or it can be the[br]concept of redemption. 0:21:03.340,0:21:04.610 It can be anything at all. 0:21:04.610,0:21:10.000 Anything you can imagine[br]describing in any way with data 0:21:10.000,0:21:11.990 can be the item. 0:21:11.990,0:21:15.430 So the item, consider[br]it like the title 0:21:15.430,0:21:17.480 of the rest of the data. 0:21:17.480,0:21:20.860 And then what do we say[br]about Salem, Massachusetts 0:21:20.860,0:21:22.330 or about Germany? 0:21:22.330,0:21:26.770 Well, that's a series of[br]properties and values, 0:21:26.770,0:21:28.450 properties and values. 0:21:28.450,0:21:32.680 The property is[br]the kind of datum, 0:21:32.680,0:21:39.770 like birth date or language[br]spoken or manner of death. 0:21:39.770,0:21:42.640 These are all real properties. 0:21:42.640,0:21:46.030 Or national anthem, if I'm[br]trying to describe a country-- 0:21:46.030,0:21:47.830 these are properties. 0:21:47.830,0:21:49.880 And then they have[br]values, right? 0:21:49.880,0:21:55.740 So this person, this[br]imaginary person's place 0:21:55.740,0:21:59.640 of birth, the value of the[br]property place of birth 0:21:59.640,0:22:02.430 is Salem, Massachusetts. 
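A minimal sketch of the item, property, value shape of a statement. The `Statement` type and its field names are assumptions made for illustration, not Wikidata's actual data model or software.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    item: str   # the topic being described
    prop: str   # the kind of datum, e.g. "place of birth"
    value: str  # the actual piece of data


# The talk's example: an imaginary person born in Salem, Massachusetts.
statement = Statement(
    item="an imaginary person",
    prop="place of birth",
    value="Salem, Massachusetts",
)

sentence = f"{statement.item}: {statement.prop} = {statement.value}"
print(sentence)
```

An item is then described by many such statements that share the same `item` and differ in property and value.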
0:22:02.430,0:22:06.690 So you can think about it[br]as like a government form-- 0:22:06.690,0:22:09.540 or not government, just any[br]form that you're filling out-- 0:22:09.540,0:22:12.420 where there are field names,[br]and then empty spaces for you 0:22:12.420,0:22:13.110 to fill out. 0:22:13.110,0:22:14.460 That's the value, OK? 0:22:14.460,0:22:18.150 So the field names[br]or the categories 0:22:18.150,0:22:19.350 are the properties, right? 0:22:19.350,0:22:22.960 So name, language,[br]occupation, date of birth-- 0:22:22.960,0:22:24.420 these are all properties. 0:22:24.420,0:22:26.640 And the values are[br]the actual piece 0:22:26.640,0:22:31.391 of data, the actual[br]information that we have. 0:22:31.391,0:22:33.870 And of course,[br]different kinds of data 0:22:33.870,0:22:40.170 are relevant for describing[br]different kinds of items. 0:22:40.170,0:22:45.030 And the key in the value is it[br]can be either a literal value-- 0:22:45.030,0:22:50.370 like if we're describing[br]the height of a mountain, 0:22:50.370,0:22:55.826 we might say just[br]the number 8,848. 0:22:55.826,0:22:57.325 That's the height[br]of which mountain? 0:23:01.990,0:23:04.070 Not everyone at once. 0:23:04.070,0:23:07.430 Oh, because it's meters,[br]the metric system. 0:23:07.430,0:23:08.270 Yeah, Mt. 0:23:08.270,0:23:12.390 Everest is 8,848 meters. 0:23:12.390,0:23:14.160 Yes. 0:23:14.160,0:23:15.780 Get with it, America. 0:23:15.780,0:23:17.630 The metric system. 0:23:17.630,0:23:20.930 All right, so that[br]can be a literal value 0:23:20.930,0:23:22.580 like an actual number. 0:23:22.580,0:23:28.280 Or it can be a link to an[br]item, pointing at another item. 0:23:28.280,0:23:30.890 But in this statement,[br]it is the value. 0:23:30.890,0:23:35.150 So if I'm talking about[br]Germany, the item is Germany. 0:23:35.150,0:23:39.680 And the property capital[br]city has the value Berlin. 0:23:39.680,0:23:43.130 But the value is[br]not B-E-R-L-I-N. 
0:23:43.130,0:23:48.740 The value is a pointer to[br]the item Berlin, right? 0:23:48.740,0:23:51.410 That's the link. 0:23:51.410,0:23:56.671 So a single item is described[br]by a series of such statements, 0:23:56.671,0:23:57.170 right? 0:23:57.170,0:24:01.400 There's hundreds and hundreds of[br]things I can say about Germany. 0:24:01.400,0:24:04.280 There's hundreds of things[br]I can say about a person. 0:24:04.280,0:24:06.350 And these will[br]generally take the form 0:24:06.350,0:24:08.330 of a property and a value. 0:24:08.330,0:24:11.720 By the way, some properties[br]may have more than one value. 0:24:11.720,0:24:15.920 Consider the property[br]languages spoken. 0:24:15.920,0:24:18.050 People can speak more[br]than one language, right? 0:24:18.050,0:24:20.330 So if I'm[br]describing myself, 0:24:20.330,0:24:22.400 we can say languages spoken-- 0:24:22.400,0:24:26.000 English, Hebrew,[br]Latin, whatever. 0:24:26.000,0:24:27.860 So a property can have[br]more than one value. 0:24:30.970,0:24:34.010 So if the item is[br]about a country, 0:24:34.010,0:24:38.890 it would have statements about[br]properties like population, 0:24:38.890,0:24:43.180 land area, official languages,[br]borders with, anthem, 0:24:43.180,0:24:45.070 capital city. 0:24:45.070,0:24:48.580 If I'm describing a person, I[br]have a whole mostly different 0:24:48.580,0:24:51.220 set of properties that[br]are relevant, right? 0:24:51.220,0:24:54.160 Date of birth, place of birth,[br]citizenship, occupation, 0:24:54.160,0:24:56.950 father, mother,[br]religion, notable works-- 0:24:56.950,0:24:59.780 now, are all of these[br]relevant for all people? 0:24:59.780,0:25:00.970 No, of course not. 0:25:00.970,0:25:02.140 It depends. 0:25:02.140,0:25:05.220 And different items[br]about different people 0:25:05.220,0:25:08.920 will either have or not[br]have these fields, right? 0:25:08.920,0:25:12.640 So we wouldn't record religion[br]for absolutely every person. 
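A rough sketch of the two kinds of value described here (a literal such as a number, or a pointer to another item) and of a property holding more than one value. The `ItemRef` class and the dictionary layout are assumptions for illustration only, not how Wikidata stores anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemRef:
    """Hypothetical pointer to another item, as opposed to plain text."""
    item: str


# Each property maps to a list, because a property may have several values.
germany = {"capital city": [ItemRef("Berlin")]}
everest = {"elevation above sea level (m)": [8848]}  # a literal number
speaker = {"languages spoken": [ItemRef("English"), ItemRef("Hebrew"), ItemRef("Latin")]}


def is_link(value):
    """True when the value points at another item rather than being a literal."""
    return isinstance(value, ItemRef)


capital = germany["capital city"][0]
print(is_link(capital))
print(is_link(everest["elevation above sea level (m)"][0]))
print(len(speaker["languages spoken"]))
```

Properties that do not apply to an item are simply absent from its dictionary, which mirrors the point that not every person needs every field.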
0:25:12.640,0:25:14.200 Some people manage[br]to do without. 0:25:14.200,0:25:17.710 And also, it's not relevant[br]for a lot of people, like, 0:25:17.710,0:25:20.320 what their religion[br]happens to be. 0:25:20.320,0:25:22.840 Date of birth is generally[br]relevant for most people 0:25:22.840,0:25:24.060 that we're documenting. 0:25:24.060,0:25:29.390 So some properties kind of crop[br]up more commonly than others. 0:25:29.390,0:25:33.220 A person's height, for[br]example, is not generally 0:25:33.220,0:25:35.596 considered of[br]encyclopedic value, right? 0:25:35.596,0:25:36.970 We don't, for[br]example, if we have 0:25:36.970,0:25:40.840 an article about even a[br]really well-documented person 0:25:40.840,0:25:45.610 like Winston Churchill, does[br]Wikipedia mention his height? 0:25:45.610,0:25:47.620 I don't think it does. 0:25:47.620,0:25:50.320 Even though I'm sure[br]we could probably 0:25:50.320,0:25:52.810 find a source somewhere[br]that lists his height, 0:25:52.810,0:25:55.570 it's just not a[br]very relevant piece 0:25:55.570,0:25:57.506 of information about Churchill. 0:25:57.506,0:25:59.380 With everything else[br]that's written about him 0:25:59.380,0:26:00.796 and that we know[br]about him that we 0:26:00.796,0:26:03.460 want to include in the[br]article, a person's height 0:26:03.460,0:26:08.180 is not really something of[br]great value most of the time. 0:26:08.180,0:26:14.420 But if we are describing[br]Michael Jordan, it is relevant. 0:26:14.420,0:26:15.430 I'm dating myself. 0:26:15.430,0:26:19.230 People still know[br]Michael Jordan, right? 0:26:19.230,0:26:21.600 You know, a basketball[br]player, that's 0:26:21.600,0:26:24.204 when height is very[br]relevant, right? 0:26:24.204,0:26:25.620 That's one of the[br]first things you 0:26:25.620,0:26:28.020 say when you're describing[br]a basketball player, 0:26:28.020,0:26:31.380 is list their height. 
0:26:31.380,0:26:33.690 So even within the[br]class of person, 0:26:33.690,0:26:36.480 some properties may be[br]more or less relevant, 0:26:36.480,0:26:38.320 depending on the context. 0:26:38.320,0:26:40.090 So let's look at some examples. 0:26:40.090,0:26:42.870 These are examples[br]of statements. 0:26:42.870,0:26:45.400 Each line is a statement. 0:26:45.400,0:26:47.130 So here's the first one. 0:26:47.130,0:26:53.270 I want to state, about the[br]item Earth, our planet. 0:26:53.270,0:26:55.760 And what I want[br]to say about Earth 0:26:55.760,0:27:00.980 is that the property[br]highest point on Earth 0:27:00.980,0:27:03.310 has the value Mt. 0:27:03.310,0:27:04.817 Everest. 0:27:04.817,0:27:05.900 Would you agree with that? 0:27:05.900,0:27:09.580 That is the highest[br]point on Earth. 0:27:09.580,0:27:11.100 That's a statement. 0:27:11.100,0:27:14.020 It says something[br]specific, one piece 0:27:14.020,0:27:15.517 of information about Earth. 0:27:15.517,0:27:17.350 Now of course, there's[br]a lot of other things 0:27:17.350,0:27:18.820 we want to say about Earth-- 0:27:18.820,0:27:21.165 circumference,[br]average temperature, 0:27:21.165,0:27:22.540 I don't know, all[br]kinds of things 0:27:22.540,0:27:26.750 we can describe the planet[br]with, density, the galaxy 0:27:26.750,0:27:28.250 it belongs to, all that. 0:27:28.250,0:27:30.400 But here's one piece[br]of information, 0:27:30.400,0:27:37.370 one very specific field in[br]the detailed form about Earth. 0:27:37.370,0:27:38.990 The highest point is Mt. 0:27:38.990,0:27:39.590 Everest. 0:27:39.590,0:27:41.570 Now here's a second statement. 0:27:41.570,0:27:42.920 This time Mt. 0:27:42.920,0:27:46.690 Everest itself is the item[br]that I'm describing, right? 0:27:46.690,0:27:48.590 The topic has changed. 0:27:48.590,0:27:50.120 Now I'm saying[br]something about Mt. 0:27:50.120,0:27:52.340 Everest, and what[br]I'm saying about Mt. 0:27:52.340,0:27:56.860 Everest is elevation[br]above sea level. 
0:27:56.860,0:28:01.190 Sounds the same but it[br]isn't, because the highest 0:28:01.190,0:28:04.670 point on Earth answers[br]the question where, 0:28:04.670,0:28:08.090 like on the planet, what[br]is the highest point? 0:28:08.090,0:28:08.720 It's Mt. 0:28:08.720,0:28:09.630 Everest. 0:28:09.630,0:28:12.911 But how high is that highest[br]point is a different piece 0:28:12.911,0:28:13.535 of information. 0:28:13.535,0:28:14.710 Do you agree? 0:28:14.710,0:28:16.790 It's the actual altitude. 0:28:16.790,0:28:19.600 It's not where on[br]the planet it is. 0:28:19.600,0:28:21.680 So it may sound similar,[br]but these are actually 0:28:21.680,0:28:24.030 very different pieces[br]of information. 0:28:24.030,0:28:27.800 So that highest[br]point, how high is it? 0:28:27.800,0:28:31.790 Well, it's 8,848 meters high. 0:28:31.790,0:28:36.550 Now the third statement gives[br]another piece of information 0:28:36.550,0:28:37.960 about the first item. 0:28:37.960,0:28:40.870 Same item-- I could have[br]grouped them together. 0:28:40.870,0:28:42.400 Another thing I[br]know about the Earth 0:28:42.400,0:28:46.480 is that the deepest[br]point on the planet 0:28:46.480,0:28:53.050 is the Challenger Deep, part[br]of the so-called Mariana 0:28:53.050,0:28:54.760 Trench in the ocean. 0:28:54.760,0:28:56.530 So that is the deepest point. 0:28:56.530,0:28:58.180 And how deep is it? 0:28:58.180,0:29:01.384 I again use the elevation[br]above sea level. 0:29:01.384,0:29:03.550 That's the name of the[br]property even though it's not 0:29:03.550,0:29:04.750 above sea level. 0:29:04.750,0:29:08.260 I have a negative value because[br]the elevation of the Challenger 0:29:08.260,0:29:13.700 Deep is minus 11[br]kilometers, more or less. 0:29:13.700,0:29:14.200 All right? 0:29:14.200,0:29:15.620 So these are statements. 0:29:15.620,0:29:18.820 These are four individual[br]pieces of data. 0:29:18.820,0:29:21.160 And I could also[br]look at it this way. 
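[Editor's note] The four statements described above can be sketched as item/property/value triples. This is only an illustration in plain Python, not how Wikidata stores anything; the name `group_by_item` is made up for this sketch, and the Challenger Deep figure is the talk's rounded "minus 11 kilometers" converted to meters.

```python
from collections import defaultdict

# Each statement is one (item, property, value) triple, written with plain labels.
statements = [
    ("Earth", "highest point", "Mt. Everest"),
    ("Mt. Everest", "elevation above sea level", 8848),        # meters
    ("Earth", "deepest point", "Challenger Deep"),
    ("Challenger Deep", "elevation above sea level", -11000),  # meters, approximate
]


def group_by_item(triples):
    """Group triples by item, like one filled-in form per item.

    Simplification: one value per property, which is enough for these four statements.
    """
    forms = defaultdict(dict)
    for item, prop, value in triples:
        forms[item][prop] = value
    return dict(forms)


forms = group_by_item(statements)
for item, fields in forms.items():
    print(item)
    for prop, value in fields.items():
        print(f"  {prop}: {value}")
```

Grouping by item gives the "government form" view: Earth has two fields filled in, and Mt. Everest and Challenger Deep have one each.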
0:29:21.160,0:29:25.210 Maybe that's closer to the[br]government form example 0:29:25.210,0:29:26.620 that I was giving, right? 0:29:26.620,0:29:29.190 So I want to say[br]something about Earth. 0:29:29.190,0:29:30.760 What do I want to say? 0:29:30.760,0:29:33.580 Two things-- highest point. 0:29:33.580,0:29:36.760 That's the field,[br]that's the property, 0:29:36.760,0:29:37.780 and this is the value. 0:29:37.780,0:29:39.190 The highest point is Mt. 0:29:39.190,0:29:40.240 Everest. 0:29:40.240,0:29:42.880 The deepest point[br]is Challenger Deep. 0:29:42.880,0:29:46.450 And then I have things to[br]say about Challenger Deep-- 0:29:46.450,0:29:49.630 the property of elevation[br]above sea level, the value 0:29:49.630,0:29:52.280 is minus 11 kilometers. 0:29:55.900,0:30:00.600 Now here's yet another[br]view of the same data 0:30:00.600,0:30:04.530 once more, with numeric IDs. 0:30:04.530,0:30:08.150 So this is the same information,[br]the same four statements. 0:30:08.150,0:30:13.020 But this time, in[br]addition to using words, 0:30:13.020,0:30:21.270 I'm also including weird[br]numbers following either Q or P. 0:30:21.270,0:30:25.890 So P stands for property. 0:30:25.890,0:30:30.330 So the highest point[br]property is P610. 0:30:30.330,0:30:34.216 And the deepest point[br]property is P1589. 0:30:34.216,0:30:35.340 What do these numbers mean? 0:30:35.340,0:30:36.985 They don't mean anything at all. 0:30:36.985,0:30:37.860 They're just numbers. 0:30:37.860,0:30:39.760 They're just sequential numbers. 0:30:39.760,0:30:42.600 And if I create a new[br]Wikidata item right now, 0:30:42.600,0:30:46.020 it'll get just the[br]next available number. 0:30:46.020,0:30:47.790 So they're just numbers. 0:30:47.790,0:30:49.080 So P stands for property. 0:30:49.080,0:30:51.480 What does Q stand for? 0:30:51.480,0:30:53.460 Does anyone know? 0:30:53.460,0:30:58.500 It's a trick question[br]because it's hard to guess. 
0:30:58.500,0:31:01.896 But the principal[br]architect of Wikidata, 0:31:01.896,0:31:07.860 a Wikipedian named Denny[br][INAUDIBLE] and data scientist, 0:31:07.860,0:31:10.950 is married to a lovely[br]lady named [INAUDIBLE] 0:31:10.950,0:31:16.320 spelled with a Q. And[br]this is a loving tribute. 0:31:16.320,0:31:21.780 And she's also a Wikipedian and[br]an admin of Uzbek Wikipedia. 0:31:21.780,0:31:31.650 So Q2 is just the numeric[br]identifier of the item Earth. 0:31:31.650,0:31:36.190 And Q513 is the[br]identifier of Mt. 0:31:36.190,0:31:37.310 Everest. 0:31:37.310,0:31:42.950 You notice that we use that ID[br]across the statement, right? 0:31:42.950,0:31:48.520 So from Wikidata's[br]perspective, this 0:31:48.520,0:31:53.290 is actually what the[br]database actually contains. 0:31:53.290,0:31:55.030 What we were saying with words-- 0:31:55.030,0:31:57.650 the Earth, highest[br]point, whatever-- 0:31:57.650,0:31:58.540 never mind that. 0:31:58.540,0:32:03.250 Q2 has P610 with a value Q513. 0:32:03.250,0:32:06.190 That's what Wikidata[br]cares about, OK? 0:32:06.190,0:32:09.770 Now that, you'll agree,[br]is a little inaccessible. 0:32:09.770,0:32:13.120 Just these lists of numbers,[br]that's a little hard. 0:32:13.120,0:32:16.240 So Wikidata[br]understands and allows 0:32:16.240,0:32:19.690 us to continue using our words. 0:32:19.690,0:32:23.650 But actually, it gets[br]translated into numeric IDs. 0:32:23.650,0:32:25.050 Now why is this a good idea? 0:32:30.070,0:32:33.070 Why can't we just[br]say Earth or Mt. 0:32:33.070,0:32:35.120 Everest? 0:32:35.120,0:32:36.170 Any thoughts? 0:32:36.170,0:32:39.530 This is an open question. 0:32:39.530,0:32:41.540 Why is this a good[br]idea to use numbers 0:32:41.540,0:32:43.260 instead of the names of things? 0:32:47.000,0:32:51.750 Yes, because more than one[br]thing can have the same name. 0:32:51.750,0:32:52.590 What do you mean? 0:32:52.590,0:32:53.460 There's only one Mt. 0:32:53.460,0:32:54.480 Everest. 
0:32:54.480,0:32:55.510 Well, yeah. 0:32:55.510,0:32:58.710 But there's also a[br]movie called-- and probably 0:32:58.710,0:33:00.000 more than one-- called Mt. 0:33:00.000,0:33:04.080 Everest, or a TV documentary[br]literally called Mt. 0:33:04.080,0:33:06.590 Everest. 0:33:06.590,0:33:09.960 And of course, if I'm[br]describing a person named 0:33:09.960,0:33:14.930 Frank Johnson, not the only[br]Frank Johnson on the planet, 0:33:14.930,0:33:16.180 right? 0:33:16.180,0:33:17.760 But wait, you say. 0:33:17.760,0:33:20.640 On Wikipedia we deal[br]with that problem, right? 0:33:20.640,0:33:23.490 How do we deal with that[br]problem on Wikipedia? 0:33:23.490,0:33:26.270 Does anyone in[br]the audience know? 0:33:26.270,0:33:27.969 The standard way to[br]deal with the fact 0:33:27.969,0:33:30.260 that there is more than one[br]Frank Johnson in the world, 0:33:30.260,0:33:35.600 on Wikipedia, is to use[br]parentheses after the name. 0:33:35.600,0:33:39.200 So there is Frank[br]Johnson (actor) 0:33:39.200,0:33:42.620 and Frank Johnson[br](politician), for example, 0:33:42.620,0:33:44.700 if that's the distinction[br]we need to make. 0:33:44.700,0:33:48.140 So you put in parentheses[br]kind of the minimal amount 0:33:48.140,0:33:51.840 of information you need to tell[br]apart these Frank Johnsons. 0:33:51.840,0:33:54.530 What if there's two[br]politician Frank Johnsons? 0:33:54.530,0:33:58.880 Well, then you would say Frank[br]Johnson (Delaware politician) 0:33:58.880,0:34:01.960 versus Frank Johnson[br](California politician), right? 0:34:01.960,0:34:05.210 You just put in that bit of[br]context to tell them apart. 0:34:05.210,0:34:07.640 So that's the solution[br]that Wikipedians came up 0:34:07.640,0:34:12.469 with years and years ago[br]because they did need 0:34:12.469,0:34:15.560 a unique name for the article. 0:34:15.560,0:34:18.170 You can't have two[br]articles literally called 0:34:18.170,0:34:20.790 Frank Johnson on Wikipedia. 
0:34:20.790,0:34:23.570 So that's the[br]solution on Wikipedia. 0:34:23.570,0:34:28.429 But Wikidata was designed[br]much later, more than a decade 0:34:28.429,0:34:31.340 after Wikipedia, and was[br]able to kind of learn 0:34:31.340,0:34:34.520 from the experience[br]of Wikipedia, which 0:34:34.520,0:34:39.380 has tremendous experience[br]with multilingualism, much 0:34:39.380,0:34:42.870 more than most sites and[br]projects, as we know. 0:34:42.870,0:34:44.659 And so the Wikidata[br]team understood 0:34:44.659,0:34:47.840 from the get go that[br]this will be an issue, 0:34:47.840,0:34:50.989 and it's better to use[br]numbers that are unequivocally 0:34:50.989,0:34:54.800 different from each[br]other instead of labels, 0:34:54.800,0:34:57.290 instead of the actual[br]name, the actual text, 0:34:57.290,0:34:59.630 because names are not unique. 0:34:59.630,0:35:03.260 Names can change, right? 0:35:03.260,0:35:08.960 Just last year, there was a[br]big naming reform in Ukraine 0:35:08.960,0:35:13.610 and a whole bunch of towns[br]and districts were renamed. 0:35:13.610,0:35:17.330 Does that mean we should change[br]all the data that we have, like 0:35:17.330,0:35:19.550 lose all the data that we[br]have about the old name? 0:35:19.550,0:35:22.130 No, we ideally just[br]want to change the name 0:35:22.130,0:35:24.020 without breaking links. 0:35:24.020,0:35:28.550 So having the links actually[br]refer to the numbers 0:35:28.550,0:35:32.090 is one way to ensure the[br]integrity of the data, 0:35:32.090,0:35:35.360 of the links, when[br]renaming happens. 0:35:35.360,0:35:39.230 Another reason is well, even[br]if the name doesn't change, 0:35:39.230,0:35:42.230 not all humans call[br]everything the same, right? 0:35:42.230,0:35:46.180 So Earth is Earth[br]in English, but it's 0:35:46.180,0:35:48.210 [SPEAKING ARABIC] in Arabic. 0:35:48.210,0:35:49.585 It's [SPEAKING HEBREW][br]in Hebrew. 
0:35:53.480,0:35:56.570 So obviously, Earth--[br]even that is not 0:35:56.570,0:36:01.920 as unambiguous or unequivocal[br]as you might think. 0:36:01.920,0:36:03.500 And so that is the[br]reason Wikidata, 0:36:03.500,0:36:07.640 which is built to be[br]multilingual from the start, 0:36:07.640,0:36:11.230 talks about numbers[br]rather than labels. 0:36:11.230,0:36:12.150 OK. 0:36:12.150,0:36:15.370 Ha, I had a whole slide[br]about that and I forgot. 0:36:15.370,0:36:17.830 Yes, so even London,[br]again, is not 0:36:17.830,0:36:20.710 just London, England, which is[br]what you were thinking about. 0:36:20.710,0:36:22.030 It's also a city in Canada. 0:36:22.030,0:36:26.260 And it's also a family[br]name, like Jack London. 0:36:26.260,0:36:27.430 It's also a movie company. 0:36:27.430,0:36:32.230 There must be some hotel[br]named London somewhere. 0:36:32.230,0:36:36.070 This is a good opportunity[br]to remind everyone 0:36:36.070,0:36:41.110 that the vast[br]majority of humankind 0:36:41.110,0:36:45.700 does not speak a[br]word of English. 0:36:45.700,0:36:48.790 That's a statistic[br]worth remembering. 0:36:48.790,0:36:55.240 The vast majority of the planet[br]does not speak English at all. 0:36:55.240,0:36:57.070 That does not[br]contradict the datum 0:36:57.070,0:37:00.070 that English is the most[br]widely spoken language. 0:37:00.070,0:37:02.860 And yet, in aggregate,[br]a majority of people 0:37:02.860,0:37:07.180 speak other languages,[br]and not English at all. 0:37:07.180,0:37:13.150 So moving swiftly on, this[br]is a pause for questions 0:37:13.150,0:37:15.610 about what I've covered so far. 0:37:15.610,0:37:17.390 Any questions in the audience? 0:37:17.390,0:37:19.450 If not, we move to IRC. 0:37:19.450,0:37:21.042 If there are any questions-- 0:37:23.880,0:37:26.891 Any questions? 0:37:26.891,0:37:27.390 No? 0:37:27.390,0:37:28.305 IRC? 0:37:28.305,0:37:29.490 Any questions? 0:37:33.580,0:37:34.180 OK. 
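[Editor's note] The argument for numeric IDs over names can be sketched in a few lines. Assumptions are marked: `Q_DOC` is a made-up ID standing in for a documentary that shares the mountain's name, the French and German labels for Earth are added here for illustration, and `label_of`, `render` and `ids_named` are hypothetical helpers, not part of any Wikidata API. Q2, P610 and Q513 are the IDs quoted in the talk.

```python
# Labels live next to the IDs, per language. Statements refer to IDs only.
labels = {
    "Q2": {"en": "Earth", "fr": "Terre", "de": "Erde"},
    "Q513": {"en": "Mt. Everest"},
    "Q_DOC": {"en": "Mt. Everest"},  # hypothetical ID: a documentary with the same name
    "P610": {"en": "highest point"},
}

statements = [("Q2", "P610", "Q513")]


def label_of(entity_id, lang):
    """Return the label in `lang`, falling back to English, then to the bare ID."""
    names = labels.get(entity_id, {})
    return names.get(lang) or names.get("en") or entity_id


def render(statement, lang):
    """Translate an ID-only statement into words for a human reader."""
    return " / ".join(label_of(part, lang) for part in statement)


def ids_named(name, lang="en"):
    """List every ID carrying this label: names are not unique, IDs are."""
    return sorted(eid for eid, names in labels.items() if names.get(lang) == name)


en_view = render(statements[0], "en")
fr_view = render(statements[0], "fr")
same_name = ids_named("Mt. Everest")

# Renaming touches one label; the statement itself never changes.
labels["Q513"]["en"] = "Mount Everest"
renamed_view = render(statements[0], "en")

print(en_view)
print(fr_view)
print(same_name)
print(renamed_view)
```

The rename at the end mirrors the point about renamed towns: the link from Q2 to Q513 survives because it never depended on the text of the name.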
0:37:34.180,0:37:38.170 We will have additional[br]pauses for questions later. 0:37:38.170,0:37:41.470 But enough of my hand-waving. 0:37:41.470,0:37:44.590 Let's go explore Wikidata. 0:37:44.590,0:37:49.730 So Wikidata lives[br]at wikidata.org. 0:37:49.730,0:37:59.570 And Wikidata already has[br]more than 25 million items. 0:37:59.570,0:38:05.570 That is, it collects[br]statements about more than 25 0:38:05.570,0:38:08.270 million topics. 0:38:08.270,0:38:12.170 It has many, many more[br]than 25 million statements 0:38:12.170,0:38:14.660 because many of these items[br]have dozens or hundreds 0:38:14.660,0:38:16.370 of statements. 0:38:16.370,0:38:20.720 So it documents 25[br]million things-- 0:38:20.720,0:38:23.153 people, books, rivers, whatever. 0:38:26.010,0:38:28.800 Just to give us a sense[br]of how big that number is, 0:38:28.800,0:38:32.430 how many articles do we[br]have on English Wikipedia? 0:38:32.430,0:38:35.610 More than-- yes, more[br]than 5 million articles. 0:38:35.610,0:38:37.990 And that's the[br]largest Wikipedia. 0:38:37.990,0:38:41.100 So Wikidata is[br]already describing 0:38:41.100,0:38:45.450 more than five times, or[br]about five times as many items 0:38:45.450,0:38:48.460 as even our largest Wikipedia. 0:38:48.460,0:38:50.840 So obviously,[br]Wikidata contains data 0:38:50.840,0:38:56.900 about things that have no[br]article on any Wikipedia. 0:38:56.900,0:39:01.980 It is a much, much larger,[br]more comprehensive project. 0:39:01.980,0:39:04.250 All right, the second[br]thing we might notice 0:39:04.250,0:39:07.610 is, well, this looks kind[br]of like Wikipedia, right? 0:39:07.610,0:39:11.210 If we've never visited, it[br]looks kind of like Wikipedia. 0:39:11.210,0:39:13.490 It has this sidebar. 0:39:13.490,0:39:15.290 It has these buttons at the top. 0:39:15.290,0:39:17.810 It looks like it's[br]from the '90s. 0:39:17.810,0:39:18.770 Yeah. 
0:39:18.770,0:39:20.900 So the reason it[br]looks like Wikipedia 0:39:20.900,0:39:24.410 is that it is a wiki running[br]on MediaWiki software. 0:39:24.410,0:39:28.430 It is running on software[br]very much like Wikipedia. 0:39:28.430,0:39:32.180 But it is running on[br]a kind of modification 0:39:32.180,0:39:34.010 of the standard wiki software. 0:39:34.010,0:39:36.170 It has an additional,[br]very important component 0:39:36.170,0:39:38.630 named Wikibase,[br]which gives it all 0:39:38.630,0:39:42.700 of its structured and[br]linked data power. 0:39:42.700,0:39:46.763 So let's start[br]exploring Wikidata. 0:39:52.830,0:39:55.770 Let's take something local-- 0:39:55.770,0:39:57.530 Harvey Milk. 0:39:57.530,0:40:00.190 Harvey Milk. 0:40:00.190,0:40:03.460 What does Wikidata[br]know about Harvey Milk? 0:40:03.460,0:40:06.730 For those on YouTube[br]who may not be local, 0:40:06.730,0:40:15.580 he's a San Francisco politician[br]and gay rights activist 0:40:15.580,0:40:18.380 who was murdered in the '70s. 0:40:18.380,0:40:21.280 He was very significant in[br]the history of those struggles 0:40:21.280,0:40:22.710 in this country. 0:40:22.710,0:40:27.220 So what does Wikidata[br]tell us about Harvey Milk? 0:40:27.220,0:40:29.770 Well, the first[br]thing is it knows 0:40:29.770,0:40:34.562 that Harvey Milk is Q17141. 0:40:34.562,0:40:36.520 That's the most important[br]piece of information, 0:40:36.520,0:40:38.770 is first of all, that[br]is the identifier. 0:40:38.770,0:40:42.490 That is the item[br]number of all the data 0:40:42.490,0:40:46.150 that we will collect[br]about Harvey Milk. 0:40:46.150,0:40:50.020 The second thing you see[br]right under the title 0:40:50.020,0:40:54.730 is this line, this very,[br]very brief summary, right? 0:40:54.730,0:40:59.620 "American politician who became[br]a martyr in the gay community." 0:40:59.620,0:41:02.080 This line is the[br]description line. 
0:41:02.080,0:41:04.640 So the name of the item-- 0:41:04.640,0:41:05.980 this is the label. 0:41:05.980,0:41:07.450 We call it label on Wikidata. 0:41:07.450,0:41:08.740 That's the label. 0:41:08.740,0:41:10.990 And this line is[br]the description. 0:41:10.990,0:41:13.480 Now why is this[br]description important? 0:41:13.480,0:41:16.990 This is the description that[br]helps us tell this Harvey 0:41:16.990,0:41:23.230 Milk from any other Harvey[br]Milk that may exist, all right? 0:41:23.230,0:41:26.530 So again, this would[br]be useful if I'm 0:41:26.530,0:41:30.190 looking up someone with a[br]slightly more generic name. 0:41:30.190,0:41:33.910 That line will help me tell[br]apart the item about Harvey 0:41:33.910,0:41:38.860 Milk the gay activist rather[br]than Harvey Milk the film 0:41:38.860,0:41:41.750 actor, OK? 0:41:41.750,0:41:43.100 And where is it coming from? 0:41:43.100,0:41:48.690 Well, Wikidata has[br]this whole table, 0:41:48.690,0:41:52.790 as you can see, with[br]descriptions and labels 0:41:52.790,0:41:54.750 in other languages. 0:41:54.750,0:41:59.600 So Wikidata is able to refer[br]to Harvey Milk in Arabic which, 0:41:59.600,0:42:04.010 don't panic, is written[br]from right to left. 0:42:04.010,0:42:07.730 It also knows what to[br]call him in Bulgarian. 0:42:07.730,0:42:11.030 I mean, it's the same name,[br]but it's in a different script. 0:42:11.030,0:42:13.640 In French, in Hebrew,[br]and that's it? 0:42:13.640,0:42:17.960 Does it not know a name[br]for Harvey Milk in Italian? 0:42:17.960,0:42:19.760 Of course it does. 0:42:19.760,0:42:22.250 It actually has[br]labels for this person 0:42:22.250,0:42:24.435 in many, many, many languages. 0:42:24.435,0:42:30.080 It doesn't have descriptions in[br]every language, as you can see. 0:42:30.080,0:42:30.800 OK? 0:42:30.800,0:42:36.240 So why was Wikidata showing me[br]these languages and not others? 
0:42:36.240,0:42:39.260 I mean, why this somewhat[br]arbitrary collection-- 0:42:39.260,0:42:42.860 English, Arabic, Bulgarian,[br]German, French, and Hebrew? 0:42:42.860,0:42:45.300 Because I told it to. 0:42:45.300,0:42:50.390 So if we briefly click[br]over to my user page-- 0:42:50.390,0:42:52.730 again, like every wiki,[br]you have user accounts. 0:42:52.730,0:42:53.960 You have user pages. 0:42:53.960,0:42:55.380 This is my user page. 0:42:55.380,0:42:59.750 And as you can see,[br]there's this little user 0:42:59.750,0:43:03.230 information box here called[br]a Babel box by Wikipedians, 0:43:03.230,0:43:06.610 where I list the[br]languages that I speak. 0:43:06.610,0:43:11.000 And Wikidata uses this box[br]just to kind of helpfully 0:43:11.000,0:43:12.944 show me these languages. 0:43:12.944,0:43:14.360 Of course, all the[br]other languages 0:43:14.360,0:43:19.580 are still available, as you saw,[br]by clicking the more languages. 0:43:19.580,0:43:22.940 But this is just a[br]useful little way 0:43:22.940,0:43:27.590 of getting the languages I[br]care about up there first. 0:43:27.590,0:43:29.060 By the way, this is a lie. 0:43:29.060,0:43:31.170 I don't actually[br]speak Bulgarian. 0:43:31.170,0:43:33.740 That stayed on my user page[br]because I was demonstrating 0:43:33.740,0:43:37.010 this in Bulgaria and I wanted[br]that label to show up there 0:43:37.010,0:43:38.420 during the talk-- 0:43:38.420,0:43:40.250 just in case you[br]were going to tell me 0:43:40.250,0:43:43.840 a really good Bulgarian joke. 0:43:43.840,0:43:48.470 OK so for example, Hebrew[br]is my mother tongue. 0:43:48.470,0:43:51.730 And we have a Hebrew[br]label for Harvey Milk. 0:43:51.730,0:43:53.810 But we don't have a description. 0:43:53.810,0:44:00.950 So let's fix that right now by[br]clicking the edit button right 0:44:00.950,0:44:01.960 here. 0:44:01.960,0:44:05.930 I click edit, and this[br]table became editable. 
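[Editor's note] The label and description table, and the way it is narrowed to the languages a user lists, can be sketched like this. Everything here is a simplified stand-in: the Hebrew strings are placeholders rather than the real label text, `user_languages` is a hypothetical substitute for the Babel box, and `term_rows` and `missing_descriptions` are made-up helper names.

```python
item = {
    "id": "Q17141",
    "labels": {
        "en": "Harvey Milk",
        "he": "<label in Hebrew script>",  # placeholder, not the real label text
    },
    "descriptions": {
        "en": "American politician who became a martyr in the gay community",
        # no Hebrew description yet, as in the talk
    },
}

user_languages = ["en", "he"]  # hypothetical stand-in for the Babel box


def term_rows(item, languages):
    """One (language, label, description) row per language the user asked for."""
    return [
        (lang, item["labels"].get(lang), item["descriptions"].get(lang))
        for lang in languages
    ]


def missing_descriptions(item, languages):
    """Languages that have a label but still lack a description."""
    return [
        lang
        for lang in languages
        if lang in item["labels"] and lang not in item["descriptions"]
    ]


rows = term_rows(item, user_languages)
gaps_before = missing_descriptions(item, user_languages)

# The edit shown in the demo: fill in the missing description.
item["descriptions"]["he"] = "<description in Hebrew>"  # placeholder text
gaps_after = missing_descriptions(item, user_languages)

print(rows)
print(gaps_before, gaps_after)
```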
0:44:05.930,0:44:09.661 And now I can very briefly[br]type a description. 0:44:22.899,0:44:24.440 AUDIENCE: Online in[br]about 20 seconds. 0:44:24.440,0:44:25.400 But can we hold it? 0:44:25.400,0:44:26.066 ASAF BARTOV: OK. 0:44:28.454,0:44:30.430 That was good timing[br]for the screen to crash. 0:44:53.642,0:44:54.142 OK? 0:44:59.082,0:45:01.800 Are we back? 0:45:01.800,0:45:02.850 OK. 0:45:02.850,0:45:03.690 Sorry about that. 0:45:03.690,0:45:07.500 So this was all about what to[br]call him in different languages 0:45:07.500,0:45:09.930 and scripts and how to[br]tell this person apart 0:45:09.930,0:45:13.590 from other people with[br]potentially the same name. 0:45:13.590,0:45:17.930 Let's scroll down and see[br]what else does Wikidata 0:45:17.930,0:45:19.680 know about this person? 0:45:19.680,0:45:24.060 So as you can see, this is[br]a list of statements, right? 0:45:24.060,0:45:25.500 This is a list of statements. 0:45:25.500,0:45:27.900 And the properties[br]are on the left, 0:45:27.900,0:45:30.340 the values are on the right. 0:45:30.340,0:45:33.870 So the first thing Wikidata[br]knows about Harvey Milk 0:45:33.870,0:45:38.520 is a very important[br]property called instance of. 0:45:38.520,0:45:39.910 Instance of. 0:45:39.910,0:45:44.690 And the property instance of[br]answers the very basic question 0:45:44.690,0:45:49.460 what kind of thing is[br]this that I'm describing? 0:45:49.460,0:45:50.870 Is it a book? 0:45:50.870,0:45:51.980 Is it a poem? 0:45:51.980,0:45:53.570 Is it a mountain? 0:45:53.570,0:45:55.520 Is it a theological concept? 0:45:55.520,0:45:57.800 No, it's a human. 0:45:57.800,0:46:00.020 It's a person, OK? 0:46:00.020,0:46:01.880 The item about Mt. 0:46:01.880,0:46:07.070 Everest will say[br]instance of mountain, OK? 0:46:07.070,0:46:10.790 This is a very[br]important property. 0:46:10.790,0:46:12.500 Why is it important? 0:46:12.500,0:46:14.630 Wouldn't anyone looking[br]at this know that this is 0:46:14.630,0:46:15.550 a human being? 
0:46:15.550,0:46:16.310 Yes. 0:46:16.310,0:46:18.720 Anyone looking at[br]this will know. 0:46:18.720,0:46:23.780 But if I want a computer to[br]be able to pull information 0:46:23.780,0:46:28.160 about people, I want to[br]be able to easily exclude 0:46:28.160,0:46:30.680 all the mountains and[br]poems and other things that 0:46:30.680,0:46:33.440 are not people from my query. 0:46:33.440,0:46:37.400 So this single datum,[br]this single piece of data, 0:46:37.400,0:46:41.720 is what tells computers and[br]algorithms very clearly, 0:46:41.720,0:46:42.890 this is a human. 0:46:42.890,0:46:47.340 Things that aren't instance[br]of human are other things. 0:46:47.340,0:46:48.230 OK? 0:46:48.230,0:46:50.145 So it may sound very[br]trivial, but it's not. 0:46:50.145,0:46:51.770 It's very important[br]to have an instance 0:46:51.770,0:46:54.077 of field for Wikidata items. 0:46:54.077,0:46:55.410 All right, what else do we know? 0:46:55.410,0:46:59.360 Well, Wikidata knows about[br]an image for Harvey Milk. 0:46:59.360,0:47:02.982 Again, we can find a ton of[br]images-- or maybe not a ton, 0:47:02.982,0:47:04.940 but we can find dozens[br]of images of Harvey Milk 0:47:04.940,0:47:10.430 on Commons, on our Wikimedia[br]multimedia repository. 0:47:10.430,0:47:13.430 So why should we have a[br]single image here on Wikidata? 0:47:13.430,0:47:16.280 Again, this is[br]mostly for reusers. 0:47:16.280,0:47:18.920 If I'm building some kind of[br]tool that pulls information 0:47:18.920,0:47:21.680 from Wikidata, it's[br]nice if there's 0:47:21.680,0:47:24.680 at least one representative[br]image to kind of use 0:47:24.680,0:47:30.300 as the default or immediate[br]image for Harvey Milk 0:47:30.300,0:47:33.120 in some other reused context. 0:47:33.120,0:47:34.770 All right, sex or gender-- 0:47:34.770,0:47:35.670 male. 0:47:35.670,0:47:38.790 Country of citizenship--[br]United States of America. 0:47:38.790,0:47:39.910 Given name is Harvey. 
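[Editor's note] Why "instance of" matters to a program can be sketched with a tiny filter. The records below are simplified stand-ins for Wikidata items: `Q_POEM` and `Q_UNTYPED` are made-up IDs, and `instances_of` is a hypothetical helper. Only the Harvey Milk and Mt. Everest classes come from the talk.

```python
items = [
    {"id": "Q17141", "label": "Harvey Milk", "instance_of": "human"},
    {"id": "Q513", "label": "Mt. Everest", "instance_of": "mountain"},
    {"id": "Q_POEM", "label": "Some poem", "instance_of": "poem"},  # hypothetical item
    {"id": "Q_UNTYPED", "label": "Frank Johnson"},  # hypothetical item with no "instance of"
]


def instances_of(items, kind):
    """Keep only the items whose 'instance of' value equals `kind`."""
    return [item for item in items if item.get("instance_of") == kind]


humans = instances_of(items, "human")
print([item["label"] for item in humans])
```

The untyped "Frank Johnson" record is left out even though a human reader would guess it is a person, which is the point being made: without that one field, a query cannot tell.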
0:47:39.910,0:47:41.580 The date of birth is so and so. 0:47:41.580,0:47:44.340 The place of birth is Woodmere. 0:47:44.340,0:47:45.870 The place of death[br]is San Francisco. 0:47:45.870,0:47:48.640 The manner of death is homicide. 0:47:48.640,0:47:50.930 Wikidata knows that. 0:47:50.930,0:47:55.700 Now again, every[br]little datum like that 0:47:55.700,0:48:02.210 is the basis for later querying[br]and answering questions. 0:48:02.210,0:48:07.390 So the fact that we record the[br]manner of death of people-- 0:48:07.390,0:48:09.230 or at least of some people-- 0:48:09.230,0:48:11.900 will allow us later[br]to go, you know, 0:48:11.900,0:48:17.120 who are some people from[br]Belgium who died by homicide? 0:48:17.120,0:48:24.650 That's a question Wikidata can[br]answer, thanks to this field. 0:48:24.650,0:48:27.680 The other thing I mentioned[br]is that things are links. 0:48:27.680,0:48:29.680 So the place of[br]birth is Woodmere. 0:48:29.680,0:48:31.900 I don't know where[br]Woodmere is, but I 0:48:31.900,0:48:34.390 can click that and find out. 0:48:34.390,0:48:38.270 Here is the Wikidata item[br]about Woodmere, right? 0:48:38.270,0:48:41.230 It was the value in the[br]statement about Harvey Milk, 0:48:41.230,0:48:43.900 but now I'm looking at[br]the item about Woodmere. 0:48:43.900,0:48:48.047 And it turns out it's in[br]Nassau County, New York, right? 0:48:48.047,0:48:50.380 And of course, Wikidata has[br]a whole bunch of information 0:48:50.380,0:48:55.450 for me about Woodmere-- 0:48:55.450,0:48:59.720 what country it's in and the[br]coordinates and the population 0:48:59.720,0:49:06.230 and the area, all the things you[br]would expect about a place, OK? 0:49:06.230,0:49:07.512 Let's get back to Harvey Milk. 0:49:10.370,0:49:13.260 So the manner of death,[br]the cause of death-- 0:49:13.260,0:49:16.880 now here, Wikidata gives[br]us excellent information. 0:49:16.880,0:49:20.390 The actual cause of death[br]is ballistic trauma. 
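[Editor's note] The "people from Belgium who died by homicide" question can be sketched as a filter over several properties at once. This is an in-memory illustration, not a real Wikidata query: "Person A" and "Person B" are invented records, the field names are simplified versions of the property labels, and `find` is a hypothetical helper.

```python
people_and_things = [
    {
        "label": "Harvey Milk",
        "instance_of": "human",
        "country_of_citizenship": "United States of America",
        "manner_of_death": "homicide",
    },
    {   # invented record, for illustration only
        "label": "Person A (hypothetical)",
        "instance_of": "human",
        "country_of_citizenship": "Belgium",
        "manner_of_death": "homicide",
    },
    {   # invented record, for illustration only
        "label": "Person B (hypothetical)",
        "instance_of": "human",
        "country_of_citizenship": "Belgium",
        "manner_of_death": "natural causes",
    },
    {"label": "Mt. Everest", "instance_of": "mountain"},
]


def find(items, **conditions):
    """Return the items for which every given property has the given value."""
    return [
        item
        for item in items
        if all(item.get(prop) == value for prop, value in conditions.items())
    ]


result = find(
    people_and_things,
    instance_of="human",
    country_of_citizenship="Belgium",
    manner_of_death="homicide",
)
print([item["label"] for item in result])
```

Each condition only works because someone recorded that field; an item with no manner of death simply never matches.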
0:49:20.390,0:49:22.160 That's a professional term. 0:49:22.160,0:49:27.560 And this statement[br]has qualifiers. 0:49:27.560,0:49:30.650 So until now, I was talking[br]about triples, right? 0:49:30.650,0:49:33.260 The item has a property[br]with a certain value. 0:49:33.260,0:49:35.270 Actually, each[br]statement can also 0:49:35.270,0:49:38.030 have a number of[br]qualifiers which 0:49:38.030,0:49:45.424 add aspects of information,[br]still about that one question 0:49:45.424,0:49:46.590 that we're answering, right? 0:49:46.590,0:49:49.904 So if this property[br]answers cause of death, 0:49:49.904,0:49:51.320 it's not discussing[br]anything else. 0:49:51.320,0:49:52.880 It's not discussing languages. 0:49:52.880,0:49:54.920 It's not discussing[br]date of birth, right? 0:49:54.920,0:49:56.930 It's talking about[br]the cause of death. 0:49:56.930,0:49:59.300 But we're not just[br]saying ballistic trauma. 0:49:59.300,0:50:04.550 We're saying ballistic trauma[br]with the quantity attribute 0:50:04.550,0:50:05.660 being five. 0:50:05.660,0:50:07.550 What does that mean? 0:50:07.550,0:50:08.870 Five bullets, right? 0:50:08.870,0:50:12.780 There are five[br]ballistic traumas. 0:50:12.780,0:50:15.300 He was shot five times. 0:50:15.300,0:50:18.210 And he was shot by this[br]person named Dan White. 0:50:18.210,0:50:25.020 And this ballistic trauma,[br]like this actual shooting, 0:50:25.020,0:50:28.420 is itself the subject[br]of this other thing. 0:50:28.420,0:50:31.440 This is a link to a[br]whole other Wikidata 0:50:31.440,0:50:35.510 item about the Moscone-Milk[br]assassinations. 0:50:35.510,0:50:38.610 Moscone was the San[br]Francisco mayor at the time. 0:50:43.540,0:50:47.510 We'll see slightly better or[br]easier to understand examples 0:50:47.510,0:50:49.460 of qualifiers in a bit. 0:50:49.460,0:50:54.440 So if this was[br]confusing, hang on. 0:50:54.440,0:50:55.970 So he was killed by Dan White. 0:50:55.970,0:50:57.800 He spoke English. 
0:50:57.800,0:50:59.960 His occupation--[br]here's an example 0:50:59.960,0:51:03.140 of a property with more[br]than one value, right? 0:51:03.140,0:51:06.260 So Milk was a politician. 0:51:06.260,0:51:09.710 But he was also a Navy[br]officer, at least for a while. 0:51:09.710,0:51:12.980 That was another thing that[br]he did during his life. 0:51:12.980,0:51:15.350 And he was a human[br]rights activist, right? 0:51:15.350,0:51:20.600 So some people are[br]writers and translators. 0:51:20.600,0:51:22.610 So people can have more[br]than one occupation. 0:51:22.610,0:51:26.310 People can speak more[br]than one language. 0:51:26.310,0:51:29.130 Here's a better[br]example of a qualifier. 0:51:29.130,0:51:35.090 So the property award received[br]has the value Presidential 0:51:35.090,0:51:37.560 Medal of Freedom. 0:51:37.560,0:51:42.570 And that award has an[br]attribute called point in time, 0:51:42.570,0:51:44.070 like when was this? 0:51:44.070,0:51:46.580 This was in 2009. 0:51:46.580,0:51:50.510 Do you see that[br]this piece of data-- 0:51:50.510,0:52:04.780 2009-- is a sub-statement[br]or is subjugated 0:52:04.780,0:52:09.621 to the context of this award,[br]was the Presidential Medal 0:52:09.621,0:52:10.120 of Freedom? 0:52:10.120,0:52:13.430 It can't just kind of[br]free float in the article. 0:52:13.430,0:52:17.650 It's not that 2009 is itself[br]a meaningful thing, right? 0:52:17.650,0:52:21.550 This medal was awarded in 2009. 0:52:21.550,0:52:22.170 If 0:52:22.170,0:52:24.070 Wikidata doesn't[br]tell us, for example, 0:52:24.070,0:52:27.130 when he was a Navy officer, OK? 0:52:27.130,0:52:30.100 But if we were, for example,[br]to look that up right now 0:52:30.100,0:52:33.820 and find out that Milk was[br]a Navy officer between 1962 0:52:33.820,0:52:39.542 and 1964, we could go back[br]here to the Navy officer bit 0:52:39.542,0:52:41.010 and click edit. 
0:52:41.010,0:52:44.190 This is how I edit this[br]particular little piece 0:52:44.190,0:52:45.360 of information. 0:52:45.360,0:52:49.350 And add a qualifier like this. 0:52:49.350,0:52:51.300 I click Add Qualifier. 0:52:51.300,0:52:57.660 And I could pick start[br]time and end time, right? 0:52:57.660,0:53:04.990 And then I could[br]type 1962 to 1964, 0:53:04.990,0:53:08.000 and that would be[br]teaching Wikidata. 0:53:08.000,0:53:10.660 Oh, I'm sorry, I meant to[br]do that for Navy officer. 0:53:10.660,0:53:11.230 OK. 0:53:11.230,0:53:14.800 But, you know,[br]that is the exact-- 0:53:14.800,0:53:18.400 the accurate time span[br]of that statement. 0:53:18.400,0:53:22.850 So it's true to say about a[br]person, he was a Navy officer, 0:53:22.850,0:53:25.990 even if of course he wasn't a[br]Navy officer his entire life. 0:53:25.990,0:53:28.120 But it's better and[br]it's more accurate, 0:53:28.120,0:53:32.260 to say he was a Navy officer[br]between 1962 and 1964. 0:53:32.260,0:53:35.380 Don't worry, I'm[br]not saving this. 0:53:35.380,0:53:39.150 No vandalizing of[br]Wikidata in this session. 0:53:39.150,0:53:40.450 OK. 0:53:40.450,0:53:41.140 Moving on. 0:53:41.140,0:53:42.430 What else does Wikidata know? 0:53:42.430,0:53:43.960 He was educated at[br]this university. 0:53:43.960,0:53:46.970 He was a member of[br]this political party. 0:53:46.970,0:53:47.470 Right? 0:53:47.470,0:53:49.428 That's of course[br]a relevant property 0:53:49.428,0:53:52.270 for a politician. 0:53:52.270,0:53:56.500 Religion, military branch,[br]what is the category on Commons 0:53:56.500,0:53:58.720 that discusses this[br]item, is something 0:53:58.720,0:54:00.790 that Wikidata can tell us. 0:54:00.790,0:54:02.200 And that's it. 0:54:02.200,0:54:04.570 Now, is that everything[br]that we could possibly 0:54:04.570,0:54:07.780 say in a structured[br]way about Harvey Milk? 0:54:07.780,0:54:08.680 No. 
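[Editor's note] Statements with several values and with qualifiers can be sketched as a small data structure. This is an assumption-laden illustration, not the Wikibase data model itself: the `Statement` class and the helpers `values_of` and `add_qualifiers` are made up here, and the 1962 to 1964 span is the talk's "if we were to find out" example, not a verified fact.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    prop: str                      # property label, e.g. "occupation"
    value: object                  # the main value of the statement
    qualifiers: dict = field(default_factory=dict)  # extra detail about this one value


harvey_milk = [
    # One property, several values: each value is its own statement.
    Statement("occupation", "politician"),
    Statement("occupation", "Navy officer"),
    Statement("occupation", "human rights activist"),
    # Qualifiers hang off a single statement, not off the item as a whole.
    Statement("award received", "Presidential Medal of Freedom", {"point in time": 2009}),
    Statement("cause of death", "ballistic trauma", {"quantity": 5}),
]


def values_of(statements, prop):
    """All values recorded for one property."""
    return [s.value for s in statements if s.prop == prop]


def add_qualifiers(statements, prop, value, qualifiers):
    """Attach qualifiers to the statement with this property and value."""
    for s in statements:
        if s.prop == prop and s.value == value:
            s.qualifiers.update(qualifiers)
            return s
    raise LookupError(f"no statement {prop!r} = {value!r}")


occupations = values_of(harvey_milk, "occupation")

# The unsaved edit from the demo, using the talk's illustrative years.
navy = add_qualifiers(
    harvey_milk, "occupation", "Navy officer", {"start time": 1962, "end time": 1964}
)

print(occupations)
print(navy)
```

Keeping the years inside the Navy officer statement is what stops "1962" from free-floating: it only means something in the context of that one value.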
0:54:08.680,0:54:13.570 We could probably find at[br]least a few more things to say. 0:54:13.570,0:54:17.170 We will see how to contribute[br]new information to Wikidata 0:54:17.170,0:54:19.990 in just a minute with[br]a different example. 0:54:19.990,0:54:23.360 But this-- all this was[br]a set of statements. 0:54:23.360,0:54:23.860 Right? 0:54:23.860,0:54:25.927 This was the title[br]statements here. 0:54:28.840,0:54:31.160 But at the bottom of the[br]list of statements is 0:54:31.160,0:54:34.300 another section[br]called identifiers. 0:54:34.300,0:54:36.960 And I want to spend a minute[br]talking about what that is. 0:54:36.960,0:54:43.630 So identifiers is a[br]collection of keys. 0:54:43.630,0:54:47.980 A collection of[br]IDs, or codes, that 0:54:47.980,0:54:52.890 are keys to other[br]information sources. 0:54:52.890,0:54:58.560 And a lot of Wikidata items[br]have a whole series of keys 0:54:58.560,0:55:03.030 to other databases, other[br]sites, other repositories, 0:55:03.030,0:55:08.340 that help you or a computer[br]be able to access not just 0:55:08.340,0:55:12.240 some database and look for[br]information about Harvey Milk, 0:55:12.240,0:55:16.950 but access the exact record[br]relevant to Harvey Milk. 0:55:16.950,0:55:20.280 And again, if you imagine[br]someone named John Smith, 0:55:20.280,0:55:21.690 that is really valuable, right? 0:55:21.690,0:55:23.250 If you're not just[br]told, oh yeah, 0:55:23.250,0:55:24.875 you can look at the[br]Library of Congress 0:55:24.875,0:55:27.840 for John Smith,[br]good luck with that. 0:55:27.840,0:55:30.240 Or if I tell you, go to[br]the Library of Congress 0:55:30.240,0:55:35.810 to this record for this John[br]Smith, you see the difference. 0:55:35.810,0:55:42.080 So Wikidata tells us that on[br]VIAF, which is the Virtual 0:55:42.080,0:55:44.570 International Authority File. 0:55:44.570,0:55:50.140 It's an aggregated master[br]index built by bibliographers, 0:55:50.140,0:55:52.831 by librarians, of people. 
0:55:52.831,0:55:53.330 Right? 0:55:53.330,0:55:56.720 It tries to kind of aggregate[br]information about people 0:55:56.720,0:55:59.270 across library[br]catalogs everywhere. 0:55:59.270,0:56:05.120 So the VIAF ID for Harvey[br]Milk is this number. 0:56:05.120,0:56:07.340 And conveniently,[br]if I click that, 0:56:07.340,0:56:10.160 I'm not taken to[br]some Wikidata item. 0:56:10.160,0:56:13.010 I'm actually taken[br]to the relevant site. 0:56:13.010,0:56:16.760 So this took me right[br]to viaf.org, the Virtual 0:56:16.760,0:56:21.770 International Authority File,[br]directly to their record 0:56:21.770,0:56:23.310 about Harvey Milk. 0:56:23.310,0:56:23.810 All right? 0:56:23.810,0:56:27.290 And that itself leads[br]me to national catalogs 0:56:27.290,0:56:29.630 of national libraries[br]all over the world. 0:56:29.630,0:56:32.360 We won't get into the[br]things you can do with VIAF. 0:56:32.360,0:56:37.220 The point is Wikidata[br]contained the piece of thread 0:56:37.220,0:56:40.820 that I could tug on[br]to arrive directly 0:56:40.820,0:56:44.840 to that information[br]in other databases. 0:56:44.840,0:56:45.680 Yes. 0:56:45.680,0:56:49.670 And it has that for many,[br]many kinds of databases. 0:56:49.670,0:56:53.150 The BNF, for example, that's[br]the National Library of France. 0:56:53.150,0:56:56.270 And that will take me[br]to that index card. 0:56:56.270,0:56:57.320 IMDB. 0:56:57.320,0:56:58.620 We all know IMDB, right? 0:56:58.620,0:57:03.320 So here I have the key[br]to Harvey Milk in IMDB. 0:57:03.320,0:57:05.810 And this is what IMDB says[br]about Harvey Milk, right? 0:57:05.810,0:57:08.480 They have their own piece[br]of information about him, 0:57:08.480,0:57:11.590 of course, with filmography[br]and everything else. 0:57:11.590,0:57:15.140 And see, I did not have[br]to search IMDB for it. 0:57:15.140,0:57:19.070 I just had the key right[br]there waiting for me.
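The "key to another database" idea can be sketched in a few lines: an identifier is just a short string, and a per-database URL pattern turns it into a direct link to the exact record. The URL patterns and the ID values below are assumptions for illustration (they are not taken from the talk or from the Harvey Milk item), and `external_url` is a hypothetical helper.

```python
# Assumed URL patterns, one per external database; "{id}" is where the key goes.
URL_PATTERNS = {
    "VIAF ID": "https://viaf.org/viaf/{id}",
    "IMDb ID": "https://www.imdb.com/name/{id}/",
}


def external_url(identifier_name, value):
    """Build a direct link to one record from an identifier name and its value."""
    pattern = URL_PATTERNS.get(identifier_name)
    if pattern is None:
        raise KeyError(f"no URL pattern known for {identifier_name!r}")
    return pattern.format(id=value)


# Placeholder keys, NOT the real identifiers of any person.
identifiers = {"VIAF ID": "12345", "IMDb ID": "nm0000000"}

for name, value in identifiers.items():
    print(name, "->", external_url(name, value))
```

The point of the sketch is that no searching happens anywhere: the key plus the pattern leads straight to one record, which is what makes identifiers useful for both people and programs.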
0:57:19.070,0:57:21.080 Now, again, this is[br]very convenient for me 0:57:21.080,0:57:24.590 as I just showed you the[br]human use case for this. 0:57:24.590,0:57:27.530 But it's even more[br]powerful in aggregate 0:57:27.530,0:57:35.450 when we allow computers to[br]traverse this network of links 0:57:35.450,0:57:36.110 between-- 0:57:36.110,0:57:41.690 not just within Wikidata, but[br]between data storage facilities 0:57:41.690,0:57:43.850 and repositories. 0:57:43.850,0:57:49.790 This is sometimes referred to[br]as the linked open data cloud. 0:57:49.790,0:57:52.670 Cloud, because it's multiple[br]different repositories 0:57:52.670,0:57:54.740 that are interlinked. 0:57:54.740,0:58:02.210 And Wikidata is already, and[br]to a growing extent, the nexus, 0:58:02.210,0:58:04.460 the connection[br]point between a lot 0:58:04.460,0:58:06.780 of these different databases. 0:58:06.780,0:58:09.230 So IMDB, for example,[br]it's a good example 0:58:09.230,0:58:11.300 because it's a site[br]almost everyone knows, 0:58:11.300,0:58:14.000 IMDB has information[br]about Harvey Milk. 0:58:14.000,0:58:16.670 But that information[br]does not include a link 0:58:16.670,0:58:19.140 to the French National Library. 0:58:19.140,0:58:19.645 Right? 0:58:19.645,0:58:20.770 Do you see what I'm saying? 0:58:20.770,0:58:25.550 So IMDB is a data repository[br]with IDs and allows linking. 0:58:25.550,0:58:28.100 But it does not give you[br]what Wikidata gives you which 0:58:28.100,0:58:32.850 is this kind of collection of-- 0:58:32.850,0:58:36.330 it's like a junction of all[br]these different data sources. 0:58:36.330,0:58:37.910 So Wikidata is the[br]place where you 0:58:37.910,0:58:40.730 can document these[br]interrelationships 0:58:40.730,0:58:41.640 or equivalencies. 0:58:41.640,0:58:42.140 Right? 0:58:42.140,0:58:48.770 So ID, you know, 587548 on IMDB[br]is discussing the same topic 0:58:48.770,0:58:52.260 as French National[br]Library ID whatever.
0:58:52.260,0:58:55.210 Wikidata contains that[br]piece of information, 0:58:55.210,0:58:59.090 that this ID in this database[br]is about the same person 0:58:59.090,0:59:04.050 as that ID in that database. 0:59:04.050,0:59:05.290 OK. 0:59:05.290,0:59:07.420 So that's what[br]identifiers are about. 0:59:07.420,0:59:11.320 Still scrolling down the[br]Wikidata item about Harvey 0:59:11.320,0:59:15.500 Milk, we have the site links. 0:59:15.500,0:59:20.840 The site links are links[br]to Wikimedia projects 0:59:20.840,0:59:22.770 that are related to this item. 0:59:22.770,0:59:25.250 So of course there[br]are Wikipedia articles 0:59:25.250,0:59:28.880 about Harvey Milk in many,[br]many different Wikipedias. 0:59:28.880,0:59:31.700 Quite a few language versions. 0:59:31.700,0:59:34.960 And there are[br]pages on Wikiquote, 0:59:34.960,0:59:36.680 one of the sister projects. 0:59:36.680,0:59:38.630 There are pages on[br]Wikiquote with some quotes 0:59:38.630,0:59:40.130 from Harvey Milk. 0:59:40.130,0:59:45.060 And there is even a page for[br]Harvey Milk on Wikisource. 0:59:45.060,0:59:45.560 Right? 0:59:45.560,0:59:47.840 So this is a collection[br]of those links. 0:59:47.840,0:59:52.760 And those of you who have maybe[br]only dealt with Wikidata 0:59:52.760,0:59:57.290 for inter-wiki links, which[br]we used to do in the old days 0:59:57.290,0:59:59.600 manually within[br]the article text, 0:59:59.600,1:00:01.716 now we do it through[br]Wikidata, so maybe that's 1:00:01.716,1:00:03.590 the only thing you did[br]know about Wikidata, 1:00:03.590,1:00:10.130 is how to update these[br]inter-wiki tables on Wikidata. 1:00:10.130,1:00:11.430 All right. 1:00:11.430,1:00:14.090 So that concludes[br]our little tour 1:00:14.090,1:00:18.560 of the anatomy of[br]a Wikidata page. 1:00:18.560,1:00:22.370 I will just remind you that[br]it's a wiki page, which 1:00:22.370,1:00:26.120 means it has a discussion[br]page, a talk page.
1:00:26.120,1:00:27.960 This one happens to be empty. 1:00:27.960,1:00:30.092 But, you know, if we have[br]concerns or arguments 1:00:30.092,1:00:31.550 about some of the[br]data here that is 1:00:31.550,1:00:33.290 what we would use[br]to discuss this 1:00:33.290,1:00:36.830 and to arrive at consensus. 1:00:36.830,1:00:41.760 It also has a history view just[br]like every Wikipedia article. 1:00:41.760,1:00:47.402 So you can see here[br]a list of edits. 1:00:47.402,1:00:48.860 Maybe some of you[br]have never looked 1:00:48.860,1:00:51.710 at a history page on Wikipedia,[br]so this looks overwhelming. 1:00:51.710,1:00:55.040 But every line here,[br]every entry here, 1:00:55.040,1:00:58.240 is a single edit, a single[br]revision, a single change 1:00:58.240,1:01:00.440 to this Wikidata item. 1:01:00.440,1:01:01.670 Just Harvey Milk. 1:01:01.670,1:01:04.250 And you can see at the very[br]top this edit that I just 1:01:04.250,1:01:06.680 made-- this is my[br]volunteer account 1:01:06.680,1:01:09.650 and I just made this edit,[br]and in parentheses you 1:01:09.650,1:01:10.790 can see what I did. 1:01:10.790,1:01:14.640 I added an HE,[br]Hebrew, description. 1:01:14.640,1:01:16.930 And this is the text[br]that I added in Hebrew. 1:01:16.930,1:01:17.430 Right? 1:01:17.430,1:01:21.470 So we can see who added[br]what to the Wikidata item, 1:01:21.470,1:01:24.960 just like we can do[br]the same on Wikipedia. 1:01:24.960,1:01:26.390 So we have the revision history. 1:01:26.390,1:01:27.560 We can undo edits. 1:01:27.560,1:01:30.320 We can revert, just[br]like on Wikipedia. 1:01:34.420,1:01:36.940 And what else did I[br]want to show here? 1:01:36.940,1:01:40.930 We can add an item to my[br]watch list using the star, 1:01:40.930,1:01:42.020 just like on Wikipedia. 1:01:42.020,1:01:46.670 So we have all these[br]standard wiki features 1:01:46.670,1:01:47.878 that we would come to expect. 1:01:50.440,1:01:54.270 Let's pause for questions. 
1:01:54.270,1:01:58.412 Any questions about what[br]we've covered so far? 1:02:02.573,1:02:03.073 Yes. 1:02:06.950,1:02:11.345 Are attributes of statements[br]preset for the specific value? 1:02:16.640,1:02:19.830 No, they're not preset. 1:02:19.830,1:02:29.760 And generally Wikidata does[br]not enforce logic by default. 1:02:29.760,1:02:32.130 So, I mean, there's[br]nothing to prevent you 1:02:32.130,1:02:38.700 from editing the[br]item about Brazil, 1:02:38.700,1:02:42.990 and adding the property height. 1:02:46.690,1:02:50.430 Now height is not a relevant[br]property for a country. 1:02:50.430,1:02:50.970 Right? 1:02:50.970,1:02:53.880 I mean, maybe average[br]elevation, maybe. 1:02:53.880,1:02:56.400 But not just height,[br]which is used for humans 1:02:56.400,1:02:59.040 or for physical things. 1:02:59.040,1:03:02.400 So you could add that[br]property to Brazil and save it 1:03:02.400,1:03:04.650 and the wiki would not complain. 1:03:04.650,1:03:07.590 Now in the background[br]there are kind 1:03:07.590,1:03:13.020 of extra-wiki, outside the[br]wiki, processes for constraint 1:03:13.020,1:03:13.710 validation. 1:03:13.710,1:03:16.050 So there are bots and[br]other processes that 1:03:16.050,1:03:17.940 run, and occasionally,[br]for example, 1:03:17.940,1:03:26.570 identify non-living things[br]with a date of birth field. 1:03:26.570,1:03:27.720 That's nonsensical. 1:03:27.720,1:03:29.010 That should not exist. 1:03:29.010,1:03:31.710 If someone mistakenly added[br]that there are processes 1:03:31.710,1:03:34.350 that would flag[br]that to be fixed. 1:03:34.350,1:03:36.690 But the wiki itself,[br]Wikidata, will not 1:03:36.690,1:03:38.550 prevent you from adding that. 1:03:38.550,1:03:41.940 And that is by design[br]to keep things flexible. 1:03:41.940,1:03:43.930 So that people don't[br]run into, oh wait, 1:03:43.930,1:03:46.560 but I can't add this[br]because nobody thought 1:03:46.560,1:03:49.830 that I would need this, maybe.
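The "save anything, flag problems later" design described in this answer can be sketched as a check that runs over already-saved data and reports suspicious statements without blocking them. Everything here is hypothetical: the rule table, the item dictionaries, and the `flag_violations` function are illustrative stand-ins, not the actual bots or constraint system.

```python
# Hypothetical rule table: property -> kinds of item it makes sense on.
ALLOWED_ON = {
    "date of birth": {"human"},
    "height": {"human", "building"},
}


def flag_violations(items):
    """Return (item label, property) pairs that break a rule. Nothing is blocked."""
    flagged = []
    for item in items:
        for prop in item["statements"]:
            allowed = ALLOWED_ON.get(prop)
            # Properties with no rule are always accepted, which keeps things flexible.
            if allowed is not None and item["instance of"] not in allowed:
                flagged.append((item["label"], prop))
    return flagged


# Made-up items that were all "saved" without complaint.
items = [
    {"label": "Brazil", "instance of": "country",
     "statements": {"height": "(some value)"}},
    {"label": "Some statue", "instance of": "sculpture",
     "statements": {"date of birth": "(some value)"}},
    {"label": "Some person", "instance of": "human",
     "statements": {"date of birth": "(some value)", "nickname": "(some value)"}},
]

print(flag_violations(items))
# [('Brazil', 'height'), ('Some statue', 'date of birth')]
```

Note that the check only produces a list for someone to review; the edits themselves went through, which mirrors the point that the wiki does not prevent you from adding an odd property.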
1:03:49.830,1:03:54.530 I hope that answers[br]your question. 1:03:54.530,1:03:57.290 You say helpful[br]answer, question mark. 1:03:57.290,1:03:59.510 So was it a helpful answer, or? 1:04:03.940,1:04:04.440 OK. 1:04:04.440,1:04:05.426 Yes, Eleanor. 1:04:05.426,1:04:10.707 AUDIENCE: [INAUDIBLE] 1:04:10.707,1:04:12.040 ASAF BARTOV: Excellent question. 1:04:12.040,1:04:13.030 I'll repeat it. 1:04:13.030,1:04:16.180 You ask how do I find[br]the Wikidata item 1:04:16.180,1:04:18.370 number from Wikipedia. 1:04:18.370,1:04:21.580 If I'm reading about Harvey Milk[br]and I want to look at the data 1:04:21.580,1:04:23.600 how do I do that? 1:04:23.600,1:04:27.400 That is an excellent question[br]and let's skip to Wikipedia. 1:04:27.400,1:04:32.030 Conveniently I have the[br]link right here on English. 1:04:32.030,1:04:35.600 So this is the Wikipedia[br]article about Harvey Milk 1:04:35.600,1:04:42.740 and every article on Wikipedia[br]should have a Wikidata 1:04:42.740,1:04:47.660 item associated with it, but it[br]doesn't happen automatically. 1:04:47.660,1:04:51.470 So if I just created[br]a page on Wikipedia 1:04:51.470,1:04:55.010 I also need to create a[br]Wikidata entity for it 1:04:55.010,1:04:57.170 if it doesn't already exist. 1:04:57.170,1:04:59.420 It could already exist[br]because it was already 1:04:59.420,1:05:01.970 covered in a different[br]language, for example. 1:05:01.970,1:05:05.390 So that was parenthetical. 1:05:05.390,1:05:09.020 But every article on Wikipedia[br]should have, here on the side, 1:05:09.020,1:05:14.270 on the side here under Tools,[br]a link called Wikidata item. 1:05:14.270,1:05:15.450 Right here. 1:05:15.450,1:05:16.160 OK. 1:05:16.160,1:05:18.110 That Wikidata[br]item is a link 1:05:18.110,1:05:21.710 that takes you to[br]Wikidata, to the entity, 1:05:21.710,1:05:23.510 and there you find the number. 1:05:23.510,1:05:25.370 You can-- you don't[br]even have to click it.
1:05:25.370,1:05:27.830 I mean, the URL itself[br]tells you the number. 1:05:27.830,1:05:34.620 The number, you see, it's[br]wikidata.org/wiki/q17141. 1:05:34.620,1:05:35.444 OK. 1:05:35.444,1:05:36.860 So that was an[br]excellent question. 1:05:36.860,1:05:37.686 Other questions? 1:05:37.686,1:05:38.185 Yes. 1:05:41.470,1:05:44.430 Yeah, about the additional[br]attributes, the qualifiers. 1:05:44.430,1:05:46.920 So, yes, I answered[br]more generically. 1:05:46.920,1:05:49.370 But just like the[br]properties themselves 1:05:49.370,1:05:53.390 are not limited per item,[br]the qualifiers per statement 1:05:53.390,1:05:57.750 are also not[br]entirely preordained. 1:05:57.750,1:05:59.570 But there is some[br]structure to it. 1:05:59.570,1:06:03.140 I don't want to go into it[br]at great length right now. 1:06:03.140,1:06:06.320 If we have time in the end[br]we can get back to that. 1:06:06.320,1:06:09.590 But some qualifiers are again[br]relevant for some things, 1:06:09.590,1:06:13.180 start time, end time,[br]and others won't be. 1:06:13.180,1:06:16.280 Wikidata does try to offer you-- 1:06:16.280,1:06:18.710 you may remember when I[br]clicked add qualifier, 1:06:18.710,1:06:22.170 it gave me kind of drop down[br]of some relevant qualifiers. 1:06:22.170,1:06:24.475 So it does try to[br]help you in that way. 1:06:27.280,1:06:28.160 Other question? 1:06:28.160,1:06:31.180 Are the values for[br]instance of already 1:06:31.180,1:06:33.310 mappable to external ontologies? 1:06:36.500,1:06:41.310 That is a complicated question. 1:06:41.310,1:06:43.490 I'll help people understand[br]the question first. 1:06:43.490,1:06:48.570 So an ontology is a[br]structure, some kind 1:06:48.570,1:06:52.350 of hierarchy or[br]cloud, of entities 1:06:52.350,1:06:54.510 and their interrelationships. 1:06:54.510,1:06:56.920 An ontology would[br]say, for example, 1:06:56.920,1:06:58.710 a person is a living thing. 1:06:58.710,1:06:59.670 So is a dog. 
1:06:59.670,1:07:02.340 They're both living things,[br]but they're different things. 1:07:02.340,1:07:09.910 And then, you know, say[br]things about those entities 1:07:09.910,1:07:11.350 and their interrelationships. 1:07:11.350,1:07:13.300 Now there are many,[br]many competing, 1:07:13.300,1:07:17.230 or coexisting models[br]of ontologies. 1:07:17.230,1:07:19.840 Many of them were created[br]for specific needs. 1:07:19.840,1:07:25.170 Many of them want to be[br]a universal ontology. 1:07:25.170,1:07:27.790 But of course it's[br]impossible to quite 1:07:27.790,1:07:32.150 agree on one complete[br]and simple ontology. 1:07:32.150,1:07:34.240 And so there are[br]many ontologies. 1:07:34.240,1:07:38.520 Which brings up your question,[br]can we map across ontologies? 1:07:38.520,1:07:43.840 Can we say that when Wikidata[br]says instance of book that 1:07:43.840,1:07:47.260 is equivalent to some other[br]ontology saying instance 1:07:47.260,1:07:49.940 of bibliographic record? 1:07:49.940,1:07:50.860 And the answer is yes. 1:07:50.860,1:07:52.360 There are some such mappings. 1:07:52.360,1:07:54.420 They are incomplete. 1:07:54.420,1:07:58.240 And there's no kind of[br]automagic thing happening 1:07:58.240,1:08:01.180 in the wiki vis-a-vis[br]those other ontologies. 1:08:01.180,1:08:03.250 That's kind of[br]left as an exercise 1:08:03.250,1:08:06.280 for those dealing with those[br]other ontologies, and for tool 1:08:06.280,1:08:09.880 builders and other[br]platform improvements 1:08:09.880,1:08:13.050 beyond Wikidata itself. 1:08:13.050,1:08:13.750 OK. 1:08:13.750,1:08:15.190 Other questions? 1:08:15.190,1:08:17.430 Yeah, we have one from[br]the YouTube stream. 1:08:17.430,1:08:21.160 Someone asked, why can't I[br]link Howard Carter's occupation 1:08:21.160,1:08:26.439 to archeologists when I use[br]an info box that fetches info 1:08:26.439,1:08:28.960 from Wikidata? 1:08:28.960,1:08:33.160 Why can't I link it[br]from the info box?
1:08:33.160,1:08:35.500 So, someone on the[br]stream answered 1:08:35.500,1:08:37.659 saying, because it's[br]an improper connection, 1:08:37.659,1:08:39.700 because the target is not[br]about the subject only. 1:08:43.020,1:08:46.710 The target is not[br]about the subject? 1:08:46.710,1:08:48.479 If I understand the[br]question correctly, 1:08:48.479,1:08:53.130 what you would want to be able[br]to do is from within Wikipedia 1:08:53.130,1:08:59.130 be able to say occupation[br]and link to a Wikidata entry 1:08:59.130,1:09:01.050 about archeology. 1:09:01.050,1:09:03.569 That doesn't quite[br]work that way. 1:09:03.569,1:09:05.430 We will get to a[br]little discussion 1:09:05.430,1:09:08.460 of that in an upcoming[br]section of this talk. 1:09:08.460,1:09:13.260 So I will defer the rest[br]of my answer to then. 1:09:13.260,1:09:15.319 OK. 1:09:15.319,1:09:19.160 So we're done with[br]questions for this phase, 1:09:19.160,1:09:22.850 and my browser got[br]tired of waiting for me. 1:09:22.850,1:09:26.551 So, yes. 1:09:26.551,1:09:27.050 All right. 1:09:27.050,1:09:36.850 So we took a look at Wikidata,[br]and we took questions. 1:09:36.850,1:09:41.020 So now, let's teach[br]Wikidata some new things. 1:09:41.020,1:09:44.020 Some things it[br]doesn't already know. 1:09:44.020,1:09:47.109 Let's look at this item here. 1:09:47.109,1:09:50.950 So this item is about one[br]of my favorite writers, 1:09:50.950,1:09:53.840 an American writer[br]named Helen Dewitt. 1:09:53.840,1:10:01.570 Wikidata, of course, fondly[br]refers to her as q54674, 1:10:01.570,1:10:03.070 but we can call[br]her Helen Dewitt. 1:10:03.070,1:10:05.740 And what can we contribute here? 1:10:05.740,1:10:10.600 So Wikidata has far less[br]information about Helen Dewitt. 1:10:10.600,1:10:13.144 Most of you probably haven't[br]heard of her, that's OK. 1:10:13.144,1:10:14.560 What does Wikidata[br]know about her? 1:10:14.560,1:10:16.450 Well instance of human. 1:10:16.450,1:10:17.800 We have a photo of her. 
1:10:17.800,1:10:18.780 She's female. 1:10:18.780,1:10:20.530 She's an American. 1:10:20.530,1:10:21.790 Her name is Helen. 1:10:21.790,1:10:22.630 Date of birth. 1:10:22.630,1:10:23.650 Place of birth. 1:10:23.650,1:10:25.970 She's an author, a[br]novelist, a writer. 1:10:25.970,1:10:28.840 She was educated at the[br]University of Oxford. 1:10:28.840,1:10:33.160 And Wikidata knows what[br]her official website is. 1:10:33.160,1:10:35.780 That's useful, but that's it. 1:10:35.780,1:10:37.780 Now we can contribute[br]information here. 1:10:37.780,1:10:43.120 For example, she's an American[br]author writing in English. 1:10:43.120,1:10:45.550 So we could add[br]that information. 1:10:45.550,1:10:48.430 We could click the[br]Add button here. 1:10:48.430,1:10:50.200 And this is a good[br]moment to acknowledge 1:10:50.200,1:10:54.830 that the user interface of[br]Wikidata is a work in progress. 1:10:54.830,1:10:56.740 It's not as intuitive[br]as it might be. 1:10:56.740,1:10:58.570 So you need to[br]understand that click-- 1:10:58.570,1:11:01.630 to add a completely[br]new property, 1:11:01.630,1:11:04.060 You need to click[br]this Add button. 1:11:04.060,1:11:08.020 If you want to add an additional[br]value to the property official 1:11:08.020,1:11:11.530 website, you need to[br]click this Add button. 1:11:11.530,1:11:13.780 It makes a kind of[br]sense with a shaded box. 1:11:13.780,1:11:15.880 But, you know, you need[br]to kind of pay attention, 1:11:15.880,1:11:18.901 and it's not as[br]friendly as it might be. 1:11:18.901,1:11:20.650 [COUGHING] Excuse me. 1:11:20.650,1:11:23.380 So, let's add a property here. 1:11:23.380,1:11:25.690 Click the Add button. 1:11:25.690,1:11:29.740 Again, Wikidata tries to[br]be useful by suggesting 1:11:29.740,1:11:32.760 some relevant[br]properties for humans. 1:11:32.760,1:11:36.640 A bit more morbidly it suggests,[br]how about date of death? 1:11:36.640,1:11:38.700 That's not cool, Wikidata. 
1:11:38.700,1:11:40.480 Helen Dewitt is still alive. 1:11:40.480,1:11:42.700 So I will not add[br]date of death, but I 1:11:42.700,1:11:46.140 can add languages spoken,[br]written, or signed. 1:11:46.140,1:11:48.370 OK, so I click that. 1:11:48.370,1:11:51.670 And she writes in English. 1:11:51.670,1:11:54.450 I just type English-- whoops. 1:11:54.450,1:11:56.750 Not in Hebrew. 1:11:56.750,1:11:58.380 Don't panic. 1:11:58.380,1:12:01.010 I type English here. 1:12:01.010,1:12:04.250 And, oh, and of course Wikidata[br]has auto-complete, right? 1:12:04.250,1:12:06.080 So it tries to help me along. 1:12:06.080,1:12:10.100 But you will notice that[br]it has all kinds of things 1:12:10.100,1:12:10.940 called English. 1:12:10.940,1:12:14.030 I mean, it turns out that[br]there is a place in Indiana 1:12:14.030,1:12:16.370 called English, Indiana. 1:12:16.370,1:12:17.150 Did I mean that? 1:12:17.150,1:12:20.210 No, of course I didn't mean[br]that she writes her books 1:12:20.210,1:12:21.961 in English, Indiana. 1:12:21.961,1:12:22.460 Right? 1:12:22.460,1:12:26.180 But, you know, Wikidata gives me[br]the option of linking to that. 1:12:26.180,1:12:30.530 I also don't mean the botanist[br]Carl Schwartz English. 1:12:30.530,1:12:32.870 No, no I mean the[br]west Germanic language 1:12:32.870,1:12:34.029 originating in England. 1:12:34.029,1:12:34.820 That's what I mean. 1:12:34.820,1:12:36.110 So I click that. 1:12:36.110,1:12:37.760 And I click Save. 1:12:37.760,1:12:38.450 And that's it. 1:12:38.450,1:12:41.780 Again I have just made[br]an edit to Wikidata. 1:12:41.780,1:12:47.750 I have just taught Wikidata[br]that this author speaks English. 1:12:47.750,1:12:50.370 Now, again, this[br]may be very obvious. 1:12:50.370,1:12:52.280 She's American. 1:12:52.280,1:12:54.560 Of course not all[br]Americans write in English. 1:12:54.560,1:12:56.930 It may be obvious if[br]you look at her books. 
1:12:56.930,1:12:59.060 The important thing[br]is that now Wikidata 1:12:59.060,1:13:02.090 knows this as a piece of data. 1:13:02.090,1:13:04.610 And, again, think ahead[br]to queries, which we will 1:13:04.610,1:13:06.980 demonstrate in a little bit. 1:13:06.980,1:13:09.000 Without this piece[br]of information 1:13:09.000,1:13:14.060 that I just added, if I were to[br]ask Wikidata five minutes ago, 1:13:14.060,1:13:19.760 give me a list of novelists[br]writing in English, OK, 1:13:19.760,1:13:22.730 Wikidata would have returned[br]thousands of results. 1:13:22.730,1:13:27.600 But Helen Dewitt would[br]not have been among them. 1:13:27.600,1:13:32.000 Because up until two[br]minutes ago Wikidata 1:13:32.000,1:13:35.640 didn't know that Helen Dewitt[br]writes in English and not 1:13:35.640,1:13:37.520 in Spanish. 1:13:37.520,1:13:38.730 Do you see? 1:13:38.730,1:13:42.570 It is this explicit[br]statement that will now 1:13:42.570,1:13:46.560 make her be included in any[br]future queries that asks, 1:13:46.560,1:13:48.700 who are novelists[br]writing in English? 1:13:53.250,1:13:54.500 OK. 1:13:54.500,1:13:58.560 By the way, she's[br]a PhD in Classics. 1:13:58.560,1:14:05.590 She speaks-- or at least reads[br]and writes Latin and Greek, 1:14:05.590,1:14:07.270 ancient Greek, and I could-- 1:14:07.270,1:14:09.610 I can-- I mean, I[br]happen to know that. 1:14:09.610,1:14:12.420 But wait, wait, wait,[br]wait, wait, you say. 1:14:12.420,1:14:14.130 What about original research? 1:14:14.130,1:14:18.890 I mean, you can't just add[br]stuff like that to Wikidata. 1:14:18.890,1:14:19.920 Don't you need sources? 1:14:19.920,1:14:22.860 Citations? 1:14:22.860,1:14:23.890 Of course I do. 1:14:23.890,1:14:25.020 Yes. 1:14:25.020,1:14:27.720 Let's add some sources to this. 1:14:27.720,1:14:31.410 So on Wikidata,[br]just like Wikipedia, 1:14:31.410,1:14:34.980 things should generally[br]be supported by citations, 1:14:34.980,1:14:36.990 by references. 
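The earlier point about queries, that an author only shows up in "novelists writing in English" once the language is stated explicitly, can be sketched with a tiny in-memory filter. This is not the real query service: the item dictionaries, the second author, and the `novelists_writing_in` function are hypothetical, and the real data set is of course far larger.

```python
def novelists_writing_in(items, language):
    """Return labels of items that are novelists AND explicitly list `language`."""
    return [
        item["label"]
        for item in items
        if "novelist" in item.get("occupation", [])
        and language in item.get("languages", [])
    ]


# State of the (made-up) data before the edit shown in the demo.
items = [
    {"label": "Helen Dewitt", "occupation": ["author", "novelist", "writer"]},
    {"label": "Some other novelist", "occupation": ["novelist"],
     "languages": ["English"]},
]

print(novelists_writing_in(items, "English"))
# ['Some other novelist']

# The edit: teach the data set one explicit fact.
items[0]["languages"] = ["English"]

print(novelists_writing_in(items, "English"))
# ['Helen Dewitt', 'Some other novelist']
```

The filter never guesses from nationality or from the books themselves; it only sees what has been stated, which is why the one-line edit changes the query result.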
1:14:36.990,1:14:43.290 And just like Wikipedia,[br]they aren't always supported 1:14:43.290,1:14:44.650 in that way. 1:14:44.650,1:14:48.870 OK so, I mean, I can[br]just add it to Wikidata. 1:14:48.870,1:14:49.442 Watch me. 1:14:49.442,1:14:50.400 I just did that, right? 1:14:50.400,1:14:54.450 I just added English and[br]Latin without any citation, 1:14:54.450,1:14:56.850 and I will not be[br]arrested for it. 1:14:56.850,1:14:59.520 Just like I could edit[br]a Wikipedia article 1:14:59.520,1:15:02.610 and add some information[br]without a citation. 1:15:02.610,1:15:03.600 It may stick. 1:15:03.600,1:15:06.810 It may stay in the article,[br]or it may be reverted. 1:15:06.810,1:15:11.010 It depends on the kind of[br]information I'm adding. 1:15:11.010,1:15:13.740 It depends how many people[br]are paying attention 1:15:13.740,1:15:15.060 to the article on Wikipedia. 1:15:15.060,1:15:18.420 And it works the[br]same way on Wikidata. 1:15:18.420,1:15:21.780 OK, so, you can add some[br]things without references. 1:15:21.780,1:15:23.970 Ideally, when you[br]add, information you 1:15:23.970,1:15:25.570 should include references. 1:15:25.570,1:15:30.990 So let's be good Wikidata[br]citizens and add a source. 1:15:30.990,1:15:34.395 Here is an article that[br]I prepared in advance. 1:15:38.100,1:15:39.370 This is Helen Dewitt. 1:15:39.370,1:15:44.450 And in this article,[br]somewhere, it actually 1:15:44.450,1:15:51.770 says right at the[br]bottom here, see, 1:15:51.770,1:15:54.990 Dewitt knows, in descending[br]order of proficiency, Latin, 1:15:54.990,1:15:57.010 ancient Greek, French,[br]German, Spanish, 1:15:57.010,1:15:59.460 and Portuguese, Dutch, Danish,[br]Norwegian, Swedish, Arabic, 1:15:59.460,1:16:01.680 Hebrew and Japanese. 1:16:01.680,1:16:04.770 This may sound[br]excessive, but it's true. 1:16:04.770,1:16:06.330 I met this woman. 1:16:06.330,1:16:09.670 So anyway, we don't have[br]to include all of that. 
1:16:09.670,1:16:13.050 The point is this article from[br]a reasonably reliable source, 1:16:13.050,1:16:15.840 this magazine,[br]this interview, can 1:16:15.840,1:16:19.270 count as a source for[br]the languages she speaks. 1:16:19.270,1:16:20.700 So I copy the URL. 1:16:20.700,1:16:23.130 I just copied it off my browser. 1:16:23.130,1:16:27.530 And, whoops-- that's not-- 1:16:27.530,1:16:28.580 here we go. 1:16:28.580,1:16:31.610 And I can just add[br]a reference here 1:16:31.610,1:16:34.670 to the information that I[br]just added to Wikidata, right? 1:16:34.670,1:16:38.300 I can click Add Reference. 1:16:38.300,1:16:45.800 And then just say the reference[br]URL is, and I just paste. 1:16:45.800,1:16:48.840 I paste this URL. 1:16:48.840,1:16:50.160 Hit Enter. 1:16:50.160,1:16:51.060 And that's it. 1:16:51.060,1:16:55.380 And now the fact that she[br]speaks Latin has a reference. 1:16:55.380,1:16:58.320 If you look at the other[br]things here on Wikidata, 1:16:58.320,1:17:02.660 you can see that these IDs, for[br]example, have references, too. 1:17:02.660,1:17:03.420 Right? 1:17:03.420,1:17:06.570 In this case, the reference[br]just says, excuse me-- 1:17:14.760,1:17:18.600 In this case it just says[br]imported from English Wikipedia. 1:17:18.600,1:17:24.970 But wait, you say, can[br]Wikipedia be a source? 1:17:24.970,1:17:26.620 Not properly, no. 1:17:26.620,1:17:30.100 I mean, just like Wikipedia[br]itself doesn't cite itself. 1:17:30.100,1:17:33.790 We don't say, this person[br]was born in this city, 1:17:33.790,1:17:34.870 how do we know? 1:17:34.870,1:17:37.210 We read it on Wikipedia[br]in another language. 1:17:37.210,1:17:39.610 That's not a good citation. 1:17:39.610,1:17:41.400 It's not a good[br]citation for Wikidata 1:17:41.400,1:17:45.040 either, so why do we put it here? 1:17:45.040,1:17:49.240 Well you can see the qualifier[br]here is different, right? 1:17:49.240,1:17:53.535 It's not reference URL, which[br]is what I put in for Latin here.
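The difference between a "reference URL" entry and an "imported from" entry can be sketched as follows. This is a simplified, hypothetical model: the dictionaries, the example URL, the placeholder ID value, and the `supporting_references` helper are all made up for illustration, and only the two field names come from what is shown on screen.

```python
def supporting_references(statement):
    """Keep only reference entries that point at an outside source via 'reference URL'."""
    return [ref for ref in statement["references"] if "reference URL" in ref]


# A statement backed by an outside source (placeholder URL, not the real article).
latin = {
    "property": "languages spoken, written, or signed",
    "value": "Latin",
    "references": [{"reference URL": "https://example.org/interview"}],
}

# A statement that only records where a script copied the value from.
some_id = {
    "property": "some identifier",
    "value": "(placeholder)",
    "references": [{"imported from": "English Wikipedia"}],
}

for statement in (latin, some_id):
    sources = supporting_references(statement)
    status = "sourced" if sources else "provenance only"
    print(statement["value"], "->", status)
```

In this sketch an "imported from" entry is kept as useful provenance, but it does not count toward the statement being sourced.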
1:18:17.020,1:18:20.320 It's not reference URL here,[br]it's a different qualifier. 1:18:20.320,1:18:23.020 It says-- saying, imported from. 1:18:23.020,1:18:25.960 So this is not an[br]actual reference that 1:18:25.960,1:18:27.610 supports this piece of data. 1:18:27.610,1:18:30.730 It just shows where did[br]this data come from. 1:18:30.730,1:18:33.670 It's a slightly different[br]thing, because this data was 1:18:33.670,1:18:37.210 mass imported into Wikidata. 1:18:37.210,1:18:40.960 So it wasn't input by[br]hand by some volunteer. 1:18:40.960,1:18:44.770 It was imported into Wikidata[br]en masse by a script, 1:18:44.770,1:18:46.180 by a program. 1:18:46.180,1:18:49.820 And we want to know, where[br]did this number come from? 1:18:49.820,1:18:51.440 Well it came from[br]English Wikipedia. 1:18:51.440,1:18:54.130 So again, that's not[br]a proper reference 1:18:54.130,1:18:56.200 for the validity[br]of the information, 1:18:56.200,1:18:59.200 but it does at least tell us[br]it came from English Wikipedia. 1:18:59.200,1:19:03.460 We can click and look on[br]English Wikipedia and find out. 1:19:03.460,1:19:05.230 Maybe there's a[br]footnote there that 1:19:05.230,1:19:08.970 says where it did come from. 1:19:08.970,1:19:11.000 OK. 1:19:11.000,1:19:15.320 So this was an example of[br]teaching Wikidata something 1:19:15.320,1:19:16.910 that it didn't know. 1:19:16.910,1:19:18.512 Something about the languages. 1:19:18.512,1:19:20.720 And of course I could add[br]this reference for English. 1:19:20.720,1:19:23.210 I could add all the other[br]languages that she speaks. 1:19:23.210,1:19:26.060 And I won't bore you with[br]that, but that is basically 1:19:26.060,1:19:27.050 how it's done. 1:19:27.050,1:19:29.720 So you click this Add to[br]add a completely new-- 1:19:32.650,1:19:34.030 completely new statement. 
1:19:34.030,1:19:36.250 Now, by the way, the fact[br]that these are the only two 1:19:36.250,1:19:39.220 suggestions that[br]Wikidata can think of, 1:19:39.220,1:19:42.100 doesn't mean these[br]are the only options. 1:19:42.100,1:19:46.750 OK, you can just type[br]anything that may be relevant. 1:19:46.750,1:19:50.950 We could add, for[br]example, award. 1:19:50.950,1:19:52.570 Just start typing award. 1:19:52.570,1:19:54.910 And here I have I have[br]a bunch of properties 1:19:54.910,1:19:56.510 that are relevant for awards. 1:19:56.510,1:20:00.100 Awards received, together[br]with, conferred by, right? 1:20:00.100,1:20:05.790 There's all kinds of properties[br]that I could rely on. 1:20:05.790,1:20:09.600 And of course there is a list of[br]all the properties of Wikidata. 1:20:09.600,1:20:11.580 And that list is[br]also sorted by type. 1:20:11.580,1:20:15.480 So yes, there is a list of[br]properties relevant to people 1:20:15.480,1:20:17.130 so that you don't have to guess. 1:20:17.130,1:20:18.660 But a surprising[br]amount of the time 1:20:18.660,1:20:22.760 you can just start typing[br]and get the right properties 1:20:22.760,1:20:25.340 suggested to you. 1:20:25.340,1:20:27.230 OK. 1:20:27.230,1:20:33.050 So we taught Wikidata[br]something new, 1:20:33.050,1:20:38.980 and now let's teach Wikidata[br]something completely new. 1:20:38.980,1:20:39.480 Right? 1:20:39.480,1:20:42.480 So how do we create[br]a new Wikidata item? 1:20:42.480,1:20:46.880 So, like I said, if I[br]created a Wikipedia article 1:20:46.880,1:20:49.520 about something that was[br]not previously covered 1:20:49.520,1:20:53.540 on any other[br]Wikipedia, chances are 1:20:53.540,1:20:57.170 there would not be an already[br]existing Wikidata item. 1:20:57.170,1:21:03.190 Sometimes there might[br]be, because Wikidata 1:21:03.190,1:21:06.857 does have 25 million entities. 1:21:06.857,1:21:08.190 But sometimes there wouldn't be. 
1:21:08.190,1:21:10.148 So, first of all, I could[br]search for it, right? 1:21:10.148,1:21:14.210 So I could go to Wikidata[br]to the search box 1:21:14.210,1:21:17.390 here and just start typing, and[br]search for what I want, right? 1:21:17.390,1:21:20.690 So if I'm searching for Helen[br]Dewitt I just say Helen, 1:21:20.690,1:21:25.590 and I can see whether[br]or not it exists. 1:21:25.590,1:21:29.240 And there's a detailed search[br]results page, et cetera, 1:21:29.240,1:21:33.074 where I can where I can find out[br]if the item does exist or not. 1:21:33.074,1:21:35.240 Excuse me, this reminds me[br]of a very important thing 1:21:35.240,1:21:36.620 I wanted to[br]demonstrate, and that 1:21:36.620,1:21:42.710 is the multilingualism[br]of Wikidata. 1:21:42.710,1:21:49.340 So remember all these[br]labels in other languages. 1:21:49.340,1:21:54.390 Wikidata knows what to call[br]Helen Dewitt in Hebrew. 1:21:54.390,1:22:00.800 And it will show it to Wikidata[br]users whose language is Hebrew. 1:22:00.800,1:22:04.220 Mine is set to[br]English, for your sake. 1:22:04.220,1:22:08.830 But if I change this I go to[br]Preferences here and change 1:22:08.830,1:22:09.740 my language. 1:22:09.740,1:22:15.475 [INAUDIBLE] All[br]right, and I hit Save. 1:22:15.475,1:22:20.350 Wikidata will start[br]talking to me in Hebrew. 1:22:20.350,1:22:23.090 Now brace yourselves. 1:22:23.090,1:22:24.620 Are you ready? 1:22:24.620,1:22:28.430 Don't panic, it's right to left. 1:22:28.430,1:22:32.630 Oh my god everything[br]is topsy-turvy. 1:22:32.630,1:22:36.590 So this is the same[br]article in Hebrew. 1:22:36.590,1:22:39.290 So the sidebar has[br]switched direction, 1:22:39.290,1:22:41.300 and I know most of[br]you cannot read it. 1:22:41.300,1:22:42.480 Bear with me. 1:22:42.480,1:22:44.750 This is the label[br]that we previously 1:22:44.750,1:22:46.840 saw in the label box. 1:22:46.840,1:22:49.580 This is how you spell[br]Helen Dewitt in Hebrew. 
1:22:49.580,1:22:52.550 And here is the[br]description in Hebrew. 1:22:52.550,1:22:54.980 It's not the description in[br]English, this description, 1:22:54.980,1:22:57.380 American writer, which[br]I was shown previously. 1:22:57.380,1:23:00.740 Now I'm shown the Hebrew[br]description, appropriately. 1:23:00.740,1:23:03.500 But more interestingly,[br]oh my god! 1:23:03.500,1:23:07.640 All these statements[br]are suddenly in Hebrew. 1:23:07.640,1:23:08.940 How did that happen? 1:23:11.570,1:23:15.560 Well this tiny word here[br]is the very concise way 1:23:15.560,1:23:22.450 to say in Hebrew, instance of,[br]and this word here means human. 1:23:22.450,1:23:25.960 So these are links to[br]the same things, right? 1:23:25.960,1:23:28.100 It still links to Q5. 1:23:28.100,1:23:31.780 Q5 is the Wikidata[br]entity for human. 1:23:31.780,1:23:33.370 These are still the same things. 1:23:33.370,1:23:37.600 But because Wikidata has[br]multiple labels for everything, 1:23:37.600,1:23:39.580 it has multiple[br]labels for items. 1:23:39.580,1:23:42.760 And it also has multiple[br]labels for property names. 1:23:42.760,1:23:46.450 So Wikidata knows how[br]to say, instance of, 1:23:46.450,1:23:50.140 and award received,[br]in other languages. 1:23:50.140,1:23:54.490 That is why it is able to show[br]me all this data in Hebrew 1:23:54.490,1:23:59.890 even if none of that data was[br]actually input into Wikidata 1:23:59.890,1:24:01.870 by a Hebrew speaker. 1:24:01.870,1:24:04.900 That data could have been[br]input by English speakers, 1:24:04.900,1:24:08.230 but thanks to the[br]fact that someone once 1:24:08.230,1:24:12.760 translated the word[br]photo into Hebrew, 1:24:12.760,1:24:14.830 I can see this field in Hebrew. 1:24:17.750,1:24:21.230 So one of the things you[br]can do to help Wikidata, 1:24:21.230,1:24:23.600 right now, without[br]any special knowledge 1:24:23.600,1:24:26.210 is to help translate[br]those labels. 
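[EDITOR'S SKETCH] The reason one translated label lights up every statement that uses it can be shown with a toy model. This is only an illustration, not Wikidata's actual storage format: the ENTITIES dictionary and both function names are hypothetical, and the Hebrew strings are illustrative values. Only the IDs Q5 (human) and P31 (instance of) are real.

```python
# Toy model (an assumption, not Wikidata's real schema): every entity,
# whether an item (Q...) or a property (P...), keeps one label per language.
ENTITIES = {
    "Q5": {"labels": {"en": "human", "he": "אדם"}},          # Hebrew value is illustrative
    "P31": {"labels": {"en": "instance of", "he": "הוא"}},   # Hebrew value is illustrative
}


def label_for(entity_id, lang, fallback="en"):
    """Return the label in `lang`, else the fallback language, else the bare ID."""
    labels = ENTITIES[entity_id]["labels"]
    return labels.get(lang) or labels.get(fallback) or entity_id


def render_statement(prop_id, value_id, lang):
    """A statement stores only IDs; labels are looked up at display time."""
    return f"{label_for(prop_id, lang)}: {label_for(value_id, lang)}"


print(render_statement("P31", "Q5", "en"))
print(render_statement("P31", "Q5", "he"))
```

The statement itself never changes between languages; only the lookup does, which is why data entered by English speakers can be displayed in Hebrew.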
1:24:26.210,1:24:29.030 Every label only needs to[br]be translated just once. 1:24:29.030,1:24:31.310 So you can see that all[br]of these properties, date 1:24:31.310,1:24:34.720 of birth, name, et cetera,[br]they all have Hebrew labels. 1:24:34.720,1:24:36.760 Maybe one of these would not. 1:24:36.760,1:24:38.361 No, they all have Hebrew labels. 1:24:38.361,1:24:39.110 Doing pretty good. 1:24:42.960,1:24:45.810 And I'm able to search[br]in my own language. 1:24:45.810,1:24:48.210 I'm able to click Add. 1:24:48.210,1:24:49.890 This word is Add,[br]so I click this, 1:24:49.890,1:24:51.780 and now I have the Add screen. 1:24:51.780,1:24:55.860 It all speaks my language,[br]and it's awesome. 1:24:55.860,1:25:00.330 And now for your sake I[br]will switch back to English, 1:25:00.330,1:25:03.090 but it is important[br]to know you can 1:25:03.090,1:25:05.740 edit Wikidata in any language. 1:25:05.740,1:25:09.050 And it is far more multilingual[br]and multilingual-friendly 1:25:09.050,1:25:13.260 than, for example, Commons, which[br]is also a project we all share. 1:25:13.260,1:25:17.730 But Commons has some limitations[br]on how multilingual it is. 1:25:17.730,1:25:21.410 For example, the category[br]names, et cetera. 1:25:21.410,1:25:23.270 OK. 1:25:23.270,1:25:25.670 So we were beginning[br]to discuss creating 1:25:25.670,1:25:27.140 something completely new. 1:25:27.140,1:25:29.360 AUDIENCE: Quick[br]questions, if that's OK? 1:25:29.360,1:25:30.980 So there's two questions on IRC. 1:25:30.980,1:25:33.890 The first one is, can you[br]show search for something 1:25:33.890,1:25:35.420 like getting the list of things? 1:25:35.420,1:25:38.360 I want to learn how to search[br]for something properly like, 1:25:38.360,1:25:43.705 show me all the items with[br]this value of this property. 1:25:43.705,1:25:45.080 ASAF BARTOV: Yes. 1:25:45.080,1:25:47.540 That is part of[br]this talk, but I'll 1:25:47.540,1:25:49.250 get to that in a[br]little bit later.
1:25:49.250,1:25:52.010 There's a whole section where I[br]will demonstrate the very, very 1:25:52.010,1:25:55.190 powerful query[br]system of Wikidata 1:25:55.190,1:25:57.170 where I will cash[br]that check that I gave 1:25:57.170,1:25:59.090 at the beginning of[br]all these painters 1:25:59.090,1:26:01.029 who are sons of painters[br]queries, et cetera. 1:26:01.029,1:26:02.570 So I will demonstrate[br]how to do that. 1:26:02.570,1:26:04.190 AUDIENCE: Other question. 1:26:04.190,1:26:07.250 How does Wikidata deal[br]with link rot, and other issues 1:26:07.250,1:26:09.680 stemming from bare URL refs? 1:26:13.528,1:26:16.290 ASAF BARTOV: URLs break. 1:26:16.290,1:26:18.730 We call that link rot. 1:26:18.730,1:26:22.470 Wikidata doesn't have[br]any particular magic 1:26:22.470,1:26:24.730 around link rot,[br]just like Wikipedia. 1:26:24.730,1:26:29.100 So if you do use a bare[br]URL it may well rot. 1:26:29.100,1:26:34.230 But you can add qualifiers[br]with backup URLs elsewhere, 1:26:34.230,1:26:37.680 on the Internet Archive, or[br]another mirroring service. 1:26:37.680,1:26:42.780 And potentially that could be[br]a software feature for Wikidata 1:26:42.780,1:26:46.590 to automatically save[br]or ensure that something 1:26:46.590,1:26:48.660 is saved on Internet[br]Archive, but I don't 1:26:48.660,1:26:50.670 know that it is doing so now. 1:26:50.670,1:26:56.040 So, just like Wikipedia, if[br]it is a bare URL it may rot. 1:26:56.040,1:27:00.240 And may need to be[br]replaced, possibly by bot. 1:27:00.240,1:27:01.390 Other questions? 1:27:09.840,1:27:12.650 All right, so let's[br]talk about how you 1:27:12.650,1:27:15.090 create a completely new item. 1:27:15.090,1:27:16.300 It's very simple. 1:27:16.300,1:27:21.810 You go to Wikidata and you[br]click here on the side. 1:27:21.810,1:27:30.180 There's a link, create new item,[br]which gives you this screen.
1:27:30.180,1:27:35.030 And let's create an[br]item about a book 1:27:35.030,1:27:39.500 that I'm reading right now[br]by this Bulgarian writer. 1:27:39.500,1:27:43.950 So we have an article about this[br]writer guy named Deyan Enev. 1:27:43.950,1:27:48.530 But we don't have an[br]article or a Wikidata item 1:27:48.530,1:28:07.980 about one of his famous[br]books called Circus Bulgaria. 1:28:07.980,1:28:10.050 That's the book I'm reading,[br]his first collection 1:28:10.050,1:28:11.216 of short stories in English. 1:28:11.216,1:28:14.280 Circus Bulgaria came out[br]in 2010, Portobello Books, 1:28:14.280,1:28:17.099 translated by Kapka Kassabova. 1:28:17.099,1:28:18.390 So that's the book I'm reading. 1:28:18.390,1:28:20.520 As you can see it's not[br]a link on Wikipedia. 1:28:20.520,1:28:23.370 There's no article about[br]it, and there's not even 1:28:23.370,1:28:26.310 a Wikidata entity item about it. 1:28:26.310,1:28:32.220 But we can totally create[br]it, even without a Wikipedia 1:28:32.220,1:28:33.090 article. 1:28:33.090,1:28:34.980 So let's create this new item. 1:28:34.980,1:28:37.260 Let's create it in[br]English for the purposes 1:28:37.260,1:28:38.880 of our demonstration. 1:28:38.880,1:28:44.910 The name of the item[br]is Circus Bulgaria. 1:28:44.910,1:28:47.520 Circus Bulgaria,[br]that's the name. 1:28:47.520,1:28:50.670 Not Circus Bulgaria[br]parentheses book, 1:28:50.670,1:28:53.520 or anything you may be[br]used to from Wikipedia. 1:28:53.520,1:28:56.520 It's the actual[br]name of the book, 1:28:56.520,1:29:00.450 and the description,[br]again, remember, 1:29:00.450,1:29:03.270 the description field[br]is just to kind of help 1:29:03.270,1:29:08.681 tell apart this Circus Bulgaria[br]from any other potential Circus 1:29:08.681,1:29:09.180 Bulgaria. 1:29:09.180,1:29:11.280 Maybe there's a[br]film or something. 1:29:11.280,1:29:20.480 So it's enough to just say[br]something like short story 1:29:20.480,1:29:23.270 collection. 
1:29:23.270,1:29:27.830 I might add by Deyan Enev,[br]just in case, again, 1:29:27.830,1:29:31.910 some future other short story[br]collection by some other author 1:29:31.910,1:29:33.560 happens to have that same name. 1:29:33.560,1:29:36.391 That should be[br]disambiguating enough. 1:29:36.391,1:29:36.890 OK. 1:29:36.890,1:29:39.770 Short story collection[br]by Deyan Enev. 1:29:39.770,1:29:42.050 I could have aliases for this. 1:29:42.050,1:29:47.240 The aliases assist findability. 1:29:47.240,1:29:51.020 This particular book has just[br]this one name, so that's fine. 1:29:51.020,1:29:52.260 And I click Create. 1:29:52.260,1:29:52.760 That's it. 1:29:52.760,1:29:55.990 I just start with a[br]label, and a description. 1:29:55.990,1:29:58.740 I click Create. 1:29:58.740,1:30:03.890 I have a brand new Q number[br]for my new Wikidata item. 1:30:03.890,1:30:05.960 And Wikidata knows[br]what to call it. 1:30:05.960,1:30:09.320 And a description in[br]one language at least. 1:30:09.320,1:30:11.930 And that's it, and I[br]can start populating it. 1:30:11.930,1:30:15.050 As you can see, it[br]has no site links, 1:30:15.050,1:30:17.450 but it's ready to be taught. 1:30:17.450,1:30:20.450 So, for example, I[br]can start by teaching 1:30:20.450,1:30:24.610 it the name of the book[br]in another language 1:30:24.610,1:30:25.870 that I happen to speak. 1:30:29.050,1:30:31.720 Now it has two labels,[br]in English and Hebrew. 1:30:31.720,1:30:36.880 I could also look[br]up the book Areon, 1:30:36.880,1:30:39.510 the original Bulgarian[br]label for this book. 1:30:39.510,1:30:41.550 Seems relevant. 1:30:41.550,1:30:43.320 Again, I do not speak Bulgarian. 1:30:43.320,1:30:49.860 But I can go to the Bulgarian[br]Wikipedia through interwiki. 1:30:49.860,1:30:51.510 This is this gentleman.
1:30:51.510,1:30:54.510 And I could find-- 1:30:54.510,1:30:59.190 I can read Cyrillic so[br]I could easily find-- 1:30:59.190,1:31:00.030 when I say easily-- 1:31:02.940,1:31:05.710 when I say easily-- 1:31:05.710,1:31:12.731 maybe not so easy, but[br]I can search for it. 1:31:21.070,1:31:22.180 Here we go. 1:31:22.180,1:31:25.190 Tsirk Bulgaria. 1:31:25.190,1:31:27.510 That is the name of the book. 1:31:27.510,1:31:28.910 Tsirk, as in circus. 1:31:28.910,1:31:30.440 No problem. 1:31:30.440,1:31:32.725 So I just copy this right here. 1:31:35.240,1:31:38.090 And I go back to my new item. 1:31:38.090,1:31:45.725 My new item, which is here,[br]and I edit the Bulgarian field. 1:31:48.260,1:31:49.950 And here it is. 1:31:49.950,1:31:50.720 Awesome. 1:31:50.720,1:31:51.220 All right. 1:31:51.220,1:31:55.420 But I still haven't told[br]Wikidata anything about this. 1:31:55.420,1:31:56.920 I know I'm talking about a book. 1:31:56.920,1:31:59.110 Wikidata doesn't[br]know that yet. 1:31:59.110,1:32:02.630 So let's start by[br]adding some statements. 1:32:02.630,1:32:05.390 First of all, I click Add. 1:32:05.390,1:32:07.190 Wikidata sensibly[br]says, how about we 1:32:07.190,1:32:08.630 start with instance of. 1:32:08.630,1:32:11.090 Tell me what kind of animal--[br]no, not kind of animal. 1:32:11.090,1:32:13.940 What kind of thing are you[br]trying to describe here? 1:32:13.940,1:32:18.130 Well, it's an instance of a book. 1:32:18.130,1:32:20.930 Not in Hebrew, please. 1:32:20.930,1:32:22.180 So it's an instance of a book. 1:32:22.180,1:32:23.763 I could even be a[br]little more specific 1:32:23.763,1:32:31.920 and say it's an instance of[br]a short story collection. 1:32:31.920,1:32:34.620 There we go, short[br]story collection. 1:32:34.620,1:32:36.800 I hit Save. 1:32:36.800,1:32:37.430 Awesome. 1:32:37.430,1:32:39.680 So now we know what[br]kind of thing it is. 1:32:39.680,1:32:42.860 It's not a human, it's not a[br]mountain, it's not a concept.
1:32:42.860,1:32:44.760 It's a short story collection. 1:32:44.760,1:32:46.400 Now I can add some other things. 1:32:46.400,1:32:48.770 See, Wikidata is[br]already working for me. 1:32:48.770,1:32:51.020 Because it's a short[br]story collection 1:32:51.020,1:32:53.960 it's offering me to populate[br]these properties, and not 1:32:53.960,1:32:54.890 other ones. 1:32:54.890,1:32:56.990 Publication date,[br]original language, 1:32:56.990,1:33:00.350 genre, country of origin,[br]these are all relevant, right? 1:33:00.350,1:33:04.220 So let's start with original[br]language of the work 1:33:04.220,1:33:07.410 is Bulgarian. 1:33:07.410,1:33:09.810 Not Bulgaria, Bulgarian. 1:33:09.810,1:33:12.040 This is the item I want to link. 1:33:12.040,1:33:21.570 Hit Save, and whatever. 1:33:21.570,1:33:22.890 Author. 1:33:22.890,1:33:26.540 Let's identify the author. 1:33:26.540,1:33:29.350 So the author, the main[br]creator of the work, 1:33:29.350,1:33:32.470 is that gentleman Deyan Enev. 1:33:32.470,1:33:34.750 And remember, he has[br]a Wikipedia article. 1:33:34.750,1:33:37.210 He also has a Wikidata entity. 1:33:37.210,1:33:39.640 So Wikidata does know about him. 1:33:39.640,1:33:48.930 So I hit Save, and I can add[br]something about the translator. 1:33:52.530,1:33:54.390 And what was that lady's name? 1:33:57.990,1:34:00.120 Kapka Kassabova. 1:34:00.120,1:34:05.430 Now it so happens that Wikidata[br]already knows about this lady. 1:34:08.330,1:34:08.840 See? 1:34:08.840,1:34:12.290 So I can just start typing[br]and then just link to it. 1:34:12.290,1:34:12.840 Awesome. 1:34:12.840,1:34:13.824 But what if it didn't? 1:34:13.824,1:34:15.740 What if it was translated[br]by someone who isn't 1:34:15.740,1:34:17.690 already covered on Wikidata? 
1:34:17.690,1:34:22.190 Well I could just type[br]the name as a string, 1:34:22.190,1:34:25.760 but ideally I could[br]create a Wikidata entity 1:34:25.760,1:34:28.940 about this translator so[br]that there is a possibility 1:34:28.940,1:34:30.350 to link to her. 1:34:33.560,1:34:36.920 Now I might actually[br]add a qualifier here 1:34:36.920,1:34:40.310 because, she's not the[br]translator of the book, right? 1:34:40.310,1:34:43.620 She's the translator of[br]the book into English. 1:34:43.620,1:34:44.440 Right. 1:34:44.440,1:34:50.151 So the language that she[br]translated into is English. 1:34:50.151,1:34:50.650 Right? 1:34:50.650,1:34:53.620 This book-- remember[br]I'm describing the book. 1:34:53.620,1:34:55.376 The item is about the book. 1:34:55.376,1:34:57.250 So the book would have[br]a different translator 1:34:57.250,1:34:58.510 into Polish. 1:34:58.510,1:35:02.320 So this is an example of[br]a property or a statement 1:35:02.320,1:35:06.430 that doesn't make sense without[br]one of those qualifiers. 1:35:06.430,1:35:08.140 It's just not correct. 1:35:08.140,1:35:11.320 It doesn't make sense to[br]say that translator is. 1:35:11.320,1:35:14.950 The English translator, or[br]even this English translator. 1:35:14.950,1:35:17.770 In 50 years maybe there would[br]be an additional English 1:35:17.770,1:35:18.940 translation. 1:35:18.940,1:35:24.774 So that's an example of[br]needing that qualifier. 1:35:24.774,1:35:27.190 And of course I could go on[br]and populate the other fields. 1:35:27.190,1:35:29.710 We don't have to[br]do that right now. 1:35:29.710,1:35:32.960 Publication date, country[br]of origin, et cetera. 1:35:32.960,1:35:35.440 So this is already beginning[br]to look like all those items 1:35:35.440,1:35:38.440 that we already saw, but just[br]a moment ago it didn't exist. 1:35:38.440,1:35:43.920 Just a moment ago Wikidata[br]had no concept of this work. 1:35:43.920,1:35:46.500 This happens to be one[br]of his notable works. 
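[EDITOR'S SKETCH] The point about qualifiers can be made concrete with a small model. This is a simplified illustration, not Wikidata's real data format: the Statement class, the plain-text property names, and the "language" qualifier key are all assumptions chosen to mirror the wording of the demo.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim about an item: a property, a value, and optional qualifiers."""
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


# The statements added to the Circus Bulgaria item during the demo.
circus_bulgaria = [
    Statement("instance of", "short story collection"),
    Statement("original language of work", "Bulgarian"),
    Statement("author", "Deyan Enev"),
    # Without the qualifier this would wrongly claim she is THE translator.
    Statement("translator", "Kapka Kassabova", {"language": "English"}),
]


def translators_into(statements, language):
    """Return translators whose statement is qualified with the given language."""
    return [
        s.value
        for s in statements
        if s.prop == "translator" and s.qualifiers.get("language") == language
    ]


print(translators_into(circus_bulgaria, "English"))
print(translators_into(circus_bulgaria, "Polish"))
```

A future Polish translation would simply be a second translator statement with its own qualifier, and neither statement would contradict the other.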
1:35:46.500,1:35:52.080 So I could actually go to the[br]item about Deyan Enev which 1:35:52.080,1:35:56.190 has all this information[br]already, occupation, languages, 1:35:56.190,1:35:59.170 and add a property. 1:35:59.170,1:36:01.050 Remember, I'm not[br]limited to these. 1:36:01.050,1:36:06.180 I can add a property[br]called notable works, 1:36:06.180,1:36:08.670 and mention my new item. 1:36:08.670,1:36:12.120 Circus Bulgaria. 1:36:12.120,1:36:12.750 See? 1:36:12.750,1:36:15.180 My new item is[br]showing up, and thanks 1:36:15.180,1:36:18.660 to this description that I[br]wrote, short story collection, 1:36:18.660,1:36:22.650 it's already appearing here in[br]the dropdown very conveniently. 1:36:22.650,1:36:24.270 So I linked to this. 1:36:24.270,1:36:25.154 I hit Save. 1:36:28.680,1:36:32.310 Ideally again I should find[br]some references showing 1:36:32.310,1:36:34.620 that this is a[br]notable work by him, 1:36:34.620,1:36:37.000 but we won't spend[br]time on that right now. 1:36:37.000,1:36:39.010 But the point is we[br]created a new item. 1:36:39.010,1:36:40.410 We populated it a little bit. 1:36:40.410,1:36:44.400 We linked to it so that it's[br]more discoverable by mentioning 1:36:44.400,1:36:47.760 it in the author name, and[br]of course the book item 1:36:47.760,1:36:50.710 itself mentions the author[br]and links to the author. 1:36:50.710,1:36:52.770 So that's all good. 1:36:52.770,1:36:57.780 One last thing we shall do is[br]give it some useful identifier 1:36:57.780,1:37:02.880 so let's add, say, the[br]Library of Congress record 1:37:02.880,1:37:03.940 for this book. 1:37:03.940,1:37:04.440 OK. 1:37:04.440,1:37:07.710 So I have prepared[br]this in advance. 1:37:07.710,1:37:08.760 Ooh. 1:37:08.760,1:37:12.720 Just in time, with 80 seconds to[br]go before it's giving up on me. 1:37:12.720,1:37:14.310 Oh it has already[br]given up on me. 1:37:14.310,1:37:15.490 That is very unfortunate. 
1:37:23.300,1:37:29.110 So I go to the Library of[br]Congress and I find this book. 1:37:29.110,1:37:33.050 I find this entry, right? 1:37:33.050,1:37:37.320 In the Library of Congress[br]database about this book. 1:37:37.320,1:37:39.120 And it has a permalink. 1:37:39.120,1:37:42.570 It has a kind of guaranteed[br]to be permanent link. 1:37:42.570,1:37:47.950 I can just copy that link,[br]go back to my little book, 1:37:47.950,1:37:55.770 and say the Library of Congress. 1:37:55.770,1:38:01.070 Yeah, LCCN, that's what they[br]call their IDs, the call 1:38:01.070,1:38:02.120 number. 1:38:02.120,1:38:06.502 And I paste it here. 1:38:06.502,1:38:08.210 I actually don't need the URL. 1:38:08.210,1:38:09.136 I need just a number. 1:38:12.440,1:38:13.520 And there we go. 1:38:13.520,1:38:16.550 I have added it,[br]and now Wikidata 1:38:16.550,1:38:20.630 knows how to find bibliographic[br]information about this book. 1:38:20.630,1:38:24.710 And any re-user of[br]Wikidata, some program, 1:38:24.710,1:38:28.950 some tool that connects[br]books to authors 1:38:28.950,1:38:32.870 or does statistical analysis or[br]whatever, some future yet to be 1:38:32.870,1:38:35.090 imagined tool[br]could automatically 1:38:35.090,1:38:39.170 find additional metadata on the[br]Library of Congress site thanks 1:38:39.170,1:38:41.840 to this connection[br]that I just made. 1:38:41.840,1:38:44.150 And of course I could[br]add many other IDs 1:38:44.150,1:38:46.460 to other catalogs[br]around the world, 1:38:46.460,1:38:48.150 and we won't do that right now. 1:38:48.150,1:38:51.840 You can see that it's now[br]showing up under identifiers. 1:38:51.840,1:38:56.330 So this is how we created[br]a brand new piece of data. 1:38:56.330,1:38:59.632 Questions about this,[br]about creating new items? 1:39:18.100,1:39:19.180 Yeah, all right. 
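[EDITOR'S SKETCH] The step of copying the permalink but storing only the number, and the way a re-using tool could later rebuild the link from the stored ID, might look like this. The URL pattern is an assumption based on the Library of Congress permalink form, the function names are hypothetical, and the ID below is a made-up placeholder, not the real record for this book.

```python
from urllib.parse import urlparse

# Assumed permalink pattern; a re-using tool would read the real pattern
# from the identifier property's own documentation.
LCCN_URL_PATTERN = "https://lccn.loc.gov/{id}"


def lccn_from_permalink(url):
    """Keep only the identifier: the path of the permalink without slashes."""
    return urlparse(url).path.strip("/")


def permalink_from_lccn(lccn):
    """Rebuild the catalog link from the stored identifier."""
    return LCCN_URL_PATTERN.format(id=lccn)


placeholder_url = "https://lccn.loc.gov/2010000000"  # placeholder ID, not a real record
stored_id = lccn_from_permalink(placeholder_url)
print(stored_id)
print(permalink_from_lccn(stored_id))
```

Storing the bare ID rather than the URL means that if the catalog ever changes its link format, only the pattern needs updating, not every item.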
1:39:19.180,1:39:25.510 So we've seen how to contribute[br]to Wikidata on our own, 1:39:25.510,1:39:26.350 kind of through-- 1:39:26.350,1:39:27.840 directly through Wikidata. 1:39:30.680,1:39:35.220 Now you may be[br]thinking, but Asaf, this 1:39:35.220,1:39:39.880 sounds like a ton[br]of work recording 1:39:39.880,1:39:44.500 all of these little tiny bits of[br]information about every person 1:39:44.500,1:39:47.410 and every book and every town. 1:39:47.410,1:39:50.520 And if you think that,[br]you would be correct. 1:39:50.520,1:39:52.730 That is a ton of work. 1:39:52.730,1:39:54.600 It's a lot of work. 1:39:54.600,1:39:59.930 However, it is centralized, so[br]it is reusable on other wikis 1:39:59.930,1:40:03.860 and we will show in just a[br]moment how we pull information 1:40:03.860,1:40:07.296 from Wikidata into[br]Wikipedia or other projects. 1:40:10.860,1:40:13.780 We will show that[br]in just a moment. 1:40:13.780,1:40:18.660 But here's an[br]awesome little game 1:40:18.660,1:40:23.205 that a Wikidata[br]volunteer, Magnus Manske, 1:40:23.205,1:40:30.900 has authored called the[br]Wikidata game, in which he 1:40:30.900,1:40:31.920 tricks people-- 1:40:31.920,1:40:35.730 sorry, helps people[br]make contributions 1:40:35.730,1:40:41.500 to Wikidata in a very,[br]very easy and pleasant way. 1:40:41.500,1:40:44.410 Let's look at the Wikidata game. 1:40:44.410,1:40:47.840 So the first thing you need[br]to do in that Wikidata game 1:40:47.840,1:40:50.660 is to log in,[br]because the Wikidata 1:40:50.660,1:40:53.150 game makes edits in your name. 1:40:53.150,1:40:54.980 So we need to authorize it. 1:40:54.980,1:40:57.250 It's perfectly safe. 1:40:57.250,1:41:01.090 And after you do that you[br]can go to the Wikidata game. 1:41:01.090,1:41:02.020 So this is the game. 1:41:02.020,1:41:03.520 Now I'm logged in. 1:41:03.520,1:41:05.230 And the Wikidata game[br]actually includes 1:41:05.230,1:41:06.970 a number of different games.
1:41:06.970,1:41:09.310 Let's start with a person game. 1:41:09.310,1:41:14.170 So Wikidata shows you-- 1:41:14.170,1:41:20.800 shows you an item, and asks[br]you a very simple question. 1:41:20.800,1:41:23.200 Person, or not a person? 1:41:26.410,1:41:30.550 So Wikidata goes through[br]Wikidata entities 1:41:30.550,1:41:35.540 that don't even have the[br]instance of property. 1:41:35.540,1:41:37.520 Which is why Wikidata[br]doesn't know, 1:41:37.520,1:41:41.120 literally doesn't know, if this[br]is a person, or a mountain, 1:41:41.120,1:41:44.390 or a city, or a country,[br]or anything else. 1:41:44.390,1:41:47.150 So it asks you, because this[br]is the kind of question that 1:41:47.150,1:41:50.300 Wikidata cannot[br]decide on its own, 1:41:50.300,1:41:54.800 but for us humans it's generally[br]trivial to be able to say 1:41:54.800,1:41:58.220 whether something that we're[br]looking at is a person or not. 1:41:58.220,1:42:03.590 It gets slightly trickier when[br]the information is in Javanese, 1:42:03.590,1:42:06.470 as it is here,[br]rather than English. 1:42:06.470,1:42:10.010 So this item happens to[br]be described in Javanese. 1:42:10.010,1:42:14.360 My Javanese, spoken in[br]Indonesia, is very weak. 1:42:14.360,1:42:19.620 However, I can tell that[br]this is not a person. 1:42:19.620,1:42:20.730 How can I tell? 1:42:20.730,1:42:23.220 Without understanding[br]a word of Javanese 1:42:23.220,1:42:25.950 I see that it mentions[br]1000 kilometers 1:42:25.950,1:42:28.860 and square kilometers, see? 1:42:28.860,1:42:32.520 So this is about a[br]place, or an area, 1:42:32.520,1:42:36.090 or a region, or whatever,[br]but not a person. 1:42:36.090,1:42:39.060 So this is an[br]example of how even 1:42:39.060,1:42:41.100 without understanding[br]language you can sometimes 1:42:41.100,1:42:42.400 make a determination. 1:42:42.400,1:42:45.030 However, of course,[br]you should be sure.
1:42:45.030,1:42:47.700 This is definitely not[br]what the Wikipedia article 1:42:47.700,1:42:49.150 about a person looks like. 1:42:49.150,1:42:50.430 So this is not a person. 1:42:50.430,1:42:52.780 I just click it and I'm[br]shown the next item. 1:42:56.600,1:42:59.660 This item is in another[br]language I do not speak, 1:42:59.660,1:43:00.950 and I just don't know. 1:43:00.950,1:43:03.740 I do not know if this is[br]about a person or not. 1:43:03.740,1:43:07.350 So I click Not Sure. 1:43:07.350,1:43:11.190 This is in Swedish, and[br]it's about Sulawesi, still 1:43:11.190,1:43:13.770 Indonesia. 1:43:13.770,1:43:16.530 And it is not about a person. 1:43:16.530,1:43:18.150 I have enough Swedish for that. 1:43:18.150,1:43:21.750 So I click not a person. 1:43:21.750,1:43:24.420 Now, you may say,[br]well, do I really 1:43:24.420,1:43:28.350 have to deal with all these[br]languages that I don't speak? 1:43:28.350,1:43:29.190 The answer is no. 1:43:29.190,1:43:30.630 You don't have to. 1:43:30.630,1:43:32.580 Here at the bottom[br]of the Wikidata game 1:43:32.580,1:43:33.840 there are settings. 1:43:33.840,1:43:38.270 You can click that[br]and tell Wikidata, 1:43:38.270,1:43:41.840 I cannot even read[br]Chinese or Japanese, 1:43:41.840,1:43:44.600 so please don't show me[br]items in those languages. 1:43:44.600,1:43:47.060 Because I wouldn't[br]even be able to guess. 1:43:47.060,1:43:50.000 I prefer these languages in[br]which I can relatively easily 1:43:50.000,1:43:51.380 make determinations. 1:43:51.380,1:43:54.601 And I can even tell Wikidata to[br]only show me these languages. 1:43:54.601,1:43:55.100 You see? 1:43:55.100,1:43:57.350 This was not selected,[br]which is why I 1:43:57.350,1:44:00.600 was shown some other languages. 1:44:00.600,1:44:04.240 I could say, only use[br]these languages, and save. 1:44:04.240,1:44:06.100 And now I can try[br]this game again. 1:44:06.100,1:44:07.980 However, that can[br]slow it down a little. 
1:44:07.980,1:44:09.000 So here we go. 1:44:09.000,1:44:11.640 Here's a Spanish-- which[br]is one of the languages I 1:44:11.640,1:44:14.640 told Wikidata game it can use. 1:44:14.640,1:44:16.480 This is a Spanish item. 1:44:16.480,1:44:19.265 Now is it about a person or not? 1:44:22.120,1:44:23.230 It is not about a person. 1:44:25.906,1:44:26.780 Is it about a person? 1:44:29.155,1:44:29.655 No. 1:44:32.900,1:44:35.180 Yes, it is, right? 1:44:35.180,1:44:38.550 Monk Cistercian, Pedro[br]de Oviedo Falconi. 1:44:38.550,1:44:40.890 That sounds like a person. 1:44:40.890,1:44:42.680 Frau Pedro Nasser. 1:44:42.680,1:44:44.960 Yeah, he was born[br]in Madrid 1577. 1:44:44.960,1:44:46.280 This is a person. 1:44:46.280,1:44:47.060 OK. 1:44:47.060,1:44:49.730 So I click person. 1:44:49.730,1:44:52.100 Again, if you're not[br]sure, click not sure. 1:44:52.100,1:44:55.100 The point is, just by clicking[br]person-- and as you can see 1:44:55.100,1:44:57.780 this would work[br]very well on mobile, 1:44:57.780,1:45:01.430 which is why I said you can[br]contribute on your commute. 1:45:01.430,1:45:04.100 You can just hold your[br]phone or tablet or whatever, 1:45:04.100,1:45:05.840 and just tap. 1:45:05.840,1:45:07.040 Person, not a person. 1:45:07.040,1:45:08.900 Person, not a person. 1:45:08.900,1:45:12.500 The amazing thing is that just[br]tapping person has actually 1:45:12.500,1:45:15.830 made an edit to Wikidata[br]on my behalf, which 1:45:15.830,1:45:21.560 I can find out, like every[br]wiki, by clicking contributions. 1:45:21.560,1:45:24.200 And as you can see in addition[br]to the stuff about Circus 1:45:24.200,1:45:28.340 Bulgaria, my latest edit is in[br]fact about this Pedro de Oviedo 1:45:28.340,1:45:30.130 Falconi person. 1:45:30.130,1:45:32.000 And the edit was, you can-- 1:45:32.000,1:45:38.030 I hope you can see this, created[br]the claim instance of human.
1:45:38.030,1:45:39.110 So I added-- 1:45:39.110,1:45:43.100 I mean Wikidata game[br]added for me the statement 1:45:43.100,1:45:44.180 instance of human. 1:45:44.180,1:45:47.780 Now, the awesome thing is[br]that it was super easy to do. 1:45:47.780,1:45:51.890 I didn't have to go into that[br]entity, click the Add button, 1:45:51.890,1:45:57.080 choose the instance of property,[br]choose human, hit Save. 1:45:57.080,1:45:59.210 Instead of all these[br]operations I just 1:45:59.210,1:46:04.250 tapped on my screen,[br]person, not a person. 1:46:04.250,1:46:10.280 And I can do hundreds of[br]edits during my daily commute. 1:46:10.280,1:46:12.410 There are other games,[br]like the gender game. 1:46:12.410,1:46:14.810 So this is about-- 1:46:14.810,1:46:17.240 this is when Wikidata[br]already knows 1:46:17.240,1:46:19.760 that this item is a[br]person, but it doesn't 1:46:19.760,1:46:21.710 know the gender of this person. 1:46:21.710,1:46:25.340 Which is another one of[br]the more basic items. 1:46:25.340,1:46:27.770 And this is taking a long[br]time because of the language 1:46:27.770,1:46:29.870 limitations that I set on it. 1:46:29.870,1:46:32.660 I guess the less exotic[br]languages have already 1:46:32.660,1:46:35.130 been exhausted in the game. 1:46:35.130,1:46:36.880 We don't have to[br]wait all this time. 1:46:40.280,1:46:44.970 We can try something else. 1:46:44.970,1:46:45.950 How about occupation? 1:46:45.950,1:46:46.850 The occupation game. 1:46:46.850,1:46:49.400 Here we go, this is in Russian. 1:46:49.400,1:46:55.540 And what is the occupation[br]of this gentleman? 1:46:55.540,1:46:58.630 Well he is an [INAUDIBLE]. 1:46:58.630,1:47:00.700 He's a church person. 1:47:00.700,1:47:04.300 However, so the[br]occupation game is 1:47:04.300,1:47:06.490 where Wikidata game[br]will automatically 1:47:06.490,1:47:10.990 pull likely occupations[br]from the article text 1:47:10.990,1:47:13.810 and ask for confirmation. 
1:47:13.810,1:47:16.840 So if he-- if this person[br]really is a deacon, 1:47:16.840,1:47:17.770 I should click that. 1:47:17.770,1:47:19.990 But I'm not sure. 1:47:19.990,1:47:24.950 I'm not clear on the Russian[br]church's distinctions between-- 1:47:24.950,1:47:26.620 I mean [INAUDIBLE][br]is pretty senior, 1:47:26.620,1:47:28.690 but I don't know if that[br]automatically also means 1:47:28.690,1:47:30.100 he's a deacon or not. 1:47:30.100,1:47:32.720 And [INAUDIBLE] is[br]not listed here. 1:47:32.720,1:47:36.380 So I will click not listed. 1:47:36.380,1:47:39.540 Also, these guesses[br]are not always correct. 1:47:39.540,1:47:42.680 So, this guy for[br]example, is in Russian. 1:47:42.680,1:47:43.430 I can read this. 1:47:43.430,1:47:44.470 He's a philologist. 1:47:44.470,1:47:45.380 He's a linguist. 1:47:45.380,1:47:48.510 So I can confirm it[br]and click linguist. 1:47:48.510,1:47:49.010 All right? 1:47:49.010,1:47:51.950 And again, if we look[br]at my contributions 1:47:51.950,1:47:55.700 we can see the Wikidata[br]game on my behalf 1:47:55.700,1:47:59.930 created occupation linguist. 1:47:59.930,1:48:02.450 OK. 1:48:02.450,1:48:04.370 Just by typing linguist there. 1:48:04.370,1:48:07.040 Now if it's taken[br]from the article, 1:48:07.040,1:48:09.860 why would it ever be wrong? 1:48:09.860,1:48:15.970 Well Jesus was the[br]son of a carpenter. 1:48:15.970,1:48:18.870 The word carpenter[br]appears in the text. 1:48:18.870,1:48:22.840 That doesn't mean it's correct[br]to say Jesus was a carpenter. 1:48:22.840,1:48:23.340 OK? 1:48:23.340,1:48:24.660 Just a trivial example, right? 1:48:24.660,1:48:30.250 So many, many articles will say,[br]you know, born to a physician. 1:48:30.250,1:48:32.850 And so the word physician[br]could be guessed, 1:48:32.850,1:48:36.030 but it wouldn't be correct[br]unless the son is also 1:48:36.030,1:48:38.090 a physician. 1:48:38.090,1:48:43.540 So I hope it gives[br]you the gist of it. 
1:48:43.540,1:48:47.500 There is also a[br]distributed Wikidata game, 1:48:47.500,1:48:48.774 which is pretty awesome. 1:48:51.450,1:48:54.320 Here we go, which[br]has additional games. 1:48:54.320,1:49:02.610 So, for example, the[br]Kian game gives you, 1:49:02.610,1:49:06.940 maybe it gives you,[br]some items to play with. 1:49:16.610,1:49:17.110 Yes? 1:49:17.110,1:49:17.610 No? 1:49:17.610,1:49:18.430 OK. 1:49:18.430,1:49:20.830 So it gives you[br]this little card, 1:49:20.830,1:49:27.940 and asks you to confirm: is this[br]an instance of human settlement? 1:49:27.940,1:49:30.480 That is, is it a village,[br]town, city, whatever. 1:49:30.480,1:49:33.310 Is it a kind of human[br]settlement or not? 1:49:33.310,1:49:34.340 Or maybe it's a book. 1:49:34.340,1:49:35.540 Maybe it's a poem. 1:49:35.540,1:49:38.980 Again, so, is it a[br]human settlement? 1:49:38.980,1:49:41.500 And you can click the languages[br]here to see the information. 1:49:41.500,1:49:43.270 So I can click English. 1:49:43.270,1:49:44.572 And indeed the article-- 1:49:44.572,1:49:46.030 I mean the actual[br]Wikipedia article 1:49:46.030,1:49:49.360 says Camigji is a[br]town and territory 1:49:49.360,1:49:51.370 in this district in the Congo. 1:49:51.370,1:49:54.640 So yes, this is an instance[br]of human settlement. 1:49:54.640,1:49:57.580 So I clicked yes. 1:49:57.580,1:50:00.460 And just clicking yes[br]again went to that item, 1:50:00.460,1:50:02.740 and added property[br]of human settlement. 1:50:02.740,1:50:05.560 Now the point of[br]all these games is 1:50:05.560,1:50:08.140 these are tools,[br]written by programmers, 1:50:08.140,1:50:12.490 making kind of semi-educated[br]guesses about these fairly 1:50:12.490,1:50:14.120 basic properties. 1:50:14.120,1:50:17.770 And they are meant to[br]semi-automate, to assist, 1:50:17.770,1:50:23.730 in the accumulation of all[br]these important pieces of data.
1:50:23.730,1:50:26.640 Now every single[br]click here helps 1:50:26.640,1:50:31.000 Wikidata give better[br]results, richer results 1:50:31.000,1:50:32.380 in future queries. 1:50:32.380,1:50:38.130 Again, as of right now[br]Wikidata can include Camigji 1:50:38.130,1:50:42.690 if I ask it, you know, what[br]are some towns in Congo? 1:50:42.690,1:50:44.220 Until now it could not. 1:50:44.220,1:50:46.830 Because it literally[br]didn't know. 1:50:46.830,1:50:51.950 So every time we click male,[br]female, person, not a person, 1:50:51.950,1:50:56.640 make these decisions,[br]we help improve Wikidata 1:50:56.640,1:51:01.560 and enrich the results[br]that we could receive. 1:51:01.560,1:51:04.590 Any questions about this, about[br]kind of micro contributions 1:51:04.590,1:51:07.010 through the Wikidata game? 1:51:07.010,1:51:09.890 If that looks[br]appealing I encourage 1:51:09.890,1:51:12.860 you to go and visit[br]the Wikidata game 1:51:12.860,1:51:15.205 and start contributing[br]in that way. 1:51:19.580,1:51:21.650 There is a question here. 1:51:21.650,1:51:24.650 If I make an article about[br]Circus Bulgaria how should 1:51:24.650,1:51:26.630 I correctly connect them? 1:51:26.630,1:51:28.740 That is an excellent question. 1:51:28.740,1:51:33.090 So once-- so now there is a[br]Wikidata item about that book, 1:51:33.090,1:51:37.650 but there is no Wikipedia[br]article anywhere. 1:51:37.650,1:51:41.460 Now suppose I write one[br]in, Bulgarian maybe, 1:51:41.460,1:51:42.870 you go to Wikidata. 1:51:42.870,1:51:45.180 You find the item by searching. 1:51:45.180,1:51:49.170 You find the item, and then[br]the empty site links section 1:51:49.170,1:51:50.850 right at the bottom there-- 1:51:50.850,1:51:52.020 where are we? 1:51:52.020,1:51:53.100 We have this? 1:51:53.100,1:51:55.050 Circus Bulgaria. 1:51:55.050,1:51:56.010 Let's demonstrate this. 1:51:56.010,1:51:58.000 So here is the item[br]about the book. 
1:51:58.000,1:52:01.030 Let's say that now[br]there is an article 1:52:01.030,1:52:03.670 because I just created it. 1:52:03.670,1:52:07.450 I can go here to the empty[br]Wikipedia link section, 1:52:07.450,1:52:11.760 click Edit, type the[br]name of the wiki, 1:52:11.760,1:52:16.430 let's say English, and then[br]type the name of the page 1:52:16.430,1:52:18.230 that I just created. 1:52:18.230,1:52:20.790 Circus-- right? 1:52:20.790,1:52:23.400 And again, it offers[br]me auto-complete 1:52:23.400,1:52:25.080 for my convenience. 1:52:25.080,1:52:28.260 Now we don't actually[br]have the article created, 1:52:28.260,1:52:30.480 but I could let's just[br]say this was the article. 1:52:30.480,1:52:33.330 I can just click this,[br]hit Save, and that 1:52:33.330,1:52:36.450 would associate the[br]new Wikipedia article 1:52:36.450,1:52:38.130 with this Wikidata item. 1:52:38.130,1:52:41.940 That is the beginning of the[br]inter-wiki list for this item. 1:52:41.940,1:52:43.620 I will not click[br]Save Now, because we 1:52:43.620,1:52:45.289 didn't have the article yet. 1:52:45.289,1:52:46.830 So I hope that[br]answers that question. 1:52:46.830,1:52:50.340 Was there another question[br]that I missed here? 1:52:50.340,1:52:51.450 No. 1:52:51.450,1:52:53.170 OK. 1:52:53.170,1:52:55.300 Any questions about[br]the Wikidata game? 1:52:55.300,1:53:00.740 About this idea of[br]micro contributions? 1:53:00.740,1:53:05.330 If not then we can move[br]on to embedding data, 1:53:05.330,1:53:07.490 and after that we[br]can discuss queries, 1:53:07.490,1:53:12.000 how to get at all this[br]data from Wikidata. 1:53:12.000,1:53:16.500 So the short version of how[br]to embed data from Wikidata 1:53:16.500,1:53:19.920 is that there is this[br]little magic incantation. 1:53:19.920,1:53:25.410 Curly brace, curly brace,[br]hash mark, property. 1:53:25.410,1:53:29.820 It looks like a template, but[br]it isn't because of that hash. 1:53:29.820,1:53:31.320 And that is magic. 
1:53:31.320,1:53:34.170 Take a look at this little[br]demo that I prepared. 1:53:34.170,1:53:37.950 This page, which is off[br]my user page on meta, 1:53:37.950,1:53:40.110 but it could be on any wiki. 1:53:40.110,1:53:42.490 OK. 1:53:42.490,1:53:49.420 Says, since San Francisco[br]is item Q62 in Wikidata, 1:53:49.420,1:53:55.240 and since population is[br]property P1082, I can tell you 1:53:55.240,1:53:58.840 that according to Wikidata the[br]population of San Francisco 1:53:58.840,1:54:02.180 is this. 1:54:02.180,1:54:08.420 And this bolded number here was[br]produced with this incantation. 1:54:08.420,1:54:14.420 Curly brace, curly brace,[br]hash mark, property P1082, 1:54:14.420,1:54:18.751 that's population,[br]pipe, from what item? 1:54:18.751,1:54:19.250 Right? 1:54:19.250,1:54:21.650 'Cause I'm pulling[br]an arbitrary number. 1:54:21.650,1:54:23.570 I could put any[br]property and any item 1:54:23.570,1:54:27.020 here, and kind of include[br]it, embedded, into my text. 1:54:27.020,1:54:29.630 This isn't even about-- you[br]notice this is my user page. 1:54:29.630,1:54:32.480 This isn't even the article[br]about San Francisco. 1:54:32.480,1:54:35.210 I just want to pull that[br]number into this thing 1:54:35.210,1:54:36.410 that I'm writing. 1:54:36.410,1:54:38.820 So it's fairly simple. 1:54:38.820,1:54:40.970 I identify the property. 1:54:40.970,1:54:43.440 I identify the item[br]to take it from. 1:54:43.440,1:54:47.120 And Wikidata will,[br]I mean Wikipedia, 1:54:47.120,1:54:50.480 or the wiki I'm on, in this[br]case meta, will go to Wikidata 1:54:50.480,1:54:52.820 and fetch it for me. 1:54:52.820,1:54:56.480 Likewise, since Denny Vrandecic,[br]the designer of Wikidata 1:54:56.480,1:55:01.370 is item 18618629, right? 1:55:01.370,1:55:04.790 I mean, he's a notable person,[br]so he has a Wikidata entity.
1:55:04.790,1:55:09.160 And since occupation is property[br]106, and date of birth is 569, 1:55:09.160,1:55:12.290 and place of birth[br]is 19, because 1:55:12.290,1:55:14.720 of all that I can tell you[br]that Vrandecic was born 1:55:14.720,1:55:19.130 in Stuttgart, on this date,[br]and is researcher, programmer, 1:55:19.130,1:55:20.850 and computer scientist. 1:55:20.850,1:55:25.010 If you look at the source for[br]this page, click Edit Source, 1:55:25.010,1:55:28.700 you can see that the word[br]Stuttgart does not appear here, 1:55:28.700,1:55:30.530 because it came from Wikidata. 1:55:30.530,1:55:34.171 I did not write this into[br]my little demo page here. 1:55:34.171,1:55:34.670 See? 1:55:34.670,1:55:37.380 Place of birth is-- 1:55:37.380,1:55:37.880 where is it? 1:55:37.880,1:55:38.380 Here. 1:55:38.380,1:55:43.790 Born in property 19 from[br]Q number so-and-so. 1:55:43.790,1:55:46.970 That is how easy[br]it is to pull stuff 1:55:46.970,1:55:51.890 into a wiki from Wikidata. 1:55:51.890,1:55:55.280 OK, now there's[br]some nuance to it. 1:55:55.280,1:55:57.470 And there are[br]some additional parameters 1:55:57.470,1:55:58.130 you can give. 1:55:58.130,1:56:00.230 And you can ask[br]Wikidata to give you 1:56:00.230,1:56:03.635 not just the text of the values,[br]but actually make them links. 1:56:06.750,1:56:14.825 So, for example, if I change[br]this from property to values-- 1:56:25.950,1:56:29.142 No, that did not work at all. 1:56:29.142,1:56:29.850 Wasn't it values? 1:56:29.850,1:56:30.350 What was it? 1:56:33.370,1:56:34.614 Values and then-- 1:57:19.265,1:57:19.890 Oh, statements. 1:57:19.890,1:57:20.710 My bad, sorry. 1:57:20.710,1:57:22.980 The magic word is statements. 1:57:22.980,1:57:24.010 Statements. 1:57:24.010,1:57:28.680 So going back here. 1:57:28.680,1:57:35.385 If I change the word property[br]to the word statements 1:57:35.385,1:57:40.890 here then this same value-- 1:57:40.890,1:57:43.300 that did not work at all.
1:57:43.300,1:57:46.690 Oh, because I'm on meta. 1:57:46.690,1:57:48.670 So because I'm on[br]meta, meta doesn't 1:57:48.670,1:57:52.230 have an article named[br]researcher, programmer, 1:57:52.230,1:57:53.500 or computer scientist. 1:57:53.500,1:57:55.120 But Wikipedia does. 1:57:55.120,1:58:00.210 If I included this same[br]syntax in Wikipedia, 1:58:00.210,1:58:02.950 like English Wikipedia,[br]for example-- 1:58:02.950,1:58:04.855 So let's go there right now. 1:58:11.240,1:58:13.480 And go-- go to my-- 1:58:18.550,1:58:19.345 Go to my sandbox. 1:58:23.090,1:58:27.982 If I just brutally paste[br]this on my sandbox here-- 1:58:32.690,1:58:35.810 So, see, these became links. 1:58:35.810,1:58:39.740 Because Wikipedia has an article[br]called programmer and computer 1:58:39.740,1:58:40.910 scientist. 1:58:40.910,1:58:43.460 So, like I said, there's[br]some additional nuance 1:58:43.460,1:58:44.840 to the embedding. 1:58:44.840,1:58:47.030 The important thing[br]is that this is 1:58:47.030,1:58:51.470 the key to delivering on that[br]first problem that I mentioned. 1:58:51.470,1:58:55.970 How to get data from[br]a central location 1:58:55.970,1:58:58.850 onto your wiki in your language. 1:58:58.850,1:59:04.460 Basically using property and[br]statements magic incantations. 1:59:04.460,1:59:07.100 And of course,[br]usually, this would be 1:59:07.100,1:59:10.010 in the context of an info box. 1:59:10.010,1:59:14.180 Some wikis-- English Wikipedia[br]is not leading the way there. 1:59:14.180,1:59:16.490 Some smaller wikis[br]are more advanced 1:59:16.490,1:59:22.070 actually in integrating[br]Wikidata embeddings like this 1:59:22.070,1:59:24.620 into their info boxes. 1:59:24.620,1:59:26.300 So that instead of[br]the info box just 1:59:26.300,1:59:30.620 being a template on the wiki[br]with field equals value, 1:59:30.620,1:59:31.685 field equals value. 
1:59:31.685,1:59:35.700 That template of the[br]info box on the wiki 1:59:35.700,1:59:40.160 pulls the values, the birthdate,[br]the languages, et cetera, 1:59:40.160,1:59:44.210 pulls them from Wikidata. 1:59:44.210,1:59:49.820 So basically just-- I just[br]demonstrated single calls 1:59:49.820,1:59:52.550 to this, but of course[br]an info box template 1:59:52.550,1:59:56.270 would include maybe[br]20 or 40 such embeds, 1:59:56.270,1:59:57.710 and that is not a problem. 1:59:57.710,2:00:01.460 Of course, before you go and[br]edit the English Wikipedia's 2:00:01.460,2:00:06.050 info box person and replace[br]it all with Wikidata embeds, 2:00:06.050,2:00:09.050 you should discuss it with the[br]English Wikipedia community. 2:00:09.050,2:00:12.000 These discussions have[br]already been taking place. 2:00:12.000,2:00:13.640 There are some[br]concerns about how 2:00:13.640,2:00:17.150 to patrol this, how to keep[br]it newbie friendly, et cetera. 2:00:17.150,2:00:20.690 So there are legitimate concerns[br]with just moving everything 2:00:20.690,2:00:22.910 to be embedded from Wikidata. 2:00:22.910,2:00:26.450 But the communities are[br]gradually handling this. 2:00:26.450,2:00:29.390 I mean this ability to embed[br]from Wikidata is not very old. 2:00:29.390,2:00:31.550 It's been around[br]for about a year. 2:00:31.550,2:00:35.150 So communities are[br]still working on kind 2:00:35.150,2:00:37.560 of integrating that technology. 2:00:37.560,2:00:40.190 But that is kind[br]of just the basics of how 2:00:40.190,2:00:44.210 to pull data, individual bits[br]of data. That's not querying, 2:00:44.210,2:00:47.330 that's not asking those sweeping[br]questions that I was talking 2:00:47.330,2:00:48.850 about yet. 2:00:48.850,2:00:50.720 We'll get to that.[br]Right now this is 2:00:50.720,2:00:55.310 how to pull a specific datum,[br]a specific piece of data, 2:00:55.310,2:00:57.395 from Wikidata. 2:01:01.530,2:01:02.530 OK.
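The embedding syntax described above can be written out as a small Python sketch that only assembles the wikitext strings; it does not contact Wikidata. The helper name `property_call` and its `linked` flag are hypothetical names chosen for illustration. The property and item IDs are the ones given in the talk, and the `|from=` separator is the usual way to name the source item.

```python
def property_call(prop, item=None, linked=False):
    """Build the wikitext for a Wikidata embed (hypothetical helper).

    linked=False -> {{#property:...}}   (plain text value)
    linked=True  -> {{#statements:...}} (values rendered as links)
    """
    name = "statements" if linked else "property"
    call = "{{#" + name + ":" + prop
    if item is not None:
        # Name the item to pull from; without it the page's own item is used.
        call += "|from=" + item
    return call + "}}"


# Population (P1082) of San Francisco (Q62), as in the demo page.
population = property_call("P1082", "Q62")

# Occupation (P106) of item Q18618629, rendered as links.
occupation = property_call("P106", "Q18618629", linked=True)

print(population)
print(occupation)
```

Pasting the printed strings into a wiki page is what makes the value appear; whether the linked form produces working links depends on the wiki, as the Meta versus English Wikipedia demonstration shows.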
2:01:02.530,2:01:07.080 So here's another quick[br]thing to demonstrate 2:01:07.080,2:01:09.880 before we go to[br]queries, and that 2:01:09.880,2:01:12.010 is the article placeholder. 2:01:12.010,2:01:15.010 The article placeholder[br]is a feature 2:01:15.010,2:01:19.660 that is being tested on the[br]Esperanto Wikipedia, and maybe 2:01:19.660,2:01:22.180 another wiki, I don't remember. 2:01:22.180,2:01:28.490 And it is using the[br]potential of Wikidata 2:01:28.490,2:01:32.690 to offer a placeholder[br]for an article. 2:01:32.690,2:01:37.940 An automatically generated[br]Wikidata powered replacement 2:01:37.940,2:01:41.720 placeholder for an article[br]for articles that don't yet 2:01:41.720,2:01:45.950 exist on Esperanto. 2:01:45.950,2:01:50.440 So let's go to the[br]Esperanto Wikipedia. 2:01:50.440,2:01:52.440 I don't speak Esperanto. 2:01:52.440,2:01:56.760 But let's look for Helen[br]Dewitt, our friend, 2:01:56.760,2:01:58.170 in Esperanto Wikipedia. 2:01:58.170,2:02:00.270 Now Esperanto is not[br]one of the Wikipedias 2:02:00.270,2:02:03.060 that have an article[br]about Helen Dewitt. 2:02:03.060,2:02:04.890 And so it tells me that, right? 2:02:04.890,2:02:06.570 There is no Helen Dewitt. 2:02:06.570,2:02:08.670 Maybe you were looking[br]for Helena Dewitt. 2:02:08.670,2:02:10.200 No, I was not. 2:02:10.200,2:02:13.650 You can start an article[br]about Helen Dewitt. 2:02:13.650,2:02:15.390 You can search. 2:02:15.390,2:02:17.820 You know, there's[br]all this stuff. 2:02:17.820,2:02:24.180 But there is also this[br]little option here, hiding, 2:02:24.180,2:02:30.640 which tells me that the[br]Esperanto Wikipedia is-- 2:02:30.640,2:02:31.580 what's happening here? 2:02:35.140,2:02:35.890 Yes. 2:02:35.890,2:02:40.520 The Esperanto Wikipedia is[br]ready to give me this page. 2:02:40.520,2:02:44.020 This page, as you can see, it's[br]on the Esperanto Wikipedia, 2:02:44.020,2:02:46.090 but it's not an article. 2:02:46.090,2:02:47.480 See, it's a special page. 
2:02:47.480,2:02:49.700 It's machine generated. 2:02:49.700,2:02:52.150 You can see the URL as well. 2:02:52.150,2:02:54.410 It's not, you know,[br]slash Helen Dewitt. 2:02:54.410,2:02:58.450 It's slash Specialaĵo,[br]AboutTopic, 2:02:58.450,2:03:01.570 and then the Wikidata[br]ID of Helen Dewitt. 2:03:01.570,2:03:03.760 And what I get here-- 2:03:03.760,2:03:05.860 I get an English[br]description, by the way, 2:03:05.860,2:03:08.300 because there is no[br]Esperanto description. 2:03:08.300,2:03:10.420 Wikidata can't make it up. 2:03:10.420,2:03:13.600 But what it can do is[br]offer me these pieces 2:03:13.600,2:03:16.960 of data in my language,[br]in this case Esperanto. 2:03:16.960,2:03:18.921 I'm on the Esperanto Wikipedia. 2:03:18.921,2:03:19.420 OK. 2:03:19.420,2:03:23.380 So it tells me that she's[br]American, for example, 2:03:23.380,2:03:26.090 and it tells me[br]that in Esperanto. 2:03:26.090,2:03:29.350 OK, and it tells me[br]that she speaks Latin. 2:03:29.350,2:03:32.410 Remember we taught[br]Wikidata that? 2:03:32.410,2:03:35.800 It tells me that she[br]was educated in Oxford, 2:03:35.800,2:03:38.050 you know, and gives me the[br]references to the extent 2:03:38.050,2:03:39.130 that they exist. 2:03:39.130,2:03:41.560 I mean this is not an article. 2:03:41.560,2:03:46.650 It's not, you know, paragraphs[br]of fluent Esperanto text. 2:03:46.650,2:03:50.190 But it is information[br]that I can understand 2:03:50.190,2:03:51.960 if I speak this language. 2:03:51.960,2:03:55.380 And it's better than nothing. 2:03:55.380,2:04:00.120 And remember Helen Dewitt was[br]not a very detailed article. 2:04:00.120,2:04:03.690 If I were to ask about, I[br]don't know, some politician, 2:04:03.690,2:04:08.340 or popular singer that[br]has more data in Wikidata, 2:04:08.340,2:04:12.690 then this machine-generated[br]thing would have been richer.
2:04:12.690,2:04:16.320 So this feature is available[br]and is under beta testing 2:04:16.320,2:04:19.530 right now, but generally if[br]this sounds interesting for you, 2:04:19.530,2:04:21.600 especially if you come[br]from a smaller wiki that 2:04:21.600,2:04:25.230 is missing a lot of articles[br]that people may want to learn 2:04:25.230,2:04:28.320 about, you can contact[br]the Wikimedia Foundation 2:04:28.320,2:04:33.486 and ask for article placeholder[br]to be enabled on your wiki. 2:04:33.486,2:04:34.860 And again, this[br]is a placeholder. 2:04:34.860,2:04:37.890 Of course, it exists only[br]until someone actually 2:04:37.890,2:04:43.290 writes a proper Esperanto[br]article about Helen Dewitt. 2:04:43.290,2:04:45.060 So I hope this is clear. 2:04:45.060,2:04:50.810 This is all coming from[br]Wikidata on the fly. 2:04:50.810,2:04:51.470 In real time. 2:04:51.470,2:04:57.500 As you can see, it includes my[br]latest edits to Helen Dewitt. 2:04:57.500,2:04:58.940 OK. 2:04:58.940,2:05:05.250 Questions about the-- questions[br]about the article placeholder? 2:05:05.250,2:05:09.580 If there are, try and[br]put them on the channel. 2:05:09.580,2:05:13.300 And this brings us to one of[br]the main courses of this talk, 2:05:13.300,2:05:15.270 which is querying Wikidata. 2:05:15.270,2:05:18.660 So I've explained[br]how Wikidata works. 2:05:18.660,2:05:19.680 We've walked through it. 2:05:19.680,2:05:20.850 We've added to it. 2:05:20.850,2:05:22.800 We've created a new item. 2:05:22.800,2:05:26.360 We learned how to contribute[br]during our commutes. 2:05:26.360,2:05:30.150 And all this was-- you[br]kept promising us, 2:05:30.150,2:05:32.050 Asaf, that this would be-- 2:05:32.050,2:05:34.690 this would enable[br]these amazing queries. 2:05:34.690,2:05:37.960 So time to make good on that. 2:05:37.960,2:05:42.880 The URL you need to remember[br]is query.wikidata.org.
2:05:42.880,2:05:49.390 And that will take you[br]to a query system that 2:05:49.390,2:05:52.510 uses a language called SPARQL. 2:05:52.510,2:05:58.150 SPARQL, spelt with[br]a Q. This language 2:05:58.150,2:06:01.690 is not a Wikimedia creation. 2:06:01.690,2:06:06.010 It's a standardized language[br]used for querying linked data 2:06:06.010,2:06:07.540 sources. 2:06:07.540,2:06:10.720 And because of that[br]there 2:06:10.720,2:06:14.590 are certain usability prices[br]that we pay for using SPARQL, 2:06:14.590,2:06:16.010 for using a standard language. 2:06:16.010,2:06:19.570 It's not completely custom[br]made for querying Wikidata, 2:06:19.570,2:06:21.740 and we'll see that[br]in just a moment. 2:06:21.740,2:06:23.530 The principle to[br]remember about Wikidata 2:06:23.530,2:06:27.880 query is that Wikidata will[br]tell you everything it knows, 2:06:27.880,2:06:29.470 but no more. 2:06:29.470,2:06:32.440 I have anticipated this[br]several times already, right? 2:06:32.440,2:06:35.980 Until this moment when[br]we taught Wikidata 2:06:35.980,2:06:38.590 that Helen Dewitt[br]speaks Latin, she 2:06:38.590,2:06:41.500 would not have appeared[br]in query results 2:06:41.500,2:06:45.974 asking who are American[br]writers who speak Latin? 2:06:45.974,2:06:47.140 She would not have appeared. 2:06:47.140,2:06:49.090 But as of this[br]afternoon, she will 2:06:49.090,2:06:52.950 appear because I've added[br]that piece of information. 2:06:52.950,2:07:01.380 So a result of that principle[br]is that you can never say, 2:07:01.380,2:07:05.950 well, I ran a Wikidata[br]query and this 2:07:05.950,2:07:11.510 is the list of Flemish painters[br]who are sons of painters. 2:07:11.510,2:07:12.310 The list. 2:07:12.310,2:07:14.110 That these are all[br]the Flemish painters 2:07:14.110,2:07:15.220 who are sons of painters.
2:07:15.220,2:07:19.390 That is never something you can[br]say based on a Wikidata query, 2:07:19.390,2:07:22.390 because of course, maybe[br]not all the Flemish painters 2:07:22.390,2:07:26.020 who are sons of painters have[br]been expressed in Wikidata 2:07:26.020,2:07:26.760 yet. 2:07:26.760,2:07:28.840 Wikidata doesn't know[br]about some of them, 2:07:28.840,2:07:30.340 or maybe it knows[br]about all of them 2:07:30.340,2:07:32.500 but doesn't know[br]the important fact 2:07:32.500,2:07:35.200 that this person is[br]the son of that person, 2:07:35.200,2:07:38.740 because those properties[br]have not been added. 2:07:38.740,2:07:40.940 And so they cannot be[br]included in the results. 2:07:40.940,2:07:42.550 So the results of[br]a Wikidata query 2:07:42.550,2:07:46.870 are never the definitive sets. 2:07:46.870,2:07:49.600 What you can say about[br]a Wikidata query is here 2:07:49.600,2:07:52.840 are some Flemish painters[br]who are sons of painters. 2:07:52.840,2:07:56.260 Here are some cities[br]with female mayors. 2:07:56.260,2:07:58.270 Whatever it is[br]you're querying about 2:07:58.270,2:08:01.030 is never guaranteed[br]to be complete 2:08:01.030,2:08:03.580 because Wikidata,[br]like Wikipedia, is 2:08:03.580,2:08:05.530 a work in progress. 2:08:05.530,2:08:13.240 And of course, the more[br]we teach Wikidata the 2:08:13.240,2:08:16.240 more useful it becomes. 2:08:16.240,2:08:22.520 OK, so let's go and[br]see those queries. 2:08:22.520,2:08:25.990 So this is query.wikidata.org. 2:08:25.990,2:08:29.000 It's not the wiki. 2:08:29.000,2:08:29.500 All right? 2:08:29.500,2:08:32.530 So this isn't like some[br]page on the wiki itself. 2:08:32.530,2:08:35.099 This is kind of an[br]external system. 2:08:35.099,2:08:35.890 So it's not a wiki. 2:08:35.890,2:08:37.960 You can see I don't[br]have a user page here. 2:08:37.960,2:08:39.520 I don't have a history tab. 2:08:39.520,2:08:40.960 This isn't a wiki page.
2:08:40.960,2:08:44.560 This is a special kind[br]of tool or system. 2:08:44.560,2:08:51.330 And it invites me to[br]input a SPARQL query. 2:08:51.330,2:08:55.060 Now most of us do[br]not speak SPARQL. 2:08:55.060,2:08:59.800 It's a technical language. 2:08:59.800,2:09:01.720 It's a query language. 2:09:01.720,2:09:06.760 Some of you may be thinking[br]about SQL, the database query 2:09:06.760,2:09:08.500 language. 2:09:08.500,2:09:13.330 SPARQL is named with kind[br]of a wink, or a nod, to SQL. 2:09:13.330,2:09:17.440 But, I warn you, if[br]you are comfortable in 2:09:17.440,2:09:22.750 SQL, don't expect to carry[br]over your knowledge of SQL 2:09:22.750,2:09:23.550 into SPARQL. 2:09:23.550,2:09:26.140 They're not the same. 2:09:26.140,2:09:27.940 They are superficially similar. 2:09:27.940,2:09:28.440 Right? 2:09:28.440,2:09:31.530 So they both use[br]the keyword select, 2:09:31.530,2:09:35.010 and they use the word where,[br]and they use things like limit, 2:09:35.010,2:09:35.770 and order. 2:09:35.770,2:09:38.190 So again, if you know[br]this already from SQL, 2:09:38.190,2:09:40.500 those mean roughly[br]the same things, 2:09:40.500,2:09:44.550 but don't expect it to[br]behave just like SQL. 2:09:44.550,2:09:49.800 You do need to spend some time[br]understanding how SPARQL works. 2:09:49.800,2:09:52.560 So, by all means, I[br]invite you to go and read 2:09:52.560,2:09:55.680 one of the many fine[br]SPARQL tutorials that 2:09:55.680,2:09:59.590 are out there on the web, or[br]to click the Help button here, 2:09:59.590,2:10:03.930 which also includes[br]help about SPARQL. 2:10:03.930,2:10:08.440 But I also know[br]that most of us when 2:10:08.440,2:10:12.580 we want to do some advanced[br]formatting on wiki, 2:10:12.580,2:10:16.090 for example, we don't go[br]and read the help page 2:10:16.090,2:10:18.220 on templates, right?
2:10:18.220,2:10:21.460 We go to a page that already[br]does what we want to do, 2:10:21.460,2:10:27.430 and adopt and adapt the code[br]from that other page, right? 2:10:27.430,2:10:30.610 So we just take something that[br]does roughly what we want, 2:10:30.610,2:10:33.280 and just copy it over and[br]change what we need to change. 2:10:33.280,2:10:35.620 That is a very pragmatic[br]and reasonable way 2:10:35.620,2:10:37.420 to do things which is why-- 2:10:37.420,2:10:39.850 and the Wikidata[br]engineers know this, 2:10:39.850,2:10:43.300 which is why they prepared[br]this very handy button for us 2:10:43.300,2:10:45.580 called examples. 2:10:45.580,2:10:47.710 We click the examples button. 2:10:47.710,2:10:52.390 And, oh my god, there is a ton[br]of-- well, there's 312 example 2:10:52.390,2:10:55.582 queries for us to choose from. 2:10:55.582,2:10:57.040 And we can just[br]pick something that 2:10:57.040,2:11:00.310 is roughly like what[br]we're trying to find out, 2:11:00.310,2:11:02.740 and then just change[br]what needs changing. 2:11:02.740,2:11:05.410 So let's take a very simple one. 2:11:05.410,2:11:07.020 The cats query. 2:11:07.020,2:11:10.270 Maybe one of the simplest[br]you could possibly have. 2:11:10.270,2:11:13.510 And let's run it first[br]and then I'll kind of 2:11:13.510,2:11:16.420 walk you through it. 2:11:16.420,2:11:18.460 The goal here is not[br]to teach you SPARQL, 2:11:18.460,2:11:20.860 but to get you to be kind[br]of literate in SPARQL. 2:11:20.860,2:11:23.980 To kind of understand why[br]this does what it does. 2:11:23.980,2:11:25.730 So let's run this query first. 2:11:25.730,2:11:31.390 We click Run and here I[br]have results at the bottom. 2:11:31.390,2:11:34.060 The item, which is[br]just a Wikidata item, 2:11:34.060,2:11:35.290 which of course is a number. 2:11:35.290,2:11:38.860 Remember, Wikidata thinks[br]of items as Q numbers.
2:11:38.860,2:11:40.900 And the label,[br]because we're humans 2:11:40.900,2:11:43.190 and we prefer words to numbers. 2:11:43.190,2:11:49.870 So these 114 results[br]are all the cats 2:11:49.870,2:11:53.310 that Wikidata knows about. 2:11:53.310,2:11:55.380 Is this all the[br]cats in the world? 2:11:55.380,2:11:57.320 No, of course not, remember? 2:11:57.320,2:11:59.730 It's all the cats Wikidata[br]knows about, which 2:11:59.730,2:12:01.410 means they're somehow notable. 2:12:01.410,2:12:05.130 I mean someone bothered to[br]describe them on Wikidata. 2:12:05.130,2:12:12.570 And Wikidata was told this[br]item is an instance of cat. 2:12:12.570,2:12:13.620 Right? 2:12:13.620,2:12:17.040 So these are those cats. 2:12:17.040,2:12:18.540 And we can click any of them. 2:12:18.540,2:12:20.190 I don't know,[br]Pixel, for example. 2:12:20.190,2:12:21.780 Click the Wikidata item. 2:12:21.780,2:12:24.090 And here is the Wikidata[br]item about Pixel 2:12:24.090,2:12:25.860 with the Q number. 2:12:25.860,2:12:28.980 And he is a tortoiseshell cat. 2:12:28.980,2:12:32.640 And as you can see,[br]instance of cat. 2:12:32.640,2:12:33.610 OK. 2:12:33.610,2:12:37.220 And he is five inches high. 2:12:37.220,2:12:41.780 And he is apparently documented[br]in Indonesian, in Bahasa. 2:12:41.780,2:12:45.080 Right here this is Pixel. 2:12:45.080,2:12:50.060 And he is apparently somehow[br]related to the Guinness World 2:12:50.060,2:12:52.160 Records book. 2:12:52.160,2:12:54.650 I don't speak Bahasa, so[br]I don't know exactly why 2:12:54.650,2:12:56.120 this cat is so notable. 2:12:56.120,2:12:58.889 But, of course, cats[br]can become notable 2:12:58.889,2:12:59.930 for all kinds of reasons. 2:12:59.930,2:13:02.204 Maybe they're a[br]YouTube sensation, 2:13:02.204,2:13:03.620 you know, maybe[br]they were involved 2:13:03.620,2:13:05.330 in some historical event. 2:13:05.330,2:13:09.410 I like this cat named Gladstone.
2:13:09.410,2:13:16.590 This cat named Gladstone is-- 2:13:16.590,2:13:19.950 he has position[br]held Chief Mouser 2:13:19.950,2:13:22.320 to Her Majesty's Treasury. 2:13:22.320,2:13:25.230 This is an official[br]cat with a job. 2:13:25.230,2:13:29.190 And he has been holding this[br]job, mind you, since the 28th 2:13:29.190,2:13:31.570 of June this past year. 2:13:31.570,2:13:32.970 That's the start time. 2:13:32.970,2:13:35.760 And there is no end time[br]which means he currently 2:13:35.760,2:13:38.850 holds the position[br]of Chief Mouser 2:13:38.850,2:13:40.470 to her Majesty's Treasury. 2:13:40.470,2:13:42.750 His employer is Her[br]Majesty's Treasury. 2:13:42.750,2:13:44.290 He's a male creature. 2:13:44.290,2:13:46.650 And Wikidata knows[br]that this cat is 2:13:46.650,2:13:53.127 named after William Gladstone,[br]the Victorian prime minister. 2:13:53.127,2:13:54.960 Of course if I don't[br]know who this person is 2:13:54.960,2:13:57.540 I can click through[br]and learn that he 2:13:57.540,2:14:01.860 was a liberal politician[br]and prime minister, right? 2:14:01.860,2:14:03.390 He even has a Twitter account. 2:14:03.390,2:14:05.910 And Wikidata sends[br]me right to it. 2:14:05.910,2:14:08.040 The treasury cat[br]Twitter account. 2:14:08.040,2:14:11.010 And he has articles in[br]German, and English, 2:14:11.010,2:14:15.520 and of course Japanese,[br]because he's a cat. 2:14:15.520,2:14:16.020 All right. 2:14:16.020,2:14:19.500 So this was a very simple query. 2:14:19.500,2:14:21.400 Let's find out why it works. 2:14:21.400,2:14:21.900 OK. 2:14:21.900,2:14:25.800 So what did we actually[br]tell Wikidata to do for us? 2:14:25.800,2:14:31.650 We said, please select[br]some items for us 2:14:31.650,2:14:33.580 along with their labels. 2:14:33.580,2:14:34.080 OK? 2:14:34.080,2:14:36.180 Along with their[br]human readable labels 2:14:36.180,2:14:42.010 because if I remove this[br]label what I get is, see, 2:14:42.010,2:14:44.200 just a list of item numbers. 
2:14:44.200,2:14:45.280 That's not as fun. 2:14:45.280,2:14:46.930 So that's what this[br]little bit did. 2:14:46.930,2:14:49.630 I just said, give me the[br]items, but also their 2:14:49.630,2:14:52.330 human-readable label. 2:14:52.330,2:14:54.620 And I want you to[br]select a bunch of items, 2:14:54.620,2:14:56.770 but not just any[br]random bunch of items, 2:14:56.770,2:15:01.210 I want to select items where[br]a certain condition holds. 2:15:01.210,2:15:02.790 What is the condition? 2:15:02.790,2:15:06.430 The condition is that the[br]item that I want you to select 2:15:06.430,2:15:14.360 needs to have property[br]31 with a value of Q146. 2:15:14.360,2:15:15.670 Well, that's helpful. 2:15:15.670,2:15:18.070 If I hover over these numbers-- 2:15:18.070,2:15:19.750 again, I get the human-[br]readable version. 2:15:19.750,2:15:23.530 So I'm looking for[br]items that have property 2:15:23.530,2:15:28.841 instance of with the value cat. 2:15:28.841,2:15:29.340 Right? 2:15:29.340,2:15:31.173 Because that's literally[br]what I want, right? 2:15:31.173,2:15:33.960 I want all the items that have[br]a property, a statement, that 2:15:33.960,2:15:36.840 says instance of cat. 2:15:36.840,2:15:37.950 That's the condition. 2:15:37.950,2:15:41.640 I'm not interested in items[br]that are instance of book, 2:15:41.640,2:15:43.200 or instance of human. 2:15:43.200,2:15:46.290 I'm interested in[br]instance of cat. 2:15:46.290,2:15:51.090 That is the only condition[br]here in this query. 2:15:51.090,2:15:55.800 This complicated line I ask[br]you to basically ignore. 2:15:55.800,2:15:57.510 This is one of those[br]sacrifices that we 2:15:57.510,2:16:00.720 make for using a standard[br]language like SPARQL. 2:16:00.720,2:16:02.820 But the role of this[br]complicated line 2:16:02.820,2:16:04.920 is to basically[br]ensure that we get 2:16:04.920,2:16:07.860 the English label for that cat. 2:16:07.860,2:16:08.817 OK? 2:16:08.817,2:16:09.900 So don't worry about that.
2:16:09.900,2:16:11.550 Just leave it there. 2:16:11.550,2:16:13.320 And we run the query[br]and we get the list 2:16:13.320,2:16:17.330 of cats with their English[br]labels, and that is awesome. 2:16:17.330,2:16:21.510 By the way, if I change EN,[br]without really understanding 2:16:21.510,2:16:27.260 this line, if I change[br]EN to HE, for Hebrew, 2:16:27.260,2:16:30.160 I get the same results[br]with a Hebrew label. 2:16:30.160,2:16:33.670 Of course, these cats,[br]nobody bothered to give them 2:16:33.670,2:16:35.709 Hebrew labels unfortunately. 2:16:35.709,2:16:37.570 So I get the Q number. 2:16:37.570,2:16:42.874 But if I changed[br]it to Japanese, JA, 2:16:42.874,2:16:45.290 I would still get a bunch of[br]Q numbers for where there 2:16:45.290,2:16:47.389 isn't a Japanese label,[br]but I would get the labels 2:16:47.389,2:16:48.781 in Japanese. 2:16:48.781,2:16:49.280 OK? 2:16:49.280,2:16:51.260 So this is an example[br]of how you don't even 2:16:51.260,2:16:54.620 need to understand all[br]the syntax of this query 2:16:54.620,2:16:56.100 to adapt it to your needs. 2:16:56.100,2:16:58.070 If you want this[br]query as is, but you 2:16:58.070,2:17:00.320 want the labels in[br]Japanese, you can just 2:17:00.320,2:17:03.190 change the language code here. 2:17:03.190,2:17:06.559 OK, so that is all[br]this query does. 2:17:06.559,2:17:08.870 Again, just give[br]me the items that 2:17:08.870,2:17:17.590 have property 31, instance of,[br]with a value Q146, which is cat. 2:17:17.590,2:17:20.379 Let's take a question just[br]about this very simple query 2:17:20.379,2:17:25.809 before we advance to[br]more complicated queries. 2:17:25.809,2:17:29.200 Any questions just about this? 2:17:29.200,2:17:32.850 Like, did anyone kind of[br]really lose me talking 2:17:32.850,2:17:35.010 about this simple query?
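The cats query walked through above can be sketched as follows, held in a Python string so that the language code is the only thing that changes. This is a reconstruction from the spoken description and not a copy of the on-screen text: the function name `cats_query` is hypothetical, and the "complicated line" is assumed to be the query service's label service call.

```python
def cats_query(language="en"):
    """Return the SPARQL text for 'all items that are instance of cat'.

    Hypothetical helper; the query shape is an assumption based on the talk.
    """
    return (
        # Ask for each item plus its human-readable label.
        "SELECT ?item ?itemLabel WHERE {\n"
        # The only condition: property P31 (instance of) has value Q146 (cat).
        "  ?item wdt:P31 wd:Q146 .\n"
        # The line to leave alone, apart from the language code in quotes.
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" }}\n'
        "}"
    )


english = cats_query()        # labels in English
japanese = cats_query("ja")   # same items, labels in Japanese where they exist

print(japanese)
```

The printed text can be pasted into query.wikidata.org and run there. Items with no label in the chosen language show their Q number instead, as described above.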
2:17:35.010,2:17:39.389 Again, this query just tells[br]Wikidata, get me all the items 2:17:39.389,2:17:41.280 that somewhere among[br]their statements 2:17:41.280,2:17:44.219 have instance of cat. 2:17:44.219,2:17:46.670 That's the only condition. 2:17:46.670,2:17:47.740 No questions. 2:17:47.740,2:17:49.959 OK, feel free to ask if[br]you come up with one. 2:17:49.959,2:17:54.709 So let's complicate[br]things a little. 2:17:54.709,2:17:59.365 Let's ask only for male cats. 2:18:02.080,2:18:03.070 OK. 2:18:03.070,2:18:07.330 Remember this cat[br]Gladstone is male, 2:18:07.330,2:18:09.850 and we know this because[br]he has a property called 2:18:09.850,2:18:14.320 sex or gender, and the value[br]is male creature, right? 2:18:14.320,2:18:17.950 So let's add another[br]condition right here 2:18:17.950,2:18:19.860 under the first condition. 2:18:19.860,2:18:20.870 OK? 2:18:20.870,2:18:22.750 This is a new line. 2:18:22.750,2:18:24.940 And I'm adding a new[br]condition to the query. 2:18:24.940,2:18:30.520 I'm saying, not only do I[br]want this item that you return 2:18:30.520,2:18:35.469 to be instance of cat, I[br]also want this same item 2:18:35.469,2:18:39.280 to have another property,[br]the property sex or gender. 2:18:39.280,2:18:40.299 Right? 2:18:40.299,2:18:43.480 And I need to refer to[br]the property by number. 2:18:43.480,2:18:45.760 But don't worry,[br]Wikidata will help you. 2:18:45.760,2:18:49.500 So you start with this[br]prefix, WDT. 2:18:52.520,2:18:54.980 Again, just ignore[br]that prefix; it's 2:18:54.980,2:18:58.940 one of the features of SPARQL[br]that we need to respect. 2:18:58.940,2:19:02.715 WDT colon, and then I can[br]just type Ctrl+Space 2:19:02.715,2:19:04.340 to do a search, to[br]do an autocomplete. 2:19:04.340,2:19:08.090 So I can just type sex,[br]and Wikidata helpfully 2:19:08.090,2:19:11.760 offers me a drop-down[br]with relevant properties.
2:19:11.760,2:19:15.200 So I click property 21, which[br]is the sex or gender property. 2:19:15.200,2:19:17.629 And then I say, so I want[br]the sex or gender property 2:19:17.629,2:19:19.670 to have the Wikidata value. 2:19:19.670,2:19:21.799 Again, Ctrl+Space. 2:19:21.799,2:19:25.340 And I can just[br]say male creature. 2:19:25.340,2:19:25.850 See? 2:19:25.850,2:19:30.950 There's a different item[br]for male, as in human, 2:19:30.950,2:19:33.799 and a different one for[br]male creature, for reasons 2:19:33.799,2:19:34.910 that we won't go into. 2:19:34.910,2:19:36.535 Let's pick male[br]creature, because we're 2:19:36.535,2:19:38.040 talking about cats here. 2:19:38.040,2:19:38.540 All right. 2:19:38.540,2:19:42.080 And add a period here at[br]the end and click Run. 2:19:42.080,2:19:48.330 And instead of 114 cats, we get,[br]this time, 43 results. 2:19:48.330,2:19:53.360 Including our friend Gladstone,[br]who is a male creature cat. 2:19:53.360,2:19:58.530 So that means all the[br]rest are female, right? 2:19:58.530,2:20:00.410 Wrong. 2:20:00.410,2:20:00.980 Wrong. 2:20:00.980,2:20:02.840 That does not mean that at all. 2:20:02.840,2:20:06.530 What it means is of[br]the 114 items that 2:20:06.530,2:20:11.960 have instance of cat,[br]only 43 have explicitly 2:20:11.960,2:20:14.690 sex male creature. 2:20:14.690,2:20:17.570 The rest of them do not. 2:20:17.570,2:20:21.800 Maybe because they have[br]sex female creature, 2:20:21.800,2:20:25.930 but maybe because they don't[br]have that property at all. 2:20:25.930,2:20:28.290 I'm emphasizing[br]this to kind of help 2:20:28.290,2:20:31.770 you train yourself to[br]correctly interpret 2:20:31.770,2:20:34.140 the results of[br]queries from Wikidata. 2:20:34.140,2:20:36.870 Don't jump into this kind[br]of simplistic conclusion: 2:20:36.870,2:20:41.820 OK, there's 114 total, 43 male,[br]therefore the rest are female. 2:20:41.820,2:20:43.520 That is not correct. 2:20:43.520,2:20:45.030 OK?
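Adding a second condition, as done above for male cats, amounts to adding one more line of the same shape to the query. The sketch below assumes a hypothetical helper `items_query` that turns a list of (property, value) pairs into query text. The item number used for "male creature" (Q44148) is not stated in the talk and is an assumption; the counts 114 and 43 are the ones quoted above.

```python
def items_query(conditions, language="en"):
    """Build SPARQL selecting items that satisfy every (property, value) pair."""
    triples = "".join(
        f"  ?item wdt:{prop} wd:{value} .\n" for prop, value in conditions
    )
    label_service = (
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "'
        + language
        + '" . }\n'
    )
    return "SELECT ?item ?itemLabel WHERE {\n" + triples + label_service + "}"


MALE_CREATURE = "Q44148"  # assumption: this number is not given in the talk

# instance of (P31) cat (Q146) AND sex or gender (P21) male creature
male_cats = items_query([("P31", "Q146"), ("P21", MALE_CREATURE)])
print(male_cats)

# Reading the counts correctly: the remainder is "not stated to be male",
# which covers female cats and cats with no sex-or-gender statement at all.
total_cats, stated_male = 114, 43
not_stated_male = total_cats - stated_male
print(not_stated_male, "cats are female or have no sex-or-gender statement")
```

Each condition narrows the result to items that explicitly carry that statement, which is why the remaining cats cannot simply be counted as female.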
2:20:45.030,2:20:49.740 But 43 of those explicitly[br]had another statement, sex 2:20:49.740,2:20:52.530 or gender, male creature. 2:20:52.530,2:20:55.020 So I just added[br]another condition, 2:20:55.020,2:20:58.290 and now my query is[br]asking two separate things 2:20:58.290,2:21:00.150 about the results. 2:21:00.150,2:21:04.472 They need to be a cat[br]and a male creature. 2:21:04.472,2:21:06.270 AUDIENCE: Maybe we[br]should see how many 2:21:06.270,2:21:08.100 cats have Twitter accounts. 2:21:08.100,2:21:11.440 But there is a[br]question from YouTube, 2:21:11.440,2:21:14.220 which is will you talk about[br]the export possibilities 2:21:14.220,2:21:17.280 of the result of the query? 2:21:17.280,2:21:18.420 ASAF BARTOV: Absolutely. 2:21:18.420,2:21:21.000 Absolutely I will in[br]just a little bit. 2:21:21.000,2:21:23.010 I mean there is, in[br]addition to just getting 2:21:23.010,2:21:28.350 this kind of table, I can get[br]these results in other formats. 2:21:28.350,2:21:30.360 And I can also[br]download these results. 2:21:30.360,2:21:32.820 I can click the Download[br]button and get them 2:21:32.820,2:21:35.070 as a comma separated[br]file, tab separated 2:21:35.070,2:21:38.910 file, a JSON file, which is[br]useful for programmatic uses. 2:21:38.910,2:21:40.590 I can also get a link. 2:21:40.590,2:21:42.330 So I can get a[br]link to this query. 2:21:42.330,2:21:45.990 I mean, I spent all this time[br]designing this beautiful query. 2:21:45.990,2:21:50.280 I can get a short URL that was[br]generated especially for me 2:21:50.280,2:21:52.170 right now with a tiny URL. 2:21:52.170,2:21:54.690 I can just paste this[br]into Twitter and go, 2:21:54.690,2:21:59.280 hey people look at all the male[br]cats that Wikidata knows about. 2:21:59.280,2:22:01.170 OK, this is not a[br]very exciting query. 2:22:01.170,2:22:03.900 But once I get to a really[br]complicated exciting query 2:22:03.900,2:22:07.650 I can totally share that[br]very easily through this. 
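For the programmatic uses mentioned above, the same results can also be requested directly as JSON. The following is a hedged sketch that only builds the request URL and does not contact any server. The endpoint address and the `format` parameter are not named in the talk; they are assumptions about the public query service, and `results_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: address of the public SPARQL endpoint (not named in the talk).
ENDPOINT = "https://query.wikidata.org/sparql"

# The simple cat query: instance of (P31) cat (Q146).
QUERY = "SELECT ?item WHERE { ?item wdt:P31 wd:Q146 . }"


def results_url(sparql, fmt="json"):
    """Return a URL asking the endpoint for the results of `sparql`."""
    # urlencode percent-escapes the query text so it is safe inside a URL.
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": fmt})


print(results_url(QUERY))
```

A script could pass this URL to any HTTP client and parse the response, while the Download button and the short link remain the simpler route for one-off sharing.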
2:22:07.650,2:22:09.750 And we will get to more[br]interesting queries 2:22:09.750,2:22:11.740 in just a second. 2:22:11.740,2:22:16.400 Any questions on this kind[br]of basic querying so far? 2:22:16.400,2:22:17.940 OK. 2:22:17.940,2:22:25.340 So that was a very[br]simple example. 2:22:25.340,2:22:30.250 Let's spend a moment exploring. 2:22:30.250,2:22:38.920 So this cat Gladstone was[br]named after this dude, William 2:22:38.920,2:22:43.550 Gladstone, who was an[br]important British politician. 2:22:43.550,2:22:45.760 I'm sure he's not the[br]only thing out there 2:22:45.760,2:22:48.970 in the universe that's named[br]after Gladstone, right? 2:22:48.970,2:22:52.120 I mean there has got[br]to be, I don't know, 2:22:52.120,2:22:54.790 park benches,[br]planets, asteroids, 2:22:54.790,2:22:59.590 something other than the[br]cat, named after this guy. 2:22:59.590,2:23:04.030 So we can ask Wikidata[br]to tell us all the things 2:23:04.030,2:23:06.850 that, you know, without[br]saying instance of something. 2:23:06.850,2:23:10.960 Like, I don't know, anything[br]named after William Gladstone. 2:23:10.960,2:23:12.760 So how do I do that? 2:23:12.760,2:23:15.310 Same principle. 2:23:15.310,2:23:19.850 Instead of asking about the[br]property instance of, property 2:23:19.850,2:23:25.360 31, instead of that, I[br]will ask about the property 2:23:25.360,2:23:26.860 named after-- 2:23:26.860,2:23:29.120 sorry, named after-- 2:23:29.120,2:23:30.830 I don't need to[br]remember the number. 2:23:30.830,2:23:32.240 I have auto-complete. 2:23:32.240,2:23:35.360 Named after is property 138. 2:23:35.360,2:23:37.430 And I want anything[br]at all that is 2:23:37.430,2:23:42.080 named after this person,[br]William Gladstone. 2:23:42.080,2:23:43.850 Here we go. 2:23:43.850,2:23:45.860 Which is 160852. 2:23:45.860,2:23:46.820 Whatever. 2:23:46.820,2:23:48.230 OK. 2:23:48.230,2:23:50.510 You notice I removed[br]instance of cat. 2:23:50.510,2:23:52.040 I remove the male creature. 
2:23:52.040,2:23:55.130 I'm only asking,[br]get me all the items 2:23:55.130,2:23:58.940 that are somehow named after[br]that particular politician. 2:23:58.940,2:24:00.920 And I run the query,[br]and it turns out 2:24:00.920,2:24:05.007 that Wikidata knows[br]about three such things. 2:24:05.007,2:24:06.590 Does that mean that's[br]the only-- these 2:24:06.590,2:24:08.881 are the only three things[br]named after him in the world? 2:24:08.881,2:24:09.939 Of course not. 2:24:09.939,2:24:12.230 But these are the only three[br]items that are in Wikidata 2:24:12.230,2:24:17.720 and explicitly have the[br]property named after Gladstone. 2:24:17.720,2:24:20.150 For all I know, there[br]may be a village 2:24:20.150,2:24:23.600 in England called Gladstone[br]named after this person. 2:24:23.600,2:24:27.410 But if nobody added the[br]property, named after, linking 2:24:27.410,2:24:30.950 to the person, it wouldn't show[br]up in the results to my query. 2:24:30.950,2:24:33.750 So Wikidata knows about[br]three such things. 2:24:33.750,2:24:36.110 One of them is something[br]called the Gladstone Professor 2:24:36.110,2:24:37.360 of Government. 2:24:37.360,2:24:40.370 I can click through and see[br]that it's a chair at Oxford 2:24:40.370,2:24:41.180 University, right? 2:24:41.180,2:24:43.470 So it's a position. 2:24:43.470,2:24:49.520 And another is the William[br]Gladstone school number 18. 2:24:49.520,2:24:51.470 William Gladstone[br]school number 18. 2:24:51.470,2:24:52.900 Where is that? 2:24:52.900,2:24:55.380 That is in Sofia, Bulgaria. 2:24:55.380,2:24:56.470 Again. 2:24:56.470,2:24:59.000 All right, so that's a[br]particular school in Bulgaria 2:24:59.000,2:25:02.720 named after William Gladstone. 2:25:02.720,2:25:07.220 And finally, the third[br]result is, of course, our pal 2:25:07.220,2:25:09.800 Gladstone the Chief Mouser. 2:25:09.800,2:25:12.674 If I click through,[br]that's the cat. 2:25:12.674,2:25:14.090 All right, so that[br]was an example.
2:25:14.090,2:25:15.700 I mean, you saw how easy it was. 2:25:15.700,2:25:18.980 I just named the property and[br]the value that I care about, 2:25:18.980,2:25:21.420 and I get the results. 2:25:21.420,2:25:23.289 Again, I mean, it's[br]kind of a silly example, 2:25:23.289,2:25:24.080 but think about it. 2:25:24.080,2:25:27.570 This is-- how else can[br]you answer that question? 2:25:27.570,2:25:30.470 There's no reference desk,[br]even at a great University 2:25:30.470,2:25:34.250 of Oxford, where you can[br]walk in and say, give me 2:25:34.250,2:25:37.470 a list of things[br]named after Gladstone. 2:25:37.470,2:25:40.590 There's no easy way to[br]answer that unless you happen 2:25:40.590,2:25:44.520 to have a very large[br]structured and linked 2:25:44.520,2:25:48.130 data store, like Wikidata. 2:25:48.130,2:25:50.560 All right, so that[br]was a silly example. 2:25:50.560,2:25:51.280 Let's take some-- 2:25:51.280,2:25:53.113 AUDIENCE: There's a[br]bunch of stuff on there. 2:25:53.113,2:25:54.446 ASAF: Oh, OK. 2:25:54.446,2:25:57.430 AUDIENCE: Can you show[br]easy query on the video? 2:25:57.430,2:26:02.260 And somebody needs to know[br]how to just do property 2:26:02.260,2:26:05.750 exists without giving[br]a specific value. 2:26:05.750,2:26:11.030 And then once you show easy[br]query you reload the page and-- 2:26:11.030,2:26:13.240 ASAF: I don't know easy query. 2:26:13.240,2:26:15.670 So is that a gadget? 2:26:15.670,2:26:17.110 I don't know what easy query is. 2:26:17.110,2:26:19.870 I don't use it. 2:26:19.870,2:26:24.760 So someone can maybe[br]send a link or something? 2:26:24.760,2:26:26.100 Oh it is a gadget. 2:26:26.100,2:26:27.100 I don't have it enabled. 2:26:31.610,2:26:32.480 That is nice. 2:26:32.480,2:26:42.080 So now, what I just did by hand,[br]by formulating the query named 2:26:42.080,2:26:45.200 after Gladstone-- 2:26:45.200,2:26:48.390 I guess this is the-- 2:26:48.390,2:26:48.960 Is it? 2:26:53.000,2:26:53.720 Yeah. 
2:26:53.720,2:26:56.050 So this-- I just[br]clicked the three-- 2:26:56.050,2:26:57.470 the ellipsis here. 2:26:57.470,2:26:58.460 Right after the name. 2:26:58.460,2:26:59.630 You see this? 2:26:59.630,2:27:03.050 This was just added by[br]enabling easy query, 2:27:03.050,2:27:04.640 which I just learned about. 2:27:04.640,2:27:07.640 So you just click this[br]and it auto-magically 2:27:07.640,2:27:09.620 made this kind of trivial query. 2:27:09.620,2:27:12.380 Of course, if I want a more[br]complicated query like, 2:27:12.380,2:27:14.510 I don't know, give me[br]all the things that 2:27:14.510,2:27:18.110 are named after Lincoln[br]but are a school, 2:27:18.110,2:27:21.650 I will still need to kind[br]of edit a custom query. 2:27:21.650,2:27:23.450 But this is a super[br]easy and very nice 2:27:23.450,2:27:28.620 way of just doing a very super[br]quick query for exactly this. 2:27:28.620,2:27:29.120 Right? 2:27:29.120,2:27:33.410 Like. what other items have[br]exactly this property and value 2:27:33.410,2:27:35.720 named after William Gladstone? 2:27:35.720,2:27:38.750 So, thank you to whoever[br]made this suggestion 2:27:38.750,2:27:42.140 to demonstrate that, and[br]I'm glad I learned something 2:27:42.140,2:27:45.230 too today. 2:27:45.230,2:27:48.590 Let's move to[br]another sample query. 2:27:48.590,2:27:50.360 Here's a fun example. 2:27:50.360,2:27:56.910 Popular surnames among[br]fictional characters. 2:27:56.910,2:27:58.650 Think about that for a second. 2:27:58.650,2:28:03.030 Popular surnames among[br]fictional characters. 2:28:03.030,2:28:06.510 So we're asking Wikidata[br]to go through all 2:28:06.510,2:28:10.120 the fictional[br]characters you know, 2:28:10.120,2:28:13.510 and of those look through[br]their surnames, group 2:28:13.510,2:28:15.910 them so that you can count[br]them, the repetitions 2:28:15.910,2:28:18.460 of the surnames,[br]and give me the most 2:28:18.460,2:28:21.550 popular surnames among them. 
2:28:21.550,2:28:26.280 Additionally, I want you to[br]awesomely present the results 2:28:26.280,2:28:28.020 as a bubble chart. 2:28:28.020,2:28:29.220 Oh, yeah. 2:28:29.220,2:28:31.050 Wikidata can do that. 2:28:31.050,2:28:34.420 And I run the query. 2:28:34.420,2:28:36.750 And check it out. 2:28:36.750,2:28:41.130 The most popular names[br]among fictional characters 2:28:41.130,2:28:45.780 that Wikidata knows about are[br]Jones, Smith, Taylor, et cetera. 2:28:45.780,2:28:48.450 I mean, for all we know,[br]the most popular name 2:28:48.450,2:28:50.770 among fictional characters[br]actually in the world 2:28:50.770,2:28:52.350 may be Wu. 2:28:52.350,2:28:54.790 Or something in Chinese,[br]for all we know. 2:28:54.790,2:28:57.930 But if that has not been[br]modeled in Wikidata, 2:28:57.930,2:29:01.020 we're not going to get that. 2:29:01.020,2:29:03.540 So Taylor, Smith,[br]Jones, Williams, 2:29:03.540,2:29:06.870 seem to be the[br]most popular names. 2:29:06.870,2:29:08.400 And again, I could limit this. 2:29:08.400,2:29:11.520 I could make the[br]same query but add, 2:29:11.520,2:29:14.250 only among works whose[br]original language 2:29:14.250,2:29:19.020 was Italian, for example, to get[br]more interesting results if I 2:29:19.020,2:29:21.480 only care about[br]Italian literature. 2:29:21.480,2:29:24.720 But this is an example of[br]how I got awesome bubble 2:29:24.720,2:29:28.170 charts for free, and[br]I can just plug this 2:29:28.170,2:29:30.900 into an awesome[br]presentation that I make. 2:29:30.900,2:29:34.500 Of course I can still[br]look at the raw table. 2:29:34.500,2:29:37.940 So the query still resulted[br]in a bunch of data, right? 2:29:37.940,2:29:42.480 So Smith repeats 41 times,[br]Jones 38 times, Taylor 34 times, 2:29:42.480,2:29:43.750 et cetera, et cetera. 2:29:43.750,2:29:48.960 And down that list.
2:29:48.960,2:29:52.320 And I could, again, I could[br]export this into a file 2:29:52.320,2:29:56.100 and load it up in a spreadsheet,[br]and do additional processing 2:29:56.100,2:29:56.670 on it. 2:29:56.670,2:29:58.560 I can link to it. 2:29:58.560,2:30:02.530 I can do all kinds of[br]awesome things with it. 2:30:02.530,2:30:05.250 So that's another awesome query. 2:30:05.250,2:30:08.460 We don't have to go into[br]every line by line analysis 2:30:08.460,2:30:11.670 here of why this[br]works the way it does. 2:30:11.670,2:30:15.840 I want to show you some[br]other queries first. 2:30:15.840,2:30:22.470 Let's look at-- this is just[br]fun, overall causes of death. 2:30:22.470,2:30:24.870 Again a bubble[br]chart just looking 2:30:24.870,2:30:28.260 at people who died[br]of things, and have 2:30:28.260,2:30:30.760 a cause of death listed. 2:30:30.760,2:30:34.380 And we learn that the most[br]commonly listed cause of death 2:30:34.380,2:30:40.350 is myocardial infarction,[br]pneumonitis, cerebral vascular, 2:30:40.350,2:30:42.620 lung cancer, et[br]cetera, et cetera. 2:30:42.620,2:30:44.850 And again, in a bubble chart. 2:30:44.850,2:30:49.670 And so how does that work? 2:30:49.670,2:30:53.050 So just very briefly, the[br]important parts of this query 2:30:53.050,2:30:59.150 are I'm looking for something,[br]for some person, who 2:30:59.150,2:31:04.240 is instance of 31, instance[br]of Q5, which is human. 2:31:04.240,2:31:05.390 So a human. 2:31:05.390,2:31:07.130 Again, just to kind[br]of limit the query. 2:31:07.130,2:31:11.330 I'm not interested in[br]books or mountains. 2:31:11.330,2:31:14.420 I'm looking for humans[br]who have that same person, 2:31:14.420,2:31:21.150 that same variable PID,[br]should have a 509, meaning-- 2:31:21.150,2:31:22.412 Hello. 2:31:22.412,2:31:24.620 Why don't I have the-- 2:31:24.620,2:31:25.120 Yeah. 2:31:25.120,2:31:28.480 A 509, which is cause of death. 
2:31:28.480,2:31:31.540 And that cause of death[br]is another variable, 2:31:31.540,2:31:32.930 that I'm calling CID. 2:31:32.930,2:31:35.410 Now, previously[br]we were saying you 2:31:35.410,2:31:36.850 know I want things[br]that are named 2:31:36.850,2:31:39.550 after Gladstone specifically. 2:31:39.550,2:31:42.000 Only things that have[br]that particular value. 2:31:42.000,2:31:44.320 Here I'm saying I'm[br]looking for things 2:31:44.320,2:31:47.110 that have some cause of death. 2:31:47.110,2:31:48.760 Not a specific one. 2:31:48.760,2:31:50.260 I just wanted to[br]get everything that 2:31:50.260,2:31:54.880 has a statement with some[br]value about property 509 2:31:54.880,2:31:56.530 cause of death. 2:31:56.530,2:31:57.940 OK? 2:31:57.940,2:32:04.410 And then this other bit of[br]magic here, the group by, 2:32:04.410,2:32:07.870 tells Wikidata I'm not[br]actually interested 2:32:07.870,2:32:09.100 in every individual thing. 2:32:09.100,2:32:12.310 I want you to group those[br]causes, and then count them 2:32:12.310,2:32:14.230 and give me the top ones. 2:32:14.230,2:32:15.523 So that's how this query works. 2:32:20.550,2:32:22.320 Here's that query I promised. 2:32:22.320,2:32:26.460 Painters whose fathers[br]were also painters. 2:32:26.460,2:32:28.630 I can only think of a couple. 2:32:28.630,2:32:31.890 I mean, Monet and Vogel. 2:32:31.890,2:32:34.800 But I'm sure Wikidata[br]knows many more. 2:32:34.800,2:32:38.620 So let's run this query. 2:32:38.620,2:32:40.270 And I have 100 results. 2:32:40.270,2:32:43.120 By the way, I have limited[br]it to 100 results just 2:32:43.120,2:32:44.650 to keep it kind of snappy. 2:32:44.650,2:32:47.530 But actually, we could[br]maybe try removing the limit 2:32:47.530,2:32:50.170 and see if Wikidata[br]could tell us 2:32:50.170,2:32:53.890 the total number in Wikidata. 2:32:53.890,2:32:55.120 Yeah, that wasn't too bad. 2:32:55.120,2:32:58.400 So 1,270 results. 2:32:58.400,2:32:59.140 OK. 
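The cause-of-death query walked through above, with one variable for the person, one for the cause, and a group-and-count step, can be sketched as follows. The numbers P31, Q5 and P509 and the variable names `pid` and `cid` come from the talk; the helper `count_by_property`, the exact `COUNT` and `ORDER BY` wording, and the optional limit are assumptions. Labels and the bubble-chart display are left out to keep the sketch short.

```python
def count_by_property(class_item, prop, limit=None):
    """Build SPARQL counting how often each value of `prop` occurs
    among items that are an instance of `class_item`."""
    query = (
        "SELECT ?cid (COUNT(?pid) AS ?count) WHERE {\n"
        f"  ?pid wdt:P31 wd:{class_item} .\n"  # limit to one kind of item
        f"  ?pid wdt:{prop} ?cid .\n"          # any value at all, held in ?cid
        "}\n"
        "GROUP BY ?cid\n"                      # one result row per distinct value
        "ORDER BY DESC(?count)"                # most frequent first
    )
    if limit is not None:
        query += f"\nLIMIT {limit}"
    return query


# Humans (Q5) grouped by cause of death (P509), top 20 causes.
print(count_by_property("Q5", "P509", limit=20))
```

Using a variable (`?cid`) where the earlier queries had a fixed item is what turns "named after Gladstone specifically" into "has some cause of death, whatever it is".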
2:32:59.140,2:33:04.150 Wikidata, already at this[br]early date and it's progress, 2:33:04.150,2:33:07.540 already knows about[br]more than 1,200 painters 2:33:07.540,2:33:10.980 who are sons of painters. 2:33:10.980,2:33:16.140 Sons of male painters, like[br]their father is a painter. 2:33:16.140,2:33:18.120 There may be[br]additional painters who 2:33:18.120,2:33:21.390 are sons of female painters[br]not included in this query. 2:33:21.390,2:33:24.990 Again, always remember what[br]exactly you are asking. 2:33:24.990,2:33:27.840 In this query I was[br]asking about the father. 2:33:27.840,2:33:30.330 I'm leaving out any[br]possible painters who 2:33:30.330,2:33:32.720 are sons of mother painters. 2:33:32.720,2:33:33.390 OK? 2:33:33.390,2:33:35.250 So how does this work? 2:33:35.250,2:33:39.630 I'm asking for the painter[br]along with the human label, 2:33:39.630,2:33:42.630 and the father along[br]with the human label. 2:33:42.630,2:33:47.610 So Michel Monet is the[br]son of Claude Monet. 2:33:47.610,2:33:54.180 And Domenico Tintoretto is the[br]son of the famous Tintoretto 2:33:54.180,2:33:57.210 whose label, you know, is just[br]Tintoretto like Michelangelo. 2:33:57.210,2:33:59.960 You know, you don't always[br]have to have the full name 2:33:59.960,2:34:02.420 in the common label. 2:34:02.420,2:34:07.010 Paloma Picasso is the[br]daughter of Pablo Picasso. 2:34:07.010,2:34:07.510 OK. 2:34:07.510,2:34:11.040 So Wikidata knows about[br]all these results. 2:34:11.040,2:34:14.610 Of course Holbein the Younger[br]son of Holbein the Elder. 2:34:14.610,2:34:15.760 And how did we get there? 2:34:15.760,2:34:20.860 Well we asked Wikidata[br]to look for something, 2:34:20.860,2:34:26.820 let's call it painter, which[br]has 106, which is occupation, 2:34:26.820,2:34:31.100 with a value painter. 2:34:31.100,2:34:31.600 Right? 2:34:31.600,2:34:35.310 This unwieldy number[br]1028181, that's painter. 
2:34:35.310,2:34:40.250 So I'm asking for any item[br]that has occupation painter. 2:34:40.250,2:34:43.300 And let's call[br]that item painter. 2:34:43.300,2:34:49.770 I also want that painter to have[br]a property 22, which is father. 2:34:49.770,2:34:50.850 OK. 2:34:50.850,2:34:52.350 Father. 2:34:52.350,2:34:55.140 And I want it to[br]have some value. 2:34:55.140,2:34:58.770 OK, I'm putting it into[br]another variable called father. 2:34:58.770,2:35:01.320 I could have called[br]it, you know, frog. 2:35:01.320,2:35:04.230 That doesn't change[br]anything, just to be clear. 2:35:04.230,2:35:06.630 What matters is that this[br]is the property father. 2:35:06.630,2:35:10.320 I could have called[br]it anything I want. 2:35:10.320,2:35:13.590 So, and then, I have[br]a third condition. 2:35:13.590,2:35:18.010 That the father, like whatever[br]it says here in property 22, 2:35:18.010,2:35:22.590 I want that father to have[br]himself a property 106 2:35:22.590,2:35:27.750 occupation with a value painter. 2:35:27.750,2:35:28.730 OK? 2:35:28.730,2:35:30.800 These conditions[br]combined to give me 2:35:30.800,2:35:36.080 a list of people who have[br]a father and that father 2:35:36.080,2:35:37.850 has occupation painter as well. 2:35:37.850,2:35:40.550 Of course, if I suddenly,[br]or if you suddenly, 2:35:40.550,2:35:44.480 are consumed by[br]curiosity to know 2:35:44.480,2:35:51.344 who are some politicians[br]who are sons of carpenters? 2:35:51.344,2:35:52.760 You could just[br]change that, right? 2:35:52.760,2:35:56.700 Change the first value[br]from painter to politician. 2:35:56.700,2:36:02.624 Change the third line's value[br]from painter to carpenter. 2:36:02.624,2:36:04.040 Maybe that list[br]will be very short 2:36:04.040,2:36:06.680 because carpenters don't[br]tend to be notable, 2:36:06.680,2:36:08.910 so they wouldn't be[br]represented on Wikidata. 2:36:08.910,2:36:11.990 That's why this works relatively[br]well with painters, right? 
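The three conditions just described can be written out as a short sketch. The property numbers (P106 for occupation, P22 for father), the painter item (Q1028181), the variable names and the limit of 100 all come from the talk; the helper name `offspring_query` and the wording of the label line are assumptions.

```python
def offspring_query(child_occupation, father_occupation, language="en", limit=100):
    """Build SPARQL for people with one occupation whose father has another."""
    label_service = (
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "'
        + language
        + '" . }\n'
    )
    return (
        "SELECT ?painter ?painterLabel ?father ?fatherLabel WHERE {\n"
        f"  ?painter wdt:P106 wd:{child_occupation} .\n"  # 1: occupation of the item
        "  ?painter wdt:P22 ?father .\n"                  # 2: it has some father
        f"  ?father wdt:P106 wd:{father_occupation} .\n"  # 3: the father's occupation
        + label_service
        + "}\n"
        + f"LIMIT {limit}"
    )


PAINTER = "Q1028181"  # the "unwieldy number" for painter quoted in the talk

print(offspring_query(PAINTER, PAINTER))
```

Swapping the two occupation arguments for other items gives the politician and carpenter variation; as noted above, the variable names themselves (`?painter`, `?father`) carry no meaning and could be anything.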
2:36:11.990,2:36:14.420 Because most of[br]them are notable. 2:36:14.420,2:36:16.370 But generally you[br]could do that, right? 2:36:16.370,2:36:18.500 That's an example of[br]how you can take a query 2:36:18.500,2:36:22.340 and just replace one of those[br]values, or even the language. 2:36:22.340,2:36:26.840 So again, I could ask[br]for these same painters. 2:36:26.840,2:36:27.650 It's limited again. 2:36:27.650,2:36:31.190 These same painters,[br]but with Arabic labels. 2:36:31.190,2:36:34.880 Same query, but I have Arabic[br]labels for these painters. 2:36:34.880,2:36:37.250 And of course where[br]there is no Arabic label 2:36:37.250,2:36:40.360 I get the Q number. 2:36:40.360,2:36:40.860 OK? 2:36:40.860,2:36:43.650 So that's that query[br]that I promised you, 2:36:43.650,2:36:47.670 painters who are sons of painters,[br]which can be done by Wikidata 2:36:47.670,2:36:49.830 in under one second. 2:36:49.830,2:36:51.480 How awesome is that? 2:36:51.480,2:36:52.950 We can also get some statistics. 2:36:52.950,2:36:55.920 So how about counting[br]total articles 2:36:55.920,2:36:59.740 in a given wiki by gender. 2:36:59.740,2:37:02.070 This is what we call[br]the content gender 2:37:02.070,2:37:06.900 gap, as distinct from the[br]participation gender gap. 2:37:06.900,2:37:10.276 This is the gender gap in[br]what we cover on Wikipedia. 2:37:10.276,2:37:11.400 So let's take one of these. 2:37:16.380,2:37:17.630 So this is a query. 2:37:17.630,2:37:23.130 Articles about women in[br]some given Wikipedia. 2:37:23.130,2:37:23.660 All right. 2:37:23.660,2:37:25.799 So let's take-- 2:37:25.799,2:37:26.340 I don't know. 2:37:26.340,2:37:30.240 Let's take the Tamil Wikipedia. 2:37:30.240,2:37:32.460 That's language code TA. 2:37:32.460,2:37:34.950 So I just put TA here. 2:37:34.950,2:37:38.850 And I click Run, and[br]I get this count. 2:37:38.850,2:37:39.960 That's all I wanted.
2:37:39.960,2:37:41.720 I'm not actually[br]interested in the items, 2:37:41.720,2:37:44.962 like in the list of women[br]on the Tamil Wikipedia. 2:37:44.962,2:37:45.920 I just want the number. 2:37:45.920,2:37:48.510 So I selected the count here. 2:37:48.510,2:37:52.610 And this number[br]turns out to be 2,159. 2:37:52.610,2:37:57.300 So there are about 2,000[br]articles about women 2:37:57.300,2:38:02.350 on the Tamil Wikipedia that[br]Wikidata knows to be female. 2:38:02.350,2:38:02.850 Right? 2:38:02.850,2:38:05.730 I'm asking about the gender[br]field, property 21 again. 2:38:05.730,2:38:08.900 Remember, if there's some[br]article about a woman in Tamil 2:38:08.900,2:38:12.090 Wikipedia, but Wikidata[br]doesn't have 2:38:12.090,2:38:14.460 a statement about the[br]gender, that person 2:38:14.460,2:38:15.640 will not be counted here. 2:38:15.640,2:38:18.240 So again, be careful[br]about kind of stating 2:38:18.240,2:38:22.800 that is exactly the number[br]of women articles on Tamil 2:38:22.800,2:38:23.340 Wikipedia. 2:38:23.340,2:38:24.600 That's probably not true. 2:38:24.600,2:38:27.560 I'm sure some of those[br]articles are missing 2:38:27.560,2:38:30.740 a sex or gender property. 2:38:30.740,2:38:33.150 But for raw statistics,[br]that's probably good, 2:38:33.150,2:38:35.700 because some men are also[br]missing the sex or gender 2:38:35.700,2:38:37.620 property. 2:38:37.620,2:38:41.820 So we could take the[br]same query for men. 2:38:41.820,2:38:43.170 It's essentially the exact same. 2:38:43.170,2:38:48.840 It just has this unwieldy[br]number for males, 6581097. 2:38:48.840,2:38:52.710 I can change this language[br]code again to TA for Tamil. 2:38:52.710,2:38:58.880 And how many men are covered[br]on Tamil Wikipedia? 14,649. 2:38:58.880,2:38:59.610 OK. 2:38:59.610,2:39:06.880 So women, 2,100, men,[br]about seven times as many. 2:39:06.880,2:39:07.380 Right?
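A counting query of this kind, together with the arithmetic behind "about seven times as many", might look like the sketch below. The counts (2,159 and 14,649), property 21 and the item number for male (Q6581097) are quoted in the talk. Everything else is an assumption made for illustration: the item number for female (Q6581072), the restriction to humans (P31, Q5), the `schema:about` and `schema:isPartOf` lines that tie an item to an article on one Wikipedia, and the helper name `article_count_query`.

```python
def article_count_query(wiki_language, gender_item):
    """Build SPARQL counting people of one gender that have an article
    on the Wikipedia with the given language code."""
    return (
        "SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {\n"
        "  ?person wdt:P31 wd:Q5 .\n"                # assumed: humans only
        f"  ?person wdt:P21 wd:{gender_item} .\n"    # sex or gender
        "  ?article schema:about ?person .\n"        # assumed: article about the item
        f"  ?article schema:isPartOf <https://{wiki_language}.wikipedia.org/> .\n"
        "}"
    )


MALE = "Q6581097"    # quoted in the talk
FEMALE = "Q6581072"  # assumption: not stated in the talk

print(article_count_query("ta", FEMALE))
print(article_count_query("ta", MALE))

# The two counts reported in the talk for Tamil Wikipedia.
women, men = 2159, 14649
ratio = men / women
print(f"men per woman: {ratio:.1f}")
```

Both counts leave out people with no sex-or-gender statement, so the ratio is an estimate of the content gender gap rather than an exact figure.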
2:39:07.380,2:39:12.300 So that's the approximate[br]size of the content gender 2:39:12.300,2:39:14.610 gap on Tamil Wikipedia. 2:39:14.610,2:39:18.850 And again, I can complicate[br]this query as much as I want. 2:39:18.850,2:39:21.390 For example, I can[br]try and find out 2:39:21.390,2:39:30.390 if this gender gap is wider[br]or narrower among musicians, 2:39:30.390,2:39:31.350 just as an example. 2:39:31.350,2:39:35.850 I could just add a line here[br]that says occupation musician, 2:39:35.850,2:39:37.890 and then I'm only[br]counting articles 2:39:37.890,2:39:41.190 on Tamil Wikipedia about[br]musicians who are female 2:39:41.190,2:39:43.190 versus articles[br]on Tamil Wikipedia 2:39:43.190,2:39:45.030 about musicians who are male. 2:39:45.030,2:39:47.890 And I can kind of[br]compare the gender-- 2:39:47.890,2:39:53.820 the content gender gap across[br]occupations on Tamil Wikipedia. 2:39:53.820,2:39:56.030 Do you see the[br]important point here? 2:39:56.030,2:39:58.490 Is that this is not just[br]kind of a one purpose query. 2:39:58.490,2:40:01.250 I can just with a single[br]additional conditional suddenly 2:40:01.250,2:40:04.370 make it a much more interesting[br]query, because I break it down 2:40:04.370,2:40:05.540 by occupation. 2:40:05.540,2:40:07.810 Or I break it down by century. 2:40:07.810,2:40:12.530 Do we have more of the coverage[br]gap in 19th century people 2:40:12.530,2:40:13.940 than in 21st century people? 2:40:13.940,2:40:15.560 I mean, I sure hope so, right? 2:40:15.560,2:40:18.480 The patriarchy is[br]weakening somewhat. 2:40:18.480,2:40:21.830 So I wouldn't be surprised if[br]there are many more notable men 2:40:21.830,2:40:23.430 covered about the 19th century. 2:40:23.430,2:40:25.784 But if we are also covering-- 2:40:25.784,2:40:27.200 I mean it's the[br]gender gap is just 2:40:27.200,2:40:29.540 as wide for 21st century[br]people, that would 2:40:29.540,2:40:30.800 be a little disappointing. 
2:40:30.800,2:40:35.870 Again, that's something I[br]can fairly easily find out 2:40:35.870,2:40:38.980 on Wikidata query. 2:40:38.980,2:40:41.500 Any questions so far, or[br]are you just sharing links? 2:40:41.500,2:40:43.160 AUDIENCE: Yep, there is one. 2:40:43.160,2:40:47.480 So somebody is wondering if you[br]can demonstrate, or at least 2:40:47.480,2:40:50.420 give a short answer to the[br]latter part of this question. 2:40:50.420,2:40:52.530 Is it possible using[br]Wikidata SPARQL 2:40:52.530,2:40:55.520 to find specific[br]Wikipedia articles, e.g. 2:40:55.520,2:40:59.060 featured articles, of a[br]certain language which do not 2:40:59.060,2:41:01.160 exist in another language? 2:41:01.160,2:41:03.770 I know it is possible[br]to find category based 2:41:03.770,2:41:05.820 results using the PetScan tool. 2:41:05.820,2:41:09.110 But can we specify[br]that by selecting e.g. 2:41:09.110,2:41:10.055 featured articles? 2:41:10.055,2:41:11.390 ASAF BARTOV: Yes. 2:41:11.390,2:41:12.600 Excellent question. 2:41:12.600,2:41:14.120 It is possible, indeed. 2:41:14.120,2:41:17.570 And I will demonstrate[br]one such query. 2:41:17.570,2:41:19.190 Another query that[br]I already mentioned 2:41:19.190,2:41:24.840 largest cities in the[br]world with a female mayor. 2:41:24.840,2:41:29.190 This query-- let's[br]close some of these tabs 2:41:29.190,2:41:30.315 before my browser chokes. 2:41:33.600,2:41:36.840 So this query lists[br]the major world cities 2:41:36.840,2:41:39.120 run by women currently. 2:41:39.120,2:41:45.650 And the answer is Mumbai, Mexico[br]City, Tokyo, bunch of others. 2:41:49.470,2:41:52.371 And wait-- that's not it at all. 2:41:52.371,2:41:53.370 I clicked the wrong one. 2:41:53.370,2:41:55.050 That's the map of paintings. 2:41:55.050,2:41:55.800 OK. 2:41:55.800,2:41:57.370 Let's demonstrate[br]that for a second. 2:41:57.370,2:41:59.520 So this is the map[br]of all paintings 2:41:59.520,2:42:03.870 for which we know a location[br]with the count per location.
2:42:03.870,2:42:07.770 And the results are[br]awesomely presented on a map. 2:42:07.770,2:42:08.830 OK. 2:42:08.830,2:42:12.420 Again, under the hood this is[br]a table, of course, of results. 2:42:12.420,2:42:15.660 But, awesomely, I can[br]browse it as a map. 2:42:15.660,2:42:20.320 So here is a map of the[br]world with all the paintings 2:42:20.320,2:42:22.060 that Wikidata knows about. 2:42:22.060,2:42:23.920 Not just knows[br]about the paintings, 2:42:23.920,2:42:28.180 but knows about their[br]location in a museum. 2:42:28.180,2:42:30.670 Not surprisingly[br]Europe is much better 2:42:30.670,2:42:35.540 covered than Russia or Africa. 2:42:35.540,2:42:40.150 There is a huge gap in[br]contribution to Wikidata 2:42:40.150,2:42:41.740 from these countries. 2:42:41.740,2:42:43.780 And some of it can be fixed. 2:42:43.780,2:42:47.740 And of course there is much more[br]documentation, and much more 2:42:47.740,2:42:50.260 art in Europe. 2:42:50.260,2:42:54.280 But if we zoom in, I[br]don't know, Rome probably 2:42:54.280,2:42:55.900 has a few paintings. 2:42:55.900,2:42:56.400 Right? 2:43:00.080,2:43:02.288 Hello. 2:43:02.288,2:43:04.200 Sorry. 2:43:04.200,2:43:09.780 It's-- Yes. 2:43:09.780,2:43:13.290 Vatican City sounds[br]like a good bet, right? 2:43:13.290,2:43:14.290 I can zoom in here. 2:43:14.290,2:43:16.290 And I can just click[br]one of these dots 2:43:16.290,2:43:21.400 and see in this point[br]there are two paintings. 2:43:21.400,2:43:25.270 And in this one there is one[br]and it's the Archbasilica 2:43:25.270,2:43:27.460 of St. John Lateran. 2:43:27.460,2:43:31.060 Let's see, this is the[br]actual St. Peter, right? 2:43:31.060,2:43:33.650 Sistine Chapel has 23 paintings. 2:43:33.650,2:43:34.330 What? 2:43:34.330,2:43:36.670 The Sistine Chapel has way[br]more than 23 paintings. 2:43:36.670,2:43:40.330 Correct, but 23 of them[br]are documented on Wikidata. 
2:43:40.330,2:43:43.330 They have their own item[br]for the painting, not 2:43:43.330,2:43:45.280 the Sistine Chapel;[br]the painting has 2:43:45.280,2:43:49.540 an item that lists its[br]being in the Sistine Chapel. 2:43:49.540,2:43:50.950 There are 23 of those. 2:43:50.950,2:43:52.270 OK. 2:43:52.270,2:43:54.310 There is definitely[br]room to document 2:43:54.310,2:43:57.040 the rest of the artworks[br]in the Sistine Chapel. 2:43:57.040,2:43:59.740 So, again, this is just[br]not the kind of query 2:43:59.740,2:44:03.330 you were able to[br]make before Wikidata, 2:44:03.330,2:44:07.750 and it's a fairly simple[br]query, as you can see. 2:44:07.750,2:44:13.020 There are examples using[br]maps, like airports within 100 2:44:13.020,2:44:15.040 kilometers of Berlin. 2:44:15.040,2:44:18.310 Again, using the coordinates[br]as a useful data point. 2:44:18.310,2:44:21.880 And here is a map showing me[br]only airports within a 100 2:44:21.880,2:44:25.990 kilometer radius from Berlin. 2:44:25.990,2:44:29.140 But I wanted to show[br]you the mayors query. 2:44:29.140,2:44:34.510 Let's click the-- oh, I just[br]have the wrong link here. 2:44:34.510,2:44:41.040 But I can still find it[br]here by typing mayor. 2:44:41.040,2:44:44.590 Here we go, largest[br]cities with female mayor. 2:44:44.590,2:44:47.230 So this is a slightly[br]more complicated query. 2:44:47.230,2:44:53.010 But if I run it, I get the top[br]10, because I set limit to 10. 2:44:53.010,2:44:54.820 I get the top 10[br]cities in the world, 2:44:54.820,2:44:59.710 by population size, that[br]are currently run by women. 2:44:59.710,2:45:03.490 Tokyo, Mumbai, Yokohama,[br]Caracas, et cetera. 2:45:03.490,2:45:08.080 And one interesting thing that[br]you may want to notice here 2:45:08.080,2:45:10.690 is that I'm asking for cities. 2:45:10.690,2:45:13.660 I mean items that[br]are an instance of city.
2:45:13.660,2:45:16.420 And that have a[br]head of government, 2:45:16.420,2:45:18.640 that have some[br]statement about who 2:45:18.640,2:45:28.440 is in charge, and that statement[br]has sex that's listed up here 2:45:28.440,2:45:29.886 as female. 2:45:29.886,2:45:31.510 Don't worry about[br]the syntax right now. 2:45:31.510,2:45:34.590 I just want to show you[br]some specific angle here. 2:45:34.590,2:45:37.920 And I'm further[br]filtering these results. 2:45:37.920,2:45:45.400 I only want those items where[br]that statement does not have 2:45:45.400,2:45:48.630 the qualifier, end time. 2:45:48.630,2:45:50.390 Why is that important? 2:45:50.390,2:45:56.530 Because if a city once[br]had a female mayor, 2:45:56.530,2:45:59.890 but that mayor is not the mayor[br]anymore, because mayors change, 2:45:59.890,2:46:01.600 I don't want them in this query. 2:46:01.600,2:46:04.990 I want a query of[br]cities currently having 2:46:04.990,2:46:05.680 a female mayor. 2:46:05.680,2:46:07.990 And of course Wikidata[br]may have historical data 2:46:07.990,2:46:09.880 with start and[br]end time, as we've 2:46:09.880,2:46:14.530 seen, that documents this[br]person was the mayor of Tokyo 2:46:14.530,2:46:17.170 or San Francisco[br]between these years. 2:46:17.170,2:46:18.820 But if there is no[br]end time, that means 2:46:18.820,2:46:21.520 they are currently the mayor. 2:46:21.520,2:46:24.490 So that's an example of[br]asking about a qualifier 2:46:24.490,2:46:28.180 of a statement, again, to get[br]the results we actually want. 2:46:28.180,2:46:31.630 If we want current mayors, it's[br]important to put this filter. 2:46:31.630,2:46:35.365 If we don't, we will get[br]historical female mayors 2:46:35.365,2:46:35.865 as well. 2:46:39.920,2:46:40.490 All right. 2:46:40.490,2:46:45.380 So these are some[br]example queries. 2:46:45.380,2:46:49.085 Questions about that? 2:46:51.620,2:46:53.030 Oh, the featured[br]article example. 2:46:58.280,2:47:01.700 So let's look at that.
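Editor's sketch: the mayors query described above, with its "no end time qualifier" filter, could look roughly like the following. The query text is not shown in the talk, so this is a reconstruction under assumptions: it uses the Wikidata identifiers P31 (instance of), Q515 (city), P6 (head of government), P21 (sex or gender), Q6581072 (female), P582 (end time) and P1082 (population), and the function names `female_mayor_query` and `request_url` are hypothetical helpers introduced here for illustration.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def female_mayor_query(limit: int = 10) -> str:
    """Build SPARQL for the largest cities whose current head of government is female.

    Hypothetical helper: a reconstruction of the kind of query shown in the talk,
    not the speaker's exact text.
    """
    return f"""
SELECT DISTINCT ?city ?cityLabel ?mayor ?mayorLabel ?population WHERE {{
  ?city wdt:P31/wdt:P279* wd:Q515 .   # instance of city (or of a subclass of city)
  ?city p:P6 ?statement .             # a head-of-government statement on the city
  ?statement ps:P6 ?mayor .           # the person named in that statement
  ?mayor wdt:P21 wd:Q6581072 .        # that person's sex or gender is female
  FILTER NOT EXISTS {{ ?statement pq:P582 ?end }}  # statement has no end-time qualifier
  ?city wdt:P1082 ?population .       # population, used for ranking
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY DESC(?population)
LIMIT {limit}
"""


def request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    # Printing the URL avoids any network access; paste it into a browser to run it.
    print(request_url(female_mayor_query(10)))
```

Removing the `FILTER NOT EXISTS` line is what would bring back historical female mayors, which is the point the speaker makes about qualifiers.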
2:47:07.050,2:47:12.660 So I have prepared[br]such a query recently. 2:47:12.660,2:47:15.300 Here we go. 2:47:15.300,2:47:18.570 So this is a query. 2:47:18.570,2:47:20.472 I just saved it here[br]on my user page. 2:47:20.472,2:47:21.930 I mean, this is[br]not Wikidata query. 2:47:21.930,2:47:25.390 This is just a meta page[br]containing the query usefully. 2:47:28.260,2:47:33.800 And let's run this. 2:47:33.800,2:47:38.030 So this query, it's actually[br]not very complicated. 2:47:38.030,2:47:40.030 It just has a long[br]list of countries, 2:47:40.030,2:47:42.170 because I'm asking[br]about African countries. 2:47:42.170,2:47:42.670 OK. 2:47:42.670,2:47:45.010 I'm looking for human[br]females from one 2:47:45.010,2:47:51.060 of these countries that[br]have an article in English. 2:47:51.060,2:47:53.010 That's what this line means. 2:47:53.010,2:47:55.620 But not in French. 2:47:55.620,2:47:57.570 That's what this part means. 2:47:57.570,2:47:59.170 OK. 2:47:59.170,2:48:01.720 This part, these[br]two lines together. 2:48:01.720,2:48:03.190 But not in French. 2:48:03.190,2:48:05.920 And this is what's[br]called a badge. 2:48:05.920,2:48:09.430 That's Wikidata's concept of[br]good and featured articles. 2:48:09.430,2:48:10.600 It's called a badge. 2:48:10.600,2:48:16.500 So I want them to have some[br]badge on English Wikipedia. 2:48:16.500,2:48:17.000 OK? 2:48:17.000,2:48:22.250 So again, this query is[br]asking for the top 100 women 2:48:22.250,2:48:26.150 from Africa who are documented[br]on English Wikipedia, 2:48:26.150,2:48:28.730 in a featured or[br]good article status. 2:48:28.730,2:48:30.660 But not on French Wikipedia. 2:48:30.660,2:48:33.270 So this is a query that's[br]a to-do query, right? 2:48:33.270,2:48:35.630 That's a query[br]for French editors 2:48:35.630,2:48:40.100 to consider what they might[br]usefully translate or create 2:48:40.100,2:48:41.180 in French. 2:48:41.180,2:48:48.860 And if we run this, we see[br]we have three results.
2:48:48.860,2:48:50.720 I mean, we have many[br]women from Africa 2:48:50.720,2:48:52.460 covered on English Wikipedia. 2:48:52.460,2:48:57.500 But only three articles[br]have featured or good status 2:48:57.500,2:49:03.460 among those that do not have[br]French Wikipedia coverage. 2:49:03.460,2:49:04.900 Let me rephrase that. 2:49:04.900,2:49:07.990 Among the English Wikipedia[br]articles about African women 2:49:07.990,2:49:11.170 that don't have a[br]French counterpart, 2:49:11.170,2:49:14.520 only three are featured or good. 2:49:14.520,2:49:16.960 OK? 2:49:16.960,2:49:17.640 Do you see this? 2:49:17.640,2:49:19.720 The badge is good article. 2:49:19.720,2:49:23.550 This little incantation[br]here is what allows 2:49:23.550,2:49:25.950 you to ask about the badge. 2:49:25.950,2:49:28.730 This here. 2:49:28.730,2:49:33.420 And, by the way, the slides[br]will be uploaded to Commons. 2:49:33.420,2:49:38.708 And we will-- how shall we make[br]it available on the YouTube 2:49:38.708,2:49:39.710 thing as well? 2:49:42.730,2:49:43.230 No, no. 2:49:43.230,2:49:45.870 But, I mean, for people who[br]will later watch this video. 2:49:52.119,2:49:54.160 Oh yeah, we can add it to[br]the YouTube description 2:49:54.160,2:49:55.368 and the Commons description. 2:49:55.368,2:49:58.090 So in the-- if you're[br]watching this video later, 2:49:58.090,2:50:00.820 in the description, we will[br]add a link to this query 2:50:00.820,2:50:01.480 specifically. 2:50:01.480,2:50:03.340 Because it's not in[br]the slides right now. 2:50:03.340,2:50:03.910 It will be. 2:50:06.622,2:50:07.980 OK. 2:50:07.980,2:50:10.260 So. 2:50:10.260,2:50:13.590 Questions so far? 2:50:13.590,2:50:14.700 We're almost done. 2:50:14.700,2:50:16.260 We have a few minutes left. 2:50:16.260,2:50:18.090 So questions about queries? 2:50:18.090,2:50:20.130 I mean, I'm sure[br]there's tons of things 2:50:20.130,2:50:21.510 you don't know how to do yet.
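Editor's sketch: the "badged in English, missing in French" to-do query described above is only shown on screen in the talk, so the version below is a guess at its shape rather than the speaker's saved query. It assumes "from one of these countries" means country of citizenship (P27), uses `wikibase:badge` as the "incantation" for asking about badges, and passes the country list in as a parameter. The function name `untranslated_badged_query` and the three sample country IDs are illustrative assumptions; check the Q numbers on Wikidata before relying on them.

```python
from typing import Iterable


def untranslated_badged_query(
    countries: Iterable[str],
    source: str = "en",
    target: str = "fr",
    limit: int = 100,
) -> str:
    """Build SPARQL for women with a badged article on one Wikipedia and none on another.

    Hypothetical helper; `countries` is a list of Wikidata country Q-ids.
    """
    values = " ".join(f"wd:{qid}" for qid in countries)
    return f"""
SELECT DISTINCT ?item ?itemLabel ?badgeLabel WHERE {{
  VALUES ?country {{ {values} }}
  ?item wdt:P31 wd:Q5 ;             # instance of: human
        wdt:P21 wd:Q6581072 ;       # sex or gender: female
        wdt:P27 ?country .          # country of citizenship (assumed property)
  ?article schema:about ?item ;
           schema:isPartOf <https://{source}.wikipedia.org/> ;
           wikibase:badge ?badge .  # the article carries some badge (good, featured, ...)
  FILTER NOT EXISTS {{              # ... and no sitelink exists on the target wiki
    ?other schema:about ?item ;
           schema:isPartOf <https://{target}.wikipedia.org/> .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
"""


if __name__ == "__main__":
    # Sample IDs believed to be Nigeria, Kenya and Ghana; illustrative only.
    print(untranslated_badged_query(["Q1033", "Q114", "Q117"]))
```

Swapping `source` and `target` gives the reverse to-do list, for example badged French articles that English Wikipedia lacks.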
2:50:21.510,2:50:24.720 And maybe you didn't really[br]get the sense for SPARQL. 2:50:24.720,2:50:27.120 It's something you need[br]to really do on your own 2:50:27.120,2:50:28.290 on your computer. 2:50:28.290,2:50:29.465 See how it works. 2:50:29.465,2:50:30.090 Fiddle with it. 2:50:30.090,2:50:30.900 Change something. 2:50:30.900,2:50:33.270 See that it breaks[br]and complains. 2:50:33.270,2:50:37.470 But, very importantly-- oh, I[br]had this in the other questions 2:50:37.470,2:50:38.340 slide. 2:50:38.340,2:50:42.480 Remember Wikidata project chat. 2:50:42.480,2:50:45.810 That's kind of the Wikidata[br]equivalent of the village pump. 2:50:45.810,2:50:47.790 It's the page on Wikidata[br]where you can just 2:50:47.790,2:50:49.830 show up and ask a question. 2:50:49.830,2:50:52.290 In my experience, the[br]Wikidata community 2:50:52.290,2:50:55.410 is very nice, very[br]welcoming, and very eager 2:50:55.410,2:51:00.100 to help newer people integrate[br]and learn how to do things. 2:51:00.100,2:51:01.800 There's also an IRC channel. 2:51:01.800,2:51:04.260 If you know what IRC is and[br]how to use it, by all means, 2:51:04.260,2:51:07.890 go to IRC channel Wikidata. 2:51:07.890,2:51:09.330 There's people[br]there all the time, 2:51:09.330,2:51:11.040 and you can just ask a question. 2:51:11.040,2:51:13.245 If you're trying to do a[br]query, and you don't quite 2:51:13.245,2:51:15.870 understand the syntax, or you're[br]not sure how to get the result 2:51:15.870,2:51:16.680 you want, 2:51:16.680,2:51:20.050 there are people there who[br]will gladly help you do that. 2:51:20.050,2:51:22.560 There is also a[br]Wikidata newsletter 2:51:22.560,2:51:25.680 published by the Wikidata team,[br]which is centered in Germany 2:51:25.680,2:51:27.330 at Wikimedia Germany. 2:51:27.330,2:51:31.890 And they send out a newsletter[br]in English with Wikidata news. 2:51:31.890,2:51:33.570 You know, new[br]properties, new items, 2:51:33.570,2:51:34.920 new things in the project.
2:51:34.920,2:51:36.840 But also sample queries. 2:51:36.840,2:51:39.300 So once a week there is[br]kind of an awesome query 2:51:39.300,2:51:43.440 to learn from, if you want[br]to learn that way instead 2:51:43.440,2:51:46.230 of reading like a[br]whole manual on SPARQL. 2:51:46.230,2:51:48.300 So I'm just encouraging[br]you to get help 2:51:48.300,2:51:49.470 in one of those channels. 2:51:49.470,2:51:51.000 Of course you can write to me. 2:51:51.000,2:51:55.920 Just reach out to me and[br]ask me questions as well. 2:51:55.920,2:51:58.860 I hope by now you agree[br]that Wikidata is love, 2:51:58.860,2:52:03.150 and Wikidata data is awesome. 2:52:03.150,2:52:06.480 If there are no questions,[br]we do have a tiny bit of time 2:52:06.480,2:52:11.510 to demonstrate one[br]more tool, but that's-- 2:52:11.510,2:52:12.010 no? 2:52:12.010,2:52:13.170 No questions. 2:52:13.170,2:52:17.600 OK, so let's talk about-- 2:52:17.600,2:52:19.100 well, the Reasonator[br]is kind of nice, 2:52:19.100,2:52:22.890 but it's a little like[br]the article placeholder. 2:52:22.890,2:52:25.530 So this is not Wikidata;[br]this is a tool, again, 2:52:25.530,2:52:26.805 built by Magnus Manske-- 2:52:26.805,2:52:29.310 AUDIENCE: There's also one[br]final question to you in case-- 2:52:29.310,2:52:29.820 ASAF BARTOV: Oh,[br]there is a question. 2:52:29.820,2:52:30.390 AUDIENCE: Yeah. 2:52:30.390,2:52:32.348 ASAF BARTOV: Which[br]advantages and disadvantages 2:52:32.348,2:52:35.370 to create an item[br]before an article is 2:52:35.370,2:52:37.920 done on English Wikipedia? 2:52:37.920,2:52:42.340 Well, I mean, this example[br]that I just made, right? 2:52:42.340,2:52:46.960 I'm reading this book[br]by a notable author. 2:52:46.960,2:52:47.810 OK.
2:52:47.810,2:52:51.400 I want this to[br]exist on Wikidata, 2:52:51.400,2:52:53.320 and to be mentioned[br]on Wikidata, so 2:52:53.320,2:52:56.950 that when people look up[br]that author in Wikidata 2:52:56.950,2:52:59.170 they will know about one[br]of his notable works. 2:52:59.170,2:53:02.470 But I'm not prepared to[br]put in the time investment 2:53:02.470,2:53:05.670 to build a whole article[br]on English Wikipedia. 2:53:05.670,2:53:07.420 Either because I don't[br]have the time, or I 2:53:07.420,2:53:09.040 don't have good sources. 2:53:09.040,2:53:11.560 Or maybe my English[br]is not good enough, 2:53:11.560,2:53:14.980 but it is good enough to just[br]record these very basic facts 2:53:14.980,2:53:17.850 and point to the Library of[br]Congress records, et cetera. 2:53:17.850,2:53:20.170 So that it's better[br]than nothing. 2:53:20.170,2:53:23.170 So that's one reason[br]to maybe do it. 2:53:23.170,2:53:26.690 Another reason is to[br]be able to link to it. 2:53:26.690,2:53:30.190 So remember that[br]translator lady already 2:53:30.190,2:53:33.280 had an item on Wikidata, but if[br]she hadn't, we could have just 2:53:33.280,2:53:38.560 created a very, very basic[br]rudimentary item about her just 2:53:38.560,2:53:41.740 saying, you know,[br]this name is human. 2:53:41.740,2:53:43.060 Country, Bulgaria. 2:53:43.060,2:53:45.220 Occupation, translator. 2:53:45.220,2:53:48.580 Even just that would[br]have been something, 2:53:48.580,2:53:51.610 and would have enabled me[br]to link to this person. 2:53:51.610,2:53:56.860 So these are legitimate reasons[br]to create Wikidata entities 2:53:56.860,2:54:01.510 without, or at least before,[br]creating a Wikipedia article.
2:54:01.510,2:54:02.709 If you are going to create-- 2:54:02.709,2:54:04.750 I mean if you're at an[br]edit-a-thon or something, 2:54:04.750,2:54:07.690 and you have come to[br]create Wikipedia articles, 2:54:07.690,2:54:10.660 by all means, first create[br]the Wikipedia article, 2:54:10.660,2:54:13.982 then create the Wikidata[br]item and link to it. 2:54:17.580,2:54:20.500 I hope that answers[br]the question. 2:54:20.500,2:54:24.940 So the Reasonator[br]is simply a kind 2:54:24.940,2:54:31.330 of prettier view of[br]items in Wikidata. 2:54:31.330,2:54:35.980 So you can just type the name[br]of an item or the number. 2:54:35.980,2:54:39.010 Let's pick just a[br]random number, 42. 2:54:39.010,2:54:39.595 Say 42. 2:54:42.770,2:54:45.950 Which happens to[br]be, maybe you've 2:54:45.950,2:54:51.310 heard of this guy,[br]Douglas Adams. 2:54:51.310,2:54:55.490 He happened to have received[br]the Q number 42. 2:54:55.490,2:54:58.760 I'm sure it's a[br]cosmic coincidence 2:54:58.760,2:55:01.460 of infinite improbability. 2:55:01.460,2:55:03.470 And this is a view-- 2:55:03.470,2:55:05.690 this is a tool that[br]is not Wikidata. 2:55:05.690,2:55:09.690 It's a tool built on top of[br]Wikidata called Reasonator. 2:55:09.690,2:55:14.750 And it gives us the information[br]from Q42, that is from the-- 2:55:14.750,2:55:18.800 this item in Wikidata, which[br]looks like an item in Wikidata. 2:55:18.800,2:55:21.320 But it gives it to us in a[br]slightly more rational kind 2:55:21.320,2:55:22.430 of layout. 2:55:22.430,2:55:24.200 It even kind of[br]generates a little bit 2:55:24.200,2:55:27.620 of pseudo article text for us. 2:55:27.620,2:55:30.429 You know, Douglas Adams was[br]a British writer, playwright, 2:55:30.429,2:55:31.970 screenwriter,[br]bla-bla-bla, an author. 2:55:31.970,2:55:35.630 He was born on this date, in[br]this place, to these people. 2:55:35.630,2:55:39.080 He studied at this place[br]between these years.
2:55:39.080,2:55:40.670 That's all machine generated. 2:55:40.670,2:55:42.230 Nobody wrote this text. 2:55:42.230,2:55:46.330 That's all taken from those[br]statements in Wikidata, 2:55:46.330,2:55:51.080 and generates this reasonably[br]readable summary paragraph. 2:55:51.080,2:55:54.140 And then it gives us this[br]little table of relatives. 2:55:54.140,2:55:55.610 It's all taken from Wikidata. 2:55:55.610,2:55:57.740 But as you can see,[br]this is already 2:55:57.740,2:56:02.120 a little more accessible than[br]the essentially arbitrary 2:56:02.120,2:56:05.120 ordering of statements[br]on Wikidata. 2:56:05.120,2:56:06.200 And that's OK. 2:56:06.200,2:56:08.060 I mean, that's[br]kind of by design. 2:56:08.060,2:56:10.100 Wikidata is the platform. 2:56:10.100,2:56:11.960 There is going to[br]be-- there are going 2:56:11.960,2:56:15.680 to be many new applications,[br]and platforms, and tools, 2:56:15.680,2:56:19.010 and visual interfaces[br]on top of Wikidata 2:56:19.010,2:56:23.000 to browse Wikidata in more[br]friendly or more customized 2:56:23.000,2:56:24.480 ways. 2:56:24.480,2:56:27.080 For example, one of the[br]things that Reasonator 2:56:27.080,2:56:31.610 does for us is give us pictures[br]and maps and a timeline. 2:56:31.610,2:56:32.960 Check this out. 2:56:32.960,2:56:38.990 Timeline, machine generated,[br]just from dates and points 2:56:38.990,2:56:44.090 in time mentioned in the[br]relatively rich Wikidata 2:56:44.090,2:56:47.200 item about Douglas Adams. 2:56:47.200,2:56:47.700 Right? 2:56:47.700,2:56:50.030 So this timeline, for example,[br]again, completely machine 2:56:50.030,2:56:51.140 generated. 2:56:51.140,2:56:53.270 But he was educated[br]between these years, 2:56:53.270,2:56:54.920 so I can put it on the timeline. 2:56:54.920,2:56:57.260 And this is the year he was[br]nominated for a Hugo Award, 2:56:57.260,2:56:59.570 so I can put that in a timeline. 2:56:59.570,2:57:00.600 Et cetera.
2:57:00.600,2:57:03.050 So that's just a super[br]quick demonstration 2:57:03.050,2:57:06.620 of that tool, the Reasonator. 2:57:06.620,2:57:10.310 Links are all here[br]in the slides. 2:57:10.310,2:57:13.390 And the final tool I wanted[br]to mention very quickly 2:57:13.390,2:57:16.220 is the Mix'n'match tool. 2:57:16.220,2:57:21.980 You remember my explanation[br]about Wikidata as nexus, 2:57:21.980,2:57:27.320 as a connection point between many[br]databases, many data sources. 2:57:27.320,2:57:31.080 Those depend on[br]these equivalencies. 2:57:31.080,2:57:35.300 On Wikidata being taught[br]that this item is like that 2:57:35.300,2:57:37.940 ID in this other database. 2:57:37.940,2:57:41.810 And Mix'n'match is a tool,[br]again, by Magnus Manske. 2:57:41.810,2:57:44.690 Maybe you're detecting[br]a pattern here. 2:57:44.690,2:57:47.390 It's a tool by Magnus[br]that is designed 2:57:47.390,2:57:50.270 to enable us to kind[br]of take a foreign, 2:57:50.270,2:57:54.950 an external data set, put[br]it alongside Wikidata, 2:57:54.950,2:57:56.690 and kind of try and align them. 2:57:56.690,2:57:59.410 So this item in this[br]external dataset, 2:57:59.410,2:58:01.230 is that already[br]covered in Wikidata? 2:58:01.230,2:58:02.890 If so, by what Q number? 2:58:02.890,2:58:03.890 By what item? 2:58:03.890,2:58:06.170 If not, maybe we need[br]to create a Wikidata 2:58:06.170,2:58:07.610 item to represent it. 2:58:07.610,2:58:10.010 Or maybe it's a[br]duplicate, or something. 2:58:10.010,2:58:15.980 So the Mix'n'match tool has[br]a list of external data sets, 2:58:15.980,2:58:18.140 as you can see. 2:58:18.140,2:58:21.260 The Art and Architecture[br]Thesaurus by the Getty Research 2:58:21.260,2:58:22.220 Institute. 2:58:22.220,2:58:26.690 Or the Australian[br]Dictionary of Biography. 2:58:26.690,2:58:28.880 All kinds of external[br]data sets here. 2:58:32.470,2:58:40.060 Somewhere here I had a specific[br]link to the Royal Society.
2:58:40.060,2:58:41.710 It can also give[br]me some statistics. 2:58:41.710,2:58:47.410 So there is an external data set[br]of all the Fellows of the Royal 2:58:47.410,2:58:48.001 Society. 2:58:48.001,2:58:48.500 Right? 2:58:48.500,2:58:54.970 The oldest academic[br]learned society in England. 2:58:54.970,2:58:57.415 And the internet is tired. 2:59:03.240,2:59:04.640 Here we go. 2:59:04.640,2:59:07.115 Nope. 2:59:07.115,2:59:08.105 Did that work? 2:59:12.560,2:59:15.390 Fellows of the Royal[br]Society, here we go. 2:59:15.390,2:59:17.970 So this one is complete. 2:59:17.970,2:59:21.330 I mean, people have manually[br]gone over every single item 2:59:21.330,2:59:24.330 there and either[br]matched it to Wikidata 2:59:24.330,2:59:27.390 or declared that it was not[br]in scope, or a duplicate, 2:59:27.390,2:59:28.520 or whatever. 2:59:28.520,2:59:31.230 But let's look at site stats. 2:59:31.230,2:59:35.210 This is a fun kind of[br]aspect of this tool. 2:59:35.210,2:59:38.530 But that is not working. 2:59:38.530,2:59:40.820 Or it's taking too long. 2:59:40.820,2:59:43.940 So let's just demonstrate[br]how this works. 2:59:43.940,2:59:45.590 Maybe Britannica? 2:59:45.590,2:59:46.780 Is that done already? 2:59:52.570,2:59:53.990 Here we go. 2:59:53.990,2:59:55.330 Encyclopedia Britannica. 2:59:55.330,2:59:55.960 Yeah. 2:59:55.960,3:00:02.040 So in the Encyclopedia[br]Britannica, 3:00:02.040,3:00:05.940 40% of the items there[br]are not yet processed. 3:00:05.940,3:00:07.830 So let's process one of them. 3:00:07.830,3:00:16.180 For example, there is an item[br]in the Encyclopedia Britannica 3:00:16.180,3:00:19.960 called Boston, England. 3:00:19.960,3:00:23.050 As you know,[br]all American place names 3:00:23.050,3:00:26.050 are totally stolen[br]from elsewhere. 3:00:26.050,3:00:29.440 So there is a Boston[br]in England, though it's 3:00:29.440,3:00:30.700 no longer the famous one.
3:00:30.700,3:00:36.340 And the Mix'n'match[br]tool has automatically 3:00:36.340,3:00:39.610 matched it based on[br]the label to 3:00:39.610,3:00:43.900 Q100, which is Boston, the big[br]city in the United States. 3:00:43.900,3:00:45.500 And that is incorrect, right? 3:00:45.500,3:00:48.910 That's kind of naive computer[br]going, well, this is Boston, 3:00:48.910,3:00:50.820 and this other thing[br]is also Boston. 3:00:50.820,3:00:56.260 And it is asking me to[br]confirm this match or not. 3:00:56.260,3:00:57.400 You see? 3:00:57.400,3:01:01.120 So this is the Boston,[br]England from Britannica. 3:01:01.120,3:01:04.720 And the tool is asking[br]me, is this the same as 3:01:04.720,3:01:06.910 Boston, Q100, in America? 3:01:06.910,3:01:07.990 The answer is no. 3:01:07.990,3:01:10.110 I removed this. 3:01:10.110,3:01:11.860 I remove this match. 3:01:11.860,3:01:15.430 And now this Boston,[br]England is unmatched. 3:01:15.430,3:01:23.230 And I can match it to the[br]correct one in England. 3:01:23.230,3:01:27.370 I can do this by searching[br]English Wikipedia, 3:01:27.370,3:01:28.780 or searching Wikidata. 3:01:28.780,3:01:32.000 I mean, it has[br]these handy links. 3:01:32.000,3:01:36.910 So the English town[br]is in Lincolnshire. 3:01:36.910,3:01:38.230 Boston, Lincolnshire. 3:01:38.230,3:01:46.030 So I can go there and then[br]get the Wikidata item number. 3:01:46.030,3:01:49.810 See, this is not Q100,[br]Boston in the States, 3:01:49.810,3:01:53.440 this is Q311975,[br]town in Lincolnshire. 3:01:53.440,3:01:57.310 I can get this Q[br]number, go back to the Mix'n'match 3:01:57.310,3:01:58.160 tool-- 3:01:58.160,3:01:59.110 Where was that? 3:01:59.110,3:02:00.180 Here we are. 3:02:00.180,3:02:01.510 And set Q. 3:02:01.510,3:02:08.650 I can tell the tool that this is[br]the right Boston, and click OK.
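Editor's sketch: the workflow just demonstrated (a naive label-based automatic match, followed by a human removing it and setting the right Q number) can be illustrated with a toy model. This is not the tool's actual code or data model; the class `Entry`, the functions `auto_match`, `remove_match` and `set_q`, and the tiny `LABEL_INDEX` are all hypothetical, and only the two Q numbers come from the talk.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Toy label index standing in for a label search (hypothetical data structure).
# Q100 is the US city and Q311975 the Lincolnshire town, as stated in the talk.
LABEL_INDEX: Dict[str, List[str]] = {"boston": ["Q100", "Q311975"]}


@dataclass
class Entry:
    """One record from an external catalogue, with its proposed Wikidata match."""
    external_name: str
    matched_qid: Optional[str] = None
    confirmed: bool = False


def normalise(name: str) -> str:
    """Reduce 'Boston, England' to 'boston', which is why the naive match misfires."""
    return name.split(",")[0].strip().lower()


def auto_match(entry: Entry) -> None:
    """Propose the first item sharing the label; the proposal stays unconfirmed."""
    candidates = LABEL_INDEX.get(normalise(entry.external_name), [])
    if candidates:
        entry.matched_qid = candidates[0]
        entry.confirmed = False


def remove_match(entry: Entry) -> None:
    """A human rejects the proposed match."""
    entry.matched_qid = None
    entry.confirmed = False


def set_q(entry: Entry, qid: str) -> None:
    """A human supplies the correct item; this counts as a confirmed match."""
    entry.matched_qid = qid
    entry.confirmed = True


if __name__ == "__main__":
    boston = Entry("Boston, England")
    auto_match(boston)        # naive proposal based on the label alone
    remove_match(boston)      # reviewer says no
    set_q(boston, "Q311975")  # reviewer sets the Lincolnshire town
    print(boston)
```

The design point the sketch tries to capture is that automatic matches are only proposals: nothing is treated as settled until a person confirms or corrects it.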
3:02:08.650,3:02:14.550 And now this town[br]in Lincolnshire, 3:02:14.550,3:02:17.100 you can see this here,[br]this item, Q311975, 3:02:17.100,3:02:21.190 is linked to Britannica. 3:02:21.190,3:02:22.660 What does this mean? 3:02:22.660,3:02:23.820 Well, if we go there. 3:02:23.820,3:02:25.380 If we actually go[br]to the Wikidata 3:02:25.380,3:02:28.890 entity, you will see[br]that in addition 3:02:28.890,3:02:34.140 to the few statements that[br]it already had, it now has, 3:02:34.140,3:02:38.610 thanks to my clicking, it now[br]has another identifier here. 3:02:38.610,3:02:39.270 See? 3:02:39.270,3:02:43.950 Encyclopedia Britannica[br]Online ID, with this link. 3:02:43.950,3:02:49.440 And if we click it, we[br]will indeed reach this page 3:02:49.440,3:02:51.510 in the Britannica[br]online, which is indeed 3:02:51.510,3:02:53.700 about this town in Lincolnshire. 3:02:53.700,3:02:54.510 You see? 3:02:54.510,3:02:58.650 So I've contributed one[br]of those mappings, one 3:02:58.650,3:03:01.950 of those identifiers,[br]into Wikidata. 3:03:01.950,3:03:04.860 And I didn't have[br]to do it manually. 3:03:04.860,3:03:07.980 This tool kind of prompted[br]me to either confirm-- 3:03:07.980,3:03:09.480 if it was correct,[br]I could have just 3:03:09.480,3:03:12.150 clicked confirm. Since[br]it wasn't correct, 3:03:12.150,3:03:16.920 I corrected it manually, but[br]it made this edit on my behalf. 3:03:16.920,3:03:21.180 So that's another tool that[br]encourages us to systematically 3:03:21.180,3:03:24.360 teach Wikidata more things. 3:03:24.360,3:03:25.860 And we're out of time. 3:03:25.860,3:03:29.430 Go edit Wikidata. Now[br]that you have the power, 3:03:29.430,3:03:30.510 you know the deal. 3:03:30.510,3:03:32.430 Use it for good,[br]and not for evil. 3:03:32.430,3:03:35.640 If you have questions,[br]this is my email address.
3:03:35.640,3:03:38.640 If you're watching this video[br]not live, the description 3:03:38.640,3:03:41.610 will have links to the[br]slides, and to a bunch 3:03:41.610,3:03:44.610 of other useful[br]pieces of information. 3:03:44.610,3:03:49.510 Any last questions on IRC? 3:03:49.510,3:03:53.210 If not, thank you[br]for your attention. 3:03:53.210,3:03:56.470 And if you liked this, and if you[br]feel that you now get Wikidata, 3:03:56.470,3:03:58.330 and you get what it's[br]good for, and you're 3:03:58.330,3:04:01.660 inspired to contribute, I have[br]only one request from you. 3:04:01.660,3:04:04.960 I mean, in addition to using[br]it for good, not for evil, 3:04:04.960,3:04:07.630 I ask that you spread the word. 3:04:07.630,3:04:09.550 Show this video--[br]share this video 3:04:09.550,3:04:13.180 with other people in your[br]community, or around you. 3:04:13.180,3:04:16.000 Teach this yourself[br]once you're comfortable 3:04:16.000,3:04:17.650 with these concepts. 3:04:17.650,3:04:21.330 Feel free to use my slides. 3:04:21.330,3:04:23.580 Yeah, and edit Wikidata. 3:04:23.580,3:04:27.010 Thank you very[br]much, and goodbye.