0:00:00.000,0:00:19.480 36C3 preroll music 0:00:19.480,0:00:24.140 Herald Angel: We have Tom and Max here.[br]They have a talk here with a very 0:00:24.140,0:00:28.140 complicated title that I don't quite[br]understand yet. It's called "Interactively 0:00:28.140,0:00:35.810 Discovering Implicational Knowledge in[br]Wikidata". And they told me the point of 0:00:35.810,0:00:39.190 the talk is that I would like to[br]understand what it means and I hope I 0:00:39.190,0:00:42.190 will. So good luck.[br]Tom: Thank you very much. 0:00:42.190,0:00:44.310 Herald: And have some applause, please. 0:00:44.310,0:00:47.880 applause 0:00:47.880,0:00:54.980 T: Thank you very much. Do you hear me?[br]Does it work? Hello? Oh, very good. Thank 0:00:54.980,0:00:58.789 you very much and welcome to our talk[br]about interactively discovering 0:00:58.789,0:01:05.110 implicational knowledge in Wikidata. It[br]is more or less a fun project we started 0:01:05.110,0:01:10.890 for finding rules that are implicit in[br]Wikidata – entailed just by the data it 0:01:10.890,0:01:18.850 has, that people inserted into the[br]Wikidata database so far. And we will 0:01:18.850,0:01:23.570 start with the explicit knowledge. So the[br]explicit data in Wikidata, with Max. 0:01:23.570,0:01:28.340 Max: So. Right. What is Wikidata?[br]Maybe you have heard about Wikidata, then 0:01:28.340,0:01:33.210 that's all fine. Maybe you haven't, then[br]surely you've heard of Wikipedia. And 0:01:33.210,0:01:36.790 Wikipedia is run by the Wikimedia[br]Foundation and the Wikimedia Foundation 0:01:36.790,0:01:41.330 has several other projects. And one of[br]those is Wikidata. And Wikidata is 0:01:41.330,0:01:45.490 basically a large graph that encodes[br]machine readable knowledge in the form of 0:01:45.490,0:01:51.730 statements. And a statement basically[br]consists of some entity that is connected 0:01:51.730,0:01:58.200 – or some entities that are connected[br]by some property. 
And these properties 0:01:58.200,0:02:02.909 can then even have annotations on them.[br]So, for example, we have Donna Strickland 0:02:02.909,0:02:09.149 here and we encode that she has received a[br]Nobel prize in physics last year by this 0:02:09.149,0:02:16.290 property "awarded" and this has then a[br]qualifier "time: 2018" and also "for: 0:02:16.290,0:02:23.100 Chirped Pulse Amplification". And all in[br]all, we have some 890 million statements 0:02:23.100,0:02:31.960 on Wikidata that connect 71 million items[br]using 7000 properties. But there's also a 0:02:31.960,0:02:36.830 bit more. So we also know that Donna[br]Strickland has "field of work: optics" and 0:02:36.830,0:02:41.420 also "field of work: lasers" so we can use[br]the same property to connect some entity 0:02:41.420,0:02:46.480 with different other entities. And we[br]don't even have to have knowledge that 0:02:46.480,0:02:56.530 connects the entities. We can have a date[br]of birth, which is 1959. Nineteen ninety. 0:02:56.530,0:03:05.530 No. Nineteen fifty nine. Yes. And this is[br]then just a plain date, not an entity. And 0:03:05.530,0:03:11.510 now coming from the explicit knowledge[br]then, well, we have some more we have 0:03:11.510,0:03:16.209 Donna Strickland has received a Nobel[br]prize in physics and also Marie Curie has 0:03:16.209,0:03:21.170 received the Nobel prize in physics. And[br]we also know that Marie Curie has a Nobel 0:03:21.170,0:03:27.780 prize ID that starts with "phys" and then[br]"1903" and some random numbers that 0:03:27.780,0:03:32.970 basically are this ID. Then Marie Curie[br]also has received a Nobel prize in 0:03:32.970,0:03:38.580 chemistry in 1911. So she has another[br]Nobel ID that starts with "chem" and has 0:03:38.580,0:03:43.590 "1911" there. And then there's also[br]Frances Arnold, who received the Nobel 0:03:43.590,0:03:48.549 prize in chemistry last year. So she has a[br]Nobel ID that starts with "chem" and has 0:03:48.549,0:03:54.740 "2018" there. 
And now one could assume[br]that, well, everybody who was awarded the 0:03:54.740,0:04:00.156 Nobel prize should also have a Nobel ID.[br]So everybody who was awarded the Nobel 0:04:00.156,0:04:05.670 prize should also have a Nobel prize ID,[br]and we could write that as some 0:04:05.670,0:04:11.791 implication here. So "awarded(nobelPrize)"[br]implies "nobelID". And well, if you 0:04:11.791,0:04:16.349 look sharply at this picture, then there's[br]this arrow here conspicuously missing that 0:04:16.349,0:04:22.550 Donna Strickland doesn't have a Nobel[br]prize ID. And indeed, there's 25 people 0:04:22.550,0:04:26.669 currently on Wikidata that are missing[br]Nobel prize IDs, and Donna Strickland is 0:04:26.669,0:04:34.060 one of them. So we call these people that[br]don't satisfy this implication – we call 0:04:34.060,0:04:40.419 those counterexamples and well, if you[br]look at Wikidata on the scale of really 0:04:40.419,0:04:45.350 these 890 million statements, then you[br]won't find any counterexamples because 0:04:45.350,0:04:52.550 it's just too big. So we need some way to[br]automatically do that. And the idea is 0:04:52.550,0:04:58.930 that, well, if we had this knowledge that,[br]well, some implications are not satisfied, 0:04:58.930,0:05:03.840 then this encodes maybe missing[br]information or wrong information, and we 0:05:03.840,0:05:10.870 want to represent that in a way that is[br]easy to understand and also succinct. So 0:05:10.870,0:05:16.090 it doesn't take long to write it down, it[br]should have a short representation. So 0:05:16.090,0:05:23.060 that rules out anything including complex[br]syntax or logical quantifiers. So no SPARQL 0:05:23.060,0:05:27.480 queries as a description of that implicit[br]knowledge. No description logics, if 0:05:27.480,0:05:33.199 you've heard of that. 
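A minimal sketch of what "counterexample" means for the implication "awarded(nobelPrize)" implies "nobelID", using only the three people named in the talk. The attribute strings and the `counterexamples` helper are illustrative assumptions, not real Wikidata identifiers or part of the speakers' tool.

```python
from typing import Dict, List, Set

# Toy data from the talk; attribute names are illustrative labels,
# not real Wikidata property IDs.
entities: Dict[str, Set[str]] = {
    "Donna Strickland": {"awarded(nobelPrize)"},
    "Marie Curie": {"awarded(nobelPrize)", "nobelID"},
    "Frances Arnold": {"awarded(nobelPrize)", "nobelID"},
}


def counterexamples(data: Dict[str, Set[str]],
                    premise: Set[str],
                    conclusion: Set[str]) -> List[str]:
    """Return every entity that has all premise attributes
    but lacks at least one conclusion attribute."""
    return sorted(
        name
        for name, attrs in data.items()
        if premise <= attrs and not conclusion <= attrs
    )


missing = counterexamples(entities, {"awarded(nobelPrize)"}, {"nobelID"})
print(missing)
```

An empty result would mean the implication holds in the data; a non-empty one lists the items worth checking for missing or wrong statements.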
And we also want[br]something that we can actually compute on 0:05:33.199,0:05:41.539 actual hardware in a reasonable timeframe.[br]So our approach is we use Formal Concept 0:05:41.539,0:05:46.889 Analysis, which is a technique that has[br]been developed over the past several years 0:05:46.889,0:05:52.070 to extract what are called propositional[br]implications. So just logical formulas of 0:05:52.070,0:05:56.240 propositional logic that are an[br]implication in the form of this 0:05:56.240,0:06:03.020 "awarded(nobelPrize)" implies "nobelID".[br]So what exactly is Formal Concept 0:06:03.020,0:06:08.500 Analysis? Off to Tom.[br]T: Thank you. So what is Formal Concept 0:06:08.500,0:06:14.420 Analysis? It was developed in the 1980s by[br]Rudolf Wille and Bernhard Ganter 0:06:14.420,0:06:18.539 and they were restructuring lattice[br]theory. Lattice theory is an ambiguous 0:06:18.539,0:06:23.370 name in math, it has two meanings: One[br]meaning is you have a grid and have a 0:06:23.370,0:06:29.050 lattice there. The other thing is to speak[br]about orders – order relations. So I like 0:06:29.050,0:06:34.150 steaks, I like pudding and I like steaks[br]more than pudding. And I like rice more 0:06:34.150,0:06:40.960 than steaks. That's an order, right? And[br]lattices are particular orders which can 0:06:40.960,0:06:46.770 be used to represent propositional logic.[br]So easy rules like "when it rains, the 0:06:46.770,0:06:52.990 street gets wet", right? So and the data[br]representation those guys used back then, 0:06:52.990,0:06:57.080 they called it a formal context, which is[br]basically just a set of objects – they 0:06:57.080,0:07:02.000 call them objects, it's just a name –, a[br]set of attributes and some incidence, 0:07:02.000,0:07:07.890 which basically means which object does[br]have which attributes. So, for example, my 0:07:07.890,0:07:13.150 laptop has the colour black. So this[br]object has some property, right? 
So that's 0:07:13.150,0:07:17.870 a small example on the right for such a[br]formal context. So the objects there are 0:07:17.870,0:07:24.379 some animals: a platypus – that's the fun[br]animal from Australia, the mammal which is 0:07:24.379,0:07:30.279 also laying eggs and which is also[br]venomous –, a black widow – the spider –, 0:07:30.279,0:07:35.449 the duck and the cat. So we see, the[br]platypus has all the properties; it has 0:07:35.449,0:07:39.729 being venomous, laying eggs and being a[br]mammal; we have the duck, which is not a 0:07:39.729,0:07:44.169 mammal, but it lays eggs, and so on and so[br]on. And it's very easy to grasp some 0:07:44.169,0:07:49.430 implicational knowledge here. An easy rule[br]you can find is whenever you encounter a 0:07:49.430,0:07:54.300 mammal that is venomous, it has to lay[br]eggs. So this is a rule that falls out of 0:07:54.300,0:07:59.639 this binary data table. Our main problem[br]then or at this point is we do not have 0:07:59.639,0:08:03.470 such a data table for Wikidata, right? We[br]have the implicit graph, which is way more 0:08:03.470,0:08:09.030 expressive than binary data, and we cannot[br]even store Wikidata as a binary table. 0:08:09.030,0:08:13.859 Even if you tried to, we have no chance to[br]compute such rules from that. And for 0:08:13.859,0:08:21.460 this, the people from Formal Concept[br]Analysis proposed an algorithm to extract 0:08:21.460,0:08:27.160 implicit knowledge from an expert. So our[br]expert here could be Wikidata. It's an 0:08:27.160,0:08:31.240 expert, you can ask Wikidata questions,[br]right? Using this SPARQL interface, you 0:08:31.240,0:08:34.739 can ask. You can ask "Is there an example[br]for that? Is there a counterexample for 0:08:34.739,0:08:39.880 something else?" So the algorithm is quite[br]easy. 
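A minimal sketch of the animal table as a formal context, with a check of the rule "a venomous mammal lays eggs". The talk only spells out the rows for the platypus and the duck; the rows for the black widow and the cat are filled in from general knowledge and are an assumption, as are all function names.

```python
from typing import Dict, FrozenSet, Set

# Formal context: object -> set of attributes it has.
# Black widow and cat rows are assumptions (not spelled out in the talk).
context: Dict[str, Set[str]] = {
    "platypus": {"mammal", "lays eggs", "venomous"},
    "black widow": {"lays eggs", "venomous"},
    "duck": {"lays eggs"},
    "cat": {"mammal"},
}
ALL_ATTRIBUTES: FrozenSet[str] = frozenset().union(*context.values())


def extent(attrs: Set[str]) -> Set[str]:
    """All objects that have every attribute in attrs."""
    return {obj for obj, has in context.items() if attrs <= has}


def intent(objs: Set[str]) -> FrozenSet[str]:
    """All attributes shared by every object in objs
    (every attribute, if objs is empty)."""
    return frozenset.intersection(ALL_ATTRIBUTES, *(context[o] for o in objs))


def holds(premise: Set[str], conclusion: Set[str]) -> bool:
    """An implication holds if every object with the premise
    also has the conclusion."""
    return conclusion <= intent(extent(premise))


print(holds({"mammal", "venomous"}, {"lays eggs"}))
print(holds({"lays eggs"}, {"venomous"}))
```

The second check fails in this table because the duck lays eggs without being venomous, so the duck is a counterexample.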
The algorithm is the algorithm and 0:08:39.880,0:08:45.380 some expert – in our case, Wikidata –, and[br]the algorithm keeps notes for 0:08:45.380,0:08:49.449 counterexamples and keeps notes for valid[br]implications. So in the beginning, we do 0:08:49.449,0:08:53.569 not have any valid implications, so this[br]list on the right is empty, and in the 0:08:53.569,0:08:56.780 beginning we do not have any[br]counterexamples. So the list on the left, 0:08:56.780,0:09:01.900 the formal context to build up is also[br]empty. And all the algorithm does now is, 0:09:01.900,0:09:09.170 it asks "is this implication, X follows Y,[br]Y follows X or X implies Y, is it true?" 0:09:09.170,0:09:14.000 So "is it true," for example, "that an[br]animal that is a mammal and is venomous 0:09:14.000,0:09:18.880 lays eggs?" So now the expert, which in[br]our case is Wikidata, can answer it. We 0:09:18.880,0:09:24.860 can query that. We showed in our paper we[br]can query that. So we query it, and if the 0:09:24.860,0:09:28.491 Wikidata expert does not find any[br]counterexamples, it will say, ok, that's 0:09:28.491,0:09:36.200 maybe a true, true thing; it's yes. Or if[br]it's not a true implication in Wikidata, 0:09:36.200,0:09:41.779 it can say, no, no, no, it's not true, and[br]here's a counterexample. So this is 0:09:41.779,0:09:48.510 something you contradict by example. You[br]say this rule cannot be true. For example, 0:09:48.510,0:09:52.900 when the street is wet, that does not mean[br]it has rained, right? It could be the 0:09:52.900,0:10:01.380 cleaning service car or something else. So[br]our idea now was to use Wikidata as an 0:10:01.380,0:10:05.819 expert, but also include a human into this[br]loop. So we do not just want to ask 0:10:05.819,0:10:11.709 Wikidata, we also want to ask a human[br]expert as well. So we first ask in our 0:10:11.709,0:10:18.520 tool the Wikidata expert for some rule.[br]After that, we also inquire the human 0:10:18.520,0:10:22.080 expert. 
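The question-and-answer loop described above can be sketched as follows. This is a simplified version of attribute exploration, not the speakers' implementation: the expert is any callback that either accepts an implication (returns `None`) or returns the attribute set of a counterexample. The simulated expert here answers from a small hidden animal table (partly assumed); in the talk that role is played by Wikidata queries plus a human.

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

Implication = Tuple[FrozenSet[str], FrozenSet[str]]
Expert = Callable[[FrozenSet[str], FrozenSet[str]], Optional[Set[str]]]


def close(attrs, implications: List[Implication]) -> FrozenSet[str]:
    """Smallest superset of attrs that respects all accepted implications."""
    closed, changed = set(attrs), True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)


def next_closed(current, order: List[str], implications):
    """Next closed attribute set in lectic order, or None after the last."""
    work = set(current)
    for i in range(len(order) - 1, -1, -1):
        if order[i] in work:
            work.discard(order[i])
            continue
        candidate = close(work | {order[i]}, implications)
        if not any(a in order[:i] for a in candidate - work):
            return candidate
    return None


def explore(order: List[str], ask_expert: Expert):
    """Ask questions until every implication is accepted or refuted."""
    everything = frozenset(order)
    implications: List[Implication] = []    # notes on valid implications
    counterexamples: List[FrozenSet[str]] = []  # notes on counterexamples
    current: Optional[FrozenSet[str]] = frozenset()
    while current is not None:
        matching = [c for c in counterexamples if current <= c]
        shared = frozenset.intersection(everything, *matching)
        if shared != current:
            question = shared - current
            answer = ask_expert(current, question)
            if answer is not None:
                example = frozenset(answer)
                if not current <= example or question <= example:
                    raise ValueError("not a counterexample to the question")
                counterexamples.append(example)
                continue  # ask again about the same premise
            implications.append((current, shared))
        current = next_closed(current, order, implications)
    return implications, counterexamples


# Simulated expert: answers from a hidden table (partly assumed data).
hidden = {
    "platypus": {"mammal", "lays eggs", "venomous"},
    "black widow": {"lays eggs", "venomous"},
    "duck": {"lays eggs"},
    "cat": {"mammal"},
}


def simulated_expert(premise, conclusion):
    for attrs in hidden.values():
        if premise <= attrs and not conclusion <= attrs:
            return attrs
    return None


rules, examples = explore(["mammal", "lays eggs", "venomous"], simulated_expert)
for premise, conclusion in rules:
    print(sorted(premise), "->", sorted(conclusion - premise))
```

The design choice worth noting is that a rejected question is asked again with the new counterexample on file, so each rejection narrows the next question instead of skipping it.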
And he can also say "yeah, that's[br]true, I know that," or "No, no. Wikidata 0:10:22.080,0:10:27.200 is not aware of this counterexample, I[br]know one." Or, in the other case "oh, 0:10:27.200,0:10:32.770 Wikidata says this is true. I am aware of[br]a counterexample." Yeah, and so on and so 0:10:32.770,0:10:37.600 on. And you can represent this more or[br]less – this is just some mathematical 0:10:37.600,0:10:41.689 picture, it's not very important. But you[br]can see on the left there's an exploration 0:10:41.689,0:10:46.720 going on, just Wikidata with the[br]algorithm, on the right an exploration, a 0:10:46.720,0:10:51.419 human expert versus Wikidata which can[br]answer all the queries. And we combined 0:10:51.419,0:10:57.720 those two into one small tool, still under[br]development. So, back to Max. 0:10:57.720,0:11:02.980 M: Okay. So far for that to work, we[br]basically need to have a way of viewing 0:11:02.980,0:11:08.070 Wikidata, or at least parts of Wikidata,[br]as a formal context. And this formal 0:11:08.070,0:11:13.610 context, well, this was a binary table, so[br]what do we do? We just take all the items 0:11:13.610,0:11:18.880 in Wikidata as objects and all the[br]properties as attributes of our context 0:11:18.880,0:11:24.159 and then have an incidence relation that[br]says "well, this entity has this 0:11:24.159,0:11:30.549 property," so it is incident there, and[br]then we end up with a context that has 71 0:11:30.549,0:11:36.430 million rows and seven thousand columns.[br]So, well, that might actually be a slight 0:11:36.430,0:11:40.180 problem there, because we want to have[br]something that we can run on actual 0:11:40.180,0:11:45.811 hardware and not on a supercomputer. So[br]let's maybe not do that and focus on 0:11:45.811,0:11:50.900 a smaller set of properties that are[br]actually related to one another through 0:11:50.900,0:11:55.689 some kind of common domain, yeah? 
So it[br]doesn't make any sense to have a property 0:11:55.689,0:11:59.640 that relates to spacecraft and then a[br]property that relates to books – that's 0:11:59.640,0:12:05.050 probably not a good idea to try to find[br]implicit knowledge between those two. But 0:12:05.050,0:12:10.259 two different properties about spacecraft,[br]that sounds good, right? And then the 0:12:10.259,0:12:15.000 interesting question is just how do we[br]define the incidence for our set of 0:12:15.000,0:12:20.150 properties? And that actually depends very[br]much on which properties we choose, 0:12:20.150,0:12:25.550 because it does – for some properties, it[br]makes sense to account for the direction 0:12:25.550,0:12:32.679 of the statement: So there is a property[br]called parent? Actually, no, it's child, 0:12:32.679,0:12:38.309 and then there's father and mother, and[br]you don't want to turn those around, as do 0:12:38.309,0:12:43.760 you want to have "A is a child of B," that[br]should be something different than "B 0:12:43.760,0:12:48.930 is a child of A." Then there's the[br]qualifiers that might be important for 0:12:48.930,0:12:54.740 some properties. So receiving an award for[br]something might be something different 0:12:54.740,0:13:00.740 than receiving an award for something[br]else. But while receiving an award in 2018 0:13:00.740,0:13:06.549 and receiving one in 2017, that's probably[br]more or less the same thing, so we don't 0:13:06.549,0:13:11.930 necessarily need to differentiate that.[br]And there's also a thing called subclasses 0:13:11.930,0:13:15.470 and they form a hierarchy on Wikidata. And[br]you might also want to take that into 0:13:15.470,0:13:20.150 account because while winning something[br]that is a Nobel prize, that means also 0:13:20.150,0:13:25.190 winning an award itself, and winning the[br]Nobel Peace prize means winning a peace 0:13:25.190,0:13:32.586 prize. So there's also implications going[br]on there that you want to respect. 
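A minimal sketch of the scaling idea described above: each (property, value) pair becomes its own binary attribute, and a value also counts as an instance of its superclasses. The triples, the `subclass_of` map and all names are illustrative assumptions using labels instead of real Wikidata IDs; each class has at most one parent here and the hierarchy is assumed to be acyclic, which is simpler than Wikidata's actual class graph.

```python
from typing import Dict, List, Set, Tuple

# (item, property, value) triples, with labels instead of Wikidata IDs.
statements: List[Tuple[str, str, str]] = [
    ("Donna Strickland", "awarded", "Nobel Prize in Physics"),
    ("Donna Strickland", "field of work", "lasers"),
    ("Marie Curie", "awarded", "Nobel Prize in Physics"),
    ("Marie Curie", "awarded", "Nobel Prize in Chemistry"),
    ("Marie Curie", "field of work", "radioactivity"),
]

# Hypothetical, single-parent subclass hierarchy (assumed acyclic).
subclass_of: Dict[str, str] = {
    "Nobel Prize in Physics": "Nobel Prize",
    "Nobel Prize in Chemistry": "Nobel Prize",
}


def with_superclasses(value: str) -> List[str]:
    """The value itself followed by all of its ancestors."""
    chain = [value]
    while chain[-1] in subclass_of:
        chain.append(subclass_of[chain[-1]])
    return chain


def build_context(triples: List[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """One row per item, one binary attribute per 'property: value' pair."""
    rows: Dict[str, Set[str]] = {}
    for item, prop, value in triples:
        row = rows.setdefault(item, set())
        for v in with_superclasses(value):
            row.add(f"{prop}: {v}")
    return rows


table = build_context(statements)
for item, attrs in table.items():
    print(item, sorted(attrs))
```

Qualifiers could be scaled the same way, for example by adding an attribute such as "awarded with: Gérard Mourou" for a "with" qualifier while ignoring the year.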
So, 0:13:32.586,0:13:38.400 to see how we actually do that, let's look[br]at an example. So we have here, well, this 0:13:38.400,0:13:47.030 is Donna Strickland. And – I forgot his[br]first name – Ashkin, this is one of the 0:13:47.030,0:13:51.720 people that won the Nobel prize in physics[br]with her last year. And also Gérard 0:13:51.720,0:13:57.990 Mourou. That is the third one. They all[br]got the Nobel prize in physics last year. 0:13:57.990,0:14:04.190 So we have all these statements here, and[br]these two have a qualifier that says 0:14:04.190,0:14:10.260 "with: Gérard Mourou" here. And I don't[br]think the qualifier is on this statement 0:14:10.260,0:14:15.160 here, actually, but it doesn't actually[br]matter. So what we've done here is, 0:14:15.160,0:14:21.190 put all the entities in the small graph as[br]rows in the table. So we have Strickland 0:14:21.190,0:14:27.850 and Mourou and Ashkin, and also Arnold and[br]Curie that are not in the picture. But you 0:14:27.850,0:14:33.290 can maybe remember that. And then here we[br]have awarded, and we scaled that by the 0:14:33.290,0:14:37.250 instance of the different Nobel prizes[br]that people have won. So that's the 0:14:37.250,0:14:42.209 physics Nobel in the first column, the[br]chemistry Nobel Prize in the second column 0:14:42.209,0:14:48.380 and just general Nobel prizes in the third[br]column. There's awarded and that is scaled 0:14:48.380,0:14:55.240 by the "with" qualifier, so awarded with[br]Gérard Mourou. And then there's field of 0:14:55.240,0:15:00.450 work, and we have lasers here and[br]radioactivity, so we scale by the actual 0:15:00.450,0:15:06.580 field of work that people have. And well[br]then, if we look at what kind of incidence 0:15:06.580,0:15:11.370 we get for Donna Strickland, she has a[br]Nobel prize in physics and that is also a 0:15:11.370,0:15:17.190 Nobel prize, and she has that together[br]with Mourou. 
And she has "field of work: 0:15:17.190,0:15:23.220 lasers," but not radioactivity. Then,[br]Mourou himself: he has a Nobel prize in 0:15:23.220,0:15:29.450 physics, and that is a Nobel prize, but[br]none of the others. Ashkin gets the Nobel 0:15:29.450,0:15:33.890 prize in physics, and that is still a[br]Nobel prize, and he gets that with Gérard 0:15:33.890,0:15:40.970 Mourou. And also he works on lasers, but[br]not in radioactivity. So Frances Arnold 0:15:40.970,0:15:47.230 has a Nobel prize in chemistry, and that[br]is a Nobel prize. And Marie Curie, she has 0:15:47.230,0:15:50.510 a Nobel prize in physics and one in[br]chemistry, and they are both a Nobel 0:15:50.510,0:15:55.319 prize. And she also works on[br]radioactivity. But lasers didn't exist 0:15:55.319,0:16:02.490 back then, so she doesn't get "field of[br]work: lasers." And then basically this 0:16:02.490,0:16:10.289 table here is a representation of our[br]formal context. So and then we've actually 0:16:10.289,0:16:14.840 gone ahead and started building a tool[br]where you can interactively do all these 0:16:14.840,0:16:20.320 things, and it will take care of building[br]the context for you. You just put in the 0:16:20.320,0:16:24.540 properties, and Tom will show[br]you how that works. 0:16:24.540,0:16:29.030 T: So here you see some first screenshots[br]of this tool. So please do not comment on 0:16:29.030,0:16:32.520 the graphic design. We have no idea about[br]that, we have to ask someone about that. 0:16:32.520,0:16:36.120 We're just into logics, more or less. On[br]the left, you see the initial state of the 0:16:36.120,0:16:41.120 game. On the left you have five boxes:[br]they're called countries and borders, 0:16:41.120,0:16:47.370 credit cards, use of energy, memory and[br]computation – I think –, and space 0:16:47.370,0:16:53.180 launches, which are just presets we[br]defined. 
You can explore, for example, in 0:16:53.180,0:16:57.050 the case of the credit card, you can[br]explore the properties from Wikidata which 0:16:57.050,0:17:02.170 are called "card network," "operator," and[br]"fee," so you can just choose one of them, 0:17:02.170,0:17:05.530 or on the right, "custom properties," you[br]can just input the properties you're 0:17:05.530,0:17:10.640 interested in Wikidata, whatever one of[br]the seven thousand you like, or some 0:17:10.640,0:17:15.140 number of them. On the right, I chose then[br]the credit card thingy and I now want to 0:17:15.140,0:17:21.860 show you what happens if you now explore[br]these properties, right? The first step in 0:17:21.860,0:17:25.750 the game is that the game will ask – I[br]mean, the game, the exploration process – 0:17:25.750,0:17:31.020 will ask, is it true that every entity in[br]Wikidata will have these three properties? 0:17:31.020,0:17:36.360 So are they common among all entities in[br]your data, which is most probably not 0:17:36.360,0:17:41.540 true, right? I mean, not everything in[br]Wikidata has a fee, at least I hope. So, 0:17:41.540,0:17:46.520 what I will do now, I would click the[br]"reject this implication" button, since 0:17:46.520,0:17:51.480 the implication "Nothing implies[br]everything" is not true. In the second 0:17:51.480,0:17:56.360 step now, the algorithm tries to find the[br]minimal number of questions to obtain the 0:17:56.360,0:18:01.820 domain knowledge, so to obtain all valid[br]rules in this domain. So next question is 0:18:01.820,0:18:06.120 "is it true that everything in Wikidata[br]that has a 'card network' property also 0:18:06.120,0:18:12.560 has a 'fee' and an 'operator' property?"[br]And down here you can see Wikidata says 0:18:12.560,0:18:18.110 "ok, there are 26 items which are[br]counterexamples," so there's 26 items in 0:18:18.110,0:18:22.670 Wikidata which have the "card network"[br]property but do not have the other two 0:18:22.670,0:18:28.200 ones. 
So, 26 is not a big number, this[br]could mean "ok, that's an error, so 26 0:18:28.200,0:18:32.860 statements are missing." Or maybe that[br]that's, really, that's the true case. 0:18:32.860,0:18:36.890 That's also ok. But you can now choose[br]what you think is right. You can say, "oh, 0:18:36.890,0:18:40.470 I would say it should be true" or you can[br]say "no, I think that's ok, one of these 0:18:40.470,0:18:46.380 counterexamples seems valid. Let's reject[br]it." I, in this case, rejected it. The next 0:18:46.380,0:18:51.020 question it asks: "is it true that[br]everything that has an operator has also a 0:18:51.020,0:18:56.290 fee and a card network?" Yeah, this is[br]possibly not true. There's also more than 0:18:56.290,0:19:03.110 1000 counterexamples, one being, I think a[br]telecommunication operator in Hungary or 0:19:03.110,0:19:10.340 something. And so we can reject this as[br]well. Next question: "everything that has 0:19:10.340,0:19:15.360 an operator and a card network – so card[br]network means Visa, MasterCard, whatever, 0:19:15.360,0:19:21.690 all this stuff – is it true that they have[br]to have a fee?" Wikidata says "no," it has 0:19:21.690,0:19:27.570 23 items that contradict it. But one of[br]the items, for example, is the American 0:19:27.570,0:19:32.090 Express Gold Card. I suppose the American[br]Express Gold Card has some fee. So this 0:19:32.090,0:19:36.140 indicates, "oh, there is some missing data[br]in Wikidata," there is something that 0:19:36.140,0:19:40.680 Wikidata does not know but should know to[br]reason correctly in Wikidata with your 0:19:40.680,0:19:46.520 SPARQL queries. So we can now say, "yeah,[br]that's, uh, that's not a reject, that's an 0:19:46.520,0:19:51.470 accept," because we think it should be[br]true. But Wikidata thinks otherwise. And 0:19:51.470,0:19:55.800 you go on, we go on. 
This is then the last[br]question: "Is it true that everything that 0:19:55.800,0:20:00.950 has a fee and a card network should have an[br]operator," and you see, "oh, no counter 0:20:00.950,0:20:05.930 examples." This means Wikidata says "this[br]is true," because it says there is no 0:20:05.930,0:20:09.580 counterexample. If you're asking Wikidata[br]it says this is a valid implication in the 0:20:09.580,0:20:15.400 data set so far, which could also be[br]indicating that something is missing, I'm 0:20:15.400,0:20:20.310 not aware if this is possible or not, but[br]ok, for me it sounds reasonable. Everything 0:20:20.310,0:20:23.800 that has a fee and a card network should also[br]have an operator, which means a bank or 0:20:23.800,0:20:29.220 something like that. So I accept this[br]implication. And then, yeah, you have won 0:20:29.220,0:20:34.410 the exploration game, which essentially[br]means you've won some knowledge. Thank 0:20:34.410,0:20:40.300 you. And the knowledge is that you know[br]which implications in Wikidata are true or 0:20:40.300,0:20:44.340 should be true from your point of view.[br]And yeah, this is more or less the state 0:20:44.340,0:20:50.700 of the game so far as we programmed it in[br]October. And the next state will be to 0:20:50.700,0:20:54.970 show you some – "How much does your[br]opinion of the world differ from the 0:20:54.970,0:20:59.950 opinion that is now reflected in the[br]data?" So is what you think about the data 0:20:59.950,0:21:05.430 true, close to true to what is true in[br]Wikidata. Or maybe Wikidata has wrong 0:21:05.430,0:21:10.680 information. You can find it with that.[br]But Max will tell me more about that. 0:21:10.680,0:21:18.220 M: Ok. So let me just quickly come[br]back to what we have actually done. So we 0:21:18.220,0:21:23.670 offer a procedure that allows you to[br]explore properties in Wikidata and the 0:21:23.670,0:21:30.720 implicational knowledge that holds between[br]these properties. 
And the key idea's here 0:21:30.720,0:21:34.661 that when you look at these implications[br]that you get, while there might be some 0:21:34.661,0:21:39.280 that you don't actually want because they[br]shouldn't be true, and there might also be 0:21:39.280,0:21:46.220 ones that you don't get, but you expect to[br]get because they should hold. And these 0:21:46.220,0:21:51.840 unwanted and/or missing implications, they[br]point to missing statements and items in 0:21:51.840,0:21:56.130 Wikidata. So they show you where the[br]opportunities to improve the knowledge in 0:21:56.130,0:22:00.100 Wikidata are, and, well, sometimes you[br]also get to learn something about the 0:22:00.100,0:22:04.080 world, and in most cases, it's that the[br]world is more complicated than you thought 0:22:04.080,0:22:10.260 it was – and that's just how life is. But[br]in general, implications can guide you in 0:22:10.260,0:22:17.220 your way of improving Wikidata and the[br]state of knowledge therein. So what's 0:22:17.220,0:22:22.380 next? Well, so what we currently don't[br]offer in the exploration game and what we 0:22:22.380,0:22:27.710 definitely will focus next on is having[br]configurable counterexamples and also 0:22:27.710,0:22:32.030 filterable counterexamples – right now you[br]just get a list of a random number of 0:22:32.030,0:22:36.880 counterexamples. And you might want to[br]search through this list for something you 0:22:36.880,0:22:42.520 recognise and you might also want to[br]explicitly say, well, this one should be a 0:22:42.520,0:22:48.600 counterexample, and that's definitely[br]coming next. Then, well, domain specific 0:22:48.600,0:22:53.750 scaling of properties, there's still much[br]work to be done. Currently, we only have 0:22:53.750,0:23:00.500 some very basic support for that. 
So you[br]can have properties, but you can't do the 0:23:00.500,0:23:03.780 fancy things where you say, "well,[br]everything that is an award should be 0:23:03.780,0:23:10.840 considered as one instance of this[br]property." That's also coming and then 0:23:10.840,0:23:15.550 what Tom mentioned already: compare your[br]knowledge that you have explored through 0:23:15.550,0:23:21.610 this process against the knowledge that is[br]currently on Wikidata as a form of seeing 0:23:21.610,0:23:26.540 "where do you stand? What is missing in[br]Wikidata? How can you improve Wikidata?" 0:23:26.540,0:23:32.600 And well, if you have any more suggestions[br]for features, then just tell us. There's a 0:23:32.600,0:23:39.530 GitHub link on the implication game page.[br]And here's the link to the tool again. So, 0:23:39.530,0:23:46.140 yeah, just let us know. Open an issue and[br]have fun. And if you have any questions, 0:23:46.140,0:23:50.230 then I guess now would be the time to ask.[br]T: Thank you. 0:23:50.230,0:23:52.730 Herald: Thank you very much, Tom and Max. 0:23:52.730,0:23:55.020 applause 0:23:55.020,0:24:01.510 Herald: So we will switch microphones now[br]because then I can hand this microphone to 0:24:01.510,0:24:07.250 you if any of you have a question for our[br]two speakers. Are there any questions or 0:24:07.250,0:24:14.370 suggestions? Yes.[br]Question: Hi. Thanks for the nice talk. I 0:24:14.370,0:24:18.720 wanted to ask what's the first question,[br]what's the most interesting implication 0:24:18.720,0:24:25.020 that you've found?[br]M: Yeah. That would have made for a 0:24:25.020,0:24:31.850 good back up slide. The most interesting[br]implication so far – 0:24:31.850,0:24:36.010 T: The most basic thing you would expect[br]everything that is launched in space by 0:24:36.010,0:24:41.920 humans – no, everything that landed from[br]space, that has a landing date, also has a 0:24:41.920,0:24:46.450 start date. 
So nothing landed on earth,[br]which was not started here. 0:24:46.450,0:24:55.200 M: Yes.[br]Q: Right now, the game only helps you find 0:24:55.200,0:25:00.710 out implications. Are you also planning to[br]have that I can also add data like for 0:25:00.710,0:25:04.309 example, let's say I have twenty five[br]Nobel laureates who don't have a Nobel 0:25:04.309,0:25:08.220 laureate ID. Is there plans where you[br]could give me a simple interface for me to 0:25:08.220,0:25:12.760 Google and add that ID because it would[br]make the process of adding new entities to 0:25:12.760,0:25:17.400 Wikidata itself more simple.[br]M: Yes. And that's partly hidden 0:25:17.400,0:25:23.050 behind this "configurable and filterable[br]counterexamples" thing. We will probably 0:25:23.050,0:25:28.380 not have an explicit interface for adding[br]stuff, but most likely interface with some 0:25:28.380,0:25:32.270 other tool built around Wikidata, so[br]probably something that will give you 0:25:32.270,0:25:37.100 QuickStatements or something like that.[br]But yes, adding data is definitely on the 0:25:37.100,0:25:41.710 roadmap.[br]Herald: Any more questions? Yes. 0:25:41.710,0:25:48.860 Q: Wouldn't it be nice to do this in other[br]languages, too? 0:25:48.860,0:25:52.600 T: Actually it's language independent, so[br]we use Wikidata and then as far as we 0:25:52.600,0:25:58.110 know, Wikidata has no language itself. You[br]know, it has just items and properties, so 0:25:58.110,0:26:02.640 Qs and Ps, and whatever language you use,[br]it should be translated in the language of 0:26:02.640,0:26:06.180 the properties, if there is a label for[br]that property or for that item that you 0:26:06.180,0:26:12.420 have. So if Wikidata is aware of your[br]language, we are. 0:26:12.420,0:26:15.020 Herald: Oh, yes. More![br]M: Of course, the tool still needs to be 0:26:15.020,0:26:18.360 translated, but –[br]T: The tool itself, it should be. 0:26:18.360,0:26:21.850 Q: Hi, thanks for the talk. 
I have a[br]question. Right now you only can find 0:26:21.850,0:26:25.990 missing data with this, right? Or surplus[br]data. Would you think you'd be able to 0:26:25.990,0:26:31.560 find wrong information with a similar[br]approach? 0:26:31.560,0:26:37.001 T: Actually, we do. I mean, if Wikidata[br]has a counterexample to something we would 0:26:37.001,0:26:42.830 expect to be true, this could point to[br]wrong data, right? If the counterexample 0:26:42.830,0:26:47.450 is a wrong counterexample. If there is a[br]missing property or missing property to an 0:26:47.450,0:26:58.160 item.[br]Q: Ok, I get to ask a second question. So 0:26:58.160,0:27:06.000 the horizontal axis in the incidence[br]matrix. You said it has 7000, it spans 0:27:06.000,0:27:10.300 7000 columns, right?[br]M: Yes, because there's 7000 properties in 0:27:10.300,0:27:13.850 Wikidata.[br]Q: But it's actually way more columns, 0:27:13.850,0:27:17.849 right? Because you multiply the properties[br]times the arguments, right? 0:27:17.849,0:27:21.360 M: Yes. So if you do any scaling then of[br]course that might give you multiple 0:27:21.360,0:27:23.380 entries.[br]Q: So that's what you mean with scaling, 0:27:23.380,0:27:27.770 basically?[br]M: Yes. But already seven thousand is way 0:27:27.770,0:27:35.580 too big to actually compute that.[br]Q: How many would it be if you multiply 0:27:35.580,0:27:48.060 all the arguments?[br]M: I have no idea, probably a few million. 0:27:48.060,0:27:55.309 Q: Have you thought about a recursive[br]method, as counterexamples may be wrong by 0:27:55.309,0:28:00.350 other counterexamples, like in an[br]argumentative graph or something like 0:28:00.350,0:28:06.708 this?[br]T: Actually, I don't get it. How can a 0:28:06.708,0:28:14.040 counterexample be wrong through another[br]counterexample? 
0:28:14.040,0:28:24.450 Q: Maybe some example says that cats can[br]have golden hair and then another example 0:28:24.450,0:28:31.260 might say that this is not a cat.[br]T: Ah, so the property to be a cat or 0:28:31.260,0:28:38.000 something cat-ish is missing then. Okay.[br]No, we have not considered so far deeper 0:28:38.000,0:28:44.570 reasoning. This Horn propositional logic,[br]you know, it has no contradictions, 0:28:44.570,0:28:47.740 because all you can do is you can[br]contradict by counterexamples, but there 0:28:47.740,0:28:52.740 can never be a rule that is not true, so[br]far. Just in your or my opinion, maybe, 0:28:52.740,0:28:56.370 but not in the logic. So what we have to[br]think about is that we have bigger 0:28:56.370,0:29:01.780 reasoning, right? So.[br]Q: Sorry, quick question. Because you're 0:29:01.780,0:29:04.929 not considering all the 7000 odd[br]properties for each of the entities, 0:29:04.929,0:29:07.570 right? What's your current process of[br]filtering? What are the relevant 0:29:07.570,0:29:14.820 properties? I'm sorry, I didn't get that.[br]M: Well, we basically handpick those. So 0:29:14.820,0:29:19.940 you have this input field? Yeah, we can go[br]ahead and select our properties. We also 0:29:19.940,0:29:26.870 have some predefined sets. Okay. And[br]there's also some classes for groups of 0:29:26.870,0:29:30.780 properties that are related that you could[br]use if you want bigger sets, 0:29:30.780,0:29:35.960 T: for example, space or family or what[br]was the other? 0:29:35.960,0:29:43.410 M: Awards is one.[br]T: It depends on the size of the class. 0:29:43.410,0:29:47.390 For example, for space, it's not that[br]much, I think it's 10 or 15 properties. It 0:29:47.390,0:29:51.520 will take you some hours, but you can do[br]because they are 15 or something like 0:29:51.520,0:29:58.150 that. I think for family, it's way too[br]much, it's like 40 or 50 properties. 
So a 0:29:58.150,0:30:04.540 lot of questions.[br]Herald: I don't see any more hands. Maybe 0:30:04.540,0:30:09.760 someone who has not asked the question yet[br]has another one we could take that, 0:30:09.760,0:30:14.270 otherwise we would be perfectly on time.[br]And maybe you can tell us where you will 0:30:14.270,0:30:18.860 be for deeper discussions where people can[br]find you. 0:30:18.860,0:30:22.400 T: Probably at the couches.[br]Herald: The couches, behind our stage. 0:30:22.400,0:30:26.720 M: Or just running around somewhere. So[br]there's also our DECT numbers on the 0:30:26.720,0:30:35.960 slides; it's 6284 for Tom and 6279 for me.[br]So just call and ask where we're hanging 0:30:35.960,0:30:38.470 around.[br]H: Well then, thank you again. Have a 0:30:38.470,0:30:40.210 round of applause.[br]applause 0:30:40.210,0:30:42.650 T: Thank you.[br]M: Well, thanks for having us. 0:30:42.650,0:30:45.310 Applause 0:30:45.310,0:30:49.740 postroll music 0:30:49.740,0:31:12.000 subtitles created by c3subtitles.de[br]in the year 2020. Join, and help us!