0:00:00.000,0:00:19.480
36C3 preroll music
0:00:19.480,0:00:24.140
Herald Angel: We have Tom and Max here.[br]They have a talk here with a very
0:00:24.140,0:00:28.140
complicated title that I don't quite[br]understand yet. It's called "Interactively
0:00:28.140,0:00:35.810
Discovering Implicational Knowledge in[br]Wikidata". And they told me the point of
0:00:35.810,0:00:39.190
the talk is that I would like to[br]understand what it means and I hope I
0:00:39.190,0:00:42.190
will. So good luck.[br]Tom: Thank you very much.
0:00:42.190,0:00:44.310
Herald: And have some applause, please.
0:00:44.310,0:00:47.880
applause
0:00:47.880,0:00:54.980
T: Thank you very much. Do you hear me?[br]Does it work? Hello? Oh, very good. Thank
0:00:54.980,0:00:58.789
you very much and welcome to our talk[br]about interactively discovering
0:00:58.789,0:01:05.110
implicational knowledge in Wikidata. It[br]is more or less a fun project we started
0:01:05.110,0:01:10.890
for finding rules that are implicit in[br]Wikidata – entailed just by the data it
0:01:10.890,0:01:18.850
has, that people inserted into the[br]Wikidata database so far. And we will
0:01:18.850,0:01:23.570
start with the explicit knowledge. So the[br]explicit data in Wikidata, with Max.
0:01:23.570,0:01:28.340
Max: So. Right. What is Wikidata?[br]Maybe you have heard about Wikidata, then
0:01:28.340,0:01:33.210
that's all fine. Maybe you haven't, then[br]surely you've heard of Wikipedia. And
0:01:33.210,0:01:36.790
Wikipedia is run by the Wikimedia[br]Foundation and the Wikimedia Foundation
0:01:36.790,0:01:41.330
has several other projects. And one of[br]those is Wikidata. And Wikidata is
0:01:41.330,0:01:45.490
basically a large graph that encodes[br]machine readable knowledge in the form of
0:01:45.490,0:01:51.730
statements. And a statement basically[br]consists of some entity that is connected
0:01:51.730,0:01:58.200
– or some entities that are connected[br]by some property. And these properties
0:01:58.200,0:02:02.909
can then even have annotations on them.[br]So, for example, we have Donna Strickland
0:02:02.909,0:02:09.149
here and we encode that she has received a[br]Nobel prize in physics last year by this
0:02:09.149,0:02:16.290
property "awarded" and this has then a[br]qualifier "time: 2018" and also "for:
0:02:16.290,0:02:23.100
Chirped Pulse Amplification". And all in[br]all, we have some 890 million statements
0:02:23.100,0:02:31.960
on Wikidata that connect 71 million items[br]using 7000 properties. But there's also a
0:02:31.960,0:02:36.830
bit more. So we also know that Donna[br]Strickland has "field of work: optics" and
0:02:36.830,0:02:41.420
also "field of work: lasers" so we can use[br]the same property to connect some entity
0:02:41.420,0:02:46.480
with different other entities. And we[br]don't even have to have knowledge that
0:02:46.480,0:02:56.530
connects the entities. We can have a date[br]of birth, which is 1959. Nineteen ninety.
0:02:56.530,0:03:05.530
No. Nineteen fifty nine. Yes. And this is[br]then just a plain date, not an entity. And
0:03:05.530,0:03:11.510
now coming from the explicit knowledge[br]then, well, we have some more we have
0:03:11.510,0:03:16.209
Donna Strickland has received a Nobel[br]prize in physics and also Marie Curie has
0:03:16.209,0:03:21.170
received the Nobel prize in physics. And[br]we also know that Marie Curie has a Nobel
0:03:21.170,0:03:27.780
prize ID that starts with "phys" and then[br]"1903" and some random numbers that
0:03:27.780,0:03:32.970
basically are this ID. Then Marie Curie[br]also has received a Nobel prize in
0:03:32.970,0:03:38.580
chemistry in 1911. So she has another[br]Nobel ID that starts with "chem" and has
0:03:38.580,0:03:43.590
"1911" there. And then there's also[br]Frances Arnold, who received the Nobel
0:03:43.590,0:03:48.549
prize in chemistry last year. So she has a[br]Nobel ID that starts with "chem" and has
0:03:48.549,0:03:54.740
"2018" there. And now one could assume[br]that, well, everybody who was awarded the
0:03:54.740,0:04:00.156
Nobel prize should also have a Nobel ID.[br]So everybody who was awarded the Nobel
0:04:00.156,0:04:05.670
prize should also have a Nobel prize ID,[br]and we could write that as some
0:04:05.670,0:04:11.791
implication here. So "awarded(nobelPrize)"[br]implies "nobelID". And well, if you
0:04:11.791,0:04:16.349
look sharply at this picture, then there's[br]this arrow here conspicuously missing that
0:04:16.349,0:04:22.550
Donna Strickland doesn't have a Nobel[br]prize ID. And indeed, there's 25 people
0:04:22.550,0:04:26.669
currently on Wikidata that are missing[br]Nobel prize IDs, and Donna Strickland is
0:04:26.669,0:04:34.060
one of them. So we call these people that[br]don't satisfy this implication – we call
0:04:34.060,0:04:40.419
those counterexamples and well, if you[br]look at Wikidata on the scale of really
0:04:40.419,0:04:45.350
these 890 million statements, then you[br]won't find any counterexamples because
0:04:45.350,0:04:52.550
it's just too big. So we need some way to[br]automatically do that. And the idea is
0:04:52.550,0:04:58.930
that, well, if we had this knowledge that,[br]well, some implications are not satisfied,
0:04:58.930,0:05:03.840
then this encodes maybe missing[br]information or wrong information, and we
0:05:03.840,0:05:10.870
want to represent that in a way that is[br]easy to understand and also succinct. So
0:05:10.870,0:05:16.090
it doesn't take long to write it down, it[br]should have a short representation. So
0:05:16.090,0:05:23.060
that rules out anything, including complex[br]syntax or logical quantifiers. So no SPARQL
0:05:23.060,0:05:27.480
queries as a description of that implicit[br]knowledge. No description logics, if
0:05:27.480,0:05:33.199
you've heard of that. And we also want[br]something that we can actually compute on
0:05:33.199,0:05:41.539
actual hardware in a reasonable timeframe.[br]So our approach is we use Formal Concept
0:05:41.539,0:05:46.889
Analysis, which is a technique that has[br]been developed over the past several years
0:05:46.889,0:05:52.070
to extract what is called propositional[br]implications. So just logical formulas of
0:05:52.070,0:05:56.240
propositional logic that are an[br]implication in the form of this
0:05:56.240,0:06:03.020
"awarded(nobelPrize)" implies "nobelID".[br]So what exactly is Formal Concept
0:06:03.020,0:06:08.500
Analysis? Off to Tom.[br]T: Thank you. So what is Formal Concept
0:06:08.500,0:06:14.420
Analysis? It was developed in the 1980s by[br]Rudolf Wille and Bernhard Ganter
0:06:14.420,0:06:18.539
and they were restructuring lattice[br]theory. Lattice theory is an ambiguous
0:06:18.539,0:06:23.370
name in math, it has two meanings: One[br]meaning is you have a grid and have a
0:06:23.370,0:06:29.050
lattice there. The other thing is to speak[br]about orders – order relations. So I like
0:06:29.050,0:06:34.150
steaks, I like pudding and I like steaks[br]more than pudding. And I like rice more
0:06:34.150,0:06:40.960
than steaks. That's an order, right? And[br]lattices are particular orders which can
0:06:40.960,0:06:46.770
be used to represent propositional logic.[br]So easy rules like "when it rains, the
0:06:46.770,0:06:52.990
street gets wet", right? So and the data[br]representation those guys used back then,
0:06:52.990,0:06:57.080
they called it a formal context, which is[br]basically just a set of objects – they
0:06:57.080,0:07:02.000
call them objects, it's just a name –, a[br]set of attributes and some incidence,
0:07:02.000,0:07:07.890
which basically means which object does[br]have which attributes. So, for example, my
0:07:07.890,0:07:13.150
laptop has the colour black. So this[br]object has some property, right? So that's
0:07:13.150,0:07:17.870
a small example on the right for such a[br]formal context. So the objects there are
0:07:17.870,0:07:24.379
some animals: a platypus – that's the fun[br]animal from Australia, the mammal which is
0:07:24.379,0:07:30.279
also laying eggs and which is also[br]venomous –, a black widow – the spider –,
0:07:30.279,0:07:35.449
the duck and the cat. So we see, the[br]platypus has all the properties; it has
0:07:35.449,0:07:39.729
being venomous, laying eggs and being a[br]mammal; we have the duck, which is not a
0:07:39.729,0:07:44.169
mammal, but it lays eggs, and so on and so[br]on. And it's very easy to grasp some
0:07:44.169,0:07:49.430
implicational knowledge here. An easy rule[br]you can find is whenever you encounter a
0:07:49.430,0:07:54.300
mammal that is venomous, it has to lay[br]eggs. So this is a rule that falls out of
0:07:54.300,0:07:59.639
this binary data table. Our main problem[br]then or at this point is we do not have
0:07:59.639,0:08:03.470
such a data table for Wikidata, right? We[br]have the implicit graph, which is way more
0:08:03.470,0:08:09.030
expressive than binary data, and we cannot[br]even store Wikidata as a binary table.
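A minimal Python sketch of the animal table described above as a formal context. The names `ANIMALS`, `counterexamples` and `holds` are illustrative and do not come from the speakers' tool, and the rows for the black widow and the cat are assumed, since the talk only spells out the platypus and the duck.

```python
# Formal context: each object (row) maps to the set of attributes it has.
ANIMALS = {
    "platypus": {"venomous", "lays eggs", "mammal"},
    "black widow": {"venomous", "lays eggs"},  # assumed incidence
    "duck": {"lays eggs"},
    "cat": {"mammal"},                         # assumed incidence
}


def counterexamples(context, premise, conclusion):
    """Objects that have every premise attribute but lack some conclusion attribute."""
    return sorted(
        name
        for name, attrs in context.items()
        if premise <= attrs and not conclusion <= attrs
    )


def holds(context, premise, conclusion):
    """An implication holds in the context when it has no counterexample."""
    return not counterexamples(context, premise, conclusion)


# "Every venomous mammal lays eggs" holds: the platypus is the only venomous mammal.
print(holds(ANIMALS, {"mammal", "venomous"}, {"lays eggs"}))
# "Everything that lays eggs is venomous" fails: the duck is a counterexample.
print(counterexamples(ANIMALS, {"lays eggs"}, {"venomous"}))
```

With one row per object, checking a rule is a single scan over the table, which is exactly what stops being practical at Wikidata scale.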
0:08:09.030,0:08:13.859
Even if you tried to, we have no chance to[br]compute such rules from that. And for
0:08:13.859,0:08:21.460
this, the people from Formal Concept[br]Analysis proposed an algorithm to extract
0:08:21.460,0:08:27.160
implicit knowledge from an expert. So our[br]expert here could be Wikidata. It's an
0:08:27.160,0:08:31.240
expert, you can ask Wikidata questions,[br]right? Using this SPARQL interface, you
0:08:31.240,0:08:34.739
can ask. You can ask "Is there an example[br]for that? Is there a counterexample for
0:08:34.739,0:08:39.880
something else?" So the algorithm is quite[br]easy. The algorithm is the algorithm and
0:08:39.880,0:08:45.380
some expert – in our case, Wikidata –, and[br]the algorithm keeps notes for
0:08:45.380,0:08:49.449
counterexamples and keeps notes for valid[br]implications. So in the beginning, we do
0:08:49.449,0:08:53.569
not have any valid implications, so this[br]list on the right is empty, and in the
0:08:53.569,0:08:56.780
beginning we do not have any[br]counterexamples. So the list on the left,
0:08:56.780,0:09:01.900
the formal context to build up is also[br]empty. And all the algorithm does now is,
0:09:01.900,0:09:09.170
it asks "is this implication, X follows Y,[br]Y follows X or X implies Y, is it true?"
0:09:09.170,0:09:14.000
So "is it true," for example, "that an[br]animal that is a mammal and is venomous
0:09:14.000,0:09:18.880
lays eggs?" So now the expert, which in[br]our case is Wikidata, can answer it. We
0:09:18.880,0:09:24.860
can query that. We showed in our paper we[br]can query that. So we query it, and if the
0:09:24.860,0:09:28.491
Wikidata expert does not find any[br]counterexamples, it will say, ok, that's
0:09:28.491,0:09:36.200
maybe a true, true thing; it's yes. Or if[br]it's not a true implication in Wikidata,
0:09:36.200,0:09:41.779
it can say, no, no, no, it's not true, and[br]here's a counterexample. So this is
0:09:41.779,0:09:48.510
something you contradict by example. You[br]say this rule cannot be true. For example,
0:09:48.510,0:09:52.900
when the street is wet, that does not mean[br]it has rained, right? It could be the
0:09:52.900,0:10:01.380
cleaning service car or something else. So[br]our idea now was to use Wikidata as an
0:10:01.380,0:10:05.819
expert, but also include a human into this[br]loop. So we do not just want to ask
0:10:05.819,0:10:11.709
Wikidata, we also want to ask a human[br]expert as well. So we first ask in our
0:10:11.709,0:10:18.520
tool the Wikidata expert for some rule.[br]After that, we also inquire the human
0:10:18.520,0:10:22.080
expert. And he can also say "yeah, that's[br]true, I know that," or "No, no. Wikidata
0:10:22.080,0:10:27.200
is not aware of this counterexample, I[br]know one." Or, in the other case "oh,
0:10:27.200,0:10:32.770
Wikidata says this is true. I am aware of[br]a counterexample." Yeah, and so on and so
0:10:32.770,0:10:37.600
on. And you can represent this more or[br]less – this is just some mathematical
0:10:37.600,0:10:41.689
picture, it's not very important. But you[br]can see on the left there's an exploration
0:10:41.689,0:10:46.720
going on, just Wikidata with the[br]algorithm, on the right an exploration, a
0:10:46.720,0:10:51.419
human expert versus Wikidata which can[br]answer all the queries. And we combined
0:10:51.419,0:10:57.720
those two into one small tool, still under[br]development. So, back to Max.
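The question-and-answer loop described above can be sketched in Python as follows. This is a simplified attribute exploration that picks its questions with Ganter's NextClosure enumeration; it is not the speakers' implementation, and every name here (`explore`, `table_expert`, `TABLE`) is hypothetical. The expert is any function that returns `None` to accept an implication or the attribute set of one counterexample to reject it. Here the expert is a small animal table whose black widow and cat rows are assumed.

```python
def close(attrs, implications):
    """Smallest superset of attrs that respects every accepted implication."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)


def next_closure(current, order, implications):
    """Lectically next closed set after current, or None if there is none."""
    for i in range(len(order) - 1, -1, -1):
        if order[i] in current:
            continue
        smaller = set(order[:i])
        candidate = close((current & smaller) | {order[i]}, implications)
        if not (candidate - current) & smaller:
            return candidate
    return None


def common_attrs(premise, examples, order):
    """Attributes shared by all known counterexamples that contain the premise."""
    shared = set(order)
    for row in examples:
        if premise <= row:
            shared &= row
    return frozenset(shared)


def explore(order, expert):
    """Ask the expert until every implication is either accepted or refuted."""
    implications, examples = [], []
    current = frozenset()
    while current is not None:
        conclusion = common_attrs(current, examples, order)
        if conclusion != current:
            counterexample = expert(current, conclusion)
            if counterexample is not None:
                examples.append(counterexample)  # rejected: remember why
                continue                         # ask again with a weaker conclusion
            implications.append((current, conclusion))  # accepted
        current = next_closure(current, order, implications)
    return implications, examples


ORDER = ["venomous", "lays eggs", "mammal"]
TABLE = {
    "platypus": frozenset(ORDER),
    "black widow": frozenset({"venomous", "lays eggs"}),  # assumed incidence
    "duck": frozenset({"lays eggs"}),
    "cat": frozenset({"mammal"}),                         # assumed incidence
}


def table_expert(premise, conclusion):
    """Stand-in for Wikidata: answer from the table, return a counterexample or None."""
    for row in TABLE.values():
        if premise <= row and not conclusion <= row:
            return row
    return None


implications, examples = explore(ORDER, table_expert)
for premise, conclusion in implications:
    print(sorted(premise), "->", sorted(conclusion - premise))
```

A combined expert, as in the talk, would be one more function with the same signature that first asks Wikidata and then lets a human override the answer.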
0:10:57.720,0:11:02.980
M: Okay. So far for that to work, we[br]basically need to have a way of viewing
0:11:02.980,0:11:08.070
Wikidata, or at least parts of Wikidata,[br]as a formal context. And this formal
0:11:08.070,0:11:13.610
context, well, this was a binary table, so[br]what do we do? We just take all the items
0:11:13.610,0:11:18.880
in Wikidata as objects and all the[br]properties as attributes of our context
0:11:18.880,0:11:24.159
and then have an incidence relation that[br]says "well, this entity has this
0:11:24.159,0:11:30.549
property," so it is incident there, and[br]then we end up with a context that has 71
0:11:30.549,0:11:36.430
million rows and seven thousand columns.[br]So, well, that might actually be a slight
0:11:36.430,0:11:40.180
problem there, because we want to have[br]something that we can run on actual
0:11:40.180,0:11:45.811
hardware and not on a supercomputer. So[br]let's maybe not do that and focus on
0:11:45.811,0:11:50.900
a smaller set of properties that are[br]actually related to one another through
0:11:50.900,0:11:55.689
some kind of common domain, yeah? So it[br]doesn't make any sense to have a property
0:11:55.689,0:11:59.640
that relates to spacecraft and then a[br]property that relates to books – that's
0:11:59.640,0:12:05.050
probably not a good idea to try to find[br]implicit knowledge between those two. But
0:12:05.050,0:12:10.259
two different properties about spacecraft,[br]that sounds good, right? And then the
0:12:10.259,0:12:15.000
interesting question is just how do we[br]define the incidence for our set of
0:12:15.000,0:12:20.150
properties? And that actually depends very[br]much on which properties we choose,
0:12:20.150,0:12:25.550
because it does – for some properties, it[br]makes sense to account for the direction
0:12:25.550,0:12:32.679
of the statement: So there is a property[br]called parent? Actually, no, it's child,
0:12:32.679,0:12:38.309
and then there's father and mother, and[br]you don't want to turn those around, as
0:12:38.309,0:12:43.760
you want to have "A is a child of B," that[br]should be something different than "B
0:12:43.760,0:12:48.930
is a child of A." Then there's the[br]qualifiers that might be important for
0:12:48.930,0:12:54.740
some properties. So receiving an award for[br]something might be something different
0:12:54.740,0:13:00.740
than receiving an award for something[br]else. But, well, receiving an award in 2018
0:13:00.740,0:13:06.549
and receiving one in 2017, that's probably[br]more or less the same thing, so we don't
0:13:06.549,0:13:11.930
necessarily need to differentiate that.[br]And there's also a thing called subclasses
0:13:11.930,0:13:15.470
and they form a hierarchy on Wikidata. And[br]you might also want to take that into
0:13:15.470,0:13:20.150
account because, well, winning something[br]that is a Nobel prize, that means also
0:13:20.150,0:13:25.190
winning an award itself, and winning the[br]Nobel Peace prize means winning a peace
0:13:25.190,0:13:32.586
prize. So there's also implications going[br]on there that you want to respect. So,
0:13:32.586,0:13:38.400
to see how we actually do that, let's look[br]at an example. So we have here, well, this
0:13:38.400,0:13:47.030
is Donna Strickland. And – I forgot his[br]first name – Ashkin, this is one of the
0:13:47.030,0:13:51.720
people that won the Nobel prize in physics[br]with her last year. And also Gérard
0:13:51.720,0:13:57.990
Mourou. That is the third one. They all[br]got the Nobel prize in physics last year.
0:13:57.990,0:14:04.190
So we have all these statements here, and[br]these two have a qualifier that says
0:14:04.190,0:14:10.260
"with: Gérard Mourou" here. And I don't[br]think the qualifier is on this statement
0:14:10.260,0:14:15.160
here, actually, but it doesn't actually[br]matter. So what we've done here is,
0:14:15.160,0:14:21.190
put all the entities in the small graph as[br]rows in the table. So we have Strickland
0:14:21.190,0:14:27.850
and Mourou and Ashkin, and also Arnold and[br]Curie that are not in the picture. But you
0:14:27.850,0:14:33.290
can maybe remember that. And then here we[br]have awarded, and we scaled that by the
0:14:33.290,0:14:37.250
instance of the different Nobel prizes[br]that people have won. So that's the
0:14:37.250,0:14:42.209
physics Nobel in the first column, the[br]chemistry Nobel Prize in the second column
0:14:42.209,0:14:48.380
and just general Nobel prizes in the third[br]column. There's awarded and that is scaled
0:14:48.380,0:14:55.240
by the "with" qualifier, so awarded with[br]Gérard Mourou. And then there's field of
0:14:55.240,0:15:00.450
work, and we have lasers here and[br]radioactivity, so we scale by the actual
0:15:00.450,0:15:06.580
field of work that people have. And well[br]then, if we look at what kind of incidence
0:15:06.580,0:15:11.370
we get for Donna Strickland, she has a[br]Nobel prize in physics and that is also a
0:15:11.370,0:15:17.190
Nobel prize, and she has that together[br]with Mourou. And she has "field of work:
0:15:17.190,0:15:23.220
lasers," but not radioactivity. Then,[br]Mourou himself: he has a Nobel prize in
0:15:23.220,0:15:29.450
physics, and that is a Nobel prize, but[br]none of the others. Ashkin gets the Nobel
0:15:29.450,0:15:33.890
prize in physics, and that is still a[br]Nobel prize, and he gets that with Gérard
0:15:33.890,0:15:40.970
Mourou. And also he works on lasers, but[br]not in radioactivity. So Frances Arnold
0:15:40.970,0:15:47.230
has a Nobel prize in chemistry, and that[br]is a Nobel prize. And Marie Curie, she has
0:15:47.230,0:15:50.510
a Nobel prize in physics and one in[br]chemistry, and they are both a Nobel
0:15:50.510,0:15:55.319
prize. And she also works on[br]radioactivity. But lasers didn't exist
0:15:55.319,0:16:02.490
back then, so she doesn't get "field of[br]work: lasers." And then basically this
0:16:02.490,0:16:10.289
table here is a representation of our[br]formal context. So and then we've actually
0:16:10.289,0:16:14.840
gone ahead and started building a tool[br]where you can interactively do all these
0:16:14.840,0:16:20.320
things, and it will take care of building[br]the context for you. You just put in the
0:16:20.320,0:16:24.540
properties, and Tom will show[br]you how that works.
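The scaling walked through above can be sketched in Python like this. It is only an illustration under stated assumptions: the statement tuples, the `SUBCLASS_OF` map and the function names are made up for the example and are not the tool's data model. Values are scaled by their class and its superclasses, the "with" qualifier becomes its own column, and the "time" qualifier is deliberately ignored.

```python
from collections import defaultdict

# Assumed, hand-written excerpt of the subclass hierarchy.
SUBCLASS_OF = {
    "Nobel Prize in Physics": "Nobel Prize",
    "Nobel Prize in Chemistry": "Nobel Prize",
}

# (subject, property, value, qualifiers), following the example in the talk.
STATEMENTS = [
    ("Donna Strickland", "awarded", "Nobel Prize in Physics",
     {"time": "2018", "with": "Gérard Mourou"}),
    ("Donna Strickland", "field of work", "lasers", {}),
    ("Gérard Mourou", "awarded", "Nobel Prize in Physics", {"time": "2018"}),
    ("Ashkin", "awarded", "Nobel Prize in Physics",
     {"time": "2018", "with": "Gérard Mourou"}),
    ("Ashkin", "field of work", "lasers", {}),
    ("Frances Arnold", "awarded", "Nobel Prize in Chemistry", {"time": "2018"}),
    ("Marie Curie", "awarded", "Nobel Prize in Physics", {"time": "1903"}),
    ("Marie Curie", "awarded", "Nobel Prize in Chemistry", {"time": "1911"}),
    ("Marie Curie", "field of work", "radioactivity", {}),
]


def with_superclasses(value):
    """The value itself followed by all of its known superclasses."""
    chain = [value]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def build_context(statements, scaled_qualifiers=("with",)):
    """Turn statements into rows of a binary table: subject -> set of column names."""
    context = defaultdict(set)
    for subject, prop, value, qualifiers in statements:
        for cls in with_superclasses(value):
            context[subject].add(f"{prop}: {cls}")
        for name in scaled_qualifiers:
            if name in qualifiers:
                context[subject].add(f"{prop} {name}: {qualifiers[name]}")
    return dict(context)


context = build_context(STATEMENTS)
for subject, columns in context.items():
    print(subject, sorted(columns))
```

Which qualifiers and hierarchies to scale by is a per-domain choice, which is why `scaled_qualifiers` is a parameter here rather than fixed.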
0:16:24.540,0:16:29.030
T: So here you see some first screenshots[br]of this tool. So please do not comment on
0:16:29.030,0:16:32.520
the graphic design. We have no idea about[br]that, we have to ask someone about that.
0:16:32.520,0:16:36.120
We're just into logics, more or less. On[br]the left, you see the initial state of the
0:16:36.120,0:16:41.120
game. On the left you have five boxes:[br]they're called countries and borders,
0:16:41.120,0:16:47.370
credit cards, use of energy, memory and[br]computation – I think –, and space
0:16:47.370,0:16:53.180
launches, which are just presets we[br]defined. You can explore, for example, in
0:16:53.180,0:16:57.050
the case of the credit card, you can[br]explore the properties from Wikidata which
0:16:57.050,0:17:02.170
are called "card network," "operator," and[br]"fee," so you can just choose one of them,
0:17:02.170,0:17:05.530
or on the right, "custom properties," you[br]can just input the properties you're
0:17:05.530,0:17:10.640
interested in Wikidata, whatever one of[br]the seven thousand you like, or some
0:17:10.640,0:17:15.140
number of them. On the right, I chose then[br]the credit card thingy and I now want to
0:17:15.140,0:17:21.860
show you what happens if you now explore[br]these properties, right? The first step in
0:17:21.860,0:17:25.750
the game is that the game will ask – I[br]mean, the game, the exploration process –
0:17:25.750,0:17:31.020
will ask, is it true that every entity in[br]Wikidata will have these three properties?
0:17:31.020,0:17:36.360
So are they common among all entities in[br]your data, which is most probably not
0:17:36.360,0:17:41.540
true, right? I mean, not everything in[br]Wikidata has a fee, at least I hope. So,
0:17:41.540,0:17:46.520
what I will do now, I would click the[br]"reject this implication" button, since
0:17:46.520,0:17:51.480
the implication "Nothing implies[br]everything" is not true. In the second
0:17:51.480,0:17:56.360
step now, the algorithm tries to find the[br]minimal number of questions to obtain the
0:17:56.360,0:18:01.820
domain knowledge, so to obtain all valid[br]rules in this domain. So next question is
0:18:01.820,0:18:06.120
"is it true that everything in Wikidata[br]that has a 'card network' property also
0:18:06.120,0:18:12.560
has a 'fee' and an 'operator' property?"[br]And down here you can see Wikidata says
0:18:12.560,0:18:18.110
"ok, there are 26 items which are[br]counterexamples," so there's 26 items in
0:18:18.110,0:18:22.670
Wikidata which have the "card network"[br]property but do not have the other two
0:18:22.670,0:18:28.200
ones. So, 26 is not a big number, this[br]could mean "ok, that's an error, so 26
0:18:28.200,0:18:32.860
statements are missing." Or maybe that[br]that's, really, that's the true case.
0:18:32.860,0:18:36.890
That's also ok. But you can now choose[br]what you think is right. You can say, "oh,
0:18:36.890,0:18:40.470
I would say it should be true" or you can[br]say "no, I think that's ok, one of these
0:18:40.470,0:18:46.380
counterexamples seems valid. Let's reject[br]it." I, in this case, rejected it. The next
0:18:46.380,0:18:51.020
question it asks: "is it true that[br]everything that has an operator has also a
0:18:51.020,0:18:56.290
fee and a card network?" Yeah, this is[br]possibly not true. There's also more than
0:18:56.290,0:19:03.110
1000 counterexamples, one being, I think a[br]telecommunication operator in Hungary or
0:19:03.110,0:19:10.340
something. And so we can reject this as[br]well. Next question: "everything that has
0:19:10.340,0:19:15.360
an operator and a card network – so card[br]network means Visa, MasterCard, whatever,
0:19:15.360,0:19:21.690
all this stuff – is it true that they have[br]to have a fee?" Wikidata says "no," it has
0:19:21.690,0:19:27.570
23 items that contradict it. But one of[br]the items, for example, is the American
0:19:27.570,0:19:32.090
Express Gold Card. I suppose the American[br]Express Gold Card has some fee. So this
0:19:32.090,0:19:36.140
indicates, "oh, there is some missing data[br]in Wikidata," there is something that
0:19:36.140,0:19:40.680
Wikidata does not know but should know to[br]reason correctly in Wikidata with your
0:19:40.680,0:19:46.520
SPARQL queries. So we can now say, "yeah,[br]that's, uh, that's not a reject, that's an
0:19:46.520,0:19:51.470
accept," because we think it should be[br]true. But Wikidata thinks otherwise. And
0:19:51.470,0:19:55.800
you go on, we go on. This is then the last[br]question: "Is it true that everything that
0:19:55.800,0:20:00.950
has a fee and a card network should have an[br]operator," and you see, "oh, no counter
0:20:00.950,0:20:05.930
examples." This means Wikidata says "this[br]is true," because it says there is no
0:20:05.930,0:20:09.580
counterexample. If you're asking Wikidata[br]it says this is a valid implication in the
0:20:09.580,0:20:15.400
data set so far, which could also be[br]indicating that something is missing, I'm
0:20:15.400,0:20:20.310
not aware if this is possible or not, but[br]ok, for me it sounds reasonable. Everyone
0:20:20.310,0:20:23.800
that has a fee and a card network should also[br]have an operator, which means a bank or
0:20:23.800,0:20:29.220
something like that. So I accept this[br]implication. And then, yeah, you have won
0:20:29.220,0:20:34.410
the exploration game, which essentially[br]means you've won some knowledge. Thank
0:20:34.410,0:20:40.300
you. And the knowledge is that you know[br]which implications in Wikidata are true or
0:20:40.300,0:20:44.340
should be true from your point of view.[br]And yeah, this is more or less the state
0:20:44.340,0:20:50.700
of the game so far as we programmed it in[br]October. And the next state will be to
0:20:50.700,0:20:54.970
show you some – "How much does your[br]opinion of the world differ from the
0:20:54.970,0:20:59.950
opinion that is now reflected in the[br]data?" So is what you think about the data
0:20:59.950,0:21:05.430
true, close to true to what is true in[br]Wikidata. Or maybe Wikidata has wrong
0:21:05.430,0:21:10.680
information. You can find it with that.[br]But Max will tell me more about that.
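The counterexample counts in the walkthrough above come from asking Wikidata whether any item has all premise properties but lacks a conclusion property. A hedged Python sketch of how such a question could be phrased as SPARQL follows. The query shape is an assumption and not necessarily what the tool sends, `counterexample_query` is a hypothetical helper, and the property IDs in the usage line are placeholders that would have to be replaced by the real Wikidata IDs.

```python
def counterexample_query(premise, conclusion, limit=10):
    """Build SPARQL selecting items that have every premise property
    but are missing at least one conclusion property."""
    has = "\n  ".join(f"?item wdt:{p} [] ." for p in premise)
    lacks = " ||\n    ".join(
        f"NOT EXISTS {{ ?item wdt:{p} [] }}" for p in conclusion
    )
    return (
        "SELECT DISTINCT ?item WHERE {\n"
        f"  {has}\n"
        "  FILTER(\n"
        f"    {lacks}\n"
        "  )\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Placeholder IDs, not real Wikidata properties: look up the actual
# "card network", "fee" and "operator" property IDs before running this.
print(counterexample_query(["P_CARD_NETWORK"], ["P_FEE", "P_OPERATOR"]))
```

An empty result from such a query corresponds to Wikidata answering "this is true", and any returned item is a counterexample for the human to inspect.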
0:21:10.680,0:21:18.220
M: Ok. So let me just quickly come[br]back to what we have actually done. So we
0:21:18.220,0:21:23.670
offer a procedure that allows you to[br]explore properties in Wikidata and the
0:21:23.670,0:21:30.720
implicational knowledge that holds between[br]these properties. And the key idea here is
0:21:30.720,0:21:34.661
that when you look at these implications[br]that you get, well, there might be some
0:21:34.661,0:21:39.280
that you don't actually want because they[br]shouldn't be true, and there might also be
0:21:39.280,0:21:46.220
ones that you don't get, but you expect to[br]get because they should hold. And these
0:21:46.220,0:21:51.840
unwanted and/or missing implications, they[br]point to missing statements and items in
0:21:51.840,0:21:56.130
Wikidata. So they show you where the[br]opportunities to improve the knowledge in
0:21:56.130,0:22:00.100
Wikidata are, and, well, sometimes you[br]also get to learn something about the
0:22:00.100,0:22:04.080
world, and in most cases, it's that the[br]world is more complicated than you thought
0:22:04.080,0:22:10.260
it was – and that's just how life is. But[br]in general, implications can guide you in
0:22:10.260,0:22:17.220
your way of improving Wikidata and the[br]state of knowledge therein. So what's
0:22:17.220,0:22:22.380
next? Well, so what we currently don't[br]offer in the exploration game and what we
0:22:22.380,0:22:27.710
definitely will focus next on is having[br]configurable counterexamples and also
0:22:27.710,0:22:32.030
filterable counterexamples – right now you[br]just get a list of a random number of
0:22:32.030,0:22:36.880
counterexamples. And you might want to[br]search through this list for something you
0:22:36.880,0:22:42.520
recognise and you might also want to[br]explicitly say, well, this one should be a
0:22:42.520,0:22:48.600
counterexample, and that's definitely[br]coming next. Then, well, domain specific
0:22:48.600,0:22:53.750
scaling of properties, there's still much[br]work to be done. Currently, we only have
0:22:53.750,0:23:00.500
some very basic support for that. So you[br]can have properties, but you can't do the
0:23:00.500,0:23:03.780
fancy things where you say, "well,[br]everything that is an award should be
0:23:03.780,0:23:10.840
considered as one instance of this[br]property." That's also coming and then
0:23:10.840,0:23:15.550
what Tom mentioned already: compare your[br]knowledge that you have explored through
0:23:15.550,0:23:21.610
this process against the knowledge that is[br]currently on Wikidata as a form of seeing
0:23:21.610,0:23:26.540
"where do you stand? What is missing in[br]Wikidata? How can you improve Wikidata?"
0:23:26.540,0:23:32.600
And well, if you have any more suggestions[br]for features, then just tell us. There's a
0:23:32.600,0:23:39.530
Github link on the implication game page.[br]And here's the link to the tool again. So,
0:23:39.530,0:23:46.140
yeah, just let us know. Open an issue and[br]have fun. And if you have any questions,
0:23:46.140,0:23:50.230
then I guess now would be the time to ask.[br]T: Thank you.
0:23:50.230,0:23:52.730
Herald: Thank you very much, Tom and Max.
0:23:52.730,0:23:55.020
applause
0:23:55.020,0:24:01.510
Herald: So we will switch microphones now[br]because then I can hand this microphone to
0:24:01.510,0:24:07.250
you if any of you have a question for our[br]two speakers. Are there any questions or
0:24:07.250,0:24:14.370
suggestions? Yes.[br]Question: Hi. Thanks for the nice talk. I
0:24:14.370,0:24:18.720
wanted to ask what's the first question,[br]what's the most interesting implication
0:24:18.720,0:24:25.020
that you've found?[br]M: Yeah. That would have made for a
0:24:25.020,0:24:31.850
good back up slide. The most interesting[br]implication so far –
0:24:31.850,0:24:36.010
T: The most basic thing you would expect[br]everything that is launched in space by
0:24:36.010,0:24:41.920
humans – no, everything that landed from[br]space, that has a landing date, also has a
0:24:41.920,0:24:46.450
start date. So nothing landed on earth,[br]which was not started here.
0:24:46.450,0:24:55.200
M: Yes.[br]Q: Right now, the game only helps you find
0:24:55.200,0:25:00.710
out implications. Are you also planning to[br]have that I can also add data like for
0:25:00.710,0:25:04.309
example, let's say I have twenty five[br]Nobel laureates who don't have a Nobel
0:25:04.309,0:25:08.220
laureate ID. Is there plans where you[br]could give me a simple interface for me to
0:25:08.220,0:25:12.760
Google and add that ID because it would[br]make the process of adding new entities to
0:25:12.760,0:25:17.400
Wikidata itself more simple.[br]M: Yes. And that's partly hidden
0:25:17.400,0:25:23.050
behind this "configurable and filterable[br]counterexamples" thing. We will probably
0:25:23.050,0:25:28.380
not have an explicit interface for adding[br]stuff, but most likely interface with some
0:25:28.380,0:25:32.270
other tool built around Wikidata, so[br]probably something that will give you
0:25:32.270,0:25:37.100
QuickStatements or something like that.[br]But yes, adding data is definitely on the
0:25:37.100,0:25:41.710
roadmap.[br]Herald: Any more questions? Yes.
0:25:41.710,0:25:48.860
Q: Wouldn't it be nice to do this in other[br]languages, too?
0:25:48.860,0:25:52.600
T: Actually it's language independent, so[br]we use Wikidata and then as far as we
0:25:52.600,0:25:58.110
know, Wikidata has no language itself. You[br]know, it has just items and properties, so
0:25:58.110,0:26:02.640
Qs and Ps, and whatever language you use,[br]it should be translated in the language of
0:26:02.640,0:26:06.180
the properties, if there is a label for[br]that property or for that item that you
0:26:06.180,0:26:12.420
have. So if Wikidata is aware of your[br]language, we are.
0:26:12.420,0:26:15.020
Herald: Oh, yes. More![br]M: Of course, the tool still needs to be
0:26:15.020,0:26:18.360
translated, but –[br]T: The tool itself, it should be.
0:26:18.360,0:26:21.850
Q: Hi, thanks for the talk. I have a[br]question. Right now you only can find
0:26:21.850,0:26:25.990
missing data with this, right? Or surplus[br]data. Would you think you'd be able to
0:26:25.990,0:26:31.560
find wrong information with a similar[br]approach.
0:26:31.560,0:26:37.001
T: Actually, we do. I mean, if Wikidata[br]has a counterexample to something we would
0:26:37.001,0:26:42.830
expect to be true, this could point to[br]wrong data, right? If the counterexample
0:26:42.830,0:26:47.450
is a wrong counterexample. If there is a[br]missing property or missing property to an
0:26:47.450,0:26:58.160
item.[br]Q: Ok, I get to ask a second question. So
0:26:58.160,0:27:06.000
the horizontal axis in the incidence[br]matrix. You said it has 7000, it spans
0:27:06.000,0:27:10.300
7000 columns, right?[br]M: Yes, because there's 7000 properties in
0:27:10.300,0:27:13.850
Wikidata.[br]Q: But it's actually way more columns,
0:27:13.850,0:27:17.849
right? Because you multiply the properties[br]times the arguments, right?
0:27:17.849,0:27:21.360
M: Yes. So if you do any scaling then of[br]course that might give you multiple
0:27:21.360,0:27:23.380
entries.[br]Q: So that's what you mean with scaling,
0:27:23.380,0:27:27.770
basically?[br]M: Yes. But already seven thousand is way
0:27:27.770,0:27:35.580
too big to actually compute that.[br]Q: How many would it be if you multiply
0:27:35.580,0:27:48.060
all the arguments?[br]M: I have no idea, probably a few million.
0:27:48.060,0:27:55.309
Q: Have you thought about a recursive[br]method, as counterexamples may be wrong by
0:27:55.309,0:28:00.350
other counterexamples, like in an[br]argumentative graph or something like
0:28:00.350,0:28:06.708
this?[br]T: Actually, I don't get it. How can a
0:28:06.708,0:28:14.040
counterexample be wrong through another[br]counterexample?
0:28:14.040,0:28:24.450
Q: Maybe some example says that cats can[br]have golden hair and then another example
0:28:24.450,0:28:31.260
might say that this is not a cat.[br]T: Ah, so the property to be a cat or
0:28:31.260,0:28:38.000
something cat-ish is missing then. Okay.[br]No, we have not considered so far deeper
0:28:38.000,0:28:44.570
reasoning. This horn-propositional logic,[br]you know, it has no contradictions,
0:28:44.570,0:28:47.740
because all you can do is you can[br]contradict by counterexamples, but there
0:28:47.740,0:28:52.740
can never be a rule that is not true, so[br]far. Just in your or my opinion, maybe,
0:28:52.740,0:28:56.370
but not in the logic. So what we have to[br]think about is that we have bigger
0:28:56.370,0:29:01.780
reasoning, right? So.[br]Q: Sorry, quick question. Because you're
0:29:01.780,0:29:04.929
not considering all the 7000 odd[br]properties for each of the entities,
0:29:04.929,0:29:07.570
right? What's your current process of[br]filtering? What are the relevant
0:29:07.570,0:29:14.820
properties? I'm sorry, I didn't get that.[br]M: Well, we basically handpick those. So
0:29:14.820,0:29:19.940
you have this input field? Yeah, we can go[br]ahead and select our properties. We also
0:29:19.940,0:29:26.870
have some predefined sets. Okay. And[br]there's also some classes for groups of
0:29:26.870,0:29:30.780
properties that are related that you could[br]use if you want bigger sets,
0:29:30.780,0:29:35.960
T: for example, space or family or what[br]was the other?
0:29:35.960,0:29:43.410
M: Awards is one.[br]T: It depends on the size of the class.
0:29:43.410,0:29:47.390
For example, for space, it's not that[br]much, I think it's 10 or 15 properties. It
0:29:47.390,0:29:51.520
will take you some hours, but you can do[br]it because there are 15 or something like
0:29:51.520,0:29:58.150
that. I think for family, it's way too[br]much, it's like 40 or 50 properties. So a
0:29:58.150,0:30:04.540
lot of questions.[br]Herald: I don't see any more hands. Maybe
0:30:04.540,0:30:09.760
someone who has not asked the question yet[br]has another one we could take that,
0:30:09.760,0:30:14.270
otherwise we would be perfectly on time.[br]And maybe you can tell us where you will
0:30:14.270,0:30:18.860
be for deeper discussions where people can[br]find you.
0:30:18.860,0:30:22.400
T: Probably at the couches.[br]Herald: The couches, behind our stage.
0:30:22.400,0:30:26.720
M: Or just running around somewhere. So[br]there's also our DECT numbers on the
0:30:26.720,0:30:35.960
slides; it's 6284 for Tom and 6279 for me.[br]So just call and ask where we're hanging
0:30:35.960,0:30:38.470
around.[br]H: Well then, thank you again. Have a
0:30:38.470,0:30:40.210
round of applause.[br]applause
0:30:40.210,0:30:42.650
T: Thank you.[br]M: Well, thanks for having us.
0:30:42.650,0:30:45.310
Applause
0:30:45.310,0:30:49.740
postroll music
0:30:49.740,0:31:12.000
subtitles created by c3subtitles.de[br]in the year 2020. Join, and help us!