36C3 preroll music

Herald Angel: We have Tom and Max here. They have a talk here with a very complicated title that I don't quite understand yet. It's called "Interactively Discovering Implicational Knowledge in Wikidata". And they told me the point of the talk is that I would like to understand what it means, and I hope I will. So good luck.

Tom: Thank you very much.

Herald: And have some applause, please.

applause
T: Thank you very much. Do you hear me? Does it work? Hello? Oh, very good. Thank you very much, and welcome to our talk about interactively discovering implicational knowledge in Wikidata. It is more or less a fun project we started for finding rules that are implicit in Wikidata – entailed just by the data it has, that people inserted into the Wikidata database so far. And we will start with the explicit knowledge, so the explicit data in Wikidata, with Max.
M: So, what is Wikidata? Maybe you have heard about Wikidata, then that's all fine. Maybe you haven't, then surely you've heard of Wikipedia. Wikipedia is run by the Wikimedia Foundation, and the Wikimedia Foundation has several other projects, and one of those is Wikidata. Wikidata is basically a large graph that encodes machine-readable knowledge in the form of statements. A statement basically consists of some entities that are connected by some property, and these properties can then even have annotations on them. So, for example, we have Donna Strickland here, and we encode that she received the Nobel prize in physics last year by this property "awarded", which then has a qualifier "time: 2018" and also "for: Chirped Pulse Amplification". All in all, we have some 890 million statements on Wikidata that connect 71 million items using 7000 properties. But there is also a bit more. We also know that Donna Strickland has "field of work: optics" and also "field of work: lasers", so we can use the same property to connect some entity with different other entities. And we don't even have to have knowledge that connects entities: we can have a date of birth, which is 1959, and this is then just a plain date, not an entity.
Now, coming from the explicit knowledge, we have some more: Donna Strickland has received a Nobel prize in physics, and also Marie Curie has received the Nobel prize in physics. We also know that Marie Curie has a Nobel prize ID that starts with "phys" and then "1903" and some random numbers that basically make up this ID. Marie Curie also received a Nobel prize in chemistry in 1911, so she has another Nobel ID that starts with "chem" and has "1911" in it. And then there is also Frances Arnold, who received the Nobel prize in chemistry last year, so she has a Nobel ID that starts with "chem" and has "2018" in it. Now, one could assume that everybody who was awarded the Nobel prize should also have a Nobel prize ID, and we could write that as an implication: "awarded(nobelPrize)" implies "nobelID". And if you look sharply at this picture, there is an arrow conspicuously missing: Donna Strickland doesn't have a Nobel prize ID. Indeed, there are currently 25 people on Wikidata that are missing Nobel prize IDs, and Donna Strickland is one of them. We call these people that don't satisfy the implication counterexamples. And if you look at Wikidata on the scale of all 890 million statements, you won't find such counterexamples by hand, because it's just too big. So we need some way to do that automatically.
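As a concrete illustration of how such counterexamples can be found automatically – this sketch is ours, not part of the talk – one can ask Wikidata's public SPARQL endpoint directly. The identifiers used here (P166 "award received", P3188 "Nobel prize ID", Q7191 "Nobel Prize") are assumptions quoted from memory and should be verified on wikidata.org:

```python
# Sketch only: find people awarded a Nobel prize (P166) that lack
# a Nobel prize ID (P3188). P166/P3188/Q7191 are assumed IDs.
import requests

QUERY = """
SELECT DISTINCT ?person ?personLabel WHERE {
  ?person wdt:P166 ?award .                      # received some award ...
  ?award wdt:P31/wdt:P279* wd:Q7191 .            # ... that is a Nobel prize
  FILTER NOT EXISTS { ?person wdt:P3188 ?id . }  # ... but has no Nobel prize ID
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "implication-exploration-sketch/0.1"},
    timeout=60,
)
for row in resp.json()["results"]["bindings"]:
    print(row["personLabel"]["value"])  # e.g. the 25 laureates missing an ID
```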
The idea is that if we had this knowledge that some implications are not satisfied, then this points to missing or wrong information, and we want to represent it in a way that is easy to understand and also succinct. It shouldn't take long to write it down; it should have a short representation. That rules out anything involving complex syntax or logical quantifiers: no SPARQL queries as a description of that implicit knowledge, and no description logics, if you've heard of those. We also want something that we can actually compute on actual hardware in a reasonable timeframe. So our approach is to use Formal Concept Analysis, a technique that has been developed over the past several decades to extract what are called propositional implications – just logical formulas of propositional logic that are an implication of the form "awarded(nobelPrize)" implies "nobelID". So what exactly is Formal Concept Analysis? Off to Tom.
T: Thank you. So what is Formal Concept Analysis? It was developed in the 1980s by Rudolf Wille and Bernhard Ganter, who were restructuring lattice theory. "Lattice" is an ambiguous name in math; it has two meanings. One meaning is that you have a grid and have a lattice there. The other is to speak about orders – order relations. So: I like steaks, I like pudding, and I like steaks more than pudding, and I like rice more than steaks. That's an order, right? And lattices are particular orders which can be used to represent propositional logic – easy rules like "when it rains, the street gets wet". The data representation those guys used back then, they called it a formal context, which is basically just a set of objects – they call them objects, it's just a name –, a set of attributes, and some incidence, which basically says which object has which attributes. For example, my laptop has the colour black, so this object has some property. That's a small example on the right for such a formal context. The objects there are some animals: a platypus – that's the fun animal from Australia, the mammal which also lays eggs and is venomous –, a black widow – the spider –, the duck and the cat. So we see the platypus has all the properties: being venomous, laying eggs and being a mammal; we have the duck, which is not a mammal, but it lays eggs, and so on. And it's very easy to grasp some implicational knowledge here. An easy rule you can find is: whenever you encounter a mammal that is venomous, it has to lay eggs. This is a rule that falls out of this binary data table.
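To make the formal-context idea concrete, here is a minimal sketch (ours, not from the talk) of the animals example, together with the check that an implication holds in the table:

```python
# The animals context from the slide, as object -> attribute sets.
context = {
    "platypus":    {"mammal", "venomous", "lays eggs"},
    "black widow": {"venomous", "lays eggs"},
    "duck":        {"lays eggs"},
    "cat":         {"mammal"},
}

def holds(premise, conclusion, ctx):
    """An implication premise -> conclusion holds iff every object
    that has all premise attributes also has all conclusion attributes."""
    return all(conclusion <= attrs
               for attrs in ctx.values()
               if premise <= attrs)

# "Whenever you encounter a venomous mammal, it lays eggs."
print(holds({"mammal", "venomous"}, {"lays eggs"}, context))  # True
```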
Our main problem at this point is that we do not have such a data table for Wikidata. We have the Wikidata graph, which is way more expressive than binary data, and we cannot even store Wikidata as a binary table – and even if we tried to, we would have no chance to compute such rules from it. For this, the people from Formal Concept Analysis proposed an algorithm to extract implicit knowledge from an expert. Our expert here could be Wikidata: it is an expert, and you can ask it questions using the SPARQL interface. You can ask: "Is there an example for that? Is there a counterexample for something else?" The algorithm is quite easy. There is the algorithm and some expert – in our case, Wikidata – and the algorithm keeps notes of counterexamples and of valid implications. In the beginning, we do not have any valid implications, so the list on the right is empty, and we do not have any counterexamples, so the list on the left – the formal context to build up – is also empty. All the algorithm does now is ask: "Is this implication, X implies Y, true?" So, "is it true," for example, "that an animal that is a mammal and is venomous lays eggs?" Now the expert, which in our case is Wikidata, can answer. We can query that – we showed in our paper that we can query that. So we query it, and if the Wikidata expert does not find any counterexamples, it will say: "Ok, that may well be true, yes." Or, if it is not a true implication in Wikidata, it can say: "No, it's not true, and here is a counterexample." So this is something you contradict by example. You say this rule cannot be true: for example, when the street is wet, that does not mean it has rained – it could be the cleaning service car or something else.
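As a rough sketch of this question-and-answer loop – ours, and deliberately simplified: the real algorithm derives each next question canonically, which we skip here – the shape is roughly:

```python
# Simplified sketch of attribute exploration: for each asked
# implication, the expert either accepts it or refutes it with a
# counterexample, which then grows the formal context.
def explore(questions, expert):
    """questions: iterable of (premise, conclusion) attribute sets.
    expert(premise, conclusion) returns None to accept, or a
    counterexample (object, attributes) to refute."""
    implications = []        # accepted rules
    counterexamples = {}     # object -> attributes, the growing context
    for premise, conclusion in questions:
        answer = expert(premise, conclusion)
        if answer is None:                    # "yes, that is true"
            implications.append((premise, conclusion))
        else:                                 # "no, here is a witness"
            obj, attrs = answer
            counterexamples[obj] = attrs
    return implications, counterexamples
```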
So our idea now was to use Wikidata as an expert, but also to include a human in this loop. We do not just want to ask Wikidata, we want to ask a human expert as well. So in our tool we first ask the Wikidata expert about some rule, and after that we also inquire of the human expert, who can say "yes, that's true, I know that," or "no, Wikidata is not aware of this counterexample, I know one," or, in the other case, "oh, Wikidata says this is true, but I am aware of a counterexample," and so on. You can represent this more or less – this is just some mathematical picture, it's not very important. But you can see on the left an exploration going on, just Wikidata with the algorithm, and on the right an exploration with a human expert versus Wikidata, which can answer all the queries. And we combined those two into one small tool, still under development. So, back to Max.
M: Okay. For that to work, we basically need a way of viewing Wikidata, or at least parts of Wikidata, as a formal context. And this formal context was a binary table, so what do we do? We just take all the items in Wikidata as objects and all the properties as attributes of our context, and then have an incidence relation that says "this entity has this property". But then we end up with a context that has 71 million rows and seven thousand columns. That might actually be a slight problem, because we want something we can run on actual hardware and not on a supercomputer. So let's not do that, and instead focus on a smaller set of properties that are actually related to one another through some kind of common domain. It doesn't make any sense to have a property that relates to spacecraft next to a property that relates to books – it's probably not a good idea to try to find implicit knowledge between those two. But two different properties about spacecraft, that sounds good, right? The interesting question is then how we define the incidence for our set of properties, and that depends very much on which properties we choose. For some properties, it makes sense to account for the direction of the statement: there is a property called child, and then there are father and mother, and you don't want to turn those around – "A is a child of B" should be something different from "B is a child of A". Then there are the qualifiers, which might be important for some properties: receiving an award for something might be different from receiving an award for something else, but receiving an award in 2018 and receiving one in 2017 is probably more or less the same thing, so we don't necessarily need to differentiate there. And there is also a thing called subclasses, which form a hierarchy on Wikidata, and you might want to take that into account as well, because winning something that is a Nobel prize also means winning an award, and winning the Nobel Peace prize means winning a peace prize. So there are also implications going on there that you want to respect.
So, to see how we actually do that, let's look at an example. We have here Donna Strickland, and Ashkin – I forgot his first name –, one of the people who won the Nobel prize in physics with her last year, and also Gérard Mourou, who is the third one. They all got the Nobel prize in physics last year. So we have all these statements here, and these two have a qualifier that says "with: Gérard Mourou". I don't think the qualifier is on this statement here, actually, but it doesn't matter. What we've done here is put all the entities in the small graph as rows in the table. So we have Strickland and Mourou and Ashkin, and also Arnold and Curie, who are not in the picture, but you can maybe remember them. Then here we have "awarded", and we scaled that by the instance of the different Nobel prizes that people have won: the physics Nobel prize in the first column, the chemistry Nobel prize in the second column, and general Nobel prizes in the third column. Then there is "awarded" scaled by the "with" qualifier, so "awarded with: Gérard Mourou". And then there is field of work, where we have lasers and radioactivity, so we scale by the actual field of work that people have. And then, if we look at what kind of incidence we get: Donna Strickland has a Nobel prize in physics, which is also a Nobel prize, and she has that together with Mourou, and she has "field of work: lasers", but not radioactivity. Mourou himself has a Nobel prize in physics, which is a Nobel prize, but none of the others. Ashkin gets the Nobel prize in physics, which is still a Nobel prize, and he gets that together with Gérard Mourou, and he also works on lasers, but not on radioactivity. Frances Arnold has a Nobel prize in chemistry, which is a Nobel prize. And Marie Curie has a Nobel prize in physics and one in chemistry, and they are both a Nobel prize, and she also works on radioactivity – but lasers didn't exist back then, so she doesn't get "field of work: lasers". And then basically this table is a representation of our formal context.
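Rendered as data, the scaled context just described might look like this – a sketch using our own shorthand for the scaled column names, not Wikidata identifiers:

```python
# The scaled Nobel context from the example, as object -> attribute sets.
scaled_context = {
    "Strickland": {"awarded:physicsNobel", "awarded:nobelPrize",
                   "awardedWith:Mourou", "fieldOfWork:lasers"},
    "Mourou":     {"awarded:physicsNobel", "awarded:nobelPrize"},
    "Ashkin":     {"awarded:physicsNobel", "awarded:nobelPrize",
                   "awardedWith:Mourou", "fieldOfWork:lasers"},
    "Arnold":     {"awarded:chemistryNobel", "awarded:nobelPrize"},
    "Curie":      {"awarded:physicsNobel", "awarded:chemistryNobel",
                   "awarded:nobelPrize", "fieldOfWork:radioactivity"},
}

# Subclass scaling makes "physics Nobel implies Nobel prize" hold
# by construction in this slice of the data:
print(all({"awarded:nobelPrize"} <= attrs
          for attrs in scaled_context.values()
          if {"awarded:physicsNobel"} <= attrs))  # True
```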
So we've actually gone ahead and started building a tool where you can interactively do all these things, and it will take care of building the context for you. You just put in the properties, and Tom will show you how that works.
T: Here you see some first screenshots of this tool. Please do not comment on the graphic design – we have no idea about that, we'd have to ask someone. We're just into logics, more or less. On the left, you see the initial state of the game: there are five boxes, called countries and borders, credit cards, use of energy, memory and computation – I think –, and space launches, which are just presets we defined. For example, in the case of the credit cards, you can explore the properties from Wikidata which are called "card network", "operator" and "fee". So you can just choose one of the presets, or, under "custom properties" on the right, you can input whatever properties from Wikidata you are interested in – whichever of the seven thousand you like, or some number of them. On the right, I chose the credit card preset, and I now want to show you what happens if you explore these properties. The first step is that the game – I mean, the exploration process – will ask: is it true that every entity in Wikidata has these three properties? Are they common among all entities in your data? This is most probably not true – I mean, not everything in Wikidata has a fee, at least I hope. So what I do now is click the "reject this implication" button, since the implication "nothing implies everything" is not true. In the second step, the algorithm tries to find the minimal number of questions needed to obtain the domain knowledge, that is, all valid rules in this domain. So the next question is: "Is it true that everything in Wikidata that has a 'card network' property also has a 'fee' and an 'operator' property?" And down here you can see that Wikidata says: "Ok, there are 26 items which are counterexamples." So there are 26 items in Wikidata which have the "card network" property but do not have the other two. Now, 26 is not a big number. This could mean: "Ok, that's an error, so 26 statements are missing." Or maybe that is really the true state of affairs – that's also ok. But you can now choose what you think is right. You can say "oh, I would say it should be true", or you can say "no, I think that's ok, one of these counterexamples seems valid, let's reject it." I, in this case, rejected it. The next question it asks: "Is it true that everything that has an operator also has a fee and a card network?" This is quite possibly not true; there are also more than 1000 counterexamples, one being, I think, a telecommunication operator in Hungary or something. So we can reject this as well. Next question: everything that has an operator and a card network – card network means Visa, MasterCard, all this stuff – is it true that it has to have a fee? Wikidata says "no": it has 23 items that contradict it. But one of the items, for example, is the American Express Gold Card, and I suppose the American Express Gold Card has some fee. So this indicates: "Oh, there is some missing data in Wikidata" – there is something that Wikidata does not know but should know to reason correctly in Wikidata with your SPARQL queries. So we can now say: "That's not a reject, that's an accept," because we think it should be true, even though Wikidata thinks otherwise. And we go on. This is then the last question: "Is it true that everything that has a fee and a card network should have an operator?" And you see: no counterexamples. This means Wikidata says "this is true", because it finds no counterexample; if you ask Wikidata, this is a valid implication in the data set so far. That could also indicate that something is missing – I'm not aware if this is possible or not, but for me it sounds reasonable: everything that has a fee and a card network should also have an operator, which means a bank or something like that. So I accept this implication. And then, yeah, you have won the exploration game, which essentially means you have won some knowledge. Thank you. And the knowledge is that you know which implications in Wikidata are true or should be true from your point of view. This is more or less the state of the game so far as we programmed it in October. The next step will be to show you how much your opinion of the world differs from the opinion that is now reflected in the data: is what you think about the data close to what is true in Wikidata? Or maybe Wikidata has wrong information – you can find that out with this. But Max will tell you more about that.
M: Ok. Let me just quickly come back to what we have actually done. We offer a procedure that allows you to explore properties in Wikidata and the implicational knowledge that holds between these properties. The key idea here is that when you look at the implications you get, there might be some that you don't actually want, because they shouldn't be true, and there might also be ones that you don't get but expect to get, because they should hold. These unwanted and/or missing implications point to missing statements and items in Wikidata. So they show you where the opportunities to improve the knowledge in Wikidata are, and sometimes you also get to learn something about the world – in most cases, that the world is more complicated than you thought it was, and that's just how life is. But in general, implications can guide you in improving Wikidata and the state of knowledge therein. So what's next? What we currently don't offer in the exploration game, and what we will definitely focus on next, is configurable and filterable counterexamples. Right now you just get a list of a random number of counterexamples; you might want to search through this list for something you recognise, and you might also want to explicitly say "this one should be a counterexample". That's definitely coming next. Then, for domain-specific scaling of properties, there is still much work to be done. Currently we only have some very basic support for that: you can have properties, but you can't do the fancy things where you say "everything that is an award should be considered as one instance of this property". That's also coming. And then there is what Tom mentioned already: comparing the knowledge that you have explored through this process against the knowledge that is currently on Wikidata, as a way of seeing where you stand, what is missing in Wikidata, and how you can improve it. And if you have any more suggestions for features, just tell us – there is a GitHub link on the implication game page, and here is the link to the tool again. So, yeah, just let us know, open an issue and have fun. And if you have any questions, then I guess now would be the time to ask.

T: Thank you.

Herald: Thank you very much, Tom and Max.

applause
Herald: So we will switch microphones now, because then I can hand this microphone to you if any of you have a question for our two speakers. Are there any questions or suggestions? Yes.

Question: Hi, thanks for the nice talk. I wanted to ask: what is the most interesting implication that you've found?

M: Yeah, that would have made for a good backup slide. The most interesting implication so far –

T: The most basic thing you would expect: everything that landed from space – that has a landing date – also has a start date. So nothing landed on Earth which was not started here.

M: Yes.
Q: Right now, the game only helps you find implications. Are you also planning to let me add data? For example, say I have twenty-five Nobel laureates who don't have a Nobel prize ID – are there plans for a simple interface where I could google and add that ID? It would make the process of adding new entities to Wikidata itself simpler.

M: Yes. That's partly hidden behind this "configurable and filterable counterexamples" thing. We will probably not have an explicit interface for adding stuff, but most likely we will interface with some other tool built around Wikidata, probably something that gives you QuickStatements or the like. But yes, adding data is definitely on the roadmap.
Herald: Any more questions? Yes.

Q: Wouldn't it be nice to do this in other languages, too?

T: Actually, it's language independent. We use Wikidata, and as far as we know, Wikidata has no language itself: it just has items and properties, Qs and Ps, and whatever language you use, the questions should be translated using the labels of those properties and items, if there is a label for that property or item. So if Wikidata is aware of your language, we are.

Herald: Oh, yes. More!

M: Of course, the tool itself still needs to be translated, but –

T: The tool itself, it should be.
Q: Hi, thanks for the talk. I have a question. Right now you can only find missing data with this, right? Or surplus data. Do you think you'd be able to find wrong information with a similar approach?

T: Actually, we do. If Wikidata has a counterexample to something we would expect to be true, this could point to wrong data, right? If the counterexample is a wrong counterexample, or if a property or statement is missing on an item.
Q: Ok, I get to ask a second question. The horizontal axis in the incidence matrix – you said it spans 7000 columns, right?

M: Yes, because there are 7000 properties in Wikidata.

Q: But it's actually way more columns, right? Because you multiply the properties by their arguments.

M: Yes, if you do any scaling, then of course that can give you multiple entries per property.

Q: So that's what you mean by scaling, basically?

M: Yes. But already seven thousand is way too big to actually compute with.

Q: How many would it be if you multiplied out all the arguments?

M: I have no idea, probably a few million.
Q: Have you thought about a recursive method, as counterexamples may be wrong by other counterexamples, like in an argumentation graph or something like this?

T: Actually, I don't get it. How can a counterexample be wrong through another counterexample?

Q: Maybe some example says that cats can have golden hair, and then another example might say that this is not a cat.

T: Ah, so the property of being a cat, or something cat-ish, is missing then. Okay. No, we have not considered deeper reasoning so far. This is Horn propositional logic, you know; it has no contradictions, because all you can do is contradict by counterexamples. There can never be a rule that is not true so far – just in your or my opinion, maybe, but not in the logic. So what we would have to think about is bigger reasoning.
Q: Sorry, quick question. You're not considering all the 7000-odd properties for each of the entities, right? What is your current process of filtering out the relevant properties? I'm sorry, I didn't get that.

M: Well, we basically handpick those. You have this input field where you can go ahead and select your properties. We also have some predefined sets. And there are also some classes for groups of related properties that you could use if you want bigger sets.

T: For example, space or family, or – what was the other?

M: Awards is one.

T: It depends on the size of the class. For example, for space it's not that much, I think 10 or 15 properties. It will take you some hours, but you can do it, because it's only 15 or so. I think for family it's way too much, something like 40 or 50 properties, so a lot of questions.
Herald: I don't see any more hands. Maybe someone who has not asked a question yet has another one – we could take that. Otherwise we would be perfectly on time. And maybe you can tell us where you will be for deeper discussions, where people can find you.

T: Probably at the couches.

Herald: The couches, behind our stage.

M: Or just running around somewhere. There are also our DECT numbers on the slides: it's 6284 for Tom and 6279 for me. So just call and ask where we're hanging around.

Herald: Well then, thank you again. Have a round of applause.

applause

T: Thank you.

M: Well, thanks for having us.

applause

postroll music

subtitles created by c3subtitles.de in the year 2020. Join, and help us!