
36C3 Wikipaka WG: Interactively Discovering Implicational Knowledge in Wikidata

  • 0:00 - 0:19
    36C3 preroll music
  • 0:19 - 0:24
    Herald Angel: We have Tom and Max here.
    They have a talk here with a very
  • 0:24 - 0:28
    complicated title that I don't quite
    understand yet. It's called "Interactively
  • 0:28 - 0:36
    Discovering Implicational Knowledge in
    Wikidata. And they told me the point of
  • 0:36 - 0:39
    the talk is that I would like to
    understand what it means and I hope I
  • 0:39 - 0:42
    will. So good luck.
    Tom: Thank you very much.
  • 0:42 - 0:44
    Herald: And have some applause, please.
  • 0:44 - 0:48
    applause
  • 0:48 - 0:55
    T: Thank you very much. Do you hear me?
    Does it work? Hello? Oh, very good. Thank
  • 0:55 - 0:59
    you very much and welcome to our talk
    about interactively discovering
  • 0:59 - 1:05
    implicational knowledge in Wikidata. It
    is more or less a fun project we started
  • 1:05 - 1:11
    for finding rules that are implicit in
    Wikidata – entailed just by the data it
  • 1:11 - 1:19
    has, that people inserted into the
    Wikidata database so far. And we will
  • 1:19 - 1:24
    start with the explicit knowledge. So the
    explicit data in Wikidata, with Max.
  • 1:24 - 1:28
    Max: So. Right. What is Wikidata?
    Maybe you have heard about Wikidata, then
  • 1:28 - 1:33
    that's all fine. Maybe you haven't, then
    surely you've heard of Wikipedia. And
  • 1:33 - 1:37
    Wikipedia is run by the Wikimedia
    Foundation and the Wikimedia Foundation
  • 1:37 - 1:41
    has several other projects. And one of
    those is Wikidata. And Wikidata is
  • 1:41 - 1:45
    basically a large graph that encodes
    machine readable knowledge in the form of
  • 1:45 - 1:52
    statements. And a statement basically
    consists of some entity that is connected
  • 1:52 - 1:58
    – or some entities that are connected
    by some property. And these properties
  • 1:58 - 2:03
    can then even have annotations on them.
    So, for example, we have Donna Strickland
  • 2:03 - 2:09
    here and we encode that she has received a
    Nobel prize in physics last year by this
  • 2:09 - 2:16
    property "awarded" and this has then a
    qualifier "time: 2018" and also "for:
  • 2:16 - 2:23
    Chirped Pulse Amplification". And all in
    all, we have some 890 million statements
  • 2:23 - 2:32
    on Wikidata that connect 71 million items
    using 7000 properties. But there's also a
  • 2:32 - 2:37
    bit more. So we also know that Donna
    Strickland has "field of work: optics" and
  • 2:37 - 2:41
    also "field of work: lasers" so we can use
    the same property to connect some entity
  • 2:41 - 2:46
    with different other entities. And we
    don't even have to have knowledge that
  • 2:46 - 2:57
    connects the entities. We can have a date
    of birth, which is 1959. Nineteen ninety.
  • 2:57 - 3:06
    No. Nineteen fifty nine. Yes. And this is
    then just a plain date, not an entity. And
  • 3:06 - 3:12
    now coming from the explicit knowledge
    then, well, we have some more we have
  • 3:12 - 3:16
    Donna Strickland has received a Nobel
    prize in physics and also Marie Curie has
  • 3:16 - 3:21
    received the Nobel prize in physics. And
    we also know that Marie Curie has a Nobel
  • 3:21 - 3:28
    prize ID that starts with "phys" and then
    "1903" and some random numbers that
  • 3:28 - 3:33
    basically are this ID. Then Marie Curie
    also has received a Nobel prize in
  • 3:33 - 3:39
    chemistry in 1911. So she has another
    Nobel ID that starts with "chem" and has
  • 3:39 - 3:44
    "1911" there. And then there's also
    Frances Arnold, who received the Nobel
  • 3:44 - 3:49
    prize in chemistry last year. So she has a
    Nobel ID that starts with "chem" and has
  • 3:49 - 3:55
    "2018" there. And now one one could assume
    that, well, everybody who was awarded the
  • 3:55 - 4:00
    Nobel prize should also have a Nobel ID.
    So everybody who was awarded the Nobel
  • 4:00 - 4:06
    prize should also have a Nobel prize ID,
    and we could write that as some
  • 4:06 - 4:12
    implication here. So "awarded(nobelPrize)"
    implies "nobelID". And well, if you
  • 4:12 - 4:16
    look closely at this picture, then there's
    this arrow here conspicuously missing:
  • 4:16 - 4:23
    Donna Strickland doesn't have a Nobel
    prize ID. And indeed, there's 25 people
  • 4:23 - 4:27
    currently on Wikidata that are missing
    Nobel prize IDs, and Donna Strickland is
  • 4:27 - 4:34
    one of them. So we call these people that
    don't satisfy this implication – we call
  • 4:34 - 4:40
    those counterexamples and well, if you
    look at Wikidata on the scale of really
  • 4:40 - 4:45
    these 890 million statements, then you
    won't find any counterexamples because
  • 4:45 - 4:53
    it's just too big. So we need some way to
    automatically do that. And the idea is
  • 4:53 - 4:59
    that, well, if we had this knowledge that
    well, some implications are not satisfied,
  • 4:59 - 5:04
    then this encodes maybe missing
    information or wrong information, and we
  • 5:04 - 5:11
    want to represent that in a way that is
    easy to understand and also succinct. So
  • 5:11 - 5:16
    it doesn't take long to write it down, it
    should have a short representation. So
  • 5:16 - 5:23
    that rules out anything involving complex
    syntax or logical quantifiers. So no SPARQL
  • 5:23 - 5:27
    queries as a description of that implicit
    knowledge. No description logics, if
  • 5:27 - 5:33
    you've heard of that. And we also want
    something that we can actually compute on
  • 5:33 - 5:42
    actual hardware in a reasonable timeframe.
    So our approach is we use Formal Concept
  • 5:42 - 5:47
    Analysis, which is a technique that has
    been developed over the past several years
  • 5:47 - 5:52
    to extract what is called propositional
    implications. So just logical formulas of
  • 5:52 - 5:56
    propositional logic that are an
    implication in the form of this
  • 5:56 - 6:03
    "awarded(nobelPrize)" implies "nobleID".
    So what exactly is Formal Concept
  • 6:03 - 6:08
    Analysis? Off to Tom.
    T: Thank you. So what is Formal Concept
  • 6:08 - 6:14
    Analysis? It was developed in the 1980s by two
    guys called Rudolf Wille and Bernhard Ganter
  • 6:14 - 6:19
    and they were restructuring lattice
    theory. Lattice theory is an ambiguous
  • 6:19 - 6:23
    name in math, it has two meanings: One
    meaning is you have a grid and have a
  • 6:23 - 6:29
    lattice there. The other thing is to speak
    about orders – order relations. So I like
  • 6:29 - 6:34
    steaks, I like pudding and I like steaks
    more than pudding. And I like rice more
  • 6:34 - 6:41
    than steaks. That's an order, right? And
    lattices are particular orders which can
  • 6:41 - 6:47
    be used to represent propositional logic.
    So easy rules like "when it rains, the
  • 6:47 - 6:53
    street gets wet", right? So and the data
    representation those guys used back then,
  • 6:53 - 6:57
    they called it a formal context, which is
    basically just a set of objects – they
  • 6:57 - 7:02
    call them objects, it's just a name –, a
    set of attributes and some incidence,
  • 7:02 - 7:08
    which basically means which object does
    have which attributes. So, for example, my
  • 7:08 - 7:13
    laptop has the colour black. So this
    object has some property, right? So that's
  • 7:13 - 7:18
    a small example on the right for such a
    formal context. So the objects there are
  • 7:18 - 7:24
    some animals: a platypus – that's the fun
    animal from Australia, the mammal which is
  • 7:24 - 7:30
    also laying eggs and which is also
    venomous –, a black widow – the spider –,
  • 7:30 - 7:35
    the duck and the cat. So we see, the
    platypus has all the properties; it has
  • 7:35 - 7:40
    being venomous, laying eggs and being a
    mammal; we have the duck, which is not a
  • 7:40 - 7:44
    mammal, but it lays eggs, and so on and so
    on. And it's very easy to grasp some
  • 7:44 - 7:49
    implicational knowledge here. An easy rule
    you can find is whenever you encounter a
  • 7:49 - 7:54
    mammal that is venomous, it has to lay
    eggs. So this is a rule that falls out of
  • 7:54 - 8:00
    this binary data table. Our main problem
    then or at this point is we do not have
  • 8:00 - 8:03
    such a data table for Wikidata, right? We
    have the implicit graph, which is way more
  • 8:03 - 8:09
    expressive than binary data, and we cannot
    even store Wikidata as a binary table.
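
To make the formal context idea concrete, here is a small Python sketch of the animal example from the slide; the table and the implication check are purely illustrative and are not the speakers' code.

    # A formal context as a binary table: objects (animals), attributes,
    # and which object has which attribute, as in the slide's example.
    context = {
        "platypus":    {"mammal", "venomous", "lays eggs"},
        "black widow": {"venomous", "lays eggs"},
        "duck":        {"lays eggs"},
        "cat":         {"mammal"},
    }

    def violations(premise, conclusion):
        # objects that have every premise attribute but miss a conclusion attribute
        return [obj for obj, attrs in context.items()
                if premise <= attrs and not conclusion <= attrs]

    print(violations({"mammal", "venomous"}, {"lays eggs"}))  # [] -> the rule holds
    print(violations({"lays eggs"}, {"mammal"}))              # duck and black widow
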
  • 8:09 - 8:14
    Even if you tried to, we have no chance to
    compute such rules from that. And for
  • 8:14 - 8:21
    this, the people from Formal Concept
    Analysis proposed an algorithm to extract
  • 8:21 - 8:27
    implicit knowledge from an expert. So our
    expert here could be Wikidata. It's an
  • 8:27 - 8:31
    expert, you can ask Wikidata questions,
    right? Using this SPARQL interface, you
  • 8:31 - 8:35
    can ask. You can ask "Is there an example
    for that? Is there a counterexample for
  • 8:35 - 8:40
    something else?" So the algorithm is quite
    easy. The setup is: the algorithm and
  • 8:40 - 8:45
    some expert – in our case, Wikidata –, and
    the algorithm keeps notes for
  • 8:45 - 8:49
    counterexamples and keeps notes for valid
    implications. So in the beginning, we do
  • 8:49 - 8:54
    not have any valid implications, so this
    list on the right is empty, and in the
  • 8:54 - 8:57
    beginning we do not have any
    counterexamples. So the list on the left,
  • 8:57 - 9:02
    the formal context to build up is also
    empty. And all the algorithm does now is,
  • 9:02 - 9:09
    it asks "is this implication, X follows Y,
    Y follows X or X implies Y, is it true?"
  • 9:09 - 9:14
    So "is it true," for example, "that an
    animal that is a mammal and is venomous
  • 9:14 - 9:19
    lays eggs?" So now the expert, which in
    our case is Wikidata, can answer it. We
  • 9:19 - 9:25
    can query that. We showed in our paper we
    can query that. So we query it, and if the
  • 9:25 - 9:28
    Wikidata expert does not find any
    counterexamples, it will say, ok, that's
  • 9:28 - 9:36
    maybe a true, true thing; it's yes. Or if
    it's not a true implication in Wikidata,
  • 9:36 - 9:42
    it can say, no, no, no, it's not true, and
    here's a counterexample. So this is
  • 9:42 - 9:49
    something you contradict by example. You
    say this rule cannot be true. For example,
  • 9:49 - 9:53
    when the street is wet, that does not mean
    it has rained, right? It could be the
  • 9:53 - 10:01
    cleaning service car or something else. So
    our idea now was to use Wikidata as an
  • 10:01 - 10:06
    expert, but also include a human into this
    loop. So we do not just want to ask
  • 10:06 - 10:12
    Wikidata, we also want to ask a human
    expert as well. So we first ask in our
  • 10:12 - 10:19
    tool the Wikidata expert for some rule.
    After that, we also inquire the human
  • 10:19 - 10:22
    expert. And he can also say "yeah, that's
    true, I know that," or "No, no. Wikidata
  • 10:22 - 10:27
    is not aware of this counterexample, I
    know one." Or, in the other case "oh,
  • 10:27 - 10:33
    Wikidata says this is true. I am aware of
    a counterexample." Yeah, and so on and so
  • 10:33 - 10:38
    on. And you can represent this more or
    less – this is just some mathematical
  • 10:38 - 10:42
    picture, it's not very important. But you
    can see on the left there's an exploration
  • 10:42 - 10:47
    going on, just Wikidata with the
    algorithm, on the right an exploration, a
  • 10:47 - 10:51
    human expert versus Wikidata which can
    answer all the queries. And we combined
  • 10:51 - 10:58
    those two into one small tool, still under
    development. So, back to Max.
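
As a rough sketch of the exploration loop just described, the Python below shows only the control flow, with the question-generation step (in the real algorithm, Ganter's next-closure computation) and the experts left as parameters; it is an illustration, not the actual tool.

    # Attribute exploration, schematically: an expert (here Wikidata, a
    # human, or both chained together) either accepts a candidate
    # implication or supplies a counterexample, which is added to the
    # growing formal context.
    def explore(next_candidate, ask_expert):
        context = {}    # counterexamples seen so far: object -> set of attributes
        accepted = []   # implications confirmed so far: (premise, conclusion) pairs
        while True:
            candidate = next_candidate(context, accepted)
            if candidate is None:          # no open questions remain
                return context, accepted
            premise, conclusion = candidate
            answer = ask_expert(premise, conclusion)
            if answer is None:             # expert accepts the implication
                accepted.append((premise, conclusion))
            else:                          # expert rejects it with a counterexample
                obj, attrs = answer
                context[obj] = attrs
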
  • 10:58 - 11:03
    M: Okay. So, for that to work, we
    basically need to have a way of viewing
  • 11:03 - 11:08
    Wikidata, or at least parts of Wikidata,
    as a formal context. And this formal
  • 11:08 - 11:14
    context, well, this was a binary table, so
    what do we do? We just take all the items
  • 11:14 - 11:19
    in Wikidata as objects and all the
    properties as attributes of our context
  • 11:19 - 11:24
    and then have an incidence relation that
    says "well, this entity has this
  • 11:24 - 11:31
    property," so it is incident there, and
    then we end up with a context that has 71
  • 11:31 - 11:36
    million rows and seven thousand columns.
    So, well, that might actually be a slight
  • 11:36 - 11:40
    problem there, because we want to have
    something that we can run on actual
  • 11:40 - 11:46
    hardware and not on a supercomputer. So
    let's maybe not do that and focus on
  • 11:46 - 11:51
    a smaller set of properties that are
    actually related to one another through
  • 11:51 - 11:56
    some kind of common domain, yeah? So it
    doesn't make any sense to have a property
  • 11:56 - 12:00
    that relates to spacecraft and then a
    property that relates to books – that's
  • 12:00 - 12:05
    probably not a good idea to try to find
    implicit knowledge between those two. But
  • 12:05 - 12:10
    two different properties about spacecraft,
    that sounds good, right? And then the
  • 12:10 - 12:15
    interesting question is just how do we
    define the incidence for our set of
  • 12:15 - 12:20
    properties? And that actually depends very
    much on which properties we choose,
  • 12:20 - 12:26
    because it does – for some properties, it
    makes sense to account for the direction
  • 12:26 - 12:33
    of the statement: So there is a property
    called parent? Actually, no, it's child,
  • 12:33 - 12:38
    and then there's father and mother, and
    you don't want to turn those around: you
  • 12:38 - 12:44
    want "A is a child of B" to be
    something different from "B
  • 12:44 - 12:49
    is a child of A." Then there's the
    qualifiers that might be important for
  • 12:49 - 12:55
    some properties. So receiving an award for
    something might be something different
  • 12:55 - 13:01
    than receiving an award for something
    else. But while receiving an award in 2018
  • 13:01 - 13:07
    and receiving one in 2017, that's probably
    more or less the same thing, so we don't
  • 13:07 - 13:12
    necessarily need to differentiate that.
    And there's also a thing called subclasses
  • 13:12 - 13:15
    and they form a hierarchy on Wikidata. And
    you might also want to take that into
  • 13:15 - 13:20
    account because while winning something
    that is a Nobel prize, that means also
  • 13:20 - 13:25
    winning an award itself, and winning the
    Nobel Peace prize means winning a peace
  • 13:25 - 13:33
    prize. So there's also implications going
    on there that you want to respect. So,
  • 13:33 - 13:38
    to see how we actually do that, let's look
    at an example. So we have here, well, this
  • 13:38 - 13:47
    is Donna Strickland. And – I forgot his
    first name – Ashkin, this is one of the
  • 13:47 - 13:52
    people that won the Nobel prize in physics
    with her last year. And also Gérard
  • 13:52 - 13:58
    Mourou. That is the third one. They all
    got the Nobel prize in physics last year.
  • 13:58 - 14:04
    So we have all these statements here, and
    these two have a qualifier that says
  • 14:04 - 14:10
    "with: Gérard Mourou" here. And I don't
    think the qualifier is on this statement
  • 14:10 - 14:15
    here, actually, but it doesn't actually
    matter. So what we've done here is,
  • 14:15 - 14:21
    put all the entities in the small graph as
    rows in the table. So we have Strickland
  • 14:21 - 14:28
    and Mourou and Ashkin, and also Arnold and
    Curie that are not in the picture. But you
  • 14:28 - 14:33
    can maybe remember that. And then here we
    have awarded, and we scaled that by the
  • 14:33 - 14:37
    instance of the different Nobel prizes
    that people have won. So that's the
  • 14:37 - 14:42
    physics Nobel in the first column, the
    chemistry Nobel Prize in the second column
  • 14:42 - 14:48
    and just general Nobel prizes in the third
    column. There's awarded and that is scaled
  • 14:48 - 14:55
    by the "with" qualifier, so awarded with
    Gérard Mourou. And then there's field of
  • 14:55 - 15:00
    work, and we have lasers here and
    radioactivity, so we scale by the actual
  • 15:00 - 15:07
    field of work that people have. And well
    then, if we look at what kind of incidence
  • 15:07 - 15:11
    we get for Donna Strickland, she has a
    Nobel prize in physics and that is also a
  • 15:11 - 15:17
    Nobel prize, and she has that together
    with Mourou. And she has "field of work:
  • 15:17 - 15:23
    lasers," but not radioactivity. Then,
    Mourou himself: he has a Nobel prize in
  • 15:23 - 15:29
    physics, and that is a Nobel prize, but
    none of the others. Ashkin gets the Nobel
  • 15:29 - 15:34
    prize in physics, and that is still a
    Nobel prize, and he gets that with Gérard
  • 15:34 - 15:41
    Mourou. And also he works on lasers, but
    not in radioactivity. So Frances Arnold
  • 15:41 - 15:47
    has a Nobel prize in chemistry, and that
    is a Nobel prize. And Marie Curie, she has
  • 15:47 - 15:51
    a Nobel prize in physics and one in
    chemistry, and they are both a Nobel
  • 15:51 - 15:55
    prize. And she also works on
    radioactivity. But lasers didn't exist
  • 15:55 - 16:02
    back then, so she doesn't get "field of
    work: lasers." And then basically this
  • 16:02 - 16:10
    table here is a representation of our
    formal context. And then we've actually
  • 16:10 - 16:15
    gone ahead and started building a tool
    where you can interactively do all these
  • 16:15 - 16:20
    things, and it will take care of building
    the context for you. You just put in the
  • 16:20 - 16:25
    properties, and Tom will show
    you how that works.
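
As a rough illustration of the scaling just described, the following Python sketch builds the binary table from a few statements copied off the slide; the statement list, the subclass map and the attribute names are hand-written for this example and are not fetched from Wikidata.

    # "Scaling": turn statements (item, property, value, qualifiers) into
    # the binary attributes of a formal context, keeping the direction of
    # the property, selected qualifiers, and the subclass hierarchy.
    statements = [
        ("Donna Strickland", "awarded", "Nobel Prize in Physics", {"with": "Gérard Mourou"}),
        ("Donna Strickland", "field of work", "lasers", {}),
        ("Gérard Mourou",    "awarded", "Nobel Prize in Physics", {}),
        ("Arthur Ashkin",    "awarded", "Nobel Prize in Physics", {"with": "Gérard Mourou"}),
        ("Arthur Ashkin",    "field of work", "lasers", {}),
        ("Frances Arnold",   "awarded", "Nobel Prize in Chemistry", {}),
        ("Marie Curie",      "awarded", "Nobel Prize in Physics", {}),
        ("Marie Curie",      "awarded", "Nobel Prize in Chemistry", {}),
        ("Marie Curie",      "field of work", "radioactivity", {}),
    ]
    subclass_of = {"Nobel Prize in Physics": "Nobel Prize",
                   "Nobel Prize in Chemistry": "Nobel Prize"}

    context = {}
    for item, prop, value, qualifiers in statements:
        attrs = context.setdefault(item, set())
        attrs.add(f"{prop}: {value}")                   # scale by the value itself
        if value in subclass_of:
            attrs.add(f"{prop}: {subclass_of[value]}")  # winning the physics Nobel also
                                                        # counts as winning a Nobel prize
        for q, qvalue in qualifiers.items():
            attrs.add(f"{prop} {q}: {qvalue}")          # scale by qualifier, e.g. "awarded with"

    for item, attrs in sorted(context.items()):
        print(item, sorted(attrs))
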
  • 16:25 - 16:29
    T: So here you see some first screenshots
    of this tool. So please do not comment on
  • 16:29 - 16:33
    the graphic design. We have no idea about
    that, we have to ask someone about that.
  • 16:33 - 16:36
    We're just into logics, more or less. On
    the left, you see the initial state of the
  • 16:36 - 16:41
    game. On the left you have five boxes:
    they're called countries and borders,
  • 16:41 - 16:47
    credit cards, use of energy, memory and
    computation – I think –, and space
  • 16:47 - 16:53
    launches, which are just presets we
    defined. You can explore, for example, in
  • 16:53 - 16:57
    the case of the credit card, you can
    explore the properties from Wikidata which
  • 16:57 - 17:02
    are called "card network," "operator," and
    "fee," so you can just choose one of them,
  • 17:02 - 17:06
    or on the right, "custom properties," you
    can just input the properties you're
  • 17:06 - 17:11
    interested in from Wikidata, whichever of
    the seven thousand you like, or some
  • 17:11 - 17:15
    number of them. On the right, I chose then
    the credit card thingy and I now want to
  • 17:15 - 17:22
    show you what happens if you now explore
    these properties, right? The first step in
  • 17:22 - 17:26
    the game is that the game will ask – I
    mean, the game, the exploration process –
  • 17:26 - 17:31
    will ask, is it true that every entity in
    Wikidata will have these three properties?
  • 17:31 - 17:36
    So are they common among all entities in
    your data, which is most probably not
  • 17:36 - 17:42
    true, right? I mean, not everything in
    Wikidata has a fee, at least I hope. So,
  • 17:42 - 17:47
    what I will do now, I would click the
    "reject this implication" button, since
  • 17:47 - 17:51
    the implication "Nothing implies
    everything" is not true. In the second
  • 17:51 - 17:56
    step now, the algorithm tries to find the
    minimal number of questions to obtain the
  • 17:56 - 18:02
    domain knowledge, so to obtain all valid
    rules in this domain. So next question is
  • 18:02 - 18:06
    "is it true that everything in Wikidata
    that has a 'card network' property also
  • 18:06 - 18:13
    has a 'fee' and an 'operator' property?"
    And down here you can see Wikidata says
  • 18:13 - 18:18
    "ok, there are 26 items which are
    counterexamples," so there's 26 items in
  • 18:18 - 18:23
    Wikidata which have the "card network"
    property but do not have the other two
  • 18:23 - 18:28
    ones. So, 26 is not a big number, this
    could mean "ok, that's an error, so 26
  • 18:28 - 18:33
    statements are missing." Or maybe that
    that's, really, that's the true case.
  • 18:33 - 18:37
    That's also ok. But you can now choose
    what you think is right. You can say, "oh,
  • 18:37 - 18:40
    I would say it should be true" or you can
    say "no, I think that's ok, one of these
  • 18:40 - 18:46
    counterexamples seems valid. Let's reject
    it." I in this case, rejected it. The next
  • 18:46 - 18:51
    question it asks: "is it true that
    everything that has an operator has also a
  • 18:51 - 18:56
    fee and a card network?" Yeah, this is
    possibly not true. There's also more than
  • 18:56 - 19:03
    1000 counterexamples, one being, I think a
    telecommunication operator in Hungary or
  • 19:03 - 19:10
    something. And so we can reject this as
    well. Next question, everything that has
  • 19:10 - 19:15
    an operator and a card network – so card
    network means Visa, MasterCard, whatever,
  • 19:15 - 19:22
    all this stuff – is it true that they have
    to have a fee?" Wikidata says "no," it has
  • 19:22 - 19:28
    23 items that contradict it. But one of
    the items, for example, is the American
  • 19:28 - 19:32
    Express Gold Card. I suppose the American
    Express Gold Card has some fee. So this
  • 19:32 - 19:36
    indicates, "oh, there is some missing data
    in Wikidata," there is something that
  • 19:36 - 19:41
    Wikidata does not know but should know to
    reason correctly in Wikidata with your
  • 19:41 - 19:47
    SPARQL queries. So we can now say, "yeah,
    that's, uh, that's not a reject, that's an
  • 19:47 - 19:51
    accept," because we think it should be
    true. But Wikidata thinks otherwise. And
  • 19:51 - 19:56
    you go on, we go on. This is then the last
    question: "Is it true that everything that
  • 19:56 - 20:01
    has a fee and a card network should have an
    operator," and you see, "oh, no counter
  • 20:01 - 20:06
    examples." This means Wikidata says "this
    is true," because it says there is no
  • 20:06 - 20:10
    counterexample. If you're asking Wikidata
    it says this is a valid implication in the
  • 20:10 - 20:15
    data set so far, which could also be
    indicating that something is missing, I'm
  • 20:15 - 20:20
    not aware if this is possible or not, but
    ok, for me it sounds reasonable. Everything that
  • 20:20 - 20:24
    has a fee and a card network should also
    have an operator, which means a bank or
  • 20:24 - 20:29
    something like that. So I accept this
    implication. And then, yeah, you have won
  • 20:29 - 20:34
    the exploration game, which essentially
    means you've won some knowledge. Thank
  • 20:34 - 20:40
    you. And the knowledge is that you know
    which implications in Wikidata are true or
  • 20:40 - 20:44
    should be true from your point of view.
    And yeah, this is more or less the state
  • 20:44 - 20:51
    of the game so far as we programmed it in
    October. And the next step will be to
  • 20:51 - 20:55
    show you some – "How much does your
    opinion of the world differ from the
  • 20:55 - 21:00
    opinion that is now reflected in the
    data?" So is what you think about the data
  • 21:00 - 21:05
    close to what is actually true in
    Wikidata? Or maybe Wikidata has wrong
  • 21:05 - 21:11
    information. You can find it with that.
    But Max will tell you more about that.
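
On the Wikidata side, each of these questions boils down to a single SPARQL query for counterexamples against the public endpoint. The Python sketch below, using the SPARQLWrapper library, shows the general shape; the property IDs are placeholders that would have to be replaced with the real P-identifiers (for example, those for "card network" and "fee"), and it is an illustration rather than the tool's actual query.

    # Ask Wikidata for counterexamples to "everything that has property A
    # also has property B": items with a value for A but no value for B.
    from SPARQLWrapper import SPARQLWrapper, JSON

    def counterexamples(prop_a, prop_b, limit=10):
        query = f"""
        SELECT ?item ?itemLabel WHERE {{
          ?item wdt:{prop_a} [] .
          FILTER NOT EXISTS {{ ?item wdt:{prop_b} [] . }}
          SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
        }} LIMIT {limit}
        """
        endpoint = SPARQLWrapper("https://query.wikidata.org/sparql",
                                 agent="implication-game example")
        endpoint.setQuery(query)
        endpoint.setReturnFormat(JSON)
        bindings = endpoint.query().convert()["results"]["bindings"]
        return [(b["item"]["value"], b["itemLabel"]["value"]) for b in bindings]

    # e.g. counterexamples("P...", "P...")  # fill in the real property IDs
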
  • 21:11 - 21:18
    M: Ok. So let me just quickly come
    back to what we have actually done. So we
  • 21:18 - 21:24
    offer a procedure that allows you to
    explore properties in Wikidata and the
  • 21:24 - 21:31
    implicational knowledge that holds between
    these properties. And the key idea here is
  • 21:31 - 21:35
    that when you look at these implications
    that you get, while there might be some
  • 21:35 - 21:39
    that you don't actually want because they
    shouldn't be true, and there might also be
  • 21:39 - 21:46
    ones that you don't get, but you expect to
    get because they should hold. And these
  • 21:46 - 21:52
    unwanted and/or missing implications, they
    point to missing statements and items in
  • 21:52 - 21:56
    Wikidata. So they show you where the
    opportunities to improve the knowledge in
  • 21:56 - 22:00
    Wikidata are, and, well, sometimes you
    also get to learn something about the
  • 22:00 - 22:04
    world, and in most cases, it's that the
    world is more complicated than you thought
  • 22:04 - 22:10
    it was – and that's just how life is. But
    in general, implications can guide you in
  • 22:10 - 22:17
    your way of improving Wikidata and the
    state of knowledge therein. So what's
  • 22:17 - 22:22
    next? Well, so what we currently don't
    offer in the exploration game and what we
  • 22:22 - 22:28
    definitely will focus next on is having
    configurable counterexamples and also
  • 22:28 - 22:32
    filterable counterexamples – right now you
    just get a list of a random number of
  • 22:32 - 22:37
    counterexamples. And you might want to
    search through this list for something you
  • 22:37 - 22:43
    recognise and you might also want to
    explicitly say, well, this one should be a
  • 22:43 - 22:49
    counterexample, and that's definitely
    coming next. Then, well, domain specific
  • 22:49 - 22:54
    scaling of properties, there's still much
    work to be done. Currently, we only have
  • 22:54 - 23:00
    some very basic support for that. So you
    can have properties, but you can't do the
  • 23:00 - 23:04
    fancy things where you say, "well,
    everything that is an award should be
  • 23:04 - 23:11
    considered as one instance of this
    property." That's also coming and then
  • 23:11 - 23:16
    what Tom mentioned already: compare your
    knowledge that you have explored through
  • 23:16 - 23:22
    this process against the knowledge that is
    currently on Wikidata as a form of seeing
  • 23:22 - 23:27
    "where do you stand? What is missing in
    Wikidata? How can you improve Wikidata?"
  • 23:27 - 23:33
    And well, if you have any more suggestions
    for features, then just tell us. There's a
  • 23:33 - 23:40
    GitHub link on the implication game page.
    And here's the link to the tool again. So,
  • 23:40 - 23:46
    yeah, just let us know. Open an issue and
    have fun. And if you have any questions,
  • 23:46 - 23:50
    then I guess now would be the time to ask.
    T: Thank you.
  • 23:50 - 23:53
    Herald: Thank you very much, Tom and Max.
  • 23:53 - 23:55
    applause
  • 23:55 - 24:02
    Herald: So we will switch microphones now
    because then I can hand this microphone to
  • 24:02 - 24:07
    you if any of you have a question for our
    two speakers. Are there any questions or
  • 24:07 - 24:14
    suggestions? Yes.
    Question: Hi. Thanks for the nice talk. I
  • 24:14 - 24:19
    wanted to ask, as a first question,
    what's the most interesting implication
  • 24:19 - 24:25
    that you've found?
    M: Yeah. That would have made for a
  • 24:25 - 24:32
    good backup slide. The most interesting
    implication so far –
  • 24:32 - 24:36
    T: The most basic thing you would expect:
    everything that is launched into space by
  • 24:36 - 24:42
    humans – no, everything that landed from
    space, that has a landing date, also has a
  • 24:42 - 24:46
    start date. So nothing landed on earth,
    which was not started here.
  • 24:46 - 24:55
    M: Yes.
    Q: Right now, the game only helps you find
  • 24:55 - 25:01
    out implications. Are you also planning to
    have it so that I can also add data, like, for
  • 25:01 - 25:04
    example, let's say I have twenty five
    Nobel laureates who don't have a Nobel
  • 25:04 - 25:08
    laureate ID. Are there plans where you
    could give me a simple interface for me to
  • 25:08 - 25:13
    Google and add that ID because it would
    make the process of adding new entities to
  • 25:13 - 25:17
    Wikidata itself more simple.
    M: Yes. And that's partly hidden
  • 25:17 - 25:23
    behind this "configurable and filterable
    counterexamples" thing. We will probably
  • 25:23 - 25:28
    not have an explicit interface for adding
    stuff, but most likely interface with some
  • 25:28 - 25:32
    other tool built around Wikidata, so
    probably something that will give you
  • 25:32 - 25:37
    QuickStatements or something like that.
    But yes, adding data is definitely on the
  • 25:37 - 25:42
    roadmap.
    Herald: Any more questions? Yes.
  • 25:42 - 25:49
    Q: Wouldn't it be nice to do this in other
    languages, too?
  • 25:49 - 25:53
    T: Actually it's language independent, so
    we use Wikidata and then as far as we
  • 25:53 - 25:58
    know, Wikidata has no language itself. You
    know, it has just items and properties, so
  • 25:58 - 26:03
    Qs and Ps, and whatever language you use,
    it should be translated, using the labels of
  • 26:03 - 26:06
    the properties, if there is a label for
    that property or for that item that you
  • 26:06 - 26:12
    have. So if Wikidata is aware of your
    language, we are.
  • 26:12 - 26:15
    Herald: Oh, yes. More!
    M: Of course, the tool still needs to be
  • 26:15 - 26:18
    translated, but –
    T: The tool itself, it should be.
  • 26:18 - 26:22
    Q: Hi, thanks for the talk. I have a
    question. Right now you only can find
  • 26:22 - 26:26
    missing data with this, right? Or surplus
    data. Would you think you'd be able to
  • 26:26 - 26:32
    find wrong information with a similar
    approach?
  • 26:32 - 26:37
    T: Actually, we do. I mean, if Wikidata
    has a counterexample to something we would
  • 26:37 - 26:43
    expect to be true, this could point to
    wrong data, right? If the counterexample
  • 26:43 - 26:47
    is a wrong counterexample. If there is a
    missing property or missing property to an
  • 26:47 - 26:58
    item.
    Q: Ok, I get to ask a second question. So
  • 26:58 - 27:06
    the horizontal axis in the incidence
    matrix. You said it has 7000, it spans
  • 27:06 - 27:10
    7000 columns, right?
    M: Yes, because there's 7000 properties in
  • 27:10 - 27:14
    Wikidata.
    Q: But it's actually way more columns,
  • 27:14 - 27:18
    right? Because you multiply the properties
    times the arguments, right?
  • 27:18 - 27:21
    M: Yes. So if you do any scaling then of
    course that might give you multiple
  • 27:21 - 27:23
    entries.
    Q: So that's what you mean with scaling,
  • 27:23 - 27:28
    basically?
    M: Yes. But already seven thousand is way
  • 27:28 - 27:36
    too big to actually compute that.
    Q: How many would it be if you multiply
  • 27:36 - 27:48
    all the arguments?
    M: I have no idea, probably a few million.
  • 27:48 - 27:55
    Q: Have you thought about a recursive
    method, as counterexamples may be wrong by
  • 27:55 - 28:00
    other counterexamples, like in an
    argumentative graph or something like
  • 28:00 - 28:07
    this?
    T: Actually, I don't get it. How can a
  • 28:07 - 28:14
    counterexample be wrong through another
    counterexample?
  • 28:14 - 28:24
    Q: Maybe some example says that cats can
    have golden hair and then another example
  • 28:24 - 28:31
    might say that this is not a cat.
    T: Ah, so the property to be a cat or
  • 28:31 - 28:38
    something cat-ish is missing then. Okay.
    No, we have not considered so far deeper
  • 28:38 - 28:45
    reasoning. This Horn propositional logic,
    you know, it has no contradictions,
  • 28:45 - 28:48
    because all you can do is you can
    contradict by counterexamples, but there
  • 28:48 - 28:53
    can never be a rule that is not true, so
    far. Just in your or my opinion, maybe,
  • 28:53 - 28:56
    but not in the logic. So what we have to
    think about is that we have bigger
  • 28:56 - 29:02
    reasoning, right? So.
    Q: Sorry, quick question. Because you're
  • 29:02 - 29:05
    not considering all the 7000 odd
    properties for each of the entities,
  • 29:05 - 29:08
    right? What's your current process of
    filtering? What are the relevant
  • 29:08 - 29:15
    properties? I'm sorry, I didn't get that.
    M: Well, we basically handpick those. So
  • 29:15 - 29:20
    you have this input field? Yeah, we can go
    ahead and select our properties. We also
  • 29:20 - 29:27
    have some predefined sets. Okay. And
    there's also some classes for groups of
  • 29:27 - 29:31
    properties that are related that you could
    use if you want bigger sets,
  • 29:31 - 29:36
    T: for example, space or family or what
    was the other?
  • 29:36 - 29:43
    M: Awards is one.
    T: It depends on the size of the class.
  • 29:43 - 29:47
    For example, for space, it's not that
    much, I think it's 10 or 15 properties. It
  • 29:47 - 29:52
    will take you some hours, but you can do it
    because they are 15 or something like
  • 29:52 - 29:58
    that. I think for family, it's way too
    much, it's like 40 or 50 properties. So a
  • 29:58 - 30:05
    lot of questions.
    Herald: I don't see any more hands. Maybe
  • 30:05 - 30:10
    someone who has not asked a question yet
    has another one, we could take that;
  • 30:10 - 30:14
    otherwise we would be perfectly on time.
    And maybe you can tell us where you will
  • 30:14 - 30:19
    be for deeper discussions where people can
    find you.
  • 30:19 - 30:22
    T: Probably at the couches.
    Herald: The couches, behind our stage.
  • 30:22 - 30:27
    M: Or just running around somewhere. So
    there's also our DECT numbers on the
  • 30:27 - 30:36
    slides; it's 6284 for Tom and 6279 for me.
    So just call and ask where we're hanging
  • 30:36 - 30:38
    around.
    Herald: Well then, thank you again. Have a
  • 30:38 - 30:40
    round of applause.
    applause
  • 30:40 - 30:43
    T: Thank you.
    M: Well, thanks for having us.
  • 30:43 - 30:45
    applause
  • 30:45 - 30:50
    postroll music
  • 30:50 - 31:12
    subtitles created by c3subtitles.de
    in the year 2020. Join, and help us!