cdn.media.ccc.de/.../wikidatacon2019-1101-eng-Barriers_to_Using_Wikidata_as_a_Knowledge_Base_hd.mp4

  • 0:06 - 0:09
    (woman) Hello everyone.
    Thank you for being here this afternoon.
  • 0:09 - 0:11
    We are first going to hear--
  • 0:11 - 0:13
    I'm just going to jump straight in
    to give him plenty of time--
  • 0:13 - 0:16
    so we're first going to hear from
    Peter Patel-Schneider
  • 0:16 - 0:20
    about barriers to using Wikidata
    as a knowledge base.
  • 0:20 - 0:21
    (Peter) Thank you.
  • 0:23 - 0:26
    I'll skip over the abstract
    because you've already seen it all.
  • 0:26 - 0:29
    And I should say a little bit
    about myself.
  • 0:32 - 0:37
    I'm much more of a user of Wikidata
    than an actual editor of Wikidata,
  • 0:37 - 0:42
    and much more of a user of Wikidata
    than somebody who contributes to Wikidata,
  • 0:42 - 0:45
    but I very much believe in
    the aims of Wikidata.
  • 0:45 - 0:47
    In particular, it aligns
    with my research areas
  • 0:47 - 0:51
    which is knowledge representation,
    at least in a certain sense.
  • 0:51 - 0:55
    I worked in description logics
    for a long time, worked with W3C.
  • 0:55 - 0:59
    I've worked in Silicon Valley for a while,
  • 0:59 - 1:02
    largely building what might be called
    knowledge graphs,
  • 1:02 - 1:04
    but I don't like the term
    knowledge graphs--
  • 1:04 - 1:05
    I don't like what they mean,
  • 1:05 - 1:08
    I want to do something better
    than knowledge graphs.
  • 1:08 - 1:10
    And I want to put this together
    from various sources.
  • 1:10 - 1:13
    So Wikidata is a very, very good one,
  • 1:13 - 1:16
    but DBpedia is not so good.
  • 1:16 - 1:19
    Freebase is dead.
  • 1:19 - 1:22
    OpenStreetMap,
    Open Movie Database, things like that.
  • 1:22 - 1:25
    And then I want to use
    this store of knowledge
  • 1:25 - 1:26
    to do something.
  • 1:26 - 1:32
    And I want to use it as the source
    of knowledge to do something,
  • 1:32 - 1:36
    and not only just facts
    but also organizing my knowledge.
  • 1:36 - 1:39
    And currently, working where I am,
  • 1:39 - 1:43
    we're interested in supporting
    conversational agents.
  • 1:43 - 1:49
    Not just things that let you play Avatar,
  • 1:49 - 1:53
    but lets you play the movie
    that's directed by the wife
  • 1:53 - 1:56
    of the director of Avatar.
  • 1:56 - 2:00
    So how can we build a conversational agent
    that will do something like that?
  • 2:00 - 2:04
    Well, you need to know
    all the facts that go behind it,
  • 2:04 - 2:07
    but you also need to know
    that the fact that there are movies--
  • 2:07 - 2:10
    not just, we have Avatar,
    but that we have movies--
  • 2:10 - 2:12
    we need to know things about movies,
  • 2:12 - 2:15
    we need to know things
    about directorships.
  • 2:15 - 2:18
    We need to know things about humans--
    that they're married to each other.
  • 2:18 - 2:21
    We need to know that there are men
    and women in the world,
  • 2:21 - 2:25
    and somehow be able to use
    this knowledge of what we're saying
  • 2:25 - 2:28
    to come up with the actual reference
    to these things,
  • 2:28 - 2:32
    and then actually do
    what we were asked to do.
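
As a rough illustration of the request described above, the lookup can be sketched as a chain of hops over a tiny in-memory triple store. The property names (`director`, `spouse`, `sex_or_gender`, `instance_of`) and the entities `person_A`, `person_B`, and `film_X` are placeholders invented for this sketch. They are not Wikidata identifiers or real data.

```python
# Toy triple store: (subject, property, object). All entries are hypothetical.
TRIPLES = [
    ("Avatar", "instance_of", "film"),
    ("Avatar", "director", "person_A"),
    ("person_A", "spouse", "person_B"),
    ("person_B", "sex_or_gender", "female"),
    ("film_X", "instance_of", "film"),
    ("film_X", "director", "person_B"),
]


def objects(subject, prop):
    """All objects o such that (subject, prop, o) is stated."""
    return {o for s, p, o in TRIPLES if s == subject and p == prop}


def subjects(prop, obj):
    """All subjects s such that (s, prop, obj) is stated."""
    return {s for s, p, o in TRIPLES if p == prop and o == obj}


def films_by_wife_of_director(film):
    """Films directed by a female spouse of any director of `film`."""
    results = set()
    for director in objects(film, "director"):
        for spouse in objects(director, "spouse"):
            if "female" not in objects(spouse, "sex_or_gender"):
                continue
            for candidate in subjects("director", spouse):
                if "film" in objects(candidate, "instance_of"):
                    results.add(candidate)
    return results
```

Even this small sketch needs the general knowledge the talk lists: that there are films, that films have directors, that people have spouses, and that people have a gender.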
  • 2:32 - 2:34
    So, though it's one end,
  • 2:34 - 2:37
    the other thing
    that we want to be able to do
  • 2:37 - 2:41
    is if you think of systems like Siri,
    there are hundreds or thousands--
  • 2:41 - 2:43
    actually, maybe Siri's not
    the best example.
  • 2:43 - 2:50
    The Amazon system has hundreds
    or thousands of little programs
  • 2:50 - 2:52
    that will do something for you.
  • 2:52 - 2:53
    And the problem that we're interested in
  • 2:53 - 2:56
    is how do you pick
    which one can do something.
  • 2:56 - 3:01
    So for example, which back-end
    can find me train trips
  • 3:01 - 3:05
    between San Francisco and Palo Alto.
  • 3:05 - 3:09
    There may be many systems
    that will try and sell me train tickets,
  • 3:09 - 3:14
    but only one or perhaps two of them
    will sell me that particular train ticket.
  • 3:14 - 3:19
    And how do I get the system to do that
    without having to be able to tell it
  • 3:19 - 3:21
    that I want a Caltrain ticket.
  • 3:23 - 3:29
    So, what happens is I want to use Wikidata
    as the source of a lot of this stuff,
  • 3:29 - 3:32
    and I regularly run into problems.
  • 3:32 - 3:36
    And from those problems,
    I have a bunch of suggestions.
  • 3:37 - 3:41
    You may agree with my suggestions
    or disagree with them.
  • 3:41 - 3:45
    Some of them are kind of on their way
    to being implemented in Wikidata,
  • 3:45 - 3:47
    some of them aren't.
  • 3:47 - 3:50
    So, I'm going to do this talk
    from the back forward.
  • 3:50 - 3:54
    I'm going to give you the summary,
    and then an expansion of the summary,
  • 3:54 - 3:58
    and then some rationale
    for my suggestions.
  • 3:58 - 4:01
    And the reason I'm going to do that
    is if I started with all of the rationale,
  • 4:01 - 4:04
    I might never get to the end,
    and the end is the important thing,
  • 4:04 - 4:06
    at least in my viewpoint.
  • 4:06 - 4:12
    So, my biggest suggestion, I guess,
    on the community side is,
  • 4:12 - 4:17
    gee, guys, speak with a single voice.
  • 4:17 - 4:19
    (chuckles)
  • 4:20 - 4:24
    And speak with a voice
    where I can find it.
  • 4:24 - 4:26
    So, it turns out
    that one of my suggestions
  • 4:26 - 4:31
    is actually implemented,
    but I only found out about it today,
  • 4:31 - 4:35
    because it's not used very much at all,
    and it's hard to find it.
  • 4:35 - 4:40
    So, I really want you guys--
    and me too, in some sense--
  • 4:40 - 4:44
    to spend some effort at the beginning
    when you're creating these classes
  • 4:44 - 4:46
    and other things that are important,
  • 4:46 - 4:49
    so that a poor user like me,
  • 4:49 - 4:53
    who can't afford to go through five years
    of impassioned discussion
  • 4:53 - 4:57
    to find out what male actually is,
  • 4:57 - 5:01
    can actually use it in our system--
    in my system.
  • 5:01 - 5:03
    So that's sort of on the community side.
  • 5:03 - 5:05
    I'm a formalist.
  • 5:05 - 5:09
    I really want to--
    and my programs are dumb.
  • 5:09 - 5:12
    I don't write smart programs,
    I write dumb programs.
  • 5:12 - 5:15
    Now, they tend to be
    very fancy dumb programs,
  • 5:15 - 5:20
    but these dumb programs
    can't really handle all of the shades
  • 5:20 - 5:26
    of everything that you have
    with start time, end time, inception.
  • 5:26 - 5:29
    I want to have some simple
    formal mechanism
  • 5:29 - 5:33
    that will tell my program what's true now,
  • 5:33 - 5:36
    or what's true in 1987,
  • 5:36 - 5:39
    without having to search through
    a bunch of things,
  • 5:39 - 5:42
    and make a bunch of guesses,
    and use a lot of heuristics,
  • 5:42 - 5:45
    or have a machine-learning program
    that's done for this particular task.
  • 5:45 - 5:50
    I just want you to tell me
    this stuff somehow, and have a take.
  • 5:50 - 5:54
    So, I want to be able
    to look at something which says
  • 5:54 - 5:58
    what the things I see in Wikidata
    actually mean.
  • 5:58 - 6:01
    And I don't find that these days.
  • 6:01 - 6:03
    And then, of course, once we have that,
  • 6:03 - 6:07
    I want somebody--
    I'm willing to do some of this work--
  • 6:07 - 6:11
    build tools that actually use
    that formal description and say,
  • 6:11 - 6:16
    tell me, for example,
    if I'm an instance
  • 6:16 - 6:22
    of architectural structure,
    like the Eiffel Tower,
  • 6:22 - 6:24
    am I a geographic location?
  • 6:26 - 6:27
    I don't know.
  • 6:27 - 6:31
    I mean, Wikidata doesn't tell me
    whether this is true or not.
  • 6:31 - 6:34
    I can find nowhere in Wikidata
    that will do that,
  • 6:34 - 6:36
    because there's no formal thing.
  • 6:36 - 6:38
    But once you give me a formal thing
    then I'm going to write a tool,
  • 6:38 - 6:41
    which essentially gives the implications
    of what the formal things are.
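
A minimal sketch of such an implication tool, assuming that "instance of" and "subclass of" are given the usual formal reading (an instance of a class is also an instance of every superclass). The subclass edge from "architectural structure" to "geographic location" is an assumption made only for this example. Whether Wikidata intends it is exactly the open question in the talk.

```python
from collections import deque

# Hypothetical toy data, not taken from Wikidata.
INSTANCE_OF = {"Eiffel Tower": {"architectural structure"}}
SUBCLASS_OF = {
    "architectural structure": {"geographic location"},  # assumed edge
    "geographic location": {"entity"},
}


def implied_classes(item):
    """Every class the item belongs to, stated or implied by subclass edges."""
    seen = set()
    queue = deque(INSTANCE_OF.get(item, ()))
    while queue:
        cls = queue.popleft()
        if cls in seen:
            continue  # also guards against cycles in the hierarchy
        seen.add(cls)
        queue.extend(SUBCLASS_OF.get(cls, ()))
    return seen
```

With the assumed edge in place, asking whether "geographic location" is in `implied_classes("Eiffel Tower")` gives a plain yes or no, which is the kind of answer the speaker's programs need.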
  • 6:43 - 6:48
    The fourth suggestion is about bots.
  • 6:48 - 6:50
    Bots are great.
  • 6:50 - 6:56
    Bots have ultimate power
    and as has been said,
  • 6:56 - 6:58
    with ultimate power,
    comes ultimate responsibility.
  • 6:58 - 7:02
    And I don't believe that bots get
    very much responsibility
  • 7:02 - 7:05
    for the things that they do,
    and they need to have.
  • 7:05 - 7:09
    We need to be able to control the bots
    and figure out what they've done wrong,
  • 7:09 - 7:12
    and essentially, once a bot
    makes a thousand mistakes,
  • 7:12 - 7:14
    we want to undo that once,
  • 7:14 - 7:17
    as opposed to undoing that
    a thousand times.
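
One way to picture "undo that once" is to tag every bot edit with a batch identifier and revert by batch. The sketch below uses a hypothetical in-memory store. The `Store` class, its methods, and the batch names are invented for illustration and are not part of any Wikidata or MediaWiki API.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    batch_id: str
    key: tuple   # (item, property)
    old: object  # value before the edit, or None if there was none
    new: object  # value written by the edit


class Store:
    def __init__(self):
        self.data = {}
        self.log = []

    def apply(self, batch_id, item, prop, value):
        key = (item, prop)
        self.log.append(Edit(batch_id, key, self.data.get(key), value))
        self.data[key] = value

    def revert_batch(self, batch_id):
        """Undo every edit of one batch in a single call; return the count."""
        reverted = 0
        for edit in reversed(self.log):
            if edit.batch_id != batch_id:
                continue
            # Only revert if nobody has overwritten the bot's value since.
            if self.data.get(edit.key) == edit.new:
                if edit.old is None:
                    self.data.pop(edit.key)
                else:
                    self.data[edit.key] = edit.old
                reverted += 1
        self.log = [e for e in self.log if e.batch_id != batch_id]
        return reverted
```

The design choice here is that a later human correction wins: a value that was changed after the bot run is left alone rather than rolled back.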
  • 7:17 - 7:19
    Of course, as I said,
    these are my suggestions.
  • 7:19 - 7:21
    Other people
    may have different suggestions.
  • 7:21 - 7:24
    I'm coming at it from a user viewpoint.
  • 7:24 - 7:26
    I suppose I could say something like,
  • 7:26 - 7:28
    I'm coming at it from a binary viewpoint.
  • 7:28 - 7:33
    I mean, this is a program
    that really wants yes or no answers.
  • 7:33 - 7:36
    It doesn't understand much
    in shades of gray.
  • 7:36 - 7:42
    So, I would really like you to tell me
    what's true and what's not true.
  • 7:42 - 7:49
    So, that's the end of the talk, right?
    (laughs)
  • 7:52 - 7:55
    And I sort of expanded on some things
  • 7:55 - 7:58
    but let me-- oops,
    where are we, here, yes.
  • 7:58 - 8:01
    So, here let me expand upon
    the things that I said.
  • 8:01 - 8:06
    So formally,
    I really want a logic for Wikidata
  • 8:06 - 8:10
    because that lets me know
    what Wikidata means to me.
  • 8:10 - 8:12
    I don't want to have data structure
  • 8:12 - 8:16
    with some sort of English description
    somewhere that tells me something.
  • 8:16 - 8:19
    I want a formal statement of what this is.
  • 8:19 - 8:24
    And maybe it produces the wrong answers,
    in which case we fix it,
  • 8:24 - 8:26
    but at least we know
    what the answers are supposed to be,
  • 8:26 - 8:32
    as opposed to having to go through
    five or ten different pages
  • 8:32 - 8:33
    of people arguing with each other
  • 8:33 - 8:36
    what this particular part
    of Wikidata means.
  • 8:36 - 8:41
    So, in particular, I want to have things
    that I think are useful,
  • 8:41 - 8:42
    like disjointness.
  • 8:42 - 8:48
    I want Wikidata to say that
    rocks aren't humans,
  • 8:48 - 8:51
    to pick an example.
  • 8:51 - 8:54
    Now, there's lots of that stuff
    in Wikidata at the moment.
  • 8:54 - 8:57
    There's lots of this
    opposite from things,
  • 8:57 - 9:00
    but what does it mean?
  • 9:00 - 9:01
    Somebody who's an opposite--
  • 9:01 - 9:06
    there was something this morning
    about transgender man
  • 9:06 - 9:09
    is the opposite of transgender woman.
  • 9:12 - 9:16
    Yes, in some sense,
    but in what sense are they opposites?
  • 9:16 - 9:19
    It's not a logical sense,
    it's something else.
  • 9:19 - 9:23
    I want to give definitions of classes
    and to give an example,
  • 9:23 - 9:27
    I would very much like Wikidata to say
  • 9:27 - 9:32
    that "woman" is adult, female, human,
  • 9:32 - 9:37
    because if I query Wikidata--
    this is going to the end--
  • 9:37 - 9:39
    and I ask how many women are in Wikidata,
  • 9:39 - 9:42
    I get... any guesses?
  • 9:43 - 9:45
    (woman) Less than men.
  • 9:45 - 9:46
    Thirty-seven.
  • 9:46 - 9:47
    Less than men.
  • 9:47 - 9:48
    Thirty-seven.
  • 9:48 - 9:53
    Instances of "woman" in Wikidata-- 37.
  • 9:54 - 9:55
    That's obviously wrong.
  • 9:55 - 9:57
    Obviously, obviously wrong.
  • 9:57 - 9:59
    I know it, you know it,
  • 9:59 - 10:02
    but my program doesn't know it.
  • 10:02 - 10:05
    My program says 37--
    well, it's not zero.
  • 10:05 - 10:07
    So it might be right.
  • 10:09 - 10:13
    I would much prefer
    there to be something on "woman"
  • 10:13 - 10:16
    that says, "Hey, if you're trying
    to figure out the women in Wikidata,
  • 10:16 - 10:20
    don't look at the things
    that are stated to be instances of 'woman,'
  • 10:20 - 10:24
    look at things, well, a SPARQL query
    or something like that,
  • 10:24 - 10:27
    find all the humans, find the female one,
  • 10:27 - 10:33
    the ones with sex or gender
    which is female or female-ish.
  • 10:33 - 10:35
    That's kind of difficult there,
  • 10:35 - 10:37
    and then the ones that are adult--
    whatever adult means--
  • 10:37 - 10:38
    at least that's a definition.
  • 10:38 - 10:40
    We can argue whether
    it's the right definition or not.
  • 10:40 - 10:45
    But we get a number which is not 37,
    much better than 37.
  • 10:45 - 10:48
    So, I want this so that
    we can actually come up with answers
  • 10:48 - 10:50
    to some of these questions.
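
The difference between counting stated instances and applying a definition can be sketched as follows. The items, the field names, and the rule that "adult" means at least 18 years old in a reference year are all assumptions for this example. The talk deliberately leaves open what "adult" should mean.

```python
# Hypothetical toy items, not real Wikidata content.
ITEMS = {
    "p1": {"instance_of": {"human"}, "sex_or_gender": "female", "birth_year": 1950},
    "p2": {"instance_of": {"human"}, "sex_or_gender": "female", "birth_year": 2015},
    "p3": {"instance_of": {"human"}, "sex_or_gender": "male", "birth_year": 1960},
    "p4": {"instance_of": {"human", "woman"}, "sex_or_gender": "female", "birth_year": 1970},
}

ADULT_AGE = 18  # assumed threshold


def stated_women():
    """Items explicitly stated to be instances of 'woman'."""
    return {name for name, item in ITEMS.items() if "woman" in item["instance_of"]}


def defined_women(reference_year):
    """Items that satisfy the definition: adult, female, human."""
    return {
        name
        for name, item in ITEMS.items()
        if "human" in item["instance_of"]
        and item["sex_or_gender"] == "female"
        and reference_year - item["birth_year"] >= ADULT_AGE
    }
```

On this toy data the stated count is one and the defined count is two, which mirrors the gap between the 37 stated instances and the number a definition would give.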
  • 10:50 - 10:54
    So, and again, tools--
    I would really like to have tools
  • 10:54 - 10:55
    that show implications of claims.
  • 10:55 - 10:59
    So, that shows that
    the Eiffel Tower is a location.
  • 10:59 - 11:04
    Whether it is or not in the real world,
    is somehow kind of irrelevant.
  • 11:04 - 11:10
    We can argue whether the Eiffel Tower
    is a location or has a location.
  • 11:10 - 11:13
    Philosophers probably
    have argued for decades
  • 11:13 - 11:15
    over whether this is the case or not.
  • 11:15 - 11:16
    I don't care.
  • 11:16 - 11:20
    Just come up with an answer that makes
    at least a little bit of sense,
  • 11:20 - 11:23
    and I'll be happy.
  • 11:23 - 11:25
    So, I want a tool that'll do that.
  • 11:25 - 11:27
    I want, essentially,
    a tool that will tell me
  • 11:27 - 11:29
    what's true at a particular time.
  • 11:29 - 11:33
    So, how big is the Aral Sea?
  • 11:35 - 11:39
    It's certainly not 22,000 square miles.
  • 11:39 - 11:42
    It's much, much smaller than that,
  • 11:42 - 11:47
    but the claims on the Aral Sea
    are historical claims.
  • 11:47 - 11:49
    What's true now?
  • 11:49 - 11:52
    I think, 3,000 square miles.
  • 11:52 - 11:57
    Anyway, it's a mere puddle
    of its former self, you might say.
  • 11:57 - 12:00
    I would also like tools
    that help in cleaning the data.
  • 12:00 - 12:02
    So, what are inconsistencies?
  • 12:02 - 12:05
    Is there something
    that's both a rock and a human.
  • 12:05 - 12:09
    Well, right now,
    is that a problem in Wikidata?
  • 12:09 - 12:12
    Well, there are these
    constraint mechanisms,
  • 12:12 - 12:13
    but they're kind of weak,
  • 12:13 - 12:16
    and they're not used very well
    in many places.
  • 12:16 - 12:22
    So, I would really like to have some tool
    which essentially says, "No!
  • 12:22 - 12:24
    You can't have a rock and a human!
  • 12:24 - 12:29
    You can have, perhaps,
    a human and a Klingon,
  • 12:29 - 12:32
    but rocks and humans, just, no."
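
A disjointness check of this kind can be sketched in a few lines, assuming the disjoint class pairs are declared somewhere explicitly. The declared pair and the items below are hypothetical. Wikidata's actual constraint mechanisms work differently and are not modeled here.

```python
# Assumed declarations: each frozenset is a pair of classes that share no instances.
DISJOINT = [frozenset({"rock", "human"})]


def violations(item_classes):
    """Return (item, pair) for every item that belongs to both classes of a disjoint pair."""
    found = []
    for item, classes in sorted(item_classes.items()):
        for pair in DISJOINT:
            if pair <= classes:
                found.append((item, tuple(sorted(pair))))
    return found
```

For example, an item classified as both "rock" and "human" is reported, while an item classified as "human" and "Klingon" passes, since no disjointness was declared for that pair.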
  • 12:36 - 12:39
    There's an old science fiction story
    called The God Makers
  • 12:39 - 12:43
    where they take a rock [inaudible],
    make it into a God,
  • 12:43 - 12:45
    so maybe a rock
    could be a person in that sense.
  • 12:45 - 12:47
    But human, no.
  • 12:48 - 12:49
    Hm?
  • 12:52 - 12:58
    (man) Are you asking for
    exhaustive disjunction?
  • 12:59 - 13:02
    [inaudible]
  • 13:02 - 13:06
    (Peter) No, I'm not asking for
    exhaustive decompositions.
  • 13:06 - 13:07
    Just junctions.
  • 13:07 - 13:09
    I mean, in some sense--
  • 13:09 - 13:10
    In what?
  • 13:10 - 13:12
    (woman) That's undecidable.
  • 13:12 - 13:15
    (Peter) What? No, well,
    you mean not logically.
  • 13:15 - 13:19
    So, the question is
    whether we can actually,
  • 13:19 - 13:22
    can have exhaustive definition,
  • 13:22 - 13:24
    exhaustive disjunctions?
  • 13:24 - 13:25
    Well...
  • 13:25 - 13:28
    (man) That's pricey, right?
    To find out that bots are... yeah.
  • 13:30 - 13:33
    (man 2) To say that rocks
    are disjoint from humans is easy,
  • 13:33 - 13:36
    but to do that in all the cases
    you're going to want it, is--
  • 13:36 - 13:37
    (Peter) It's computation.
  • 13:37 - 13:40
    Yes, now we have a problem
    with computational costs, right?
  • 13:40 - 13:41
    Yeah.
  • 13:42 - 13:49
    The computational cost of deciding it
    for Wikidata as it exists right now,
  • 13:49 - 13:55
    is not impossible,
    it's just computationally non-trivial.
  • 13:55 - 13:59
    So given that the query service
    is running out of [inaudible],
  • 13:59 - 14:04
    so to do this right, requires tools
    that actually think a little bit.
  • 14:04 - 14:07
    And that's going to require computation.
  • 14:07 - 14:08
    How much computation?
  • 14:08 - 14:11
    Well, it's not the heat death
    of the universe,
  • 14:11 - 14:14
    it's tomorrow, perhaps,
    or two seconds from now.
  • 14:14 - 14:18
    But two seconds times
    how many million things are in Wikidata
  • 14:18 - 14:22
    is getting to be a reasonably big number.
  • 14:22 - 14:23
    One of the things you can do
  • 14:23 - 14:26
    is this thing doesn't have to be
    completely run in one thing.
  • 14:26 - 14:31
    You can farm these out into other systems.
  • 14:31 - 14:36
    We don't have to have everything
    all in one computer.
  • 14:36 - 14:38
    And, of course,
    Google just gave us the answer.
  • 14:38 - 14:41
    We can just put it on
    this new Google quantum computer,
  • 14:41 - 14:42
    and it'll do everything forever.
  • 14:42 - 14:45
    (woman) But it sounds like
    you're asking for OWL, and--
  • 14:45 - 14:47
    (Peter) No, I'm asking for part of OWL.
  • 14:47 - 14:49
    (woman) You've been asking for
    a lot of things about OWL,
  • 14:49 - 14:50
    and that just is not possible.
  • 14:50 - 14:54
    That's why Wikidata works,
    is because it's not OWL.
  • 14:54 - 14:56
    There are actually things
    that you can compute with.
  • 14:56 - 15:01
    (Peter) So, I am asking for
    a bigger part of OWL,
  • 15:01 - 15:03
    not all of it, yeah?
  • 15:03 - 15:07
    Well, I mean, so the question is,
  • 15:07 - 15:09
    is Wikidata going to spend the effort
  • 15:09 - 15:14
    to buy another, perhaps, ten computers
    to crunch away on this permanently,
  • 15:14 - 15:18
    or is it going to spend the effort
    of having a whole bunch of people
  • 15:18 - 15:21
    argue about it, or whatever.
  • 15:21 - 15:25
    And my view is computers are dirt cheap.
  • 15:25 - 15:31
    I mean, I'm willing to pony up
    some of my very own money
  • 15:31 - 15:34
    to buy Wikidata another computer
    to do this stuff,
  • 15:34 - 15:36
    because I think it's important.
  • 15:36 - 15:38
    (man) [inaudible]
  • 15:38 - 15:40
    Yes. (laughs)
  • 15:40 - 15:43
    I didn't say I would give it
    to Wikimedia Foundation.
  • 15:45 - 15:49
    But I'm not asking for things
    that are trivial.
  • 15:49 - 15:52
    I'm asking for things
    that require compute power,
  • 15:52 - 15:57
    that require intellectual power,
    that require the community to do things.
  • 15:57 - 15:59
    The community is doing
    some of these things.
  • 15:59 - 16:03
    I found out that there is this property
    which essentially says,
  • 16:03 - 16:07
    "Hey, here's how
    you're supposed to use this thing."
  • 16:07 - 16:09
    I forget the exact name of it.
  • 16:09 - 16:13
    User instructions,
    I thought it was three words.
  • 16:13 - 16:18
    Whatever, anyway,
    it essentially says-- and it's on male.
  • 16:18 - 16:20
    And there was a big argument about it.
  • 16:20 - 16:22
    The trouble is it's not supported at all.
  • 16:22 - 16:25
    There was this plan to have this property
    and have it supported,
  • 16:25 - 16:26
    to have it show up everywhere,
  • 16:26 - 16:30
    so that people would realize
    that human-- in other words,
  • 16:30 - 16:34
    you don't use person for humans,
    right now it's stuck on the description.
  • 16:34 - 16:36
    And it's stuck on
    a very short description.
  • 16:36 - 16:39
    And it's very hard to figure out
    what it really means,
  • 16:39 - 16:42
    and only a few classes have these things.
  • 16:42 - 16:45
    So, we go up in the class hierarchy
    to these more general things,
  • 16:45 - 16:47
    it's very hard to figure out
    what belongs to them,
  • 16:47 - 16:49
    and what doesn't belong to them.
  • 16:49 - 16:52
    So it's no surprise
    that people use them the wrong way.
  • 16:52 - 16:56
    Because the people in this room--
    or metaphorically in this room--
  • 16:56 - 17:01
    may understand that geographic location
    is used for a particular purpose,
  • 17:01 - 17:03
    but even me--
  • 17:03 - 17:06
    I think I have a fairly good background
    in representing things--
  • 17:06 - 17:11
    don't know the answer to that,
    or at least, it requires me to spend
  • 17:11 - 17:13
    at least an hour of effort
    to get a good answer to that.
  • 17:13 - 17:17
    And that's really not scalable.
  • 17:17 - 17:18
    So, I'm not asking for nothing,
  • 17:18 - 17:20
    I'm asking for lots of things,
  • 17:20 - 17:24
    but the trouble is, I mean, I think--
  • 17:24 - 17:27
    well, I think I'm important
    but anyway, you can ignore me.
  • 17:27 - 17:31
    I think that I'm a pretty good
    use case for Wikidata.
  • 17:31 - 17:34
    I really want, not just a bit of Wikidata,
  • 17:34 - 17:36
    I want a lot of it.
  • 17:36 - 17:42
    And I work for a very big company
    but the part of that company
  • 17:42 - 17:48
    that needs, or wants,
    or cares about Wikidata is quite small.
  • 17:48 - 17:53
    So, if I worked for a company
    that really cared about data,
  • 17:53 - 17:56
    and was willing to put
    hundreds of millions of dollars
  • 17:56 - 18:00
    into curating Wikidata,
    and put it into their own knowledge graph,
  • 18:00 - 18:03
    using Wikidata would be no problem.
  • 18:03 - 18:08
    My company, perhaps,
    has a million dollars to take Wikidata
  • 18:08 - 18:09
    and put it into a knowledge graph.
  • 18:09 - 18:13
    A million dollars
    doesn't go very far these days.
  • 18:13 - 18:17
    So, the problem--
    and let me say something
  • 18:17 - 18:22
    that actually isn't in the slides,
    but which I really firmly believe in.
  • 18:22 - 18:24
    The problem with Wikidata not--
  • 18:24 - 18:28
    Wikidata's great,
  • 18:28 - 18:33
    but to really use it,
    you have to spend a lot of effort.
  • 18:33 - 18:40
    And most companies,
    and most individuals, and most groups
  • 18:40 - 18:46
    can't expend that amount of effort
    to really use it well.
  • 18:46 - 18:52
    I think that on the Wikidata side,
    they should try to be greater
  • 18:52 - 18:55
    so that more people could really use it.
  • 18:55 - 18:59
    And that's really, I think,
    the guts of this presentation
  • 18:59 - 19:04
    is that if Wikidata community
    improved Wikidata
  • 19:04 - 19:08
    so it would be more clear
    as to what's going on,
  • 19:08 - 19:11
    then more people
    could put information into it
  • 19:11 - 19:12
    without making mistakes,
  • 19:12 - 19:15
    and more people could use it
    without having to spend a lot of time
  • 19:15 - 19:17
    to curate it.
  • 19:18 - 19:24
    Alright, so, we've gone through
    lots of this stuff.
  • 19:25 - 19:28
    Let me just say a few things.
  • 19:28 - 19:33
    So, I've looked at a fair bit of Wikidata,
  • 19:33 - 19:37
    and every time I look, I find a problem.
  • 19:37 - 19:40
    That's bad.
  • 19:41 - 19:43
    I haven't done a quantitative study,
  • 19:43 - 19:44
    and somebody should do
    a quantitative study
  • 19:44 - 19:47
    of some of these things,
    it would require a lot of work to do it,
  • 19:47 - 19:50
    but essentially, I look at something
    and I find a problem,
  • 19:50 - 19:51
    and that's not great.
  • 19:51 - 19:52
    I find missing information.
  • 19:52 - 19:58
    But I don't have anything to say about
    adding in missing information.
  • 19:58 - 19:59
    Yes, Dan?
  • 20:00 - 20:02
    (Dan) With respect,
    you always find problems.
  • 20:02 - 20:03
    (Peter) Yes.
  • 20:03 - 20:05
    (audience laughs)
  • 20:05 - 20:08
    I am very good at finding problems.
  • 20:08 - 20:13
    Actually, so one of the problems
    that I have, the problem with "woman"--
  • 20:13 - 20:15
    (laughter)
  • 20:15 - 20:18
    The problem with--
    I didn't find the problem with "woman".
  • 20:18 - 20:20
    (chuckles)
  • 20:20 - 20:23
    Turns out that a co-worker,
    I showed her a page,
  • 20:23 - 20:25
    where I had found a different problem
    and she looked at it
  • 20:25 - 20:27
    and said, "Oh, 'woman'."
  • 20:27 - 20:29
    And so she found that problem
  • 20:29 - 20:32
    on a display where I had already
    found a problem.
  • 20:32 - 20:36
    So, missing information--
  • 20:36 - 20:38
    there just should be
    more information in Wikidata.
  • 20:38 - 20:40
    There's factual errors in Wikidata,
  • 20:40 - 20:41
    but everybody's got factual errors.
  • 20:41 - 20:43
    Bots make it a little bit worse.
  • 20:43 - 20:46
    There's problems with the ontology,
  • 20:46 - 20:49
    which I think is a place that--
  • 20:49 - 20:53
    you can expend effort there
    and really improve quite a lot of things.
  • 20:53 - 20:56
    And then there's also
    the problems with qualifiers,
  • 20:56 - 20:57
    and really temporal qualifiers.
  • 20:57 - 21:01
    It's very hard to figure out
    what's true at a particular time
  • 21:01 - 21:04
    because there's a whole bunch of
    temporal qualifiers
  • 21:04 - 21:06
    that could be relevant.
  • 21:06 - 21:09
    Which ones count and which ones get used,
  • 21:09 - 21:11
    and are they going to stay the same?
  • 21:11 - 21:12
    Are we going to add a new one tomorrow?
  • 21:12 - 21:15
    So then I have to change
    every one of my programs.
  • 21:15 - 21:19
    I really think all this kind of stuff,
    it would be better to hide that
  • 21:19 - 21:22
    from the consumer
    so that Wikidata would just say,
  • 21:22 - 21:24
    "Okay, you want to know
    what's true at time X?
  • 21:24 - 21:28
    Here's an interface that tells you
    what's true at time X,"
  • 21:28 - 21:31
    instead of having me
    to write all of this stuff.
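
A minimal sketch of what such an interface might look like, assuming each statement carries at most a start year and an end year. The `Statement` type, the function name, the item name, and the area values and years are invented for illustration. Other temporal qualifiers mentioned in the talk, such as inception, are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statement:
    item: str
    prop: str
    value: float
    start: Optional[int] = None  # first year the value holds, None = open
    end: Optional[int] = None    # first year it no longer holds, None = open


def true_at(statements, item, prop, year):
    """Values of `prop` for `item` whose validity interval contains `year`."""
    return [
        s.value
        for s in statements
        if s.item == item
        and s.prop == prop
        and (s.start is None or s.start <= year)
        and (s.end is None or year < s.end)
    ]


# Hypothetical data for a shrinking lake (areas in square miles).
STATEMENTS = [
    Statement("shrinking_lake", "area", 26000, end=1990),
    Statement("shrinking_lake", "area", 3000, start=1990),
]
```

A consumer would then call `true_at(STATEMENTS, "shrinking_lake", "area", 1987)` instead of inspecting qualifiers directly, so a new qualifier would change only this one function and not every client program.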
  • 21:36 - 21:37
    It's on, I think it's on.
  • 21:37 - 21:39
    Yeah.
  • 21:40 - 21:47
    (man) I think you like the idea
    of what is possible with Wikidata,
  • 21:47 - 21:53
    but you say that it's not used
    like your idea.
  • 21:54 - 22:01
    So if, from my perspective,
    Wikidata is a collection of statements
  • 22:01 - 22:05
    from persons and from machines,
    and so on, and some might be true,
  • 22:05 - 22:09
    some might be discussable.
  • 22:09 - 22:13
    What you could do would be,
    from my perspective,
  • 22:13 - 22:16
    you could use a computational intelligence
  • 22:16 - 22:22
    to score the statements
    if they are...
  • 22:22 - 22:24
    (speaking German)
  • 22:24 - 22:25
    ...contradictory,
  • 22:25 - 22:28
    or if they are common sense.
  • 22:28 - 22:32
    So you could score them,
    and then you can filter on the score,
  • 22:32 - 22:34
    and then you have what you wanted.
  • 22:34 - 22:39
    (Peter) Possibly, except without a notion
    of what things mean in Wikidata,
  • 22:39 - 22:42
    I can't even figure out
    whether two things are contradictory.
  • 22:42 - 22:45
    I mean, there's constraints
    and that helps,
  • 22:45 - 22:48
    but I don't think that's a full solution.
  • 22:48 - 22:53
    And common sense--
    I don't have much common sense
  • 22:53 - 22:57
    and my programs have a lot less than I do.
  • 22:57 - 23:01
    We could write a lot of stuff
    which tries to say some things
  • 23:01 - 23:05
    about common sense, but, again,
    I think that requires an understanding
  • 23:05 - 23:07
    of what's going on.
  • 23:07 - 23:11
    And yes, so Wikidata has references
    which are supposed to be some notion
  • 23:11 - 23:14
    of what's really supported,
  • 23:14 - 23:21
    except, here's a problem,
    and it's very hard to see this.
  • 23:21 - 23:23
    Here's a problem with Wikidata
    from a while ago.
  • 23:23 - 23:26
    This is a movie
    that's got three directors listed--
  • 23:26 - 23:30
    the Corpse Bride--
    and it's got Mike Johnson, twice.
  • 23:30 - 23:33
    Different Mike Johnsons.
  • 23:33 - 23:36
    And they both have a lot of references.
  • 23:36 - 23:40
    So there's a lot of things
    that say that Corpse Bride
  • 23:40 - 23:45
    has got two different Mike Johnsons
    as directors.
  • 23:45 - 23:48
    And there they are,
    one is a director, one is a singer.
  • 23:48 - 23:52
    What happened, some bot went through
    and accidentally did a bad thing
  • 23:52 - 23:55
    in Italian Wikipedia--
    got the wrong thing in there--
  • 23:55 - 23:58
    and then a bunch of other bots piled on
  • 23:58 - 24:01
    and essentially created false references.
  • 24:01 - 24:02
    So, this is a real problem.
  • 24:02 - 24:06
    So, seven references!
  • 24:06 - 24:08
    That's really good.
  • 24:08 - 24:10
    And they're not crap references.
  • 24:10 - 24:16
    They're some movie databases--
    real things.
  • 24:16 - 24:19
    So, that's one of the things.
  • 24:19 - 24:21
    Here's another one--
    there's the Aral Sea.
  • 24:23 - 24:28
    These are the biggest--
    by volume-- lakes in the world.
  • 24:28 - 24:33
    There's the Aral Sea.
    That comes from Wikidata, by the way.
  • 24:33 - 24:36
    There's Lake Michigan-Huron.
  • 24:36 - 24:39
    I didn't realize
    there was a Lake Michigan-Huron,
  • 24:39 - 24:41
    and I live on one of them.
  • 24:42 - 24:44
    So, here we have two problems.
  • 24:44 - 24:47
    This is an ontological problem--
    what's a lake?
  • 24:47 - 24:50
    And so is Lake Michigan-Huron a lake?
  • 24:50 - 24:53
    Well, don't know.
  • 24:53 - 24:57
    This one here
    is a temporal qualifier problem--
  • 24:57 - 25:00
    how big is the Aral Sea now?
  • 25:00 - 25:02
    Not 22,000 square miles.
  • 25:02 - 25:05
    Not 11,000 square miles.
  • 25:05 - 25:10
    So, what is it?
    Sorry, 26,000 square miles.
  • 25:10 - 25:14
    Although this is something
    from Google, of course,
  • 25:14 - 25:16
    but that's in there.
  • 25:16 - 25:21
    So anyway, I got a bunch of other things
    along these lines,
  • 25:21 - 25:23
    which you can see if you care,
  • 25:23 - 25:27
    but I've given you my suggestions already,
  • 25:27 - 25:30
    you can either like my suggestions or not,
  • 25:30 - 25:32
    but I've-- whoa-- (chuckles)
  • 25:33 - 25:36
    I think I've sort of
    supported some things.
  • 25:36 - 25:38
    So, anyway, I had questions in the middle,
  • 25:38 - 25:40
    and we are done,
    are we having a question or not?
  • 25:40 - 25:42
    - (woman) We're done.
    - (Peter) Okay.
  • 25:42 - 25:44
    - (woman) Sorry, that's it.
    - (Peter) (laughs)
  • 25:44 - 25:47
    (audience applause)
Title:
cdn.media.ccc.de/.../wikidatacon2019-1101-eng-Barriers_to_Using_Wikidata_as_a_Knowledge_Base_hd.mp4
Video Language:
English
Duration:
25:57
