-
(woman) Hello everyone.
Thank you for being here this afternoon.
-
We are first going to hear--
-
I'm just going to jump straight in
to give him plenty of time--
-
so we're first going to hear from
Peter Patel-Schneider
-
about barriers to using Wikidata
as a knowledge base.
-
(Peter) Thank you.
-
I'll skip over the abstract
because you've already seen it all.
-
And I should say a little bit
about myself.
-
I'm much more of a user of Wikidata
than an actual editor of Wikidata,
-
and much more of a user of Wikidata
than somebody who contributes to Wikidata,
-
but I very much believe in
the aims of Wikidata.
-
In particular, it aligns
with my research area,
-
which is knowledge representation,
at least in a certain sense.
-
I worked in description logics
for a long time, worked with W3C.
-
I've worked in Silicon Valley for a while,
-
largely building what might be called
knowledge graphs,
-
but I don't like the term
knowledge graphs--
-
I don't like what they mean,
-
I want to do something better
than knowledge graphs.
-
And I want to put this together
from various sources.
-
So Wikidata is a very, very good one,
-
but DBpedia is not so good.
-
Freebase is dead.
-
OpenStreetMap,
Open Movie Database, things like that.
-
And then I want to use
this store of knowledge
-
to do something.
-
And I want to use it as the source
of knowledge to do something,
-
and not only just facts
but also organizing my knowledge.
-
And currently, working where I am,
-
we're interested in supporting
conversational agents.
-
Not just things that let you play Avatar,
-
but that let you play the movie
that's directed by the wife
-
of the director of Avatar.
-
So how can we build a conversational agent
that will do something like that?
-
Well, you need to know
all the facts that go behind it,
-
but you also need to know
the fact that there are movies--
-
not just, we have Avatar,
but that we have movies--
-
we need to know things about movies,
-
we need to know things
about directorships.
-
We need to know things about humans--
that they're married to each other.
-
We need to know that there are men
and women in the world,
-
and somehow be able to use
this knowledge of what we're saying
-
to come up with the actual reference
to these things,
-
and then actually do
what we were asked to do.
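To make the chain concrete, here is a minimal Python sketch that answers "the movie directed by the wife of the director of Avatar" as a two-hop lookup over a toy in-memory triple store: first the director, then the spouse, then the films with that person as director. The triples are a small hand-picked subset, and the property names are plain strings rather than Wikidata IDs. The sketch deliberately ignores gender and time (Bigelow is in fact Cameron's former spouse). Those are exactly the definitional and temporal problems the talk goes on to raise.

```python
# A toy triple store of (subject, property, object) facts. This is a tiny
# hand-picked subset for illustration, not data pulled from Wikidata, and the
# property names are plain strings rather than Wikidata property IDs.
TRIPLES = [
    ("Avatar", "director", "James Cameron"),
    ("James Cameron", "spouse", "Kathryn Bigelow"),  # a former spouse, in fact
    ("The Hurt Locker", "director", "Kathryn Bigelow"),
    ("Point Break", "director", "Kathryn Bigelow"),
]


def objects(subject, prop):
    """Every o such that (subject, prop, o) is in the store."""
    return {o for s, p, o in TRIPLES if s == subject and p == prop}


def subjects(prop, obj):
    """Every s such that (s, prop, obj) is in the store."""
    return {s for s, p, o in TRIPLES if p == prop and o == obj}


def films_by_spouse_of_director(film):
    """Resolve 'the films directed by the spouse of the director of <film>'."""
    films = set()
    for director in objects(film, "director"):
        for spouse in objects(director, "spouse"):
            films |= subjects("director", spouse)
    return films


print(sorted(films_by_spouse_of_director("Avatar")))
# ['Point Break', 'The Hurt Locker']
```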
-
So, that's one end,
-
the other thing
that we want to be able to do
-
is if you think of systems like Siri,
there are hundreds or thousands--
-
actually, maybe Siri's not
the best example.
-
The Amazon system has hundreds
or thousands of little programs
-
that will do something for you.
-
And the problem that we're interested in
-
is how do you pick
which one can do something.
-
So for example, which back-end
can find me train trips
-
between San Francisco and Palo Alto?
-
There may be many systems
that will try and sell me train tickets,
-
but only one or perhaps two of them
will sell me that particular train ticket.
-
And how do I get the system to do that
without my having to tell it
-
that I want a Caltrain ticket?
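Here is a rough sketch of the selection problem. It assumes that each hypothetical back-end declares the rail operator it sells tickets for, and that a knowledge base records which stations each operator serves. Both tables are illustrative toys, not real service descriptions. The point is that the system can infer "Caltrain" from the trip itself, without the user naming it.

```python
# Hypothetical ticket-selling back-ends, each declaring the operator it serves.
BACKENDS = {
    "caltrain_tickets": "Caltrain",
    "amtrak_tickets": "Amtrak",
    "bart_tickets": "BART",
}

# Toy background knowledge: a few stations each operator serves (a small
# illustrative subset, not a complete network description).
SERVES = {
    "Caltrain": {"San Francisco", "Palo Alto", "San Jose"},
    "BART": {"San Francisco", "Oakland"},
    "Amtrak": {"San Jose", "Oakland"},
}


def backends_for_trip(origin, destination):
    """Return the back-ends whose operator serves both ends of the trip."""
    return sorted(
        name
        for name, operator in BACKENDS.items()
        if {origin, destination} <= SERVES.get(operator, set())
    )


print(backends_for_trip("San Francisco", "Palo Alto"))  # ['caltrain_tickets']
```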
-
So, what happens is I want to use Wikidata
as the source of a lot of this stuff,
-
and I regularly run into problems.
-
And from those problems,
I have a bunch of suggestions.
-
You may agree with my suggestions
or disagree with them.
-
Some of them are kind of on their way
to being implemented in Wikidata,
-
some of them aren't.
-
So, I'm going to do this talk
from the back forward.
-
I'm going to give you the summary,
and then an expansion of the summary,
-
and then some rationale
for my suggestions.
-
And the reason I'm going to do that
is if I started with all of the rationale,
-
I might never get to the end,
and the end is the important thing,
-
at least in my viewpoint.
-
So, my biggest suggestion, I guess,
on the community side is,
-
gee, guys, speak with a single voice.
-
(chuckles)
-
And speak with a voice
where I can find it.
-
So, it turns out
that one of my suggestions
-
is actually implemented,
but I only found out about it today,
-
because it's not used very much at all,
and it's hard to find it.
-
So, I really want you guys--
and me too, in some sense--
-
to spend some effort at the beginning
when you're creating these classes
-
and other things that are important,
-
so that a poor user like me,
-
who can't afford to go through five years
of impassioned discussion
-
to find out what male actually is,
-
can actually use it in our system--
in my system.
-
So that's sort of on the community side.
-
I'm a formalist.
-
I really want to--
and my programs are dumb.
-
I don't write smart programs,
I write dumb programs.
-
Now, they tend to be
very fancy dumb programs,
-
but these dumb programs
can't really handle all of the shades
-
of everything that you have
with start time, end time, inception.
-
I want to have some simple
formal mechanism
-
that will tell my program what's true now,
-
or what's true in 1987,
-
without having to search through
a bunch of things,
-
and make a bunch of guesses,
and use a lot of heuristics,
-
or have a machine-learning program
that's trained for this particular task.
-
I just want you to tell me
this stuff somehow, and have a take.
-
So, I want to be able
to look at something which says
-
what the things I see in Wikidata
actually mean.
-
And I don't find that these days.
-
And then, of course, once we have that,
-
I want somebody--
I'm willing to do some of this work--
-
to build tools that actually use
that formal description and say,
-
tell me, for example,
if I'm an instance
-
of architectural structure,
like the Eiffel Tower,
-
am I a geographic location?
-
I don't know.
-
I mean, Wikidata doesn't tell me
whether this is true or not.
-
I can find nowhere in Wikidata
that will do that,
-
because there's no formal thing.
-
But once you give me a formal thing
then I'm going to write a tool,
-
which essentially gives the implications
of what the formal things are.
-
The fourth suggestion is about bots.
-
Bots are great.
-
Bots have ultimate power
and as has been said,
-
with ultimate power,
comes ultimate responsibility.
-
And I don't believe that bots get
very much responsibility
-
for the things that they do,
and they need to.
-
We need to be able to control the bots
and figure out what they've done wrong,
-
and essentially, once a bot
makes a thousand mistakes,
-
we want to undo that once,
-
as opposed to undoing that
a thousand times.
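Here is one way the "undo once" idea could work, sketched in Python. It assumes that every bot edit carries a batch identifier and records the value it replaced; this is an illustration, not a description of any existing Wikidata feature. The batch is reverted newest first. Any value that someone has edited again since the bot ran is skipped rather than overwritten.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edit:
    edit_id: int
    batch_id: str              # hypothetical tag identifying one bot run
    item: str
    prop: str
    old_value: Optional[str]   # None: the statement did not exist before
    new_value: str


def apply_edit(store, edit):
    store[(edit.item, edit.prop)] = edit.new_value


def undo_batch(store, log, batch_id):
    """Revert every edit from one bot run, newest first, in a single call."""
    reverted = 0
    batch = [e for e in log if e.batch_id == batch_id]
    for edit in sorted(batch, key=lambda e: e.edit_id, reverse=True):
        key = (edit.item, edit.prop)
        if store.get(key) != edit.new_value:
            continue  # changed again since the bot ran: leave it for a human
        if edit.old_value is None:
            del store[key]
        else:
            store[key] = edit.old_value
        reverted += 1
    return reverted


# A bot wrongly marks 1,000 placeholder items as humans; a person then fixes one.
store = {}
log = [Edit(i, "bot-run-42", f"item-{i}", "instance of", None, "human")
       for i in range(1000)]
log.append(Edit(1000, "manual", "item-1", "instance of", "human", "rock"))
for e in log:
    apply_edit(store, e)

reverted = undo_batch(store, log, "bot-run-42")
print(reverted)  # 999
print(store)     # {('item-1', 'instance of'): 'rock'}
```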
-
Of course, as I said,
these are my suggestions.
-
Other people
may have different suggestions.
-
I'm coming at it from a user viewpoint.
-
I suppose I could say something like,
-
I'm coming at it from a binary viewpoint.
-
I mean, this is a program
that really wants yes or no answers.
-
It doesn't understand much
in shades of gray.
-
So, I would really like you to tell me
what's true and what's not true.
-
So, that's the end of the talk, right?
(laughs)
-
And I sort of expanded on some things
-
but let me-- oops,
where are we, here, yes.
-
So, here let me expand upon
the things that I said.
-
So formally,
I really want a logic for Wikidata
-
because that lets me know
what Wikidata means to me.
-
I don't want to have a data structure
-
with some sort of English description
somewhere that tells me something.
-
I want a formal statement of what this is.
-
And maybe it produces the wrong answers,
in which case we fix it,
-
but at least we know
what the answers are supposed to be,
-
as opposed to having to go through
five or ten different pages
-
of people arguing with each other
-
what this particular part
of Wikidata means.
-
So, in particular, I want to have things
that I think are useful,
-
like disjointness.
-
I want Wikidata to say that
rocks aren't humans,
-
to pick an example.
-
Now, there's lots of that stuff
in Wikidata at the moment.
-
There's lots of this
"opposite of" stuff,
-
but what does it mean?
-
Somebody who's an opposite--
-
there was something this morning
about transgender man
-
is the opposite of transgender woman.
-
Yes, in some sense,
but in what sense are they opposites?
-
It's not a logical sense,
it's something else.
-
I want to give definitions of classes
and to give an example,
-
I would very much like Wikidata to say
-
that a "woman" is an adult female human,
-
because if I query Wikidata--
this is going to the end--
-
and I ask how many women are in Wikidata,
-
I get... any guesses?
-
(woman) Less than men.
-
(Peter) Thirty-seven.
-
Less than men.
-
Thirty-seven.
-
Instances of "woman" in Wikidata-- 37.
-
That's obviously wrong.
-
Obviously, obviously wrong.
-
I know it, you know it,
-
but my program doesn't know it.
-
My program says 37--
well, it's not zero.
-
So it might be right.
-
I would much prefer
there to be something on "woman"
-
that says, "Hey, if you're trying
to figure out the women in Wikidata,
-
don't look at the things
that are stated to be instances of 'woman,'
-
look at things, well, a SPARQL query
or something like that,
-
find all the humans, find the female ones,
-
the ones with sex or gender
which is female or female-ish.
-
That's kind of difficult there,
-
and then the ones that are adult."
Whatever adult means,
-
at least that's a definition.
-
We can argue whether
it's the right definition or not.
-
But we get a number which is not 37,
much better than 37.
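Here is a sketch of what a definition attached to "woman" could let a program do, using a toy in-memory model. The property and item IDs are Wikidata's:

- P31 instance of
- P21 sex or gender
- P569 date of birth
- Q5 human
- Q6581072 female
- Q467 woman

The items themselves are hypothetical. The sketch also makes several assumptions:

- "adult" means 18 or older,
- only the single value female is checked,
- items without a birth date are excluded.

Each of these is one of the judgment calls that a community-agreed definition would settle.

```python
from datetime import date

# Wikidata identifiers, used here only as dictionary keys in a toy model:
# P31 instance of, P21 sex or gender, P569 date of birth,
# Q5 human, Q6581072 female, Q467 woman.
INSTANCE_OF, SEX_OR_GENDER, DATE_OF_BIRTH = "P31", "P21", "P569"
HUMAN, FEMALE, WOMAN = "Q5", "Q6581072", "Q467"

ADULT_AGE = 18  # assumption: "adult" means at least 18 years old


def age_on(born, day):
    """Whole years between `born` and `day`."""
    return day.year - born.year - ((day.month, day.day) < (born.month, born.day))


def is_woman(item, day):
    """Definition-based membership: human, female, and adult on `day`."""
    born = item.get(DATE_OF_BIRTH)
    return (
        HUMAN in item.get(INSTANCE_OF, set())
        and FEMALE in item.get(SEX_OR_GENDER, set())
        and born is not None
        and age_on(born, day) >= ADULT_AGE
    )


# Hypothetical items, not real Wikidata entities.
items = {
    "a": {INSTANCE_OF: {HUMAN}, SEX_OR_GENDER: {FEMALE}, DATE_OF_BIRTH: date(1950, 5, 1)},
    "b": {INSTANCE_OF: {HUMAN}, SEX_OR_GENDER: {FEMALE}, DATE_OF_BIRTH: date(2015, 1, 1)},
    "c": {INSTANCE_OF: {WOMAN}},
}
day = date(2020, 1, 1)
stated = [k for k, v in items.items() if WOMAN in v.get(INSTANCE_OF, set())]
defined = [k for k, v in items.items() if is_woman(v, day)]
print(stated, defined)  # ['c'] ['a']
```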
-
So, I want this so that
we can actually come up with answers
-
to some of these questions.
-
So, and again, tools--
I would really like to have tools
-
that show implications of claims.
-
So, that shows that
the Eiffel Tower is a location.
-
Whether it is or not in the real world,
is somehow kind of irrelevant.
-
We can argue whether the Eiffel Tower
is a location or has a location.
-
Philosophers probably
have argued for decades
-
over whether this is the case or not.
-
I don't care.
-
Just come up with an answer that makes
at least a little bit of sense,
-
and I'll be happy.
-
So, I want a tool that'll do that.
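The kind of implication tool described here can be sketched as a reflexive, transitive closure over subclass-of. This is the same idea as the wdt:P31/wdt:P279* property path often used in Wikidata SPARQL queries. The class edges below are invented for illustration. Whether Wikidata's real hierarchy should connect architectural structure to geographic location is precisely the open question.

```python
from collections import deque

# Hypothetical subclass-of edges (child -> parents); not Wikidata's real hierarchy.
SUBCLASS_OF = {
    "architectural structure": {"structure"},
    "structure": {"physical object", "geographic location"},  # illustrative assumption
    "physical object": {"entity"},
    "geographic location": {"entity"},
}
INSTANCE_OF = {"Eiffel Tower": {"architectural structure"}}


def superclasses(cls):
    """cls plus every class reachable from it via subclass-of (breadth-first)."""
    seen, queue = {cls}, deque([cls])
    while queue:
        for parent in SUBCLASS_OF.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def entailed_types(item):
    """Every class the item belongs to once subclass-of is followed upward."""
    types = set()
    for cls in INSTANCE_OF.get(item, ()):
        types |= superclasses(cls)
    return types


print("geographic location" in entailed_types("Eiffel Tower"))  # True
```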
-
I want, essentially,
a tool that will tell me
-
what's true at a particular time.
-
So, how big is the Aral Sea?
-
It's certainly not 22,000 square miles.
-
It's much, much smaller than that,
-
but the claims on the Aral Sea
are historical claims.
-
What's true now?
-
I think, 3,000 square miles.
-
Anyway, it's a mere puddle
of its former self, you might say.
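For the "how big is it now?" question, here is a minimal sketch. It assumes that each measurement carries a point-in-time qualifier, and it picks the most recent dated value at or before the query date. The lake and its numbers are placeholders, not the real Aral Sea figures. Silently skipping undated values is an assumption that a real interface would have to make explicit.

```python
from datetime import date

# Hypothetical area statements for a placeholder lake, each with a
# point-in-time qualifier. The numbers are invented; only the selection logic
# is the point here.
AREA_CLAIMS = [
    (1000.0, date(1960, 1, 1)),
    (600.0, date(1990, 1, 1)),
    (150.0, date(2010, 1, 1)),
    (400.0, None),  # no point in time: cannot be placed on the timeline
]


def value_as_of(claims, when):
    """Most recent dated value at or before `when`; undated values are skipped."""
    dated = [(t, v) for v, t in claims if t is not None and t <= when]
    return max(dated)[1] if dated else None


print(value_as_of(AREA_CLAIMS, date(1987, 1, 1)))  # 1000.0
print(value_as_of(AREA_CLAIMS, date(2020, 1, 1)))  # 150.0
```

Wikidata's statement ranks (preferred, normal, deprecated) are another signal such a tool could consult; this sketch ignores them.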
-
I would also like tools
that help in cleaning the data.
-
So, what are inconsistencies?
-
Is there something
that's both a rock and a human?
-
Well, right now,
is that a problem in Wikidata?
-
Well, there are these
constraint mechanisms,
-
but they're kind of weak,
-
and they're not used very well
in many places.
-
So, I would really like to have some tool
which essentially says, "No!
-
You can't have a rock and a human!
-
You can have, perhaps,
a human and a Klingon,
-
but rocks and humans, just, no."
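Here is a minimal sketch of the "No! You can't have a rock and a human!" check. It assumes that disjointness is declared between classes and inherited down the subclass-of hierarchy. All class names, edges, and the declaration itself are hypothetical. Note that item-5 is caught only because granite is a subclass of rock. That is why the check has to follow the hierarchy rather than look at stated classes alone.

```python
from collections import deque

# Hypothetical subclass-of edges and one hypothetical disjointness declaration.
SUBCLASS_OF = {
    "granite": {"rock"},
    "rock": {"physical object"},
    "human": {"organism", "person"},
    "Klingon": {"person"},  # fictional species, echoing the example in the talk
    "organism": {"physical object"},
}
DISJOINT = [("rock", "human")]

INSTANCE_OF = {
    "item-1": {"human"},
    "item-2": {"granite"},
    "item-3": {"rock", "human"},     # should be rejected
    "item-4": {"human", "Klingon"},  # allowed: nothing says these are disjoint
    "item-5": {"granite", "human"},  # rejected via granite -> rock
}


def types_of(item):
    """Stated classes plus everything reachable through subclass-of."""
    seen = set(INSTANCE_OF.get(item, ()))
    queue = deque(seen)
    while queue:
        for parent in SUBCLASS_OF.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def disjointness_violations():
    """(item, class_a, class_b) for every item under two disjoint classes."""
    return sorted(
        (item, a, b)
        for item in INSTANCE_OF
        for a, b in DISJOINT
        if a in types_of(item) and b in types_of(item)
    )


print(disjointness_violations())
# [('item-3', 'rock', 'human'), ('item-5', 'rock', 'human')]
```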
-
There's an old science fiction story
called The God Makers
-
where they take a rock [inaudible],
make it into a god,
-
so maybe a rock
could be a person in that sense.
-
But human, no.
-
Hm?
-
(man) Are you asking for
exhaustive disjunction?
-
[inaudible]
-
(Peter) No, I'm not asking for
exhaustive decompositions.
-
Just disjunctions.
-
I mean, in some sense--
-
In what?
-
(woman) That's undecidable.
-
(Peter) What? No, well,
you mean not logically.
-
So, the question is
whether we can actually
-
have exhaustive definitions,
-
exhaustive disjunctions?
-
Well...
-
(man) That's pricey, right?
To find out that bots are... yeah.
-
(man 2) To say that rocks
are disjoint from humans is easy,
-
but to do that in all the cases
you're going to want it, is--
-
(Peter) It's computation.
-
Yes, now we have a problem
with computational costs, right?
-
Yeah.
-
The computational cost of deciding it
for Wikidata as it exists right now
-
is not impossible,
it's just computationally non-trivial.
-
So given that the query service
is running out of [inaudible],
-
doing this right requires tools
that actually think a little bit.
-
And that's going to require computation.
-
How much computation?
-
Well, it's not the heat death
of the universe,
-
it's tomorrow, perhaps,
or two seconds from now.
-
But two seconds times
how many million things are in Wikidata
-
is getting to be a reasonably big number.
-
One of the things you can do
-
is that this doesn't all have to be
run in one place.
-
You can farm these out into other systems.
-
We don't have to have everything
all in one computer.
-
And, of course,
Google just gave us the answer.
-
We can just put it on
this new Google quantum computer,
-
and it'll do everything forever.
-
(woman) But it sounds like
you're asking for OWL, and--
-
(Peter) No, I'm asking for part of OWL.
-
(woman) You've been asking for
a lot of things about OWL,
-
and that just is not possible.
-
That's why Wikidata works:
because it's not OWL.
-
There are actually things
that you can compute with.
-
(Peter) So, I am asking for
a bigger part of OWL,
-
not all of it, yeah?
-
Well, I mean, so the question is,
-
is Wikidata going to spend the effort
-
to buy another, perhaps, ten computers
to crunch away on this permanently,
-
or is it going to spend the effort
of having a whole bunch of people
-
argue about it, or whatever.
-
And my view is computers are dirt cheap.
-
I mean, I'm willing to pony up
some of my very own money
-
to buy Wikidata another computer
to do this stuff,
-
because I think it's important.
-
(man) [inaudible]
-
Yes. (laughs)
-
I didn't say I would give it
to the Wikimedia Foundation.
-
But I'm not asking for things
that are trivial.
-
I'm asking for things
that require compute power,
-
that require intellectual power,
that require the community to do things.
-
The community is doing
some of these things.
-
I found out that there is this property
which essentially says,
-
"Hey, here's how
you're supposed to use this thing."
-
I forget the exact name of it.
-
User instructions,
I thought it was three words.
-
Whatever, anyway,
it essentially says-- and it's on male.
-
And there was a big argument about it.
-
The trouble is it's not supported at all.
-
There was this plan to have this property
and have it supported,
-
to have it show up everywhere,
-
so that people would realize
that human-- in other words,
-
you don't use person for humans,
right now it's stuck on the description.
-
And it's stuck on
a very short description.
-
And it's very hard to figure out
what it really means,
-
and only a few classes have these things.
-
So, as we go up in the class hierarchy
to these more general things,
-
it's very hard to figure out
what belongs to them
-
and what doesn't belong to them.
-
So it's no surprise
that people use them the wrong way.
-
Because the people in this room--
or metaphorically in this room--
-
may understand that geographic location
is used for a particular purpose,
-
but even I--
-
I think I have a fairly good background
in representing things--
-
don't know the answer to that,
or at least, it requires me to spend
-
at least an hour of effort
to get a good answer to that.
-
And that's really not scalable.
-
So, I'm not asking for nothing,
-
I'm asking for lots of things,
-
but the trouble is, I mean, I think--
-
well, I think I'm important
but anyway, you can ignore me.
-
I think that I'm a pretty good
use case for Wikidata.
-
I really want, not just a bit of Wikidata,
-
I want a lot of it.
-
And I work for a very big company
but the part of that company
-
that needs, or wants,
or cares about Wikidata is quite small.
-
So, if I worked for a company
that really cared about data,
-
and was willing to put
hundreds of millions of dollars
-
into curating Wikidata,
and put it into their own knowledge graph,
-
using Wikidata would be no problem.
-
My company, perhaps,
has a million dollars to take Wikidata
-
and put it into a knowledge graph.
-
A million dollars
doesn't go very far these days.
-
So, the problem--
and let me say something
-
that actually isn't in the slides,
but which I really firmly believe in.
-
The problem with Wikidata not--
-
Wikidata's great,
-
but to really use it,
you have to spend a lot of effort.
-
And most companies,
and most individuals, and most groups
-
can't expend that amount of effort
to really use it well.
-
I think that on the Wikidata side,
they should try to make it better
-
so that more people could really use it.
-
And that's really, I think,
the guts of this presentation
-
is that if the Wikidata community
improved Wikidata
-
so it would be more clear
as to what's going on,
-
then more people
could put information into it
-
without making mistakes,
-
and more people could use it
without having to spend a lot of time
-
to curate it.
-
Alright, so, we've gone through
lots of this stuff.
-
Let me just say a few things.
-
So, I've looked at a fair bit of Wikidata,
-
and every time I look, I find a problem.
-
That's bad.
-
I haven't done a quantitative study,
-
and somebody should do
a quantitative study
-
of some of these things,
it would require a lot of work to do it,
-
but essentially, I look at something
and I find a problem,
-
and that's not great.
-
I find missing information.
-
But I don't have anything to say about
adding in missing information.
-
Yes, Dan?
-
(Dan) With respect,
you always find problems.
-
(Peter) Yes.
-
(audience laughs)
-
I am very good at finding problems.
-
Actually, so one of the problems
that I have, the problem with "woman"--
-
(laughter)
-
The problem with--
I didn't find the problem with "woman".
-
(chuckles)
-
Turns out that a co-worker,
I showed her a page,
-
where I had found a different problem
and she looked at it
-
and said, "Oh, 'woman'."
-
And so she found that problem
-
on a display where I had already
found a different problem.
-
So, missing information--
-
there just should be
more information in Wikidata.
-
There are factual errors in Wikidata,
-
but everybody's got factual errors.
-
Bots make it a little bit worse.
-
There are problems with the ontology,
-
which I think is a place that--
-
you can expend effort there
and really improve quite a lot of things.
-
And then there's also
the problems with qualifiers,
-
and really temporal qualifiers.
-
It's very hard to figure out
what's true at a particular time
-
because there's a whole bunch of
temporal qualifiers
-
that could be relevant.
-
Which ones count and which ones get used,
-
and are they going to stay the same?
-
Are we going to add a new one tomorrow?
-
So then I have to change
every one of my programs.
-
I really think all this kind of stuff,
it would be better to hide that
-
from the consumer
so that Wikidata would just say,
-
"Okay, you want to know
what's true at time X?
-
Here's an interface that tells you
what's true at time X,"
-
instead of having me
write all of this stuff.
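Here is a sketch of the single interface being asked for. It assumes that statements carry only start-time and end-time qualifiers. Each statement is taken to hold on the half-open interval [start, end), and a missing bound is treated as open. The data is placeholder. The sketch ignores the other temporal qualifiers (point in time, inception, and so on); deciding how those combine is exactly what such an interface would hide from consumers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Statement:
    item: str
    prop: str
    value: str
    start: Optional[date] = None  # like a "start time" qualifier; None = unbounded
    end: Optional[date] = None    # like an "end time" qualifier; None = still holds


def holds_at(st, when):
    """Assume a statement is true on the half-open interval [start, end)."""
    return (st.start is None or st.start <= when) and (st.end is None or when < st.end)


def true_at(statements, item, prop, when):
    """The single question a consumer wants answered: what holds at `when`?"""
    return sorted(
        s.value
        for s in statements
        if s.item == item and s.prop == prop and holds_at(s, when)
    )


# Placeholder data, not real Wikidata statements.
facts = [
    Statement("item-X", "head of government", "A", date(1980, 1, 1), date(1990, 1, 1)),
    Statement("item-X", "head of government", "B", date(1990, 1, 1)),
]
print(true_at(facts, "item-X", "head of government", date(1987, 6, 1)))  # ['A']
print(true_at(facts, "item-X", "head of government", date(2020, 1, 1)))  # ['B']
```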
-
It's on, I think it's on.
-
Yeah.
-
(man) I think you like the idea
of what is possible with Wikidata,
-
but you say that it's not used
like your idea.
-
So, from my perspective,
Wikidata is a collection of statements
-
from persons and from machines,
and so on, and some might be true,
-
some might be discussable.
-
What you could do would be,
from my perspective,
-
you could use a computational intelligence
-
to score the statements
if they are...
-
(speaking German)
-
...contradictory,
-
or if they are common sense.
-
So you could score them,
and then you can filter on the score,
-
and then you have what you wanted.
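Here is the proposed score-then-filter approach, sketched in Python. The weights are hypothetical, and the contradiction count is simply assumed to come from some checker. Building that checker is the hard part: it needs to know what statements mean. A raw reference count can also be inflated when bots copy from one another.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    references: int      # how many references are attached
    contradictions: int  # conflicts reported by some checker (assumed to exist)


def score(claim, ref_weight=1.0, conflict_penalty=5.0):
    """Hypothetical score: reward references, penalise detected contradictions."""
    return ref_weight * claim.references - conflict_penalty * claim.contradictions


def keep(claims, threshold=1.0):
    """Filter step: keep only claims whose score reaches the threshold."""
    return [c.text for c in claims if score(c) >= threshold]


claims = [
    Claim("film X: director Y", references=3, contradictions=0),  # score 3.0
    Claim("film X: director Z", references=7, contradictions=2),  # score -3.0
    Claim("film X: genre W", references=0, contradictions=0),     # score 0.0
]
print(keep(claims))  # ['film X: director Y']
```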
-
(Peter) Possibly, except without a notion
of what things mean in Wikidata,
-
I can't even figure out
whether two things are contradictory.
-
I mean, there are constraints
and that helps,
-
but I don't think that's a full solution.
-
And common sense--
I don't have much common sense
-
and my programs have a lot less than I do.
-
We could write a lot of stuff
which tries to say some things
-
about common sense, but, again,
I think that requires an understanding
-
of what's going on.
-
And yes, so Wikidata has references
which are supposed to be some notion
-
of what's really supported,
-
except, here's a problem,
and it's very hard to see this.
-
Here's a problem with Wikidata
from a while ago.
-
This is a movie
that's got three directors listed--
-
Corpse Bride--
and it's got Mike Johnson, twice.
-
Different Mike Johnsons.
-
And they both have a lot of references.
-
So there's a lot of things
that say that Corpse Bride
-
has got two different Mike Johnsons
as directors.
-
And there they are,
one is a director, one is a singer.
-
What happened is that some bot went through
and accidentally did a bad thing
-
in Italian Wikipedia--
got the wrong thing in there--
-
and then a bunch of other bots piled on
-
and essentially created false references.
-
So, this is a real problem.
-
So, seven references!
-
That's really good.
-
And they're not crap references.
-
They're some movie databases--
real things.
-
So, that's one of the things.
-
Here's another one--
there's the Aral Sea.
-
These are the biggest--
by volume-- lakes in the world.
-
There's the Aral Sea.
That comes from Wikidata, by the way.
-
There's Lake Michigan-Huron.
-
I didn't realize
there was a Lake Michigan-Huron,
-
and I live on one of them.
-
So, here we have two problems.
-
This is an ontological problem--
what's a lake?
-
And so is Lake Michigan-Huron a lake?
-
Well, don't know.
-
This one here
is a temporal qualifier problem--
-
how big is the Aral Sea now?
-
Not 22,000 square miles.
-
Not 11,000 square miles.
-
So, what is it?
Sorry, 26,000 square miles.
-
Although this is something
from Google, of course,
-
but that's in there.
-
So anyway, I got a bunch of other things
along these lines,
-
which you can see if you care,
-
but I've given you my suggestions already,
-
you can either like my suggestions or not,
-
but I've-- woah-- (chuckles)
-
I think I've sort of
supported some things.
-
So, anyway, I had questions in the middle,
-
and we are done,
are we having a question or not?
-
- (woman) We're done.
- (Peter) Okay.
-
- (woman) Sorry, that's it.
- (Peter) (laughs)
-
(audience applause)