-
Asaf Bartov: Testing, testing.
-
Is this heard in the room?
-
Testing.
-
Hello, everyone.
-
This is a gentle
introduction to Wikidata
-
for absolute beginners.
-
If you're an absolute
beginner, if you've never heard
-
of Wikidata, or if you've heard
of Wikidata but don't quite get
-
it, don't know what it's
good for, have only used it
-
for inter-wiki links--
-
if you're anywhere
on this range,
-
you're in the right place.
-
My name is Asaf Bartov.
-
I work for the
Wikimedia Foundation,
-
and I am a Wikidata enthusiast.
-
So the first thing I want to
say is that you are lucky.
-
You are lucky because
Wikidata is already
-
and is quickly becoming even
more of an important research
-
tool for anyone who's
trying to ask questions
-
about large amounts
of information.
-
It will become more and more
used across the humanities,
-
in particular, because of the
things that it's able to do,
-
some of which we will
demonstrate shortly.
-
And you are lucky because you
get to find out about it now
-
before most of the world.
-
So by the end of this talk,
you will be a Wikidata hipster
-
because you'll be
able to say, oh yeah.
-
I knew about Wikidata
before it was cool.
-
So before we actually
visit Wikidata,
-
I want to share two key problems
that Wikidata seeks to solve
-
and which would help us
understand why it exists.
-
The first problem is that
of dated data, that
-
is data that is out of date.
-
And this is apparent
on Wikipedia
-
across our free
knowledge encyclopedias.
-
Data on Wikipedia is
not always up to date.
-
And the more obscure
it is, the more likely
-
it is not to be up to date.
-
So the Polish Wikipedia may have
an article about a small town
-
in Argentina, and that article
will include information
-
about that town like population
size, name of the mayor.
-
And that information,
ideally, was
-
correct at the time the article
was created on the Polish
-
Wikipedia--
-
maybe translated
from another wiki.
-
But then how likely is
it to be kept up to date?
-
How likely is it that the
Polish Wikipedia would give us
-
the correct and latest numbers
or data about the population
-
size of that town
or the mayor, right?
-
So this is the kind of data
that does go out of date, right?
-
Every few years--
five, 10 years--
-
there is a census, and now there
are new population figures.
-
Now the census in Argentina will
be made available in Argentina
-
in Spanish, probably,
which brings us
-
to another component of the
problem of dated data, which
-
is there are no obvious
triggers for updating the data.
-
So the Polish Wikipedian
is not sent an email
-
by the Argentinean
government saying, hey,
-
we have a new census.
-
There are new population numbers
for you to update on Wikipedia.
-
No such email is sent.
-
So it's kind of
hard to notice when.
-
And of course, multiply that by
all the different jurisdictions
-
around the world.
-
There's no easy
way to notice when
-
your data goes out of date.
-
So that's difficult
to keep up to date.
-
And even if we were to receive
some kind of indication--
-
oh, there's a new
census in Argentina,
-
so a whole bunch of
population figures
-
have now gone out of date.
-
Updating it on the
Polish Wikipedia
-
and the French Wikipedia
and the Indonesian Wikipedia
-
and the Arabic Wikipedia is a
whole bunch of repetitive work
-
that a lot of
different volunteers
-
will need to do just for
that one updated piece
-
of information about Argentina.
-
So I hope this is
clear and resonates
-
with some of your experience
editing Wikipedia--
-
data that is out of
date or that needs
-
to be updated
manually, menially,
-
on a fairly frequent schedule
across the different countries
-
and data sources.
-
The other-- and I think
maybe more interesting--
-
shortcoming or problem
that I want to discuss
-
is what I call the
inflexible ways
-
of lateral queries, crosscutting
queries of knowledge.
-
So if I want an answer to
the question, what countries
-
in the world export rubber--
-
that's a reasonable
question, right?
-
That information
is on Wikipedia.
-
Do you agree?
-
If you go to
Wikipedia and read up
-
about Brazil, about Peru, about
Germany, somewhere in there--
-
maybe a sub-article called
Economics of Brazil--
-
you will find the main
exports of that country.
-
And you can find
out whether or not
-
that country exports rubber.
-
But what if I don't want
to go country by country
-
looking for the word rubber?
-
I just want an answer.
-
What are the countries
that export rubber?
-
Even though that
information is in Wikipedia,
-
it's hard to get at.
-
It's hard to query.
-
Now, you may say, well, that's
what we have categories for,
-
right?
-
Categories are a way to
cut across Wikipedia.
-
So if someone made a
category called rubber
-
exporting countries, then
you can go to that category
-
and see a list of countries
that export rubber.
-
And if nobody has
made it yet, well, you
-
can create that category and,
with a kind of one-time effort,
-
populate that category,
and you're done.
-
Well, yes.
-
That's still not
very convenient.
-
But also, it's still
very, very limited,
-
because what if I only want
countries that export rubber
-
and have a democratic
system of government,
-
or any other kind of
additional condition
-
that I would like
to add to this?
-
Or take a completely
different example.
-
What if I want to know
which Flemish town had
-
the most painters born in it?
-
There's a ton of
Flemish painters.
-
Most of them were
born somewhere.
-
We could theoretically,
just you know,
-
look up all the birthplaces
of all the Flemish painters
-
and tally up the
numbers and figure out
-
what is the place where the
most Flemish painters come from?
-
I don't know the answer to that.
-
It would be nice to be
able to get that answer.
-
Again, the data is in Wikipedia.
-
Those birthplaces are
listed in the articles
-
about those painters.
-
But there's no easy way
to get that information.
-
What if I want to ask, who are
some painters whose father was
-
also a painter?
-
That's a thing
that exists, right?
-
Some painters are
sons of painters.
-
You know, Bruegel comes to
mind as an obvious example.
-
But there's a bunch
of others, right?
-
So who are those people?
-
What if I want to
ask that question?
-
That's the kind of question
that not only Wikipedia
-
doesn't answer today.
-
If you walk to your friendly
university library reference
-
desk and say,
hello, I would like
-
a list of painters whose
father was also a painter,
-
how would that
librarian help you?
-
There's no easy way to get an
answer to a question like that.
-
What if you only want
a list of painters
-
who were immigrants, painters
who lived somewhere else
-
than where they were born?
-
There's no book.
-
I guess maybe there
is, but you know,
-
it's not obvious that there's a
ready resource that says, list
-
of painters who are immigrants.
-
And the librarian would
probably refer you
-
to a book on the shelf
called, I don't know,
-
The Complete
Dictionary of Flemish
-
Painters and go,
look up the index,
-
you know, and if you
see a similar surname,
-
maybe they're father and son.
-
And kind of cobble together
the answer on your own.
-
The reason I'm comparing
this to a library
-
is to show you that this is a
kind of question that is not
-
readily satisfiable today.
-
Now, these questions may
sound contrived to you.
-
You may say to
yourself, well, you
-
know, painters who are also
sons of painters, yeah.
-
You know, that
never occurred to me
-
as a question I
might care about.
-
But I want to invite
you to consider
-
that this kind of question,
questions like that question,
-
may well be questions
you do care about.
-
And I also want to suggest
that the fact it is so nearly
-
impossible, the fact that
there's no obvious way
-
to ask that kind
of question today,
-
is partly responsible
for your not
-
coming up with those
questions, right?
-
We tend to be limited
by the possible.
-
You know, until human
flight was made possible,
-
it did not occur to anyone
to say, oh yeah, by this time
-
next week I will
be in Australia,
-
because that was
just impossible.
-
But when flight is
possible, there's
-
all kinds of things that
suddenly become possible,
-
and there's all
kinds of needs that
-
arise based on the
availability of resources
-
to fulfill those needs.
-
So many of these research
questions, compound lateral
-
cross-cutting queries, are not
being asked because people have
-
internalized the fact
that there is no way
-
to get an answer
to questions like,
-
what is the most popular first
name among British politicians?
-
I just made that up, you know?
-
Is it John?
-
Maybe.
-
Maybe it's William,
for whatever reason.
-
You know, these are the kinds
of questions we don't routinely
-
ask because we know that it's
like, who are you going to ask?
-
How are you going to
get an answer to that?
-
So this problem of not having
very flexible ways of querying
-
the data that we already have--
-
in Wikipedia, in
Wikisource, elsewhere--
-
is a significant limitation.
-
So these two key problems
have one solution.
-
And that is an editable,
central storage
-
for structured and
linked data on a wiki,
-
under a free license, which
is a very long way of saying
-
Wikidata.
-
That is Wikidata.
-
Wikidata is an editable,
central storage
-
for structured and
linked data on a wiki,
-
under a free license.
-
So let's take this
apart and unpack it.
-
First of all, it's
a central storage.
-
This relates to the
first problem, right?
-
If we had one place containing
data like population size,
-
we would be able to update
that one place and then have
-
all of the different Wikipedias
draw the data from that one
-
place so that we wouldn't
have to manually,
-
repetitively update it across
our hundreds of projects.
-
So having central storage
makes, I hope, kind
-
of immediate, intuitive sense.
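
[Editor's sketch, not part of the talk] A minimal Python illustration of the central-storage idea, assuming a toy in-memory store. The names `central_store` and `render_infobox` and the population figures are made up for illustration; this is not how Wikipedia or Wikidata is actually implemented.

```python
# One central record, updated in one place. The numbers are invented.
central_store = {"population": 1000}


def render_infobox(wiki, store):
    """Each wiki reads the shared value when it renders, instead of keeping a copy."""
    return f"[{wiki}] population: {store['population']}"


wikis = ["pl", "fr", "id", "ar"]

# A new census arrives: a single edit to the central store...
central_store["population"] = 1200

# ...and every wiki shows the new figure on its next render.
pages = [render_infobox(wiki, central_store) for wiki in wikis]
for page in pages:
    print(page)
```

The point of the design is that the update happens once, rather than once per language edition.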
-
But what do I mean by
structured and linked data?
-
So structured data means
that each datum, each piece--
-
individual piece-- of data
is managed on its own,
-
is identified and
defined on its own,
-
as distinct from Wikipedia.
-
Wikipedia has articles.
-
The article about Brazil
includes a ton of data,
-
all kinds of information,
and it's presented as text,
-
as several paragraphs--
several pages--
-
of text, right?
-
Now, we do have an
approximation of structured data
-
on Wikipedia.
-
If you've browsed
Wikipedia a little,
-
you've noticed that we often
have an infobox, what we
-
call an infobox on Wikipedia.
-
That's the table on the right
side if it's a left to right
-
language, the table
on the right side
-
that has information that
is easy to tabulate, right?
-
So you know, birth date, birth
place, death date, death place,
-
nationality--
-
or if it's about a country,
area, population, anthem,
-
type of government, whatever
you are likely to find.
-
If it's a movie, then
you know, starring,
-
genre, box office receipts,
whatever pieces of data
-
are relevant to an
article about a movie.
-
So we do already kind of
group pieces of information
-
on Wikipedia into this
kind of structured format.
-
Those of you who have
ever looked at the source,
-
at what the wiki code
under that looks like,
-
know that it's only
semi-structured.
-
It looks neat and
organized in a table,
-
but really, it's just a bunch
of text that is put there.
-
It is not centralized.
-
Every Wikipedia has its
own copy of that data.
-
And if I go and update
the population size
-
on Spanish Wikipedia of
that Argentinean town,
-
it does not get
updated automagically
-
on the English Wikipedia or
the Arabic Wikipedia, right?
-
So the structured data that
we already have on Wikipedia
-
is not managed centrally.
-
The other thing
about structured data
-
is, when you have a notion of an
individual piece of data, that
-
is the cornerstone of
allowing the kinds of queries
-
that I was talking about.
-
That is what will allow
me to ask questions like,
-
what is the Flemish town where
the most painters were born,
-
or what are the world's
largest cities that
-
have a female mayor?
-
I could come up with other
examples all day long, right?
-
These are all questions
that you can ask,
-
once you break down your data
into individual pieces, each
-
of which is--
-
you're able to refer to each
of those programmatically.
-
The computer can
identify, isolate,
-
and calculate based on each
of those pieces of data.
-
So that's why the
structure is important.
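
[Editor's sketch, not part of the talk] Once each fact is an individual field, a question like "which town had the most painters born in it" becomes a short computation. The Python below assumes toy records; every name and town in it is a placeholder, not real data.

```python
from collections import Counter

# Toy records: all names and towns are made-up placeholders.
people = [
    {"name": "Painter A", "occupation": "painter", "birthplace": "Town X"},
    {"name": "Painter B", "occupation": "painter", "birthplace": "Town Y"},
    {"name": "Painter C", "occupation": "painter", "birthplace": "Town X"},
    {"name": "Writer D", "occupation": "writer", "birthplace": "Town Y"},
]


def birthplace_tally(records, occupation):
    """Count birthplaces among the records that have the given occupation."""
    return Counter(
        record["birthplace"]
        for record in records
        if record["occupation"] == occupation
    )


tally = birthplace_tally(people, "painter")
top_town, top_count = tally.most_common(1)[0]
print(top_town, top_count)
```

The same tally over article text would require reading every article; here the computer can isolate the two fields it needs.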
-
Now, Wikidata is also a
linked data repository.
-
What does it mean that
the data is linked?
-
Well, it means that a single
piece of data can point at,
-
can link to another
whole bag of data.
-
So if we are describing,
for example, a person,
-
and we record the
single piece of data
-
that this person was born
in Salem, Massachusetts,
-
that single piece of data
links to the item about Salem,
-
Massachusetts
because, of course,
-
we know a lot of things
about that place, Salem,
-
Massachusetts.
-
So it's not just the text--
-
S-A-L-E-M. It's not just,
that's where they were born.
-
But it's a link to all
the data that we have
-
about Salem, Massachusetts.
-
If we say someone's
nationality is French,
-
that is a link to France.
-
That is a link to everything we
know about the country France.
-
The fact that the data
is linked and structured
-
allows not only humans,
but also computers
-
to traverse information
and to bring
-
us different pieces of
relevant information
-
programmatically, automatically,
based on those links.
-
Because it's not just
text, it's an actual link
-
to another chunk of data.
-
If this sounds a
little abstract,
-
it will become much
clearer in just a second
-
when we see it in action.
-
But the other components of
this little definition are,
-
of course, this central storage
of structured and linked data
-
needs to be editable,
of course, because we
-
need to keep it up to date.
-
We need to correct mistakes.
-
And we want it on a wiki
under a free license.
-
The free license is, of
course, essential to enable
-
reuse of that data, to enable
all kinds of reuse of the data.
-
And Wikidata, unlike
Wikipedia, is released
-
under a different free license.
-
Wikidata is released
under CC0 waiver.
-
That means unlike
Wikipedia, where
-
you have to attribute Wikipedia
when you reuse information
-
from Wikipedia, you do not
need to attribute Wikidata,
-
and you do not need to
share alike your work.
-
It's an unencumbered license to
reuse the data in any way you
-
want, including commercially.
-
You don't have to say that
it comes from Wikidata.
-
I mean, it could be nice,
but you don't have to.
-
You're under no
obligation to do it.
-
And that is important to
allow certain kinds of reuse
-
where, for example, if you're
building some kind of device,
-
you may not have a practical
way to give attribution.
-
And had we required
that to use Wikidata,
-
we would have made
Wikidata less reusable.
-
So Wikidata is unencumbered by
the requirement of attribution.
-
And of course, because
it's on a wiki,
-
we get all the benefits that we
are used to expect from a wiki,
-
right?
-
So it's a wiki,
which means, yes.
-
It has discussion pages.
-
It has revision histories.
-
It remembers everything.
-
So if you screw it up, you
can always go a version back.
-
Or if someone else
vandalized the content,
-
we can always go back,
just like Wikipedia.
-
So we get all the
benefits we're used to--
-
user talk pages, group
discussion pages, watch lists,
-
all the features that
we expect in a wiki.
-
In short, Wikidata is love.
-
I hope you agree with me
by the end of this talk.
-
So let's zoom in and see
what this structured data
-
looks like.
-
So structured data on Wikidata
is collected in statements.
-
And statements have
the general form
-
of this triple, this
tripartite ascription--
-
items, properties, and values.
-
Now an item is the
subject, is the topic
-
that we are trying to describe.
-
It can be any topic that
Wikipedia can cover,
-
and many others that
Wikipedia wouldn't.
-
So the topic, the
item can be Germany,
-
or it can be Salem,
Massachusetts,
-
or it can be the
concept of redemption.
-
It can be anything at all.
-
Anything you can imagine
describing in any way with data
-
can be the item.
-
So the item, consider
it like the title
-
of the rest of the data.
-
And then what do we say
about Salem, Massachusetts
-
or about Germany?
-
Well, that's a series of
properties and values,
-
properties and values.
-
The property is
the kind of datum,
-
like birth date or language
spoken or manner of death.
-
These are all real properties.
-
Or national anthem, if I'm
trying to describe a country--
-
these are properties.
-
And then they have
values, right?
-
So this person, this
imaginary person's place
-
of birth, the value of the
property place of birth
-
is Salem, Massachusetts.
-
So you can think about it
as like a government form--
-
or not government, just any
form that you're filling out--
-
where there are field names,
and then empty spaces for you
-
to fill out.
-
That's the value, OK?
-
So the field names
or the categories
-
are the properties, right?
-
So name, language,
occupation, date of birth--
-
these are all properties.
-
And the values are
the actual piece
-
of data, the actual
information that we have.
-
And of course,
different kinds of data
-
are relevant for describing
different kinds of items.
-
And the key in the value is it
can be either a literal value--
-
like if we're describing
the height of a mountain,
-
we might say just
the number 8,848.
-
That's the height
of which mountain?
-
Not everyone at once.
-
Oh, because it's meters,
the metric system.
-
Yeah, Mt. Everest is 8,848 meters.
-
Yes.
-
Get with it, America.
-
The metric system.
-
All right, so that
can be a literal value
-
like an actual number.
-
Or it can be a link to an
item, pointing at another item.
-
But in this statement,
it is the value.
-
So if I'm talking about
Germany, the item is Germany.
-
And the property capital
city has the value Berlin.
-
But the value is
not B-E-R-L-I-N.
-
The value is a pointer to
the item Berlin, right?
-
That's the link.
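
[Editor's sketch, not part of the talk] One way to picture the item / property / value triple in Python, with a value that is either a literal or a pointer to another item. The class names `ItemRef` and `Statement` are assumptions made for this illustration and are not Wikidata's internal data model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ItemRef:
    """A pointer to another item, as opposed to a piece of text."""
    name: str


@dataclass(frozen=True)
class Statement:
    item: str                                # the subject being described
    prop: str                                # the kind of datum
    value: Union[ItemRef, int, float, str]   # a literal, or a link to an item

    def is_link(self) -> bool:
        return isinstance(self.value, ItemRef)


statements = [
    # The value is a link to the item Berlin, not the letters B-E-R-L-I-N.
    Statement("Germany", "capital city", ItemRef("Berlin")),
    # The value is a literal number.
    Statement("Mt. Everest", "elevation above sea level (m)", 8848),
]

for statement in statements:
    kind = "link" if statement.is_link() else "literal"
    print(statement.item, "|", statement.prop, "|", statement.value, "|", kind)
```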
-
So a single item is described
by a series of such statements,
-
right?
-
There's hundreds and hundreds of
things I can say about Germany.
-
There's hundreds of things
I can say about a person.
-
And these will
generally take the form
-
of a property and a value.
-
By the way, some properties
may have more than one value.
-
Consider the property
languages spoken.
-
People can speak more
than one language, right?
-
So if I'm
describing myself,
-
we can say languages spoken--
-
English, Hebrew,
Latin, whatever.
-
So a property can have
more than one value.
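
[Editor's sketch, not part of the talk] A property with several values can be pictured as a list under one key. The helper `group_by_property` is a hypothetical name; the values are the ones mentioned in the talk.

```python
from collections import defaultdict


def group_by_property(pairs):
    """Collect (property, value) pairs so each property maps to a list of values."""
    grouped = defaultdict(list)
    for prop, value in pairs:
        grouped[prop].append(value)
    return dict(grouped)


pairs = [
    ("languages spoken", "English"),
    ("languages spoken", "Hebrew"),
    ("languages spoken", "Latin"),
    ("employer", "Wikimedia Foundation"),
]

grouped = group_by_property(pairs)
print(grouped["languages spoken"])
```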
-
So if the item is
about a country,
-
it would have statements about
properties like population,
-
land area, official languages,
borders with, anthem,
-
capital city.
-
If I'm describing a person, I
have a whole mostly different
-
set of properties that
are relevant, right?
-
Date of birth, place of birth,
citizenship, occupation,
-
father, mother,
religion, notable works--
-
now, are all of these
relevant for all people?
-
No, of course not.
-
It depends.
-
And different items
about different people
-
will either have or not
have these fields, right?
-
So we wouldn't record religion
for absolutely every person.
-
Some people manage
to do without.
-
And also, it's not relevant
for a lot of people, like,
-
what their religion
happens to be.
-
Date of birth is generally
relevant for most people
-
that we're documenting.
-
So some properties kind of crop
up more commonly than others.
-
A person's height, for
example, is not generally
-
considered of
encyclopedic value, right?
-
We don't, for
example, if we have
-
an article about even a
really well-documented person
-
like Winston Churchill, does
Wikipedia mention his height?
-
I don't think it does.
-
Even though I'm sure
we could probably
-
find a source somewhere
that lists his height,
-
it's just not a
very relevant piece
-
of information about Churchill.
-
With everything else
that's written about him
-
and that we know
about him that we
-
want to include in the
article, a person's height
-
is not really something of
great value most of the time.
-
But if we are describing
Michael Jordan, it is relevant.
-
I'm dating myself.
-
People still know
Michael Jordan, right?
-
You know, a basketball
player, that's
-
when height is very
relevant, right?
-
That's one of the
first things you
-
say when you're describing
a basketball player,
-
is list their height.
-
So even within the
class of person,
-
some properties may be
more or less relevant,
-
depending on the context.
-
So let's look at some examples.
-
These are examples
of statements.
-
Each line is a statement.
-
So here's the first one.
-
I want to state, about the
item Earth, our planet.
-
And what I want
to say about Earth
-
is that the property
highest point on Earth
-
has the value Mt. Everest.
-
Would you agree with that?
-
That is the highest
point on Earth.
-
That's a statement.
-
It says something
specific, one piece
-
of information about Earth.
-
Now of course, there's
a lot of other things
-
we want to say about Earth--
-
circumference,
average temperature,
-
I don't know, all
kinds of things
-
we can describe the planet
with, density, the galaxy
-
it belongs to, all that.
-
But here's one piece
of information,
-
one very specific field in
the detailed form about Earth.
-
The highest point is Mt. Everest.
-
Now here's a second statement.
-
This time Mt. Everest itself is the item
that I'm describing, right?
-
The topic has changed.
-
Now I'm saying
something about Mt. Everest,
-
and what I'm saying about Mt. Everest
is elevation above sea level.
-
Sounds the same but it
isn't, because the highest
-
point on Earth answers
the question where,
-
like on the planet, what
is the highest point?
-
It's Mt. Everest.
-
But how high is that highest
point is a different piece
-
of information.
-
Do you agree?
-
It's the actual altitude.
-
It's not where on
the planet it is.
-
So it may sound similar,
but these are actually
-
very different pieces
of information.
-
So that highest
point, how high is it?
-
Well, it's 8,848 meters high.
-
Now the third statement gives
another piece of information
-
about the first item.
-
Same item-- I could have
grouped them together.
-
Another thing I
know about the Earth
-
is that the deepest
point on the planet
-
is the Challenger Deep, part
of the so-called Mariana
-
Trench in the ocean.
-
So that is the deepest point.
-
And how deep is it?
-
I again use the elevation
above sea level.
-
That's the name of the
property even though it's not
-
above sea level.
-
I have a negative value because
the elevation of the Challenger
-
Deep is minus 11
kilometers, more or less.
-
All right?
-
So these are statements.
-
These are four individual
pieces of data.
-
And I could also
look at it this way.
-
Maybe that's closer to the
government form example
-
that I was giving, right?
-
So I want to say
something about Earth.
-
What do I want to say?
-
Two things-- highest point.
-
That's the field,
that's the property,
-
and this is the value.
-
The highest point is Mt. Everest.
-
The deepest point
is Challenger Deep.
-
And then I have things to
say about Challenger Deep--
-
the property of elevation
above sea level, the value
-
is minus 11 kilometers.
-
Now here's yet another
view of the same data
-
once more, with numeric IDs.
-
So this is the same information,
the same four statements.
-
But this time, in
addition to using words,
-
I'm also including weird
numbers following either Q or P.
-
So P stands for property.
-
So the highest point
property is P610.
-
And the deepest point
property is P1589.
-
What do these numbers mean?
-
They don't mean anything at all.
-
They're just numbers.
-
They're just sequential numbers.
-
And if I create a new
Wikidata item right now,
-
it'll get just the
next available number.
-
So they're just numbers.
-
So P stands for property.
-
What does Q stand for?
-
Does anyone know?
-
It's a trick question
because it's hard to guess.
-
But the principal
architect of Wikidata,
-
a Wikipedian named Danny
[INAUDIBLE] and data scientist,
-
is married to a lovely
lady named [INAUDIBLE]
-
spelled with a Q. And
this is a loving tribute.
-
And she's also a Wikipedian and
an admin of Uzbek Wikipedia.
-
So Q2 is just the numeric
identifier of the item Earth.
-
And Q513 is the
identifier of Mt. Everest.
-
You notice that we use that ID
across the statement, right?
-
So from Wikidata's
perspective, this
-
is actually what the
database actually contains.
-
What we were saying with words--
-
the Earth, highest
point, whatever--
-
never mind that.
-
Q2 has P610 with a value Q513.
-
That's what Wikidata
cares about, OK?
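
[Editor's sketch, not part of the talk] The split between stored IDs and human-readable words can be pictured as a lookup table applied only at display time. The IDs below are the ones named in the talk; the `LABELS` table and the `render` helper are assumptions for illustration.

```python
# IDs as given in the talk; the lookup table itself is a toy stand-in.
LABELS = {
    "Q2": "Earth",
    "Q513": "Mt. Everest",
    "P610": "highest point",
    "P1589": "deepest point",
}


def render(triple, labels):
    """Swap each ID for its label, leaving unknown IDs as they are."""
    return " ".join(labels.get(part, part) for part in triple)


# What is stored: identifiers only.
stored = ("Q2", "P610", "Q513")

# What a person sees: the same statement in words.
print(render(stored, LABELS))
```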
-
Now that, you'll agree,
is a little inaccessible.
-
Just these lists of numbers,
that's a little hard.
-
So Wikidata
understands and allows
-
us to continue using our words.
-
But actually, it gets
translated into numeric IDs.
-
Now why is this a good idea?
-
Why can't we just
say Earth or Mt. Everest?
-
Any thoughts?
-
This is an open question.
-
Why is this a good
idea to use numbers
-
instead of the names of things?
-
Yes, because more than one
thing can have the same name.
-
What do you mean?
-
There's only one Mt. Everest.
-
Well, yeah.
-
But there's also a
movie called-- and probably
-
more than one-- called Mt. Everest,
-
or a TV documentary
literally called Mt. Everest.
-
And of course, if I'm
describing a person named
-
Frank Johnson, he's not the only
Frank Johnson on the planet,
-
right?
-
But wait, you say.
-
On Wikipedia we deal
with that problem, right?
-
How do we deal with that
problem on Wikipedia?
-
Does anyone in
the audience know?
-
The standard way to
deal with the fact
-
that there is more than one
Frank Johnson in the world,
-
on Wikipedia, is to use
parentheses after the name.
-
So there is Frank
Johnson (actor)
-
and Frank Johnson
(politician), for example,
-
if that's the distinction
we need to make.
-
So you put in parentheses
kind of the minimal amount
-
of information you need to tell
apart these Frank Johnsons.
-
What if there's two
politician Frank Johnsons?
-
Well, then you would say Frank
Johnson (Delaware politician)
-
versus Frank Johnson
(California politician), right?
-
You just put in that bit of
context to tell them apart.
-
So that's the solution
that Wikipedians came up
-
with years and years ago
because they did need
-
a unique name for the article.
-
You can't have two
articles literally called
-
Frank Johnson on Wikipedia.
-
So that's the
solution on Wikipedia.
-
But Wikidata was designed
much later, more than a decade
-
after Wikipedia, and was
able to kind of learn
-
from the experience
of Wikipedia, which
-
has tremendous experience
with multilingualism, much
-
more than most sites and
projects, as we know.
-
And so the Wikidata
team understood
-
from the get go that
this will be an issue,
-
and it's better to use
numbers that are unequivocally
-
different from each
other instead of labels,
-
instead of the actual
name, the actual text,
-
because names are not unique.
-
Names can change, right?
-
Just last year, there was a
big naming reform in Ukraine
-
and a whole bunch of towns
and districts were renamed.
-
Does that mean we should change
all the data that we have, like
-
lose all the data that we
have about the old name?
-
No, we ideally just
want to change the name
-
without breaking links.
-
So having the links actually
refer to the numbers
-
is one way to ensure the
integrity of the data,
-
of the links, when
renaming happens.
-
Another reason is well, even
if the name doesn't change,
-
not all humans call
everything the same, right?
-
So Earth is Earth
in English, but it's
-
[SPEAKING ARABIC] in Arabic.
-
It's [SPEAKING HEBREW]
in Hebrew.
-
So obviously, Earth--
even that is not
-
as unambiguous or unequivocal
as you might think.
-
And so that is the
reason Wikidata,
-
which is built to be
multilingual from the start,
-
talks about numbers
rather than labels.
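
[Editor's sketch, not part of the talk] Why linking by ID survives renaming and multiple languages, in a few lines of Python. The IDs `Q_TOWN` and `Q_PERSON`, the labels, and the helper names are all hypothetical placeholders, not real Wikidata entries.

```python
# Labels live in a separate table, keyed by ID and then by language code.
labels = {
    "Q_TOWN": {"en": "Old Town Name"},
    "Q_PERSON": {"en": "Some Person"},
}

# Statements refer to IDs only, never to names.
statements = [("Q_PERSON", "place of birth", "Q_TOWN")]


def rename(label_table, item_id, lang, new_label):
    """Change what an item is called in one language; links are untouched."""
    label_table[item_id][lang] = new_label


def label_for(label_table, item_id, lang, fallback="en"):
    """Pick the label in the requested language, else the fallback, else the ID."""
    names = label_table.get(item_id, {})
    return names.get(lang) or names.get(fallback) or item_id


before = list(statements)
rename(labels, "Q_TOWN", "en", "New Town Name")

print(statements == before)                 # the link did not change
print(label_for(labels, "Q_TOWN", "en"))    # the new name is shown
print(label_for(labels, "Q_TOWN", "fr"))    # no French label yet: falls back
```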
-
OK.
-
Ha, I had a whole slide
about that and I forgot.
-
Yes, so even London,
again, is not
-
just London, England, which is
what you were thinking about.
-
It's also a city in Canada.
-
And it's also a family
name, like Jack London.
-
It's also a movie company.
-
There must be some hotel
named London somewhere.
-
This is a good opportunity
to remind everyone
-
that the vast
majority of humankind
-
does not speak a
word of English.
-
That's a statistic
worth remembering.
-
The vast majority of the planet
does not speak English at all.
-
That does not
contradict the datum
-
that English is the most
widely spoken language.
-
And yet, in aggregate,
a majority of people
-
speak other languages,
and not English at all.
-
So moving swiftly on, this
is a pause for questions
-
about what I've covered so far.
-
Any questions in the audience?
-
If not, we move to IRC.
-
If there are any questions--
-
Any questions?
-
No?
-
IRC?
-
Any questions?
-
OK.
-
We will have additional
pauses for questions later.
-
But enough of my hand-waving.
-
Let's go explore Wikidata.
-
So Wikidata lives
at wikidata.org.
-
And Wikidata already has
more than 25 million items.
-
That is, it collects
statements about more than 25
-
million topics.
-
It has many, many more
than 25 million statements
-
because many of these items
have dozens or hundreds
-
of statements.
-
So it documents 25
million things--
-
people, books, rivers, whatever.
-
Just to give us a sense
of how big that number is,
-
how many articles do we
have on English Wikipedia?
-
More than-- yes, more
than 5 million articles.
-
And that's the
largest Wikipedia.
-
So Wikidata is
already describing
-
more than five times, or
about five times as many items
-
as even our largest Wikipedia.
-
So obviously,
Wikidata contains data
-
about things that have no
article on any Wikipedia.
-
It is a much, much larger,
more comprehensive project.
-
All right, the second
thing we might notice
-
is, well, this looks kind
of like Wikipedia, right?
-
If we've never visited, it
looks kind of like Wikipedia.
-
It has this sidebar.
-
It has these buttons at the top.
-
It looks like it's
from the '90s.
-
Yeah.
-
So the reason it
looks like Wikipedia
-
is that it is a wiki running
on MediaWiki software.
-
It is running on software
very much like Wikipedia.
-
But it is running on
a kind of modification
-
of the standard wiki software.
-
It has an additional,
very important component
-
named Wikibase,
which gives it all
-
of its structured and
linked data power.
-
So let's start
exploring Wikidata.
-
Let's take something local--
-
Harvey Milk.
-
Harvey Milk.
-
What does Wikidata
know about Harvey Milk?
-
For those on YouTube
who may not be local,
-
he's a San Francisco politician
and gay rights activist
-
who was murdered in the '70s.
-
He was very significant in
the history of those struggles
-
in this country.
-
So what does Wikidata
tell us about Harvey Milk?
-
Well, the first
thing is it knows
-
that Harvey Milk is Q17141.
-
That's the most important
piece of information,
-
is first of all, that
is the identifier.
-
That is the item
number of all the data
-
that we will collect
about Harvey Milk.
-
The second thing you see
right under the title
-
is this line, this very,
very brief summary, right?
-
"American politician who became
a martyr in the gay community."
-
This line is the
description line.
-
So the name of the item--
-
this is the label.
-
We call it label on Wikidata.
-
That's the label.
-
And this line is
the description.
-
Now why is this
description important?
-
This is the description that
helps us tell this Harvey
-
Milk from any other Harvey
Milk that may exist, all right?
-
So again, this would
be useful if I'm
-
looking up someone with a
slightly more generic name.
-
That line will help me tell
apart the item about Harvey
-
Milk the gay activist rather
than Harvey Milk the film
-
actor, OK?
-
And where is it coming from?
-
Well, Wikidata has
this whole table,
-
as you can see, with
descriptions and labels
-
in other languages.
-
So Wikidata is able to refer
to Harvey Milk in Arabic which,
-
don't panic, is written
from right to left.
-
It also knows what to
call him in Bulgarian.
-
I mean, it's the same name,
but it's in a different script.
-
In French, in Hebrew,
and that's it?
-
Does it not know a name
for Harvey Milk in Italian?
-
Of course it does.
-
It actually has
labels for this person
-
in many, many, many languages.
-
It doesn't have descriptions in
every language, as you can see.
-
OK?
-
So why was Wikidata showing me
these languages and not others?
-
I mean, why this somewhat
arbitrary collection--
-
English, Arabic, Bulgarian,
German, French, and Hebrew?
-
Because I told it to.
-
So if we briefly click
over to my user page--
-
again, like every wiki,
you have user accounts.
-
You have user pages.
-
This is my user page.
-
And as you can see,
there's this little user
-
information box here called
a Babel box by Wikipedians,
-
where I list the
languages that I speak.
-
And Wikidata uses this box
just to kind of helpfully
-
show me these languages.
-
Of course, all the
other languages
-
are still available, as you saw,
by clicking the more languages.
-
But this is just a
useful little way
-
of getting the languages I
care about up there first.
-
By the way, this is a lie.
-
I don't actually
speak Bulgarian.
-
That stayed on my user page
because I was demonstrating
-
this in Bulgaria and I wanted
that label to show up there
-
during the talk--
-
just in case you
were going to tell me
-
a really good Bulgarian joke.
-
OK so for example, Hebrew
is my mother tongue.
-
And we have a Hebrew
label for Harvey Milk.
-
But we don't have a description.
-
So let's fix that right now by
clicking the edit button right
-
here.
-
I click edit, and this
table became editable.
-
And now I can very briefly
type a description.
-
AUDIENCE: Online in
about 20 seconds.
-
But can we hold it?
-
ASAF BARTOV: OK.
-
That was good timing
for the screen to crash.
-
OK?
-
Are we back?
-
OK.
-
Sorry about that.
-
So this was all about what to
call him in different languages
-
and scripts and how to
tell this person apart
-
from other people with
potentially the same name.
-
Let's scroll down and see
what else does Wikidata
-
know about this person?
-
So as you can see, this is
a list of statements, right?
-
This is a list of statements.
-
And the properties
are on the left,
-
the values are on the right.
-
So the first thing Wikidata
knows about Harvey Milk
-
is a very important
property called instance of.
-
Instance of.
-
And the property instance of
answers the very basic question
-
what kind of thing is
this that I'm describing?
-
Is it a book?
-
Is it a poem?
-
Is it a mountain?
-
Is it a theological concept?
-
No, it's a human.
-
It's a person, OK?
-
The item about Mt. Everest will say
instance of mountain, OK?
-
This is a very
important property.
-
Why is it important?
-
Wouldn't anyone looking
at this know that this is
-
a human being?
-
Yes.
-
Anyone looking at
this will know.
-
But if I want a computer to
be able to pull information
-
about people, I want to
be able to easily exclude
-
all the mountains and
poems and other things that
-
are not people from my query.
-
So this single datum,
this single piece of data,
-
is what tells computers and
algorithms very clearly,
-
this is a human.
-
Things that aren't instance
of human are other things.
-
OK?
-
So it may sound very
trivial, but it's not.
-
It's very important
to have an instance
-
of field for Wikidata items.
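-
To make the point about excluding mountains and poems concrete, here is a minimal sketch of filtering on instance of. It uses a toy in-memory list rather than the live site, and the dictionary layout, the function name, and the placeholder ID are assumptions made for illustration only.

```python
# Toy in-memory items; real Wikidata items carry many more statements.
# Modelling "instance of" as a plain dictionary key is an assumption.
items = [
    {"id": "Q17141", "label": "Harvey Milk", "instance of": "human"},
    {"id": "Q-PLACEHOLDER", "label": "Mt. Everest", "instance of": "mountain"},
    {"id": "Q-PLACEHOLDER-2", "label": "Some poem"},  # no "instance of" at all
]


def only_instances_of(items, kind):
    """Keep only the items whose 'instance of' value equals `kind`."""
    return [item for item in items if item.get("instance of") == kind]


humans = only_instances_of(items, "human")
print([item["label"] for item in humans])
```

Note that the item with no instance of value is silently left out, which is exactly why the talk calls this field important.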
-
All right, what else do we know?
-
Well, Wikidata knows about
an image for Harvey Milk.
-
Again, we can find a ton of
images-- or maybe not a ton,
-
but we can find dozens
of images of Harvey Milk
-
on Commons, on our Wikimedia
multimedia repository.
-
So why should we have a
single image here on Wikidata?
-
Again, this is
mostly for reusers.
-
If I'm building some kind of
tool that pulls information
-
from Wikidata, it's
nice if there's
-
at least one representative
image to kind of use
-
as the default or immediate
image for Harvey Milk
-
in some other reused context.
-
All right, sex or gender--
-
male.
-
Country of citizenship--
United States of America.
-
Given name is Harvey.
-
The date of birth is so and so.
-
The place of birth is Woodmere.
-
The place of death
is San Francisco.
-
The manner of death is homicide.
-
Wikidata knows that.
-
Now again, every
little datum like that
-
is the basis for later querying
and answering questions.
-
So the fact that we record the
manner of death of people--
-
or at least of some people--
-
will allow us later
to go, you know,
-
who are some people from
Belgium who died by homicide?
-
That's a question Wikidata can
answer, thanks to this field.
-
The other thing I mentioned
is that things are links.
-
So the place of
birth is Woodmere.
-
I don't know where
Woodmere is, but I
-
can click that and find out.
-
Here is the Wikidata item
about Woodmere, right?
-
It was the value in the
statement about Harvey Milk,
-
but now I'm looking at
the item about Woodmere.
-
And it turns out it's in
Nassau County, New York, right?
-
And of course, Wikidata has
a whole bunch of information
-
for me about Woodmere--
-
what country it's in and the
coordinates and the population
-
and the area, all the things you
would expect about a place, OK?
-
Let's get back to Harvey Milk.
-
So the manner of death,
the cause of death--
-
now here, Wikidata gives
us excellent information.
-
The actual cause of death
is ballistic trauma.
-
That's a professional term.
-
And this statement
has qualifiers.
-
So until now, I was talking
about triples, right?
-
The item has a property
with a certain value.
-
Actually, each
statement can also
-
have a number of
qualifiers which
-
add aspects of information,
still about that one question
-
that we're answering, right?
-
So if this property
answers cause of death,
-
it's not discussing
anything else.
-
It's not discussing languages.
-
It's not discussing
date of birth, right?
-
It's talking about
the cause of death.
-
But we're not just
saying ballistic trauma.
-
We're saying ballistic trauma
with the quantity attribute
-
being five.
-
What does that mean?
-
Five bullets, right?
-
There are five
ballistic traumas.
-
He was shot five times.
-
And he was shot by this
person named Dan White.
-
And this ballistic trauma,
like this actual shooting,
-
is itself the subject
of this other thing.
-
This is a link to a
whole other Wikidata
-
item about the Moscone-Milk
assassinations.
-
Moscone was the San
Francisco mayor at the time.
-
We'll see slightly better or
easier to understand examples
-
of qualifiers in a bit.
-
So if this was
confusing, hang on.
-
So he was killed by Dan White.
-
He spoke English.
-
His occupation--
here's an example
-
of a property with more
than one value, right?
-
So Milk was a politician.
-
But he was also a Navy
officer, at least for a while.
-
That was another thing that
he did during his life.
-
And he was a human
rights activist, right?
-
So some people are
writers and translators.
-
So people can have more
than one occupation.
-
People can speak more
than one language.
-
Here's a better
example of a qualifier.
-
So the property award received
has the value Presidential
-
Medal of Freedom.
-
And that award has an
attribute called point in time,
-
like when was this?
-
This was in 2009.
-
Do you see that
this piece of data--
-
2009-- is a sub-statement
or is subordinate
-
to the context of this award,
which was the Presidential Medal
-
of Freedom?
-
It can't just kind of
free float in the article.
-
It's not that 2009 is itself
a meaningful thing, right?
-
This medal was awarded in 2009.
-
Wikidata doesn't
tell us, for example,
-
when he was a Navy officer, OK?
-
But if we were, for example,
to look that up right now
-
and find out that Milk was
a Navy officer between 1962
-
and 1964, we could go back
here to the Navy officer bit
-
and click edit.
-
This is how I edit this
particular little piece
-
of information.
-
And add a qualifier like this.
-
I click Add Qualifier.
-
And I could pick start
time and end time, right?
-
And then I could
type 1962 to 1964,
-
and that would be
teaching Wikidata.
-
Oh, I'm sorry, I meant to
do that for Navy officer.
-
OK.
-
But, you know,
that is the exact--
-
the accurate time span
of that statement.
-
So it's true to say about a
person, he was a Navy officer,
-
even if of course he wasn't a
Navy officer his entire life.
-
But it's better and
it's more accurate,
-
to say he was a Navy officer
between 1962 and 1964.
-
Don't worry, I'm
not saving this.
-
No vandalizing of
Wikidata in this session.
-
OK.
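-
A statement with qualifiers can be pictured as a small record: a property, a value, and a set of extra attributes that only make sense attached to that one claim. The sketch below is a simplified model, not Wikidata's actual storage format; the class and function names are invented here, and the Navy dates are the same hypothetical ones used in the unsaved demonstration.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim about an item: a property, a value, optional qualifiers."""
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


award = Statement(
    prop="award received",
    value="Presidential Medal of Freedom",
    qualifiers={"point in time": 2009},
)

# Hypothetical dates, as in the talk's unsaved edit.
navy = Statement(
    prop="occupation",
    value="Navy officer",
    qualifiers={"start time": 1962, "end time": 1964},
)


def describe(statement):
    """Render a statement and its qualifiers as one readable line."""
    extras = ", ".join(f"{k}: {v}" for k, v in statement.qualifiers.items())
    base = f"{statement.prop} = {statement.value}"
    return f"{base} ({extras})" if extras else base


print(describe(award))
print(describe(navy))
```

The year 2009 never appears on its own here; it lives inside the award statement, which is the point made above about qualifiers not floating free.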
-
Moving on.
-
What else does Wikidata know?
-
He was educated at
this university.
-
He was a member of
this political party.
-
Right?
-
That's, of course,
a relevant property
-
for a politician.
-
Religion, military branch,
what is the category on Commons
-
that discusses this
item, is something
-
that Wikidata can tell us.
-
And that's it.
-
Now, is that everything
that we could possibly
-
say in a structured
way about Harvey Milk?
-
No.
-
We could probably find at
least a few more things to say.
-
We will see how to contribute
new information to Wikidata
-
in just a minute with
a different example.
-
But this-- all this was
a set of statements.
-
Right?
-
This was under the title
Statements here.
-
But at the bottom of the
list of statements is
-
another section
called identifiers.
-
And I want to spend a minute
talking about what that is.
-
So identifiers is a
collection of keys.
-
A collection of
IDs, or codes, that
-
are keys to other
information sources.
-
And a lot of Wikidata items
have a whole series of keys
-
to other databases, other
sites, other repositories,
-
that help you or a computer
be able to access not just
-
some database and look for
information about Harvey Milk,
-
but access the exact record
relevant to Harvey Milk.
-
And again, if you imagine
someone named John Smith,
-
that is really valuable, right?
-
If you're not just
told, oh yeah,
-
you can look at the
Library of Congress
-
for John Smith,
good luck with that.
-
Or if I tell you, go to
the Library of Congress
-
to this record for this John
Smith, you see the difference.
-
So Wikidata tells us that on
VIAF, which is the Virtual
-
International Authority File.
-
It's an aggregated master
index built by bibliographers,
-
by librarians, of people.
-
Right?
-
It tries to kind of aggregate
information about people
-
across library
catalogs everywhere.
-
So the VIAF ID for Harvey
Milk is this number.
-
And conveniently,
if I click that,
-
I'm not taken to
some Wikidata item.
-
I'm actually taken
to the relevant site.
-
So this took me right
to viaf.org, the Virtual
-
International Authority File,
directly to their record
-
about Harvey Milk.
-
All right?
-
And that itself leads
me to national catalogs
-
of national libraries
all over the world.
-
We won't get into the
things you can do with VIAF.
-
The point is Wikidata
contained the piece of thread
-
that I could tug on
to arrive directly
-
to that information
in other databases.
-
Yes.
-
And it has that for many,
many kinds of databases.
-
The BNF, for example, that's
the National Library of France.
-
And that will take me
to that index card.
-
IMDB.
-
We all know IMDB, right?
-
So here I have the key
to Harvey Milk in IMDB.
-
And this is what IMDB says
about Harvey Milk, right?
-
They have their own piece
of information about him,
-
of course, with filmography
and everything else.
-
And see, I did not have
to search IMDB for it.
-
I just had the key right
there waiting for me.
-
Now, again, this is
very convenient for me
-
as I just showed you the
human use case for this.
-
But it's even more
powerful in aggregate
-
when we allow computers to
traverse this network of links
-
between--
-
not just within Wikidata, but
between data storage facilities
-
and repositories.
-
This is sometimes referred to
as the linked open data cloud.
-
Cloud, because it's multiple
different repositories
-
that are interlinked.
-
And Wikidata is already, and
to a growing extent, the nexus,
-
the connection
point between a lot
-
of these different databases.
-
So IMDB, for example,
it's a good example
-
because it's a site
almost everyone knows,
-
IMDB has information
about Harvey Milk.
-
But that information
does not include a link
-
to the French National Library.
-
Right?
-
Do you see what I'm saying?
-
So IMDB is a data repository
with IDs and allows linking.
-
But it does not give you
what Wikidata gives you which
-
is this kind of collection of--
-
it's like a junction of all
these different data sources.
-
So Wikidata is the
place where you
-
can document these
interrelationships
-
or equivalencies.
-
Right?
-
So ID, you know, 587548 on IMDB
is discussing the same topic
-
as French National
Library ID whatever.
-
Wikidata contains that
piece of information,
-
that this ID in this database
is about the same person
-
as that ID in that database.
-
OK.
-
So that's what
identifiers are about.
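-
The "piece of thread" idea can be sketched as a lookup from identifier name to a URL pattern. This is an illustration only: the URL patterns are assumptions about how those sites build record addresses, the identifier values are placeholders rather than Harvey Milk's real ones, and the function name is invented.

```python
# URL patterns are assumptions for illustration; each database defines its own.
URL_PATTERNS = {
    "VIAF ID": "https://viaf.org/viaf/{id}",
    "IMDb ID": "https://www.imdb.com/name/{id}/",
}

# Placeholder identifier values, not real records.
identifiers = {"VIAF ID": "0000000", "IMDb ID": "nm0000000"}


def external_links(identifiers, patterns=URL_PATTERNS):
    """Expand each known identifier into a direct URL to that record."""
    return {
        name: patterns[name].format(id=value)
        for name, value in identifiers.items()
        if name in patterns
    }


links = external_links(identifiers)
for name, url in links.items():
    print(name, "->", url)
```

Because both identifiers hang off the same item, a program holding one of them can reach the other database without searching it by name.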
-
Still scrolling down the
Wikidata item about Harvey
-
Milk, we have the site links.
-
The site links are links
to Wikimedia projects
-
that are related to this item.
-
So of course there
are Wikipedia articles
-
about Harvey Milk in many,
many different Wikipedias.
-
Quite a few language versions.
-
And there are
pages on Wikiquote,
-
one of the sister projects.
-
There are pages on
Wikiquote with some quotes
-
from Harvey Milk.
-
And there is even a page for
Harvey Milk on Wikisource.
-
Right?
-
So this is a collection
of those links.
-
And those of you who have maybe
only dealt with Wikidata
-
for inter-wiki links, which
we used to do in the old days
-
manually within
the article text,
-
now we do it through
Wikidata, so maybe that's
-
the only thing you did
know about Wikidata
-
is how to update these
inter-wiki tables on Wikidata.
-
All right.
-
So that concludes
our little tour
-
of the anatomy of
a Wikidata page.
-
I will just remind you that
it's a wiki page, which
-
means it has a discussion
page, a talk page.
-
This one happens to be empty.
-
But, you know, if we have
concerns or arguments
-
about some of the
data here that is
-
what we would use
to discuss this
-
and to arrive at consensus.
-
It also has a history view just
like every Wikipedia article.
-
So you can see here
a list of edits.
-
Maybe some of you
have never looked
-
at a history page on Wikipedia,
so this looks overwhelming.
-
But every line here,
every entry here,
-
is a single edit, a single
revision, a single change
-
to this Wikidata item.
-
Just Harvey Milk.
-
And you can see at the very
top this edit that I just
-
made-- this is my
volunteer account
-
and I just made this edit,
and in parentheses you
-
can see what I did.
-
I added an HE,
Hebrew, description.
-
And this is the text
that I added in Hebrew.
-
Right?
-
So we can see who added
what to the Wikidata item,
-
just like we can do
the same on Wikipedia.
-
So we have the revision history.
-
We can undo edits.
-
We can revert, just
like on Wikipedia.
-
And what else did I
want to show here?
-
We can add an item to my
watch list using the star,
-
just like on Wikipedia.
-
So we have all these
standard wiki features
-
that we would come to expect.
-
Let's pause for questions.
-
Any questions about what
we've covered so far?
-
Yes.
-
Are attributes of statements
preset for the specific value?
-
No, they're not preset.
-
And generally Wikidata does
not enforce logic by default.
-
So, I mean, there's
nothing to prevent you
-
from editing the
item about Brazil,
-
and adding the property height.
-
Now height is not a relevant
property for a country.
-
Right?
-
I mean, maybe average
elevation, maybe.
-
But not just height,
which is used for humans
-
or for physical things.
-
So you could add that
property to Brazil and save it
-
and the wiki would not complain.
-
Now in the background
there are kind
-
of extra-wiki, outside-the-wiki
processes for constraint
-
validation.
-
So there are bots and
other processes that
-
run, and occasionally,
for example,
-
identify non-living things
with a date of birth field.
-
That's nonsensical.
-
That should not exist.
-
If someone mistakenly added
that, there are processes
-
that would flag
that to be fixed.
-
But the wiki itself,
Wikidata, will not
-
prevent you from adding that.
-
And that is by design
to keep things flexible.
-
So that people don't
run into, oh wait,
-
but I can't add this
because nobody thought
-
that I would need this, maybe.
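-
The kind of background check described here, one that flags rather than blocks, might look roughly like the sketch below. It is not how the actual bots are written; the data layout, the list of kinds, and the function name are all assumptions, and the dates are placeholders.

```python
# Kinds of thing for which a date of birth makes sense (an assumed, tiny list).
KINDS_WITH_BIRTH_DATES = {"human", "dog"}

# Placeholder records; saving any of them is allowed, checking comes later.
items = [
    {"id": "item-1", "instance of": "human", "date of birth": "1900-01-01"},
    {"id": "item-2", "instance of": "mountain", "date of birth": "1900-01-01"},
    {"id": "item-3", "instance of": "country"},
]


def flag_suspicious_birth_dates(items):
    """Return IDs of items that carry a date of birth but should not."""
    return [
        item["id"]
        for item in items
        if "date of birth" in item
        and item.get("instance of") not in KINDS_WITH_BIRTH_DATES
    ]


flagged = flag_suspicious_birth_dates(items)
print(flagged)
```

The check only reports the odd item for a human to review; nothing in it prevents the edit from being saved in the first place, which matches the flexibility described above.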
-
I hope that answers
your question.
-
You say helpful
answer, question mark.
-
So was it a helpful answer, or?
-
OK.
-
Yes, Eleanor.
-
AUDIENCE: [INAUDIBLE]
-
ASAF BARTOV: Excellent question.
-
I'll repeat it.
-
You ask how do I find
the Wikidata item
-
number from Wikipedia.
-
If I'm reading about Harvey Milk
and I want to look at the data
-
how do I do that?
-
That is an excellent question
and let's skip to Wikipedia.
-
Conveniently I have the
link right here on English.
-
So this is the Wikipedia
article about Harvey Milk
-
and every item on Wikipedia
should have a Wikidata
-
item associated with it, but it
doesn't happen automatically.
-
So if I just created
a page on Wikipedia
-
I also need to create a
Wikidata entity for it
-
if it doesn't already exist.
-
It could already exist
because it was already
-
covered in a different
language, for example.
-
So that was parenthetical.
-
But every article on Wikipedia
should have, here on the side,
-
on the side here under Tools,
a link called Wikidata item.
-
Right here.
-
OK.
-
That Wikidata
item is a link
-
that takes you to
Wikidata, to the entity,
-
and there you find the number.
-
You can-- you don't
even have to click it.
-
I mean, the URL itself
tells you the number.
-
The number, you see, it's
wikidata.org/wiki/Q17141.
-
OK.
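-
Since the item number is the last part of the URL, a program can read it off without opening the page. A minimal sketch, assuming item URLs end in the letter Q followed by digits; the function name is invented here.

```python
import re

# An item identifier at the end of the path: the letter Q followed by digits.
_QID_AT_END = re.compile(r"/(Q\d+)$", re.IGNORECASE)


def qid_from_url(url):
    """Return the item identifier (e.g. 'Q17141') from an item URL, or None."""
    # Drop any fragment or query string, then any trailing slash.
    path = url.split("#", 1)[0].split("?", 1)[0].rstrip("/")
    match = _QID_AT_END.search(path)
    return match.group(1).upper() if match else None


print(qid_from_url("https://www.wikidata.org/wiki/Q17141"))
```

URLs that do not end in an item number, such as help pages, simply return None.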
-
So that was an
excellent question.
-
Other questions?
-
Yes.
-
Yeah, about the additional
attributes, the qualifiers.
-
So, yes, I answered
more generically.
-
But just like the
properties themselves
-
are not limited per item,
the qualifiers per statement
-
are also not
entirely preordained.
-
But there is some
structure to it.
-
I don't want to go into it
at great length right now.
-
If we have time in the end
we can get back to that.
-
But some qualifiers are again
relevant for some things,
-
start time, end time,
and others won't be.
-
Wikidata does try to offer you--
-
you may remember when I
clicked add qualifier,
-
it gave me kind of drop down
of some relevant qualifiers.
-
So it does try to
help you in that way.
-
Other question?
-
Are the values for
instance of already
-
mappable to external ontologies?
-
That is a complicated question.
-
I'll help people understand
the question first.
-
So an ontology is a
structure, some kind
-
of hierarchy or
cloud, of entities
-
and their interrelationships.
-
An ontology would
say, for example,
-
a person is a living thing.
-
So is a dog.
-
They're both living things,
but they're different things.
-
And then, you know, say
things about those entities
-
and their interrelationships.
-
Now there are many,
many competing,
-
or coexisting models
of ontologies.
-
Many of them were created
for specific needs.
-
Many of them want to be
a universal ontology.
-
But of course it's
impossible to quite
-
agree on one complete
and simple ontology.
-
And so there are
many ontologies.
-
Which brings up your question,
can we map across ontologies?
-
Can we say that when Wikidata
says instance of book that
-
is equivalent to some other
ontology saying instance
-
of bibliographic record?
-
And the answer is yes.
-
There are some such mappings.
-
They are incomplete.
-
And there's no kind of
automagic thing happening
-
in the wiki vis-a-vis
those other ontologies.
-
That's kind of
left as an exercise
-
for those dealing with those
other ontologies, and for tool
-
builders and other
platform improvements
-
beyond Wikidata itself.
-
OK.
-
Other questions?
-
Yeah, we have one from
the YouTube stream.
-
Someone asked, why can't I
link Howard Carter's occupation
-
to archeologist when I use
an infobox that fetches info
-
from Wikidata?
-
Why can't I link it
from the infobox?
-
So, someone on the
stream answered
-
saying, because it's
an improper connection,
-
because the target is not
about the subject only.
-
The target is not
about the subject?
-
If I understand the
question correctly,
-
what you would want to be able
to do is from within Wikipedia
-
be able to say occupation
and link to a Wikidata entry
-
about archeology.
-
That doesn't quite
work that way.
-
We will get to a
little discussion
-
of that in an upcoming
section of this talk.
-
So I will defer the rest
of my answer to then.
-
OK.
-
So we're done with
questions for this phase,
-
and my browser got
tired of waiting for me.
-
So, yes.
-
All right.
-
So we took a look at Wikidata,
and we took questions.
-
So now, let's teach
Wikidata some new things.
-
Some things it
doesn't already know.
-
Let's look at this item here.
-
So this item is about one
of my favorite writers,
-
an American writer
named Helen Dewitt.
-
Wikidata, of course, fondly
refers to her as Q54674,
-
but we can call
her Helen Dewitt.
-
And what can we contribute here?
-
So Wikidata has far less
information about Helen Dewitt.
-
Most of you probably haven't
heard of her, that's OK.
-
What does Wikidata
know about her?
-
Well instance of human.
-
We have a photo of her.
-
She's female.
-
She's an American.
-
Her name is Helen.
-
Date of birth.
-
Place of birth.
-
She's an author, a
novelist, a writer.
-
She was educated at the
University of Oxford.
-
And Wikidata knows what
her official website is.
-
That's useful, but that's it.
-
Now we can contribute
information here.
-
For example, she's an American
author writing in English.
-
So we could add
that information.
-
We could click the
Add button here.
-
And this is a good
moment to acknowledge
-
that the user interface of
Wikidata is a work in progress.
-
It's not as intuitive
as it might be.
-
So you need to
understand that click--
-
to add a completely
new property,
-
you need to click
this Add button.
-
If you want to add an additional
value to the property official
-
website, you need to
click this Add button.
-
It makes a kind of
sense with a shaded box.
-
But, you know, you need
to kind of pay attention,
-
and it's not as
friendly as it might be.
-
[COUGHING] Excuse me.
-
So, let's add a property here.
-
Click the Add button.
-
Again, Wikidata tries to
be useful by suggesting
-
some relevant
properties for humans.
-
A bit more morbidly it suggests,
how about date of death?
-
That's not cool, Wikidata.
-
Helen Dewitt is still alive.
-
So I will not add
date of death, but I
-
can add languages spoken,
written, or signed.
-
OK, so I click that.
-
And she writes in English.
-
I just type English-- whoops.
-
Not in Hebrew.
-
Don't panic.
-
I type English here.
-
And, oh, and of course Wikidata
has auto-complete, right?
-
So it tries to help me along.
-
But you will notice that
it has all kinds of things
-
called English.
-
I mean, it turns out that
there is a place in Indiana
-
called English, Indiana.
-
Did I mean that?
-
No, of course I didn't mean
that she writes her books
-
in English, Indiana.
-
Right?
-
But, you know, Wikidata gives me
the option of linking to that.
-
I also don't mean the botanist
Carl Schwartz English.
-
No, no I mean the
West Germanic language
-
originating in England.
-
That's what I mean.
-
So I click that.
-
And I click Save.
-
And that's it.
-
Again I have just made
an edit to Wikidata.
-
I have just taught Wikidata
that this author speaks English.
-
Now, again, this
may be very obvious.
-
She's American.
-
Of course not all
Americans write in English.
-
It may be obvious if
you look at her books.
-
The important thing
is that now Wikidata
-
knows this as a piece of data.
-
And, again, think ahead
to queries, which we will
-
demonstrate in a little bit.
-
Without this piece
of information
-
that I just added, if I were to
ask Wikidata five minutes ago,
-
give me a list of novelists
writing in English, OK,
-
Wikidata would have returned
thousands of results.
-
But Helen Dewitt would
not have been among them.
-
Because up until two
minutes ago Wikidata
-
didn't know that Helen Dewitt
writes in English and not
-
in Spanish.
-
Do you see?
-
It is this explicit
statement that will now
-
make her be included in any
future queries that ask,
-
who are novelists
writing in English?
-
OK.
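-
One common way to ask "who are novelists writing in English?" is a SPARQL query sent to the Wikidata Query Service. The sketch below only builds the query text and does not send it anywhere. The property and item numbers in the comment are given from memory and should be verified on wikidata.org before use; the helper name is invented.

```python
def items_with_values_query(pairs, limit=100):
    """Build a SPARQL query for items matching every (property, value) pair."""
    patterns = "\n".join(
        f"  ?item wdt:{prop} wd:{value} ." for prop, value in pairs
    )
    return f"SELECT ?item WHERE {{\n{patterns}\n}}\nLIMIT {limit}"


# Assumed identifiers (verify before relying on them):
# P106 = occupation, Q6625963 = novelist,
# P1412 = languages spoken, written or signed, Q1860 = English.
query = items_with_values_query([("P106", "Q6625963"), ("P1412", "Q1860")])
print(query)
```

An item only matches if it has both statements, so an author without an explicit language statement is left out of the results, just as described above.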
-
By the way, she's
a PhD in Classics.
-
She speaks-- or at least reads
and writes Latin and Greek,
-
ancient Greek, and I could--
-
I can-- I mean, I
happen to know that.
-
But wait, wait, wait,
wait, wait, you say.
-
What about original research?
-
I mean, you can't just add
stuff like that to Wikidata.
-
Don't you need sources?
-
Citations?
-
Of course I do.
-
Yes.
-
Let's add some sources to this.
-
So on Wikidata,
just like Wikipedia,
-
things should generally
be supported by citations,
-
by references.
-
And just like Wikipedia,
they aren't always supported
-
in that way.
-
OK so, I mean, I can
just add it to Wikidata.
-
Watch me.
-
I just did that, right?
-
I just added English and
Latin without any citation,
-
and I will not be
arrested for it.
-
Just like I could edit
a Wikipedia article
-
and add some information
without a citation.
-
It may stick.
-
It may stay in the article,
or it may be reverted.
-
It depends on the kind of
information I'm adding.
-
It depends how many people
are paying attention
-
to the article on Wikipedia.
-
And it works the
same way on Wikidata.
-
OK, so, you can add some
things without references.
-
Ideally, when you
add information, you
-
should include references.
-
So let's be good Wikidata
citizens and add a source.
-
Here is an article that
I prepared in advance.
-
This is Helen Dewitt.
-
And in this article,
somewhere, it actually
-
says right at the
bottom here, see,
-
Dewitt knows, in descending
order of proficiency, Latin,
-
ancient Greek, French,
German, Spanish,
-
and Portuguese, Dutch, Danish,
Norwegian, Swedish, Arabic,
-
Hebrew and Japanese.
-
This may sound
excessive, but it's true.
-
I met this woman.
-
So anyway, we don't have
to include all of that.
-
The point is this article from
a reasonably reliable source,
-
this magazine,
this interview, can
-
count as a source for
the languages she speaks.
-
So I copy the URL.
-
I just copied off my browser.
-
And, whoops-- that's not--
-
here we go.
-
And I can just add
a reference here
-
to the information that I
just added to Wikidata, right?
-
I can click Add Reference.
-
And then just say the reference
URL is, and I just paste.
-
I paste this URL.
-
Hit Enter.
-
And that's it.
-
And now the fact that she
speaks Latin has a reference.
-
If you look at the other
things here on Wikidata,
-
you can see that these IDs, for
example, have references, too.
-
Right?
-
In this case, the reference
just says, excuse me--
-
In this case it just says
imported from English Wikipedia.
-
But wait, you say, can
Wikipedia be a source?
-
Not properly, no.
-
I mean, just like Wikipedia
itself doesn't cite itself.
-
We don't say, this person
was born in this city
-
how do we know?
-
We read it on Wikipedia
in another language.
-
That's not a good citation.
-
It's not a good
citation for Wikidata
-
either so why do we put it here?
-
Well you can see the qualifier
here is different, right?
-
It's not reference URL, which
is what I put in for Latin here.
-
It's not reference URL here,
it's a different qualifier.
-
It says-- saying, imported from.
-
So this is not an
actual reference that
-
supports this piece of data.
-
It just shows where did
this data come from.
-
It's a slightly different
thing, because this data was
-
mass imported into Wikidata.
-
So it wasn't input by
hand by some volunteer.
-
It was imported into Wikidata
en masse by a script,
-
by a program.
-
And we want to know, where
did this number come from?
-
Well it came from
English Wikipedia.
-
So again, that's not
a proper reference
-
for the validity
of the information,
-
but it does at least tell us
it came from English Wikipedia.
-
We can click and look on
English Wikipedia and find out.
-
Maybe there's a
footnote there that
-
says where it did come from.
-
OK.
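-
The difference between a real reference and an "imported from" note can be made explicit in code. This is a simplified model, not the site's data format: the property numbers are given from memory and should be checked, the URL is a placeholder, and the three status names are invented for this sketch.

```python
# Assumed property IDs: P854 = reference URL, P143 = imported from.
REFERENCE_URL = "P854"
IMPORTED_FROM = "P143"

statements = [
    {"value": "Latin",
     "references": [{REFERENCE_URL: "https://example.org/interview"}]},
    {"value": "English", "references": []},
    {"value": "some external ID",
     "references": [{IMPORTED_FROM: "English Wikipedia"}]},
]


def sourcing_status(statement):
    """Classify a statement as 'sourced', 'imported only' or 'unreferenced'."""
    refs = statement["references"]
    if any(REFERENCE_URL in ref for ref in refs):
        return "sourced"
    if any(IMPORTED_FROM in ref for ref in refs):
        return "imported only"
    return "unreferenced"


statuses = {s["value"]: sourcing_status(s) for s in statements}
print(statuses)
```

A list like this is one way to find statements that still need a proper citation, since "imported only" says where a value came from but not why it is true.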
-
So this was an example of
teaching Wikidata something
-
that it didn't know.
-
Something about the languages.
-
And of course I could add
this reference for English.
-
I could add all the other
languages that she speaks.
-
And I won't bore you with
that, but that is basically
-
how it's done.
-
So you click this Add to
add a completely new--
-
completely new statement.
-
Now, by the way, the fact
that these are the only two
-
suggestions that
Wikidata can think of,
-
doesn't mean these
are the only options.
-
OK, you can just type
anything that may be relevant.
-
We could add, for
example, award.
-
Just start typing award.
-
And here I have
a bunch of properties
-
that are relevant for awards.
-
Awards received, together
with, conferred by, right?
-
There's all kinds of properties
that I could rely on.
-
And of course there is a list of
all the properties of Wikidata.
-
And that list is
also sorted by type.
-
So yes, there is a list of
properties relevant to people
-
so that you don't have to guess.
-
But a surprising
amount of the time
-
you can just start typing
and get the right properties
-
suggested to you.
-
OK.
-
So we taught Wikidata
something new,
-
and now let's teach Wikidata
something completely new.
-
Right?
-
So how do we create
a new Wikidata item?
-
So, like I said, if I
created a Wikipedia article
-
about something that was
not previously covered
-
on any other
Wikipedia, chances are
-
there would not be an already
existing Wikidata item.
-
Sometimes there might
be, because Wikidata
-
does have 25 million entities.
-
But sometimes there wouldn't be.
-
So, first of all, I could
search for it, right?
-
So I could go to Wikidata
to the search box
-
here and just start typing, and
search for what I want, right?
-
So if I'm searching for Helen
Dewitt I just say Helen,
-
and I can see whether
or not it exists.
-
And there's a detailed search
results page, et cetera,
-
where I can find out
if the item does exist or not.
-
Excuse me, this reminds me
of a very important thing
-
I wanted to
demonstrate, and that
-
is the multilingualism
of Wikidata.
-
So remember all these
labels in other languages.
-
Wikidata knows what to call
Helen Dewitt in Hebrew.
-
And it will show it to Wikidata
users whose language is Hebrew.
-
Mine is set to
English, for your sake.
-
But if I change this I go to
Preferences here and change
-
my language.
-
[INAUDIBLE] All
right, and I hit Save.
-
Wikidata will start
talking to me in Hebrew.
-
Now brace yourselves.
-
Are you ready?
-
Don't panic, it's right to left.
-
Oh my god everything
is topsy-turvy.
-
So this is the same
article in Hebrew.
-
So the sidebar has
switched direction,
-
and I know most of
you cannot read it.
-
Bear with me.
-
This is the label
that we previously
-
saw in the label box.
-
This is how you spell
Helen Dewitt in Hebrew.
-
And here is the
description in Hebrew.
-
It's not the description in
English, this description,
-
American writer, which
I was shown previously.
-
Now I'm shown the Hebrew
description, appropriately.
-
But more interestingly,
oh my god!
-
All these statements
are suddenly in Hebrew.
-
How did that happen?
-
Well this tiny word here
is the very concise way
-
to say in Hebrew, instance of,
and this word here means human.
-
So these are links to
the same things, right?
-
It still links to Q5.
-
Q5 is the Wikidata
entity for human.
-
These are still the same things.
-
But because Wikidata has
multiple labels for everything,
-
it has multiple
labels for items.
-
And it also has multiple
labels for property names.
-
So Wikidata knows how
to say, instance of,
-
and award received,
in other languages.
-
That is why it is able to show
me all this data in Hebrew
-
even if none of that data was
actually input into Wikidata
-
by a Hebrew speaker.
-
That data could have been
input by English speakers,
-
but thanks to the
fact that someone once
-
translated the word
photo into Hebrew,
-
I can see this field in Hebrew.
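A minimal sketch of the idea just described, as a toy model rather than how Wikidata is actually implemented: every item and property keeps one label per language, and a statement is displayed by looking up those labels. The `LABELS` table, the fallback rule, and the Hebrew strings are assumptions made for illustration.

```python
# Toy model: each entity (item or property) has one label per language.
# The Hebrew strings are illustrative assumptions, not copied from Wikidata.
LABELS = {
    "Q5": {"en": "human", "he": "אדם"},
    "P31": {"en": "instance of", "he": "הוא"},
    "P166": {"en": "award received"},  # no Hebrew label in this toy data
}


def label(entity_id: str, lang: str, fallback: str = "en") -> str:
    """Return the label in `lang`, else in the fallback language, else the bare ID."""
    labels = LABELS.get(entity_id, {})
    return labels.get(lang) or labels.get(fallback) or entity_id


def render_statement(prop: str, value: str, lang: str) -> str:
    """Display one statement using whatever labels exist for the viewer's language."""
    return f"{label(prop, lang)}: {label(value, lang)}"


print(render_statement("P31", "Q5", "en"))
print(render_statement("P31", "Q5", "he"))
```

The statement itself is stored once, as P31 pointing at Q5. Only the labels differ per language, which is why translating a label once pays off everywhere that property or item is shown.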
-
So one of the things you
can do to help Wikidata,
-
right now, without
any special knowledge
-
is to help translate
those labels.
-
Every label only needs to
be translated just once.
-
So you can see that all
of these properties, date
-
of birth, name et cetera,
they all have Hebrew labels.
-
Maybe one of these would not.
-
No, they all have Hebrew labels.
-
Doing pretty good.
-
And I'm able to search
in my own language.
-
I'm able to click Add.
-
This word is Add,
so I click this,
-
and now I have the Add screen.
-
It all speaks my language,
and it's awesome.
-
And now for your sake I
will switch back to English,
-
but it is important
to know you can
-
edit Wikidata in any language.
-
And it is far more multi-lingual
and multi-lingual friendly
-
than, for example, Commons, which
is also a project we all share.
-
But Commons has some limitations
on how multi-lingual it is.
-
For example, the category
names, et cetera.
-
OK.
-
So we were beginning
to discuss creating
-
something completely new.
-
AUDIENCE: Quick
questions, if that's OK?
-
So there's two questions on IRC.
-
The first one is, can you
show search for something
-
like getting the list of things?
-
I want to learn how to search
for something properly like,
-
show me all the items with
this value of this property.
-
ASAF BARTOV: Yes.
-
That is part of
this talk, but I'll
-
get to that in a
little bit later.
-
There's a whole section where I
will demonstrate the very, very
-
powerful query
system of Wikidata
-
where I will cash
that check that I gave
-
at the beginning of
all these painters
-
who are sons of painters
queries, et cetera.
-
So I will demonstrate
how to do that.
-
AUDIENCE: Other question.
-
How does Wikidata deal
with link rot, and other issues
-
stemming from URL refs?
-
ASAF BARTOV: URLs break.
-
We call that link rot.
-
Wikidata doesn't have
any particular magic
-
around link rot,
just like Wikipedia.
-
So if you do use a bare
URL it may well rot.
-
But you can add qualifiers
with backup URLs, for example
-
on the Internet Archive, or
another mirroring service.
-
And potentially that could be
a software feature for Wikidata
-
to automatically save
or ensure that something
-
is saved on Internet
Archive, but I don't
-
know that it is doing so now.
-
So, just like Wikipedia, if
it is a bare URL it may rot.
-
And may need to be
replaced, possibly by bot.
-
Other questions?
-
All right, so let's
talk about how you
-
create a completely new item.
-
It's very simple.
-
You go to Wikidata and you
click here on the side.
-
There's a link, create new item,
which gives you this screen.
-
And let's create an
item about a book
-
that I'm reading right now
by this Bulgarian writer.
-
So we have an article about this
writer guy named Deyan Enev.
-
But we don't have an
article or a Wikidata item
-
about one of his famous
books called Circus Bulgaria.
-
That's the book I'm reading,
his first collection
-
of short stories in English.
-
Circus Bulgaria came out
in 2010, Portobello Books,
-
translated by Kapka Kassabova.
-
So that's the book I'm reading.
-
As you can see it's not
a link on Wikipedia.
-
There's no article about
it, and there's not even
-
a Wikidata entity item about it.
-
But we can totally create
it, even without a Wikipedia
-
article.
-
So let's create this new item.
-
Let's create it in
English for the purposes
-
of our demonstration.
-
The name of the item
is Circus Bulgaria.
-
Circus Bulgaria,
that's the name.
-
Not Circus Bulgaria
parentheses book,
-
or anything you may be
used to from Wikipedia.
-
It's the actual
name of the book,
-
and the description,
again, remember,
-
the description field
is just to kind of help
-
tell apart this Circus Bulgaria
from any other potential Circus
-
Bulgaria.
-
Maybe there's a
film or something.
-
So it's enough to just say
something like short story
-
collection.
-
I might add by Deyan Enev
just in case, again,
-
some future other short story
collection by some other author
-
happens to have that same name.
-
That should be
disambiguating enough.
-
OK.
-
Short story collection
by Deyan Enev.
-
I could have aliases for this.
-
The aliases assist find-ability.
-
This particular book has just
this one name, so that's fine.
-
And I click Create.
-
That's it.
-
I just start with a
label, and a description.
-
I click Create.
-
I have a brand new Q number
for my new Wikidata item.
-
And Wikidata knows
what to call it.
-
And a description in
one language at least.
-
And that's it, and I
can start populating it.
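For anyone who wants to do the same thing from a script rather than the web form, here is a hedged sketch of the data a new item starts with. It only builds the structure; it assumes the Wikibase `wbeditentity` API module with `new=item`, and it leaves out login and edit tokens entirely. The helper name `new_item_payload` is made up for this example.

```python
import json


def new_item_payload(label, description, lang="en", aliases=()):
    """Build the 'data' structure for a new item: one label, one description, optional aliases."""
    data = {
        "labels": {lang: {"language": lang, "value": label}},
        "descriptions": {lang: {"language": lang, "value": description}},
    }
    if aliases:
        data["aliases"] = {lang: [{"language": lang, "value": a} for a in aliases]}
    return data


payload = new_item_payload("Circus Bulgaria", "short story collection by Deyan Enev")

# Parameters one might send to the API (authentication and token omitted here).
params = {"action": "wbeditentity", "new": "item", "data": json.dumps(payload)}

print(json.dumps(payload, indent=2))
```

Note that the label is just the name of the book, with no parenthetical disambiguation. The description does that job.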
-
As you can see, it
has no site links,
-
but it's ready to be taught.
-
So, for example, I
can start by teaching
-
it the name of the book
in another language
-
that I happen to speak.
-
Now it has two labels
in English and Hebrew.
-
I could also look
up the book, or rather
-
the original Bulgarian
label for this book.
-
Seems relevant.
-
Again, I do not speak Bulgarian.
-
But I can go to the Bulgarian
Wikipedia through interwiki.
-
This is this gentleman.
-
And I could find--
-
I can read Cyrillic so
I could easily find--
-
when I say easily--
-
maybe not so easy, but
I can search for it.
-
Here we go.
-
Tsirk Bulgaria.
-
That is the name of the book.
-
Tsirk, as in circus.
-
No problem.
-
So I just copy this right here.
-
And I go back to my new item.
-
My new item, which is here,
and I edit the Bulgarian field.
-
And here it is.
-
Awesome.
-
All right.
-
But I still haven't told
Wikidata anything about this.
-
I know I'm talking about a book.
-
Wikidata doesn't
know that yet.
-
So let's start by
adding some statements.
-
First of all, I click Add.
-
Wikidata sensibly
says, how about we
-
start with instance of.
-
Tell me what kind of animal--
no, not kind of animal.
-
What kind of thing are you
trying to describe here?
-
Well it's an instance of a book.
-
Not in Hebrew, please.
-
So it's an instance of a book.
-
I could even be a
little more specific
-
and say it's an instance of
a short story collection.
-
There we go, short
story collection.
-
I hit Save.
-
Awesome.
-
So now we know what
kind of thing it is.
-
It's not a human, it's not a
mountain, it's not a concept.
-
It's a short story collection.
-
Now I can add some other things.
-
See, Wikidata is
already working for me.
-
Because it's a short
story collection
-
it's offering me to populate
these properties, and not
-
other ones.
-
Publication date,
original language,
-
genre, country of origin,
these are all relevant, right?
-
So let's start with original
language of the work
-
is Bulgarian.
-
Not Bulgaria, Bulgarian.
-
This is the item I want to link.
-
Hit Save, and whatever.
-
Author.
-
Let's identify the author.
-
So the author, the main
creator of the work,
-
is that gentleman Deyan Enev.
-
And remember, he has
a Wikipedia article.
-
He also has a Wikidata entity.
-
So Wikidata does know about him.
-
So I hit Save, and I can add
something about the translator.
-
And what was that lady's name?
-
Kapka Kassabova.
-
Now it so happens that Wikidata
already knows about this lady.
-
See?
-
So I can just start typing
and then just link to it.
-
Awesome.
-
But what if it didn't?
-
What if it was translated
by someone who isn't
-
already covered on Wikidata?
-
Well I could just type
the name as a string,
-
but ideally I could
create a Wikidata entity
-
about this translator so
that there is a possibility
-
to link to her.
-
Now I might actually
add a qualifier here
-
because, she's not the
translator of the book, right?
-
She's the translator of
the book into English.
-
Right.
-
So the language that she
translated into is English.
-
Right?
-
This book-- remember
I'm describing the book.
-
The item is about the book.
-
So the book would have
a different translator
-
into Polish.
-
So this is an example of
a property or a statement
-
that doesn't make sense without
one of those qualifiers.
-
It's just not correct.
-
It doesn't make sense to
say that translator is.
-
The English translator, or
even this English translator.
-
In 50 years maybe there would
be an additional English
-
translation.
-
So that's an example of
needing that qualifier.
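The role of the qualifier can be sketched with a small data structure. This is an illustration only: the `Statement` class and the placeholder item ID for the translator are assumptions, while P655 (translator), P407 (language of work or name) and Q1860 (English) are the Wikidata IDs as I understand them.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    prop: str                                       # property ID, e.g. "P655" (translator)
    value: str                                      # item ID or literal value
    qualifiers: dict = field(default_factory=dict)  # qualifier property -> value


# Placeholder ID: the translator's real Q number was not read out in the talk.
KASSABOVA = "Q_KAPKA_KASSABOVA"
ENGLISH = "Q1860"

# "translator: Kapka Kassabova", qualified with "language: English".
translator = Statement("P655", KASSABOVA, qualifiers={"P407": ENGLISH})


def translators_into(statements, language):
    """Return translators whose statement is qualified with the given target language."""
    return [
        s.value
        for s in statements
        if s.prop == "P655" and s.qualifiers.get("P407") == language
    ]


print(translators_into([translator], ENGLISH))
```

Without the qualifier there would be no way to ask for the English translator as opposed to a Polish one, since both would be plain "translator" statements on the same book item.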
-
And of course I could go on
and populate the other fields.
-
We don't have to
do that right now.
-
Publication date, country
of origin, et cetera.
-
So this is already beginning
to look like all those items
-
that we already saw, but just
a moment ago it didn't exist.
-
Just a moment ago Wikidata
had no concept of this work.
-
This happens to be one
of his notable works.
-
So I could actually go to the
item about Deyan Enev which
-
has all this information
already, occupation, languages,
-
and add a property.
-
Remember, I'm not
limited to these.
-
I can add a property
called notable works,
-
and mention my new item.
-
Circus Bulgaria.
-
See?
-
My new item is
showing up, and thanks
-
to this description that I
wrote, short story collection,
-
it's already appearing here in
the dropdown very conveniently.
-
So I linked to this.
-
I hit Save.
-
Ideally again I should find
some references showing
-
that this is a
notable work by him,
-
but we won't spend
time on that right now.
-
But the point is we
created a new item.
-
We populated it a little bit.
-
We linked to it so that it's
more discoverable by mentioning
-
it in the author name, and
of course the book item
-
itself mentions the author
and links to the author.
-
So that's all good.
-
One last thing we shall do is
give it some useful identifier
-
so let's add, say, the
Library of Congress record
-
for this book.
-
OK.
-
So I have prepared
this in advance.
-
Ooh.
-
Just in time, with 80 seconds to
go before it's giving up on me.
-
Oh it has already
given up on me.
-
That is very unfortunate.
-
So I go to the Library of
Congress and I find this book.
-
I find this entry, right?
-
In the Library of Congress
database about this book.
-
And it has a permalink.
-
It has a kind of guaranteed
to be permanent link.
-
I can just copy that link,
go back to my little book,
-
and say the Library of Congress.
-
Yeah, LCCN, that's what they
call their IDs, the control
-
number.
-
And I paste it here.
-
I actually don't need the URL.
-
I need just a number.
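Getting from the copied permalink to the bare number is a one-liner. A small sketch, assuming the permalink has the identifier as its last path segment. The number used below is made up and is not the real record for this book.

```python
from urllib.parse import urlparse


def lccn_from_permalink(url: str) -> str:
    """Return the last path segment of a permalink, i.e. the bare identifier."""
    path = urlparse(url).path
    return path.rstrip("/").rsplit("/", 1)[-1]


# Made-up identifier, for illustration only.
print(lccn_from_permalink("https://lccn.loc.gov/2010000000"))
```

Storing only the identifier, not the full URL, is what lets a re-user rebuild the link later, even if the catalog changes its address scheme.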
-
And there we go.
-
I have added it,
and now Wikidata
-
knows how to find bibliographic
information about this book.
-
And any re-user of
Wikidata, some program,
-
some tool that connects
books to authors
-
or does statistical analysis or
whatever, some future yet to be
-
imagined tool
could automatically
-
find additional metadata on the
Library of Congress site thanks
-
to this connection
that I just made.
-
And of course I could
add many other IDs
-
to other catalogs
around the world,
-
and we won't do that right now.
-
You can see that it's now
showing up under identifiers.
-
So this is how we created
a brand new piece of data.
-
Questions about this,
about creating new items?
-
Yeah, all right.
-
So we've seen how to contribute
to Wikidata on our own,
-
kind of through--
-
directly through Wikidata.
-
Now you may be
thinking, but Asaf, this
-
sounds like a ton
of work recording
-
all of these little tiny bits of
information about every person
-
and every book and every town.
-
And if you think that
you would be correct.
-
That is a ton of work.
-
It's a lot of work.
-
However, it is centralized, so
it is reusable on other wikis
-
and we will show in just a
moment how we pull information
-
from Wikidata into
Wikipedia or other projects.
-
We will show that
in just a moment.
-
But here's an
awesome little game
-
that a Wikidata
volunteer, Magnus Manske,
-
has authored called the
Wikidata game, in which he
-
tricks people--
-
sorry, helps people
make contributions
-
to Wikidata in a very,
very easy and pleasant way.
-
Let's look at the Wikidata game.
-
So the first thing you need
to do in that Wikidata game
-
is to log in,
because the Wikidata
-
game makes edits in your name.
-
So we need to authorize it.
-
It's perfectly safe.
-
And after you do that you
can go to the Wikidata game.
-
So this is the game.
-
Now I'm logged in.
-
And the Wikidata game
actually includes
-
a number of different games.
-
Let's start with a person game.
-
So Wikidata shows you--
-
shows you an item, and asks
you a very simple question.
-
Person, or not a person?
-
So Wikidata goes through
Wikidata entities
-
that don't even have the
instance of property.
-
Which is why Wikidata
doesn't know,
-
literally doesn't know, if this
is a person, or a mountain,
-
or a city, or a country,
or anything else.
-
So it asks you, because this
is the kind of question that
-
Wikidata cannot
decide on its own,
-
but for us humans it's generally
trivial to be able to say
-
whether something that we're
looking at is a person or not.
-
It gets slightly trickier when
the information is in Javanese,
-
as it is here,
rather than English.
-
So this item happens to
be described in Javanese.
-
My Javanese, spoken in
Indonesia, is very weak.
-
However, I can tell that
this is not a person.
-
How can I tell?
-
Without understanding
a word of Javanese
-
I see that it mentions
1000 kilometers
-
and square kilometers, see?
-
So this is about a
place, or an area,
-
or a region, or whatever,
but not a person.
-
So this is an
example of how even
-
without understanding
language you can sometimes
-
make a determination.
-
However, of course,
you should be sure.
-
This is definitely not
what the Wikipedia article
-
about a person looks like.
-
So this is not a person.
-
I just click it and I'm
shown the next item.
-
This item is in another
language I do not speak,
-
and I just don't know.
-
I do not know if this is
about a person or not.
-
So I click Not Sure.
-
This is in Swedish, and
it's about Sulawesi, still
-
Indonesia.
-
And it is not about a person.
-
I have enough Swedish for that.
-
So I click not a person.
-
Now, you may say,
well, do I really
-
have to deal with all these
languages that I don't speak?
-
The answer is no.
-
You don't have to.
-
Here at the bottom
of the Wikidata game
-
there are settings.
-
You can click that
and tell Wikidata,
-
I cannot even read
Chinese or Japanese,
-
so please don't show me
items in those languages.
-
Because I wouldn't
even be able to guess.
-
I prefer these languages in
which I can relatively easily
-
make determinations.
-
And I can even tell Wikidata to
only show me these languages.
-
You see?
-
This was not selected,
which is why I
-
was shown some other languages.
-
I could say, only use
these languages, and save.
-
And now I can try
this game again.
-
However, that can
slow it down a little.
-
So here we go.
-
Here's a Spanish-- which
is one of the languages I
-
told Wikidata game it can use.
-
This is a Spanish item.
-
Now is it about a person or not?
-
It is not about a person.
-
Is it about a person?
-
No.
-
Yes, it is, right?
-
Monk Cistercian, Pedro
de Oviedo Falconi.
-
That sounds like a person.
-
Fray Pedro nació.
-
Yeah, he was born
in Madrid 1577.
-
This is a person.
-
OK.
-
So I click person.
-
Again, if you're not
sure, click not sure.
-
The point is, just by clicking
person and as you can see
-
this would work
very well on mobile,
-
which is why I said you can
contribute on your commute.
-
You can just hold your
phone or tablet or whatever,
-
and just tap.
-
Person, not a person.
-
Person, not a person.
-
The amazing thing is that just
tapping person has actually
-
made an edit to Wikidata
on my behalf, which
-
I can find out, like every
wiki, by clicking contributions.
-
And as you can see in addition
to the stuff about Circus
-
Bulgaria, my latest edit is in
fact about this Pedro de Oviedo
-
Falconi person.
-
And the edit was, you can--
-
I hope you can see this, created
the claim instance of human.
-
So I added--
-
I mean Wikidata game
added for me the statement
-
instance of human.
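What a single tap stands for can be sketched as the request a tool would need to send. This is an assumption about the shape of such an edit, not a description of the game's actual code: it uses the Wikibase `wbcreateclaim` API module, omits authentication, and the item ID is a placeholder.

```python
import json


def instance_of_human_claim(item_id: str) -> dict:
    """Parameters for a claim stating that `item_id` is an instance of (P31) human (Q5)."""
    return {
        "action": "wbcreateclaim",
        "entity": item_id,
        "property": "P31",
        "snaktype": "value",
        # Item values are passed as a small JSON object naming the target item.
        "value": json.dumps({"entity-type": "item", "numeric-id": 5}),
    }


# "Q_PLACEHOLDER" stands in for whichever item the game happened to show.
params = instance_of_human_claim("Q_PLACEHOLDER")
print(params)
```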
-
Now, the awesome thing is
that it was super easy to do.
-
I didn't have to go into that
entity, click the Add button,
-
choose the instance of property,
choose human, hit Save.
-
Instead of all these
operations I just
-
tapped on my screen,
person, not a person.
-
And I can do hundreds of
edits during my daily commute.
-
There are other games,
like the gender game.
-
So this is about--
-
this is when Wikidata
already knows
-
that this item is a
person, but it doesn't
-
know the gender of this person.
-
Which is another one of
the more basic items.
-
And this is taking a long
time because of the language
-
limitations that I set on it.
-
I guess the less exotic
languages have already
-
been exhausted in the game.
-
We don't have to
wait all this time.
-
We can try something else.
-
How about occupation?
-
The occupation game.
-
Here we go, this is in Russian.
-
And what is the occupation
of this gentleman?
-
Well he is an [INAUDIBLE].
-
He's a church person.
-
However, so the
occupation game is
-
where Wikidata game
will automatically
-
pull likely occupations
from the article text
-
and ask for confirmation.
-
So if he-- if this person
really is a deacon,
-
I should click that.
-
But I'm not sure.
-
I'm not clear on the Russian
church's distinctions between--
-
I mean [INAUDIBLE]
is pretty senior,
-
but I don't know if that
automatically also means
-
he's a deacon or not.
-
And [INAUDIBLE] is
not listed here.
-
So I will click not listed.
-
Also, these guesses
are not always correct.
-
So, this guy for
example, is in Russian.
-
I can read this.
-
He's a philologist.
-
He's a linguist.
-
So I can confirm it
and click linguist.
-
All right?
-
And again, if we look
at my contributions
-
we can see the Wikidata
game on my behalf
-
created occupation linguist.
-
OK.
-
Just by tapping linguist there.
-
Now if it's taken
from the article,
-
why would it ever be wrong?
-
Well Jesus was the
son of a carpenter.
-
The word carpenter
appears in the text.
-
That doesn't mean it's correct
to say Jesus was a carpenter.
-
OK?
-
Just a trivial example, right?
-
So many, many articles will say,
you know, born to a physician.
-
And so the word physician
could be guessed,
-
but it wouldn't be correct
unless the son is also
-
a physician.
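Here is a deliberately naive sketch of why such guesses need a human to confirm them. It is not the game's real extraction logic, which the talk does not describe. It simply reports every known occupation word found in the text, and the word list is invented for the example.

```python
import re

# A tiny, made-up vocabulary of occupation words.
KNOWN_OCCUPATIONS = ["carpenter", "physician", "linguist", "deacon"]


def guess_occupations(article_text: str) -> list:
    """Return every known occupation word that appears anywhere in the text."""
    words = set(re.findall(r"[a-z]+", article_text.lower()))
    return [occ for occ in KNOWN_OCCUPATIONS if occ in words]


# The word appears, but it describes the father, not the subject.
print(guess_occupations("He was the son of a carpenter."))

# One false hit (physician) next to one correct hit (linguist).
print(guess_occupations("Born to a physician, she became a linguist."))
```

The matcher has no idea who the word refers to, which is exactly the judgment the player supplies by clicking an occupation or "not listed".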
-
So I hope it gives
you the gist of it.
-
There is also a
distributed Wikidata game,
-
which is pretty awesome.
-
Here we go, which
has additional games.
-
So, for example, the
Kian game gives you,
-
maybe it gives you,
some items to play with.
-
Yes?
-
No?
-
OK.
-
So it gives you
this little card,
-
and asks you to confirm is this
instance of human settlement?
-
That is, is it a village,
town, city, whatever.
-
Is it a kind of human
settlement or not?
-
Or maybe it's a book.
-
Maybe it's a poem.
-
Again, so, is it a
human settlement?
-
And you can click the languages
here to see the information.
-
So I can click English.
-
And indeed the article--
-
I mean the actual
Wikipedia article
-
says Kamiji is a
town and territory
-
in this district in the Congo.
-
So yes, this is an instance
of human settlement.
-
So I clicked yes.
-
And just clicking yes
again went to that item,
-
and added instance
of human settlement.
-
Now the point of
all these games is
-
these are tools,
written by programmers,
-
making kind of semi educated
guesses about these fairly
-
basic properties.
-
And they are meant to
semi automate, to assist,
-
in the accumulation of all
these important pieces of data.
-
Now every single
click here helps
-
Wikidata give better
results, richer results
-
in future queries.
-
Again, as of right now
Wikidata can include Kamiji
-
if I ask it, you know, what
are some towns in Congo?
-
Until now it could not.
-
Because it literally
didn't know.
-
So every time we click male,
female, person, not a person,
-
make these decisions,
we help improve Wikidata
-
and enrich the results
that we could receive.
-
Any questions about this, about
kind of micro contributions
-
through the Wikidata game?
-
If that looks
appealing I encourage
-
you to go and visit
the Wikidata game
-
and start contributing
in that way.
-
There is a question here.
-
If I make an article about
Circus Bulgaria how should
-
I correctly connect them?
-
That is an excellent question.
-
So once-- so now there is a
Wikidata item about that book,
-
but there is no Wikipedia
article anywhere.
-
Now suppose I write one
in Bulgarian, maybe.
-
You go to Wikidata.
-
You find the item by searching.
-
You find the item, and then
the empty site links section
-
right at the bottom there--
-
where are we?
-
We have this?
-
Circus Bulgaria.
-
Let's demonstrate this.
-
So here is the item
about the book.
-
Let's say that now
there is an article
-
because I just created it.
-
I can go here to the empty
Wikipedia link section,
-
click Edit, type the
name of the wiki,
-
let's say English, and then
type the name of the page
-
that I just created.
-
Circus-- right?
-
And again, it offers
me auto-complete
-
for my convenience.
-
Now we don't actually
have the article created,
-
but let's just
say this was the article.
-
I can just click this,
hit Save, and that
-
would associate the
new Wikipedia article
-
with this Wikidata item.
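The same association can be expressed as data. A hedged sketch, assuming the Wikibase `wbsetsitelink` API module. The item ID is a placeholder, since the new item's Q number is not given in the talk, and authentication is omitted.

```python
def sitelink_params(item_id: str, site: str, title: str) -> dict:
    """Parameters that attach the page `title` on wiki `site` to the item `item_id`."""
    return {
        "action": "wbsetsitelink",
        "id": item_id,
        "linksite": site,    # e.g. "enwiki" for English Wikipedia
        "linktitle": title,  # exact title of the existing page
    }


# Placeholder item ID; the article must already exist on the target wiki.
print(sitelink_params("Q_CIRCUS_BULGARIA", "enwiki", "Circus Bulgaria"))
```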
-
That is the beginning of the
inter-wiki list for this item.
-
I will not click
Save now, because we
-
don't have the article yet.
-
So I hope that
answers that question.
-
Was there another question
that I missed here?
-
No.
-
OK.
-
Any questions about
the Wikidata game?
-
About this idea of
micro contributions?
-
If not then we can move
on to embedding data,
-
and after that we
can discuss queries,
-
how to get at all this
data from Wikidata.
-
So the short version of how
to embed data from Wikidata
-
is that there is this
little magic incantation.
-
Curly brace, curly brace,
hash mark, property.
-
It looks like a template, but
it isn't because of that hash.
-
And that is magic.
-
Take a look at this little
demo that I prepared.
-
This page, which is off
my user page on meta,
-
but it could be on any wiki.
-
OK.
-
Says, since San Francisco
is item Q62 in Wikidata,
-
and since population is
property P1082, I can tell you
-
that according to Wikidata the
population of San Francisco
-
is this.
-
And this bolded number here was
produced with this incantation.
-
Curly brace, curly brace,
hash mark, property P1082,
-
that's population,
pipe, from what item?
-
Right?
-
Cause I'm pulling
an arbitrary number.
-
I could put any
property in any item
-
here, and kind of include
it, embedded, into my text.
-
This isn't even about-- you
notice this is my user page.
-
This isn't even the article
about San Francisco.
-
I just want to pull that
number into this thing
-
that I'm writing.
-
So it's fairly simple.
-
I identify the property.
-
I identify the item
to take it from.
-
And Wikidata will,
I mean Wikipedia,
-
or the wiki I'm on, in this
case meta, will go to Wikidata
-
and fetch it for me.
-
Likewise, since Denny Vrandecic,
the designer of Wikidata
-
is item 18618629, right?
-
I mean, he's a notable person,
so he has a Wikidata entity.
-
And since occupation is property
106, and date of birth is 569,
-
and place of birth
is 19, because
-
of all that I can tell you
that Vrandecic was born
-
in Stuttgart, on this date,
and is researcher, programmer,
-
and computer scientist.
-
If you look at the source for
this page, click Edit Source,
-
you can see that the word
Stuttgart does not appear here,
-
because it came from Wikidata.
-
I did not write this into
my little demo page here.
-
See?
-
Place of birth is--
-
where is it?
-
Here.
-
Born in property 19 from
Q number so-and-so.
-
That is how easy
it is to pull stuff
-
into a wiki from Wikidata.
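To make the incantation concrete, here is a small sketch that just builds the wikitext strings described above. The helper `property_call` is hypothetical. The IDs are the ones from the demo page: P1082 and Q62 for San Francisco's population, P19 and Q18618629 for the place of birth.

```python
def property_call(prop_id: str, item_id: str) -> str:
    """Return wikitext asking the wiki to fetch `prop_id` of `item_id` from Wikidata."""
    return "{{#property:" + prop_id + "|from=" + item_id + "}}"


population = (
    "According to Wikidata, the population of San Francisco is "
    + property_call("P1082", "Q62")
    + "."
)
born = "Born in " + property_call("P19", "Q18618629") + "."

print(population)
print(born)
```

The page source contains only the property and item IDs. The visible number or place name is filled in when the page is rendered.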
-
OK now there's
some nuance to it.
-
And there are
some additional parameters
-
you can give.
-
And you can ask
Wikidata to give you
-
not just the text of the values,
but actually make it links.
-
So, for example, if I change
this from property to values--
-
No, that did not work at all.
-
Wasn't it values?
-
What was it?
-
Values and then--
-
Oh, statements.
-
My bad, sorry.
-
The magic word is statements.
-
Statements.
-
So going back here.
-
If I change the word property
to the word statements
-
here then this same value--
-
that did not work at all.
-
Oh, because I'm on meta.
-
So because I'm on
meta, meta doesn't
-
have an article named
researcher, programmer,
-
or computer scientist.
-
But Wikipedia does.
-
If I included this same
syntax in Wikipedia,
-
like English Wikipedia,
for example--
-
So let's go there right now.
-
And go-- go to my--
-
Go to my sandbox.
-
If I just brutally paste
this on my sandbox here--
-
So, see, these became links.
-
Because Wikipedia has articles
called programmer and computer
-
scientist.
-
So, like I said, there's
some additional nuance
-
to the embedding.
-
The important thing
is that this is
-
the key to delivering on that
first problem that I mentioned.
-
How to get data from
a central location
-
onto your wiki in your language.
-
Basically using property and
statements magic incantations.
-
And of course,
usually, this would be
-
in the context of an info box.
-
Some wikis-- English Wikipedia
is not leading the way there.
-
Some smaller wikis
are more advanced
-
actually in integrating
Wikidata embeddings like this
-
into their info boxes.
-
So that instead of
the info box just
-
being a template on the wiki
with field equals value,
-
field equals value.
-
That template of the
info box on the wiki
-
pulls the values, the birthdate,
the languages, et cetera,
-
pulls them from Wikidata.
-
So basically just-- I just
demonstrated single calls
-
to this, but of course
an info box template
-
would include maybe
20 or 40 such embeds,
-
and that is not a problem.
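As a rough illustration of an info box built from many such embeds, the sketch below generates one template row per field. The field names and the row layout are assumptions, not any wiki's real info box template. The property IDs are the ones mentioned earlier in the talk.

```python
# Hypothetical mapping from info box field names to Wikidata properties.
FIELDS = {
    "Born": "P569",            # date of birth
    "Place of birth": "P19",
    "Occupation": "P106",
}


def infobox_rows(item_id: str, fields=FIELDS) -> str:
    """Return one '| field = {{#statements:...}}' row per field for the given item."""
    rows = []
    for name, prop_id in fields.items():
        call = "{{#statements:" + prop_id + "|from=" + item_id + "}}"
        rows.append("| " + name + " = " + call)
    return "\n".join(rows)


print(infobox_rows("Q18618629"))
```

A real template would usually also let editors override a value locally, which is one of the things communities discuss before switching over.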
-
Of course, before you go and
edit the English Wikipedia's
-
info box person and replace
it all with Wikidata embeds,
-
you should discuss it with the
English Wikipedia community.
-
These discussions have
already been taking place.
-
There are some
concerns about how
-
to patrol this, how to keep
it newbie friendly, et cetera.
-
So there are legitimate concerns
with just moving everything
-
to be embedded from Wikidata.
-
But the communities are
gradually handling this.
-
I mean this ability to embed
from Wikidata is not very old.
-
It's been around
for about a year.
-
So communities are
still working on kind
-
of integrating that technology.
-
But that is kind
of just the basics of how
-
to pull data, individual bits
of data, that's not querying,
-
that's not asking those sweeping
questions that I was talking
-
about yet.
-
We'll get to that.
Right now, this is
-
how to pull a specific datum,
a specific piece of data,
-
from Wikidata.
-
OK.
-
So here's another quick
thing to demonstrate
-
before we go to
queries, and that
-
is the article placeholder.
-
The article placeholder
is a feature
-
that is being tested on the
Esperanto Wikipedia, and maybe
-
another wiki, I don't remember.
-
And it is using the
potential of Wikidata
-
to offer a placeholder
for an article.
-
An automatically generated
Wikidata powered replacement
-
placeholder for an article
for articles that don't yet
-
exist on Esperanto.
-
So let's go to the
Esperanto Wikipedia.
-
I don't speak Esperanto.
-
But let's look for Helen
Dewitt, our friend,
-
in Esperanto Wikipedia.
-
Now Esperanto is not
one of the Wikipedias
-
that have an article
about Helen Dewitt.
-
And so it tells me that, right?
-
There is no Helen Dewitt.
-
Maybe you were looking
for Helena Dewitt.
-
No, I was not.
-
You can start an article
about Helen Dewitt.
-
You can search.
-
You know, there's
all this stuff.
-
But there is also this
little option here, hiding,
-
which tells me that the
Esperanto Wikipedia is--
-
what's happening here?
-
Yes.
-
The Esperanto Wikipedia is
ready to give me this page.
-
This page, as you can see, it's
on the Esperanto Wikipedia,
-
but it's not an article.
-
See, it's a special page.
-
It's machine generated.
-
You can see the URL as well.
-
It's not, you know,
slash Helen Dewitt.
-
It's slash Specialaĵo,
AboutTopic,
-
and then the Wikidata
ID of Helen Dewitt.
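The shape of that address can be sketched as follows. This assumes the canonical English name `Special:AboutTopic` is accepted on any wiki where the feature is enabled, with the wiki showing its own localized namespace name. The Q number below is a stand-in, not Helen DeWitt's actual ID.

```python
def placeholder_url(lang: str, item_id: str) -> str:
    """Build the address of the machine-generated placeholder page for an item."""
    return f"https://{lang}.wikipedia.org/wiki/Special:AboutTopic/{item_id}"


# "Q42" is only a stand-in item ID for illustration.
print(placeholder_url("eo", "Q42"))
```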
-
And what I get here--
-
I get an English
description, by the way,
-
because there is no
Esperanto description.
-
Wikidata can't make it up.
-
But what it can do is
offer me these pieces
-
of data in my language,
in this case Esperanto.
-
I'm on the Esperanto Wikipedia.
-
OK.
-
So it tells me that she's
American, for example,
-
and it tells me
that in Esperanto.
-
OK and it tells me
that she speaks Latin.
-
Remember we taught
Wikidata that?
-
It tells me that she
was educated in Oxford,
-
you know, and gives me the
references to the extent
-
that they exist.
-
I mean this is not an article.
-
It's not, you know, paragraphs
of fluent Esperanto text.
-
But it is information
that I can understand
-
if I speak this language.
-
And it's better than nothing.
-
And remember Helen Dewitt was
not a very detailed article.
-
If I were to ask about, I
don't know, some politician,
-
or popular singer that
has more data in Wikidata,
-
then this machine generated
thing would have been richer.
-
So this feature is available
and is under beta testing
-
right now, but generally if
this sounds interesting for you
-
especially if you come
from a smaller wiki that
-
is missing a lot of articles
that people may want to learn
-
about, you can contact
the Wikimedia Foundation
-
and ask for article placeholder
to be enabled on your wiki.
-
And again, this
is a placeholder.
-
Of course, it exists only
until someone actually
-
writes a proper Esperanto
article about Helen Dewitt.
-
So I hope this is clear.
-
This is all coming from
Wikidata on the fly.
-
In real time.
-
As you can see it includes my
latest edits to Helen Dewitt.
-
OK.
-
Questions about the-- questions
about the article placeholder?
-
If there are try and
put them on the channel.
-
And this brings us to one of
the main courses of this talk,
-
which is querying Wikidata.
-
So I've explained
how Wikidata works.
-
We've walked through it.
-
We've added to it.
-
We've created a new item.
-
We learned how to contribute
during our commutes.
-
And all this while, you
kept promising us,
-
Asaf, that this would be--
-
this would enable
these amazing queries.
-
So time to make good on that.
-
The URL you need to remember
is query.wikidata.org.
-
And that will take you
to a query system that
-
uses a language called SPARQL.
-
SPARQL, spelt with
a Q. This language
-
is not a Wikimedia creation.
-
It's a standardized language
used for querying linked data
-
sources.
-
And because of that
there are
-
certain usability prices
that we pay for using SPARQL,
-
for using a standard language.
-
It's not completely custom
made for querying Wikidata,
-
and we'll see that
in just a moment.
-
The principle to
remember about Wikidata
-
query is that Wikidata will
tell you everything it knows,
-
but no more.
-
I have anticipated this
several times already, right?
-
Until this moment when
we taught Wikidata
-
that Helen Dewitt
speaks Latin, she
-
would not have appeared
in query results
-
asking who are American
writers who speak Latin?
-
She would not have appeared.
-
But as of this
afternoon, she will
-
appear because I've added
that piece of information.
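For reference, here is one way that question might be written down. It is a sketch under assumptions: the IDs are my reading of Wikidata (P106 occupation, Q36180 writer, P27 country of citizenship, Q30 United States, P1412 languages spoken, written or signed, Q397 Latin), and the helper only builds a request address for the query service without sending anything.

```python
from urllib.parse import urlencode

QUERY = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P106 wd:Q36180 ;   # occupation: writer
          wdt:P27 wd:Q30 ;       # country of citizenship: United States
          wdt:P1412 wd:Q397 .    # languages spoken, written or signed: Latin
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""


def query_url(sparql: str) -> str:
    """Return a GET address for the query service; nothing is fetched here."""
    params = urlencode({"query": sparql, "format": "json"})
    return "https://query.wikidata.org/sparql?" + params


print(query_url(QUERY))
```

An item only matches if all three statements have actually been entered, which is why a writer shows up in the results only after someone records the language.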
-
So a result of that principle
is that you can never say,
-
well I ran a Wikidata
query and this
-
is the list of Flemish painters
who are sons of painters.
-
The list.
-
That these are all
the Flemish painters
-
who are sons of painters.
-
That is never something you can
say based on a Wikidata query,
-
because of course, maybe
not all the Flemish painters
-
who are sons of painters have
been expressed in Wikidata
-
yet.
-
Wikidata doesn't know
about some of them,
-
or maybe it knows
about all of them
-
but doesn't know
the important fact
-
that this person is
the son of that person,
-
because those properties
have not been added.
-
And so they cannot be
included in the results.
-
So the results of
a Wikidata query
-
are never the definitive sets.
-
What you can say about
a Wikidata query is here
-
are some Flemish painters
who are sons of painters.
-
Here are some cities
with female mayors.
-
Whatever it is
you're querying about
-
is never guaranteed
to be complete
-
because Wikidata,
like Wikipedia, is
-
a work in progress.
-
And of course, the more
we teach Wikidata the
-
more useful it becomes.
-
OK, so let's go and
see those queries.
-
So this is query.wikidata.org.
-
It's not the wiki.
-
All right?
-
So this isn't like some
page on the wiki itself.
-
This is kind of an
external system.
-
So it's not a wiki.
-
You can see I don't
have a user page here.
-
I don't have a history tab.
-
This isn't a wiki page.
-
This is a special kind
of tool or system.
-
And it invites me to
input a SPARQL query.
-
Now most of us do
not speak SPARQL.
-
It's a technical language.
-
It's a query language.
-
Some of you may be thinking
about SQL, the database query
-
language.
-
SPARQL is named with kind
of a wink, or a nod, to SQL.
-
But, I warn you, if
you are comfortable in
-
SQL don't expect to carry
over your knowledge of SQL
-
into SPARQL.
-
They're not the same.
-
They are superficially similar.
-
Right?
-
So they both use
the keyword select,
-
and they use the word where,
and they use things like limit,
-
and order.
-
So again, if you know
this already from SQL
-
those mean roughly
the same things,
-
but don't expect it to
behave just like SQL.
-
You do need to spend some time
understanding how SPARQL works.
-
So, by all means, I
invite you to go and read
-
one of the many fine
SPARQL tutorials that
-
are out there on the web, or
to click the Help button here,
-
which also includes
help about SPARQL.
-
But I also know
that most of us when
-
we want to do some advanced
formatting on wiki,
-
for example, we don't go
and read the help page
-
on templates, right?
-
We go to a page that already
does what we want to do,
-
and adopt and adapt the code
from that other page, right?
-
So we just take something that
does roughly what we want,
-
and just copy it over and
change what we need to change.
-
That is a very pragmatic
and reasonable way
-
to do things which is why--
-
and the Wikidata
engineers know this,
-
which is why they prepared
this very handy button for us
-
called examples.
-
We click the examples button.
-
And, oh my god, there is a ton
of-- well there's 312 example
-
queries for us to choose from.
-
And we can just
pick something that
-
is roughly like what
we're trying to find out,
-
and then just change
what needs changing.
-
So let's take a very simple one.
-
The cats query.
-
Maybe one of the simplest
you could possibly have.
-
And let's run it first
and then I'll kind of
-
walk you through it.
-
The goal here is not
to teach you SPARQL,
-
but to get you to be kind
of literate in SPARQL.
-
To kind of understand why
this does what it does.
-
So let's run this query first.
-
We click Run and here I
have results at the bottom.
-
The item, which is
just a Wikidata item,
-
which of course is a number.
-
Remember, Wikidata thinks
of items as Q numbers.
-
And the label,
because we're humans
-
and we prefer words to numbers.
-
So these 114 results
are all the cats
-
that Wikidata knows about.
-
Is this all the
cats in the world?
-
No of course not, remember?
-
It's all the cats Wikidata
knows about, which
-
means they're somehow notable.
-
I mean someone bothered to
describe them on Wikidata.
-
And Wikidata was told this
item is an instance of cat.
-
Right?
-
So these are those cats.
-
And we can click any of them.
-
I don't know,
Pixel, for example.
-
Click the Wikidata item.
-
And here is the Wikidata
item about Pixel
-
with the Q number.
-
And he is a tortoiseshell cat.
-
And as you can see
instance of cat.
-
OK.
-
And he is five inches high.
-
And he is apparently documented
in Indonesian, in Bahasa.
-
Right here this is Pixel.
-
And he is apparently somehow
related to the Guinness World
-
Records book.
-
I don't speak Bahasa, so
I don't know exactly why
-
this cat is so notable.
-
But, of course, cats
can become notable
-
for all kinds of reasons.
-
Maybe they're a
YouTube sensation,
-
you know, maybe
they were involved
-
in some historical event.
-
I like this cat named Gladstone.
-
This cat named Gladstone is--
-
he has position
held Chief Mouser
-
to Her Majesty's Treasury.
-
This is an official
cat with a job.
-
And he has been holding this
job, mind you, since the 28th
-
of June this past year.
-
That's the start time.
-
And there is no end time
which means he currently
-
holds the position
of Chief Mouser
-
to Her Majesty's Treasury.
-
His employer is Her
Majesty's Treasury.
-
He's a male creature.
-
And Wikidata knows
that this cat is
-
named after William Gladstone,
the Victorian prime minister.
-
Of course if I don't
know who this person is
-
I can click through
and learn that he
-
was a liberal politician
and prime minister, right?
-
He even has a Twitter account.
-
And Wikidata sends
me right to it.
-
The treasury cat
Twitter account.
-
And he has articles in
German, and English,
-
and of course Japanese,
because he's a cat.
-
All right.
-
So this was a very simple query.
-
Let's find out why it works.
-
OK.
-
So what did we actually
tell Wikidata to do for us?
-
We said, please select
some items for us
-
along with their labels.
-
OK?
-
Along with their
human readable labels
-
because if I remove this
label what I get is, see,
-
just a list of item numbers.
-
That's not as fun.
-
So that's what this
little bit did.
-
I just said, give me the
items, but also their
-
human readable label.
-
And I want you to
select a bunch of items,
-
but not just any
random bunch of items,
-
I want to select items where
a certain condition holds.
-
What is the condition?
-
The condition is that the
item that I want you to select
-
needs to have property
31 with a value of Q146.
-
Well, that's helpful.
-
If I hover over these numbers--
-
Again, I get the human
readable version.
-
So I'm looking for
items that have property
-
instance of with the value cat.
-
Right?
-
Because that's literally
what I want, right?
-
I want all the items that have
a property, a statement, that
-
says instance of cat.
-
That's the condition.
-
I'm not interested in items
that are instance of book,
-
or instance of human.
-
I'm interested in
instance of cat.
-
That is the only condition
here in this query.
-
This complicated line I ask
you to basically ignore.
-
This is one of those
sacrifices that we
-
make for using a standard
language like SPARQL.
-
But the role of this
complicated line
-
is to basically
ensure that we get
-
the English label for that cat.
-
OK?
-
So don't worry about that.
-
Just leave it there.
-
And we run the query
and we get the list
-
of cats with their English
labels, and that is awesome.
-
By the way, if I change EN,
without really understanding
-
this line, if I change
EN to HE, for Hebrew,
-
I get the same results
with a Hebrew label.
-
Of course, these cats,
nobody bothered to give them
-
Hebrew labels unfortunately.
-
So I get the Q number.
-
But if I changed
it to Japanese, JA,
-
I would get still a bunch of
Q numbers for where there
-
isn't a Japanese label,
but I would get the labels
-
in Japanese.
-
OK?
-
So this is an example
of how you don't even
-
need to understand all
the syntax of this query
-
to adapt it to your needs.
-
If you want this
query as is, but you
-
want the labels in
Japanese, you can just
-
change the language code here.
-
OK so that is all
this query does.
-
Again, just give
me the items that
-
have property 31, instance of,
with a value 146, which is cat.
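For readers who want to see the cats query written out, here is a minimal Python sketch that builds it as text. The query follows the example described above; the helper names `cats_query` and `run_query` and the User-Agent string are assumptions made for illustration. `run_query` sends the text to the public endpoint at query.wikidata.org/sparql, so it needs network access and is not called here.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def cats_query(language="en"):
    """Build the cats query; only the label language varies."""
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        "  ?item wdt:P31 wd:Q146 .  # instance of (P31) cat (Q146)\n"
        "  SERVICE wikibase:label { "
        f'bd:serviceParam wikibase:language "{language}" . }}\n'
        "}"
    )


def run_query(query):
    """Send a query to the endpoint and return the decoded JSON result."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={"User-Agent": "wikidata-intro-sketch/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Same query, Japanese labels: only the language code changes.
    print(cats_query("ja"))
```

Changing `"en"` to `"he"` or `"ja"` is the whole adaptation: the condition stays the same and only the labels change.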
-
Let's take a question just
about this very simple query
-
before we advance to
more complicated queries.
-
Any questions just about this?
-
Like, did anyone kind of
really lose me talking
-
about this simple query?
-
Again, this query just tells
Wikidata, get me all the items
-
that somewhere among
their statements
-
have instance of cat.
-
That's the only condition.
-
No questions.
-
OK, feel free to ask if
you come up with one.
-
So let's complicate
things a little.
-
Let's ask only for male cats.
-
OK.
-
Remember this cat
Gladstone is male,
-
and we know this because
he has a property called
-
sex or gender, and the value
is male creature, right?
-
So let's add another
condition right here
-
under the first condition.
-
OK?
-
This is a new line.
-
And I'm adding a new
condition to the query.
-
I'm saying, not only do I
want this item that you return
-
to be instance of cat, I
also want this same item
-
to have another property,
the property sex or gender.
-
Right?
-
And I need to refer to
the property by number.
-
But don't worry,
Wikidata will help you.
-
So you start with this
prefix, WDT.
-
Again, just ignore
that prefix it's
-
one of the features of SPARQL
that we need to respect.
-
WDT colon, and then I can
just type control space
-
to do a search, to
do an auto complete.
-
So I can just type sex
and Wikidata helpfully
-
offers me a drop down
with relevant properties.
-
So I click property 21, which
is the sex or gender property.
-
And then I say, so I want
the sex or gender property
-
to have the Wikidata value.
-
Again, control space.
-
And I can just
say male creature.
-
See?
-
There's a different item
for male, as in human,
-
and a different one for
male creature, for reasons
-
that we won't go into.
-
Let's pick male
creature, because we're
-
talking about cats here.
-
All right.
-
And add a period here at
the end and click Run.
-
And instead of 114 cats, we get,
this time, we got 43 results.
-
Including our friend Gladstone
who is a male creature cat.
-
So that means all the
rest are female, right?
-
Wrong.
-
Wrong.
-
That does not mean that at all.
-
What it means is of
the 114 items that
-
have instance of cat,
only 43 have explicitly
-
sex male creature.
-
The rest of them do not.
-
Maybe because they have
sex female creature,
-
but maybe because they don't
have that property at all.
-
I'm emphasizing
this to kind of help
-
you train yourself to
correctly interpret
-
the results of
queries from Wikidata.
-
Don't jump into this kind
of simplistic conclusion,
-
OK there's 114 total, 43 male,
therefore the rest are female.
-
That is not correct.
-
OK?
-
But 43 of those explicitly
had another statement, sex
-
or gender, male creature.
-
So I just added
another condition,
-
and now my query is
asking two separate things
-
about the results.
-
They need to be a cat
and a male creature.
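A small sketch of what adding a second condition looks like, and of the arithmetic that is and is not justified afterwards. P31, Q146 and P21 are the numbers named in the talk; the ID Q44148 for "male creature" is an assumption and should be checked with auto-complete. The helper `conditions_to_query` is a hypothetical name.

```python
CAT = "Q146"              # cat, as named in the talk
SEX_OR_GENDER = "P21"     # sex or gender, as named in the talk
MALE_CREATURE = "Q44148"  # assumed ID for "male creature"; verify before use


def conditions_to_query(conditions):
    """Turn (property, value) pairs into one query; every pair must hold."""
    lines = [f"  ?item wdt:{prop} wd:{value} ." for prop, value in conditions]
    return "SELECT ?item WHERE {\n" + "\n".join(lines) + "\n}"


all_cats = conditions_to_query([("P31", CAT)])
male_cats = conditions_to_query([("P31", CAT), (SEX_OR_GENDER, MALE_CREATURE)])

# Counts reported in the talk.
total, explicitly_male = 114, 43

# This is NOT the number of female cats. It is the number of cats that
# either have a different value for P21 or have no P21 statement at all.
not_stated_male = total - explicitly_male

if __name__ == "__main__":
    print(male_cats)
    print(not_stated_male)
```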
-
AUDIENCE: Maybe we
should see how many
-
cats have Twitter accounts.
-
But there is a
question from YouTube,
-
which is will you talk about
the export possibilities
-
of the result of the query?
-
ASAF BARTOV: Absolutely.
-
Absolutely I will in
just a little bit.
-
I mean there is, in
addition to just getting
-
this kind of table, I can get
these results in other formats.
-
And I can also
download these results.
-
I can click the Download
button and get them
-
as a comma separated
file, tab separated
-
file, a JSON file, which is
useful for programmatic uses.
-
I can also get a link.
-
So I can get a
link to this query.
-
I mean, I spent all this time
designing this beautiful query.
-
I can get a short URL that was
generated especially for me
-
right now with a tiny URL.
-
I can just paste this
into Twitter and go,
-
hey people look at all the male
cats that Wikidata knows about.
-
OK, this is not a
very exciting query.
-
But once I get to a really
complicated exciting query
-
I can totally share that
very easily through this.
-
And we will get to more
interesting queries
-
in just a second.
-
Any questions on this kind
of basic querying so far?
-
OK.
-
So that was a very
simple example.
-
Let's spend a moment exploring.
-
So this cat Gladstone was
named after this dude, William
-
Gladstone, who was an
important British politician.
-
I'm sure he's not the
only thing out there
-
in the universe that's named
after Gladstone, right?
-
I mean there has got
to be, I don't know,
-
park benches,
planets, asteroids,
-
something other than the
cat, named after this guy.
-
So we can ask Wikidata
to tell us all the things
-
that, you know, without
saying instance of something.
-
Like, I don't know, anything
named after William Gladstone.
-
So how do I do that?
-
Same principle.
-
Instead of asking about the
property instance of, property
-
31, instead of that, I
will ask about the property
-
named after--
-
sorry, named after--
-
I don't need to
remember the number.
-
I have auto-complete.
-
Named after is property 138.
-
And I want anything
at all that is
-
named after this person,
William Gladstone.
-
Here we go.
-
Which is 160852.
-
Whatever.
-
OK.
-
You notice I removed
instance of cat.
-
I removed the male creature.
-
I'm only asking,
get me all the items
-
that are somehow named after
that particular politician.
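Written out, the query is a single condition on property 138 (named after) with the value Q160852, both as given in the talk. The sketch below also accepts an optional second condition on instance of, as an illustration of narrowing the same question; `named_after_query` is a hypothetical helper name.

```python
def named_after_query(namesake_id, instance_of_id=None):
    """Items named after `namesake_id`, optionally limited to one class."""
    lines = [f"  ?item wdt:P138 wd:{namesake_id} .  # named after"]
    if instance_of_id is not None:
        lines.append(f"  ?item wdt:P31 wd:{instance_of_id} .   # instance of")
    lines.append(
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }'
    )
    return "SELECT ?item ?itemLabel WHERE {\n" + "\n".join(lines) + "\n}"


if __name__ == "__main__":
    # Anything at all named after William Gladstone (Q160852).
    print(named_after_query("Q160852"))
```

Calling `named_after_query("Q160852", "Q146")` would ask only for cats named after him.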
-
And I run the query,
and it turns out
-
that Wikidata knows
about three such things.
-
Does that mean that's
the only-- these
-
are the only three things
named after him in the world?
-
Of course not.
-
But these are the only three
items that are in Wikidata
-
and explicitly have the
property named after Gladstone.
-
For all I know, there
may be a village
-
in England called Gladstone
named after this person.
-
But if nobody added the
property, named after, linking
-
to the person, it wouldn't show
up in the results to my query.
-
So Wikidata knows about
three such things.
-
One of them is something
called the Gladstone Professor
-
of Government.
-
I can click through and see
that it's a chair at Oxford
-
University, right?
-
So it's a position.
-
And another is the William
Gladstone school number 18.
-
William Gladstone
school number 18.
-
Where is that?
-
That is in Sofia, Bulgaria.
-
Again.
-
All right, so that's a
particular school in Bulgaria
-
named after William Gladstone.
-
And finally, the third
result is, of course, our pal
-
Gladstone the Chief Mouser.
-
If I click through,
that's the cat.
-
All right, so that
was an example.
-
I mean, you saw how easy it was.
-
I just named the property and
the value that I care about,
-
and I get the results.
-
Again, I mean, it's
kind of a silly example,
-
but think about it.
-
This is-- how else can
you answer that question?
-
There's no reference desk,
even at the great University
-
of Oxford, where you can
walk in and say, give me
-
a list of things
named after Gladstone.
-
There's no easy way to
answer that unless you happen
-
to have a very large
structured and linked
-
data store, like Wikidata.
-
All right, so that
was a silly example.
-
Let's take some--
-
AUDIENCE: There's a
bunch of stuff on there.
-
ASAF BARTOV: Oh, OK.
-
AUDIENCE: Can you show
easy query on the video?
-
And somebody needs to know
how to just do property
-
exists without giving
a specific value.
-
And then once you show easy
query you reload the page and--
-
ASAF BARTOV: I don't know easy query.
-
So is that a gadget?
-
I don't know what easy query is.
-
I don't use it.
-
So someone can maybe
send a link or something?
-
Oh it is a gadget.
-
I don't have it enabled.
-
That is nice.
-
So now, what I just did by hand,
by formulating the query named
-
after Gladstone--
-
I guess this is the--
-
Is it?
-
Yeah.
-
So this-- I just
clicked the three--
-
the ellipsis here.
-
Right after the name.
-
You see this?
-
This was just added by
enabling easy query,
-
which I just learned about.
-
So you just click this
and it auto-magically
-
made this kind of trivial query.
-
Of course, if I want a more
complicated query like,
-
I don't know, give me
all the things that
-
are named after Lincoln
but are a school,
-
I will still need to kind
of edit a custom query.
-
But this is a super
easy and very nice
-
way of just doing a very super
quick query for exactly this.
-
Right?
-
Like, what other items have
exactly this property and value
-
named after William Gladstone?
-
So, thank you to whoever
made this suggestion
-
to demonstrate that, and
I'm glad I learned something
-
too today.
-
Let's move to
another sample query.
-
Here's a fun example.
-
Popular surnames among
fictional characters.
-
Think about that for a second.
-
Popular surnames among
fictional characters.
-
So we're asking Wikidata
to go through all
-
the fictional
characters you know,
-
and of those look through
their surnames, group
-
them so that you can count
them, the repetitions
-
of the surnames,
and give me the most
-
popular surnames among them.
-
Additionally, I want you to
awesomely present the results
-
as a bubble chart.
-
Oh, yeah.
-
Wikidata can do that.
-
And I run the query.
-
And check it out.
-
The most popular names
among fictional characters
-
that Wikidata knows about are
Jones, Smith, Taylor, et cetera.
-
I mean for all we know,
the most popular name
-
among fictional characters
actually in the world
-
may be Wu.
-
Or something in Chinese
for all we know.
-
But if that has not been
modeled in Wikidata,
-
we're not going to get that.
-
So Taylor, Smith,
Jones, Williams,
-
seem to be the
most popular names.
-
And again, I could limit this.
-
I could make the
same query but add,
-
only among works whose
original language
-
was Italian, for example, to get
more interesting results if I
-
only care about
Italian literature.
-
But this is an example of
how I got awesome bubble
-
charts for free, and
I can just plug this
-
into an awesome
presentation that I make.
-
Of course I can still
look at the raw table.
-
So the query still resulted
in a bunch of data, right?
-
So Smith repeats 41 times,
Jones 38 times, Taylor 34 times,
-
et cetera, et cetera.
-
And down that list.
-
And I could, again, I could
export this into a file
-
and load it up in a spreadsheet,
and do additional processing
-
on it.
-
I can link to it.
-
I can do all kinds of
awesome things with it.
-
So that's another awesome query.
-
We don't have to go into
every line by line analysis
-
here of why this
works the way it does.
-
I want to show you some
other queries first.
-
Let's look at-- this is just
fun, overall causes of death.
-
Again a bubble
chart just looking
-
at people who died
of things, and have
-
a cause of death listed.
-
And we learn that the most
commonly listed cause of death
-
is myocardial infarction,
pneumonitis, cerebral vascular,
-
lung cancer, et
cetera, et cetera.
-
And again, in a bubble chart.
-
And so how does that work?
-
So just very briefly, the
important parts of this query
-
are I'm looking for something,
for some person, who
-
has property 31, instance
of, with the value Q5, which is human.
-
So a human.
-
Again, just to kind
of limit the query.
-
I'm not interested in
books or mountains.
-
I'm looking for humans
who have that same person,
-
that same variable PID,
should have a 509, meaning--
-
Hello.
-
Why don't I have the--
-
Yeah.
-
A 509, which is cause of death.
-
And that cause of death
is another variable,
-
that I'm calling CID.
-
Now, previously
we were saying you
-
know I want things
that are named
-
after Gladstone specifically.
-
Only things that have
that particular value.
-
Here I'm saying I'm
looking for things
-
that have some cause of death.
-
Not a specific one.
-
I just wanted to
get everything that
-
has a statement with some
value about property 509
-
cause of death.
-
OK?
-
And then this other bit of
magic here, the group by,
-
tells Wikidata I'm not
actually interested
-
in every individual thing.
-
I want you to group those
causes, and then count them
-
and give me the top ones.
-
So that's how this query works.
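A sketch of the two ideas in this query: a variable in the value position (any cause of death, not a specific one) and grouping with a count. The query text is a simplified version written for illustration, not the exact example from the examples list; the LIMIT of 20 is an arbitrary choice. The Python function shows the same group-and-count step on made-up rows.

```python
from collections import Counter

CAUSE_OF_DEATH_QUERY = """\
SELECT ?cid (COUNT(?pid) AS ?count) WHERE {
  ?pid wdt:P31 wd:Q5 .    # the person is an instance of human
  ?pid wdt:P509 ?cid .    # ...and has some cause of death, whatever it is
}
GROUP BY ?cid
ORDER BY DESC(?count)
LIMIT 20
"""

# Made-up (person, cause) pairs standing in for the matching statements.
rows = [("p1", "heart attack"), ("p2", "pneumonia"), ("p3", "heart attack")]


def group_and_count(pairs):
    """Group by cause and count, most frequent first (what GROUP BY does)."""
    return Counter(cause for _person, cause in pairs).most_common()


if __name__ == "__main__":
    print(group_and_count(rows))
```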
-
Here's that query I promised.
-
Painters whose fathers
were also painters.
-
I can only think of a couple.
-
I mean, Monet and Bruegel.
-
But I'm sure Wikidata
knows many more.
-
So let's run this query.
-
And I have 100 results.
-
By the way, I have limited
it to 100 results just
-
to keep it kind of snappy.
-
But actually, we could
maybe try removing the limit
-
and see if Wikidata
could tell us
-
the total number in Wikidata.
-
Yeah, that wasn't too bad.
-
So 1,270 results.
-
OK.
-
Wikidata, already at this
early date in its progress,
-
already knows about
more than 1,200 painters
-
who are sons of painters.
-
Sons of male painters, like
their father is a painter.
-
There may be
additional painters who
-
are sons of female painters
not included in this query.
-
Again, always remember what
exactly you are asking.
-
In this query I was
asking about the father.
-
I'm leaving out any
possible painters who
-
are sons of mother painters.
-
OK?
-
So how does this work?
-
I'm asking for the painter
along with the human label,
-
and the father along
with the human label.
-
So Michel Monet is the
son of Claude Monet.
-
And Domenico Tintoretto is the
son of the famous Tintoretto
-
whose label, you know, is just
Tintoretto like Michelangelo.
-
You know, you don't always
have to have the full name
-
in the common label.
-
Paloma Picasso is the
daughter of Pablo Picasso.
-
OK.
-
So Wikidata knows about
all these results.
-
Of course Holbein the Younger
son of Holbein the Elder.
-
And how did we get there?
-
Well we asked Wikidata
to look for something,
-
let's call it painter, which
has 106, which is occupation,
-
with a value painter.
-
Right?
-
This unwieldy number
1028181, that's painter.
-
So I'm asking for any item
that has occupation painter.
-
And let's call
that item painter.
-
I also want that painter to have
a property 22, which is father.
-
OK.
-
Father.
-
And I want it to
have some value.
-
OK, I'm putting it into
another variable called father.
-
I could have called
it, you know, frog.
-
That doesn't change
anything, just to be clear.
-
What matters is that this
is the property father.
-
I could have called
it anything I want.
-
So, and then, I have
a third condition.
-
That the father, like whatever
it says here in property 22,
-
I want that father to have
himself a property 106
-
occupation with a value painter.
-
OK?
-
These conditions
combined to give me
-
a list of people who have
a father and that father
-
has occupation painter as well.
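The three conditions can be sketched as a query with two occupation slots. Property 106 (occupation), property 22 (father) and Q1028181 (painter) are the numbers named in the talk; the function name and its parameters are assumptions made for illustration.

```python
PAINTER = "Q1028181"  # painter, as named in the talk


def occupation_of_child_and_father(child_job, father_job, limit=100, language="en"):
    """People with one occupation whose father has another (or the same)."""
    return (
        "SELECT ?person ?personLabel ?father ?fatherLabel WHERE {\n"
        f"  ?person wdt:P106 wd:{child_job} .   # occupation of the person\n"
        "  ?person wdt:P22 ?father .           # any father at all\n"
        f"  ?father wdt:P106 wd:{father_job} .  # occupation of that father\n"
        "  SERVICE wikibase:label { "
        f'bd:serviceParam wikibase:language "{language}" . }}\n'
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(occupation_of_child_and_father(PAINTER, PAINTER))
```

The variable name `?father` carries no meaning by itself; what makes it the father is the P22 condition. Swapping in other occupation IDs for the two slots gives the politicians-and-carpenters variant.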
-
Of course, if I suddenly,
or if you suddenly,
-
are consumed by
curiosity to know
-
who are some politicians
who are sons of carpenters?
-
You could just
change that, right?
-
Change the first value
from painter to politician.
-
Change the third line's value
from painter to carpenter.
-
Maybe that list
will be very short
-
because carpenters don't
tend to be notable,
-
so they wouldn't be
represented on Wikidata.
-
That's why this works relatively
well with painters, right?
-
Because most of
them are notable.
-
But generally you
could do that, right?
-
That's an example of
how you can take a query
-
and just replace one of those
values, or even the language.
-
So again, I could ask
for these same painters.
-
It's limited again.
-
These same painters,
but with Arabic labels.
-
Same query, but I have Arabic
labels for these painters.
-
And of course where
there is no Arabic label
-
I get the queue number.
-
OK?
-
So that's that query
that I promised you,
-
painters who are sons of painters
can be done by Wikidata
-
in under one second.
-
How awesome is that?
-
We can also get some statistics.
-
So how about counting
total articles
-
in a given wiki by gender.
-
This is what we call
the content gender
-
gap, as distinct from the
participation gender gap.
-
This is the gender gap in
what we cover on Wikipedia.
-
So let's take one of these.
-
So this is a query.
-
Articles about women in
some given Wikipedia.
-
All right.
-
So let's take--
-
I don't know.
-
Let's take the Tamil Wikipedia.
-
That's language code TA.
-
So I just put TA here.
-
And I click Run, and
I get this count.
-
That's all I wanted.
-
I'm not actually
interested in the items,
-
like in the list of women
on the Tamil Wikipedia.
-
I just want the number.
-
So I selected the count here.
-
And this number
turns out to be 2159.
-
So there are 2000
articles about women
-
in the Tamil Wikipedia that
Wikidata knows to be female.
-
Right?
-
I'm asking about the gender
field, property 21 again.
-
Remember, if there's some
article about a woman in Tamil
-
Wikipedia, but Wikidata
doesn't have
-
a statement about the
gender, that person
-
will not be counted here.
-
So again, be careful
about kind of stating
-
that is exactly the number
of women articles on Tamil
-
Wikipedia.
-
That's probably not true.
-
I'm sure some of those
articles are missing
-
a sex or gender property.
-
But for raw statistics,
that's probably good,
-
because some men are also
missing the sex or gender
-
property.
-
So we could take the
same query for men.
-
It's essentially the exact same.
-
It just has this unwieldy
number for males, 6581097.
-
I can change this language
code again to TA for Tamil.
-
And how many men are covered
on Tamil Wikipedia? 14,649.
-
OK.
-
So women, 2,100, men,
about seven times as many.
-
Right?
-
So that's the approximate
size of the content gender
-
gap on Tamil Wikipedia.
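A sketch of the counting query and of the ratio worked out from the two numbers above. Q6581097 (male) is the number named in the talk; Q6581072 for female and the `schema:about` / `schema:isPartOf` pattern for "has an article on this wiki" are stated here as assumptions about how the example is written. `article_count_query` is a hypothetical helper name.

```python
FEMALE = "Q6581072"  # assumed ID for female
MALE = "Q6581097"    # male, as named in the talk


def article_count_query(gender_id, wiki_language):
    """Count articles on one Wikipedia about humans with a stated gender."""
    return (
        "SELECT (COUNT(DISTINCT ?article) AS ?count) WHERE {\n"
        "  ?item wdt:P31 wd:Q5 ;\n"
        f"        wdt:P21 wd:{gender_id} .\n"
        "  ?article schema:about ?item ;\n"
        f"           schema:isPartOf <https://{wiki_language}.wikipedia.org/> .\n"
        "}"
    )


# Counts reported in the talk for Tamil Wikipedia.
women, men = 2159, 14649
ratio = men / women  # roughly how many articles about men per article about a woman

if __name__ == "__main__":
    print(article_count_query(FEMALE, "ta"))
    print(round(ratio, 1))
```

Items with no sex or gender statement are counted on neither side, so the ratio is an estimate, not an exact measure.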
-
And again, I can complicate
this query as much as I want.
-
For example, I can
try and find out
-
if this gender gap is wider
or narrower among musicians,
-
just as an example.
-
I could just add a line here
that says occupation musician,
-
and then I'm only
counting articles
-
on Tamil Wikipedia about
musicians who are female
-
versus articles
on Tamil Wikipedia
-
about musicians who are male.
-
And I can kind of
compare the gender--
-
the content gender gap across
occupations on Tamil Wikipedia.
-
Do you see the
important point here?
-
Is that this is not just
kind of a one purpose query.
-
I can just with a single
additional condition suddenly
-
make it a much more interesting
query, because I break it down
-
by occupation.
-
Or I break it down by century.
-
Do we have more of the coverage
gap in 19th century people
-
than in 21st century people?
-
I mean, I sure hope so, right?
-
The patriarchy is
weakening somewhat.
-
So I wouldn't be surprised if
there are many more notable men
-
covered about the 19th century.
-
But if we are also covering--
-
I mean, if the
gender gap is just
-
as wide for 21st century
people, that would
-
be a little disappointing.
-
Again that's something I
can fairly easily find out
-
on Wikidata query.
-
Any questions so far, or
are you just sharing links?
-
AUDIENCE: Yep there is one.
-
So somebody is wondering if you
can demonstrate, or at least
-
give a short answer of the
latter of this question.
-
Is it possible, using
Wikidata SPARQL,
-
to find specific
Wikipedia articles, e.g.
-
featured articles, of a
certain language which do not
-
exist in another language.
-
I know it is possible
to find category based
-
results using the PetScan tool.
-
But can we specify
that by selecting e.g.
-
featured articles?
-
ASAF BARTOV: Yes.
-
Excellent question.
-
It is possible, indeed.
-
And I will demonstrate
one such query.
-
Another query that
I already mentioned
-
largest cities in the
world with a female mayor.
-
This query-- let's
close some of these tabs
-
before my browser chokes.
-
So this query lists
the major world cities
-
run by women currently.
-
And the answer is Mumbai, Mexico
City, Tokyo, bunch of others.
-
And wait-- that's not it at all.
-
I clicked the wrong one.
-
That's the map of paintings.
-
OK.
-
Let's demonstrate
that for a second.
-
So this is the map
of all paintings
-
for which we know a location
with the count per location.
-
And the results are
awesomely presented on a map.
-
OK.
-
Again, under the hood this is
a table, of course, of results.
-
But, awesomely, I can
browse it as a map.
-
So here is a map of the
world with all the paintings
-
that Wikidata knows about.
-
Not just knows
about the paintings,
-
but knows about their
location in a museum.
-
Not surprisingly
Europe is much better
-
covered than Russia or Africa.
-
There is a huge gap in
contribution to Wikidata
-
from these countries.
-
And some of it can be fixed.
-
And of course there is much more
documentation, and much more
-
art in Europe.
-
But if we zoom in, I
don't know, Rome probably
-
has a few paintings.
-
Right?
-
Hello.
-
Sorry.
-
It's-- Yes.
-
Vatican City sounds
like a good bet, right?
-
I can zoom in here.
-
And I can just click
one of these dots
-
and see in this point
there are two paintings.
-
And in this one there is one
and it's the Archbasilica
-
of St. John Lateran.
-
Let's see, this is the
actual St. Peter, right?
-
Sistine Chapel has 23 paintings.
-
What?
-
The Sistine Chapel has way
more than 23 paintings.
-
Correct, but 23 of them
are documented on Wikidata.
-
Have their own item
for the painting, not
-
the Sistine Chapel,
the painting has
-
an item that lists its
being in the Sistine Chapel.
-
There are 23 of those.
-
OK.
-
There is definitely
room to document
-
the rest of the artworks
in the Sistine Chapel.
-
So, again, this is just
not the kind of query
-
you were able to
make before Wikidata,
-
and it's a fairly simple
query, as you can see.
-
There are examples using
maps like airports within 100
-
kilometers of Berlin.
-
Again using the coordinates
as a useful data point.
-
And here is a map showing me
only airports within a 100
-
kilometer radius from Berlin.
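To illustrate what "within 100 kilometers" means once each item has coordinates, here is a sketch that does the distance filter in plain Python using the haversine formula. This is not how the query service does it internally; the place names and coordinates are made up, and the Berlin coordinates are approximate.

```python
import math

BERLIN = (52.52, 13.40)  # approximate latitude and longitude


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, by the haversine formula."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def within_radius(places, center, radius_km):
    """Keep the names of places whose coordinates lie inside the radius."""
    return [name for name, (lat, lon) in places.items()
            if distance_km(center[0], center[1], lat, lon) <= radius_km]


# Made-up places: one close to the center, one several hundred km away.
places = {"nearby airfield": (52.60, 13.30), "distant airport": (50.00, 8.60)}

if __name__ == "__main__":
    print(within_radius(places, BERLIN, 100))
```

An item with no coordinate statement cannot be placed on the map at all, which is the same completeness caveat as with every other query.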
-
But I wanted to show
you the mayors query.
-
Let's click the-- oh I just
have the wrong link here.
-
But I can still find it
here by typing mayor.
-
Here we go, largest
cities with female mayor.
-
So this is a slightly
more complicated query.
-
But if I run it, I get the top
10, because I set limit to 10.
-
I get the top 10
cities in the world,
-
by population size, that
are currently run by women.
-
Tokyo, Mumbai, Yokohama,
Caracas, et cetera.
-
And one interesting thing that
you may want to notice here
-
is that I'm asking for cities.
-
I mean items, that
are instance of city.
-
And that have a
head of government,
-
that have some
statement about who
-
is in charge, and that statement
has sex that's listed up here
-
as female.
-
Don't worry about
the syntax right now.
-
I just want to show you
some specific angle here.
-
And I'm further
filtering these results.
-
I only want those items where
there is not the property
-
and the qualifier, end time.
-
Why is that important?
-
Because if a city once
had a female mayor,
-
but that mayor is not the mayor
anymore, because mayors change,
-
I don't want them in this query.
-
I want a query of
cities currently having
-
a female mayor.
-
And of course Wikidata
may have historical data
-
with start and
end time, as we've
-
seen, that documents this
person was the mayor of Tokyo
-
or San Francisco
between these years.
-
But if there is no
end time, that means
-
they are currently the mayor.
-
So that's an example of
asking about a qualifier
-
of a statement, to again, to get
the results we actually want.
-
If we want current mayors it's
important to put this filter.
-
If we don't, we will get
historical female mayors
-
as well.
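A sketch of the qualifier filter, written for illustration and not copied from the examples list. The property and item numbers are assumptions to verify with auto-complete: Q515 (city), P6 (head of government), P582 (end time), P1082 (population) and Q6581072 (female). The `p:`, `ps:` and `pq:` prefixes are used here to reach the statement itself and its qualifier; `female_mayor_query` is a hypothetical helper name.

```python
def female_mayor_query(current_only=True, limit=10):
    """Largest cities whose head of government is female."""
    # Without this filter, past mayors (statements with an end time) match too.
    end_filter = (
        "  FILTER NOT EXISTS { ?statement pq:P582 ?end . }  # no end time\n"
        if current_only else ""
    )
    return (
        "SELECT DISTINCT ?city ?cityLabel ?mayor ?mayorLabel WHERE {\n"
        "  ?city wdt:P31 wd:Q515 .        # instance of city\n"
        "  ?city p:P6 ?statement .        # a head-of-government statement\n"
        "  ?statement ps:P6 ?mayor .      # the value of that statement\n"
        "  ?mayor wdt:P21 wd:Q6581072 .   # sex or gender: female\n"
        + end_filter +
        "  ?city wdt:P1082 ?population .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        "ORDER BY DESC(?population)\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(female_mayor_query())
```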
-
All right.
-
So these are some
example queries.
-
Questions about that?
-
Oh, the featured
article example.
-
So let's look at that.
-
So I have prepared
such a query recently.
-
Here we go.
-
So this is a query.
-
I just saved it here
on my user page.
-
I mean, this is
not Wikidata query.
-
This is just a meta page
containing the query usefully.
-
And let's run this.
-
So this query, it's actually
not very complicated.
-
It just has a long
list of countries,
-
because I'm asking
about African countries.
-
OK.
-
I'm looking for human
females from one
-
of these countries that
have an article in English.
-
That's what this line means.
-
But not in French.
-
That's what this part means.
-
OK.
-
This part, these
two lines together.
-
But not in French.
-
And this is what's
called a badge.
-
That's Wikidata's concept of
good and featured articles.
-
It's called a badge.
-
So I want them to have some
badge on English Wikipedia.
-
OK?
-
So again, this query is
asking for the top 100 women
-
from Africa who are documented
on English Wikipedia,
-
in a featured or
good article status.
-
But not on French Wikipedia.
-
So this is a query that's
a to-do query, right?
-
That's a query
for French editors
-
to consider what they might
usefully translate or create
-
in French.
-
And if we run this see
we have three results.
-
I mean, we have many
women from Africa
-
covered on English Wikipedia.
-
But only three articles
have featured or good status
-
among those that do not have
French Wikipedia coverage.
-
Let me rephrase that.
-
Among the English Wikipedia
articles about African women
-
that don't have a
French counterpart,
-
only three are featured or good.
-
OK?
-
Do you see this?
-
The badge is good article.
-
This little incantation
here is what allows
-
you to ask about the badge.
-
This here.
-
And, by the way, the slides
will be uploaded to Commons.
-
And we will-- how shall we make
it available on the YouTube
-
thing as well?
-
No, no.
-
But, I mean, for people who
will later watch this video.
-
Oh yeah, we can add it to
the YouTube description
-
and the Commons description.
-
So in the-- if you're
watching this video later,
-
in the description, we will
add a link to this query
-
specifically.
-
Because it's not in
the slides right now.
-
It will be.
-
OK.
-
So.
-
Questions so far?
-
We're almost done.
-
We have a few minutes left.
-
So questions about queries?
-
I mean, I'm sure
there's tons of things
-
you don't know how to do yet.
-
And maybe you didn't really
get the sense for SPARQL.
-
It's something you need
to really do on your own
-
on your computer.
-
See how it works.
-
Fiddle with it.
-
Change something.
-
See that it breaks
and complains.
-
But, very importantly-- oh I
had this in the other questions
-
slide.
-
Remember the Wikidata Project chat.
-
That's kind of the Wikidata
equivalent of the village pump.
-
It's the page on Wikidata
where you can just
-
show up and ask a question.
-
In my experience, the
Wikidata community
-
is very nice, very
welcoming, and very eager
-
to help newer people integrate
and learn how to do things.
-
There's also an IRC channel.
-
If you know what IRC is and
how to use it, by all means,
-
go to the IRC channel #wikidata.
-
There's people
there all the time,
-
and you can just ask a question.
-
If you're trying to do a
query, and you don't quite
-
understand the syntax, or you're
not sure how to get the result
-
you want,
-
there are people there who
will gladly help you do that.
-
There is also a
Wikidata newsletter
-
published by the Wikidata team,
which is based in Germany
-
at Wikimedia Germany.
-
And they send out a newsletter
in English with Wikidata news.
-
You know, new
properties, new items,
-
new things in the project.
-
But also sample queries.
-
So once a week there is
kind of an awesome query
-
to learn from, if you want
to learn that way instead
-
of reading like a
whole manual on SPARQL.
-
So I'm just encouraging
you to get help
-
in one of those channels.
-
Of course you can write to me.
-
Just reach out to me and
ask me questions as well.
-
I hope by now you agree
that Wikidata is love,
-
and Wikidata is awesome.
-
If there are no questions,
we do have a tiny bit of time
-
to demonstrate one
more tool but that's--
-
no?
-
No questions.
-
OK so let's talk about--
-
well, the Reasonator
is kind of nice,
-
but it's a little like
the article placeholder.
-
So this is not Wikidata;
this is a tool, again,
-
built by Magnus Manske--
-
AUDIENCE: There's also one
final question to you in case--
-
ASAF BARTOV: Oh,
there is a question.
-
AUDIENCE: Yeah.
-
ASAF BARTOV: What are the
advantages and disadvantages
-
of creating an item
before an article is
-
done on English Wikipedia?
-
Well, I mean, this example
that I just made, right?
-
I'm reading this book
by a notable author.
-
OK.
-
I want this to
exist on Wikidata,
-
and to be mentioned
on Wikidata, so
-
that when people look up
that author in Wikidata
-
they will know about one
of his notable works.
-
But I'm not prepared to
put in the time investment
-
to build a whole article
on English Wikipedia.
-
Either because I don't
have the time, or I
-
don't have good sources.
-
Or maybe my English
is not good enough,
-
but it is good enough to just
record these very basic facts
-
and point to the Library of
Congress records et cetera.
-
So that it's better
than nothing.
-
So that's one reason
to maybe do it.
-
Another reason is to
be able to link to it.
-
So remember that
translator lady already
-
had an item on Wikidata, but if
she hadn't we could have just
-
created a very, very basic
rudimentary item about her just
-
saying, you know,
this name is human.
-
Country, Bulgaria.
-
Occupation, translator.
-
Even just that
would have been something,
-
and would have enabled me
to link to this person.
-
So these are legitimate reasons
to create Wikidata entities
-
without, or at least before,
creating a Wikipedia article.
-
If you are going to create--
-
I mean, if you're at an
edit-a-thon or something,
-
and you have come to
create Wikipedia articles,
-
by all means, first create
the Wikipedia article,
-
then create the Wikidata
item and link to it.
-
I hope that answers
the question.
-
So the Reasonator
is simply a kind
-
of prettier view of
items in Wikidata.
-
So you can just type the name
of an item or the number.
-
Let's pick just a
random number, 42.
-
Say 42.
-
Which happens to
be, maybe you've
-
heard of this guy,
Douglas Adams.
-
He happened to have received
the Q number 42.
-
I'm sure it's a
cosmic coincidence
-
of infinite improbability.
-
And this is a view--
-
this is a tool that
is not Wikidata.
-
It's a tool built on top of
Wikidata called Reasonator.
-
And it gives us the information
from Q42, that is from the--
-
this item in Wikidata, which
looks like an item in Wikidata.
-
But it gives it to us in a
slightly more rational kind
-
of layout.
-
It even kind of
generates a little bit
-
of pseudo article text for us.
-
You know, Douglas Adams was
a British writer, playwright,
-
screenwriter,
bla-bla-bla, an author.
-
He was born on this date, in
this place, to these people.
-
He studied at this place
between these years.
-
That's all machine generated.
-
Nobody wrote this text.
-
That's all taken from those
statements in Wikidata,
-
to generate this reasonably
readable summary paragraph.
-
And then it gives us this
little table of relatives.
-
It's all taken from Wikidata.
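-
To make the idea of machine-generated text concrete, here is a toy Python sketch that turns a handful of statements into sentences. It is not Reasonator's code, which the talk does not show: the function names, the dictionary keys, and the hand-typed sample statements are all hypothetical stand-ins for data read from an item.

```python
# Toy illustration only; not how Reasonator is actually implemented.
def join_list(values):
    """Join values as 'a, b and c'."""
    values = list(values)
    if len(values) < 2:
        return "".join(values)
    return ", ".join(values[:-1]) + " and " + values[-1]


def summarize(label, statements):
    """Build a short summary from a dict of statement values."""
    sentences = []
    roles = statements.get("occupation", [])
    if roles:
        adjective = statements.get("nationality_adjective", "")
        text = f"{label} was a {adjective} {join_list(roles)}."
        # Collapse the double space left when no adjective is given.
        sentences.append(" ".join(text.split()))
    born = statements.get("date_of_birth")
    place = statements.get("place_of_birth")
    if born and place:
        sentences.append(f"{label} was born on {born} in {place}.")
    return " ".join(sentences)


if __name__ == "__main__":
    sample = {
        "nationality_adjective": "British",
        "occupation": ["writer", "playwright", "screenwriter"],
        "date_of_birth": "11 March 1952",
        "place_of_birth": "Cambridge",
    }
    print(summarize("Douglas Adams", sample))
```

Nobody writes the sentences by hand; the more statements an item has, the more such a template can say.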
-
But as you can see,
this is already
-
a little more accessible than
the essentially arbitrary
-
ordering of statements
on Wikidata.
-
And that's OK.
-
I mean, that's
kind of by design.
-
Wikidata is the platform.
-
There is going to
be-- there are going
-
to be many new applications,
and platforms, and tools,
-
and visual interfaces
on top of Wikidata
-
to browse Wikidata in more
friendly or more customized
-
ways.
-
For example, one of the
things that Reasonator
-
does for us is give us pictures
and maps and a timeline.
-
Check this out.
-
A timeline, machine generated,
just from dates and points
-
in time, mentioned in the
relatively rich Wikidata
-
item about Douglas Adams.
-
Right?
-
So this timeline, for example
again, completely machine
-
generated.
-
But he was educated
between these years,
-
so I can put it on the timeline.
-
And this is the year he was
nominated for a Hugo Award,
-
so I can put that in a timeline.
-
Et cetera.
-
So that's just a super
quick demonstration
-
of that tool, the Reasonator.
-
Links are all here
in the slides.
-
And the final tool I wanted
to mention very quickly
-
is the Mix'n'match tool.
-
You remember my explanation
about Wikidata as a nexus,
-
as a connection point between many
databases, many data sources.
-
Those depend on
these equivalencies.
-
On Wikidata being taught
that this item is like that
-
ID in this other database.
-
And Mix'n'match is a tool,
again, by Magnus Manske.
-
Maybe you're detecting
a pattern here.
-
It's a tool by Magnus
that is designed
-
to enable us to kind
of take a foreign,
-
an external data set, put
it alongside Wikidata,
-
and kind of try and align them.
-
So this item in this
external dataset,
-
is that already
covered in Wikidata?
-
If so, by what Q number?
-
By what item?
-
If not, maybe we need
to create a Wikidata
-
item to represent it.
-
Or maybe it's a
duplicate, or something.
-
So the Mix'n'match tool has
a list of external data sets,
-
as you can see.
-
The Art and Architecture
Thesaurus by the Getty Research
-
Institute.
-
Or the Australian
Dictionary of Biography.
-
All kinds of external
data sets here.
-
Somewhere here I had a specific
link to the Royal Society.
-
It can also give
me some statistics.
-
So there is an external data set
of all the Fellows of the Royal
-
Society.
-
Right?
-
The oldest academic
learned society in England.
-
And the internet is tired.
-
Here we go.
-
Nope.
-
Did that work?
-
Fellows of the Royal
Society, here we go.
-
So this one is complete.
-
I mean, people have manually
gone over every single item
-
there and either
matched it to Wikidata
-
or declared that it was not
in scope, or a duplicate
-
or whatever.
-
But let's look at site stats.
-
This is a fun kind of
aspect of this tool.
-
But that is not working.
-
Or it's taking too long.
-
So let's just demonstrate
how this works.
-
Maybe Britannica?
-
Is that done already?
-
Here we go.
-
Encyclopedia Britannica.
-
Yeah.
-
So in the Encyclopedia
Britannica set,
-
40% of the items
are not yet processed.
-
So let's process one of them.
-
For example there is an item
in the Encyclopedia Britannica
-
called Boston, England.
-
As you know,
all American place names
-
are totally stolen
from elsewhere.
-
So there is a Boston
in England, though it's
-
no longer the famous one.
-
And the Mix'n'match
tool has automatically
-
matched it based on
the label to Q100,
-
which is Boston, the big
city in the United States.
-
And that is incorrect, right?
-
That's kind of a naive computer
going, well, this is Boston,
-
and this other thing
is also Boston.
-
And it is asking me to
confirm this match or not.
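-
A toy Python sketch of why label-only matching goes wrong here. It is not Mix'n'match's real logic, which the talk does not describe: the lookup table, the function names, and the "take the first candidate" rule are all assumptions made for illustration. Only the two item numbers come from the demonstration.

```python
# Toy illustration only; the real tool's matching rules are not shown
# in the talk. The table is a hand-written stand-in for Wikidata.
WIKIDATA_LABELS = {
    "Q100": ("Boston", "big city in the United States"),
    "Q311975": ("Boston", "town in Lincolnshire, England"),
}


def candidates_by_label(external_name):
    """Return every item whose label equals the name before the comma."""
    base = external_name.split(",")[0].strip().lower()
    return [
        qid
        for qid, (label, _description) in WIKIDATA_LABELS.items()
        if label.lower() == base
    ]


def naive_match(external_name):
    """Pick the first label match, ignoring descriptions entirely."""
    found = candidates_by_label(external_name)
    return found[0] if found else None


if __name__ == "__main__":
    print(naive_match("Boston, England"))          # prints Q100
    print(candidates_by_label("Boston, England"))  # both Bostons
```

Because two items share the label, the automatic guess is only a proposal, which is why a person has to confirm or reject it.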
-
You see?
-
So this is the Boston,
England from Britannica.
-
And the tool is asking
me, is this the same as
-
Boston, Q100, in America?
-
The answer is no.
-
I removed this.
-
I remove this match.
-
And now this Boston,
England is unmatched.
-
And I can match it to the
correct one in England.
-
I can do this by searching
English Wikipedia,
-
or searching Wikidata.
-
I mean, it has
these handy links.
-
So the English town
is in Lincolnshire.
-
Boston, Lincolnshire.
-
So I can go there and then
get the Wikidata item number.
-
See, this is not Q100,
Boston in the States,
-
this is Q311975,
a town in Lincolnshire.
-
I can get this Q
number, go back to the
-
Mix'n'match tool--
-
Where was that?
-
Here we are.
-
And set Q.
-
I can tell the tool that this is
the right Boston, and click OK.
-
And now this town
in Lincolnshire,
-
you can see this here,
this item, Q311975,
-
is linked to Britannica.
-
What does this mean?
-
Well, if we go there.
-
If we actually go
to the Wikidata
-
entity you will see
that in addition
-
to the few statements that
it already had, it now has,
-
thanks to my clicking, it now
has another identifier here.
-
See?
-
Encyclopedia Britannica
Online ID, with this link.
-
And if we click it, we
will indeed reach this page
-
in the Britannica
online, which is indeed
-
about this town in Lincolnshire.
-
You see?
-
So I've contributed one
of those mappings, one
-
of those identifiers,
into Wikidata.
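-
Once an identifier like this is on the item, it can be queried back like any other statement. A minimal Python sketch that builds such a lookup follows; the function name is hypothetical, and P1417 as the number of the "Encyclopedia Britannica Online ID" property is my assumption, since the talk only shows the property by name.

```python
# Sketch only: P1417 is assumed to be the Britannica Online ID property.
BRITANNICA_ONLINE_ID = "P1417"


def external_id_query(item: str, prop: str = BRITANNICA_ONLINE_ID) -> str:
    """Build SPARQL text that fetches one external identifier of an item."""
    return f"SELECT ?id WHERE {{ wd:{item} wdt:{prop} ?id . }}"


if __name__ == "__main__":
    # Q311975 is the Lincolnshire town matched in the demonstration.
    print(external_id_query("Q311975"))
```

Swapping in another property number would ask for the same item's identifier in a different external database.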
-
And I didn't have
to do it manually.
-
This tool kind of prompted
me to either confirm--
-
if it was correct,
I could have just
-
clicked confirm. Since
it wasn't correct,
-
I corrected it manually, but
it made this edit on my behalf.
-
So that's another tool that
encourages us to systematically
-
teach Wikidata more things.
-
And we're out of time.
-
Go edit Wikidata. Now
that you have the power,
-
you know the deal.
-
Use it for good,
and not for evil.
-
If you have questions,
this is my email address.
-
If you're watching this video
not live, the description
-
will have links to the
slides, and to a bunch
-
of other useful
pieces of information.
-
Any last questions on IRC?
-
If not, thank you
for your attention.
-
And if you like this, and if you
feel that you now get Wikidata,
-
and you get what it's
good for, and you're
-
inspired to contribute, I have
only one request from you.
-
I mean, in addition to using
it for good, not for evil,
-
I ask that you spread the word.
-
Show this video--
share this video
-
with other people in your
community, or around you.
-
Teach this yourself
once you're comfortable
-
with these concepts.
-
Feel free to use my slides.
-
Yeah, and edit Wikidata.
-
Thank you very
much, and goodbye.