- 
Asaf Bartov: Testing, testing. 
- 
Is this heard in the room? 
- 
Testing. 
- 
Hello, everyone. 
- 
This is a gentle
 introduction to Wikidata
 
- 
for absolute beginners. 
- 
If you're an absolute
 beginner, if you've never heard
 
- 
of Wikidata, or if you've heard
 of Wikidata but don't quite get
 
- 
it, don't know what it's
 good for, have only used it
 
- 
for inter-wiki links-- 
- 
if you're anywhere
 on this range,
 
- 
you're in the right place. 
- 
My name is Asaf Bartov. 
- 
I work for the
 Wikimedia Foundation,
 
- 
and I am a Wikidata enthusiast. 
- 
So the first thing I want to
 say is that you are lucky.
 
- 
You are lucky because
 Wikidata is already
 
- 
and is quickly becoming even
 more of an important research
 
- 
tool for anyone who's
 trying to ask questions
 
- 
about large amounts
 of information.
 
- 
It will become more and more
 used across the humanities,
 
- 
in particular, because of the
 things that it's able to do,
 
- 
some of which we will
 demonstrate shortly.
 
- 
And you are lucky because you
 get to find out about it now
 
- 
before most of the world. 
- 
So by the end of this talk,
 you will be a Wikidata hipster
 
- 
because you'll be
 able to say, oh yeah.
 
- 
I knew about Wikidata
 before it was cool.
 
- 
So before we actually
 visit Wikidata,
 
- 
I want to share two key problems
 that Wikidata seeks to solve
 
- 
and which would help us
 understand why it exists.
 
- 
The first problem is that
 of dated data, that
 
- 
is data that is out of date. 
- 
And this is apparent
 on Wikipedia
 
- 
across our free
 knowledge encyclopedias.
 
- 
Data on Wikipedia is
 not always up to date.
 
- 
And the more obscure
 it is, the more likely
 
- 
it is not to be up to date. 
- 
So the Polish Wikipedia may have
 an article about a small town
 
- 
in Argentina, and that article
 will include information
 
- 
about that town like population
 size, name of the mayor.
 
- 
And that information,
 ideally, was
 
- 
correct at the time the article
 was created on the Polish
 
- 
Wikipedia-- 
- 
maybe translated
 from another wiki.
 
- 
But then how likely is
 it to be kept up to date?
 
- 
How likely is it that the
 Polish Wikipedia would give us
 
- 
the correct and latest numbers
 or data about the population
 
- 
size of that town
 or the mayor, right?
 
- 
So this is the kind of data
 that does go out of date, right?
 
- 
Every few years--
 five, 10 years--
 
- 
there is a census, and now there
 are new population figures.
 
- 
Now the census in Argentina will
 be made available in Argentina
 
- 
in Spanish, probably,
 which brings us
 
- 
to another component of the
 problem of dated data, which
 
- 
is there are no obvious
 triggers for updating the data.
 
- 
So the Polish Wikipedian
 is not sent an email
 
- 
by the Argentinean
 government saying, hey,
 
- 
we have a new census. 
- 
There are new population numbers
 for you to update on Wikipedia.
 
- 
No such email is sent. 
- 
So it's kind of
 hard to notice when.
 
- 
And of course, multiply that by
 all the different jurisdictions
 
- 
around the world. 
- 
There's no easy
 way to notice when
 
- 
your data goes out of date. 
- 
So that's difficult
 to keep up to date.
 
- 
And even if we were to receive
 some kind of indication--
 
- 
oh, there's a new
 census in Argentina,
 
- 
so a whole bunch of
 population figures
 
- 
have now gone out of date. 
- 
Updating it on the
 Polish Wikipedia
 
- 
and the French Wikipedia
 and the Indonesian Wikipedia
 
- 
and the Arabic Wikipedia is a
 whole bunch of repetitive work
 
- 
that a lot of
 different volunteers
 
- 
will need to do just for
 that one updated piece
 
- 
of information about Argentina. 
- 
So I hope this is
 clear and resonates
 
- 
with some of your experience
 editing Wikipedia--
 
- 
data that is out of
 date or that needs
 
- 
to be updated
 manually, menially,
 
- 
on a fairly frequent schedule
 across the different countries
 
- 
and data sources. 
- 
The other-- and I think
 maybe more interesting--
 
- 
shortcoming or problem
 that I want to discuss
 
- 
is what I call the
 inflexible ways
 
- 
of lateral queries, crosscutting
 queries of knowledge.
 
- 
So if I want an answer to
 the question, what countries
 
- 
in the world export rubber-- 
- 
that's a reasonable
 question, right?
 
- 
That information
 is on Wikipedia.
 
- 
Do you agree? 
- 
If you go to
 Wikipedia and read up
 
- 
about Brazil, about Peru, about
 Germany, somewhere in there--
 
- 
maybe a sub-article called
 Economics of Brazil--
 
- 
you will find the main
 exports of that country.
 
- 
And you can find
 out whether or not
 
- 
that country exports rubber. 
- 
But what if I don't want
 to go country by country
 
- 
looking for the word rubber? 
- 
I just want an answer. 
- 
What are the countries
 that export rubber?
 
- 
Even though that
 information is in Wikipedia,
 
- 
it's hard to get at. 
- 
It's hard to query. 
- 
Now, you may say, well, that's
 what we have categories for,
 
- 
right? 
- 
Categories are a way to
 cut across Wikipedia.
 
- 
So if someone made a
 category called rubber
 
- 
exporting countries, then
 you can go to that category
 
- 
and see a list of countries
 that export rubber.
 
- 
And if nobody has
 made it yet, well, you
 
- 
can create that category and,
 with a kind of one-time effort,
 
- 
populate that category,
 and you're done.
 
- 
Well, yes. 
- 
That's still not
 very convenient.
 
- 
But also, it's still
 very, very limited,
 
- 
because what if I only want
 countries that export rubber
 
- 
and have a democratic
 system of government,
 
- 
or any other kind of
 additional condition
 
- 
that I would like
 to add to this?
 
- 
Or take a completely
 different example.
 
- 
What if I want to know
 which Flemish town had
 
- 
the most painters born in it? 
- 
There's a ton of
 Flemish painters.
 
- 
Most of them were
 born somewhere.
 
- 
We could theoretically,
 just you know,
 
- 
look up all the birthplaces
 of all the Flemish painters
 
- 
and tally up the
 numbers and figure out
 
- 
what is the place where the
 most Flemish painters come from?
 
- 
I don't know the answer to that. 
- 
It would be nice to be
 able to get that answer.
 
- 
Again, the data is in Wikipedia. 
- 
Those birthplaces are
 listed in the articles
 
- 
about those painters. 
- 
But there's no easy way
 to get that information.
 
- 
What if I want to ask, who are
 some painters whose father was
 
- 
also a painter? 
- 
That's a thing
 that exists, right?
 
- 
Some painters are
 sons of painters.
 
- 
You know, Bruegel comes to
 mind as an obvious example.
 
- 
But there's a bunch
 of others, right?
 
- 
So who are those people? 
- 
What if I want to
 ask that question?
 
- 
That's the kind of question
 that not only Wikipedia
 
- 
doesn't answer today. 
- 
If you walk to your friendly
 university library reference
 
- 
desk and say,
 hello, I would like
 
- 
a list of painters whose
 father was also a painter,
 
- 
how would that
 librarian help you?
 
- 
There's no easy way to get an
 answer to a question like that.
 
- 
What if you only want
 a list of painters
 
- 
who were immigrants, painters
 who lived somewhere else
 
- 
than where they were born? 
- 
There's no book. 
- 
I guess maybe there
 is, but you know,
 
- 
it's not obvious that there's a
 ready resource that says, list
 
- 
of painters who are immigrants. 
- 
And the librarian would
 probably refer you
 
- 
to a book on the shelf
 called, I don't know,
 
- 
The Complete
 Dictionary of Flemish
 
- 
Painters and go,
 look up the index,
 
- 
you know, and if you
 see a similar surname,
 
- 
maybe they're father and son. 
- 
And kind of cobble together
 the answer on your own.
 
- 
The reason I'm comparing
 this to a library
 
- 
is to show you that this is a
 kind of question that is not
 
- 
readily satisfiable today. 
- 
Now, these questions may
 sound contrived to you.
 
- 
You may say to
 yourself, well, you
 
- 
know, painters who are also
 sons of painters, yeah.
 
- 
You know, that
 never occurred to me
 
- 
as a question I
 might care about.
 
- 
But I want to invite
 you to consider
 
- 
that this kind of question,
 questions like that question,
 
- 
may well be questions
 you do care about.
 
- 
And I also want to suggest
 that the fact it is so nearly
 
- 
impossible, the fact that
 there's no obvious way
 
- 
to ask that kind
 of question today,
 
- 
is partly responsible
 for your not
 
- 
coming up with those
 questions, right?
 
- 
We tend to be limited
 by the possible.
 
- 
You know, until human
 flight was made possible,
 
- 
it did not occur to anyone
 to say, oh yeah, by this time
 
- 
next week I will
 be in Australia,
 
- 
because that was
 just impossible.
 
- 
But when flight is
 possible, there's
 
- 
all kinds of things that
 suddenly become possible,
 
- 
and there's all
 kinds of needs that
 
- 
arise based on the
 availability of resources
 
- 
to fulfill those needs. 
- 
So many of these research
 questions, compound lateral
 
- 
cross-cutting queries, are not
 being asked because people have
 
- 
internalized the fact
 that there is no way
 
- 
to get an answer
 to questions like,
 
- 
what is the most popular first
 name among British politicians?
 
- 
I just made that up, you know? 
- 
Is it John? 
- 
Maybe. 
- 
Maybe it's William,
 for whatever reason.
 
- 
You know, these are the kinds
 of questions we don't routinely
 
- 
ask because we know that it's
 like, who are you going to ask?
 
- 
How are you going to
 get an answer to that?
 
- 
So this problem of not having
 very flexible ways of querying
 
- 
the data that we already have-- 
- 
in Wikipedia, in
 Wikisource, elsewhere--
 
- 
is a significant limitation. 
- 
So these two key problems
 have one solution.
 
- 
And that is an editable,
 central storage
 
- 
for structured and
 linked data on a wiki,
 
- 
under a free license, which
 is a very long way of saying
 
- 
Wikidata. 
- 
That is Wikidata. 
- 
Wikidata is an editable,
 central storage
 
- 
for structured and
 linked data on a wiki,
 
- 
under a free license. 
- 
So let's take this
 apart and unpack it.
 
- 
First of all, it's
 a central storage.
 
- 
This relates to the
 first problem, right?
 
- 
If we had one place containing
 data like population size,
 
- 
we would be able to update
 that one place and then have
 
- 
all of the different Wikipedias
 draw the data from that one
 
- 
place so that we wouldn't
 have to manually,
 
- 
repetitively update it across
 our hundreds of projects.
 
- 
So having central storage
 makes, I hope, kind
 
- 
of immediate, intuitive sense. 
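
The central-storage idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `central_store` dictionary, the town key `example_town`, the population figures, and the per-language templates are all made up here, and this is not how the Wikipedias actually pull data.

```python
# Hypothetical central store: one record per town, shared by every wiki.
central_store = {"example_town": {"population": 12000}}

# Illustrative per-language templates; each wiki only formats the shared value.
templates = {
    "pl": "Ludność: {population}",
    "fr": "Population : {population}",
    "id": "Jumlah penduduk: {population}",
}


def render(language: str, town: str) -> str:
    """Build one wiki's population line by reading the central store."""
    return templates[language].format(**central_store[town])


def update_population(town: str, new_value: int) -> None:
    """One edit in the central store; no per-wiki edits are needed."""
    central_store[town]["population"] = new_value


# A new census arrives: update once, and every language sees the new figure.
update_population("example_town", 12500)
for lang in templates:
    print(lang, "->", render(lang, "example_town"))
```

The point of the sketch is that the update happens in exactly one place, however many languages read from it.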
- 
But what do I mean by
 structured and linked data?
 
- 
So structured data means
 that each datum, each piece--
 
- 
individual piece-- of data
 is managed on its own,
 
- 
is identified and
 defined on its own,
 
- 
as distinct from Wikipedia. 
- 
Wikipedia has articles. 
- 
The article about Brazil
 includes a ton of data,
 
- 
all kinds of information,
 and it's presented as text,
 
- 
as several paragraphs--
 several pages--
 
- 
of text, right? 
- 
Now, we do have an
 approximation of structured data
 
- 
on Wikipedia. 
- 
If you've browsed
 Wikipedia a little,
 
- 
you've noticed that we often
 have an info box, what we
 
- 
call an info box on Wikipedia. 
- 
That's the table on the right
 side if it's a left to right
 
- 
language, the table
 on the right side
 
- 
that has information that
 is easy to tabulate, right?
 
- 
So you know, birth date, birth
 place, death date, death place,
 
- 
nationality-- 
- 
or if it's about a country,
 area, population, anthem,
 
- 
type of government, whatever
 you are likely to find.
 
- 
If it's a movie, then
 you know, starring,
 
- 
genre, box office receipts,
 whatever pieces of data
 
- 
are relevant to an
 article about a movie.
 
- 
So we do already kind of
 group pieces of information
 
- 
on Wikipedia into this
 kind of structured format.
 
- 
Those of you who have
 ever looked at the source,
 
- 
at what the wiki code
 under that looks like,
 
- 
know that it's only
 semi-structured.
 
- 
It looks neat and
 organized in a table,
 
- 
but really, it's just a bunch
 of text that is put there.
 
- 
It is not centralized. 
- 
Every Wikipedia has its
 own copy of that data.
 
- 
And if I go and update
 the population size
 
- 
on Spanish Wikipedia of
 that Argentinean town,
 
- 
it does not get
 updated automagically
 
- 
on the English Wikipedia or
 the Arabic Wikipedia, right?
 
- 
So the structured data that
 we already have on Wikipedia
 
- 
is not managed centrally. 
- 
The other thing
 about structured data
 
- 
is, when you have a notion of an
 individual piece of data, that
 
- 
is the cornerstone of
 allowing the kinds of queries
 
- 
that I was talking about. 
- 
That is what will allow
 me to ask questions like,
 
- 
what is the Flemish town where
 the most painters were born,
 
- 
or what are the world's
 largest cities that
 
- 
have a female mayor? 
- 
I could come up with other
 examples all day long, right?
 
- 
These are all questions
 that you can ask,
 
- 
once you break down your data
 into individual pieces, each
 
- 
of which is-- 
- 
you're able to refer to each
 of those programmatically.
 
- 
The computer can
 identify, isolate,
 
- 
and calculate based on each
 of those pieces of data.
 
- 
So that's why the
 structure is important.
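
To make that concrete, here is a small Python sketch of why separately stored pieces of data make such questions answerable. The records, names, and field names below are invented for illustration; they are not real Wikidata content or its actual query mechanism.

```python
from collections import Counter

# Made-up records: every field is its own piece of data, not free text.
painters = [
    {"name": "Painter A", "birthplace": "Town X", "father_occupation": "painter"},
    {"name": "Painter B", "birthplace": "Town Y", "father_occupation": "baker"},
    {"name": "Painter C", "birthplace": "Town X", "father_occupation": None},
    {"name": "Painter D", "birthplace": "Town X", "father_occupation": "painter"},
]


def top_birthplace(records):
    """Return (town, count) for the town where the most painters were born."""
    counts = Counter(record["birthplace"] for record in records)
    return counts.most_common(1)[0]


def children_of_painters(records):
    """Return the names of painters whose father was also a painter."""
    return [r["name"] for r in records if r["father_occupation"] == "painter"]


print(top_birthplace(painters))
print(children_of_painters(painters))
```

Neither question needs anyone to read articles one by one; the computer tallies or filters the individual fields.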
 
- 
Now, Wikidata is also a
 linked data repository.
 
- 
What does it mean that
 the data is linked?
 
- 
Well, it means that a single
 piece of data can point at,
 
- 
can link to another
 whole bag of data.
 
- 
So if we are describing,
 for example, a person,
 
- 
and we record the
 single piece of data
 
- 
that this person was born
 in Salem, Massachusetts,
 
- 
that single piece of data
 links to the item about Salem,
 
- 
Massachusetts
 because, of course,
 
- 
we know a lot of things
 about that place, Salem,
 
- 
Massachusetts. 
- 
So it's not just the text-- 
- 
S-A-L-E-M. It's not just,
 that's where they were born.
 
- 
But it's a link to all
 the data that we have
 
- 
about Salem, Massachusetts. 
- 
If we say someone's
 nationality is French,
 
- 
that is a link to France. 
- 
That is a link to everything we
 know about the country France.
 
- 
The fact that the data
 is linked and structured
 
- 
allows not only humans,
 but also computers
 
- 
to traverse information
 and to bring
 
- 
us different pieces of
 relevant information
 
- 
programmatically, automatically,
 based on those links.
 
- 
Because it's not just
 text, it's an actual link
 
- 
to another chunk of data. 
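
A minimal Python sketch of what "linked" buys us, assuming a toy store in which values are keys of other items. The identifiers `person1`, `salem`, and `massachusetts` and the property names are hypothetical stand-ins, not Wikidata's own.

```python
# Toy store: a value such as "salem" is a key pointing at another item,
# not just the text S-A-L-E-M.
items = {
    "person1": {"name": "Example Person", "place_of_birth": "salem"},
    "salem": {"name": "Salem", "located_in": "massachusetts"},
    "massachusetts": {"name": "Massachusetts"},
}


def follow(start: str, *properties: str) -> dict:
    """Hop from item to item along the named properties."""
    current = items[start]
    for prop in properties:
        current = items[current[prop]]
    return current


# From a person, via the birthplace, to the state that birthplace is in.
state = follow("person1", "place_of_birth", "located_in")
print(state["name"])  # prints: Massachusetts
```

Because each hop lands on a full item, a program can keep traversing to whatever else is known about that place.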
- 
If this sounds a
 little abstract,
 
- 
it will become much
 clearer in just a second
 
- 
when we see it in action. 
- 
But the other components of
 this little definition are,
 
- 
of course, this central storage
 of structured and linked data
 
- 
needs to be editable,
 of course, because we
 
- 
need to keep it up to date. 
- 
We need to correct mistakes. 
- 
And we want it on a wiki
 under a free license.
 
- 
The free license is, of
 course, essential to enable
 
- 
reuse of that data, to enable
 all kinds of reuse of the data.
 
- 
And Wikidata, unlike
 Wikipedia, is released
 
- 
under a different free license. 
- 
Wikidata is released
 under CC0 waiver.
 
- 
That means unlike
 Wikipedia, where
 
- 
you have to attribute Wikipedia
 when you reuse information
 
- 
from Wikipedia, you do not
 need to attribute Wikidata,
 
- 
and you do not need to
 share alike your work.
 
- 
It's an unencumbered license to
 reuse the data in any way you
 
- 
want, including commercially. 
- 
You don't have to say that
 it comes from Wikidata.
 
- 
I mean, it could be nice,
 but you don't have to.
 
- 
You're under no
 obligation to do it.
 
- 
And that is important to
 allow certain kinds of reuse
 
- 
where, for example, if you're
 building some kind of device,
 
- 
you may not have a practical
 way to give attribution.
 
- 
And had we required
 that to use Wikidata,
 
- 
we would have made
 Wikidata less reusable.
 
- 
So Wikidata is unencumbered by
 the requirement of attribution.
 
- 
And of course, because
 it's on a wiki,
 
- 
we get all the benefits that we
 are used to expect from a wiki,
 
- 
right? 
- 
So it's a wiki,
 which means, yes.
 
- 
It has discussion pages. 
- 
It has revision histories. 
- 
It remembers everything. 
- 
So if you screw it up, you
 can always go a version back.
 
- 
Or if someone else
 vandalized the content,
 
- 
we can always go back,
 just like Wikipedia.
 
- 
So we get all the
 benefits we're used to--
 
- 
user talk pages, group
 discussion pages, watch lists,
 
- 
all the features that
 we expect in a wiki.
 
- 
In short, Wikidata is love. 
- 
I hope you agree with me
 by the end of this talk.
 
- 
So let's zoom in and see
 what this structured data
 
- 
looks like. 
- 
So structured data on Wikidata
 is collected in statements.
 
- 
And statements have
 the general form
 
- 
of this triple, this
 tripartite ascription--
 
- 
items, properties, and values. 
- 
Now an item is the
 subject, is the topic
 
- 
that we are trying to describe. 
- 
It can be any topic that
 Wikipedia can cover,
 
- 
and many others that
 Wikipedia wouldn't.
 
- 
So the topic, the
 item can be Germany,
 
- 
or it can be Salem,
 Massachusetts,
 
- 
or it can be the
 concept of redemption.
 
- 
It can be anything at all. 
- 
Anything you can imagine
 describing in any way with data
 
- 
can be the item. 
- 
So the item, consider
 it like the title
 
- 
of the rest of the data. 
- 
And then what do we say
 about Salem, Massachusetts
 
- 
or about Germany? 
- 
Well, that's a series of
 properties and values,
 
- 
properties and values. 
- 
The property is
 the kind of datum,
 
- 
like birth date or language
 spoken or manner of death.
 
- 
These are all real properties. 
- 
Or national anthem, if I'm
 trying to describe a country--
 
- 
these are properties. 
- 
And then they have
 values, right?
 
- 
So this person, this
 imaginary person's place
 
- 
of birth, the value of the
 property place of birth
 
- 
is Salem, Massachusetts. 
- 
So you can think about it
 as like a government form--
 
- 
or not government, just any
 form that you're filling out--
 
- 
where there are field names,
 and then empty spaces for you
 
- 
to fill out. 
- 
That's the value, OK? 
- 
So the field names
 or the categories
 
- 
are the properties, right? 
- 
So name, language,
 occupation, date of birth--
 
- 
these are all properties. 
- 
And the values are
 the actual piece
 
- 
of data, the actual
 information that we have.
 
- 
And of course,
 different kinds of data
 
- 
are relevant for describing
 different kinds of items.
 
- 
And the key in the value is it
 can be either a literal value--
 
- 
like if we're describing
 the height of a mountain,
 
- 
we might say just
 the number 8,848.
 
- 
That's the height
 of which mountain?
 
- 
Not everyone at once. 
- 
Oh, because it's meters,
 the metric system.
 
- 
Yeah, Mt. Everest is 8,848 meters. 
- 
Yes. 
- 
Get with it, America. 
- 
The metric system. 
- 
All right, so that
 can be a literal value
 
- 
like an actual number. 
- 
Or it can be a link to an
 item, pointing at another item.
 
- 
But in this statement,
 it is the value.
 
- 
So if I'm talking about
 Germany, the item is Germany.
 
- 
And the property capital
 city has the value Berlin.
 
- 
But the value is
 not B-E-R-L-I-N.
 
- 
The value is a pointer to
 the item Berlin, right?
 
- 
That's the link. 
- 
So a single item is described
 by a series of such statements,
 
- 
right? 
- 
There's hundreds and hundreds of
 things I can say about Germany.
 
- 
There's hundreds of things
 I can say about a person.
 
- 
And these will
 generally take the form
 
- 
of a property and a value. 
- 
By the way, some properties
 may have more than one value.
 
- 
Consider the property
 languages spoken.
 
- 
People can speak more
 than one language, right?
 
- 
So if I'm
 describing myself,
 
- 
we can say languages spoken-- 
- 
English, Hebrew,
 Latin, whatever.
 
- 
So a property can have
 more than one value.
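
The item, property, and value shape described above can be sketched as a small Python data structure. The class names `Statement` and `ItemRef` are assumptions made for this illustration, and plain names stand in for real identifiers; this is not Wikidata's internal model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ItemRef:
    """A value that points at another item instead of holding plain text."""
    item: str


@dataclass(frozen=True)
class Statement:
    item: str                               # the subject being described
    prop: str                               # the kind of datum
    value: Union[ItemRef, int, float, str]  # a literal, or a link to an item


statements = [
    Statement("Germany", "capital city", ItemRef("Berlin")),         # link value
    Statement("Mt. Everest", "height in meters", 8848),              # literal value
    Statement("some person", "languages spoken", ItemRef("English")),
    Statement("some person", "languages spoken", ItemRef("Hebrew")),
]


def values_of(item: str, prop: str) -> list:
    """All values recorded for one property of one item (possibly several)."""
    return [s.value for s in statements if s.item == item and s.prop == prop]


print(values_of("Germany", "capital city"))
print(values_of("some person", "languages spoken"))
```

A property with several values is simply several statements that share the same item and property.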
 
- 
So if the item is
 about a country,
 
- 
it would have statements about
 properties like population,
 
- 
land area, official languages,
 borders with, anthem,
 
- 
capital city. 
- 
If I'm describing a person, I
 have a whole mostly different
 
- 
set of properties that
 are relevant, right?
 
- 
Date of birth, place of birth,
 citizenship, occupation,
 
- 
father, mother,
 religion, notable works--
 
- 
now, are all of these
 relevant for all people?
 
- 
No, of course not. 
- 
It depends. 
- 
And different items
 about different people
 
- 
will either have or not
 have these fields, right?
 
- 
So we wouldn't record religion
 for absolutely every person.
 
- 
Some people manage
 to do without.
 
- 
And also, it's not relevant
 for a lot of people, like,
 
- 
what their religion
 happens to be.
 
- 
Date of birth is generally
 relevant for most people
 
- 
that we're documenting. 
- 
So some properties kind of crop
 up more commonly than others.
 
- 
A person's height, for
 example, is not generally
 
- 
considered of
 encyclopedic value, right?
 
- 
We don't, for
 example, if we have
 
- 
an article about even a
 really well-documented person
 
- 
like Winston Churchill, does
 Wikipedia mention his height?
 
- 
I don't think it does. 
- 
Even though I'm sure
 we could probably
 
- 
find a source somewhere
 that lists his height,
 
- 
it's just not a
 very relevant piece
 
- 
of information about Churchill. 
- 
With everything else
 that's written about him
 
- 
and that we know
 about him that we
 
- 
want to include in the
 article, a person's height
 
- 
is not really something of
 great value most of the time.
 
- 
But if we are describing
 Michael Jordan, it is relevant.
 
- 
I'm dating myself. 
- 
People still know
 Michael Jordan, right?
 
- 
You know, a basketball
 player, that's
 
- 
when height is very
 relevant, right?
 
- 
That's one of the
 first things you
 
- 
say when you're describing
 a basketball player,
 
- 
is list their height. 
- 
So even within the
 class of person,
 
- 
some properties may be
 more or less relevant,
 
- 
depending on the context. 
- 
So let's look at some examples. 
- 
These are examples
 of statements.
 
- 
Each line is a statement. 
- 
So here's the first one. 
- 
I want to state, about the
 item Earth, our planet.
 
- 
And what I want
 to say about Earth
 
- 
is that the property
 highest point on Earth
 
- 
has the value Mt. Everest. 
- 
Would you agree with that? 
- 
That is the highest
 point on Earth.
 
- 
That's a statement. 
- 
It says something
 specific, one piece
 
- 
of information about Earth. 
- 
Now of course, there's
 a lot of other things
 
- 
we want to say about Earth-- 
- 
circumference,
 average temperature,
 
- 
I don't know, all
 kinds of things
 
- 
we can describe the planet
 with: density, the galaxy
 
- 
it belongs to, all that. 
- 
But here's one piece
 of information,
 
- 
one very specific field in
 the detailed form about Earth.
 
- 
The highest point is Mt. Everest. 
- 
Now here's a second statement. 
- 
This time Mt. Everest itself is the item
 that I'm describing, right?
 
- 
The topic has changed. 
- 
Now I'm saying something
 about Mt. Everest, and what
 
- 
I'm saying about Mt. Everest
 is elevation above sea level.
 
- 
Sounds the same but it
 isn't, because the highest
 
- 
point on Earth answers
 the question where,
 
- 
like on the planet, what
 is the highest point?
 
- 
It's Mt. Everest. 
- 
But how high is that highest
 point is a different piece
 
- 
of information. 
- 
Do you agree? 
- 
It's the actual altitude. 
- 
It's not where on
 the planet it is.
 
- 
So it may sound similar,
 but these are actually
 
- 
very different pieces
 of information.
 
- 
So that highest
 point, how high is it?
 
- 
Well, it's 8,848 meters high. 
- 
Now the third statement gives
 another piece of information
 
- 
about the first item. 
- 
Same item-- I could have
 grouped them together.
 
- 
Another thing I
 know about the Earth
 
- 
is that the deepest
 point on the planet
 
- 
is the Challenger Deep, part
 of the so-called Mariana
 
- 
Trench in the ocean. 
- 
So that is the deepest point. 
- 
And how deep is it? 
- 
I again use the elevation
 above sea level.
 
- 
That's the name of the
 property even though it's not
 
- 
above sea level. 
- 
I have a negative value because
 the elevation of the Challenger
 
- 
Deep is minus 11
 kilometers, more or less.
 
- 
All right? 
- 
So these are statements. 
- 
These are four individual
 pieces of data.
 
- 
And I could also
 look at it this way.
 
- 
Maybe that's closer to the
 government form example
 
- 
that I was giving, right? 
- 
So I want to say
 something about Earth.
 
- 
What do I want to say? 
- 
Two things-- highest point. 
- 
That's the field,
 that's the property,
 
- 
and this is the value. 
- 
The highest point is Mt. Everest. 
- 
The deepest point
 is Challenger Deep.
 
- 
And then I have things to
 say about Challenger Deep--
 
- 
the property of elevation
 above sea level, the value
 
- 
is minus 11 kilometers. 
- 
Now here's yet another
 view of the same data
 
- 
once more, with numeric IDs. 
- 
So this is the same information,
 the same four statements.
 
- 
But this time, in
 addition to using words,
 
- 
I'm also including weird
 numbers following either Q or P.
 
- 
So P stands for property. 
- 
So the highest point
 property is P610.
 
- 
And the deepest point
 property is P1589.
 
- 
What do these numbers mean? 
- 
They don't mean anything at all. 
- 
They're just numbers. 
- 
They're just sequential numbers. 
- 
And if I create a new
 Wikidata item right now,
 
- 
it'll get just the
 next available number.
 
- 
So they're just numbers. 
- 
So P stands for property. 
- 
What does Q stand for? 
- 
Does anyone know? 
- 
It's a trick question
 because it's hard to guess.
 
- 
But the principal
 architect of Wikidata,
 
- 
a Wikipedian named Danny
 [INAUDIBLE] and data scientist,
 
- 
is married to a lovely
 lady named [INAUDIBLE]
 
- 
spelled with a Q. And
 this is a loving tribute.
 
- 
And she's also a Wikipedian and
 an admin of Uzbek Wikipedia.
 
- 
So Q2 is just the numeric
 identifier of the item Earth.
 
- 
And Q513 is the
 identifier of Mt. Everest.
- 
You notice that we use that ID
 across the statement, right?
 
- 
So from Wikidata's
 perspective, this
 
- 
is what the
 database actually contains.
 
- 
What we were saying with words-- 
- 
the Earth, highest
 point, whatever--
 
- 
never mind that. 
- 
Q2 has P610 with a value Q513. 
- 
That's what Wikidata
 cares about, OK?
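
As a rough Python sketch of that split between what is stored and what we read: the statement holds only identifiers (the three IDs quoted in the talk), and a separate label table turns them back into words. The `readable` helper and the tuple layout are assumptions for illustration.

```python
# The statement as stored: identifiers only (IDs as quoted in the talk).
stored = ("Q2", "P610", "Q513")

# English labels, kept separately from the statement itself.
labels_en = {"Q2": "Earth", "P610": "highest point", "Q513": "Mt. Everest"}


def readable(statement, labels):
    """Swap each identifier for its label; fall back to the raw ID if unknown."""
    return " | ".join(labels.get(part, part) for part in statement)


print(readable(stored, labels_en))  # prints: Earth | highest point | Mt. Everest
```

The words are a display layer; the stored statement never changes when we choose to show it differently.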
 
- 
Now that, you'll agree,
 is a little inaccessible.
 
- 
Just these lists of numbers,
 that's a little hard.
 
- 
So Wikidata
 understands and allows
 
- 
us to continue using our words. 
- 
But actually, it gets
 translated into numeric IDs.
 
- 
Now why is this a good idea? 
- 
Why can't we just
 say Earth or Mt. Everest?
- 
Any thoughts? 
- 
This is an open question. 
- 
Why is this a good
 idea to use numbers
 
- 
instead of the names of things? 
- 
Yes, because more than one
 thing can have the same name.
 
- 
What do you mean? 
- 
There's only one Mt. Everest. 
- 
Well, yeah. 
- 
But there's also a
 movie called-- and probably
 
- 
more than one-- called Mt. Everest,
 or a TV documentary
 
- 
literally called Mt. Everest. 
- 
And of course, if I'm
 describing a person named
 
- 
Frank Johnson, he's not the only
 Frank Johnson on the planet,
 
- 
right? 
- 
But wait, you say. 
- 
On Wikipedia we deal
 with that problem, right?
 
- 
How do we deal with that
 problem on Wikipedia?
 
- 
Does anyone in
 the audience know?
 
- 
The standard way to
 deal with the fact
 
- 
that there is more than one
 Frank Johnson in the world,
 
- 
on Wikipedia, is to use
 parentheses after the name.
 
- 
So there is Frank
 Johnson (actor)
 
- 
and Frank Johnson
 (politician), for example,
 
- 
if that's the distinction
 we need to make.
 
- 
So you put in parentheses
 kind of the minimal amount
 
- 
of information you need to tell
 apart these Frank Johnsons.
 
- 
What if there's two
 politician Frank Johnsons?
 
- 
Well, then you would say Frank
 Johnson, (Delaware politician)
 
- 
versus Frank Johnson
 (California politician), right?
 
- 
You just put in that bit of
 context to tell them apart.
 
- 
So that's the solution
 that Wikipedians came up
 
- 
with years and years ago
 because they did need
 
- 
a unique name for the article. 
- 
You can't have two
 articles literally called
 
- 
Frank Johnson on Wikipedia. 
- 
So that's the
 solution on Wikipedia.
 
- 
But Wikidata was designed
 much later, more than a decade
 
- 
after Wikipedia, and was
 able to kind of learn
 
- 
from the experience
 of Wikipedia, which
 
- 
has tremendous experience
 with multilingualism, much
 
- 
more than most sites and
 projects, as we know.
 
- 
And so the Wikidata
 team understood
 
- 
from the get go that
 this will be an issue,
 
- 
and it's better to use
 numbers that are unequivocally
 
- 
different from each
 other instead of labels,
 
- 
instead of the actual
 name, the actual text,
 
- 
because names are not unique. 
- 
Names can change, right? 
- 
Just last year, there was a
 big naming reform in Ukraine
 
- 
and a whole bunch of towns
 and districts were renamed.
 
- 
Does that mean we should change
 all the data that we have, like
 
- 
lose all the data that we
 have about the old name?
 
- 
No, we ideally just
 want to change the name
 
- 
without breaking links. 
- 
So having the links actually
 refer to the numbers
 
- 
is one way to ensure the
 integrity of the data,
 
- 
of the links, when
 renaming happens.
 
- 
Another reason is well, even
 if the name doesn't change,
 
- 
not all humans call
 everything the same, right?
 
- 
So Earth is Earth
 in English, but it's
 
- 
[SPEAKING ARABIC] in Arabic. 
- 
It's [SPEAKING HEBREW]
 in Hebrew.
 
- 
So obviously, Earth--
 even that is not
 
- 
as unambiguous or unequivocal
 as you might think.
 
- 
And so that is the
 reason Wikidata,
 
- 
which is built to be
 multilingual from the start,
 
- 
talks about numbers
 rather than labels.
 
- 
OK. 
- 
Ha, I had a whole slide
 about that and I forgot.
 
- 
Yes, so even London,
 again, is not
 
- 
just London, England, which is
 what you were thinking about.
 
- 
It's also a city in Canada. 
- 
And it's also a family
 name, like Jack London.
 
- 
It's also a movie company. 
- 
There must be some hotel
 named London somewhere.
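
The three reasons given for numbers over names (names repeat, names change, names differ by language) can be sketched in Python. Only Q2 is a real ID from the talk; `Q_LONDON_ENGLAND`, `Q_LONDON_CANADA`, `Q_TOWN`, and `Q_PERSON` are placeholders invented here, as are the town names.

```python
# Labels per item per language. Except for Q2, the IDs are placeholders.
labels = {
    "Q2": {"en": "Earth", "de": "Erde", "fr": "Terre"},
    "Q_LONDON_ENGLAND": {"en": "London"},
    "Q_LONDON_CANADA": {"en": "London"},
    "Q_TOWN": {"en": "Old Name"},
}

# A statement elsewhere links to the town by ID, never by its name.
links = [("Q_PERSON", "place of birth", "Q_TOWN")]


def label(item_id: str, lang: str) -> str:
    """Label of an item in one language, or the bare ID if none is recorded."""
    return labels.get(item_id, {}).get(lang, item_id)


def rename(item_id: str, lang: str, new_label: str) -> None:
    """Change what an item is called; its ID and all links stay untouched."""
    labels[item_id][lang] = new_label


# Same name, two different items: the IDs keep them apart.
print(label("Q_LONDON_ENGLAND", "en"), label("Q_LONDON_CANADA", "en"))

# One item, many names across languages.
print(label("Q2", "en"), label("Q2", "de"), label("Q2", "fr"))

# Renaming the town does not break the link that points at it.
rename("Q_TOWN", "en", "New Name")
print(links[0], "->", label(links[0][2], "en"))
```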
 
- 
This is a good opportunity
 to remind everyone
 
- 
that the vast
 majority of humankind
 
- 
does not speak a
 word of English.
 
- 
That's a statistic
 worth remembering.
 
- 
The vast majority of the planet
 does not speak English at all.
 
- 
That does not
 contradict the datum
 
- 
that English is the most
 widely spoken language.
 
- 
And yet, in aggregate,
 a majority of people
 
- 
speak other languages,
 and not English at all.
 
- 
So moving swiftly on, this
 is a pause for questions
 
- 
about what I've covered so far. 
- 
Any questions in the audience? 
- 
If not, we move to IRC. 
- 
If there are any questions-- 
- 
Any questions? 
- 
No? 
- 
IRC? 
- 
Any questions? 
- 
OK. 
- 
We will have additional
 pauses for questions later.
 
- 
But enough of my hand-waving. 
- 
Let's go explore Wikidata. 
- 
So Wikidata lives
 at wikidata.org.
 
- 
And Wikidata already has
 more than 25 million items.
 
- 
That is, it collects
 statements about more than 25
 
- 
million topics. 
- 
It has many, many more
 than 25 million statements
 
- 
because many of these items
 have dozens or hundreds
 
- 
of statements. 
- 
So it documents 25
 million things--
 
- 
people, books, rivers, whatever. 
- 
Just to give us a sense
 of how big that number is,
 
- 
how many articles do we
 have on English Wikipedia?
 
- 
More than-- yes, more
 than 5 million articles.
 
- 
And that's the
 largest Wikipedia.
 
- 
So Wikidata is
 already describing
 
- 
more than five times, or
 about five times as many items
 
- 
as even our largest Wikipedia. 
- 
So obviously,
 Wikidata contains data
 
- 
about things that have no
 article on any Wikipedia.
 
- 
It is a much, much larger,
 more comprehensive project.
 
- 
All right, the second
 thing we might notice
 
- 
is, well, this looks kind
 of like Wikipedia, right?
 
- 
If we've never visited, it
 looks kind of like Wikipedia.
 
- 
It has this sidebar. 
- 
It has these buttons at the top. 
- 
It looks like it's
 from the '90s.
 
- 
Yeah. 
- 
So the reason it
 looks like Wikipedia
 
- 
is that it is a wiki running
 on MediaWiki software.
 
- 
It is running on software
 very much like Wikipedia.
 
- 
But it is running on
 a kind of modification
 
- 
of the standard wiki software. 
- 
It has an additional,
 very important component
 
- 
named Wikibase,
 which gives it all
 
- 
of its structured and
 linked data power.
 
- 
So let's start
 exploring Wikidata.
 
- 
Let's take something local-- 
- 
Harvey Milk. 
- 
Harvey Milk. 
- 
What does Wikidata
 know about Harvey Milk?
 
- 
For those on YouTube
 who may not be local,
 
- 
he's a San Francisco politician
 and gay rights activist
 
- 
who was murdered in the '70s. 
- 
It was very significant in
 the history of those struggles
 
- 
in this country. 
- 
So what does Wikidata
 tell us about Harvey Milk?
 
- 
Well, the first
 thing is it knows
 
- 
that Harvey Milk is Q17141. 
- 
That's the most important
 piece of information,
 
- 
is first of all, that
 is the identifier.
 
- 
That is the item
 number of all the data
 
- 
that we will collect
 about Harvey Milk.
 
- 
The second thing you see
 right under the title
 
- 
is this line, this very,
 very brief summary, right?
 
- 
"American politician who became
 a martyr in the gay community."
 
- 
This line is the
 description line.
 
- 
So the name of the item-- 
- 
this is the label. 
- 
We call it label on Wikidata. 
- 
That's the label. 
- 
And this line is
 the description.
 
- 
Now why is this
 description important?
 
- 
This is the description that
 helps us tell this Harvey
 
- 
Milk from any other Harvey
 Milk that may exist, all right?
 
- 
So again, this would
 be useful if I'm
 
- 
looking up someone with a
 slightly more generic name.
 
- 
That line will help me tell
 apart the item about Harvey
 
- 
Milk the gay activist rather
 than Harvey Milk the film
 
- 
actor, OK? 
- 
And where is it coming from? 
- 
Well, Wikidata has
 this whole table,
 
- 
as you can see, with
 descriptions and labels
 
- 
in other languages. 
- 
So Wikidata is able to refer
 to Harvey Milk in Arabic which,
 
- 
don't panic, is written
 from right to left.
 
- 
It also knows what to
 call him in Bulgarian.
 
- 
I mean, it's the same name,
 but it's in a different script.
 
- 
In French, in Hebrew,
 and that's it?
 
- 
Does it not know a name
 for Harvey Milk in Italian?
 
- 
Of course it does. 
- 
It actually has
 labels for this person
 
- 
in many, many, many languages. 
- 
It doesn't have descriptions in
 every language, as you can see.
 
- 
OK? 
- 
So why was Wikidata showing me
 these languages and not others?
 
- 
I mean, why this somewhat
 arbitrary collection--
 
- 
English, Arabic, Bulgarian,
 German, French, and Hebrew?
 
- 
Because I told it to. 
- 
So if we briefly click
 over to my user page--
 
- 
again, like every wiki,
 you have user accounts.
 
- 
You have user pages. 
- 
This is my user page. 
- 
And as you can see,
 there's this little user
 
- 
information box here called
 a Babel box by Wikipedians,
 
- 
where I list the
 languages that I speak.
 
- 
And Wikidata uses this box
 just to kind of helpfully
 
- 
show me these languages. 
- 
Of course, all the
 other languages
 
- 
are still available, as you saw,
 by clicking the more languages.
 
- 
But this is just a
 useful little way
 
- 
of getting the languages I
 care about up there first.
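
The label and description table can be pictured as two small dictionaries keyed by language code. The sketch below is only an illustration of that idea: the English description is the one quoted in the talk, the Hebrew label is a placeholder string, and the helper names `pick` and `missing_descriptions` are hypothetical, not part of Wikidata.

```python
from typing import Optional

# Illustrative sample data. Only the item ID and the English description
# come from the talk; the Hebrew label is a placeholder, not real data.
ITEM = {
    "id": "Q17141",
    "labels": {"en": "Harvey Milk", "fr": "Harvey Milk", "he": "<Hebrew label>"},
    "descriptions": {
        "en": "American politician who became a martyr in the gay community",
    },
}


def pick(terms: dict, preferred: list) -> Optional[str]:
    """Return the first term available in the user's preferred languages."""
    for lang in preferred:
        if lang in terms:
            return terms[lang]
    return None


def missing_descriptions(item: dict, preferred: list) -> list:
    """Preferred languages that have a label but no description yet."""
    return [
        lang
        for lang in preferred
        if lang in item["labels"] and lang not in item["descriptions"]
    ]


babel = ["he", "en", "fr"]  # languages listed in a user's Babel box
print(pick(ITEM["descriptions"], babel))
print(missing_descriptions(ITEM, babel))
```

A list like the second one is exactly the kind of gap a multilingual volunteer can fill by hand.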
 
- 
By the way, this is a lie. 
- 
I don't actually
 speak Bulgarian.
 
- 
That stayed on my user page
 because I was demonstrating
 
- 
this in Bulgaria and I wanted
 that label to show up there
 
- 
during the talk-- 
- 
just in case you
 were going to tell me
 
- 
a really good Bulgarian joke. 
- 
OK so for example, Hebrew
 is my mother tongue.
 
- 
And we have a Hebrew
 label for Harvey Milk.
 
- 
But we don't have a description. 
- 
So let's fix that right now by
 clicking the edit button right
 
- 
here. 
- 
I click edit, and this
 table became editable.
 
- 
And now I can very briefly
 type a description.
 
- 
AUDIENCE: Online in
 about 20 seconds.
 
- 
But can we hold it? 
- 
ASAF BARTOV: OK. 
- 
That was good timing
 for the screen to crash.
 
- 
OK? 
- 
Are we back? 
- 
OK. 
- 
Sorry about that. 
- 
So this was all about what to
 call him in different languages
 
- 
and scripts and how to
 tell this person apart
 
- 
from other people with
 potentially the same name.
 
- 
Let's scroll down and see
 what else does Wikidata
 
- 
know about this person? 
- 
So as you can see, this is
 a list of statements, right?
 
- 
This is a list of statements. 
- 
And the properties
 are on the left,
 
- 
the values are on the right. 
- 
So the first thing Wikidata
 knows about Harvey Milk
 
- 
is a very important
 property called instance of.
 
- 
Instance of. 
- 
And the property instance of
 answers the very basic question
 
- 
what kind of thing is
 this that I'm describing?
 
- 
Is it a book? 
- 
Is it a poem? 
- 
Is it a mountain? 
- 
Is it a theological concept? 
- 
No, it's a human. 
- 
It's a person, OK? 
- 
The item about Mt. Everest will say
 instance of mountain, OK?
 
- 
This is a very
 important property.
 
- 
Why is it important? 
- 
Wouldn't anyone looking
 at this know that this is
 
- 
a human being? 
- 
Yes. 
- 
Anyone looking at
 this will know.
 
- 
But if I want a computer to
 be able to pull information
 
- 
about people, I want to
 be able to easily exclude
 
- 
all the mountains and
 poems and other things that
 
- 
are not people from my query. 
- 
So this single datum,
 this single piece of data,
 
- 
is what tells computers and
 algorithms very clearly,
 
- 
this is a human. 
- 
Things that aren't instance
 of human are other things.
 
- 
OK? 
- 
So it may sound very
 trivial, but it's not.
 
- 
It's very important
 to have an instance
 
- 
of field for Wikidata items. 
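
To make the point about computers concrete, here is a minimal sketch of filtering by instance of. It runs on a tiny hand-written list, not on live Wikidata. The IDs P31 (instance of), Q5 (human), Q8502 (mountain) and Q513 (Mount Everest) are believed to match Wikidata but should be treated as assumptions; the third item is invented to show what happens when the statement is missing.

```python
# Each item is reduced to an ID, a label, and its "instance of" (P31) values.
ITEMS = [
    {"id": "Q17141", "label": "Harvey Milk", "P31": ["Q5"]},
    {"id": "Q513", "label": "Mount Everest", "P31": ["Q8502"]},
    # Invented item with no "instance of" statement at all.
    {"id": "Q-PLACEHOLDER", "label": "Someone undocumented"},
]

HUMAN = "Q5"


def instances_of(items, class_id):
    """Keep only the items that explicitly declare the given class."""
    return [item for item in items if class_id in item.get("P31", [])]


humans = instances_of(ITEMS, HUMAN)
print([item["label"] for item in humans])
```

The item without an instance of statement is silently left out, which is why this seemingly trivial field matters.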
- 
All right, what else do we know? 
- 
Well, Wikidata knows about
 an image for Harvey Milk.
 
- 
Again, we can find a ton of
 images-- or maybe not a ton,
 
- 
but we can find dozens
 of images of Harvey Milk
 
- 
on Commons, on our Wikimedia
 multimedia repository.
 
- 
So why should we have a
 single image here on Wikidata?
 
- 
Again, this is
 mostly for reusers.
 
- 
If I'm building some kind of
 tool that pulls information
 
- 
from Wikidata, it's
 nice if there's
 
- 
at least one representative
 image to kind of use
 
- 
as the default or immediate
 image for Harvey Milk
 
- 
in some other reused context. 
- 
All right, sex or gender-- 
- 
male. 
- 
Country of citizenship--
 United States of America.
 
- 
Given name is Harvey. 
- 
The date of birth is so and so. 
- 
The place of birth is Woodmere. 
- 
The place of death
 is San Francisco.
 
- 
The manner of death is homicide. 
- 
Wikidata knows that. 
- 
Now again, every
 little datum like that
 
- 
is the basis for later querying
 and answering questions.
 
- 
So the fact that we record the
 manner of death of people--
 
- 
or at least of some people-- 
- 
will allow us later
 to go, you know,
 
- 
who are some people from
 Belgium who died by homicide?
 
- 
That's a question Wikidata can
 answer, thanks to this field.
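
As a rough sketch of how such a question could be phrased for a machine, the snippet below builds a SPARQL query for the public Wikidata Query Service and the URL that would carry it. It does not send any request. The IDs used (P27 country of citizenship, Q31 Belgium, P1196 manner of death, Q149086 homicide) are assumptions to verify before use, and `query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumed IDs: P31 instance of, Q5 human, P27 country of citizenship,
# Q31 Belgium, P1196 manner of death, Q149086 homicide.
QUERY = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P27 wd:Q31 ;
          wdt:P1196 wd:Q149086 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20
"""


def query_url(sparql: str) -> str:
    """Build a GET URL for the query endpoint (nothing is sent here)."""
    params = {"query": sparql.strip(), "format": "json"}
    return "https://query.wikidata.org/sparql?" + urlencode(params)


print(query_url(QUERY))
```

Each line of the query body corresponds to one recorded statement, so a person with no manner of death recorded cannot appear in the results.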
 
- 
The other thing I mentioned
 is that things are links.
 
- 
So the place of
 birth is Woodmere.
 
- 
I don't know where
 Woodmere is, but I
 
- 
can click that and find out. 
- 
Here is the Wikidata item
 about Woodmere, right?
 
- 
It was the value in the
 statement about Harvey Milk,
 
- 
but now I'm looking at
 the item about Woodmere.
 
- 
And it turns out it's in
 Nassau County, New York, right?
 
- 
And of course, Wikidata has
 a whole bunch of information
 
- 
for me about Woodmere-- 
- 
what country it's in and the
 coordinates and the population
 
- 
and the area, all the things you
 would expect about a place, OK?
 
- 
Let's get back to Harvey Milk. 
- 
So the manner of death,
 the cause of death--
 
- 
now here, Wikidata gives
 us excellent information.
 
- 
The actual cause of death
 is ballistic trauma.
 
- 
That's a professional term. 
- 
And this statement
 has qualifiers.
 
- 
So until now, I was talking
 about triples, right?
 
- 
The item has a property
 with a certain value.
 
- 
Actually, each
 statement can also
 
- 
have a number of
 qualifiers which
 
- 
add aspects of information,
 still about that one question
 
- 
that we're answering, right? 
- 
So if this property
 answers cause of death,
 
- 
it's not discussing
 anything else.
 
- 
It's not discussing languages. 
- 
It's not discussing
 date of birth, right?
 
- 
It's talking about
 the cause of death.
 
- 
But we're not just
 saying ballistic trauma.
 
- 
We're saying ballistic trauma
 with the quantity attribute
 
- 
being five. 
- 
What does that mean? 
- 
Five bullets, right? 
- 
There are five
 ballistic traumas.
 
- 
He was shot five times. 
- 
And he was shot by this
 person named Dan White.
 
- 
And this ballistic trauma,
 like this actual shooting,
 
- 
is itself the subject
 of this other thing.
 
- 
This is a link to a
 whole other Wikidata
 
- 
item about the Moscone-Milk
 assassinations.
 
- 
Moscone was the San
 Francisco mayor at the time.
 
- 
We'll see slightly better or
 easier to understand examples
 
- 
of qualifiers in a bit. 
- 
So if this was
 confusing, hang on.
 
- 
So he was killed by Dan White. 
- 
He spoke English. 
- 
His occupation--
 here's an example
 
- 
of a property with more
 than one value, right?
 
- 
So Milk was a politician. 
- 
But he was also a Navy
 officer, at least for a while.
 
- 
That was another thing that
 he did during his life.
 
- 
And he was a human
 rights activist, right?
 
- 
So some people are
 writers and translators.
 
- 
So people can have more
 than one occupation.
 
- 
People can speak more
 than one language.
 
- 
Here's a better
 example of a qualifier.
 
- 
So the property award received
 has the value Presidential
 
- 
Medal of Freedom. 
- 
And that award has an
 attribute called point in time,
 
- 
like when was this? 
- 
This was in 2009. 
- 
Do you see that
 this piece of data--
 
- 
2009-- is a sub-statement
 or is subordinated
 
- 
to the context of this award,
 which was the Presidential Medal
 
- 
of Freedom? 
- 
It can't just kind of
 free float in the article.
 
- 
It's not that 2009 is itself
 a meaningful thing, right?
 
- 
This medal was awarded in 2009. 
- 
Wikidata doesn't
 tell us, for example,
 
- 
when he was a Navy officer, OK? 
- 
But if we were, for example,
 to look that up right now
 
- 
and find out that Milk was
 a Navy officer between 1962
 
- 
and 1964, we could go back
 here to the Navy officer bit
 
- 
and click edit. 
- 
This is how I edit this
 particular little piece
 
- 
of information. 
- 
And add a qualifier like this. 
- 
I click Add Qualifier. 
- 
And I could pick start
 time and end time, right?
 
- 
And then I could
 type 1962 to 1964,
 
- 
and that would be
 teaching Wikidata.
 
- 
Oh, I'm sorry, I meant to
 do that for Navy officer.
 
- 
OK. 
- 
But, you know,
 that is the exact--
 
- 
the accurate time span
 of that statement.
 
- 
So it's true to say about a
 person, he was a Navy officer,
 
- 
even if of course he wasn't a
 Navy officer his entire life.
 
- 
But it's better and
 it's more accurate,
 
- 
to say he was a Navy officer
 between 1962 and 1964.
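
One way to picture a statement with qualifiers is as a property, a value, and a small dictionary hanging off that pair. The sketch below assumes this simplified shape; it is not Wikibase's actual data model, and the 1962 to 1964 span is only the hypothetical used in the talk, not verified data.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One property/value claim about an item, plus optional qualifiers."""
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


award = Statement(
    prop="award received",
    value="Presidential Medal of Freedom",
    qualifiers={"point in time": 2009},
)

# Hypothetical dates from the talk, added the way "Add Qualifier" would.
navy = Statement(prop="occupation", value="Navy officer")
navy.qualifiers["start time"] = 1962
navy.qualifiers["end time"] = 1964


def describe(s: Statement) -> str:
    """Render a statement with its qualifiers in parentheses."""
    extra = ", ".join(f"{k}: {v}" for k, v in s.qualifiers.items())
    return f"{s.prop} = {s.value}" + (f" ({extra})" if extra else "")


print(describe(award))
print(describe(navy))
```

The year 2009 lives inside the award statement and nowhere else, which mirrors the point that a qualifier has no meaning outside its statement.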
 
- 
Don't worry, I'm
 not saving this.
 
- 
No vandalizing of
 Wikidata in this session.
 
- 
OK. 
- 
Moving on. 
- 
What else does Wikidata know? 
- 
He was educated at
 this university.
 
- 
He was a member of
 this political party.
 
- 
Right? 
- 
That's, of course,
 a relevant property
 
- 
for a politician. 
- 
Religion, military branch,
 what is the category on Commons
 
- 
that discusses this
 item, is something
 
- 
that Wikidata can tell us. 
- 
And that's it. 
- 
Now, is that everything
 that we could possibly
 
- 
say in a structured
 way about Harvey Milk?
 
- 
No. 
- 
We could probably find at
 least a few more things to say.
 
- 
We will see how to contribute
 new information to Wikidata
 
- 
in just a minute with
 a different example.
 
- 
But this-- all this was
 a set of statements.
 
- 
Right? 
- 
This section was titled
 Statements here.
 
- 
But at the bottom of the
 list of statements is
 
- 
another section
 called identifiers.
 
- 
And I want to spend a minute
 talking about what that is.
 
- 
So identifiers is a
 collection of keys.
 
- 
A collection of
 IDs, or codes, that
 
- 
are keys to other
 information sources.
 
- 
And a lot of Wikidata items
 have a whole series of keys
 
- 
to other databases, other
 sites, other repositories,
 
- 
that help you or a computer
 be able to access not just
 
- 
some database and look for
 information about Harvey Milk,
 
- 
but access the exact record
 relevant to Harvey Milk.
 
- 
And again, if you imagine
 someone named John Smith,
 
- 
that is really valuable, right? 
- 
If you're not just
 told, oh yeah,
 
- 
you can look at the
 Library of Congress
 
- 
for John Smith,
 good luck with that.
 
- 
Or if I tell you, go to
 the Library of Congress
 
- 
to this record for this John
 Smith, you see the difference.
 
- 
So Wikidata tells us that on
 VIAF, which is the Virtual
 
- 
International Authority File. 
- 
It's an aggregated master
 index built by bibliographers,
 
- 
by librarians, of people. 
- 
Right? 
- 
It tries to kind of aggregate
 information about people
 
- 
across library
 catalogs everywhere.
 
- 
So the VIAF ID for Harvey
 Milk is this number.
 
- 
And conveniently,
 if I click that,
 
- 
I'm not taken to
 some Wikidata item.
 
- 
I'm actually taken
 to the relevant site.
 
- 
So this took me right
 to viaf.org, the Virtual
 
- 
International Authority File,
 directly to their record
 
- 
about Harvey Milk. 
- 
All right? 
- 
And that itself leads
 me to national catalogs
 
- 
of national libraries
 all over the world.
 
- 
We won't get into the
 things you can do with VIAF.
 
- 
The point is Wikidata
 contained the piece of thread
 
- 
that I could tug on
 to arrive directly
 
- 
to that information
 in other databases.
 
- 
Yes. 
- 
And it has that for many,
 many kinds of databases.
 
- 
The BNF, for example, that's
 the National Library of France.
 
- 
And that will take me
 to that index card.
 
- 
IMDB. 
- 
We all know IMDB, right? 
- 
So here I have the key
 to Harvey Milk in IMDB.
 
- 
And this is what IMDB says
 about Harvey Milk, right?
 
- 
They have their own piece
 of information about him,
 
- 
of course, with filmography
 and everything else.
 
- 
And see, I did not have
 to search IMDB for it.
 
- 
I just had the key right
 there waiting for me.
 
- 
Now, again, this is
 very convenient for me
 
- 
as I just showed you the
 human use case for this.
 
- 
But it's even more
 powerful in aggregate
 
- 
when we allow computers to
 traverse this network of links
 
- 
between-- 
- 
not just within Wikidata, but
 between data storage facilities
 
- 
and repositories. 
- 
This is sometimes referred to
 as the linked open data cloud.
 
- 
Cloud, because it's multiple
 different repositories
 
- 
that are interlinked. 
- 
And Wikidata is already, and
 to a growing extent, the nexus,
 
- 
the connection
 point between a lot
 
- 
of these different databases. 
- 
So IMDB, for example,
 it's a good example
 
- 
because it's a site
 almost everyone knows,
 
- 
IMDB has information
 about Harvey Milk.
 
- 
But that information
 does not include a link
 
- 
to the French National Library. 
- 
Right? 
- 
Do you see what I'm saying? 
- 
So IMDB is a data repository
 with IDs and allows linking.
 
- 
But it does not give you
 what Wikidata gives you which
 
- 
is this kind of collection of-- 
- 
it's like a junction of all
 these different data sources.
 
- 
So Wikidata is the
 place where you
 
- 
can document these
 interrelationships
 
- 
or equivalencies. 
- 
Right? 
- 
So ID, you know, 587548 on IMDB
 is discussing the same topic
 
- 
as French National
 Library ID whatever.
 
- 
Wikidata contains that
 piece of information,
 
- 
that this ID in this database
 is about the same person
 
- 
as that ID in that database. 
- 
OK. 
- 
So that's what
 identifiers are about.
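
The junction idea can be sketched as a lookup table: one item holds the keys to several databases, so any one key leads to all the others. Everything concrete below is a placeholder. The ID values are invented, the URL templates are simplified assumptions about how those sites address records, and `external_url` and `crosswalk` are hypothetical helper names.

```python
from typing import Optional

# Simplified, assumed URL patterns. The authoritative patterns are defined
# on each identifier property's own Wikidata page.
URL_TEMPLATES = {
    "VIAF ID": "https://viaf.org/viaf/{id}",
    "IMDb ID": "https://www.imdb.com/name/{id}/",
}

# Placeholder identifier values, not Harvey Milk's real IDs.
IDENTIFIERS = {
    "Q17141": {
        "VIAF ID": "VIAF-PLACEHOLDER",
        "BnF ID": "BNF-PLACEHOLDER",
        "IMDb ID": "IMDB-PLACEHOLDER",
    },
}


def external_url(qid: str, database: str) -> Optional[str]:
    """Turn an item's stored key into a direct link to the external record."""
    external_id = IDENTIFIERS.get(qid, {}).get(database)
    template = URL_TEMPLATES.get(database)
    if external_id is None or template is None:
        return None
    return template.format(id=external_id)


def crosswalk(source_db: str, source_id: str, target_db: str) -> Optional[str]:
    """Given a record's ID in one database, find the same topic's ID in another."""
    for ids in IDENTIFIERS.values():
        if ids.get(source_db) == source_id:
            return ids.get(target_db)
    return None


print(external_url("Q17141", "VIAF ID"))
print(crosswalk("IMDb ID", "IMDB-PLACEHOLDER", "BnF ID"))
```

Neither external database needs to know about the other; the equivalence is recorded once, on the shared item.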
 
- 
Still scrolling down the
 Wikidata item about Harvey
 
- 
Milk, we have the site links. 
- 
The site links are links
 to Wikimedia projects
 
- 
that are related to this item. 
- 
So of course there
 are Wikipedia articles
 
- 
about Harvey Milk in many,
 many different Wikipedias.
 
- 
Quite a few language versions. 
- 
And there are
 pages on Wikiquote,
 
- 
one of the sister projects. 
- 
There are pages on
 Wikiquote with some quotes
 
- 
from Harvey Milk. 
- 
And there is even a page for
 Harvey Milk on Wikisource.
 
- 
Right? 
- 
So this is a collection
 of those links.
 
- 
And those of you who have maybe
 only dealt with Wikidata
 
- 
for inter-wiki links, which
 we used to do in the old days
 
- 
manually within
 the article text,
 
- 
now we do it through
 Wikidata, so maybe that's
 
- 
the only thing you did
 know about Wikidata
 
- 
is how to update these
 inter-wiki tables on Wikidata.
 
- 
All right. 
- 
So that concludes
 our little tour
 
- 
of the anatomy of
 a Wikidata page.
 
- 
I will just remind you that
 it's a wiki page, which
 
- 
means it has a discussion
 page, a talk page.
 
- 
This one happens to be empty. 
- 
But, you know, if we have
 concerns or arguments
 
- 
about some of the
 data here that is
 
- 
what we would use
 to discuss this
 
- 
and to arrive at consensus. 
- 
It also has a history view just
 like every Wikipedia article.
 
- 
So you can see here
 a list of edits.
 
- 
Maybe some of you
 have never looked
 
- 
at a history page on Wikipedia,
 so this looks overwhelming.
 
- 
But every line here,
 every entry here,
 
- 
is a single edit, a single
 revision, a single change
 
- 
to this Wikidata item. 
- 
Just Harvey Milk. 
- 
And you can see at the very
 top this edit that I just
 
- 
made-- this is my
 volunteer account
 
- 
and I just made this edit,
 and in parentheses you
 
- 
can see what I did. 
- 
I added an HE,
 Hebrew, description.
 
- 
And this is the text
 that I added in Hebrew.
 
- 
Right? 
- 
So we can see who added
 what to the Wikidata item,
 
- 
just like we can do
 the same on Wikipedia.
 
- 
So we have the revision history. 
- 
We can undo edits. 
- 
We can revert, just
 like on Wikipedia.
 
- 
And what else did I
 want to show here?
 
- 
We can add an item to my
 watch list using the star,
 
- 
just like on Wikipedia. 
- 
So we have all these
 standard wiki features
 
- 
that we would come to expect. 
- 
Let's pause for questions. 
- 
Any questions about what
 we've covered so far?
 
- 
Yes. 
- 
Are attributes of statements
 preset for the specific value?
 
- 
No, they're not preset. 
- 
And generally, Wikidata does
 not enforce logic by default.
 
- 
So, I mean, there's
 nothing to prevent you
 
- 
from editing the
 item about Brazil,
 
- 
and adding the property height. 
- 
Now height is not a relevant
 property for a country.
 
- 
Right? 
- 
I mean, maybe average
 elevation, maybe.
 
- 
But not just height,
 which is used for humans
 
- 
or for physical things. 
- 
So you could add that
 property to Brazil and save it
 
- 
and the wiki would not complain. 
- 
Now in the background
 there are kind
 
- 
of extra-wiki, outside-the-wiki
 processes for constraint
 
- 
validation. 
- 
So there are bots and
 other processes that
 
- 
run, and occasionally,
 for example,
 
- 
identify non-living things
 with a date of birth field.
 
- 
That's nonsensical. 
- 
That should not exist. 
- 
If someone mistakenly added
 that there are processes
 
- 
that would flag
 that to be fixed.
 
- 
But the wiki itself,
 Wikidata, will not
 
- 
prevent you from adding that. 
- 
And that is by design
 to keep things flexible.
 
- 
So that people don't
 run into, oh wait,
 
- 
but I can't add this
 because nobody thought
 
- 
that I would need this, maybe. 
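
The split between saving freely and flagging afterwards can be sketched as a separate checking pass over already-saved data. This is a toy model under stated assumptions: the items, the rule table `EXPECTED_ON` and the function `violations` are all invented for illustration and do not reflect how the real constraint reports are implemented.

```python
# Toy items that were all saved without complaint. Property names are
# spelled out instead of using P-numbers.
ITEMS = [
    {"id": "item-1", "instance of": "human", "date of birth": "1930"},
    {"id": "item-2", "instance of": "mountain", "date of birth": "1953"},
    {"id": "item-3", "instance of": "country", "height": "2 m"},
]

# Hypothetical rule table: the classes each property is expected on.
EXPECTED_ON = {
    "date of birth": {"human"},
    "height": {"human", "mountain"},
}


def violations(items):
    """Report (item, property) pairs that look nonsensical, without blocking them."""
    flagged = []
    for item in items:
        for prop, allowed in EXPECTED_ON.items():
            if prop in item and item["instance of"] not in allowed:
                flagged.append((item["id"], prop))
    return flagged


for item_id, prop in violations(ITEMS):
    print(f"{item_id}: unexpected property '{prop}'")
```

Because the check runs outside the save path, an unforeseen but legitimate use is never blocked; at worst it shows up in a report for a human to review.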
- 
I hope that answers
 your question.
 
- 
You say helpful
 answer, question mark.
 
- 
So was it a helpful answer, or? 
- 
OK. 
- 
Yes, Eleanor. 
- 
AUDIENCE: [INAUDIBLE] 
- 
ASAF BARTOV: Excellent question. 
- 
I'll repeat it. 
- 
You ask how do I find
 the Wikidata item
 
- 
number from Wikipedia. 
- 
If I'm reading about Harvey Milk
 and I want to look at the data
 
- 
how do I do that? 
- 
That is an excellent question
 and let's skip to Wikipedia.
 
- 
Conveniently I have the
 link right here on English.
 
- 
So this is the Wikipedia
 article about Harvey Milk
 
- 
and every article on Wikipedia
 should have a Wikidata
 
- 
item associated with it, but it
 doesn't happen automatically.
 
- 
So if I just created
 a page on Wikipedia
 
- 
I also need to create a
 Wikidata entity for it
 
- 
if it doesn't already exist. 
- 
It could already exist
 because it was already
 
- 
covered in a different
 language, for example.
 
- 
So that was parenthetical. 
- 
But every article on Wikipedia
 should have, here on the side,
 
- 
on the side, under Tools,
 a link called Wikidata item.
 
- 
Right here. 
- 
OK. 
- 
That Wikidata
 item is a link
 
- 
that takes you to
 Wikidata, to the entity,
 
- 
and there you find the number. 
- 
You can-- you don't
 even have to click it.
 
- 
I mean, the URL itself
 tells you the number.
 
- 
The number, you see, it's
 wikidata.org/wiki/q17141.
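
Since the item number sits in the URL, a script can read it off directly. The sketch below does that with a regular expression, and also builds (without sending) a request URL for the `wbgetentities` API module, which can look up the item connected to a given Wikipedia page title. The helper names `extract_qid` and `lookup_url` are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlencode


def extract_qid(url: str) -> Optional[str]:
    """Pull the item number out of a Wikidata page URL, normalized to upper case."""
    match = re.search(r"/wiki/(Q\d+)", url, flags=re.IGNORECASE)
    return match.group(1).upper() if match else None


def lookup_url(title: str, site: str = "enwiki") -> str:
    """Build an API URL asking which item is linked to a wiki page (not sent here)."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "info",
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


print(extract_qid("https://www.wikidata.org/wiki/q17141"))
print(lookup_url("Harvey Milk"))
```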
 
- 
OK. 
- 
So that was an
 excellent question.
 
- 
Other questions? 
- 
Yes. 
- 
Yeah, about the additional
 attributes, the qualifiers.
 
- 
So, yes, I answered
 more generically.
 
- 
But just like the
 properties themselves
 
- 
are not limited per item,
 the qualifiers per statement
 
- 
are also not
 entirely preordained.
 
- 
But there is some
 structure to it.
 
- 
I don't want to go into it
 at great length right now.
 
- 
If we have time in the end
 we can get back to that.
 
- 
But some qualifiers are again
 relevant for some things,
 
- 
start time, end time,
 and others won't be.
 
- 
Wikidata does try to offer you-- 
- 
you may remember when I
 clicked add qualifier,
 
- 
it gave me a kind of drop-down
 of some relevant qualifiers.
 
- 
So it does try to
 help you in that way.
 
- 
Other question? 
- 
Are the values for
 instance of already
 
- 
mappable to external ontologies? 
- 
That is a complicated question. 
- 
I'll help people understand
 the question first.
 
- 
So an ontology is a
 structure, some kind
 
- 
of hierarchy or
 cloud, of entities
 
- 
and their interrelationships. 
- 
An ontology would
 say, for example,
 
- 
a person is a living thing. 
- 
So is a dog. 
- 
They're both living things,
 but they're different things.
 
- 
And then, you know, say
 things about those entities
 
- 
and their interrelationships. 
- 
Now there are many,
 many competing,
 
- 
or coexisting models
 of ontologies.
 
- 
Many of them were created
 for specific needs.
 
- 
Many of them want to be
 a universal ontology.
 
- 
But of course it's
 impossible to quite
 
- 
agree on one complete
 and simple ontology.
 
- 
And so there are
 many ontologies.
 
- 
Which brings up your question,
 can we map across ontologies?
 
- 
Can we say that when Wikidata
 says instance of book that
 
- 
is equivalent to some other
 ontology saying instance
 
- 
of bibliographic record? 
- 
And the answer is yes. 
- 
There are some such mappings. 
- 
They are incomplete. 
- 
And there's no kind of
 automagic thing happening
 
- 
in the wiki vis-a-vis
 those other ontologies.
 
- 
That's kind of
 left as an exercise
 
- 
for those dealing with those
 other ontologies, and for tool
 
- 
builders and other
 platform improvements
 
- 
beyond Wikidata itself. 
- 
OK. 
- 
Other questions? 
- 
Yeah, we have one from
 the YouTube stream.
 
- 
Someone asked, why can't I
 link Howard Carter's occupation
 
- 
to archeologist when I use
 an info box that fetches info
 
- 
from Wikidata? 
- 
Why can't I link it
 from the info box?
 
- 
So, someone on the
 stream answered
 
- 
saying, because it's
 an improper connection,
 
- 
because the target is not
 about the subject only.
 
- 
The target is not
 about the subject?
 
- 
If I understand the
 question correctly,
 
- 
what you would want to be able
 to do is from within Wikipedia
 
- 
be able to say occupation
 and link to a Wikidata entry
 
- 
about archeology. 
- 
That doesn't quite
 work that way.
 
- 
We will get to a
 little discussion
 
- 
of that in an upcoming
 section of this talk.
 
- 
So I will defer the rest
 of my answer to then.
 
- 
OK. 
- 
So we're done with
 questions for this phase,
 
- 
and my browser got
 tired of waiting for me.
 
- 
So, yes. 
- 
All right. 
- 
So we took a look at Wikidata,
 and we took questions.
 
- 
So now, let's teach
 Wikidata some new things.
 
- 
Some things it
 doesn't already know.
 
- 
Let's look at this item here. 
- 
So this item is about one
 of my favorite writers,
 
- 
an American writer
 named Helen DeWitt.
 
- 
Wikidata, of course, fondly
 refers to her as Q54674,
 
- 
but we can call
 her Helen DeWitt.
 
- 
And what can we contribute here? 
- 
So Wikidata has far less
 information about Helen DeWitt.
 
- 
Most of you probably haven't
 heard of her, that's OK.
 
- 
What does Wikidata
 know about her?
 
- 
Well instance of human. 
- 
We have a photo of her. 
- 
She's female. 
- 
She's an American. 
- 
Her name is Helen. 
- 
Date of birth. 
- 
Place of birth. 
- 
She's an author, a
 novelist, a writer.
 
- 
She was educated at the
 University of Oxford.
 
- 
And Wikidata knows what
 her official website is.
 
- 
That's useful, but that's it. 
- 
Now we can contribute
 information here.
 
- 
For example, she's an American
 author writing in English.
 
- 
So we could add
 that information.
 
- 
We could click the
 Add button here.
 
- 
And this is a good
 moment to acknowledge
 
- 
that the user interface of
 Wikidata is a work in progress.
 
- 
It's not as intuitive
 as it might be.
 
- 
So you need to
 understand that click--
 
- 
to add a completely
 new property,
 
- 
you need to click
 this Add button.
 
- 
If you want to add an additional
 value to the property official
 
- 
website, you need to
 click this Add button.
 
- 
It makes a kind of
 sense with a shaded box.
 
- 
But, you know, you need
 to kind of pay attention,
 
- 
and it's not as
 friendly as it might be.
 
- 
[COUGHING] Excuse me. 
- 
So, let's add a property here. 
- 
Click the Add button. 
- 
Again, Wikidata tries to
 be useful by suggesting
 
- 
some relevant
 properties for humans.
 
- 
A bit more morbidly it suggests,
 how about date of death?
 
- 
That's not cool, Wikidata. 
- 
Helen DeWitt is still alive. 
- 
So I will not add
 date of death, but I
 
- 
can add languages spoken,
 written, or signed.
 
- 
OK, so I click that. 
- 
And she writes in English. 
- 
I just type English-- whoops. 
- 
Not in Hebrew. 
- 
Don't panic. 
- 
I type English here. 
- 
And, oh, and of course Wikidata
 has auto-complete, right?
 
- 
So it tries to help me along. 
- 
But you will notice that
 it has all kinds of things
 
- 
called English. 
- 
I mean, it turns out that
 there is a place in Indiana
 
- 
called English, Indiana. 
- 
Did I mean that? 
- 
No, of course I didn't mean
 that she writes her books
 
- 
in English, Indiana. 
- 
Right? 
- 
But, you know, Wikidata gives me
 the option of linking to that.
 
- 
I also don't mean the botanist
 Carl Schwartz English.
 
- 
No, no I mean the
 West Germanic language
 
- 
originating in England. 
- 
That's what I mean. 
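
The autocomplete step is really a disambiguation step: several items share a label, and the description is what lets you choose. Here is a minimal local sketch of that. Only Q1860 is believed to be a real ID (the English language); the other IDs are made-up placeholders, the descriptions are paraphrased, and `search` and `choose` are hypothetical helpers, not Wikidata's search API.

```python
# Candidates that match the typed text "English".
CANDIDATES = [
    {"id": "Q1860", "label": "English",
     "description": "West Germanic language originating in England"},
    {"id": "Q-PLACEHOLDER-1", "label": "English",
     "description": "town in Indiana, United States"},
    {"id": "Q-PLACEHOLDER-2", "label": "Carl Schwartz English",
     "description": "botanist"},
]


def search(text):
    """Return every candidate whose label contains the typed text."""
    needle = text.casefold()
    return [c for c in CANDIDATES if needle in c["label"].casefold()]


def choose(matches, keyword):
    """Pick the one match whose description mentions the keyword, if unique."""
    hits = [m for m in matches if keyword.casefold() in m["description"].casefold()]
    return hits[0]["id"] if len(hits) == 1 else None


matches = search("English")
for m in matches:
    print(m["id"], "-", m["description"])
print(choose(matches, "language"))
```

What gets stored in the statement is the chosen ID, not the word "English", so the ambiguity disappears once the edit is saved.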
- 
So I click that. 
- 
And I click Save. 
- 
And that's it. 
- 
Again I have just made
 an edit to Wikidata.
 
- 
I have just taught Wikidata
 that this author speaks English.
 
- 
Now, again, this
 may be very obvious.
 
- 
She's American. 
- 
Of course not all
 Americans write in English.
 
- 
It may be obvious if
 you look at her books.
 
- 
The important thing
 is that now Wikidata
 
- 
knows this as a piece of data. 
- 
And, again, think ahead
 to queries, which we will
 
- 
demonstrate in a little bit. 
- 
Without this piece
 of information
 
- 
that I just added, if I were to
 ask Wikidata five minutes ago,
 
- 
give me a list of novelists
 writing in English, OK,
 
- 
Wikidata would have returned
 thousands of results.
 
- 
But Helen DeWitt would
 not have been among them.
 
- 
Because up until two
 minutes ago Wikidata
 
- 
didn't know that Helen DeWitt
 writes in English and not
 
- 
in Spanish. 
- 
Do you see? 
- 
It is this explicit
 statement that will now
 
- 
make her be included in any
 future queries that ask,
 
- 
who are novelists
 writing in English?
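
The before and after effect of that one edit can be sketched with a toy in-memory dataset. The second writer is invented for the example, and `novelists_writing_in` is a hypothetical stand-in for a real query.

```python
# Toy data: Helen DeWitt starts with no language statement, as in the talk.
writers = {
    "Helen DeWitt": {"occupation": {"novelist"}, "languages": set()},
    "Example Novelist": {"occupation": {"novelist"}, "languages": {"English"}},
}


def novelists_writing_in(language):
    """Names of novelists with an explicit statement for the given language."""
    return sorted(
        name
        for name, data in writers.items()
        if "novelist" in data["occupation"] and language in data["languages"]
    )


before = novelists_writing_in("English")

# The edit: one explicit statement is added to her item.
writers["Helen DeWitt"]["languages"].add("English")

after = novelists_writing_in("English")
print(before)
print(after)
```

The query logic never changes; only the data does, which is why every small statement added by hand improves every future query.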
 
- 
OK. 
- 
By the way, she's
 a PhD in Classics.
 
- 
She speaks-- or at least reads
 and writes Latin and Greek,
 
- 
ancient Greek, and I could-- 
- 
I can-- I mean, I
 happen to know that.
 
- 
But wait, wait, wait,
 wait, wait, you say.
 
- 
What about original research? 
- 
I mean, you can't just add
 stuff like that to Wikidata.
 
- 
Don't you need sources? 
- 
Citations? 
- 
Of course I do. 
- 
Yes. 
- 
Let's add some sources to this. 
- 
So on Wikidata,
 just like Wikipedia,
 
- 
things should generally
 be supported by citations,
 
- 
by references. 
- 
And just like Wikipedia,
 they aren't always supported
 
- 
in that way. 
- 
OK so, I mean, I can
 just add it to Wikidata.
 
- 
Watch me. 
- 
I just did that, right? 
- 
I just added English and
 Latin without any citation,
 
- 
and I will not be
 arrested for it.
 
- 
Just like I could edit
 a Wikipedia article
 
- 
and add some information
 without a citation.
 
- 
It may stick. 
- 
It may stay in the article,
 or it may be reverted.
 
- 
It depends on the kind of
 information I'm adding.
 
- 
It depends how many people
 are paying attention
 
- 
to the article on Wikipedia. 
- 
And it works the
 same way on Wikidata.
 
- 
OK, so, you can add some
 things without references.
 
- 
Ideally, when you
 add information, you
 
- 
should include references. 
- 
So let's be good Wikidata
 citizens and add a source.
 
- 
Here is an article that
 I prepared in advance.
 
- 
This is Helen DeWitt. 
- 
And in this article,
 somewhere, it actually
 
- 
says right at the
 bottom here, see,
 
- 
DeWitt knows, in descending
 order of proficiency, Latin,
 
- 
ancient Greek, French,
 German, Spanish,
 
- 
and Portuguese, Dutch, Danish,
 Norwegian, Swedish, Arabic,
 
- 
Hebrew and Japanese. 
- 
This may sound
 excessive, but it's true.
 
- 
I met this woman. 
- 
So anyway, we don't have
 to include all of that.
 
- 
The point is this article from
 a reasonably reliable source,
 
- 
this magazine,
 this interview, can
 
- 
count as a source for
 the languages she speaks.
 
- 
So I copy the URL. 
- 
I just copied it off my browser. 
- 
And, whoops-- that's not-- 
- 
here we go. 
- 
And I can just add
 a reference here
 
- 
to the information that I
 just added to Wikidata, right?
 
- 
I can click Add Reference. 
- 
And then just say the reference
 URL is, and I just paste.
 
- 
I paste this URL. 
- 
Hit Enter. 
- 
And that's it. 
- 
And now the fact that she
 speaks Latin has a reference.
 
- 
If you look at the other
 things here on Wikidata,
 
- 
you can see that these IDs, for
 example, have references, too.
 
- 
Right? 
- 
In this case, the reference
 just says, excuse me--
 
- 
In this case it just says
 imported from English Wikipedia.
 
- 
But wait, you say, can
 Wikipedia be a source?
 
- 
Not properly, no. 
- 
I mean, just like Wikipedia
 itself doesn't cite itself.
 
- 
We don't say, this person
 was born in this city.
 
- 
How do we know? 
- 
We read it on Wikipedia
 in another language.
 
- 
That's not a good citation. 
- 
It's not a good
 citation for Wikidata
 
- 
either. So why do we put it here? 
- 
Well you can see the qualifier
 here is different, right?
 
- 
It's not reference URL, which
 is what I put in for Latin here.
 
- 
It's not reference URL here,
 it's a different qualifier.
 
- 
It says-- saying, imported from. 
- 
So this is not an
 actual reference that
 
- 
supports this piece of data. 
- 
It just shows where did
 this data come from.
 
- 
It's a slightly different
 thing, because this data was
 
- 
mass imported into Wikidata. 
- 
So it wasn't input by
 hand by some volunteer.
 
- 
It was imported into Wikidata
 en masse by a script,
 
- 
by a program. 
- 
And we want to know, where
 did this number come from?
 
- 
Well it came from
 English Wikipedia.
 
- 
So again, that's not
 a proper reference
 
- 
for the validity
 of the information,
 
- 
but it does at least tell us
 it came from English Wikipedia.
 
- 
We can click and look on
 English Wikipedia and find out.
 
- 
Maybe there's a
 footnote there that
 
- 
says where it did come from. 
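The distinction between a supporting reference and an "imported from" note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Statement` class and the `has_supporting_reference` helper are hypothetical names, the URL is a placeholder rather than the interview shown in the talk, and the identifier value is made up. The property IDs are the real Wikidata ones for reference URL (P854) and imported from Wikimedia project (P143).

```python
from dataclasses import dataclass, field

REFERENCE_URL = "P854"  # Wikidata property "reference URL"
IMPORTED_FROM = "P143"  # Wikidata property "imported from Wikimedia project"


@dataclass
class Statement:
    """A simplified statement: property, value, and (property, value) references."""
    prop: str
    value: str
    references: list = field(default_factory=list)


def has_supporting_reference(statement: Statement) -> bool:
    """True if at least one reference is more than an 'imported from' note."""
    return any(ref_prop != IMPORTED_FROM for ref_prop, _ in statement.references)


# P1412 = languages spoken, written or signed; Q397 = Latin.
# The URL below is a placeholder, not the article from the talk.
speaks_latin = Statement("P1412", "Q397", [(REFERENCE_URL, "https://example.org/interview")])

# P214 = VIAF ID; the value is made up. Q328 = English Wikipedia.
imported_id = Statement("P214", "0000", [(IMPORTED_FROM, "Q328")])

print(has_supporting_reference(speaks_latin))
print(has_supporting_reference(imported_id))
```

A check like this is roughly what a maintenance script might use to list statements that still need a real source.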
- 
OK. 
- 
So this was an example of
 teaching Wikidata something
 
- 
that it didn't know. 
- 
Something about the languages. 
- 
And of course I could add
 this reference for English.
 
- 
I could add all the other
 languages that she speaks.
 
- 
And I won't bore you with
 that, but that is basically
 
- 
how it's done. 
- 
So you click this Add to
 add a completely new--
 
- 
completely new statement. 
- 
Now, by the way, the fact
 that these are the only two
 
- 
suggestions that
 Wikidata can think of,
 
- 
doesn't mean these
 are the only options.
 
- 
OK, you can just type
 anything that may be relevant.
 
- 
We could add, for
 example, award.
 
- 
Just start typing award. 
- 
And here I have
 a bunch of properties
 
- 
that are relevant for awards. 
- 
Awards received, together
 with, conferred by, right?
 
- 
There's all kinds of properties
 that I could rely on.
 
- 
And of course there is a list of
 all the properties of Wikidata.
 
- 
And that list is
 also sorted by type.
 
- 
So yes, there is a list of
 properties relevant to people
 
- 
so that you don't have to guess. 
- 
But a surprising
 amount of the time
 
- 
you can just start typing
 and get the right properties
 
- 
suggested to you. 
- 
OK. 
- 
So we taught Wikidata
 something new,
 
- 
and now let's teach Wikidata
 something completely new.
 
- 
Right? 
- 
So how do we create
 a new Wikidata item?
 
- 
So, like I said, if I
 created a Wikipedia article
 
- 
about something that was
 not previously covered
 
- 
on any other
 Wikipedia, chances are
 
- 
there would not be an already
 existing Wikidata item.
 
- 
Sometimes there might
 be, because Wikidata
 
- 
does have 25 million entities. 
- 
But sometimes there wouldn't be. 
- 
So, first of all, I could
 search for it, right?
 
- 
So I could go to Wikidata
 to the search box
 
- 
here and just start typing, and
 search for what I want, right?
 
- 
So if I'm searching for Helen
 Dewitt I just say Helen,
 
- 
and I can see whether
 or not it exists.
 
- 
And there's a detailed search
 results page, et cetera,
 
- 
where I can find out
 if the item does exist or not.
 
- 
Excuse me, this reminds me
 of a very important thing
 
- 
I wanted to
 demonstrate, and that
 
- 
is the multilingualism
 of Wikidata.
 
- 
So remember all these
 labels in other languages.
 
- 
Wikidata knows what to call
 Helen Dewitt in Hebrew.
 
- 
And it will show it to Wikidata
 users whose language is Hebrew.
 
- 
Mine is set to
 English, for your sake.
 
- 
But if I change this I go to
 Preferences here and change
 
- 
my language. 
- 
[INAUDIBLE] All
 right, and I hit Save.
 
- 
Wikidata will start
 talking to me in Hebrew.
 
- 
Now brace yourselves. 
- 
Are you ready? 
- 
Don't panic, it's right to left. 
- 
Oh my god everything
 is topsy-turvy.
 
- 
So this is the same
 article in Hebrew.
 
- 
So the sidebar has
 switched direction,
 
- 
and I know most of
 you cannot read it.
 
- 
Bear with me. 
- 
This is the label
 that we previously
 
- 
saw in the label box. 
- 
This is how you spell
 Helen Dewitt in Hebrew.
 
- 
And here is the
 description in Hebrew.
 
- 
It's not the description in
 English, this description,
 
- 
American writer, which
 I was shown previously.
 
- 
Now I'm shown the Hebrew
 description, appropriately.
 
- 
But more interestingly,
 oh my god!
 
- 
All these statements
 are suddenly in Hebrew.
 
- 
How did that happen? 
- 
Well this tiny word here
 is the very concise way
 
- 
to say in Hebrew, instance of,
 and this word here means human.
 
- 
So these are links to
 the same things, right?
 
- 
It still links to Q5. 
- 
Q5 is the Wikidata
 entity for human.
 
- 
These are still the same things. 
- 
But because Wikidata has
 multiple labels for everything,
 
- 
it has multiple
 labels for items.
 
- 
And it also has multiple
 labels for property names.
 
- 
So Wikidata knows how
 to say, instance of,
 
- 
and award received,
 in other languages.
 
- 
That is why it is able to show
 me all this data in Hebrew
 
- 
even if none of that data was
 actually input into Wikidata
 
- 
by a Hebrew speaker. 
- 
That data could have been
 input by English speakers,
 
- 
but thanks to the
 fact that someone once
 
- 
translated the word
 photo into Hebrew,
 
- 
I can see this field in Hebrew. 
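The idea that every item and every property carries one label per language can be sketched as a simple lookup. This is a toy model, not how Wikidata is implemented: the `LABELS` table is a hand-written sample (German is used here instead of Hebrew), `label_for` is a hypothetical helper, and the English fallback is an assumption made for the sketch.

```python
# Hand-written sample data; nothing here is fetched from Wikidata.
LABELS = {
    "Q5": {"en": "human", "de": "Mensch"},
    "P31": {"en": "instance of"},  # pretend nobody translated this one yet
}


def label_for(entity_id: str, lang: str, fallback: str = "en") -> str:
    """Return the label in lang, else in the fallback language, else the bare ID."""
    labels = LABELS.get(entity_id, {})
    return labels.get(lang) or labels.get(fallback) or entity_id


print(label_for("Q5", "de"))
print(label_for("P31", "de"))
```

The second lookup falls back because the sample has no German label for P31, which is exactly the gap a volunteer translator would fill once.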
- 
So one of the things you
 can do to help Wikidata,
 
- 
right now, without
 any special knowledge
 
- 
is to help translate
 those labels.
 
- 
Every label only needs to
 be translated just once.
 
- 
So you can see that all
 of these properties, date
 
- 
of birth, name et cetera,
 they all have Hebrew labels.
 
- 
Maybe one of these would not. 
- 
No, they all have Hebrew labels. 
- 
Doing pretty good. 
- 
And I'm able to search
 in my own language.
 
- 
I'm able to click Add. 
- 
This word is Add,
 so I click this,
 
- 
and now I have the Add screen. 
- 
It all speaks my language,
 and it's awesome.
 
- 
And now for your sake I
 will switch back to English,
 
- 
but it is important
 to know you can
 
- 
edit Wikidata in any language. 
- 
And it is far more multi-lingual
 and multi-lingual friendly
 
- 
than, for example, Commons, which
 is also a project we all share.
 
- 
But Commons has some limitations
 on how multi-lingual it is.
 
- 
For example, the category
 names, et cetera.
 
- 
OK. 
- 
So we were beginning
 to discuss creating
 
- 
something completely new. 
- 
AUDIENCE: Quick
 questions, if that's OK?
 
- 
So there's two questions on IRC. 
- 
The first one is, can you
 show search for something
 
- 
like getting the list of things? 
- 
I want to learn how to search
 for something properly like,
 
- 
show me all the items with
 this value of this property.
 
- 
ASAF BARTOV: Yes. 
- 
That is part of
 this talk, but I'll
 
- 
get to that in a
 little bit later.
 
- 
There's a whole section where I
 will demonstrate the very, very
 
- 
powerful query
 system of Wikidata
 
- 
where I will cash
 that check that I gave
 
- 
at the beginning of
 all these painters
 
- 
who are sons of painters
 queries, et cetera.
 
- 
So I will demonstrate
 how to do that.
 
- 
AUDIENCE: Other question. 
- 
How does Wikidata deal
 with link rot, and other issues
 
- 
stemming from their URL refs? 
- 
ASAF BARTOV: URLs break. 
- 
We call that link rot. 
- 
Wikidata doesn't have
 any particular magic
 
- 
around link rot,
 just like Wikipedia.
 
- 
So if you do use a bare
 URL it may well rot.
 
- 
But you can add qualifiers
 with backup URLs elsewhere,
 
- 
on the Internet Archive, or
 another mirroring service.
 
- 
And potentially that could be
 a software feature for Wikidata
 
- 
to automatically save
 or ensure that something
 
- 
is saved on Internet
 Archive, but I don't
 
- 
know that it is doing so now. 
- 
So, just like Wikipedia, if
 it is a bare URL it may rot.
 
- 
And may need to be
 replaced, possibly by bot.
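A backup address on the Internet Archive can be derived from the original URL. The sketch below assumes the publicly visible Wayback Machine address pattern (`https://web.archive.org/web/` followed by an optional timestamp and the original URL); `wayback_url` is a hypothetical helper, the example URL is a placeholder, and the function only builds a string without checking that a capture actually exists.

```python
def wayback_url(original_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine address for original_url.

    timestamp is an optional prefix such as "2016" or "20160315";
    when it is empty, no particular capture date is requested.
    """
    prefix = "https://web.archive.org/web/"
    if timestamp:
        return f"{prefix}{timestamp}/{original_url}"
    return f"{prefix}{original_url}"


print(wayback_url("https://example.org/interview", "2016"))
```

To my understanding, Wikidata has an "archive URL" property that can hold such an address next to the reference URL, but treat that as something to verify before relying on it.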
 
- 
Other questions? 
- 
All right, so let's
 talk about how you
 
- 
create a completely new item. 
- 
It's very simple. 
- 
You go to Wikidata and you
 click here on the side.
 
- 
There's a link, create new item,
 which gives you this screen.
 
- 
And let's create an
 item about a book
 
- 
that I'm reading right now
 by this Bulgarian writer.
 
- 
So we have an article about this
 writer guy named Deyan Enev.
 
- 
But we don't have an
 article or a Wikidata item
 
- 
about one of his famous
 books called Circus Bulgaria.
 
- 
That's the book I'm reading,
 his first collection
 
- 
of short stories in English. 
- 
Circus Bulgaria came out
 in 2010, Portobello Books,
 
- 
translated by Kapka Kassabova. 
- 
So that's the book I'm reading. 
- 
As you can see it's not
 a link on Wikipedia.
 
- 
There's no article about
 it, and there's not even
 
- 
a Wikidata entity item about it. 
- 
But we can totally create
 it, even without a Wikipedia
 
- 
article. 
- 
So let's create this new item. 
- 
Let's create it in
 English for the purposes
 
- 
of our demonstration. 
- 
The name of the item
 is Circus Bulgaria.
 
- 
Circus Bulgaria,
 that's the name.
 
- 
Not Circus Bulgaria
 parentheses book,
 
- 
or anything you may be
 used to from Wikipedia.
 
- 
It's the actual
 name of the book,
 
- 
and the description,
 again, remember,
 
- 
the description field
 is just to kind of help
 
- 
tell apart this Circus Bulgaria
 from any other potential Circus
 
- 
Bulgaria. 
- 
Maybe there's a
 film or something.
 
- 
So it's enough to just say
 something like short story
 
- 
collection. 
- 
I might add by Deyan Enev,
 just in case, again,
 
- 
some future other short story
 collection by some other author
 
- 
happens to have that same name. 
- 
That should be
 disambiguating enough.
 
- 
OK. 
- 
Short story collection
 by Deyan Enev.
 
- 
I could have aliases for this. 
- 
The aliases assist find-ability. 
- 
This particular book has just
 this one name, so that's fine.
 
- 
And I click Create. 
- 
That's it. 
- 
I just start with a
 label, and a description.
 
- 
I click Create. 
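What the create form collects (a label, a description, and optional aliases, each tied to a language) can be written down as data. The sketch below follows the general shape of Wikibase entity JSON as I understand it; `new_item_payload` is a hypothetical helper, and nothing is sent to Wikidata.

```python
import json


def new_item_payload(label, description, lang="en", aliases=()):
    """Assemble label, description and optional aliases for one language."""
    payload = {
        "labels": {lang: {"language": lang, "value": label}},
        "descriptions": {lang: {"language": lang, "value": description}},
    }
    if aliases:
        payload["aliases"] = {
            lang: [{"language": lang, "value": alias} for alias in aliases]
        }
    return payload


payload = new_item_payload("Circus Bulgaria", "short story collection by Deyan Enev")
print(json.dumps(payload, indent=2))
```

Note that the label is the plain title, with the disambiguating detail kept in the description, as in the demonstration.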
- 
I have a brand new Q number
 for my new Wikidata item.
 
- 
And Wikidata knows
 what to call it.
 
- 
And a description in
 one language at least.
 
- 
And that's it, and I
 can start populating it.
 
- 
As you can see, it
 has no site links,
 
- 
but it's ready to be taught. 
- 
So, for example, I
 can start by teaching
 
- 
it the name of the book
 in another language
 
- 
that I happen to speak. 
- 
Now it has two labels
 in English and Hebrew.
 
- 
I could also look
 up the book Areon,
 
- 
the original Bulgarian
 label for this book.
 
- 
Seems relevant. 
- 
Again, I do not speak Bulgarian. 
- 
But I can go to the Bulgarian
 Wikipedia through interwiki.
 
- 
This is this gentleman. 
- 
And I could find-- 
- 
I can read Cyrillic so
 I could easily find--
 
- 
when I say easily-- 
- 
when I say easily-- 
- 
maybe not so easy, but
 I can search for it.
 
- 
Here we go. 
- 
Tsirk Bulgaria. 
- 
That is the name of the book. 
- 
Tsirk, as in circus. 
- 
No problem. 
- 
So I just copy this right here. 
- 
And I go back to my new item. 
- 
My new item, which is here,
 and I edit the Bulgarian field.
 
- 
And here it is. 
- 
Awesome. 
- 
All right. 
- 
But I still haven't told
 Wikidata anything about this.
 
- 
I know I'm talking about a book. 
- 
Wikidata doesn't
 know that yet.
 
- 
So let's start by
 adding some statements.
 
- 
First of all, I click Add. 
- 
Wikidata sensibly
 says, how about we
 
- 
start with instance of. 
- 
Tell me what kind of animal--
 no, not kind of animal.
 
- 
What kind of thing are you
 trying to describe here?
 
- 
Well it's an instance of a book. 
- 
Not in Hebrew, please. 
- 
So it's an instance of a book. 
- 
I could even be a
 little more specific
 
- 
and say it's an instance of
 a short story collection.
 
- 
There we go, short
 story collection.
 
- 
I hit Save. 
- 
Awesome. 
- 
So now we know what
 kind of thing it is.
 
- 
It's not a human, it's not a
 mountain, it's not a concept.
 
- 
It's a short story collection. 
- 
Now I can add some other things. 
- 
See, Wikidata is
 already working for me.
 
- 
Because it's a short
 story collection
 
- 
it's offering me to populate
 these properties, and not
 
- 
other ones. 
- 
Publication date,
 original language,
 
- 
genre, country of origin,
 these are all relevant, right?
 
- 
So let's start with original
 language of the work
 
- 
is Bulgarian. 
- 
Not Bulgaria, Bulgarian. 
- 
This is the item I want to link. 
- 
Hit Save, and whatever. 
- 
Author. 
- 
Let's identify the author. 
- 
So the author, the main
 creator of the work,
 
- 
is that gentleman Deyan Enev. 
- 
And remember, he has
 a Wikipedia article.
 
- 
He also has a Wikidata entity. 
- 
So Wikidata does know about him. 
- 
So I hit Save, and I can add
 something about the translator.
 
- 
And what was that lady's name? 
- 
Kapka Kassabova. 
- 
Now it so happens that Wikidata
 already knows about this lady.
 
- 
See? 
- 
So I can just start typing
 and then just link to it.
 
- 
Awesome. 
- 
But what if it didn't? 
- 
What if it was translated
 by someone who isn't
 
- 
already covered on Wikidata? 
- 
Well I could just type
 the name as a string,
 
- 
but ideally I could
 create a Wikidata entity
 
- 
about this translator so
 that there is a possibility
 
- 
to link to her. 
- 
Now I might actually
 add a qualifier here
 
- 
because, she's not the
 translator of the book, right?
 
- 
She's the translator of
 the book into English.
 
- 
Right. 
- 
So the language that she
 translated into is English.
 
- 
Right? 
- 
This book-- remember
 I'm describing the book.
 
- 
The item is about the book. 
- 
So the book would have
 a different translator
 
- 
into Polish. 
- 
So this is an example of
 a property or a statement
 
- 
that doesn't make sense without
 one of those qualifiers.
 
- 
It's just not correct. 
- 
It doesn't make sense to
 say that translator is.
 
- 
The English translator, or
 even this English translator.
 
- 
In 50 years maybe there would
 be an additional English
 
- 
translation. 
- 
So that's an example of
 needing that qualifier.
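The point about the translator statement needing a language qualifier can be sketched as a small consistency check. This is an illustration only: `Statement` and `unqualified_translators` are hypothetical names, and the item IDs for the author and translator are placeholders because the talk does not give them. The property IDs are the real ones for author (P50), translator (P655) and language of work or name (P407), and Q1860 is English.

```python
from dataclasses import dataclass, field

AUTHOR, TRANSLATOR, LANGUAGE = "P50", "P655", "P407"


@dataclass
class Statement:
    """A simplified statement with optional qualifiers (property -> value)."""
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


def unqualified_translators(statements):
    """Return translator statements that do not say which language they cover."""
    return [
        s for s in statements
        if s.prop == TRANSLATOR and LANGUAGE not in s.qualifiers
    ]


# Placeholder item IDs; look up the real Q numbers before using them.
book = [
    Statement(AUTHOR, "Q_DEYAN_ENEV"),
    Statement(TRANSLATOR, "Q_KAPKA_KASSABOVA", {LANGUAGE: "Q1860"}),
]

print(unqualified_translators(book))
```

An empty result means every translator statement on the book says which translation it refers to.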
 
- 
And of course I could go on
 and populate the other fields.
 
- 
We don't have to
 do that right now.
 
- 
Publication date, country
 of origin, et cetera.
 
- 
So this is already beginning
 to look like all those items
 
- 
that we already saw, but just
 a moment ago it didn't exist.
 
- 
Just a moment ago Wikidata
 had no concept of this work.
 
- 
This happens to be one
 of his notable works.
 
- 
So I could actually go to the
 item about Deyan Enev which
 
- 
has all this information
 already, occupation, languages,
 
- 
and add a property. 
- 
Remember, I'm not
 limited to these.
 
- 
I can add a property
 called notable works,
 
- 
and mention my new item. 
- 
Circus Bulgaria. 
- 
See? 
- 
My new item is
 showing up, and thanks
 
- 
to this description that I
 wrote, short story collection,
 
- 
it's already appearing here in
 the dropdown very conveniently.
 
- 
So I linked to this. 
- 
I hit Save. 
- 
Ideally again I should find
 some references showing
 
- 
that this is a
 notable work by him,
 
- 
but we won't spend
 time on that right now.
 
- 
But the point is we
 created a new item.
 
- 
We populated it a little bit. 
- 
We linked to it so that it's
 more discoverable by mentioning
 
- 
it in the author name, and
 of course the book item
 
- 
itself mentions the author
 and links to the author.
 
- 
So that's all good. 
- 
One last thing we shall do is
 give it some useful identifier
 
- 
so let's add, say, the
 Library of Congress record
 
- 
for this book. 
- 
OK. 
- 
So I have prepared
 this in advance.
 
- 
Ooh. 
- 
Just in time, with 80 seconds to
 go before it's giving up on me.
 
- 
Oh it has already
 given up on me.
 
- 
That is very unfortunate. 
- 
So I go to the Library of
 Congress and I find this book.
 
- 
I find this entry, right? 
- 
In the Library of Congress
 database about this book.
 
- 
And it has a permalink. 
- 
It has a kind of guaranteed
 to be permanent link.
 
- 
I can just copy that link,
 go back to my little book,
 
- 
and say the Library of Congress. 
- 
Yeah, LCCN, that's what they
 call their IDs, the control
 
- 
number. 
- 
And I paste it here. 
- 
I actually don't need the URL. 
- 
I need just a number. 
- 
And there we go. 
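Since the identifier field wants only the number and not the whole permalink, the trimming step can be sketched like this. It assumes the permalink ends with the identifier as its last path segment (the `https://lccn.loc.gov/...` pattern); `lccn_from_permalink` is a hypothetical helper and the number in the example is made up.

```python
from urllib.parse import urlparse


def lccn_from_permalink(url: str) -> str:
    """Return the last path segment of a permalink, ignoring a trailing slash."""
    path = urlparse(url).path
    return path.rstrip("/").rsplit("/", 1)[-1]


# Made-up number, used only to show the trimming.
print(lccn_from_permalink("https://lccn.loc.gov/2010000000"))
```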
- 
I have added it,
 and now Wikidata
 
- 
knows how to find bibliographic
 information about this book.
 
- 
And any re-user of
 Wikidata, some program,
 
- 
some tool that connects
 books to authors
 
- 
or does statistical analysis or
 whatever, some future yet to be
 
- 
imagined tool
 could automatically
 
- 
find additional metadata on the
 Library of Congress site thanks
 
- 
to this connection
 that I just made.
 
- 
And of course I could
 add many other IDs
 
- 
to other catalogs
 around the world,
 
- 
and we won't do that right now. 
- 
You can see that it's now
 showing up under identifiers.
 
- 
So this is how we created
 a brand new piece of data.
 
- 
Questions about this,
 about creating new items?
 
- 
Yeah, all right. 
- 
So we've seen how to contribute
 to Wikidata on our own,
 
- 
kind of through-- 
- 
directly through Wikidata. 
- 
Now you may be
 thinking, but Asaf, this
 
- 
sounds like a ton
 of work recording
 
- 
all of these little tiny bits of
 information about every person
 
- 
and every book and every town. 
- 
And if you think that,
 you would be correct.
 
- 
That is a ton of work. 
- 
It's a lot of work. 
- 
However, it is centralized, so
 it is reusable on other wikis
 
- 
and we will show in just a
 moment how we pull information
 
- 
from Wikidata into
 Wikipedia or other projects.
 
- 
We will show that
 in just a moment.
 
- 
But here's an
 awesome little game
 
- 
that a Wikidata
 volunteer, Magnus Manske,
 
- 
has authored called the
 Wikidata game, in which he
 
- 
tricks people-- 
- 
sorry, helps people
 make contributions
 
- 
to Wikidata in a very,
 very easy and pleasant way.
 
- 
Let's look at the Wikidata game. 
- 
So the first thing you need
 to do in that Wikidata game
 
- 
is to log in,
 because the Wikidata
 
- 
game makes edits in your name. 
- 
So we need to authorize it. 
- 
It's perfectly safe. 
- 
And after you do that you
 can go to the Wikidata game.
 
- 
So this is the game. 
- 
Now I'm logged in. 
- 
And the Wikidata game
 actually includes
 
- 
a number of different games. 
- 
Let's start with a person game. 
- 
So Wikidata shows you-- 
- 
shows you an item, and asks
 you a very simple question.
 
- 
Person, or not a person? 
- 
So Wikidata goes through
 Wikidata entities
 
- 
that don't even have the
 instance of property.
 
- 
Which is why Wikidata
 doesn't know,
 
- 
literally doesn't know, if this
 is a person, or a mountain,
 
- 
or a city, or a country,
 or anything else.
 
- 
So it asks you, because this
 is the kind of question that
 
- 
Wikidata cannot
 decide on its own,
 
- 
but for us humans it's generally
 trivial to be able to say
 
- 
whether something that we're
 looking at is a person or not.
 
- 
It gets slightly trickier when
 the information is in Javanese,
 
- 
as it is here,
 rather than English.
 
- 
So this item happens to
 be described in Javanese.
 
- 
My Javanese, spoken in
 Indonesia, is very weak.
 
- 
However, I can tell that
 this is not a person.
 
- 
How can I tell? 
- 
Without understanding
 a word of Javanese
 
- 
I see that it mentions
 1000 kilometers
 
- 
and square kilometers, see? 
- 
So this is about a
 place, or an area,
 
- 
or a region, or whatever,
 but not a person.
 
- 
So this is an
 example of how even
 
- 
without understanding
 language you can sometimes
 
- 
make a determination. 
- 
However, of course,
 you should be sure.
 
- 
This is definitely not
 what a Wikipedia article
 
- 
about a person looks like. 
- 
So this is not a person. 
- 
I just click it and I'm
 shown the next item.
 
- 
This item is in another
 language I do not speak,
 
- 
and I just don't know. 
- 
I do not know if this is
 about a person or not.
 
- 
So I click Not Sure. 
- 
This is in Swedish, and
 it's about Sulawesi, still
 
- 
Indonesia. 
- 
And it is not about a person. 
- 
I have enough Swedish for that. 
- 
So I click not a person. 
- 
Now, you may say,
 well, do I really
 
- 
have to deal with all these
 languages that I don't speak?
 
- 
The answer is no. 
- 
You don't have to. 
- 
Here at the bottom
 of the Wikidata game
 
- 
there are settings. 
- 
You can click that
 and tell Wikidata,
 
- 
I cannot even read
 Chinese or Japanese,
 
- 
so please don't show me
 items in those languages.
 
- 
Because I wouldn't
 even be able to guess.
 
- 
I prefer these languages in
 which I can relatively easily
 
- 
make determinations. 
- 
And I can even tell Wikidata to
 only show me these languages.
 
- 
You see? 
- 
This was not selected,
 which is why I
 
- 
was shown some other languages. 
- 
I could say, only use
 these languages, and save.
 
- 
And now I can try
 this game again.
 
- 
However, that can
 slow it down a little.
 
- 
So here we go. 
- 
Here's a Spanish-- which
 is one of the languages I
 
- 
told Wikidata game it can use. 
- 
This is a Spanish item. 
- 
Now is it about a person or not? 
- 
It is not about a person. 
- 
Is it about a person? 
- 
No. 
- 
Yes, it is right? 
- 
Monk Cistercian, Pedro
 de Oviedo Falconi.
 
- 
That sounds like a person. 
- 
Fray Pedro nació. 
- 
Yeah, he was born
 in Madrid 1577.
 
- 
This is a person. 
- 
OK. 
- 
So I click person. 
- 
Again, if you're not
 sure, click not sure.
 
- 
The point is, just by clicking
 person and as you can see
 
- 
this would work
 very well on mobile,
 
- 
which is why I said you can
 contribute on your commute.
 
- 
You can just hold your
 phone or tablet or whatever,
 
- 
and just tap. 
- 
Person, not a person. 
- 
Person, not a person. 
- 
The amazing thing is that just
 tapping person has actually
 
- 
made an edit to Wikidata
 on my behalf, which
 
- 
I can find out, like every
 wiki, by clicking contributions.
 
- 
And as you can see in addition
 to the stuff about Circus
 
- 
Bulgaria, my latest edit is in
 fact about this Pedro de Oviedo
 
- 
Falconi person. 
- 
And the edit was, you can-- 
- 
I hope you can see this, created
 the claim instance of human.
 
- 
So I added-- 
- 
I mean Wikidata game
 added for me the statement
 
- 
instance of human. 
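What a single tap does can be sketched as follows. This is not the Wikidata game's actual code: `needs_person_question` and `apply_answer` are hypothetical names, the item ID is a placeholder, and since the talk does not say how the game records "not a person", the sketch simply leaves the item unchanged for any answer other than "person". P31 (instance of) and Q5 (human) are the real IDs mentioned in the talk.

```python
INSTANCE_OF, HUMAN = "P31", "Q5"


def needs_person_question(item: dict) -> bool:
    """An item is a candidate for the person game if it has no 'instance of'."""
    return INSTANCE_OF not in item.get("claims", {})


def apply_answer(item: dict, answer: str) -> dict:
    """Return a copy of item, adding 'instance of: human' when the answer is 'person'."""
    claims = {prop: list(values) for prop, values in item.get("claims", {}).items()}
    if answer == "person":
        claims.setdefault(INSTANCE_OF, []).append(HUMAN)
    return {**item, "claims": claims}


item = {"id": "Q_EXAMPLE", "claims": {}}  # placeholder ID
after = apply_answer(item, "person")
print(after["claims"])
```

The one tap replaces the four manual steps (open the item, click Add, pick the property, pick the value) with a single decision.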
- 
Now, the awesome thing is
 that it was super easy to do.
 
- 
I didn't have to go into that
 entity, click the Add button,
 
- 
choose the instance of property,
 choose human, hit Save.
 
- 
Instead of all these
 operations I just
 
- 
tapped on my screen,
 person, not a person.
 
- 
And I can do hundreds of
 edits during my daily commute.
 
- 
There are other games,
 like the gender game.
 
- 
So this is about-- 
- 
this is when Wikidata
 already knows
 
- 
that this item is a
 person, but it doesn't
 
- 
know the gender of this person. 
- 
Which is another one of
 the more basic items.
 
- 
And this is taking a long
 time because of the language
 
- 
limitations that I set on it. 
- 
I guess the less exotic
 languages have already
 
- 
been exhausted in the game. 
- 
We don't have to
 wait all this time.
 
- 
We can try something else. 
- 
How about occupation? 
- 
The occupation game. 
- 
Here we go, this is in Russian. 
- 
And what is the occupation
 of this gentleman?
 
- 
Well he is an [INAUDIBLE]. 
- 
He's a church person. 
- 
However, so the
 occupation game is
 
- 
where Wikidata game
 will automatically
 
- 
pull likely occupations
 from the article text
 
- 
and ask for confirmation. 
- 
So if he-- if this person
 really is a deacon,
 
- 
I should click that. 
- 
But I'm not sure. 
- 
I'm not clear on the Russian
 church's distinctions between--
 
- 
I mean [INAUDIBLE]
 is pretty senior,
 
- 
but I don't know if that
 automatically also means
 
- 
he's a deacon or not. 
- 
And [INAUDIBLE] is
 not listed here.
 
- 
So I will click not listed. 
- 
Also, these guesses
 are not always correct.
 
- 
So, this guy for
 example, is in Russian.
 
- 
I can read this. 
- 
He's a philologist. 
- 
He's a linguist. 
- 
So I can confirm it
 and click linguist.
 
- 
All right? 
- 
And again, if we look
 at my contributions
 
- 
we can see the Wikidata
 game on my behalf
 
- 
created occupation linguist. 
- 
OK. 
- 
Just by tapping linguist there. 
- 
Now if it's taken
 from the article,
 
- 
why would it ever be wrong? 
- 
Well Jesus was the
 son of a carpenter.
 
- 
The word carpenter
 appears in the text.
 
- 
That doesn't mean it's correct
 to say Jesus was a carpenter.
 
- 
OK? 
- 
Just a trivial example, right? 
- 
So many, many articles will say,
 you know, born to a physician.
 
- 
And so the word physician
 could be guessed,
 
- 
but it wouldn't be correct
 unless the son is also
 
- 
a physician. 
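Why a guess taken from the article text can still be wrong is easy to reproduce with a naive keyword matcher. This is a deliberately simple stand-in, not the game's real extraction logic: `KNOWN_OCCUPATIONS` and `guess_occupations` are hypothetical, and the sentences are made up.

```python
import re

# A tiny, made-up vocabulary of occupations to look for.
KNOWN_OCCUPATIONS = ["carpenter", "physician", "linguist", "deacon"]


def guess_occupations(text: str) -> list:
    """Return every known occupation whose word appears anywhere in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [occupation for occupation in KNOWN_OCCUPATIONS if occupation in words]


# The word appears, but it describes the father, not the subject.
print(guess_occupations("He was the son of a carpenter."))
print(guess_occupations("She is a linguist, born to a physician."))
```

The matcher sees the word but not who it describes, which is why the game asks a human to confirm each suggestion.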
- 
So I hope it gives
 you the gist of it.
 
- 
There is also a
 distributed Wikidata game,
 
- 
which is pretty awesome. 
- 
Here we go, which
 has additional games.
 
- 
So, for example, the
 Kian game gives you,
 
- 
maybe it gives you,
 some items to play with.
 
- 
Yes? 
- 
No? 
- 
OK. 
- 
So it gives you
 this little card,
 
- 
and asks you to confirm is this
 instance of human settlement?
 
- 
That is, is it a village,
 town, city, whatever.
 
- 
Is it a kind of human
 settlement or not?
 
- 
Or maybe it's a book. 
- 
Maybe it's a poem. 
- 
Again, so, is it a
 human settlement?
 
- 
And you can click the languages
 here to see the information.
 
- 
So I can click English. 
- 
And indeed the article-- 
- 
I mean the actual
 Wikipedia article
 
- 
says Camigji is a
 town and territory
 
- 
in this district in the Congo. 
- 
So yes, this is an instance
 of human settlement.
 
- 
So I clicked yes. 
- 
And just clicking yes
 again went to that item,
 
- 
and added instance
 of human settlement.
 
- 
Now the point of
 all these games is
 
- 
these are tools,
 written by programmers,
 
- 
making kind of semi educated
 guesses about these fairly
 
- 
basic properties. 
- 
And they are meant to
 semi automate, to assist,
 
- 
in the accumulation of all
 these important pieces of data.
 
- 
Now every single
 click here helps
 
- 
Wikidata give better
 results, richer results
 
- 
in future queries. 
- 
Again, as of right now
 Wikidata can include Camigji
 
- 
if I ask it, you know, what
 are some towns in Congo?
 
- 
Until now it could not. 
- 
Because it literally
 didn't know.
 
- 
So every time we click male,
 female, person, not a person,
 
- 
make these decisions,
 we help improve Wikidata
 
- 
and enrich the results
 that we could receive.
 
- 
Any questions about this, about
 kind of micro contributions
 
- 
through the Wikidata game? 
- 
If that looks
 appealing I encourage
 
- 
you to go and visit
 the Wikidata game
 
- 
and start contributing
 in that way.
 
- 
There is a question here. 
- 
If I make an article about
 Circus Bulgaria how should
 
- 
I correctly connect them? 
- 
That is an excellent question. 
- 
So once-- so now there is a
 Wikidata item about that book,
 
- 
but there is no Wikipedia
 article anywhere.
 
- 
Now suppose I write one
 in, Bulgarian maybe,
 
- 
you go to Wikidata. 
- 
You find the item by searching. 
- 
You find the item, and then
 the empty site links section
 
- 
right at the bottom there-- 
- 
where are we? 
- 
We have this? 
- 
Circus Bulgaria. 
- 
Let's demonstrate this. 
- 
So here is the item
 about the book.
 
- 
Let's say that now
 there is an article
 
- 
because I just created it. 
- 
I can go here to the empty
 Wikipedia link section,
 
- 
click Edit, type the
 name of the wiki,
 
- 
let's say English, and then
 type the name of the page
 
- 
that I just created. 
- 
Circus-- right? 
- 
And again, it offers
 me auto-complete
 
- 
for my convenience. 
- 
Now we don't actually
 have the article created,
 
- 
but I could let's just
 say this was the article.
 
- 
I can just click this,
 hit Save, and that
 
- 
would associate the
 new Wikipedia article
 
- 
with this Wikidata item. 
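The association made in the site links section boils down to a pair: which wiki, and which page title on it. The sketch below writes that pair in the general shape of Wikibase entity JSON as I understand it; `sitelink_payload` is a hypothetical helper, `enwiki` is the site ID commonly used for English Wikipedia, and nothing is sent anywhere.

```python
def sitelink_payload(site_id: str, page_title: str) -> dict:
    """Describe one site link: the wiki's site ID and the page title on that wiki."""
    return {"sitelinks": {site_id: {"site": site_id, "title": page_title}}}


# As in the talk, this assumes the article exists; it did not at the time.
print(sitelink_payload("enwiki", "Circus Bulgaria"))
```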
- 
That is the beginning of the
 inter-wiki list for this item.
 
- 
I will not click
 Save Now, because we
 
- 
didn't have the article yet. 
- 
So I hope that
 answers that question.
 
- 
Was there another question
 that I missed here?
 
- 
No. 
- 
OK. 
- 
Any questions about
 the Wikidata game?
 
- 
About this idea of
 micro contributions?
 
- 
If not then we can move
 on to embedding data,
 
- 
and after that we
 can discuss queries,
 
- 
how to get at all this
 data from Wikidata.
 
- 
So the short version of how
 to embed data from Wikidata
 
- 
is that there is this
 little magic incantation.
 
- 
Curly brace, curly brace,
 hash mark, property.
 
- 
It looks like a template, but
 it isn't because of that hash.
 
- 
And that is magic. 
- 
Take a look at this little
 demo that I prepared.
 
- 
This page, which is off
 my user page on meta,
 
- 
but it could be on any wiki. 
- 
OK. 
- 
Says, since San Francisco
 is item Q62 in Wikidata,
 
- 
and since population is
 property P1082, I can tell you
 
- 
that according to Wikidata the
 population of San Francisco
 
- 
is this. 
- 
And this bolded number here was
 produced with this incantation.
 
- 
Curly brace, curly brace,
 hash mark, property P1082,
 
- 
that's population,
 pipe, from what item?
 
- 
Right? 
- 
Cause I'm pulling
 an arbitrary number.
 
- 
I could put any
 property in any item
 
- 
here, and kind of include
 it, embedded, into my text.
 
- 
This isn't even about-- you
 notice this is my user page.
 
- 
This isn't even the article
 about San Francisco.
 
- 
I just want to pull that
 number into this thing
 
- 
that I'm writing. 
- 
So it's fairly simple. 
- 
I identify the property. 
- 
I identify the item
 to take it from.
 
- 
And Wikidata will,
 I mean Wikipedia,
 
- 
or the wiki I'm on, in this
 case meta, will go to Wikidata
 
- 
and fetch it for me. 
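Spelled out, the incantation for the San Francisco example is a short piece of wikitext. The Python helper below only assembles that text so it can be pasted into a wiki page; `property_embed` is a hypothetical name, and the wiki itself does the fetching when the page is rendered.

```python
def property_embed(property_id: str, item_id: str) -> str:
    """Build the wikitext that embeds one property value taken from one item."""
    return "{{#property:" + property_id + "|from=" + item_id + "}}"


# P1082 = population, Q62 = San Francisco (both as given in the talk).
wikitext = (
    "According to Wikidata, the population of San Francisco is "
    + property_embed("P1082", "Q62")
    + "."
)
print(wikitext)
```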
- 
Likewise, since Denny Vrandecic,
 the designer of Wikidata
 
- 
is item 18618629, right? 
- 
I mean, he's a notable person,
 so he has a Wikidata entity.
 
- 
And since occupation is property
 106, and date of birth is 569,
 
- 
and place of birth
 is 19, because
 
- 
of all that I can tell you
 that Vrandecic was born
 
- 
in Stuttgart, on this date,
 and is researcher, programmer,
 
- 
and computer scientist. 
- 
If you look at the source for
 this page, click Edit Source,
 
- 
you can see that the word
 Stuttgart does not appear here,
 
- 
because it came from Wikidata. 
- 
I did not write this into
 my little demo page here.
 
- 
See? 
- 
Place of birth is-- 
- 
where is it? 
- 
Here. 
- 
Born in property 19 from
 Q number so-and-so.
 
- 
That is how easy
 it is to pull stuff
 
- 
into a wiki from Wikidata. 
- 
OK now there's
 some nuance to it.
 
- 
And there are
 some additional parameters
 
- 
you can give. 
- 
And you can ask
 Wikidata to give you
 
- 
not just the text of the values,
 but actually make it links.
 
- 
So, for example, if I change
 this from property to values--
 
- 
No, that did not work at all. 
- 
Wasn't it values? 
- 
What was it? 
- 
Values and then-- 
- 
Oh, statements. 
- 
My bad, sorry. 
- 
The magic word is statements. 
- 
Statements. 
- 
So going back here. 
- 
If I change the word property
 to the word statements
 
- 
here then this same value-- 
- 
that did not work at all. 
- 
Oh, because I'm on meta. 
- 
So because I'm on
 meta, meta doesn't
 
- 
have an article named
 researcher, programmer,
 
- 
or computer scientist. 
- 
But Wikipedia does. 
- 
If I included this same
 syntax in Wikipedia,
 
- 
like English Wikipedia,
 for example--
 
- 
So let's go there right now. 
- 
And go-- go to my-- 
- 
Go to my sandbox. 
- 
If I just brutally paste
 this on my sandbox here--
 
- 
So, see, these became links. 
- 
Because Wikipedia has articles
 called programmer and computer
 
- 
scientist. 
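The difference between the two magic words is a single keyword in the wikitext. The sketch below builds both variants for the occupation example; `embed` is a hypothetical helper that accepts only the two keywords demonstrated in the talk, and the item number is the one quoted there with its Q prefix added.

```python
def embed(kind: str, property_id: str, item_id: str) -> str:
    """Build embed wikitext; kind is 'property' (plain text) or 'statements' (links)."""
    if kind not in ("property", "statements"):
        raise ValueError("kind must be 'property' or 'statements'")
    return "{{#" + kind + ":" + property_id + "|from=" + item_id + "}}"


# P106 = occupation; Q18618629 is the item number quoted in the talk.
plain = embed("property", "P106", "Q18618629")
linked = embed("statements", "P106", "Q18618629")
print(plain)
print(linked)
```

As the demonstration shows, the linked form only produces working links on a wiki that has pages for the values, which is why it looked broken on meta and worked on English Wikipedia.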
- 
So, like I said, there's
 some additional nuance
 
- 
to the embedding. 
- 
The important thing
 is that this is
 
- 
the key to delivering on that
 first problem that I mentioned.
 
- 
How to get data from
 a central location
 
- 
onto your wiki in your language. 
- 
Basically using property and
 statements magic incantations.
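The wikitext on the demo page is not captured in this transcript. As a minimal sketch, the two parser functions named in the talk take a property ID and a `from=` item ID. The Python helper names below are invented for illustration and only assemble the wikitext strings you would paste into a page.

```python
def property_call(prop: str, item: str) -> str:
    """Wikitext that renders the value(s) of a property as plain text."""
    return "{{#property:" + prop + "|from=" + item + "}}"


def statements_call(prop: str, item: str) -> str:
    """Wikitext that renders the value(s) as links where a local page exists."""
    return "{{#statements:" + prop + "|from=" + item + "}}"


# IDs quoted in the talk: item 18618629, place of birth 19, occupation 106.
DENNY = "Q18618629"

print("Born in " + property_call("P19", DENNY) + ".")
print("Occupation: " + statements_call("P106", DENNY))
```

An infobox template would simply contain many such calls, one per field.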
 
- 
And of course,
 usually, this would be
 
- 
in the context of an info box. 
- 
Some wikis-- English Wikipedia
 is not leading the way there.
 
- 
Some smaller wikis
 are more advanced
 
- 
actually in integrating
 Wikidata embeddings like this
 
- 
into their info boxes. 
- 
So that instead of
 the info box just
 
- 
being a template on the wiki
 with field equals value,
 
- 
field equals value. 
- 
That template of the
 info box on the wiki
 
- 
pulls the values, the birthdate,
 the languages, et cetera,
 
- 
pulls them from Wikidata. 
- 
So basically just-- I just
 demonstrated single calls
 
- 
to this, but of course
 an info box template
 
- 
would include maybe
 20 or 40 such embeds,
 
- 
and that is not a problem. 
- 
Of course, before you go and
 edit the English Wikipedia's
 
- 
Infobox person and replace
 it all with Wikidata embeds,
 
- 
you should discuss it with the
 English Wikipedia community.
 
- 
These discussions have
 already been taking place.
 
- 
There are some
 concerns about how
 
- 
to patrol this, how to keep
 it newbie friendly, et cetera.
 
- 
So there are legitimate concerns
 with just moving everything
 
- 
to be embedded from Wikidata. 
- 
But the communities are
 gradually handling this.
 
- 
I mean this ability to embed
 from Wikidata is not very old.
 
- 
It's been around
 for about a year.
 
- 
So communities are
 still working on kind
 
- 
of integrating that technology. 
- 
But that is kind
 of just the basics of how
 
- 
to pull data, individual bits
 of data, that's not querying,
 
- 
that's not asking those sweeping
 questions that I was talking
 
- 
about yet. 
- 
We'll get to that.
 Right now, this is
 
- 
how to pull a specific datum,
 a specific piece of data,
 
- 
from Wikidata. 
- 
OK. 
- 
So here's another quick
 thing to demonstrate
 
- 
before we go to
 queries, and that
 
- 
is the article placeholder. 
- 
The article placeholder
 is a feature
 
- 
that is being tested on the
 Esperanto Wikipedia, and maybe
 
- 
another wiki, I don't remember. 
- 
And it is using the
 potential of Wikidata
 
- 
to offer a placeholder
 for an article.
 
- 
An automatically generated
 Wikidata powered replacement
 
- 
placeholder for an article
 for articles that don't yet
 
- 
exist on Esperanto. 
- 
So let's go to the
 Esperanto Wikipedia.
 
- 
I don't speak Esperanto. 
- 
But let's look for Helen
 Dewitt, our friend,
 
- 
in Esperanto Wikipedia. 
- 
Now Esperanto is not
 one of the Wikipedias
 
- 
that have an article
 about Helen Dewitt.
 
- 
And so it tells me that, right? 
- 
There is no Helen Dewitt. 
- 
Maybe you were looking
 for Helena Dewitt.
 
- 
No, I was not. 
- 
You can start an article
 about Helen Dewitt.
 
- 
You can search. 
- 
You know, there's
 all this stuff.
 
- 
But there is also this
 little option here, hiding,
 
- 
which tells me that the
 Esperanto Wikipedia is--
 
- 
what's happening here? 
- 
Yes. 
- 
The Esperanto Wikipedia is
 ready to give me this page.
 
- 
This page, as you can see, it's
 on the Esperanto Wikipedia,
 
- 
but it's not an article. 
- 
See, it's a special page. 
- 
It's machine generated. 
- 
You can see the URL as well. 
- 
It's not, you know,
 slash Helen Dewitt.
 
- 
It's slash Specialaĵo,
 AboutTopic,
 
- 
and then the Wikidata
 ID of Helen Dewitt.
 
- 
And what I get here-- 
- 
I get an English
 description, by the way,
 
- 
because there is no
 Esperanto description.
 
- 
Wikidata can't make it up. 
- 
But what it can do is
 offer me these pieces
 
- 
of data in my language,
 in this case Esperanto.
 
- 
I'm on the Esperanto Wikipedia. 
- 
OK. 
- 
So it tells me that she's
 American, for example,
 
- 
and it tells me
 that in Esperanto.
 
- 
OK and it tells me
 that she speaks Latin.
 
- 
Remember we taught
 Wikidata that?
 
- 
It tells me that she
 was educated in Oxford,
 
- 
you know, and gives me the
 references to the extent
 
- 
that they exist. 
- 
I mean this is not an article. 
- 
It's not, you know, paragraphs
 of fluent Esperanto text.
 
- 
But it is information
 that I can understand
 
- 
if I speak this language. 
- 
And it's better than nothing. 
- 
And remember Helen Dewitt was
 not a very detailed article.
 
- 
If I were to ask about, I
 don't know, some politician,
 
- 
or popular singer that
 has more data in Wikidata,
 
- 
then this machine-generated
 thing would have been richer.
 
- 
So this feature is available
 and is under beta testing
 
- 
right now, but generally if
 this sounds interesting for you
 
- 
especially if you come
 from a smaller wiki that
 
- 
is missing a lot of articles
 that people may want to learn
 
- 
about, you can contact
 the Wikimedia Foundation
 
- 
and ask for article placeholder
 to be enabled on your wiki.
 
- 
And again, this
 is a placeholder.
 
- 
Of course, it exists only
 until someone actually
 
- 
writes a proper Esperanto
 article about Helen Dewitt.
 
- 
So I hope this is clear. 
- 
This is all coming from
 Wikidata on the fly.
 
- 
In real time. 
- 
As you can see it includes my
 latest edits to Helen Dewitt.
 
- 
OK. 
- 
Questions about the-- questions
 about the article placeholder?
 
- 
If there are, try and
 put them on the channel.
 
- 
And this brings us to one of
 the main courses of this talk,
 
- 
which is querying Wikidata. 
- 
So I've explained
 how Wikidata works.
 
- 
We've walked through it. 
- 
We've added to it. 
- 
We've created a new item. 
- 
We learned how to contribute
 during our commutes.
 
- 
And all this while you
 kept promising us,
 
- 
Asaf, that this would be-- 
- 
this would enable
 these amazing queries.
 
- 
So time to make good on that. 
- 
The URL you need to remember
 is query.wikidata.org.
 
- 
And that will take you
 to a query system that
 
- 
uses a language called SPARQL. 
- 
SPARQL, spelt with
 a Q. This language
 
- 
is not a Wikimedia creation. 
- 
It's a standardized language
 used for querying linked data
 
- 
sources. 
- 
And because of that
 there
 
- 
are certain usability prices
 that we pay for using SPARQL,
 
- 
for using a standard language. 
- 
It's not completely custom
 made for querying Wikidata,
 
- 
and we'll see that
 in just a moment.
 
- 
The principle to
 remember about Wikidata
 
- 
query is that Wikidata will
 tell you everything it knows,
 
- 
but no more. 
- 
I have anticipated this
 several times already, right?
 
- 
Until this moment when
 we taught Wikidata
 
- 
that Helen Dewitt
 speaks Latin, she
 
- 
would not have appeared
 in query results
 
- 
asking who are American
 writers who speak Latin?
 
- 
She would not have appeared. 
- 
But as of this
 afternoon, she will
 
- 
appear because I've added
 that piece of information.
 
- 
So a result of that principle
 is that you can never say,
 
- 
well I ran a Wikidata
 query and this
 
- 
is the list of Flemish painters
 who are sons of painters.
 
- 
The list. 
- 
That these are all
 the Flemish painters
 
- 
who are sons of painters. 
- 
That is never something you can
 say based on a Wikidata query,
 
- 
because of course, maybe
 not all the Flemish painters
 
- 
who are sons of painters have
 been expressed in Wikidata
 
- 
yet. 
- 
Wikidata doesn't know
 about some of them,
 
- 
or maybe it knows
 about all of them
 
- 
but doesn't know
 the important fact
 
- 
that this person is
 the son of that person,
 
- 
because those properties
 have not been added.
 
- 
And so they cannot be
 included in the results.
 
- 
So the results of
 a Wikidata query
 
- 
are never the definitive set.
- 
What you can say about
 a Wikidata query is here
 
- 
are some Flemish painters
 who are sons of painters.
 
- 
Here are some cities
 with female mayors.
 
- 
Whatever it is
 you're querying about
 
- 
is never guaranteed
 to be complete
 
- 
because Wikidata,
 like Wikipedia, is
 
- 
a work in progress. 
- 
And of course, the more
 we teach Wikidata the
 
- 
more useful it becomes. 
- 
OK, so let's go and
 see those queries.
 
- 
So this is query.wikidata.org. 
- 
It's not the wiki. 
- 
All right? 
- 
So this isn't like some
 page on the wiki itself.
 
- 
This is kind of an
 external system.
 
- 
So it's not a wiki. 
- 
You can see I don't
 have a user page here.
 
- 
I don't have a history tab. 
- 
This isn't a wiki page. 
- 
This is a special kind
 of tool or system.
 
- 
And it invites me to
 input a SPARQL query.
 
- 
Now most of us do
 not speak SPARQL.
 
- 
It's a technical language.
- 
It's a query language. 
- 
Some of you may be thinking
 about SQL, the database query
 
- 
language. 
- 
SPARQL is named with kind
 of a wink, or a nod, to SQL.
 
- 
But, I warn you, if
 you are comfortable in
 
- 
SQL don't expect to carry
 over your knowledge of SQL
 
- 
into SPARQL. 
- 
They're not the same. 
- 
They are superficially similar. 
- 
Right? 
- 
So they both use
 the keyword select,
 
- 
and they use the word where,
 and they use things like limit,
 
- 
and order. 
- 
So again, if you know
 this already from SQL
 
- 
those mean roughly
 the same things,
 
- 
but don't expect it to
 behave just like SQL.
 
- 
You do need to spend some time
 understanding how SPARQL works.
 
- 
So, by all means, I
 invite you to go and read
 
- 
one of the many fine
 SPARQL tutorials that
 
- 
are out there on the web, or
 to click the Help button here,
 
- 
which also includes
 help about SPARQL.
 
- 
But I also know
 that most of us when
 
- 
we want to do some advanced
 formatting on wiki,
 
- 
for example, we don't go
 and read the help page
 
- 
on templates, right? 
- 
We go to a page that already
 does what we want to do,
 
- 
and adopt and adapt the code
 from that other page, right?
 
- 
So we just take something that
 does roughly what we want,
 
- 
and just copy it over and
 change what we need to change.
 
- 
That is a very pragmatic
 and reasonable way
 
- 
to do things which is why-- 
- 
and the Wikidata
 engineers know this,
 
- 
which is why they prepared
 this very handy button for us
 
- 
called examples. 
- 
We click the examples button. 
- 
And, oh my god, there is a ton
 of-- well there's 312 example
 
- 
queries for us to choose from. 
- 
And we can just
 pick something that
 
- 
is roughly like what
 we're trying to find out,
 
- 
and then just change
 what needs changing.
 
- 
So let's take a very simple one. 
- 
The cats query. 
- 
Maybe one of the simplest
 you could possibly have.
 
- 
And let's run it first
 and then I'll kind of
 
- 
walk you through it. 
- 
The goal here is not
 to teach you SPARQL,
 
- 
but to get you to be kind
 of literate in SPARQL.
 
- 
To kind of understand why
 this does what it does.
 
- 
So let's run this query first. 
- 
We click Run and here I
 have results at the bottom.
 
- 
The item, which is
 just a Wikidata item,
 
- 
which of course is a number. 
- 
Remember, Wikidata thinks
 of items as Q numbers.
 
- 
And the label,
 because we're humans
 
- 
and we prefer words to numbers. 
- 
So these 114 results
 are all the cats
 
- 
that Wikidata knows about.
- 
Is this all the
 cats in the world?
 
- 
No of course not, remember? 
- 
It's all the cats Wikidata
 knows about, which
 
- 
means they're somehow notable. 
- 
I mean someone bothered to
 describe them on Wikidata.
 
- 
And Wikidata was told this
 item is an instance of cat.
 
- 
Right? 
- 
So these are those cats. 
- 
And we can click any of them. 
- 
I don't know,
 Pixel, for example.
 
- 
Click the Wikidata item.
- 
And here is the Wikidata
 item about Pixel
 
- 
with the Q number.
- 
And he is a tortoiseshell cat. 
- 
And as you can see
 instance of cat.
 
- 
OK. 
- 
And he is five inches high. 
- 
And he is apparently documented
 in Indonesian, in Bahasa.
 
- 
Right here this is Pixel. 
- 
And he is apparently somehow
 related to the Guinness World
 
- 
Records book. 
- 
I don't speak Bahasa, so
 I don't know exactly why
 
- 
this cat is so notable. 
- 
But, of course, cats
 can become notable
 
- 
for all kinds of reasons. 
- 
Maybe they're a
 YouTube sensation,
 
- 
you know, maybe
 they were involved
 
- 
in some historical event. 
- 
I like this cat named Gladstone. 
- 
This cat named Gladstone is-- 
- 
he has position
 held Chief Mouser
 
- 
to Her Majesty's Treasury. 
- 
This is an official
 cat with a job.
 
- 
And he has been holding this
 job, mind you, since the 28th
 
- 
of June this past year. 
- 
That's the start time. 
- 
And there is no end time
 which means he currently
 
- 
holds the position
 of Chief Mouser
 
- 
to Her Majesty's Treasury.
- 
His employer is Her
 Majesty's Treasury.
 
- 
He's a male creature. 
- 
And Wikidata knows
 that this cat is
 
- 
named after William Gladstone,
 the Victorian prime minister.
 
- 
Of course if I don't
 know who this person is
 
- 
I can click through
 and learn that he
 
- 
was a Liberal politician
 and prime minister, right?
 
- 
He even has a Twitter account. 
- 
And Wikidata sends
 me right to it.
 
- 
The treasury cat
 Twitter account.
 
- 
And he has articles in
 German, and English,
 
- 
and of course Japanese,
 because he's a cat.
 
- 
All right. 
- 
So this was a very simple query. 
- 
Let's find out why it works. 
- 
OK. 
- 
So what did we actually
 tell Wikidata to do for us?
 
- 
We said, please select
 some items for us
 
- 
along with their labels. 
- 
OK? 
- 
Along with their
 human readable labels
 
- 
because if I remove this
 label what I get is, see,
 
- 
just a list of item numbers. 
- 
That's not as fun. 
- 
So that's what this
 little bit did.
 
- 
I just said, give me the
 items, but also their
 
- 
human readable label. 
- 
And I want you to
 select a bunch of items,
 
- 
but not just any
 random bunch of items,
 
- 
I want to select items where
 a certain condition holds.
 
- 
What is the condition? 
- 
The condition is that the
 item that I want you to select
 
- 
needs to have property
 31 with a value of Q146.
 
- 
Well, that's helpful. 
- 
If I hover over these numbers-- 
- 
Again, I get the human
 readable version.
 
- 
So I'm looking for
 items that have property
 
- 
instance of with the value cat. 
- 
Right? 
- 
Because that's literally
 what I want, right?
 
- 
I want all the items that have
 a property, a statement, that
 
- 
says instance of cat. 
- 
That's the condition. 
- 
I'm not interested in items
 that are instance of book,
 
- 
or instance of human. 
- 
I'm interested in
 instance of cat.
 
- 
That is the only condition
 here in this query.
 
- 
This complicated line I ask
 you to basically ignore.
 
- 
This is one of those
 sacrifices that we
 
- 
make for using a standard
 language like SPARQL.
 
- 
But the role of this
 complicated line
 
- 
is to basically
 ensure that we get
 
- 
the English label for that cat. 
- 
OK? 
- 
So don't worry about that. 
- 
Just leave it there. 
- 
And we run the query
 and we get the list
 
- 
of cats with their English
 labels, and that is awesome.
 
- 
By the way, if I change EN,
 without really understanding
 
- 
this line, if I change
 EN to HE, for Hebrew,
 
- 
I get the same results
 with a Hebrew label.
 
- 
Of course, these cats,
 nobody bothered to give them
 
- 
Hebrew labels unfortunately. 
- 
So I get the Q number.
- 
But if I changed
 it to Japanese, JA,
 
- 
I would still get a bunch of
 Q numbers where there
 
- 
isn't a Japanese label,
 but I would get the labels
 
- 
in Japanese. 
- 
OK? 
- 
So this is an example
 of how you don't even
 
- 
need to understand all
 the syntax of this query
 
- 
to adapt it to your needs. 
- 
If you want this
 query as is, but you
 
- 
want the labels in
 Japanese, you can just
 
- 
change the language code here. 
- 
OK so that is all
 this query does.
 
- 
Again, just give
 me the items that
 
- 
have property 31, instance of,
 with a value 146, which is cat.
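The query text itself is not captured in the transcript. A minimal sketch of what the cats example is described as doing, assuming the standard `wdt:`/`wd:` prefixes and the `wikibase:label` service of the Wikidata Query Service. The Python function name is invented here, and the result can be pasted into query.wikidata.org.

```python
def cats_query(language: str = "en") -> str:
    """Build the 'all cats' query with labels in the given language code."""
    return "\n".join([
        "SELECT ?item ?itemLabel WHERE {",
        # The only condition: property 31 (instance of) with value Q146 (cat).
        "  ?item wdt:P31 wd:Q146 .",
        # The 'complicated line': asks the label service for one language.
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "%s" . }'
        % language,
        "}",
    ])


# Same query, Japanese labels: only the language code changes.
print(cats_query("ja"))
```

Dropping `?itemLabel` from the first line gives the bare list of item numbers mentioned above.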
 
- 
Let's take a question just
 about this very simple query
 
- 
before we advance to
 more complicated queries.
 
- 
Any questions just about this? 
- 
Like, did anyone kind of
 really lose me talking
 
- 
about this simple query? 
- 
Again, this query just tells
 Wikidata, get me all the items
 
- 
that somewhere among
 their statements
 
- 
have instance of cat. 
- 
That's the only condition. 
- 
No questions. 
- 
OK, feel free to ask if
 you come up with one.
 
- 
So let's complicate
 things a little.
 
- 
Let's ask only for male cats. 
- 
OK. 
- 
Remember this cat
 Gladstone is male,
 
- 
and we know this because
 he has a property called
 
- 
sex or gender, and the value
 is male creature, right?
 
- 
So let's add another
 condition right here
 
- 
under the first condition. 
- 
OK? 
- 
This is a new line. 
- 
And I'm adding a new
 condition to the query.
 
- 
I'm saying, not only do I
 want this item that you return
 
- 
to be instance of cat, I
 also want this same item
 
- 
to have another property,
 the property sex or gender.
 
- 
Right? 
- 
And I need to refer to
 the property by number.
 
- 
But don't worry,
 Wikidata will help you.
 
- 
So you start with this
 prefix, WDT.
 
- 
Again, just ignore
 that prefix; it's
 
- 
one of the features of SPARQL
 that we need to respect.
 
- 
WDT colon, and then I can
 just type control space
 
- 
to do a search, to
 do an auto complete.
 
- 
So I can just type sex
 and Wikidata helpfully
 
- 
offers me a drop down
 with relevant properties.
 
- 
So I click property 21, which
 is the sex or gender property.
 
- 
And then I say, so I want
 the sex or gender property
 
- 
to have the Wikidata value. 
- 
Again, control space. 
- 
And I can just
 say male creature.
 
- 
See? 
- 
There's a different item
 for male, as in human,
 
- 
and a different one for
 male creature, for reasons
 
- 
that we won't go into. 
- 
Let's pick male
 creature, because we're
 
- 
talking about cats here. 
- 
All right. 
- 
And add a period here at
 the end and click Run.
 
- 
And instead of 114 cats, we get,
 this time, we got 43 results.
 
- 
Including our friend Gladstone
 who is a male creature cat.
 
- 
So that means all the
 rest are female, right?
 
- 
Wrong. 
- 
Wrong. 
- 
That does not mean that at all. 
- 
What it means is of
 the 114 items that
 
- 
have instance of cat,
 only 43 have explicitly
 
- 
sex male creature. 
- 
The rest of them do not. 
- 
Maybe because they have
 sex female creature,
 
- 
but maybe because they don't
 have that property at all.
 
- 
I'm emphasizing
 this to kind of help
 
- 
you train yourself to
 correctly interpret
 
- 
the results of
 queries from Wikidata.
 
- 
Don't jump into this kind
 of simplistic conclusion,
 
- 
OK there's 114 total, 43 male,
 therefore the rest are female.
 
- 
That is not correct. 
- 
OK? 
- 
But 43 of those explicitly
 had another statement, sex
 
- 
or gender, male creature. 
- 
So I just added
 another condition,
 
- 
and now my query is
 asking two separate things
 
- 
about the results. 
- 
They need to be a cat
 and a male creature.
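As a sketch of the two-condition query just described: each condition is one more line about the same `?item`, ending in a period. Property 21 (sex or gender) is named in the talk. The item ID Q44148 for "male creature" is an assumption recalled from Wikidata rather than stated here, so confirm it with the editor's auto-complete. The helper name is invented for illustration.

```python
LABEL_SERVICE = (
    'SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }'
)


def select_items(conditions):
    """Build a query where every (property, value) pair must hold for ?item."""
    lines = ["SELECT ?item ?itemLabel WHERE {"]
    for prop, value in conditions:
        lines.append(f"  ?item wdt:{prop} wd:{value} .")
    lines.append("  " + LABEL_SERVICE)
    lines.append("}")
    return "\n".join(lines)


male_cats = select_items([
    ("P31", "Q146"),    # instance of: cat
    ("P21", "Q44148"),  # sex or gender: male creature (assumed ID, verify)
])
print(male_cats)
```

Items with no sex or gender statement at all match the first condition but not the second, which is why the difference between the two counts is not "the female cats".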
 
- 
AUDIENCE: Maybe we
 should see how many
 
- 
cats have Twitter accounts. 
- 
But there is a
 question from YouTube,
 
- 
which is will you talk about
 the export possibilities
 
- 
of the result of the query? 
- 
ASAF BARTOV: Absolutely. 
- 
Absolutely I will in
 just a little bit.
 
- 
I mean there is, in
 addition to just getting
 
- 
this kind of table, I can get
 these results in other formats.
 
- 
And I can also
 download these results.
 
- 
I can click the Download
 button and get them
 
- 
as a comma separated
 file, tab separated
 
- 
file, a JSON file, which is
 useful for programmatic uses.
 
- 
I can also get a link. 
- 
So I can get a
 link to this query.
 
- 
I mean, I spent all this time
 designing this beautiful query.
 
- 
I can get a short URL that was
 generated especially for me
 
- 
right now with a tiny URL. 
- 
I can just paste this
 into Twitter and go,
 
- 
hey people look at all the male
 cats that Wikidata knows about.
 
- 
OK, this is not a
 very exciting query.
 
- 
But once I get to a really
 complicated exciting query
 
- 
I can totally share that
 very easily through this.
 
- 
And we will get to more
 interesting queries
 
- 
in just a second. 
- 
Any questions on this kind
 of basic querying so far?
 
- 
OK. 
- 
So that was a very
 simple example.
 
- 
Let's spend a moment exploring. 
- 
So this cat Gladstone was
 named after this dude, William
 
- 
Gladstone, who was an
 important British politician.
 
- 
I'm sure he's not the
 only thing out there
 
- 
in the universe that's named
 after Gladstone, right?
 
- 
I mean there has got
 to be, I don't know,
 
- 
park benches,
 planets, asteroids,
 
- 
something other than the
 cat, named after this guy.
 
- 
So we can ask Wikidata
 to tell us all the things
 
- 
that, you know, without
 saying instance of something.
 
- 
Like, I don't know, anything
 named after William Gladstone.
 
- 
So how do I do that? 
- 
Same principle. 
- 
Instead of asking about the
 property instance of, property
 
- 
31, instead of that, I
 will ask about the property
 
- 
named after-- 
- 
sorry, named after-- 
- 
I don't need to
 remember the number.
 
- 
I have auto-complete. 
- 
Named after is property 138. 
- 
And I want anything
 at all that is
 
- 
named after this person,
 William Gladstone.
 
- 
Here we go. 
- 
Which is 160852. 
- 
Whatever. 
- 
OK. 
- 
You notice I removed
 instance of cat.
 
- 
I remove the male creature. 
- 
I'm only asking,
 get me all the items
 
- 
that are somehow named after
 that particular politician.
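A minimal sketch of this "named after" query, using the IDs read out in the talk (property 138, item Q160852). The optional `instance_of` argument is an illustrative addition for narrowing the results to one kind of thing, and the function name is invented here.

```python
def named_after_query(person, instance_of=None, language="en"):
    """Items named after `person`, optionally restricted to one class."""
    lines = [
        "SELECT ?item ?itemLabel WHERE {",
        f"  ?item wdt:P138 wd:{person} .",  # P138 = named after
    ]
    if instance_of is not None:
        lines.append(f"  ?item wdt:P31 wd:{instance_of} .")  # P31 = instance of
    lines.append(
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "%s" . }'
        % language
    )
    lines.append("}")
    return "\n".join(lines)


# Anything at all named after William Gladstone (Q160852).
print(named_after_query("Q160852"))
```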
 
- 
And I run the query,
 and it turns out
 
- 
that Wikidata knows
 about three such things.
 
- 
Does that mean that's
 the only-- these
 
- 
are the only three things
 named after him in the world?
 
- 
Of course not. 
- 
But these are the only three
 items that are in Wikidata
 
- 
and explicitly have the
 property named after Gladstone.
 
- 
For all I know, there
 may be a village
 
- 
in England called Gladstone
 named after this person.
 
- 
But if nobody added the
 property, named after, linking
 
- 
to the person, it wouldn't show
 up in the results of my query.
 
- 
So Wikidata knows about
 three such things.
 
- 
One of them is something
 called the Gladstone Professor
 
- 
of Government. 
- 
I can click through and see
 that it's a chair at Oxford
 
- 
University, right? 
- 
So it's a position. 
- 
And another is the William
 Gladstone school number 18.
 
- 
William Gladstone
 school number 18.
 
- 
Where is that? 
- 
That is in Sofia, Bulgaria. 
- 
Again. 
- 
All right, so that's a
 particular school in Bulgaria
 
- 
named after William Gladstone. 
- 
And finally, the third
 result is, of course, our pal
 
- 
Gladstone the Chief Mouser.
- 
If I click through,
 that's the cat.
 
- 
All right, so that
 was an example.
 
- 
I mean, you saw how easy it was. 
- 
I just named the property and
 the value that I care about,
 
- 
and I get the results. 
- 
Again, I mean, it's
 kind of a silly example,
 
- 
but think about it. 
- 
This is-- how else can
 you answer that question?
 
- 
There's no reference desk,
 even at the great University
 
- 
of Oxford, where you can
 walk in and say, give me
 
- 
a list of things
 named after Gladstone.
 
- 
There's no easy way to
 answer that unless you happen
 
- 
to have a very large
 structured and linked
 
- 
data store, like Wikidata. 
- 
All right, so that
 was a silly example.
 
- 
Let's take some-- 
- 
AUDIENCE: There's a
 bunch of stuff on there.
 
- 
ASAF: Oh, OK. 
- 
AUDIENCE: Can you show
 easy query on the video?
 
- 
And somebody needs to know
 how to just do property
 
- 
exists without giving
 a specific value.
 
- 
And then once you show easy
 query you reload the page and--
 
- 
ASAF: I don't know easy query. 
- 
So is that a gadget? 
- 
I don't know what easy query is. 
- 
I don't use it. 
- 
So someone can maybe
 send a link or something?
 
- 
Oh it is a gadget. 
- 
I don't have it enabled. 
- 
That is nice. 
- 
So now, what I just did by hand,
 by formulating the query named
 
- 
after Gladstone-- 
- 
I guess this is the-- 
- 
Is it? 
- 
Yeah. 
- 
So this-- I just
 clicked the three--
 
- 
the ellipsis here. 
- 
Right after the name. 
- 
You see this? 
- 
This was just added by
 enabling easy query,
 
- 
which I just learned about. 
- 
So you just click this
 and it auto-magically
 
- 
made this kind of trivial query. 
- 
Of course, if I want a more
 complicated query like,
 
- 
I don't know, give me
 all the things that
 
- 
are named after Lincoln
 but are a school,
 
- 
I will still need to kind
 of edit a custom query.
 
- 
But this is a super
 easy and very nice
 
- 
way of just doing a very super
 quick query for exactly this.
 
- 
Right? 
- 
Like, what other items have
 exactly this property and value
 
- 
named after William Gladstone? 
- 
So, thank you to whoever
 made this suggestion
 
- 
to demonstrate that, and
 I'm glad I learned something
 
- 
too today. 
- 
Let's move to
 another sample query.
 
- 
Here's a fun example. 
- 
Popular surnames among
 fictional characters.
 
- 
Think about that for a second. 
- 
Popular surnames among
 fictional characters.
 
- 
So we're asking Wikidata
 to go through all
 
- 
the fictional
 characters you know,
 
- 
and of those look through
 their surnames, group
 
- 
them so that you can count
 them, the repetitions
 
- 
of the surnames,
 and give me the most
 
- 
popular surnames among them. 
- 
Additionally, I want you to
 awesomely present the results
 
- 
as a bubble chart. 
- 
Oh, yeah. 
- 
Wikidata can do that. 
- 
And I run the query. 
- 
And check it out. 
- 
The most popular names
 among fictional characters
 
- 
that Wikidata knows about are
 Jones, Smith, Taylor, et cetera.
 
- 
I mean for all we know,
 the most popular name
 
- 
among fictional characters
 actually in the world
 
- 
may be Wu. 
- 
Or something in Chinese
 for all we know.
 
- 
But if that has not been
 modeled in Wikidata,
 
- 
we're not going to get that. 
- 
So Taylor, Smith,
 Jones, Williams,
 
- 
seem to be the
 most popular names.
 
- 
And again, I could limit this. 
- 
I could make the
 same query but add,
 
- 
only among works whose
 original language
 
- 
was Italian, for example, to get
 more interesting results if I
 
- 
only care about
 Italian literature.
 
- 
But this is an example of
 how I got awesome bubble
 
- 
charts for free, and
 I can just plug this
 
- 
into an awesome
 presentation that I make.
 
- 
Of course I can still
 look at the raw table.
 
- 
So the query still resulted
 in a bunch of data, right?
 
- 
So Smith repeats 41 times,
 Jones 38 times, Taylor 34 times,
 
- 
et cetera, et cetera. 
- 
And down that list. 
- 
And I could, again, I could
 export this into a file
 
- 
and load it up in a spreadsheet,
 and do additional processing
 
- 
on it. 
- 
I can link to it. 
- 
I can do all kinds of
 awesome things with it.
 
- 
So that's another awesome query. 
- 
We don't have to go into
 every line by line analysis
 
- 
here of why this
 works the way it does.
 
- 
I want to show you some
 other queries first.
 
- 
Let's look at-- this is just
 fun, overall causes of death.
 
- 
Again a bubble
 chart just looking
 
- 
at people who died
 of things, and have
 
- 
a cause of death listed. 
- 
And we learn that the most
 commonly listed cause of death
 
- 
is myocardial infarction,
 pneumonitis, cerebral vascular,
 
- 
lung cancer, et
 cetera, et cetera.
 
- 
And again, in a bubble chart. 
- 
And so how does that work? 
- 
So just very briefly, the
 important parts of this query
 
- 
are I'm looking for something,
 for some person, who
 
- 
has property 31, instance
 of, with value Q5, which is human.
 
- 
So a human. 
- 
Again, just to kind
 of limit the query.
 
- 
I'm not interested in
 books or mountains.
 
- 
I'm looking for humans,
 and that same person,
 
- 
that same variable PID,
 should have a 509, meaning--
 
- 
Hello. 
- 
Why don't I have the-- 
- 
Yeah. 
- 
A 509, which is cause of death. 
- 
And that cause of death
 is another variable,
 
- 
that I'm calling CID. 
- 
Now, previously
 we were saying you
 
- 
know I want things
 that are named
 
- 
after Gladstone specifically. 
- 
Only things that have
 that particular value.
 
- 
Here I'm saying I'm
 looking for things
 
- 
that have some cause of death. 
- 
Not a specific one. 
- 
I just wanted to
 get everything that
 
- 
has a statement with some
 value about property 509
 
- 
cause of death. 
- 
OK? 
- 
And then this other bit of
 magic here, the group by,
 
- 
tells Wikidata I'm not
 actually interested
 
- 
in every individual thing. 
- 
I want you to group those
 causes, and then count them
 
- 
and give me the top ones. 
- 
So that's how this query works. 
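The on-screen query is not in the transcript. A rough sketch of the shape described, with a variable in the value position (any cause of death, not a specific one), then grouping and counting. The `?pid` and `?cid` variable names follow the talk. The `#defaultView:BubbleChart` comment, the `LIMIT`, and the Python function name are assumptions about how the example is written.

```python
def cause_of_death_query(limit: int = 50, language: str = "en") -> str:
    """Count humans per listed cause of death, most common first."""
    return "\n".join([
        "#defaultView:BubbleChart",
        "SELECT ?cid ?cidLabel (COUNT(?pid) AS ?count) WHERE {",
        "  ?pid wdt:P31 wd:Q5 .",   # the person is an instance of human
        "  ?pid wdt:P509 ?cid .",   # and has some cause of death, any value
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "%s" . }'
        % language,
        "}",
        "GROUP BY ?cid ?cidLabel",  # one row per cause instead of per person
        "ORDER BY DESC(?count)",
        "LIMIT %d" % limit,
    ])


print(cause_of_death_query())
```

The same pattern, a property followed by a variable instead of a fixed item, is how you ask "has this property at all" without naming a value.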
- 
Here's that query I promised. 
- 
Painters whose fathers
 were also painters.
 
- 
I can only think of a couple. 
- 
I mean, Monet and Vogel. 
- 
But I'm sure Wikidata
 knows many more.
 
- 
So let's run this query. 
- 
And I have 100 results. 
- 
By the way, I have limited
 it to 100 results just
 
- 
to keep it kind of snappy. 
- 
But actually, we could
 maybe try removing the limit
 
- 
and see if Wikidata
 could tell us
 
- 
the total number in Wikidata. 
- 
Yeah, that wasn't too bad. 
- 
So 1,270 results. 
- 
OK. 
- 
Wikidata, already at this
 early date in its progress,
 
- 
already knows about
 more than 1,200 painters
 
- 
who are sons of painters. 
- 
Sons of male painters, like
 their father is a painter.
 
- 
There may be
 additional painters who
 
- 
are sons of female painters
 not included in this query.
 
- 
Again, always remember what
 exactly you are asking.
 
- 
In this query I was
 asking about the father.
 
- 
I'm leaving out any
 possible painters who
 
- 
are sons of mother painters. 
- 
OK? 
- 
So how does this work? 
- 
I'm asking for the painter
 along with the human label,
 
- 
and the father along
 with the human label.
 
- 
So Michel Monet is the
 son of Claude Monet.
 
- 
And Domenico Tintoretto is the
 son of the famous Tintoretto
 
- 
whose label, you know, is just
 Tintoretto like Michelangelo.
 
- 
You know, you don't always
 have to have the full name
 
- 
in the common label. 
- 
Paloma Picasso is the
 daughter of Pablo Picasso.
 
- 
OK. 
- 
So Wikidata knows about
 all these results.
 
- 
Of course Holbein the Younger
 son of Holbein the Elder.
 
- 
And how did we get there? 
- 
Well we asked Wikidata
 to look for something,
 
- 
let's call it painter, which
 has 106, which is occupation,
 
- 
with a value painter. 
- 
Right? 
- 
This unwieldy number
 1028181, that's painter.
 
- 
So I'm asking for any item
 that has occupation painter.
 
- 
And let's call
 that item painter.
 
- 
I also want that painter to have
 a property 22, which is father.
 
- 
OK. 
- 
Father. 
- 
And I want it to
 have some value.
 
- 
OK, I'm putting it into
 another variable called father.
 
- 
I could have called
 it, you know, frog.
 
- 
That doesn't change
 anything, just to be clear.
 
- 
What matters is that this
 is the property father.
 
- 
I could have called
 it anything I want.
 
- 
So, and then, I have
 a third condition.
 
- 
That the father, like whatever
 it says here in property 22,
 
- 
I want that father to have
 himself a property 106
 
- 
occupation with a value painter. 
- 
OK? 
- 
These conditions
 combine to give me
 
- 
a list of people who have
 a father and that father
 
- 
has occupation painter as well. 
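A minimal sketch of the three conditions just walked through, using the IDs from the talk (occupation 106, father 22, painter Q1028181). The function and parameter names are invented for illustration. Swapping the two occupation arguments gives the "politicians who are sons of carpenters" variant, once you have looked up those item IDs.

```python
PAINTER = "Q1028181"


def child_and_father_query(child_occupation=PAINTER,
                           father_occupation=PAINTER,
                           language="en", limit=100):
    """People with a given occupation whose father has a given occupation."""
    lines = [
        "SELECT ?painter ?painterLabel ?father ?fatherLabel WHERE {",
        f"  ?painter wdt:P106 wd:{child_occupation} .",  # occupation
        "  ?painter wdt:P22 ?father .",                  # father: any value
        f"  ?father wdt:P106 wd:{father_occupation} .",  # father's occupation
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "%s" . }'
        % language,
        "}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Same query with Arabic labels and no limit.
print(child_and_father_query(language="ar", limit=None))
```

The variable names `?painter` and `?father` are arbitrary; only the property numbers carry meaning.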
- 
Of course, if I suddenly,
 or if you suddenly,
 
- 
are consumed by
 curiosity to know
 
- 
who are some politicians
 who are sons of carpenters?
 
- 
You could just
 change that, right?
 
- 
Change the first value
 from painter to politician.
 
- 
Change the third line's value
 from painter to carpenter.
 
- 
Maybe that list
 will be very short
 
- 
because carpenters don't
 tend to be notable,
 
- 
so they wouldn't be
 represented on Wikidata.
 
- 
That's why this works relatively
 well with painters, right?
 
- 
Because most of
 them are notable.
 
- 
But generally you
 could do that, right?
 
- 
That's an example of
 how you can take a query
 
- 
and just replace one of those
 values, or even the language.
 
- 
So again, I could ask
 for these same painters.
 
- 
It's limited again. 
- 
These same painters,
 but with Arabic labels.
 
- 
Same query, but I have Arabic
 labels for these painters.
 
- 
And of course where
 there is no Arabic label
 
- 
I get the Q number. 
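Swapping the occupation values or the label language amounts to changing a few strings in the query. The sketch below makes those strings parameters; the function name `child_and_father_query` and its arguments are hypothetical, and the label service is assumed to behave as it does on the Wikidata Query Service.

```python
def child_and_father_query(child_occupation, father_occupation,
                           language="en", limit=100):
    """Build a query for people with one occupation whose father has another.

    Both occupations are Wikidata Q numbers given as strings, e.g. "Q1028181".
    """
    return f"""
SELECT ?person ?personLabel ?father ?fatherLabel WHERE {{
  ?person wdt:P106 wd:{child_occupation} .
  ?person wdt:P22 ?father .
  ?father wdt:P106 wd:{father_occupation} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}
}}
LIMIT {limit}
"""

# Painters whose fathers were painters, this time with Arabic labels.
arabic_query = child_and_father_query("Q1028181", "Q1028181", language="ar")
print(arabic_query)
```

For the politicians-and-carpenters variant, one would look up the Q numbers for those two occupations on Wikidata and pass them in instead of the painter ID.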
- 
OK? 
- 
So that's that query
 that I promised you,
 
- 
painters who are sons of painters
 can be done by Wikidata
 
- 
in under one second. 
- 
How awesome is that? 
- 
We can also get some statistics. 
- 
So how about counting
 total articles
 
- 
in a given wiki by gender. 
- 
This is what we call
 the content gender
 
- 
gap, as distinct from the
 participation gender gap.
 
- 
This is the gender gap in
 what we cover on Wikipedia.
 
- 
So let's take one of these. 
- 
So this is a query. 
- 
Articles about women in
 some given Wikipedia.
 
- 
All right. 
- 
So let's take-- 
- 
I don't know. 
- 
Let's take the Tamil Wikipedia. 
- 
That's language code TA. 
- 
So I just put TA here. 
- 
And I click Run, and
 I get this count.
 
- 
That's all I wanted. 
- 
I'm not actually
 interested in the items,
 
- 
like in the list of women
 on the Tamil Wikipedia.
 
- 
I just want the number. 
- 
So I selected the count here. 
- 
And this number
 turns out to be 2159.
 
- 
So there are about 2,000
 articles about women
 
- 
on the Tamil Wikipedia that
 Wikidata knows to be female.
 
- 
Right? 
- 
I'm asking about the gender
 field, property 21 again.
 
- 
Remember, if there's some
 article about a woman in Tamil
 
- 
Wikipedia, but Wikidata
 doesn't have
 
- 
a statement about the
 gender, that person
 
- 
will not be counted here. 
- 
So again, be careful
 about kind of stating
 
- 
that this is exactly the number
 of articles about women on Tamil
 
- 
Wikipedia. 
- 
That's probably not true. 
- 
I'm sure some of those
 articles are missing
 
- 
a sex or gender property. 
- 
But for raw statistics,
 that's probably good,
 
- 
because some men are also
 missing the sex or gender
 
- 
property. 
- 
So we could take the
 same query for men.
 
- 
It's essentially the exact same. 
- 
It just has this unwieldy
 number for males, 6581097.
 
- 
I can change this language
 code again to TA for Tamil.
 
- 
And how many men are covered
 on Tamil Wikipedia? 14,649.
 
- 
OK. 
- 
So women, 2,100, men,
 about seven times as many.
 
- 
Right? 
- 
So that's the approximate
 size of the content gender
 
- 
gap on Tamil Wikipedia. 
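A counting query of this kind might look like the sketch below. It is an assumption about the query's shape rather than a copy of the one on screen: it uses the `schema:about` / `schema:isPartOf` pattern that the Wikidata Query Service exposes for sitelinks, property 21 as mentioned in the talk, the Q number quoted for male, and Q6581072 as the assumed counterpart for female. The arithmetic at the end uses the two counts read out in the talk.

```python
FEMALE = "Q6581072"  # assumption: Wikidata item for "female"
MALE = "Q6581097"    # the number quoted in the talk for "male"

def count_by_gender_query(language_code, gender_qid):
    """Count items with a given sex-or-gender value (P21) that have an
    article on the Wikipedia with this language code."""
    return f"""
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:P21 wd:{gender_qid} .
  ?article schema:about ?item ;
           schema:isPartOf <https://{language_code}.wikipedia.org/> .
}}
"""

women_query = count_by_gender_query("ta", FEMALE)
men_query = count_by_gender_query("ta", MALE)

# Counts reported in the talk for Tamil Wikipedia at the time of recording.
women, men = 2159, 14649
ratio = men / women
print(f"articles about men per article about a woman: {ratio:.1f}")
```

Items with no P21 statement match neither query, which is why the speaker treats both numbers as approximate.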
- 
And again, I can complicate
 this query as much as I want.
 
- 
For example, I can
 try and find out
 
- 
if this gender gap is wider
 or narrower among musicians,
 
- 
just as an example. 
- 
I could just add a line here
 that says occupation musician,
 
- 
and then I'm only
 counting articles
 
- 
on Tamil Wikipedia about
 musicians who are female
 
- 
versus articles
 on Tamil Wikipedia
 
- 
about musicians who are male. 
- 
And I can kind of
 compare the gender--
 
- 
the content gender gap across
 occupations on Tamil Wikipedia.
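The "one extra line" for occupation can be sketched as an optional condition. The helper name `gender_count_query` is hypothetical, and Q639669 is assumed here to be the Wikidata item for musician; it is worth confirming on Wikidata before relying on it.

```python
def gender_count_query(language_code, gender_qid, occupation_qid=None):
    """Count articles in one Wikipedia by gender, optionally restricted
    to a single occupation (P106)."""
    lines = [f"  ?item wdt:P21 wd:{gender_qid} ."]
    if occupation_qid is not None:
        # The single additional condition described in the talk.
        lines.append(f"  ?item wdt:P106 wd:{occupation_qid} .")
    lines.append("  ?article schema:about ?item ;")
    lines.append(
        f"           schema:isPartOf <https://{language_code}.wikipedia.org/> ."
    )
    body = "\n".join(lines)
    return f"SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{\n{body}\n}}"

MUSICIAN = "Q639669"  # assumption: Wikidata item for "musician"

all_women = gender_count_query("ta", "Q6581072")
women_musicians = gender_count_query("ta", "Q6581072", MUSICIAN)
men_musicians = gender_count_query("ta", "Q6581097", MUSICIAN)
print(women_musicians)
```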
 
- 
Do you see the
 important point here?
 
- 
It's that this is not just
 kind of a one-purpose query.
 
- 
I can, just with a single
 additional condition, suddenly
 
- 
make it a much more interesting
 query, because I break it down
 
- 
by occupation. 
- 
Or I break it down by century. 
- 
Do we have more of a coverage
 gap in 19th century people
 
- 
than in 21st century people? 
- 
I mean, I sure hope so, right? 
- 
The patriarchy is
 weakening somewhat.
 
- 
So I wouldn't be surprised if
 there are many more notable men
 
- 
covered about the 19th century. 
- 
But if we are also covering-- 
- 
I mean, if the
 gender gap is just
 
- 
as wide for 21st century
 people, that would
 
- 
be a little disappointing. 
- 
Again that's something I
 can fairly easily find out
 
- 
on Wikidata query. 
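One way to break the count down by century is to filter on a date. The sketch below interprets "19th century people" as people born in that century, using date of birth (P569); that interpretation, and the helper name, are assumptions made for illustration.

```python
def gender_count_by_birth_century_query(language_code, gender_qid, century):
    """Count articles by gender for people born in a given century."""
    first_year = (century - 1) * 100 + 1   # e.g. 19 -> 1801
    last_year = century * 100              # e.g. 19 -> 1900
    return f"""
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:P21 wd:{gender_qid} ;
        wdt:P569 ?born .
  FILTER(YEAR(?born) >= {first_year} && YEAR(?born) <= {last_year})
  ?article schema:about ?item ;
           schema:isPartOf <https://{language_code}.wikipedia.org/> .
}}
"""

women_19th = gender_count_by_birth_century_query("ta", "Q6581072", 19)
women_21st = gender_count_by_birth_century_query("ta", "Q6581072", 21)
print(women_19th)
```

Running the same pair of queries for Q6581097 would give the other side of the comparison for each century.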
- 
Any questions so far, or
 are you just sharing links?
 
- 
AUDIENCE: Yep there is one. 
- 
So somebody is wondering if you
 can demonstrate, or at least
 
- 
give a short answer to the
 latter part of this question.
 
- 
Is it possible, using
 SPARQL in Wikidata,
 
- 
to find specific
 Wikipedia articles, e.g.
 
- 
featured articles, of a
 certain language which do not
 
- 
exist in another language. 
- 
I know it is possible
 to find category based
 
- 
results using the PetScan tool. 
- 
But can we specify
 that by selecting e.g.
 
- 
featured articles? 
- 
ASAF BARTOV: Yes. 
- 
Excellent question. 
- 
It is possible, indeed. 
- 
And I will demonstrate
 one such query.
 
- 
Another query that
 I already mentioned is the
 
- 
largest cities in the
 world with a female mayor.
 
- 
This query-- let's
 close some of these tabs
 
- 
before my browser chokes. 
- 
So this query lists
 the major world cities
 
- 
run by women currently. 
- 
And the answer is Mumbai, Mexico
 City, Tokyo, bunch of others.
 
- 
And wait-- that's not it at all. 
- 
I clicked the wrong one. 
- 
That's the map of paintings. 
- 
OK. 
- 
Let's demonstrate
 that for a second.
 
- 
So this is the map
 of all paintings
 
- 
for which we know a location
 with the count per location.
 
- 
And the results are
 awesomely presented on a map.
 
- 
OK. 
- 
Again, under the hood this is
 a table, of course, of results.
 
- 
But, awesomely, I can
 browse it as a map.
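A map query of this kind could be sketched as follows. The `#defaultView:Map` comment is the Wikidata Query Service hint for showing results on a map; the property and item IDs (P31 instance of, Q3305213 painting, P276 location, P625 coordinate location) are assumed to be the ones such a query would use, and the query on screen may differ.

```python
# Sketch: paintings with a known location, counted per location.
PAINTINGS_MAP_QUERY = """
#defaultView:Map
SELECT ?location ?coord (COUNT(?painting) AS ?count) WHERE {
  ?painting wdt:P31 wd:Q3305213 ;   # instance of: painting
            wdt:P276 ?location .    # location, e.g. a museum or chapel
  ?location wdt:P625 ?coord .       # the location's coordinates
}
GROUP BY ?location ?coord
"""

print(PAINTINGS_MAP_QUERY)
```

The result is still a table, one row per location with its coordinates and a count; the map is only a different view of those rows.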
 
- 
So here is a map of the
 world with all the paintings
 
- 
that Wikidata knows about. 
- 
Not just knows
 about the paintings,
 
- 
but knows about their
 location in a museum.
 
- 
Not surprisingly
 Europe is much better
 
- 
covered than Russia or Africa. 
- 
There is a huge gap in
 contribution to Wikidata
 
- 
from these countries. 
- 
And some of it can be fixed. 
- 
And of course there is much more
 documentation, and much more
 
- 
art in Europe. 
- 
But if we zoom in, I
 don't know, Rome probably
 
- 
has a few paintings. 
- 
Right? 
- 
Hello. 
- 
Sorry. 
- 
It's-- Yes. 
- 
Vatican City sounds
 like a good bet, right?
 
- 
I can zoom in here. 
- 
And I can just click
 one of these dots
 
- 
and see in this point
 there are two paintings.
 
- 
And in this one there is one
 and it's the Archbasilica
 
- 
of St. John Lateran. 
- 
Let's see, this is the
 actual St. Peter, right?
 
- 
Sistine Chapel has 23 paintings. 
- 
What? 
- 
The Sistine Chapel has way
 more than 23 paintings.
 
- 
Correct, but 23 of them
 are documented on Wikidata.
 
- 
Have their own item
 for the painting, not
 
- 
the Sistine Chapel,
 the painting has
 
- 
an item that lists its
 being in the Sistine Chapel.
 
- 
There are 23 of those. 
- 
OK. 
- 
There is definitely
 room to document
 
- 
the rest of the artworks
 in the Sistine Chapel.
 
- 
So, again, this is just
 not the kind of query
 
- 
you were able to
 make before Wikidata,
 
- 
and it's a fairly simple
 query, as you can see.
 
- 
There are examples using
 maps like airports within 100
 
- 
kilometers of Berlin. 
- 
Again using the coordinates
 as a useful data point.
 
- 
And here is a map showing me
 only airports within a 100
 
- 
kilometer radius from Berlin. 
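A radius search like this one can be sketched with the query service's `wikibase:around` geospatial service. The IDs are assumptions for illustration (Q64 for Berlin, Q1248784 for airport), as is the helper name; the radius is given in kilometers.

```python
def airports_near_query(center_qid="Q64", radius_km=100):
    """Build a map query for airports within a radius of some item."""
    return f"""
#defaultView:Map
SELECT ?airport ?airportLabel ?coord WHERE {{
  wd:{center_qid} wdt:P625 ?center .            # coordinates of the center
  SERVICE wikibase:around {{
    ?airport wdt:P625 ?coord .
    bd:serviceParam wikibase:center ?center .
    bd:serviceParam wikibase:radius "{radius_km}" .
  }}
  ?airport wdt:P31/wdt:P279* wd:Q1248784 .      # an airport or a subclass
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
"""

berlin_airports = airports_near_query()
print(berlin_airports)
```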
- 
But I wanted to show
 you the mayors query.
 
- 
Let's click the-- oh I just
 have the wrong link here.
 
- 
But I can still find it
 here by typing mayor.
 
- 
Here we go, largest
 cities with female mayor.
 
- 
So this is a slightly
 more complicated query.
 
- 
But if I run it, I get the top
 10, because I set limit to 10.
 
- 
I get the top 10
 cities in the world,
 
- 
by population size, that
 are currently run by women.
 
- 
Tokyo, Mumbai, Yokohama,
 Caracas, et cetera.
 
- 
And one interesting thing that
 you may want to notice here
 
- 
is that I'm asking for cities. 
- 
I mean items, that
 are instance of city.
 
- 
And that have a
 head of government,
 
- 
that have some
 statement about who
 
- 
is in charge, and that statement
 has sex that's listed up here
 
- 
as female. 
- 
Don't worry about
 the syntax right now.
 
- 
I just want to show you
 some specific angle here.
 
- 
And I'm further
 filtering these results.
 
- 
I only want those items where
 the statement does not have
 
- 
the qualifier end time. 
- 
Why is that important? 
- 
Because if a city once
 had a female mayor,
 
- 
but that mayor is not the mayor
 anymore, because mayors change,
 
- 
I don't want them in this query. 
- 
I want a query of
 cities currently having
 
- 
a female mayor. 
- 
And of course Wikidata
 may have historical data
 
- 
with start and
 end time, as we've
 
- 
seen, that documents this
 person was the mayor of Tokyo
 
- 
or San Francisco
 between these years.
 
- 
But if there is no
 end time, that means
 
- 
they are currently the mayor. 
- 
So that's an example of
 asking about a qualifier
 
- 
of a statement, again, to get
 the results we actually want.
 
- 
If we want current mayors it's
 important to put this filter.
 
- 
If we don't, we will get
 historical female mayors
 
- 
as well. 
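The qualifier filter described here can be sketched as below. This is a reconstruction under assumptions, not the on-screen query: it uses the `p:` / `ps:` / `pq:` prefixes that reach a statement, its value, and its qualifiers, with P6 (head of government), P582 (end time), P1082 (population), Q515 (city) and Q6581072 (female).

```python
FEMALE_MAYORS_QUERY = """
SELECT DISTINCT ?city ?cityLabel ?mayor ?mayorLabel ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;    # instance of city (or a subclass)
        p:P6 ?statement ;              # a head-of-government statement
        wdt:P1082 ?population .
  ?statement ps:P6 ?mayor .            # the person named in that statement
  ?mayor wdt:P21 wd:Q6581072 .         # sex or gender: female
  FILTER NOT EXISTS { ?statement pq:P582 ?end . }   # no end-time qualifier
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?population)
LIMIT 10
"""

# Dropping the FILTER line would also return past mayors.
HISTORICAL_QUERY = "\n".join(
    line for line in FEMALE_MAYORS_QUERY.splitlines()
    if "FILTER NOT EXISTS" not in line
)
print(FEMALE_MAYORS_QUERY)
```

Going through `p:P6` rather than `wdt:P6` is what makes the qualifier reachable: the filter is applied to the statement itself, not to the city or the mayor.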
- 
All right. 
- 
So these are some
 example queries.
 
- 
Questions about that? 
- 
Oh, the featured
 article example.
 
- 
So let's look at that. 
- 
So I have prepared
 such a query recently.
 
- 
Here we go. 
- 
So this is a query. 
- 
I just saved it here
 on my user page.
 
- 
I mean, this is
 not the Wikidata query service.
 
- 
This is just a Meta page
 usefully containing the query.
 
- 
And let's run this. 
- 
So this query, it's actually
 not very complicated.
 
- 
It just has a long
 list of countries,
 
- 
because I'm asking
 about African countries.
 
- 
OK. 
- 
I'm looking for human
 females from one
 
- 
of these countries that
 have an article in English.
 
- 
That's what this line means. 
- 
But not in French. 
- 
That's what this part means. 
- 
OK. 
- 
This part, these
 two lines together.
 
- 
But not in French. 
- 
And this is what's
 called a badge.
 
- 
That's Wikidata's concept of
 good and featured articles.
 
- 
It's called a badge. 
- 
So I want them to have some
 badge on English Wikipedia.
 
- 
OK? 
- 
So again, this query is
 asking for the top 100 women
 
- 
from Africa who are documented
 on English Wikipedia,
 
- 
in a featured or
 good article status.
 
- 
But not on French Wikipedia. 
- 
So this is a query that's
 a to-do query, right?
 
- 
That's a query
 for French editors
 
- 
to consider what they might
 usefully translate or create
 
- 
in French. 
- 
And if we run this, we see
 we have three results.
 
- 
I mean, we have many
 women from Africa
 
- 
covered on English Wikipedia. 
- 
But only three articles
 have featured or good status
 
- 
among those that do not have
 French Wikipedia coverage.
 
- 
Let me rephrase that. 
- 
Among the English Wikipedia
 articles about African women
 
- 
that don't have a
 French counterpart,
 
- 
only three are featured or good. 
- 
OK? 
- 
Do you see this? 
- 
The badge is good article. 
- 
This little incantation
 here is what allows
 
- 
you to ask about the badge. 
- 
This here. 
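The structure of such a "to-do" query can be sketched as follows. Several details are assumptions: the helper name, the use of country of citizenship (P27) to tie a person to a country, and the three example country IDs (assumed to be Nigeria, Kenya and Ghana), which stand in for the long list on screen. The `wikibase:badge` predicate on the sitelink is the "incantation" for asking about badges; here any badge matches.

```python
def missing_translation_query(countries, source_lang="en",
                              target_lang="fr", limit=100):
    """Women from the given countries with a badged article in one
    Wikipedia and no article at all in another."""
    values = " ".join(f"wd:{qid}" for qid in countries)
    return f"""
SELECT ?item ?itemLabel ?badgeLabel WHERE {{
  VALUES ?country {{ {values} }}
  ?item wdt:P31 wd:Q5 ;            # human
        wdt:P21 wd:Q6581072 ;      # female
        wdt:P27 ?country .         # from one of the listed countries
  ?article schema:about ?item ;
           schema:isPartOf <https://{source_lang}.wikipedia.org/> ;
           wikibase:badge ?badge .   # good/featured status is a badge
  FILTER NOT EXISTS {{
    ?other schema:about ?item ;
           schema:isPartOf <https://{target_lang}.wikipedia.org/> .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
"""

# Assumed IDs for three African countries; the real query lists many more.
todo_query = missing_translation_query(["Q1033", "Q114", "Q117"])
print(todo_query)
```

Swapping `source_lang` and `target_lang` would produce the equivalent to-do list for another pair of Wikipedias.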
- 
And, by the way, the slides
 will be uploaded to Commons.
 
- 
And we will-- how shall we make
 it available on the YouTube
 
- 
thing as well? 
- 
No, no. 
- 
But, I mean, for people who
 will later watch this video.
 
- 
Oh yeah, we can add it to
 the YouTube description
 
- 
and the Commons description. 
- 
So in the-- if you're
 watching this video later,
 
- 
in the description, we will
 add a link to this query
 
- 
specifically. 
- 
Because it's not in
 the slides right now.
 
- 
It will be. 
- 
OK. 
- 
So. 
- 
Questions so far? 
- 
We're almost done. 
- 
We have a few minutes left. 
- 
So questions about queries? 
- 
I mean, I'm sure
 there's tons of things
 
- 
you don't know how to do yet. 
- 
And maybe you didn't really
 get the sense for SPARQL.
 
- 
It's something you need
 to really do on your own
 
- 
on your computer. 
- 
See how it works. 
- 
Fiddle with it. 
- 
Change something. 
- 
See that it breaks
 and complains.
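For fiddling outside the browser, a query can also be sent to the query service's SPARQL endpoint as an ordinary HTTP request. The sketch below uses only the Python standard library; the helper names and the User-Agent string are made up for illustration, and the sample query assumes Q146 is the item for house cat. Only `build_url` runs here; `run_query` needs network access.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

def build_url(query):
    """Encode a SPARQL query into a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return ENDPOINT + "?" + params

def run_query(query):
    """Send the query and return the result rows (requires network access)."""
    request = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "wikidata-talk-example/0.1 (learning SPARQL)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]

SAMPLE = "SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 5"
url = build_url(SAMPLE)
print(url)
```

A malformed query makes the service answer with an error instead of results, which is the "breaks and complains" step the speaker recommends experiencing first-hand.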
 
- 
But, very importantly-- oh I
 had this in the other questions
 
- 
slide. 
- 
Remember Wikidata project chat. 
- 
That's kind of the Wikidata
 equivalent of the village pump.
 
- 
It's the page on Wikidata
 where you can just
 
- 
show up and ask a question. 
- 
In my experience, the
 Wikidata community
 
- 
is very nice, very
 welcoming, and very eager
 
- 
to help newer people integrate
 and learn how to do things.
 
- 
There's also an IRC channel. 
- 
If you know what IRC is and
 how to use it, by all means,
 
- 
go to the IRC channel #wikidata. 
- 
There's people
 there all the time,
 
- 
and you can just ask a question. 
- 
If you're trying to do a
 query, and you don't quite
 
- 
understand the syntax, or you're
 not sure how to get the result
 
- 
you want. 
- 
There are people there who
 will gladly help you do that.
 
- 
There is also a
 Wikidata newsletter
 
- 
published by the Wikidata team,
 which is centered in Germany
 
- 
at Wikimedia Germany. 
- 
And they send out a newsletter
 in English with Wikidata news.
 
- 
You know, new
 properties, new items,
 
- 
new things in the project. 
- 
But also sample queries. 
- 
So once a week there is
 kind of an awesome query
 
- 
to learn from, if you want
 to learn that way instead
 
- 
of reading like a
 whole manual on SPARQL.
 
- 
So I'm just encouraging
 you to get help
 
- 
in one of those channels. 
- 
Of course you can write to me. 
- 
Just reach out to me and
 ask me questions as well.
 
- 
I hope by now you agree
 that Wikidata is love,
 
- 
and Wikidata is awesome. 
- 
If there are no questions,
 we do have a tiny bit of time
 
- 
to demonstrate one
 more tool but that's--
 
- 
no? 
- 
No questions. 
- 
OK so let's talk about-- 
- 
well, the Reasonator
 is kind of nice,
 
- 
but it's a little like
 the article placeholder.
 
- 
So this is not Wikidata;
 this is a tool, again,
 
- 
built by Magnus Manske-- 
- 
AUDIENCE: There's also one
 final question to you in case--
 
- 
ASAF BARTOV: Oh,
 there is a question.
 
- 
AUDIENCE: Yeah. 
- 
ASAF BARTOV: What are the
 advantages and disadvantages
 
- 
of creating an item
 before an article is
 
- 
done on English Wikipedia? 
- 
Well, I mean, this example
 that I just made, right?
 
- 
I'm reading this book
 by a notable author.
 
- 
OK. 
- 
I want this to
 exist on Wikidata,
 
- 
and to be mentioned
 on Wikidata, so
 
- 
that when people look up
 that author in Wikidata
 
- 
they will know about one
 of his notable works.
 
- 
But I'm not prepared to
 put in the time investment
 
- 
to build a whole article
 on English Wikipedia.
 
- 
Either because I don't
 have the time, or I
 
- 
don't have good sources. 
- 
Or maybe my English
 is not good enough,
 
- 
but it is good enough to just
 record these very basic facts
 
- 
and point to the Library of
 Congress records et cetera.
 
- 
So that it's better
 than nothing.
 
- 
So that's one reason
 to maybe do it.
 
- 
Another reason is to
 be able to link to it.
 
- 
So remember that
 translator lady already
 
- 
had an item on Wikidata, but if
 she hadn't we could have just
 
- 
created a very, very basic
 rudimentary item about her just
 
- 
saying, you know,
 this name is human.
 
- 
Country, Bulgaria. 
- 
Occupation, translator. 
- 
Even just that
 would have been something,
 
- 
and would have enabled me
 to link to this person.
 
- 
So these are legitimate reasons
 to create Wikidata entities
 
- 
without, or at least before,
 creating a Wikipedia article.
 
- 
If you are going to create-- 
- 
I mean if you're at an
 edit-a-thon or something,
 
- 
and you have come to
 create Wikipedia articles,
 
- 
by all means, first create
 the Wikipedia article,
 
- 
then create the Wikidata
 item and link to it.
 
- 
I hope that answers
 the question.
 
- 
So the Reasonator
 is simply a kind
 
- 
of prettier view of
 items in Wikidata.
 
- 
So you can just type the name
 of an item or the number.
 
- 
Let's pick just a
 random number, 42.
 
- 
Say 42. 
- 
Which happens to
 be, maybe you've
 
- 
heard of this guy,
 Douglas Adams.
 
- 
He happened to have received
 the Q number 42.
 
- 
I'm sure it's a
 cosmic coincidence
 
- 
of infinite improbability. 
- 
And this is a view-- 
- 
this is a tool that
 is not Wikidata.
 
- 
It's a tool built on top of
 Wikidata called Reasonator.
 
- 
And it gives us the information
 from Q42, that is from the--
 
- 
this item in Wikidata, which
 looks like an item in Wikidata.
 
- 
But it gives it to us in a
 slightly more rational kind
 
- 
of layout. 
- 
It even kind of
 generates a little bit
 
- 
of pseudo article text for us. 
- 
You know, Douglas Adams was
 a British writer, playwright,
 
- 
screenwriter,
 bla-bla-bla, an author.
 
- 
He was born on this date, in
 this place, to these people.
 
- 
He studied at this place
 between these years.
 
- 
That's all machine generated. 
- 
Nobody wrote this text. 
- 
That's all taken from those
 statements in Wikidata,
 
- 
and it generates this reasonably
 readable summary paragraph.
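The idea of turning statements into a sentence can be illustrated with a toy example. This is not how the tool itself is implemented; the function names and the dictionary layout are invented for the sketch, and the sample values are the ones read out in the talk.

```python
def join_list(values):
    """Join words as 'a, b and c'."""
    if len(values) <= 1:
        return "".join(values)
    return ", ".join(values[:-1]) + " and " + values[-1]

def summarize(label, statements):
    """Generate one summary sentence from a few statements."""
    parts = [label, "was a"]
    if "nationality" in statements:
        parts.append(statements["nationality"])
    parts.append(join_list(statements["occupation"]))
    return " ".join(parts) + "."

# Hypothetical, simplified view of a few statements on item Q42.
adams = {
    "nationality": "British",
    "occupation": ["writer", "playwright", "screenwriter"],
}
sentence = summarize("Douglas Adams", adams)
print(sentence)
```

Because the text is derived from statements, adding or correcting a statement on the item changes the generated sentence without anyone rewriting it.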
 
- 
And then it gives us this
 little table of relatives.
 
- 
It's all taken from Wikidata. 
- 
But as you can see,
 this is already
 
- 
a little more accessible than
 the essentially arbitrary
 
- 
ordering of statements
 on Wikidata.
 
- 
And that's OK. 
- 
I mean, that's
 kind of by design.
 
- 
Wikidata is the platform. 
- 
There is going to
 be-- there are going
 
- 
to be many new applications,
 and platforms, and tools,
 
- 
and visual interfaces
 on top of Wikidata
 
- 
to browse Wikidata in more
 friendly or more customized
 
- 
ways. 
- 
For example, one of the
 things that Reasonator
 
- 
does for us is give us pictures
 and maps and a timeline.
 
- 
Check this out. 
- 
Timeline, machine generated,
 just from dates and points
 
- 
in time, mentioned in the
 relatively rich Wikidata
 
- 
item about Douglas Adams. 
- 
Right? 
- 
So this timeline, for example
 again, completely machine
 
- 
generated. 
- 
But he was educated
 between these years,
 
- 
so I can put it on the timeline. 
- 
And this is the year he was
 nominated for a Hugo Award,
 
- 
so I can put that in a timeline. 
- 
Et cetera. 
- 
So that's just a super
 quick demonstration
 
- 
of that tool, the Reasonator. 
- 
Links are all here
 in the slides.
 
- 
And the final tool I wanted
 to mention very quickly
 
- 
is the Mix'n'match tool. 
- 
You remember my explanation
 about Wikidata as a nexus,
 
- 
as connection point between many
 databases, many data sources.
 
- 
Those depend on
 these equivalencies.
 
- 
On Wikidata being taught
 that this item is like that
 
- 
ID in this other database. 
- 
And Mix'n'match is a tool,
 again, by Magnus Manske.
 
- 
Maybe you're detecting
 a pattern here.
 
- 
It's a tool by Magnus
 that is designed
 
- 
to enable us to kind
 of take a foreign,
 
- 
an external data set, put
 it alongside Wikidata,
 
- 
and kind of try and align them. 
- 
So this item in this
 external dataset,
 
- 
is that already
 covered in Wikidata?
 
- 
If so, by what Q number? 
- 
By what item? 
- 
If not, maybe we need
 to create a Wikidata
 
- 
item to represent it. 
- 
Or maybe it's a
 duplicate, or something.
 
- 
So the Mix'n'match tool has
 a list of external data sets,
 
- 
as you can see. 
- 
The Art and Architecture
 Thesaurus by the Getty Research
 
- 
Institute. 
- 
Or the Australian
 Dictionary of Biography.
 
- 
All kinds of external
 data sets here.
 
- 
Somewhere here I had a specific
 link to the Royal Society.
 
- 
It can also give
 me some statistics.
 
- 
So there is an external data set
 of all the Fellows of the Royal
 
- 
Society. 
- 
Right? 
- 
The oldest academic
 learned society in England.
 
- 
And the internet is tired. 
- 
Here we go. 
- 
Nope. 
- 
Did that work? 
- 
Fellows of the Royal
 Society, here we go.
 
- 
So this one is complete. 
- 
I mean, people have manually
 gone over every single item
 
- 
there and either
 matched it to Wikidata
 
- 
or declared that it was not
 in scope, or a duplicate
 
- 
or whatever. 
- 
But let's look at site stats. 
- 
This is a fun kind of
 aspect of this tool.
 
- 
But that is not working. 
- 
Or it's taking too long. 
- 
So let's just demonstrate
 how this works.
 
- 
Maybe Britannica? 
- 
Is that done already? 
- 
Here we go. 
- 
Encyclopedia Britannica. 
- 
Yeah. 
- 
So in the Encyclopedia
 Britannica,
 
- 
40% of the items there
 are not yet processed.
 
- 
So let's process one of them. 
- 
For example there is an item
 in the Encyclopedia Britannica
 
- 
called Boston, England. 
- 
As you know,
 all American place names
 
- 
are totally stolen
 from elsewhere.
 
- 
So there is a Boston
 in England, though it's
 
- 
no longer the famous one. 
- 
And the Mix'n'match
 tool has automatically
 
- 
matched it based on
 the label to
 
- 
Q100, which is Boston, the big
 city in the United States.
 
- 
And that is incorrect, right? 
- 
That's kind of a naive computer
 going, well this is Boston,
 
- 
and this other thing
 is also Boston.
 
- 
And it is asking me to
 confirm this match or not.
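The naive label match and the human correction can be illustrated with a toy sketch. It is not the tool's actual matching logic; the function name and the data layout are invented, and the two items are reduced to the label and description mentioned in the talk.

```python
# Toy stand-in for Wikidata: two items that share the label "Boston".
WIKIDATA_ITEMS = {
    "Q100": {"label": "Boston", "description": "big city in the United States"},
    "Q311975": {"label": "Boston", "description": "town in Lincolnshire"},
}

def propose_match(external_name, items):
    """Naively propose the first item whose label equals the part of the
    external name before any comma (so 'Boston, England' -> 'Boston')."""
    wanted = external_name.split(",")[0].strip().lower()
    for qid, item in items.items():
        if item["label"].lower() == wanted:
            return qid
    return None

# Step 1: the automatic proposal, which still needs human review.
proposal = propose_match("Boston, England", WIKIDATA_ITEMS)
matches = {"Boston, England": {"qid": proposal, "confirmed": False}}

# Step 2: a person rejects the proposal and sets the correct item.
matches["Boston, England"] = {"qid": "Q311975", "confirmed": True}
print(proposal, matches)
```

Keeping proposals unconfirmed until a person has looked at them is the point of the workflow: label equality alone cannot tell the two Bostons apart.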
 
- 
You see? 
- 
So this is the Boston,
 England from Britannica.
 
- 
And the tool is asking
 me, is this the same as
 
- 
Boston, Q100, in America? 
- 
The answer is no. 
- 
I remove this. 
- 
I remove this match. 
- 
And now this Boston,
 England is unmatched.
 
- 
And I can match it to the
 correct one in England.
 
- 
I can do this by searching
 English Wikipedia,
 
- 
or searching Wikidata. 
- 
I mean, it has
 these handy links.
 
- 
So the English town
 is in Lincolnshire.
 
- 
Boston, Lincolnshire. 
- 
So I can go there and then
 get the Wikidata item number.
 
- 
See, this is not Q100,
 Boston in the States,
 
- 
this is Q311975,
 a town in Lincolnshire.
 
- 
I can get this Q
 number, go back to the
 
- 
Mix'n'match tool-- 
- 
Where was that? 
- 
Here we are. 
- 
And set Q. 
- 
I can tell the tool that this is
 the right Boston, and click OK.
 
- 
And now this town
 in Lincolnshire,
 
- 
you can see this here,
 this item, Q311975,
 
- 
is linked to Britannica. 
- 
What does this mean? 
- 
Well, if we go there. 
- 
If we actually go
 to the Wikidata
 
- 
entity you will see
 that in addition
 
- 
to the few statements that
 it already had, it now has,
 
- 
thanks to my clicking, it now
 has another identifier here.
 
- 
See? 
- 
Encyclopedia Britannica
 Online ID, with this link.
 
- 
And if we click it, we
 will indeed reach this page
 
- 
in the Britannica
 online, which is indeed
 
- 
about this town in Lincolnshire. 
- 
You see? 
- 
So I've contributed one
 of those mappings, one
 
- 
of those identifiers,
 into Wikidata.
 
- 
And I didn't have
 to do it manually.
 
- 
This tool kind of prompted
 me to confirm.
 
- 
If it was correct,
 I could have just
 
- 
clicked confirm. Since
 it wasn't correct,
 
- 
I corrected it manually, but
 it made this edit on my behalf.
 
- 
So that's another tool that
 encourages us to systematically
 
- 
teach Wikidata more things. 
- 
And we're out of time. 
- 
Go edit Wikidata. Now
 that you have the power,
 
- 
you know the deal. 
- 
Use it for good,
 and not for evil.
 
- 
If you have questions,
 this is my email address.
 
- 
If you're watching this video
 not live, the description
 
- 
will have links to the
 slides, and to a bunch
 
- 
of other useful
 pieces of information.
 
- 
Any last questions on IRC? 
- 
If not, thank you
 for your attention.
 
- 
And if you like this, and if you
 feel that you now get Wikidata,
 
- 
and you get what it's
 good for, and you're
 
- 
inspired to contribute, I have
 only one request from you.
 
- 
I mean, in addition to using
 it for good not for evil,
 
- 
I ask that you spread the word. 
- 
Show this video--
 share this video
 
- 
with other people in your
 community, or around you.
 
- 
Teach this yourself
 once you're comfortable
 
- 
with these concepts. 
- 
Feel free to use my slides. 
- 
Yeah, and edit Wikidata. 
- 
Thank you very
 much, and goodbye.