
https:/.../A_Gentle_Introduction_to_Wikidata_for_Absolute_Beginners_%28including_non-techies%21%29.webm.360p.webm

  • 0:03 - 0:04
    Asaf Bartov: Testing, testing.
  • 0:10 - 0:13
    Is this heard in the room?
  • 0:15 - 0:16
    Testing.
  • 0:23 - 0:25
    Hello, everyone.
  • 0:25 - 0:29
    This is a gentle
    introduction to Wikidata
  • 0:29 - 0:32
    for absolute beginners.
  • 0:32 - 0:34
    If you're an absolute
    beginner, if you've never heard
  • 0:34 - 0:38
    of Wikidata, or if you've heard
    of Wikidata but don't quite get
  • 0:38 - 0:41
    it, don't know what it's
    good for, have only used it
  • 0:41 - 0:44
    for inter-wiki links--
  • 0:44 - 0:46
    if you're anywhere
    on this range,
  • 0:46 - 0:47
    you're in the right place.
  • 0:51 - 0:52
    My name is Asaf Bartov.
  • 0:52 - 0:55
    I work for the
    Wikimedia Foundation,
  • 0:55 - 1:00
    and I am a Wikidata enthusiast.
  • 1:00 - 1:06
    So the first thing I want to
    say is that you are lucky.
  • 1:06 - 1:11
    You are lucky because
    Wikidata is already
  • 1:11 - 1:15
    and is quickly becoming even
    more of an important research
  • 1:15 - 1:22
    tool for anyone who's
    trying to ask questions
  • 1:22 - 1:25
    about large amounts
    of information.
  • 1:25 - 1:30
    It will become more and more
    used across the humanities,
  • 1:30 - 1:33
    in particular, because of the
    things that it's able to do,
  • 1:33 - 1:37
    some of which we will
    demonstrate shortly.
  • 1:37 - 1:41
    And you are lucky because you
    get to find out about it now
  • 1:41 - 1:43
    before most of the world.
  • 1:43 - 1:49
    So by the end of this talk,
    you will be a Wikidata hipster
  • 1:49 - 1:51
    because you'll be
    able to say, oh yeah.
  • 1:51 - 1:53
    I knew about Wikidata
    before it was cool.
  • 1:56 - 2:00
    So before we actually
    visit Wikidata,
  • 2:00 - 2:09
    I want to share two key problems
    that Wikidata seeks to solve
  • 2:09 - 2:13
    and which would help us
    understand why it exists.
  • 2:13 - 2:18
    The first problem is that
    of dated data, that
  • 2:18 - 2:21
    is data that is out of date.
  • 2:21 - 2:24
    And this is apparent
    on Wikipedia
  • 2:24 - 2:28
    across our free
    knowledge encyclopedias.
  • 2:28 - 2:32
    Data on Wikipedia is
    not always up to date.
  • 2:32 - 2:37
    And the more obscure
    it is, the more likely
  • 2:37 - 2:40
    it is not to be up to date.
  • 2:40 - 2:49
    So the Polish Wikipedia may have
    an article about a small town
  • 2:49 - 2:55
    in Argentina, and that article
    will include information
  • 2:55 - 3:01
    about that town like population
    size, name of the mayor.
  • 3:01 - 3:05
    And that information,
    ideally, was
  • 3:05 - 3:09
    correct at the time the article
    was created on the Polish
  • 3:09 - 3:10
    Wikipedia--
  • 3:10 - 3:14
    maybe translated
    from another wiki.
  • 3:14 - 3:18
    But then how likely is
    it to be kept up to date?
  • 3:18 - 3:21
    How likely is it that the
    Polish Wikipedia would give us
  • 3:21 - 3:26
    the correct and latest numbers
    or data about the population
  • 3:26 - 3:28
    size of that town
    or the mayor, right?
  • 3:28 - 3:32
    So this is the kind of data
    that does go out of date, right?
  • 3:32 - 3:34
    Every few years--
    five, 10 years--
  • 3:34 - 3:38
    there is a census, and now there
    are new population figures.
  • 3:38 - 3:42
    Now the census in Argentina will
    be made available in Argentina
  • 3:42 - 3:46
    in Spanish, probably,
    which brings us
  • 3:46 - 3:49
    to another component of the
    problem of dated data, which
  • 3:49 - 3:54
    is there are no obvious
    triggers for updating the data.
  • 3:54 - 3:59
    So the Polish Wikipedian
    is not sent an email
  • 3:59 - 4:01
    by the Argentinean
    government saying, hey,
  • 4:01 - 4:02
    we have a new census.
  • 4:02 - 4:05
    There are new population numbers
    for you to update on Wikipedia.
  • 4:05 - 4:08
    No such email is sent.
  • 4:08 - 4:10
    So it's kind of
    hard to notice when that happens.
  • 4:10 - 4:13
    And of course, multiply that by
    all the different jurisdictions
  • 4:13 - 4:15
    around the world.
  • 4:15 - 4:17
    There's no easy
    way to notice when
  • 4:17 - 4:18
    your data goes out of date.
  • 4:21 - 4:24
    So that's difficult
    to keep up to date.
  • 4:24 - 4:28
    And even if we were to receive
    some kind of indication--
  • 4:28 - 4:31
    oh, there's a new
    census in Argentina,
  • 4:31 - 4:33
    so a whole bunch of
    population figures
  • 4:33 - 4:35
    have now gone out of date.
  • 4:35 - 4:37
    Updating it on the
    Polish Wikipedia
  • 4:37 - 4:40
    and the French Wikipedia
    and the Indonesian Wikipedia
  • 4:40 - 4:45
    and the Arabic Wikipedia is a
    whole bunch of repetitive work
  • 4:45 - 4:47
    that a lot of
    different volunteers
  • 4:47 - 4:50
    will need to do just for
    that one updated piece
  • 4:50 - 4:55
    of information about Argentina.
  • 4:55 - 4:58
    So I hope this is
    clear and resonates
  • 4:58 - 5:02
    with some of your experience
    editing Wikipedia--
  • 5:02 - 5:04
    data that is out of
    date or that needs
  • 5:04 - 5:09
    to be updated
    manually, menially,
  • 5:09 - 5:16
    on a fairly frequent schedule
    across the different countries
  • 5:16 - 5:18
    and data sources.
  • 5:18 - 5:22
    The other-- and I think
    maybe more interesting--
  • 5:22 - 5:26
    shortcoming or problem
    that I want to discuss
  • 5:26 - 5:30
    is what I call the
    inflexible ways
  • 5:30 - 5:36
    of lateral queries, crosscutting
    queries of knowledge.
  • 5:36 - 5:44
    So if I want an answer to
    the question, what countries
  • 5:44 - 5:49
    in the world export rubber--
  • 5:52 - 5:55
    that's a reasonable
    question, right?
  • 5:55 - 5:57
    That information
    is on Wikipedia.
  • 5:57 - 5:59
    Do you agree?
  • 5:59 - 6:01
    If you go to
    Wikipedia and read up
  • 6:01 - 6:06
    about Brazil, about Peru, about
    Germany, somewhere in there--
  • 6:06 - 6:09
    maybe a sub-article called
    Economics of Brazil--
  • 6:09 - 6:14
    you will find the main
    exports of that country.
  • 6:14 - 6:15
    And you can find
    out whether or not
  • 6:15 - 6:17
    that country exports rubber.
  • 6:17 - 6:20
    But what if I don't want
    to go country by country
  • 6:20 - 6:21
    looking for the word rubber?
  • 6:21 - 6:22
    I just want an answer.
  • 6:22 - 6:26
    What are the countries
    that export rubber?
  • 6:26 - 6:28
    Even though that
    information is in Wikipedia,
  • 6:28 - 6:30
    it's hard to get at.
  • 6:30 - 6:32
    It's hard to query.
  • 6:32 - 6:36
    Now, you may say, well, that's
    what we have categories for,
  • 6:36 - 6:36
    right?
  • 6:36 - 6:40
    Categories are a way to
    cut across Wikipedia.
  • 6:40 - 6:45
    So if someone made a
    category called rubber
  • 6:45 - 6:48
    exporting countries, then
    you can go to that category
  • 6:48 - 6:52
    and see a list of countries
    that export rubber.
  • 6:52 - 6:53
    And if nobody has
    made it yet, well, you
  • 6:53 - 6:57
    can create that category and,
    with a kind of one-time effort,
  • 6:57 - 7:00
    populate that category,
    and you're done.
  • 7:00 - 7:02
    Well, yes.
  • 7:02 - 7:04
    That's still not
    very convenient.
  • 7:04 - 7:07
    But also, it's still
    very, very limited,
  • 7:07 - 7:12
    because what if I only want
    countries that export rubber
  • 7:12 - 7:16
    and have a democratic
    system of government,
  • 7:16 - 7:19
    or any other kind of
    additional condition
  • 7:19 - 7:21
    that I would like
    to add to this?
  • 7:21 - 7:22
    Or take a completely
    different example.
  • 7:22 - 7:27
    What if I want to know
    which Flemish town had
  • 7:27 - 7:32
    the most painters born in it?
  • 7:32 - 7:34
    There's a ton of
    Flemish painters.
  • 7:34 - 7:38
    Most of them were
    born somewhere.
  • 7:38 - 7:40
    We could theoretically,
    just you know,
  • 7:40 - 7:44
    look up all the birthplaces
    of all the Flemish painters
  • 7:44 - 7:47
    and tally up the
    numbers and figure out
  • 7:47 - 7:52
    what is the place where the
    most Flemish painters come from?
  • 7:52 - 7:53
    I don't know the answer to that.
  • 7:53 - 7:55
    It would be nice to be
    able to get that answer.
  • 7:55 - 7:58
    Again, the data is in Wikipedia.
  • 7:58 - 8:00
    Those birthplaces are
    listed in the articles
  • 8:00 - 8:02
    about those painters.
  • 8:02 - 8:06
    But there's no easy way
    to get that information.
  • 8:06 - 8:13
    What if I want to ask, who are
    some painters whose father was
  • 8:13 - 8:14
    also a painter?
  • 8:17 - 8:18
    That's a thing
    that exists, right?
  • 8:18 - 8:23
    Some painters are
    sons of painters.
  • 8:23 - 8:27
    You know, Bruegel comes to
    mind as an obvious example.
  • 8:27 - 8:28
    But there's a bunch
    of others, right?
  • 8:28 - 8:29
    So who are those people?
  • 8:29 - 8:31
    What if I want to
    ask that question?
  • 8:31 - 8:33
    That's the kind of question
    that not only Wikipedia
  • 8:33 - 8:35
    doesn't answer today.
  • 8:35 - 8:42
    If you walk to your friendly
    university library reference
  • 8:42 - 8:45
    desk and say,
    hello, I would like
  • 8:45 - 8:49
    a list of painters whose
    father was also a painter,
  • 8:49 - 8:53
    how would that
    librarian help you?
  • 8:53 - 8:58
    There's no easy way to get an
    answer to a question like that.
  • 8:58 - 9:01
    What if you only want
    a list of painters
  • 9:01 - 9:06
    who were immigrants, painters
    who lived somewhere else
  • 9:06 - 9:08
    than where they were born?
  • 9:08 - 9:10
    There's no book.
  • 9:10 - 9:12
    I guess maybe there
    is, but you know,
  • 9:12 - 9:16
    it's not obvious that there's a
    ready resource that says, list
  • 9:16 - 9:18
    of painters who are immigrants.
  • 9:18 - 9:20
    And the librarian would
    probably refer you
  • 9:20 - 9:23
    to a book on the shelf
    called, I don't know,
  • 9:23 - 9:24
    The Complete
    Dictionary of Flemish
  • 9:24 - 9:26
    Painters and go,
    look up the index,
  • 9:26 - 9:29
    you know, and if you
    see a similar surname,
  • 9:29 - 9:30
    maybe they're father and son.
  • 9:30 - 9:35
    And kind of cobble together
    the answer on your own.
  • 9:35 - 9:37
    The reason I'm comparing
    this to a library
  • 9:37 - 9:42
    is to show you that this is a
    kind of question that is not
  • 9:42 - 9:47
    readily satisfiable today.
  • 9:47 - 9:50
    Now, these questions may
    sound contrived to you.
  • 9:50 - 9:52
    You may say to
    yourself, well, you
  • 9:52 - 9:55
    know, painters who are also
    sons of painters, yeah.
  • 9:55 - 9:58
    You know, that
    never occurred to me
  • 9:58 - 10:00
    as a question I
    might care about.
  • 10:00 - 10:02
    But I want to invite
    you to consider
  • 10:02 - 10:06
    that this kind of question,
    questions like that question,
  • 10:06 - 10:09
    may well be questions
    you do care about.
  • 10:09 - 10:13
    And I also want to suggest
    that the fact it is so nearly
  • 10:13 - 10:16
    impossible, the fact that
    there's no obvious way
  • 10:16 - 10:19
    to ask that kind
    of question today,
  • 10:19 - 10:21
    is partly responsible
    for your not
  • 10:21 - 10:23
    coming up with those
    questions, right?
  • 10:23 - 10:26
    We tend to be limited
    by the possible.
  • 10:26 - 10:30
    You know, until human
    flight was made possible,
  • 10:30 - 10:33
    it did not occur to anyone
    to say, oh yeah, by this time
  • 10:33 - 10:34
    next week I will
    be in Australia,
  • 10:34 - 10:37
    because that was
    just impossible.
  • 10:37 - 10:39
    But when flight is
    possible, there's
  • 10:39 - 10:41
    all kinds of things that
    suddenly become possible,
  • 10:41 - 10:43
    and there's all
    kinds of needs that
  • 10:43 - 10:46
    arise based on the
    availability of resources
  • 10:46 - 10:49
    to fulfill those needs.
  • 10:49 - 10:54
    So many of these research
    questions, compound lateral
  • 10:54 - 10:59
    cross-cutting queries, are not
    being asked because people have
  • 10:59 - 11:00
    internalized the fact
    that there is no way
  • 11:00 - 11:06
    to get an answer
    to questions like,
  • 11:06 - 11:13
    what is the most popular first
    name among British politicians?
  • 11:13 - 11:15
    I just made that up, you know?
  • 11:15 - 11:15
    Is it John?
  • 11:15 - 11:17
    Maybe.
  • 11:17 - 11:19
    Maybe it's William,
    for whatever reason.
  • 11:19 - 11:22
    You know, these are the kinds
    of questions we don't routinely
  • 11:22 - 11:26
    ask because we know that it's
    like, who are you going to ask?
  • 11:26 - 11:28
    How are you going to
    get an answer to that?
  • 11:28 - 11:36
    So this problem of not having
    very flexible ways of querying
  • 11:36 - 11:38
    the data that we already have--
  • 11:38 - 11:41
    in Wikipedia, in
    Wikisource, elsewhere--
  • 11:41 - 11:45
    is a significant limitation.
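As a preview of what the rest of the talk builds toward: once the data is structured, a "lateral" question like "painters whose father was also a painter" becomes a short query. A minimal sketch in Python that only assembles the SPARQL text; the identifiers (P106 "occupation", Q1028181 "painter", P22 "father") are Wikidata's real IDs, but treat the exact query shape as illustrative:

```python
# Sketch (not the talk's own code): building the SPARQL text for
# "painters whose father was also a painter" against Wikidata.
# Real identifiers: P106 = occupation, Q1028181 = painter, P22 = father.
def painters_with_painter_fathers(limit=10):
    return f"""
SELECT ?painter ?father WHERE {{
  ?painter wdt:P106 wd:Q1028181 .   # occupation: painter
  ?painter wdt:P22  ?father .       # has a father
  ?father  wdt:P106 wd:Q1028181 .   # father's occupation: also painter
}}
LIMIT {limit}
""".strip()

query = painters_with_painter_fathers()
print(query)
# The text could then be sent to Wikidata's public SPARQL endpoint
# (https://query.wikidata.org/sparql) with an ordinary HTTP GET.
```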
  • 11:45 - 11:51
    So these two key problems
    have one solution.
  • 11:51 - 11:56
    And that is an editable,
    central storage
  • 11:56 - 12:01
    for structured and
    linked data on a wiki,
  • 12:01 - 12:05
    under a free license, which
    is a very long way of saying
  • 12:05 - 12:07
    Wikidata.
  • 12:07 - 12:08
    That is Wikidata.
  • 12:08 - 12:11
    Wikidata is an editable,
    central storage
  • 12:11 - 12:16
    for structured and
    linked data on a wiki,
  • 12:16 - 12:18
    under a free license.
  • 12:18 - 12:23
    So let's take this
    apart and unpack it.
  • 12:23 - 12:25
    First of all, it's
    a central storage.
  • 12:25 - 12:28
    This relates to the
    first problem, right?
  • 12:28 - 12:34
    If we had one place containing
    data like population size,
  • 12:34 - 12:38
    we would be able to update
    that one place and then have
  • 12:38 - 12:42
    all of the different Wikipedias
    draw the data from that one
  • 12:42 - 12:45
    place so that we wouldn't
    have to manually,
  • 12:45 - 12:50
    repetitively update it across
    our hundreds of projects.
  • 12:50 - 12:54
    So having central storage
    makes, I hope, kind
  • 12:54 - 12:57
    of immediate, intuitive sense.
  • 12:57 - 13:03
    But what do I mean by
    structured and linked data?
  • 13:03 - 13:10
    So structured data means
    that each datum, each piece--
  • 13:10 - 13:16
    individual piece-- of data
    is managed on its own,
  • 13:16 - 13:20
    is identified and
    defined on its own,
  • 13:20 - 13:21
    as distinct from Wikipedia.
  • 13:21 - 13:23
    Wikipedia has articles.
  • 13:23 - 13:27
    The article about Brazil
    includes a ton of data,
  • 13:27 - 13:32
    all kinds of information,
    and it's presented as text,
  • 13:32 - 13:34
    as several paragraphs--
    several pages--
  • 13:34 - 13:37
    of text, right?
  • 13:37 - 13:41
    Now, we do have an
    approximation of structured data
  • 13:41 - 13:44
    on Wikipedia.
  • 13:44 - 13:45
    If you've browsed
    Wikipedia a little,
  • 13:45 - 13:49
    you've noticed that we often
    have an info box, what we
  • 13:49 - 13:51
    call an info box on Wikipedia.
  • 13:51 - 13:55
    That's the table on the right
    side if it's a left to right
  • 13:55 - 13:57
    language, the table
    on the right side
  • 13:57 - 14:02
    that has information that
    is easy to tabulate, right?
  • 14:02 - 14:08
    So you know, birth date, birth
    place, death date, death place,
  • 14:08 - 14:10
    nationality--
  • 14:10 - 14:17
    or if it's about a country,
    area, population, anthem,
  • 14:17 - 14:20
    type of government, whatever
    you are likely to find.
  • 14:20 - 14:23
    If it's a movie, then
    you know, starring,
  • 14:23 - 14:27
    genre, box office receipts,
    whatever pieces of data
  • 14:27 - 14:30
    are relevant to an
    article about a movie.
  • 14:30 - 14:35
    So we do already kind of
    group pieces of information
  • 14:35 - 14:40
    on Wikipedia into this
    kind of structured format.
  • 14:40 - 14:44
    Those of you who have
    ever looked at the source,
  • 14:44 - 14:46
    at what the wiki code
    under that looks like,
  • 14:46 - 14:50
    know that it's only
    semi-structured.
  • 14:50 - 14:52
    It looks neat and
    organized in a table,
  • 14:52 - 14:56
    but really, it's just a bunch
    of text that is put there.
  • 14:56 - 14:57
    It is not centralized.
  • 14:57 - 15:00
    Every Wikipedia has its
    own copy of that data.
  • 15:00 - 15:03
    And if I go and update
    the population size
  • 15:03 - 15:07
    on Spanish Wikipedia of
    that Argentinean town,
  • 15:07 - 15:10
    it does not get
    updated automagically
  • 15:10 - 15:14
    on the English Wikipedia or
    the Arabic Wikipedia, right?
  • 15:14 - 15:17
    So the structured data that
    we already have on Wikipedia
  • 15:17 - 15:21
    is not managed centrally.
  • 15:21 - 15:22
    The other thing
    about structured data
  • 15:22 - 15:29
    is, when you have a notion of an
    individual piece of data, that
  • 15:29 - 15:33
    is the cornerstone of
    allowing the kinds of queries
  • 15:33 - 15:35
    that I was talking about.
  • 15:35 - 15:40
    That is what will allow
    me to ask questions like,
  • 15:40 - 15:43
    what is the Flemish town where
    the most painters were born,
  • 15:43 - 15:47
    or what are the world's
    largest cities that
  • 15:47 - 15:50
    have a female mayor?
  • 15:50 - 15:52
    I could come up with other
    examples all day long, right?
  • 15:52 - 15:55
    These are all questions
    that you can ask,
  • 15:55 - 15:59
    once you break down your data
    into individual pieces, each
  • 15:59 - 16:02
    of which is--
  • 16:02 - 16:07
    you're able to refer to each
    of those programmatically.
  • 16:07 - 16:10
    The computer can
    identify, isolate,
  • 16:10 - 16:15
    and calculate based on each
    of those pieces of data.
  • 16:15 - 16:17
    So that's why the
    structure is important.
  • 16:17 - 16:23
    Now, Wikidata is also a
    linked data repository.
  • 16:23 - 16:25
    What does it mean that
    the data is linked?
  • 16:25 - 16:30
    Well, it means that a single
    piece of data can point at,
  • 16:30 - 16:35
    can link to another
    whole bag of data.
  • 16:35 - 16:43
    So if we are describing,
    for example, a person,
  • 16:43 - 16:47
    and we record the
    single piece of data
  • 16:47 - 16:55
    that this person was born
    in Salem, Massachusetts,
  • 16:55 - 17:02
    that single piece of data
    links to the item about Salem,
  • 17:02 - 17:04
    Massachusetts
    because, of course,
  • 17:04 - 17:07
    we know a lot of things
    about that place, Salem,
  • 17:07 - 17:08
    Massachusetts.
  • 17:08 - 17:09
    So it's not just the text--
  • 17:09 - 17:13
    S-A-L-E-M. It's not just,
    that's where they were born.
  • 17:13 - 17:17
    But it's a link to all
    the data that we have
  • 17:17 - 17:19
    about Salem, Massachusetts.
  • 17:19 - 17:25
    If we say someone's
    nationality is French,
  • 17:25 - 17:27
    that is a link to France.
  • 17:27 - 17:31
    That is a link to everything we
    know about the country France.
  • 17:31 - 17:34
    The fact that the data
    is linked and structured
  • 17:34 - 17:38
    allows not only humans,
    but also computers
  • 17:38 - 17:42
    to traverse information
    and to bring
  • 17:42 - 17:45
    us different pieces of
    relevant information
  • 17:45 - 17:49
    programmatically, automatically,
    based on those links.
  • 17:49 - 17:52
    Because it's not just
    text, it's an actual link
  • 17:52 - 17:57
    to another chunk of data.
  • 17:57 - 17:59
    If this sounds a
    little abstract,
  • 17:59 - 18:01
    it will become much
    clearer in just a second
  • 18:01 - 18:03
    when we see it in action.
  • 18:03 - 18:06
    But the other components of
    this little definition are,
  • 18:06 - 18:10
    of course, this central storage
    of structured and linked data
  • 18:10 - 18:13
    needs to be editable,
    of course, because we
  • 18:13 - 18:14
    need to keep it up to date.
  • 18:14 - 18:16
    We need to correct mistakes.
  • 18:16 - 18:21
    And we want it on a wiki
    under a free license.
  • 18:21 - 18:24
    The free license is, of
    course, essential to enable
  • 18:24 - 18:31
    reuse of that data, to enable
    all kinds of reuse of the data.
  • 18:31 - 18:34
    And Wikidata, unlike
    Wikipedia, is released
  • 18:34 - 18:36
    under a different free license.
  • 18:36 - 18:42
    Wikidata is released
    under the CC0 waiver.
  • 18:42 - 18:45
    That means unlike
    Wikipedia, where
  • 18:45 - 18:51
    you have to attribute Wikipedia
    when you reuse information
  • 18:51 - 18:55
    from Wikipedia, you do not
    need to attribute Wikidata,
  • 18:55 - 18:57
    and you do not need to
    share alike your work.
  • 18:57 - 19:02
    It's an unencumbered license to
    reuse the data in any way you
  • 19:02 - 19:03
    want, including commercially.
  • 19:03 - 19:05
    You don't have to say that
    it comes from Wikidata.
  • 19:05 - 19:07
    I mean, it could be nice,
    but you don't have to.
  • 19:07 - 19:09
    You're under no
    obligation to do it.
  • 19:09 - 19:14
    And that is important to
    allow certain kinds of reuse
  • 19:14 - 19:17
    where, for example, if you're
    building some kind of device,
  • 19:17 - 19:21
    you may not have a practical
    way to give attribution.
  • 19:21 - 19:24
    And had we required
    that to use Wikidata,
  • 19:24 - 19:27
    we would have made
    Wikidata less reusable.
  • 19:27 - 19:33
    So Wikidata is unencumbered by
    the requirement of attribution.
  • 19:33 - 19:36
    And of course, because
    it's on a wiki,
  • 19:36 - 19:40
    we get all the benefits that we
    are used to expecting from a wiki,
  • 19:40 - 19:41
    right?
  • 19:41 - 19:43
    So it's a wiki,
    which means, yes.
  • 19:43 - 19:45
    It has discussion pages.
  • 19:45 - 19:46
    It has revision histories.
  • 19:46 - 19:48
    It remembers everything.
  • 19:48 - 19:51
    So if you screw it up, you
    can always go a version back.
  • 19:51 - 19:52
    Or if someone else
    vandalized the content,
  • 19:52 - 19:55
    we can always go back,
    just like Wikipedia.
  • 19:55 - 19:57
    So we get all the
    benefits we're used to--
  • 19:57 - 20:01
    user talk pages, group
    discussion pages, watch lists,
  • 20:01 - 20:04
    all the features that
    we expect in a wiki.
  • 20:07 - 20:11
    In short, Wikidata is love.
  • 20:11 - 20:14
    I hope you agree with me
    by the end of this talk.
  • 20:14 - 20:19
    So let's zoom in and see
    what this structured data
  • 20:19 - 20:21
    looks like.
  • 20:21 - 20:29
    So structured data on Wikidata
    is collected in statements.
  • 20:29 - 20:32
    And statements have
    the general form
  • 20:32 - 20:39
    of this triple, this
    tripartite ascription--
  • 20:39 - 20:44
    items, properties, and values.
  • 20:44 - 20:47
    Now an item is the
    subject, is the topic
  • 20:47 - 20:49
    that we are trying to describe.
  • 20:49 - 20:52
    It can be any topic that
    Wikipedia can cover,
  • 20:52 - 20:54
    and many others that
    Wikipedia wouldn't.
  • 20:54 - 20:57
    So the topic, the
    item can be Germany,
  • 20:57 - 21:01
    or it can be Salem,
    Massachusetts,
  • 21:01 - 21:03
    or it can be the
    concept of redemption.
  • 21:03 - 21:05
    It can be anything at all.
  • 21:05 - 21:10
    Anything you can imagine
    describing in any way with data
  • 21:10 - 21:12
    can be the item.
  • 21:12 - 21:15
    So the item, consider
    it like the title
  • 21:15 - 21:17
    of the rest of the data.
  • 21:17 - 21:21
    And then what do we say
    about Salem, Massachusetts
  • 21:21 - 21:22
    or about Germany?
  • 21:22 - 21:27
    Well, that's a series of
    properties and values,
  • 21:27 - 21:28
    properties and values.
  • 21:28 - 21:33
    The property is
    the kind of datum,
  • 21:33 - 21:40
    like birth date or language
    spoken or manner of death.
  • 21:40 - 21:43
    These are all real properties.
  • 21:43 - 21:46
    Or national anthem, if I'm
    trying to describe a country--
  • 21:46 - 21:48
    these are properties.
  • 21:48 - 21:50
    And then they have
    values, right?
  • 21:50 - 21:56
    So this person, this
    imaginary person's place
  • 21:56 - 22:00
    of birth, the value of the
    property place of birth
  • 22:00 - 22:02
    is Salem, Massachusetts.
  • 22:02 - 22:07
    So you can think about it
    as like a government form--
  • 22:07 - 22:10
    or not government, just any
    form that you're filling out--
  • 22:10 - 22:12
    where there are field names,
    and then empty spaces for you
  • 22:12 - 22:13
    to fill out.
  • 22:13 - 22:14
    That's the value, OK?
  • 22:14 - 22:18
    So the field names
    or the categories
  • 22:18 - 22:19
    are the properties, right?
  • 22:19 - 22:23
    So name, language,
    occupation, date of birth--
  • 22:23 - 22:24
    these are all properties.
  • 22:24 - 22:27
    And the values are
    the actual piece
  • 22:27 - 22:31
    of data, the actual
    information that we have.
  • 22:31 - 22:34
    And of course,
    different kinds of data
  • 22:34 - 22:40
    are relevant for describing
    different kinds of items.
  • 22:40 - 22:45
    And the key thing about the value
    is it can be either a literal value--
  • 22:45 - 22:50
    like if we're describing
    the height of a mountain,
  • 22:50 - 22:56
    we might say just
    the number 8,848.
  • 22:56 - 22:57
    That's the height
    of which mountain?
  • 23:02 - 23:04
    Not everyone at once.
  • 23:04 - 23:07
    Oh, because it's meters,
    the metric system.
  • 23:07 - 23:08
    Yeah, Mt.
  • 23:08 - 23:12
    Everest is 8,848 meters.
  • 23:12 - 23:14
    Yes.
  • 23:14 - 23:16
    Get with it, America.
  • 23:16 - 23:18
    The metric system.
  • 23:18 - 23:21
    All right, so that
    can be a literal value
  • 23:21 - 23:23
    like an actual number.
  • 23:23 - 23:28
    Or it can be a link to an
    item, pointing at another item.
  • 23:28 - 23:31
    But in this statement,
    it is the value.
  • 23:31 - 23:35
    So if I'm talking about
    Germany, the item is Germany.
  • 23:35 - 23:40
    And the property capital
    city has the value Berlin.
  • 23:40 - 23:43
    But the value is
    not B-E-R-L-I-N.
  • 23:43 - 23:49
    The value is a pointer to
    the item Berlin, right?
  • 23:49 - 23:51
    That's the link.
  • 23:51 - 23:57
    So a single item is described
    by a series of such statements,
  • 23:57 - 23:57
    right?
  • 23:57 - 24:01
    There's hundreds and hundreds of
    things I can say about Germany.
  • 24:01 - 24:04
    There's hundreds of things
    I can say about a person.
  • 24:04 - 24:06
    And these will
    generally take the form
  • 24:06 - 24:08
    of a property and a value.
  • 24:08 - 24:12
    By the way, some properties
    may have more than one value.
  • 24:12 - 24:16
    Consider the property
    languages spoken.
  • 24:16 - 24:18
    People can speak more
    than one language, right?
  • 24:18 - 24:20
    So if I'm
    describing myself,
  • 24:20 - 24:22
    we can say languages spoken--
  • 24:22 - 24:26
    English, Hebrew,
    Latin, whatever.
  • 24:26 - 24:28
    So a property can have
    more than one value.
  • 24:31 - 24:34
    So if the item is
    about a country,
  • 24:34 - 24:39
    it would have statements about
    properties like population,
  • 24:39 - 24:43
    land area, official languages,
    borders with, anthem,
  • 24:43 - 24:45
    capital city.
  • 24:45 - 24:49
    If I'm describing a person, I
    have a whole mostly different
  • 24:49 - 24:51
    set of properties that
    are relevant, right?
  • 24:51 - 24:54
    Date of birth, place of birth,
    citizenship, occupation,
  • 24:54 - 24:57
    father, mother,
    religion, notable works--
  • 24:57 - 25:00
    now, are all of these
    relevant for all people?
  • 25:00 - 25:01
    No, of course not.
  • 25:01 - 25:02
    It depends.
  • 25:02 - 25:05
    And different items
    about different people
  • 25:05 - 25:09
    will either have or not
    have these fields, right?
  • 25:09 - 25:13
    So we wouldn't record religion
    for absolutely every person.
  • 25:13 - 25:14
    Some people manage
    to do without.
  • 25:14 - 25:18
    And also, it's not relevant
    for a lot of people, like,
  • 25:18 - 25:20
    what their religion
    happens to be.
  • 25:20 - 25:23
    Date of birth is generally
    relevant for most people
  • 25:23 - 25:24
    that we're documenting.
  • 25:24 - 25:29
    So some properties kind of crop
    up more commonly than others.
  • 25:29 - 25:33
    A person's height, for
    example, is not generally
  • 25:33 - 25:36
    considered of
    encyclopedic value, right?
  • 25:36 - 25:37
    We don't, for
    example, if we have
  • 25:37 - 25:41
    an article about even a
    really well-documented person
  • 25:41 - 25:46
    like Winston Churchill, does
    Wikipedia mention his height?
  • 25:46 - 25:48
    I don't think it does.
  • 25:48 - 25:50
    Even though I'm sure
    we could probably
  • 25:50 - 25:53
    find a source somewhere
    that lists his height,
  • 25:53 - 25:56
    it's just not a
    very relevant piece
  • 25:56 - 25:58
    of information about Churchill.
  • 25:58 - 25:59
    With everything else
    that's written about him
  • 25:59 - 26:01
    and that we know
    about him that we
  • 26:01 - 26:03
    want to include in the
    article, a person's height
  • 26:03 - 26:08
    is not really something of
    great value most of the time.
  • 26:08 - 26:14
    But if we are describing
    Michael Jordan, it is relevant.
  • 26:14 - 26:15
    I'm dating myself.
  • 26:15 - 26:19
    People still know
    Michael Jordan, right?
  • 26:19 - 26:22
    You know, a basketball
    player, that's
  • 26:22 - 26:24
    when height is very
    relevant, right?
  • 26:24 - 26:26
    That's one of the
    first things you
  • 26:26 - 26:28
    say when you're describing
    a basketball player,
  • 26:28 - 26:31
    is list their height.
  • 26:31 - 26:34
    So even within the
    class of person,
  • 26:34 - 26:36
    some properties may be
    more or less relevant,
  • 26:36 - 26:38
    depending on the context.
  • 26:38 - 26:40
    So let's look at some examples.
  • 26:40 - 26:43
    These are examples
    of statements.
  • 26:43 - 26:45
    Each line is a statement.
  • 26:45 - 26:47
    So here's the first one.
  • 26:47 - 26:53
    I want to state something about the
    item Earth, our planet.
  • 26:53 - 26:56
    And what I want
    to say about Earth
  • 26:56 - 27:01
    is that the property
    highest point on Earth
  • 27:01 - 27:03
    has the value Mt.
  • 27:03 - 27:05
    Everest.
  • 27:05 - 27:06
    Would you agree with that?
  • 27:06 - 27:10
    That is the highest
    point on Earth.
  • 27:10 - 27:11
    That's a statement.
  • 27:11 - 27:14
    It says something
    specific, one piece
  • 27:14 - 27:16
    of information about Earth.
  • 27:16 - 27:17
    Now of course, there's
    a lot of other things
  • 27:17 - 27:19
    we want to say about Earth--
  • 27:19 - 27:21
    circumference,
    average temperature,
  • 27:21 - 27:23
    I don't know, all
    kinds of things
  • 27:23 - 27:27
    we can describe the planet
    with, density, the galaxy
  • 27:27 - 27:28
    it belongs to, all that.
  • 27:28 - 27:30
    But here's one piece
    of information,
  • 27:30 - 27:37
    one very specific field in
    the detailed form about Earth.
  • 27:37 - 27:39
    The highest point is Mt.
  • 27:39 - 27:40
    Everest.
  • 27:40 - 27:42
    Now here's a second statement.
  • 27:42 - 27:43
    This time Mt.
  • 27:43 - 27:47
    Everest itself is the item
    that I'm describing, right?
  • 27:47 - 27:49
    The topic has changed.
  • 27:49 - 27:50
    Now I'm saying
    something about Mt.
  • 27:50 - 27:52
    Everest, and what
    I'm saying about Mt.
  • 27:52 - 27:57
    Everest is elevation
    above sea level.
  • 27:57 - 28:01
    Sounds the same but it
    isn't, because the highest
  • 28:01 - 28:05
    point on Earth answers
    the question where,
  • 28:05 - 28:08
    like on the planet, what
    is the highest point?
  • 28:08 - 28:09
    It's Mt.
  • 28:09 - 28:10
    Everest.
  • 28:10 - 28:13
    But how high is that highest
    point is a different piece
  • 28:13 - 28:14
    of information.
  • 28:14 - 28:15
    Do you agree?
  • 28:15 - 28:17
    It's the actual altitude.
  • 28:17 - 28:20
    It's not where on
    the planet it is.
  • 28:20 - 28:22
    So it may sound similar,
    but these are actually
  • 28:22 - 28:24
    very different pieces
    of information.
  • 28:24 - 28:28
    So that highest
    point, how high is it?
  • 28:28 - 28:32
    Well, it's 8,848 meters high.
  • 28:32 - 28:37
    Now the third statement gives
    another piece of information
  • 28:37 - 28:38
    about the first item.
  • 28:38 - 28:41
    Same item-- I could have
    grouped them together.
  • 28:41 - 28:42
    Another thing I
    know about the Earth
  • 28:42 - 28:46
    is that the deepest
    point on the planet
  • 28:46 - 28:53
    is the Challenger Deep, part
    of the so-called Mariana
  • 28:53 - 28:55
    Trench in the ocean.
  • 28:55 - 28:57
    So that is the deepest point.
  • 28:57 - 28:58
    And how deep is it?
  • 28:58 - 29:01
    I again use the elevation
    above sea level.
  • 29:01 - 29:04
    That's the name of the
    property even though it's not
  • 29:04 - 29:05
    above sea level.
  • 29:05 - 29:08
    I have a negative value because
    the elevation of the Challenger
  • 29:08 - 29:14
    Deep is minus 11
    kilometers, more or less.
  • 29:14 - 29:14
    All right?
  • 29:14 - 29:16
    So these are statements.
  • 29:16 - 29:19
    These are four individual
    pieces of data.
  • 29:19 - 29:21
    And I could also
    look at it this way.
  • 29:21 - 29:25
    Maybe that's closer to the
    government form example
  • 29:25 - 29:27
    that I was giving, right?
  • 29:27 - 29:29
    So I want to say
    something about Earth.
  • 29:29 - 29:31
    What do I want to say?
  • 29:31 - 29:34
    Two things-- highest point.
  • 29:34 - 29:37
    That's the field,
    that's the property,
  • 29:37 - 29:38
    and this is the value.
  • 29:38 - 29:39
    The highest point is Mt.
  • 29:39 - 29:40
    Everest.
  • 29:40 - 29:43
    The deepest point
    is Challenger Deep.
  • 29:43 - 29:46
    And then I have things to
    say about Challenger Deep--
  • 29:46 - 29:50
    the property of elevation
    above sea level, the value
  • 29:50 - 29:52
    is minus 11 kilometers.
  • 29:56 - 30:01
    Now here's yet another
    view of the same data
  • 30:01 - 30:05
    once more, with numeric IDs.
  • 30:05 - 30:08
    So this is the same information,
    the same four statements.
  • 30:08 - 30:13
    But this time, in
    addition to using words,
  • 30:13 - 30:21
    I'm also including weird
    numbers following either Q or P.
  • 30:21 - 30:26
    So P stands for property.
  • 30:26 - 30:30
    So the highest point
    property is P610.
  • 30:30 - 30:34
    And the deepest point
    property is P1589.
  • 30:34 - 30:35
    What do these numbers mean?
  • 30:35 - 30:37
    They don't mean anything at all.
  • 30:37 - 30:38
    They're just numbers.
  • 30:38 - 30:40
    They're just sequential numbers.
  • 30:40 - 30:43
    And if I create a new
    Wikidata item right now,
  • 30:43 - 30:46
    it'll get just the
    next available number.
  • 30:46 - 30:48
    So they're just numbers.
  • 30:48 - 30:49
    So P stands for property.
  • 30:49 - 30:51
    What does Q stand for?
  • 30:51 - 30:53
    Does anyone know?
  • 30:53 - 30:58
    It's a trick question
    because it's hard to guess.
  • 30:58 - 31:02
    But the principal
    architect of Wikidata,
  • 31:02 - 31:08
    a Wikipedian and data
    scientist named Danny [INAUDIBLE],
  • 31:08 - 31:11
    is married to a lovely
    lady named [INAUDIBLE]
  • 31:11 - 31:16
    spelled with a Q. And
    this is a loving tribute.
  • 31:16 - 31:22
    And she's also a Wikipedian and
    an admin of Uzbek Wikipedia.
  • 31:22 - 31:32
    So Q2 is just the numeric
    identifier of the item Earth.
  • 31:32 - 31:36
    And Q513 is the
    identifier of Mt.
  • 31:36 - 31:37
    Everest.
  • 31:37 - 31:43
    You notice that we use that ID
    across the statement, right?
  • 31:43 - 31:49
    So from Wikidata's
    perspective, this
  • 31:49 - 31:53
    is actually what the
    database actually contains.
  • 31:53 - 31:55
    What we were saying with words--
  • 31:55 - 31:58
    the Earth, highest
    point, whatever--
  • 31:58 - 31:59
    never mind that.
  • 31:59 - 32:03
    Q2 has P610 with a value Q513.
  • 32:03 - 32:06
    That's what Wikidata
    cares about, OK?
  • 32:06 - 32:10
    Now that, you'll agree,
    is a little inaccessible.
  • 32:10 - 32:13
    Just these lists of numbers,
    that's a little hard.
  • 32:13 - 32:16
    So Wikidata
    understands and allows
  • 32:16 - 32:20
    us to continue using our words.
  • 32:20 - 32:24
    But actually, it gets
    translated into numeric IDs.
  • 32:24 - 32:25
    Now why is this a good idea?
  • 32:30 - 32:33
    Why can't we just
    say Earth or Mt.
  • 32:33 - 32:35
    Everest?
  • 32:35 - 32:36
    Any thoughts?
  • 32:36 - 32:40
    This is an open question.
  • 32:40 - 32:42
    Why is this a good
    idea to use numbers
  • 32:42 - 32:43
    instead of the names of things?
  • 32:47 - 32:52
    Yes, because more than one
    thing can have the same name.
  • 32:52 - 32:53
    What do you mean?
  • 32:53 - 32:53
    There's only one Mt.
  • 32:53 - 32:54
    Everest.
  • 32:54 - 32:56
    Well, yeah.
  • 32:56 - 32:59
    But there's also a
    movie-- and probably
  • 32:59 - 33:00
    more than one-- called Mt.
  • 33:00 - 33:04
    Everest, or a TV documentary
    literally called Mt.
  • 33:04 - 33:07
    Everest.
  • 33:07 - 33:10
    And of course, if I'm
    describing a person named
  • 33:10 - 33:15
    Frank Johnson, not the only
    Frank Johnson on the planet,
  • 33:15 - 33:16
    right?
  • 33:16 - 33:18
    But wait, you say.
  • 33:18 - 33:21
    On Wikipedia we deal
    with that problem, right?
  • 33:21 - 33:23
    How do we deal with that
    problem on Wikipedia?
  • 33:23 - 33:26
    Does anyone in
    the audience know?
  • 33:26 - 33:28
    The standard way to
    deal with the fact
  • 33:28 - 33:30
    that there is more than one
    Frank Johnson in the world,
  • 33:30 - 33:36
    on Wikipedia, is to use
    parentheses after the name.
  • 33:36 - 33:39
    So there is Frank
    Johnson (actor)
  • 33:39 - 33:43
    and Frank Johnson
    (politician), for example,
  • 33:43 - 33:45
    if that's the distinction
    we need to make.
  • 33:45 - 33:48
    So you put in parentheses
    kind of the minimal amount
  • 33:48 - 33:52
    of information you need to tell
    apart these Frank Johnsons.
  • 33:52 - 33:55
    What if there's two
    politician Frank Johnsons?
  • 33:55 - 33:59
    Well, then you would say Frank
    Johnson (Delaware politician)
  • 33:59 - 34:02
    versus Frank Johnson
    (California politician), right?
  • 34:02 - 34:05
    You just put in that bit of
    context to tell them apart.
  • 34:05 - 34:08
    So that's the solution
    that Wikipedians came up
  • 34:08 - 34:12
    with years and years ago
    because they did need
  • 34:12 - 34:16
    a unique name for the article.
  • 34:16 - 34:18
    You can't have two
    articles literally called
  • 34:18 - 34:21
    Frank Johnson on Wikipedia.
  • 34:21 - 34:24
    So that's the
    solution on Wikipedia.
  • 34:24 - 34:28
    But Wikidata was designed
    much later, more than a decade
  • 34:28 - 34:31
    after Wikipedia, and was
    able to kind of learn
  • 34:31 - 34:35
    from the experience
    of Wikipedia, which
  • 34:35 - 34:39
    has tremendous experience
    with multilingualism, much
  • 34:39 - 34:43
    more than most sites and
    projects, as we know.
  • 34:43 - 34:45
    And so the Wikidata
    team understood
  • 34:45 - 34:48
    from the get go that
    this will be an issue,
  • 34:48 - 34:51
    and it's better to use
    numbers that are unequivocally
  • 34:51 - 34:55
    different from each
    other instead of labels,
  • 34:55 - 34:57
    instead of the actual
    name, the actual text,
  • 34:57 - 35:00
    because names are not unique.
  • 35:00 - 35:03
    Names can change, right?
  • 35:03 - 35:09
    Just last year, there was a
    big naming reform in Ukraine
  • 35:09 - 35:14
    and a whole bunch of towns
    and districts were renamed.
  • 35:14 - 35:17
    Does that mean we should change
    all the data that we have, like
  • 35:17 - 35:20
    lose all the data that we
    have about the old name?
  • 35:20 - 35:22
    No, we ideally just
    want to change the name
  • 35:22 - 35:24
    without breaking links.
  • 35:24 - 35:29
    So having the links actually
    refer to the numbers
  • 35:29 - 35:32
    is one way to ensure the
    integrity of the data,
  • 35:32 - 35:35
    of the links, when
    renaming happens.
  • 35:35 - 35:39
    Another reason is well, even
    if the name doesn't change,
  • 35:39 - 35:42
    not all humans call
    everything the same, right?
  • 35:42 - 35:46
    So Earth is Earth
    in English, but it's
  • 35:46 - 35:48
    [SPEAKING ARABIC] in Arabic.
  • 35:48 - 35:50
    It's [SPEAKING HEBREW]
    in Hebrew.
  • 35:53 - 35:57
    So obviously, Earth--
    even that is not
  • 35:57 - 36:02
    as unambiguous or unequivocal
    as you might think.
  • 36:02 - 36:04
    And so that is the
    reason Wikidata,
  • 36:04 - 36:08
    which is built to be
    multilingual from the start,
  • 36:08 - 36:11
    talks about numbers
    rather than labels.
  • 36:11 - 36:12
    OK.
  • 36:12 - 36:15
    Ha, I had a whole slide
    about that and I forgot.
  • 36:15 - 36:18
    Yes, so even London,
    again, is not
  • 36:18 - 36:21
    just London, England, which is
    what you were thinking about.
  • 36:21 - 36:22
    It's also a city in Canada.
  • 36:22 - 36:26
    And it's also a family
    name, like Jack London.
  • 36:26 - 36:27
    It's also a movie company.
  • 36:27 - 36:32
    There must be some hotel
    named London somewhere.
  • 36:32 - 36:36
    This is a good opportunity
    to remind everyone
  • 36:36 - 36:41
    that the vast
    majority of humankind
  • 36:41 - 36:46
    does not speak a
    word of English.
  • 36:46 - 36:49
    That's a statistic
    worth remembering.
  • 36:49 - 36:55
    The vast majority of the planet
    does not speak English at all.
  • 36:55 - 36:57
    That does not
    contradict the datum
  • 36:57 - 37:00
    that English is the most
    widely spoken language.
  • 37:00 - 37:03
    And yet, in aggregate,
    a majority of people
  • 37:03 - 37:07
    speak other languages,
    and not English at all.
  • 37:07 - 37:13
    So moving swiftly on, this
    is a pause for questions
  • 37:13 - 37:16
    about what I've covered so far.
  • 37:16 - 37:17
    Any questions in the audience?
  • 37:17 - 37:19
    If not, we move to IRC.
  • 37:19 - 37:21
    If there are any questions--
  • 37:24 - 37:27
    Any questions?
  • 37:27 - 37:27
    No?
  • 37:27 - 37:28
    IRC?
  • 37:28 - 37:29
    Any questions?
  • 37:34 - 37:34
    OK.
  • 37:34 - 37:38
    We will have additional
    pauses for questions later.
  • 37:38 - 37:41
    But enough of my hand-waving.
  • 37:41 - 37:45
    Let's go explore Wikidata.
  • 37:45 - 37:50
    So Wikidata lives
    at wikidata.org.
  • 37:50 - 38:00
    And Wikidata already has
    more than 25 million items.
  • 38:00 - 38:06
    That is, it collects
    statements about more than 25
  • 38:06 - 38:08
    million topics.
  • 38:08 - 38:12
    It has many, many more
    than 25 million statements
  • 38:12 - 38:15
    because many of these items
    have dozens or hundreds
  • 38:15 - 38:16
    of statements.
  • 38:16 - 38:21
    So it documents 25
    million things--
  • 38:21 - 38:23
    people, books, rivers, whatever.
  • 38:26 - 38:29
    Just to give us a sense
    of how big that number is,
  • 38:29 - 38:32
    how many articles do we
    have on English Wikipedia?
  • 38:32 - 38:36
    More than-- yes, more
    than 5 million articles.
  • 38:36 - 38:38
    And that's the
    largest Wikipedia.
  • 38:38 - 38:41
    So Wikidata is
    already describing
  • 38:41 - 38:45
    more than five times, or
    about five times as many items
  • 38:45 - 38:48
    as even our largest Wikipedia.
  • 38:48 - 38:51
    So obviously,
    Wikidata contains data
  • 38:51 - 38:57
    about things that have no
    article on any Wikipedia.
  • 38:57 - 39:02
    It is a much, much larger,
    more comprehensive project.
  • 39:02 - 39:04
    All right, the second
    thing we might notice
  • 39:04 - 39:08
    is, well, this looks kind
    of like Wikipedia, right?
  • 39:08 - 39:11
    If we've never visited, it
    looks kind of like Wikipedia.
  • 39:11 - 39:13
    It has this sidebar.
  • 39:13 - 39:15
    It has these buttons at the top.
  • 39:15 - 39:18
    It looks like it's
    from the '90s.
  • 39:18 - 39:19
    Yeah.
  • 39:19 - 39:21
    So the reason it
    looks like Wikipedia
  • 39:21 - 39:24
    is that it is a wiki running
    on Mediawiki software.
  • 39:24 - 39:28
    It is running on software
    very much like Wikipedia.
  • 39:28 - 39:32
    But it is running on
    a kind of modification
  • 39:32 - 39:34
    of the standard wiki software.
  • 39:34 - 39:36
    It has an additional,
    very important component
  • 39:36 - 39:39
    named Wikibase,
    which gives it all
  • 39:39 - 39:43
    of its structured and
    linked data power.
  • 39:43 - 39:47
    So let's start
    exploring Wikidata.
  • 39:53 - 39:56
    Let's take something local--
  • 39:56 - 39:58
    Harvey Milk.
  • 39:58 - 40:00
    Harvey Milk.
  • 40:00 - 40:03
    What does Wikidata
    know about Harvey Milk?
  • 40:03 - 40:07
    For those on YouTube
    who may not be local,
  • 40:07 - 40:16
    he's a San Francisco politician
    and gay rights activist
  • 40:16 - 40:18
    who was murdered in the '70s.
  • 40:18 - 40:21
    It was very significant in
    the history of those struggles
  • 40:21 - 40:23
    in this country.
  • 40:23 - 40:27
    So what does Wikidata
    tell us about Harvey Milk?
  • 40:27 - 40:30
    Well, the first
    thing is it knows
  • 40:30 - 40:35
    that Harvey Milk is Q17141.
  • 40:35 - 40:37
    That's the most important
    piece of information,
  • 40:37 - 40:39
    first of all: that
    is the identifier.
  • 40:39 - 40:42
    That is the item
    number of all the data
  • 40:42 - 40:46
    that we will collect
    about Harvey Milk.
  • 40:46 - 40:50
    The second thing you see
    right under the title
  • 40:50 - 40:55
    is this line, this very,
    very brief summary, right?
  • 40:55 - 41:00
    "American politician who became
    a martyr in the gay community."
  • 41:00 - 41:02
    This line is the
    description line.
  • 41:02 - 41:05
    So the name of the item--
  • 41:05 - 41:06
    this is the label.
  • 41:06 - 41:07
    We call it label on Wikidata.
  • 41:07 - 41:09
    That's the label.
  • 41:09 - 41:11
    And this line is
    the description.
  • 41:11 - 41:13
    Now why is this
    description important?
  • 41:13 - 41:17
    This is the description that
    helps us tell this Harvey
  • 41:17 - 41:23
    Milk from any other Harvey
    Milk that may exist, all right?
  • 41:23 - 41:27
    So again, this would
    be useful if I'm
  • 41:27 - 41:30
    looking up someone with a
    slightly more generic name.
  • 41:30 - 41:34
    That line will help me tell
    apart the item about Harvey
  • 41:34 - 41:39
    Milk the gay activist rather
    than Harvey Milk the film
  • 41:39 - 41:42
    actor, OK?
  • 41:42 - 41:43
    And where is it coming from?
  • 41:43 - 41:49
    Well, Wikidata has
    this whole table,
  • 41:49 - 41:53
    as you can see, with
    descriptions and labels
  • 41:53 - 41:55
    in other languages.
  • 41:55 - 42:00
    So Wikidata is able to refer
    to Harvey Milk in Arabic which,
  • 42:00 - 42:04
    don't panic, is written
    from right to left.
  • 42:04 - 42:08
    It also knows what to
    call him in Bulgarian.
  • 42:08 - 42:11
    I mean, it's the same name,
    but it's in a different script.
  • 42:11 - 42:14
    In French, in Hebrew,
    and that's it?
  • 42:14 - 42:18
    Does it not know a name
    for Harvey Milk in Italian?
  • 42:18 - 42:20
    Of course it does.
  • 42:20 - 42:22
    It actually has
    labels for this person
  • 42:22 - 42:24
    in many, many, many languages.
  • 42:24 - 42:30
    It doesn't have descriptions in
    every language, as you can see.
  • 42:30 - 42:31
    OK?
  • 42:31 - 42:36
    So why was Wikidata showing me
    these languages and not others?
  • 42:36 - 42:39
    I mean, why this somewhat
    arbitrary collection--
  • 42:39 - 42:43
    English, Arabic, Bulgarian,
    German, French, and Hebrew?
  • 42:43 - 42:45
    Because I told it to.
  • 42:45 - 42:50
    So if we briefly click
    over to my user page--
  • 42:50 - 42:53
    again, like every wiki,
    you have user accounts.
  • 42:53 - 42:54
    You have user pages.
  • 42:54 - 42:55
    This is my user page.
  • 42:55 - 43:00
    And as you can see,
    there's this little user
  • 43:00 - 43:03
    information box here called
    a Babel box by Wikipedians,
  • 43:03 - 43:07
    where I list the
    languages that I speak.
  • 43:07 - 43:11
    And Wikidata uses this box
    just to kind of helpfully
  • 43:11 - 43:13
    show me these languages.
  • 43:13 - 43:14
    Of course, all the
    other languages
  • 43:14 - 43:20
    are still available, as you saw,
    by clicking the more languages.
  • 43:20 - 43:23
    But this is just a
    useful little way
  • 43:23 - 43:28
    of getting the languages I
    care about up there first.
  • 43:28 - 43:29
    By the way, this is a lie.
  • 43:29 - 43:31
    I don't actually
    speak Bulgarian.
  • 43:31 - 43:34
    That stayed on my user page
    because I was demonstrating
  • 43:34 - 43:37
    this in Bulgaria and I wanted
    that label to show up there
  • 43:37 - 43:38
    during the talk--
  • 43:38 - 43:40
    just in case you
    were going to tell me
  • 43:40 - 43:44
    a really good Bulgarian joke.
  • 43:44 - 43:48
    OK so for example, Hebrew
    is my mother tongue.
  • 43:48 - 43:52
    And we have a Hebrew
    label for Harvey Milk.
  • 43:52 - 43:54
    But we don't have a description.
  • 43:54 - 44:01
    So let's fix that right now by
    clicking the edit button right
  • 44:01 - 44:02
    here.
  • 44:02 - 44:06
    I click edit, and this
    table became editable.
  • 44:06 - 44:10
    And now I can very briefly
    type a description.
  • 44:23 - 44:24
    AUDIENCE: Online in
    about 20 seconds.
  • 44:24 - 44:25
    But can we hold it?
  • 44:25 - 44:26
    ASAF BARTOV: OK.
  • 44:28 - 44:30
    That was good timing
    for the screen to crash.
  • 44:54 - 44:54
    OK?
  • 44:59 - 45:02
    Are we back?
  • 45:02 - 45:03
    OK.
  • 45:03 - 45:04
    Sorry about that.
  • 45:04 - 45:08
    So this was all about what to
    call him in different languages
  • 45:08 - 45:10
    and scripts and how to
    tell this person apart
  • 45:10 - 45:14
    from other people with
    potentially the same name.
  • 45:14 - 45:18
    Let's scroll down and see
    what else does Wikidata
  • 45:18 - 45:20
    know about this person?
  • 45:20 - 45:24
    So as you can see, this is
    a list of statements, right?
  • 45:24 - 45:26
    This is a list of statements.
  • 45:26 - 45:28
    And the properties
    are on the left,
  • 45:28 - 45:30
    the values are on the right.
  • 45:30 - 45:34
    So the first thing Wikidata
    knows about Harvey Milk
  • 45:34 - 45:39
    is a very important
    property called instance of.
  • 45:39 - 45:40
    Instance of.
  • 45:40 - 45:45
    And the property instance of
    answers the very basic question
  • 45:45 - 45:49
    what kind of thing is
    this that I'm describing?
  • 45:49 - 45:51
    Is it a book?
  • 45:51 - 45:52
    Is it a poem?
  • 45:52 - 45:54
    Is it a mountain?
  • 45:54 - 45:56
    Is it a theological concept?
  • 45:56 - 45:58
    No, it's a human.
  • 45:58 - 46:00
    It's a person, OK?
  • 46:00 - 46:02
    The item about Mt.
  • 46:02 - 46:07
    Everest will say
    instance of mountain, OK?
  • 46:07 - 46:11
    This is a very
    important property.
  • 46:11 - 46:12
    Why is it important?
  • 46:12 - 46:15
    Wouldn't anyone looking
    at this know that this is
  • 46:15 - 46:16
    a human being?
  • 46:16 - 46:16
    Yes.
  • 46:16 - 46:19
    Anyone looking at
    this will know.
  • 46:19 - 46:24
    But if I want a computer to
    be able to pull information
  • 46:24 - 46:28
    about people, I want to
    be able to easily exclude
  • 46:28 - 46:31
    all the mountains and
    poems and other things that
  • 46:31 - 46:33
    are not people from my query.
  • 46:33 - 46:37
    So this single datum,
    this single piece of data,
  • 46:37 - 46:42
    is what tells computers and
    algorithms very clearly,
  • 46:42 - 46:43
    this is a human.
  • 46:43 - 46:47
    Things that aren't instance
    of human are other things.
  • 46:47 - 46:48
    OK?
  • 46:48 - 46:50
    So it may sound very
    trivial, but it's not.
  • 46:50 - 46:52
    It's very important
    to have an instance
  • 46:52 - 46:54
    of field for Wikidata items.
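The machine-filtering argument above can be sketched concretely. P31 ("instance of") and Q5 ("human") are Wikidata's real identifiers; the toy item records below are made up for illustration and do not mirror Wikidata's actual JSON, and the class ID given for Earth is a placeholder.

```python
# Sketch: why "instance of" (P31) matters to a computer.
# Q5 = human, Q8502 = mountain (real IDs); the Earth class ID is a placeholder.
items = [
    {"id": "Q17141", "label": "Harvey Milk", "P31": "Q5"},
    {"id": "Q513", "label": "Mount Everest", "P31": "Q8502"},
    {"id": "Q2", "label": "Earth", "P31": "Q-PLANET-CLASS"},  # placeholder
]

def humans(records):
    # A program cannot "just see" which items are people the way a
    # human reader can; it filters on the instance-of statement.
    return [r["label"] for r in records if r.get("P31") == "Q5"]

print(humans(items))  # ['Harvey Milk']
```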
  • 46:54 - 46:55
    All right, what else do we know?
  • 46:55 - 46:59
    Well, Wikidata knows about
    an image for Harvey Milk.
  • 46:59 - 47:03
    Again, we can find a ton of
    images-- or maybe not a ton,
  • 47:03 - 47:05
    but we can find dozens
    of images of Harvey Milk
  • 47:05 - 47:10
    on Commons, on our Wikimedia
    multimedia repository.
  • 47:10 - 47:13
    So why should we have a
    single image here on Wikidata?
  • 47:13 - 47:16
    Again, this is
    mostly for reusers.
  • 47:16 - 47:19
    If I'm building some kind of
    tool that pulls information
  • 47:19 - 47:22
    from Wikidata, it's
    nice if there's
  • 47:22 - 47:25
    at least one representative
    image to kind of use
  • 47:25 - 47:30
    as the default or immediate
    image for Harvey Milk
  • 47:30 - 47:33
    in some other reused context.
  • 47:33 - 47:35
    All right, sex or gender--
  • 47:35 - 47:36
    male.
  • 47:36 - 47:39
    Country of citizenship--
    United States of America.
  • 47:39 - 47:40
    Given name is Harvey.
  • 47:40 - 47:42
    The date of birth is so and so.
  • 47:42 - 47:44
    The place of birth is Woodmere.
  • 47:44 - 47:46
    The place of death
    is San Francisco.
  • 47:46 - 47:49
    The manner of death is homicide.
  • 47:49 - 47:51
    Wikidata knows that.
  • 47:51 - 47:56
    Now again, every
    little datum like that
  • 47:56 - 48:02
    is the basis for later querying
    and answering questions.
  • 48:02 - 48:07
    So the fact that we record the
    manner of death of people--
  • 48:07 - 48:09
    or at least of some people--
  • 48:09 - 48:12
    will allow us later
    to go, you know,
  • 48:12 - 48:17
    who are some people from
    Belgium who died by homicide?
  • 48:17 - 48:25
    That's a question Wikidata can
    answer, thanks to this field.
  • 48:25 - 48:28
    The other thing I mentioned
    is that things are links.
  • 48:28 - 48:30
    So the place of
    birth is Woodmere.
  • 48:30 - 48:32
    I don't know where
    Woodmere is, but I
  • 48:32 - 48:34
    can click that and find out.
  • 48:34 - 48:38
    Here is the Wikidata item
    about Woodmere, right?
  • 48:38 - 48:41
    It was the value in the
    statement about Harvey Milk,
  • 48:41 - 48:44
    but now I'm looking at
    the item about Woodmere.
  • 48:44 - 48:48
    And it turns out it's in
    Nassau County, New York, right?
  • 48:48 - 48:50
    And of course, Wikidata has
    a whole bunch of information
  • 48:50 - 48:55
    for me about Woodmere--
  • 48:55 - 49:00
    what country it's in and the
    coordinates and the population
  • 49:00 - 49:06
    and the area, all the things you
    would expect about a place, OK?
  • 49:06 - 49:08
    Let's get back to Harvey Milk.
  • 49:10 - 49:13
    So the manner of death,
    the cause of death--
  • 49:13 - 49:17
    now here, Wikidata gives
    us excellent information.
  • 49:17 - 49:20
    The actual cause of death
    is ballistic trauma.
  • 49:20 - 49:22
    That's a professional term.
  • 49:22 - 49:28
    And this statement
    has qualifiers.
  • 49:28 - 49:31
    So until now, I was talking
    about triples, right?
  • 49:31 - 49:33
    The item has a property
    with a certain value.
  • 49:33 - 49:35
    Actually, each
    statement can also
  • 49:35 - 49:38
    have a number of
    qualifiers which
  • 49:38 - 49:45
    add aspects of information,
    still about that one question
  • 49:45 - 49:47
    that we're answering, right?
  • 49:47 - 49:50
    So if this property
    answers cause of death,
  • 49:50 - 49:51
    it's not discussing
    anything else.
  • 49:51 - 49:53
    It's not discussing languages.
  • 49:53 - 49:55
    It's not discussing
    date of birth, right?
  • 49:55 - 49:57
    It's talking about
    the cause of death.
  • 49:57 - 49:59
    But we're not just
    saying ballistic trauma.
  • 49:59 - 50:05
    We're saying ballistic trauma
    with the quantity attribute
  • 50:05 - 50:06
    being five.
  • 50:06 - 50:08
    What does that mean?
  • 50:08 - 50:09
    Five bullets, right?
  • 50:09 - 50:13
    There are five
    ballistic traumas.
  • 50:13 - 50:15
    He was shot five times.
  • 50:15 - 50:18
    And he was shot by this
    person named Dan White.
  • 50:18 - 50:25
    And this ballistic trauma,
    like this actual shooting,
  • 50:25 - 50:28
    is itself the subject
    of this other thing.
  • 50:28 - 50:31
    This is a link to a
    whole other Wikidata
  • 50:31 - 50:36
    item about the Moscone-Milk
    assassinations.
  • 50:36 - 50:39
    Moscone was the San
    Francisco mayor at the time.
  • 50:44 - 50:48
    We'll see slightly better or
    easier to understand examples
  • 50:48 - 50:49
    of qualifiers in a bit.
  • 50:49 - 50:54
    So if this was
    confusing, hang on.
  • 50:54 - 50:56
    So he was killed by Dan White.
  • 50:56 - 50:58
    He spoke English.
  • 50:58 - 51:00
    His occupation--
    here's an example
  • 51:00 - 51:03
    of a property with more
    than one value, right?
  • 51:03 - 51:06
    So Milk was a politician.
  • 51:06 - 51:10
    But he was also a Navy
    officer, at least for a while.
  • 51:10 - 51:13
    That was another thing that
    he did during his life.
  • 51:13 - 51:15
    And he was a human
    rights activist, right?
  • 51:15 - 51:21
    So some people are
    writers and translators.
  • 51:21 - 51:23
    So people can have more
    than one occupation.
  • 51:23 - 51:26
    People can speak more
    than one language.
  • 51:26 - 51:29
    Here's a better
    example of a qualifier.
  • 51:29 - 51:35
    So the property award received
    has the value Presidential
  • 51:35 - 51:38
    Medal of Freedom.
  • 51:38 - 51:43
    And that award has an
    attribute called point in time,
  • 51:43 - 51:44
    like when was this?
  • 51:44 - 51:47
    This was in 2009.
  • 51:47 - 51:51
    Do you see that
    this piece of data--
  • 51:51 - 52:05
    2009-- is a sub-statement,
    subordinated
  • 52:05 - 52:10
    to the context of this award,
    the Presidential Medal
  • 52:10 - 52:10
    of Freedom?
  • 52:10 - 52:13
    It can't just kind of
    free float in the article.
  • 52:13 - 52:18
    It's not that 2009 is itself
    a meaningful thing, right?
  • 52:18 - 52:22
    This medal was awarded in 2009.
  • 52:22 - 52:22
    Now,
  • 52:22 - 52:24
    Wikidata doesn't
    tell us, for example,
  • 52:24 - 52:27
    when he was a Navy officer, OK?
  • 52:27 - 52:30
    But if we were, for example,
    to look that up right now
  • 52:30 - 52:34
    and find out that Milk was
    a Navy officer between 1962
  • 52:34 - 52:40
    and 1964, we could go back
    here to the Navy officer bit
  • 52:40 - 52:41
    and click edit.
  • 52:41 - 52:44
    This is how I edit this
    particular little piece
  • 52:44 - 52:45
    of information.
  • 52:45 - 52:49
    And add a qualifier like this.
  • 52:49 - 52:51
    I click Add Qualifier.
  • 52:51 - 52:58
    And I could pick start
    time and end time, right?
  • 52:58 - 53:05
    And then I could
    type 1962 to 1964,
  • 53:05 - 53:08
    and that would be
    teaching Wikidata.
  • 53:08 - 53:11
    Oh, I'm sorry, I meant to
    do that for Navy officer.
  • 53:11 - 53:11
    OK.
  • 53:11 - 53:15
    But, you know,
    that is the exact--
  • 53:15 - 53:18
    the accurate time span
    of that statement.
  • 53:18 - 53:23
    So it's true to say about a
    person, he was a Navy officer,
  • 53:23 - 53:26
    even if of course he wasn't a
    Navy officer his entire life.
  • 53:26 - 53:28
    But it's better and
    it's more accurate,
  • 53:28 - 53:32
    to say he was a Navy officer
    between 1962 and 1964.
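The statement-plus-qualifiers shape just walked through can be sketched as nested data. P106 = occupation, P580 = start time, and P582 = end time are assumed IDs here (verify before relying on them), and the layout mirrors the idea of qualifiers, not Wikidata's exact JSON.

```python
# Sketch: one statement carrying qualifiers, as nested data.
# P106 = occupation, P580 = start time, P582 = end time (IDs assumed).
statement = {
    "item": "Q17141",        # Harvey Milk
    "property": "P106",      # occupation
    "value": "naval officer",
    "qualifiers": {
        "P580": 1962,        # start time: refines the main claim,
        "P582": 1964,        # end time:   it cannot "free float" on its own
    },
}

def describe(s):
    q = s["qualifiers"]
    return f"{s['value']} from {q['P580']} to {q['P582']}"

print(describe(statement))  # naval officer from 1962 to 1964
```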
  • 53:32 - 53:35
    Don't worry, I'm
    not saving this.
  • 53:35 - 53:39
    No vandalizing of
    Wikidata in this session.
  • 53:39 - 53:40
    OK.
  • 53:40 - 53:41
    Moving on.
  • 53:41 - 53:42
    What else does Wikidata know?
  • 53:42 - 53:44
    He was educated at
    this university.
  • 53:44 - 53:47
    He was a member of
    this political party.
  • 53:47 - 53:47
    Right?
  • 53:47 - 53:49
    That's of course
    a relevant property
  • 53:49 - 53:52
    for a politician.
  • 53:52 - 53:56
    Religion, military branch,
    what is the category on commons
  • 53:56 - 53:59
    that discusses this
    item, is something
  • 53:59 - 54:01
    that Wikidata can tell us.
  • 54:01 - 54:02
    And that's it.
  • 54:02 - 54:05
    Now, is that everything
    that we could possibly
  • 54:05 - 54:08
    say in a structured
    way about Harvey Milk?
  • 54:08 - 54:09
    No.
  • 54:09 - 54:14
    We could probably find at
    least a few more things to say.
  • 54:14 - 54:17
    We will see how to contribute
    new information to Wikidata
  • 54:17 - 54:20
    in just a minute with
    a different example.
  • 54:20 - 54:23
    But this-- all this was
    a set of statements.
  • 54:23 - 54:24
    Right?
  • 54:24 - 54:26
    This was the title
    statements here.
  • 54:29 - 54:31
    But at the bottom of the
    list of statements is
  • 54:31 - 54:34
    another section
    called identifiers.
  • 54:34 - 54:37
    And I want to spend a minute
    talking about what that is.
  • 54:37 - 54:44
    So identifiers is a
    collection of keys.
  • 54:44 - 54:48
    A collection of
    IDs, or codes, that
  • 54:48 - 54:53
    are keys to other
    information sources.
  • 54:53 - 54:59
    And a lot of Wikidata items
    have a whole series of keys
  • 54:59 - 55:03
    to other databases, other
    sites, other repositories,
  • 55:03 - 55:08
    that help you or a computer
    be able to access not just
  • 55:08 - 55:12
    some database and look for
    information about Harvey Milk,
  • 55:12 - 55:17
    but access the exact record
    relevant to Harvey Milk.
  • 55:17 - 55:20
    And again, if you imagine
    someone named John Smith,
  • 55:20 - 55:22
    that is really valuable, right?
  • 55:22 - 55:23
    If you're not just
    told, oh yeah,
  • 55:23 - 55:25
    you can look at the
    Library of Congress
  • 55:25 - 55:28
    for John Smith,
    good luck with that.
  • 55:28 - 55:30
    Or if I tell you, go to
    the Library of Congress
  • 55:30 - 55:36
    to this record for this John
    Smith, you see the difference.
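[Editor's note: the "key to the exact record" idea is mechanical. Each identifier property on Wikidata carries a formatter URL (property P1630) with `$1` standing for the ID, so a stored ID becomes a direct link. A minimal sketch; the two patterns shown match VIAF (P214) and IMDb (P345) as of writing, and the sample IDs are placeholders, not Harvey Milk's real identifiers.]

```python
# Formatter URL patterns for two identifier properties; $1 marks where
# the stored ID is substituted. Verify current patterns on the
# properties themselves (each carries one as a P1630 statement).
FORMATTERS = {
    "P214": "https://viaf.org/viaf/$1",       # VIAF ID
    "P345": "https://www.imdb.com/name/$1/",  # IMDb person ID
}

def external_links(ids):
    """Turn {property: id} statements into {property: direct URL}."""
    return {p: FORMATTERS[p].replace("$1", v)
            for p, v in ids.items() if p in FORMATTERS}

# Placeholder IDs for illustration only.
links = external_links({"P214": "20924386", "P345": "nm0587548"})
```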
  • 55:36 - 55:42
    So Wikidata tells us that on
    VIAF, which is the Virtual
  • 55:42 - 55:45
    International Authority File.
  • 55:45 - 55:50
    It's an aggregated master
    index built by bibliographers,
  • 55:50 - 55:53
    by librarians, of people.
  • 55:53 - 55:53
    Right?
  • 55:53 - 55:57
    It tries to kind of aggregate
    information about people
  • 55:57 - 55:59
    across library
    catalogs everywhere.
  • 55:59 - 56:05
    So the VIAF ID for Harvey
    Milk is this number.
  • 56:05 - 56:07
    And conveniently,
    if I click that,
  • 56:07 - 56:10
    I'm not taken to
    some Wikidata item.
  • 56:10 - 56:13
    I'm actually taken
    to the relevant site.
  • 56:13 - 56:17
    So this took me right
    to viaf.org, the Virtual
  • 56:17 - 56:22
    International Authority File,
    directly to their record
  • 56:22 - 56:23
    about Harvey Milk.
  • 56:23 - 56:24
    All right?
  • 56:24 - 56:27
    And that itself leads
    me to national catalogs
  • 56:27 - 56:30
    of national libraries
    all over the world.
  • 56:30 - 56:32
    We won't get into the
    things you can do with VIAF.
  • 56:32 - 56:37
    The point is Wikidata
    contained the piece of thread
  • 56:37 - 56:41
    that I could tug on
    to arrive directly
  • 56:41 - 56:45
    to that information
    in other databases.
  • 56:45 - 56:46
    Yes.
  • 56:46 - 56:50
    And it has that for many,
    many kinds of databases.
  • 56:50 - 56:53
    The BNF, for example, that's
    the National Library of France.
  • 56:53 - 56:56
    And that will take me
    to that index card.
  • 56:56 - 56:57
    IMDB.
  • 56:57 - 56:59
    We all know IMDB, right?
  • 56:59 - 57:03
    So here I have the key
    to Harvey Milk in IMDB.
  • 57:03 - 57:06
    And this is what IMDB says
    about Harvey Milk, right?
  • 57:06 - 57:08
    They have their own piece
    of information about him,
  • 57:08 - 57:12
    of course, with filmography
    and everything else.
  • 57:12 - 57:15
    And see, I did not have
    to search IMDB for it.
  • 57:15 - 57:19
    I just had the key right
    there waiting for me.
  • 57:19 - 57:21
    Now, again, this is
    very convenient for me
  • 57:21 - 57:25
    as I just showed you the
    human use case for this.
  • 57:25 - 57:28
    But it's even more
    powerful in aggregate
  • 57:28 - 57:35
    when we allow computers to
    traverse this network of links
  • 57:35 - 57:36
    between--
  • 57:36 - 57:42
    not just within Wikidata, but
    between data storage facilities
  • 57:42 - 57:44
    and repositories.
  • 57:44 - 57:50
    This is sometimes referred to
    as the Linked Open Data cloud.
  • 57:50 - 57:53
    Cloud, because it's multiple
    different repositories
  • 57:53 - 57:55
    that are interlinked.
  • 57:55 - 58:02
    And Wikidata is already, and
    to a growing extent, the nexus,
  • 58:02 - 58:04
    the connection
    point between a lot
  • 58:04 - 58:07
    of these different databases.
  • 58:07 - 58:09
    So IMDB, for example,
    it's a good example
  • 58:09 - 58:11
    because it's a site
    almost everyone knows,
  • 58:11 - 58:14
    IMDB has information
    about Harvey Milk.
  • 58:14 - 58:17
    But that information
    does not include a link
  • 58:17 - 58:19
    to the French National Library.
  • 58:19 - 58:20
    Right?
  • 58:20 - 58:21
    Do you see what I'm saying?
  • 58:21 - 58:26
    So IMDB is a data repository
    with IDs and allows linking.
  • 58:26 - 58:28
    But it does not give you
    what Wikidata gives you which
  • 58:28 - 58:33
    is this kind of collection of--
  • 58:33 - 58:36
    it's like a junction of all
    these different data sources.
  • 58:36 - 58:38
    So Wikidata is the
    place where you
  • 58:38 - 58:41
    can document these
    interrelationships
  • 58:41 - 58:42
    or equivalencies.
  • 58:42 - 58:42
    Right?
  • 58:42 - 58:49
    So ID, you know, 587548 on IMDB
    is discussing the same topic
  • 58:49 - 58:52
    as French National
    Library ID whatever.
  • 58:52 - 58:55
    Wikidata contains that
    piece of information:
  • 58:55 - 58:59
    that this ID in this database
    is about the same person
  • 58:59 - 59:04
    as that ID in that database.
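[Editor's note: because one item holds all of these IDs, it can act as the junction just described: knowing a record's ID in one database lets you look up the same topic's ID in another. A toy sketch; all ID values are made up for illustration, not Harvey Milk's actual identifiers.]

```python
# One item's external-ID statements, keyed by identifier property.
# P345 = IMDb, P268 = BnF (National Library of France), P214 = VIAF.
# All ID values below are placeholders.
ITEM_IDS = {"P345": "nm0587548", "P268": "12345678x", "P214": "20924386"}

def crosswalk(known_prop, known_id, want_prop, ids=ITEM_IDS):
    """Translate an ID in one database into the same topic's ID in another."""
    if ids.get(known_prop) == known_id:
        return ids.get(want_prop)
    return None  # the known ID doesn't match this item
```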
  • 59:04 - 59:05
    OK.
  • 59:05 - 59:07
    So that's what
    identifiers are about.
  • 59:07 - 59:11
    Still scrolling down the
    Wikidata item about Harvey
  • 59:11 - 59:16
    Milk, we have the site links.
  • 59:16 - 59:21
    The site links are links
    to Wikimedia projects
  • 59:21 - 59:23
    that are related to this item.
  • 59:23 - 59:25
    So of course there
    are Wikipedia articles
  • 59:25 - 59:29
    about Harvey Milk in many,
    many different Wikipedias.
  • 59:29 - 59:32
    Quite a few language versions.
  • 59:32 - 59:35
    And there are
    pages on Wikiquote,
  • 59:35 - 59:37
    one of the sister projects.
  • 59:37 - 59:39
    There are pages on
    Wikiquote with some quotes
  • 59:39 - 59:40
    from Harvey Milk.
  • 59:40 - 59:45
    And there is even a page for
    Harvey Milk on Wikisource.
  • 59:45 - 59:46
    Right?
  • 59:46 - 59:48
    So this is a collection
    of those links.
  • 59:48 - 59:53
    And those of you who have maybe
    only dealt with Wikidata
  • 59:53 - 59:57
    for inter-wiki links, which
    we used to do in the old days
  • 59:57 - 60:00
    manually within
    the article text,
  • 60:00 - 60:02
    and now do through
    Wikidata, so maybe
  • 60:02 - 60:04
    the only thing you knew
    about Wikidata
  • 60:04 - 60:10
    is how to update these
    inter-wiki links on Wikidata.
  • 60:10 - 60:11
    All right.
  • 60:11 - 60:14
    So that concludes
    our little tour
  • 60:14 - 60:19
    of the anatomy of
    a Wikidata page.
  • 60:19 - 60:22
    I will just remind you that
    it's a wiki page, which
  • 60:22 - 60:26
    means it has a discussion
    page, a talk page.
  • 60:26 - 60:28
    This one happens to be empty.
  • 60:28 - 60:30
    But, you know, if we have
    concerns or arguments
  • 60:30 - 60:32
    about some of the
    data here that is
  • 60:32 - 60:33
    what we would use
    to discuss this
  • 60:33 - 60:37
    and to arrive at consensus.
  • 60:37 - 60:42
    It also has a history view just
    like every Wikipedia article.
  • 60:42 - 60:47
    So you can see here
    a list of edits.
  • 60:47 - 60:49
    Maybe some of you
    have never looked
  • 60:49 - 60:52
    at a history page on Wikipedia,
    so this looks overwhelming.
  • 60:52 - 60:55
    But every line here,
    every entry here,
  • 60:55 - 60:58
    is a single edit, a single
    revision, a single change
  • 60:58 - 61:00
    to this Wikidata item.
  • 61:00 - 61:02
    Just Harvey Milk.
  • 61:02 - 61:04
    And you can see at the very
    top this edit that I just
  • 61:04 - 61:07
    made-- this is my
    volunteer account
  • 61:07 - 61:10
    and I just made this edit,
    and in parentheses you
  • 61:10 - 61:11
    can see what I did.
  • 61:11 - 61:15
    I added an HE,
    Hebrew, description.
  • 61:15 - 61:17
    And this is the text
    that I added in Hebrew.
  • 61:17 - 61:17
    Right?
  • 61:17 - 61:21
    So we can see who added
    what to the Wikidata item,
  • 61:21 - 61:25
    just like we can do
    the same on Wikipedia.
  • 61:25 - 61:26
    So we have the revision history.
  • 61:26 - 61:28
    We can undo edits.
  • 61:28 - 61:30
    We can revert, just
    like on Wikipedia.
  • 61:34 - 61:37
    And what else did I
    want to show here?
  • 61:37 - 61:41
    We can add an item to my
    watch list using the star,
  • 61:41 - 61:42
    just like on Wikipedia.
  • 61:42 - 61:47
    So we have all these
    standard wiki features
  • 61:47 - 61:48
    that we would come to expect.
  • 61:50 - 61:54
    Let's pause for questions.
  • 61:54 - 61:58
    Any questions about what
    we've covered so far?
  • 62:03 - 62:03
    Yes.
  • 62:07 - 62:11
    Are attributes of statements
    preset for the specific value?
  • 62:17 - 62:20
    No, they're not preset.
  • 62:20 - 62:30
    And generally Wikidata does
    not enforce logic by default.
  • 62:30 - 62:32
    So, I mean, there's
    nothing to prevent you
  • 62:32 - 62:39
    from editing the
    item about Brazil,
  • 62:39 - 62:43
    and adding the property height.
  • 62:47 - 62:50
    Now height is not a relevant
    property for a country.
  • 62:50 - 62:51
    Right?
  • 62:51 - 62:54
    I mean, maybe average
    elevation, maybe.
  • 62:54 - 62:56
    But not just height,
    which is used for humans
  • 62:56 - 62:59
    or for physical things.
  • 62:59 - 63:02
    So you could add that
    property to Brazil and save it
  • 63:02 - 63:05
    and the wiki would not complain.
  • 63:05 - 63:08
    Now in the background
    there are kind
  • 63:08 - 63:13
    of extra-wiki, outside-the-wiki
    processes for constraint
  • 63:13 - 63:14
    validation.
  • 63:14 - 63:16
    So there are bots and
    other processes that
  • 63:16 - 63:18
    run, and occasionally,
    for example,
  • 63:18 - 63:27
    identify non-living things
    with a date of birth field.
  • 63:27 - 63:28
    That's nonsensical.
  • 63:28 - 63:29
    That should not exist.
  • 63:29 - 63:32
    If someone mistakenly added
    that there are processes
  • 63:32 - 63:34
    that would flag
    that to be fixed.
  • 63:34 - 63:37
    But the wiki itself,
    Wikidata, will not
  • 63:37 - 63:39
    prevent you from adding that.
  • 63:39 - 63:42
    And that is by design
    to keep things flexible.
  • 63:42 - 63:44
    So that people don't
    run into, oh wait,
  • 63:44 - 63:47
    but I can't add this
    because nobody thought
  • 63:47 - 63:50
    that I would need this, maybe.
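[Editor's note: the kind of off-wiki constraint check described here can be sketched in a few lines: scan items and flag a date of birth (P569) on anything that is not an instance of (P31) human (Q5). Toy data with made-up item keys; real constraint reports are far richer than this.]

```python
# Toy item store: {item: {property: value}}. Q5 = human, Q6256 = country.
items = {
    "Q_person_example":  {"P31": "Q5",    "P569": "1930"},
    "Q_country_example": {"P31": "Q6256", "P569": "1822"},  # nonsensical
}

def flag_birthdate_violations(items):
    """Items carrying a date of birth without being instances of human."""
    return [qid for qid, claims in items.items()
            if "P569" in claims and claims.get("P31") != "Q5"]
```

The wiki itself accepts both items above; a background process like this one is what would eventually surface the second for fixing.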
  • 63:50 - 63:55
    I hope that answers
    your question.
  • 63:55 - 63:57
    You say helpful
    answer, question mark.
  • 63:57 - 64:00
    So was it a helpful answer, or?
  • 64:04 - 64:04
    OK.
  • 64:04 - 64:05
    Yes, Eleanor.
  • 64:05 - 64:11
    AUDIENCE: [INAUDIBLE]
  • 64:11 - 64:12
    ASAF BARTOV: Excellent question.
  • 64:12 - 64:13
    I'll repeat it.
  • 64:13 - 64:16
    You ask how do I find
    the Wikidata item
  • 64:16 - 64:18
    number from Wikipedia.
  • 64:18 - 64:22
    If I'm reading about Harvey Milk
    and I want to look at the data
  • 64:22 - 64:24
    how do I do that?
  • 64:24 - 64:27
    That is an excellent question
    and let's skip to Wikipedia.
  • 64:27 - 64:32
    Conveniently I have the
    link right here on English.
  • 64:32 - 64:36
    So this is the Wikipedia
    article about Harvey Milk
  • 64:36 - 64:43
    and every article on Wikipedia
    should have a Wikidata
  • 64:43 - 64:48
    item associated with it, but it
    doesn't happen automatically.
  • 64:48 - 64:51
    So if I just created
    a page on Wikipedia
  • 64:51 - 64:55
    I also need to create a
    Wikidata entity for it
  • 64:55 - 64:57
    if it doesn't already exist.
  • 64:57 - 64:59
    It could already exist
    because it was already
  • 64:59 - 65:02
    covered in a different
    language, for example.
  • 65:02 - 65:05
    So that was parenthetical.
  • 65:05 - 65:09
    But every article on Wikipedia
    should have, here on the side,
  • 65:09 - 65:14
    on the side are under Tools,
    a link called Wikidata item.
  • 65:14 - 65:15
    Right here.
  • 65:15 - 65:16
    OK.
  • 65:16 - 65:18
    That Wikidata
    item is a link
  • 65:18 - 65:22
    that takes you to
    Wikidata, to the entity,
  • 65:22 - 65:24
    and there you find the number.
  • 65:24 - 65:25
    You can-- you don't
    even have to click it.
  • 65:25 - 65:28
    I mean, the URL itself
    tells you the number.
  • 65:28 - 65:35
    The number, you see, it's
    wikidata.org/wiki/Q17141.
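[Editor's note: since the item number sits right in the URL, pulling it out programmatically is a one-line pattern match. A small sketch:]

```python
import re

def qid_from_url(url):
    """Return the Q-number embedded in a Wikidata item URL, or None."""
    m = re.search(r"/wiki/(q\d+)", url, re.IGNORECASE)
    return m.group(1).upper() if m else None
```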
  • 65:35 - 65:35
    OK.
  • 65:35 - 65:37
    So that was an
    excellent question.
  • 65:37 - 65:38
    Other questions?
  • 65:38 - 65:38
    Yes.
  • 65:41 - 65:44
    Yeah, about the additional
    attributes, the qualifiers.
  • 65:44 - 65:47
    So, yes, I answered
    more generically.
  • 65:47 - 65:49
    But just like the
    properties themselves
  • 65:49 - 65:53
    are not limited per item,
    the qualifiers per statement
  • 65:53 - 65:58
    are also not
    entirely preordained.
  • 65:58 - 66:00
    But there is some
    structure to it.
  • 66:00 - 66:03
    I don't want to go into it
    at great length right now.
  • 66:03 - 66:06
    If we have time in the end
    we can get back to that.
  • 66:06 - 66:10
    But some qualifiers are again
    relevant for some things,
  • 66:10 - 66:13
    start time, end time,
    and others won't be.
  • 66:13 - 66:16
    Wikidata does try to offer you--
  • 66:16 - 66:19
    you may remember when I
    clicked add qualifier,
  • 66:19 - 66:22
    it gave me kind of drop down
    of some relevant qualifiers.
  • 66:22 - 66:24
    So it does try to
    help you in that way.
  • 66:27 - 66:28
    Other question?
  • 66:28 - 66:31
    Are the values for
    instance of already
  • 66:31 - 66:33
    mappable to external ontologies?
  • 66:36 - 66:41
    That is a complicated question.
  • 66:41 - 66:43
    I'll help people understand
    the question first.
  • 66:43 - 66:49
    So an ontology is a
    structure, some kind
  • 66:49 - 66:52
    of hierarchy or
    cloud, of entities
  • 66:52 - 66:55
    and their interrelationships.
  • 66:55 - 66:57
    An ontology would
    say, for example,
  • 66:57 - 66:59
    a person is a living thing.
  • 66:59 - 67:00
    So is a dog.
  • 67:00 - 67:02
    They're both living things,
    but they're different things.
  • 67:02 - 67:10
    And then, you know, say
    things about those entities
  • 67:10 - 67:11
    and their interrelationships.
  • 67:11 - 67:13
    Now there are many,
    many competing,
  • 67:13 - 67:17
    or coexisting models
    of ontologies.
  • 67:17 - 67:20
    Many of them were created
    for specific needs.
  • 67:20 - 67:25
    Many of them want to be
    a universal ontology.
  • 67:25 - 67:28
    But of course it's
    impossible to quite
  • 67:28 - 67:32
    agree on one complete
    and simple ontology.
  • 67:32 - 67:34
    And so there are
    many ontology's.
  • 67:34 - 67:39
    Which brings up your question,
    can we map across ontologies?
  • 67:39 - 67:44
    Can we say that when Wikidata
    says instance of book that
  • 67:44 - 67:47
    is equivalent to some other
    ontology saying instance
  • 67:47 - 67:50
    of bibliographic record?
  • 67:50 - 67:51
    And the answer is yes.
  • 67:51 - 67:52
    There are some such mappings.
  • 67:52 - 67:54
    They are incomplete.
  • 67:54 - 67:58
    And there's no kind of
    automagic thing happening
  • 67:58 - 68:01
    in the wiki vis-a-vis
    those other ontologies.
  • 68:01 - 68:03
    That's kind of
    left as an exercise
  • 68:03 - 68:06
    for those dealing with those
    other ontologies, and for tool
  • 68:06 - 68:10
    builders and other
    platform improvements
  • 68:10 - 68:13
    beyond Wikidata itself.
  • 68:13 - 68:14
    OK.
  • 68:14 - 68:15
    Other questions?
  • 68:15 - 68:17
    Yeah, we have one from
    the YouTube stream.
  • 68:17 - 68:21
    Someone asked, why can't I
    link Howard Carter's occupation
  • 68:21 - 68:26
    to archeologists when I use
    an info box that fetches info
  • 68:26 - 68:29
    from Wikidata?
  • 68:29 - 68:33
    Why can't I link it
    from the info box?
  • 68:33 - 68:36
    So, someone on the
    stream answered
  • 68:36 - 68:38
    saying, because it's
    an improper connection,
  • 68:38 - 68:40
    because the target is not
    about the subject only.
  • 68:43 - 68:47
    The target is not
    about the subject?
  • 68:47 - 68:48
    If I understand the
    question correctly,
  • 68:48 - 68:53
    what you would want to be able
    to do is from within Wikipedia
  • 68:53 - 68:59
    be able to say occupation
    and link to a Wikidata entry
  • 68:59 - 69:01
    about archeology.
  • 69:01 - 69:04
    That doesn't quite
    work that way.
  • 69:04 - 69:05
    We will get to a
    little discussion
  • 69:05 - 69:08
    of that in an upcoming
    section of this talk.
  • 69:08 - 69:13
    So I will defer the rest
    of my answer to then.
  • 69:13 - 69:15
    OK.
  • 69:15 - 69:19
    So we're done with
    questions for this phase,
  • 69:19 - 69:23
    and my browser got
    tired of waiting for me.
  • 69:23 - 69:27
    So, yes.
  • 69:27 - 69:27
    All right.
  • 69:27 - 69:37
    So we took a look at Wikidata,
    and we took questions.
  • 69:37 - 69:41
    So now, let's teach
    Wikidata some new things.
  • 69:41 - 69:44
    Some things it
    doesn't already know.
  • 69:44 - 69:47
    Let's look at this item here.
  • 69:47 - 69:51
    So this item is about one
    of my favorite writers,
  • 69:51 - 69:54
    an American writer
    named Helen Dewitt.
  • 69:54 - 70:02
    Wikidata, of course, fondly
    refers to her as Q54674,
  • 70:02 - 70:03
    but we can call
    her Helen Dewitt.
  • 70:03 - 70:06
    And what can we contribute here?
  • 70:06 - 70:11
    So Wikidata has far less
    information about Helen Dewitt.
  • 70:11 - 70:13
    Most of you probably haven't
    heard of her, that's OK.
  • 70:13 - 70:15
    What does Wikidata
    know about her?
  • 70:15 - 70:16
    Well instance of human.
  • 70:16 - 70:18
    We have a photo of her.
  • 70:18 - 70:19
    She's female.
  • 70:19 - 70:21
    She's an American.
  • 70:21 - 70:22
    Her name is Helen.
  • 70:22 - 70:23
    Date of birth.
  • 70:23 - 70:24
    Place of birth.
  • 70:24 - 70:26
    She's an author, a
    novelist, a writer.
  • 70:26 - 70:29
    She was educated at the
    University of Oxford.
  • 70:29 - 70:33
    And Wikidata knows what
    her official website is.
  • 70:33 - 70:36
    That's useful, but that's it.
  • 70:36 - 70:38
    Now we can contribute
    information here.
  • 70:38 - 70:43
    For example, she's an American
    author writing in English.
  • 70:43 - 70:46
    So we could add
    that information.
  • 70:46 - 70:48
    We could click the
    Add button here.
  • 70:48 - 70:50
    And this is a good
    moment to acknowledge
  • 70:50 - 70:55
    that the user interface of
    Wikidata is a work in progress.
  • 70:55 - 70:57
    It's not as intuitive
    as it might be.
  • 70:57 - 70:59
    So you need to
    understand that click--
  • 70:59 - 71:02
    to add a completely
    new property,
  • 71:02 - 71:04
    you need to click
    this Add button.
  • 71:04 - 71:08
    If you want to add an additional
    value to the property official
  • 71:08 - 71:12
    website, you need to
    click this Add button.
  • 71:12 - 71:14
    It makes a kind of
    sense with a shaded box.
  • 71:14 - 71:16
    But, you know, you need
    to kind of pay attention,
  • 71:16 - 71:19
    and it's not as
    friendly as it might be.
  • 71:19 - 71:21
    [COUGHING] Excuse me.
  • 71:21 - 71:23
    So, let's add a property here.
  • 71:23 - 71:26
    Click the Add button.
  • 71:26 - 71:30
    Again, Wikidata tries to
    be useful by suggesting
  • 71:30 - 71:33
    some relevant
    properties for humans.
  • 71:33 - 71:37
    A bit more morbidly it suggests,
    how about date of death?
  • 71:37 - 71:39
    That's not cool, Wikidata.
  • 71:39 - 71:40
    Helen Dewitt is still alive.
  • 71:40 - 71:43
    So I will not add
    date of death, but I
  • 71:43 - 71:46
    can add languages spoken,
    written, or signed.
  • 71:46 - 71:48
    OK, so I click that.
  • 71:48 - 71:52
    And she writes in English.
  • 71:52 - 71:54
    I just type English-- whoops.
  • 71:54 - 71:57
    Not in Hebrew.
  • 71:57 - 71:58
    Don't panic.
  • 71:58 - 72:01
    I type English here.
  • 72:01 - 72:04
    And, oh, and of course Wikidata
    has auto-complete, right?
  • 72:04 - 72:06
    So it tries to help me along.
  • 72:06 - 72:10
    But you will notice that
    it has all kinds of things
  • 72:10 - 72:11
    called English.
  • 72:11 - 72:14
    I mean, it turns out that
    there is a place in Indiana
  • 72:14 - 72:16
    called English, Indiana.
  • 72:16 - 72:17
    Did I mean that?
  • 72:17 - 72:20
    No, of course I didn't mean
    that she writes her books
  • 72:20 - 72:22
    in English, Indiana.
  • 72:22 - 72:22
    Right?
  • 72:22 - 72:26
    But, you know, Wikidata gives me
    the option of linking to that.
  • 72:26 - 72:31
    I also don't mean the botanist
    Carl Schwartz English.
  • 72:31 - 72:33
    No, no I mean the
    west Germanic language
  • 72:33 - 72:34
    originating in England.
  • 72:34 - 72:35
    That's what I mean.
  • 72:35 - 72:36
    So I click that.
  • 72:36 - 72:38
    And I click Save.
  • 72:38 - 72:38
    And that's it.
  • 72:38 - 72:42
    Again I have just made
    an edit to Wikidata.
  • 72:42 - 72:48
    I have just taught Wikidata
    that this author speaks English.
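[Editor's note: behind the editing UI there is an API. The Wikibase action API's `wbcreateclaim` module adds exactly this kind of statement. A hedged sketch that only assembles the request (a real edit additionally needs authentication and a CSRF edit token); Q54674 is the item number given in the talk, P1412 is "languages spoken, written or signed," and Q1860 is English, correct as of writing.]

```python
import json

# Request parameters for the MediaWiki/Wikibase action API module
# wbcreateclaim. Assembled only; actually sending it requires login
# plus an edit token.
params = {
    "action": "wbcreateclaim",
    "entity": "Q54674",     # Helen Dewitt, per the talk
    "property": "P1412",    # languages spoken, written or signed
    "snaktype": "value",
    "value": json.dumps({"entity-type": "item", "numeric-id": 1860}),  # English
    "format": "json",
}
```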
  • 72:48 - 72:50
    Now, again, this
    may be very obvious.
  • 72:50 - 72:52
    She's American.
  • 72:52 - 72:55
    Of course not all
    Americans write in English.
  • 72:55 - 72:57
    It may be obvious if
    you look at her books.
  • 72:57 - 72:59
    The important thing
    is that now Wikidata
  • 72:59 - 73:02
    knows this as a piece of data.
  • 73:02 - 73:05
    And, again, think ahead
    to queries, which we will
  • 73:05 - 73:07
    demonstrate in a little bit.
  • 73:07 - 73:09
    Without this piece
    of information
  • 73:09 - 73:14
    that I just added, if I were to
    ask Wikidata five minutes ago,
  • 73:14 - 73:20
    give me a list of novelists
    writing in English, OK,
  • 73:20 - 73:23
    Wikidata would have returned
    thousands of results.
  • 73:23 - 73:28
    But Helen Dewitt would
    not have been among them.
  • 73:28 - 73:32
    Because up until two
    minutes ago Wikidata
  • 73:32 - 73:36
    didn't know that Helen Dewitt
    writes in English and not
  • 73:36 - 73:38
    in Spanish.
  • 73:38 - 73:39
    Do you see?
  • 73:39 - 73:43
    It is this explicit
    statement that will now
  • 73:43 - 73:47
    make her be included in any
    future queries that asks,
  • 73:47 - 73:49
    who are novelists
    writing in English?
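[Editor's note: the query teased here looks like this in SPARQL, the language of the Wikidata Query Service. The string is only assembled below; running it means sending it to the public endpoint at query.wikidata.org. IDs used: P106 occupation, Q6625963 novelist, P1412 languages spoken/written/signed, Q1860 English (correct as of writing; verify before relying on them).]

```python
# SPARQL for "novelists writing in English", held as a Python string.
QUERY = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P106 wd:Q6625963 ;   # occupation: novelist
          wdt:P1412 wd:Q1860 .     # language written: English
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""
```

Helen Dewitt appears in this result set only because of the explicit P1412 statement added a moment ago; the query matches statements, not inferences.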
  • 73:53 - 73:54
    OK.
  • 73:54 - 73:59
    By the way, she's
    a PhD in Classics.
  • 73:59 - 74:06
    She speaks-- or at least reads
    and writes Latin and Greek,
  • 74:06 - 74:07
    ancient Greek, and I could--
  • 74:07 - 74:10
    I can-- I mean, I
    happen to know that.
  • 74:10 - 74:12
    But wait, wait, wait,
    wait, wait, you say.
  • 74:12 - 74:14
    What about original research?
  • 74:14 - 74:19
    I mean, you can't just add
    stuff like that to Wikidata.
  • 74:19 - 74:20
    Don't you need sources?
  • 74:20 - 74:23
    Citations?
  • 74:23 - 74:24
    Of course I do.
  • 74:24 - 74:25
    Yes.
  • 74:25 - 74:28
    Let's add some sources to this.
  • 74:28 - 74:31
    So on Wikidata,
    just like Wikipedia,
  • 74:31 - 74:35
    things should generally
    be supported by citations,
  • 74:35 - 74:37
    by references.
  • 74:37 - 74:43
    And just like Wikipedia,
    they aren't always supported
  • 74:43 - 74:45
    in that way.
  • 74:45 - 74:49
    OK so, I mean, I can
    just add it to Wikidata.
  • 74:49 - 74:49
    Watch me.
  • 74:49 - 74:50
    I just did that, right?
  • 74:50 - 74:54
    I just added English and
    Latin without any citation,
  • 74:54 - 74:57
    and I will not be
    arrested for it.
  • 74:57 - 75:00
    Just like I could edit
    a Wikipedia article
  • 75:00 - 75:03
    and add some information
    without a citation.
  • 75:03 - 75:04
    It may stick.
  • 75:04 - 75:07
    It may stay in the article,
    or it may be reverted.
  • 75:07 - 75:11
    It depends on the kind of
    information I'm adding.
  • 75:11 - 75:14
    It depends how many people
    are paying attention
  • 75:14 - 75:15
    to the article on Wikipedia.
  • 75:15 - 75:18
    And it works the
    same way on Wikidata.
  • 75:18 - 75:22
    OK, so, you can add some
    things without references.
  • 75:22 - 75:24
    Ideally, when you
    add information, you
  • 75:24 - 75:26
    should include references.
  • 75:26 - 75:31
    So let's be good Wikidata
    citizens and add a source.
  • 75:31 - 75:34
    Here is an article that
    I prepared in advance.
  • 75:38 - 75:39
    This is Helen Dewitt.
  • 75:39 - 75:44
    And in this article,
    somewhere, it actually
  • 75:44 - 75:52
    says right at the
    bottom here, see,
  • 75:52 - 75:55
    Dewitt knows, in descending
    order of proficiency, Latin,
  • 75:55 - 75:57
    ancient Greek, French,
    German, Spanish,
  • 75:57 - 75:59
    and Portuguese, Dutch, Danish,
    Norwegian, Swedish, Arabic,
  • 75:59 - 76:02
    Hebrew and Japanese.
  • 76:02 - 76:05
    This may sound
    excessive, but it's true.
  • 76:05 - 76:06
    I met this woman.
  • 76:06 - 76:10
    So anyway, we don't have
    to include all of that.
  • 76:10 - 76:13
    The point is this article from
    a reasonably reliable source,
  • 76:13 - 76:16
    this magazine,
    this interview, can
  • 76:16 - 76:19
    count as a source for
    the languages she speaks.
  • 76:19 - 76:21
    So I copy the URL.
  • 76:21 - 76:23
    I just copied off my browser.
  • 76:23 - 76:28
    And, whoops-- that's not--
  • 76:28 - 76:29
    here we go.
  • 76:29 - 76:32
    And I can just add
    a reference here
  • 76:32 - 76:35
    to the information that I
    just added to Wikidata, right?
  • 76:35 - 76:38
    I can click Add Reference.
  • 76:38 - 76:46
    And then just say the reference
    URL is, and I just paste.
  • 76:46 - 76:49
    I paste this URL.
  • 76:49 - 76:50
    Hit Enter.
  • 76:50 - 76:51
    And that's it.
  • 76:51 - 76:55
    And now the fact that she
    speaks Latin has a reference.
  • 76:55 - 76:58
    If you look at the other
    things here on Wikidata,
  • 76:58 - 77:03
    you can see that these IDs, for
    example, have references, too.
  • 77:03 - 77:03
    Right?
  • 77:03 - 77:07
    In this case, the reference
    just says, excuse me--
  • 77:15 - 77:19
    In this case it just says
    imported from English Wikipedia.
  • 77:19 - 77:25
    But wait, you say, can
    Wikipedia be a source?
  • 77:25 - 77:27
    Not properly, no.
  • 77:27 - 77:30
    I mean, just like Wikipedia
    itself doesn't cite itself.
  • 77:30 - 77:34
    We don't say, this person
    was born in this city
  • 77:34 - 77:35
    how do we know?
  • 77:35 - 77:37
    We read it on Wikipedia
    in another language.
  • 77:37 - 77:40
    That's not a good citation.
  • 77:40 - 77:41
    It's not a good
    citation for Wikidata
  • 77:41 - 77:45
    either so why do we put it here?
  • 77:45 - 77:49
    Well you can see the qualifier
    here is different, right?
  • 77:49 - 77:54
    It's not reference URL, which
    is what I put in for Latin here.
  • 78:17 - 78:20
    It's not reference URL here,
    it's a different qualifier.
  • 78:20 - 78:23
    It says, imported from.
  • 78:23 - 78:26
    So this is not an
    actual reference that
  • 78:26 - 78:28
    supports this piece of data.
  • 78:28 - 78:31
    It just shows where did
    this data come from.
  • 78:31 - 78:34
    It's a slightly different
    thing, because this data was
  • 78:34 - 78:37
    mass imported into Wikidata.
  • 78:37 - 78:41
    So it wasn't input by
    hand by some volunteer.
  • 78:41 - 78:45
    It was imported into Wikidata
    en masse by a script,
  • 78:45 - 78:46
    by a program.
  • 78:46 - 78:50
    And we want to know, where
    did this number come from?
  • 78:50 - 78:51
    Well it came from
    English Wikipedia.
  • 78:51 - 78:54
    So again, that's not
    a proper reference
  • 78:54 - 78:56
    for the validity
    of the information,
  • 78:56 - 78:59
    but it does at least tell us
    it came from English Wikipedia.
  • 78:59 - 79:03
    We can click and look on
    English Wikipedia and find out.
  • 79:03 - 79:05
    Maybe there's a
    footnote there that
  • 79:05 - 79:09
    says where it did come from.
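[Editor's note: the distinction drawn here, a real source versus mere provenance, is machine-checkable. P854 ("reference URL") cites an external source, while P143 ("imported from Wikimedia project") only records where a mass import came from. A toy sketch over simplified claims; the URL and values are placeholders.]

```python
# Simplified claims: each carries a list of references, keyed by the
# reference property used. P854 = reference URL, P143 = imported from.
claims = [
    {"property": "P1412", "refs": [{"P854": "https://example.org/interview"}]},
    {"property": "P569",  "refs": [{"P143": "English Wikipedia"}]},
]

def properly_sourced(claim):
    """True only if some reference cites an external source URL (P854)."""
    return any("P854" in ref for ref in claim["refs"])

unsourced = [c["property"] for c in claims if not properly_sourced(c)]
```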
  • 79:09 - 79:11
    OK.
  • 79:11 - 79:15
    So this was an example of
    teaching Wikidata something
  • 79:15 - 79:17
    that it didn't know.
  • 79:17 - 79:19
    Something about the languages.
  • 79:19 - 79:21
    And of course I could add
    this reference for English.
  • 79:21 - 79:23
    I could add all the other
    languages that she speaks.
  • 79:23 - 79:26
    And I won't bore you with
    that, but that is basically
  • 79:26 - 79:27
    how it's done.
  • 79:27 - 79:30
    So you click this Add to
    add a completely new--
  • 79:33 - 79:34
    completely new statement.
  • 79:34 - 79:36
    Now, by the way, the fact
    that these are the only two
  • 79:36 - 79:39
    suggestions that
    Wikidata can think of,
  • 79:39 - 79:42
    doesn't mean these
    are the only options.
  • 79:42 - 79:47
    OK, you can just type
    anything that may be relevant.
  • 79:47 - 79:51
    We could add, for
    example, award.
  • 79:51 - 79:53
    Just start typing award.
  • 79:53 - 79:55
    And here I have
    a bunch of properties
  • 79:55 - 79:57
    that are relevant for awards.
  • 79:57 - 80:00
    Awards received, together
    with, conferred by, right?
  • 80:00 - 80:06
    There's all kinds of properties
    that I could rely on.
  • 80:06 - 80:10
    And of course there is a list of
    all the properties of Wikidata.
  • 80:10 - 80:12
    And that list is
    also sorted by type.
  • 80:12 - 80:15
    So yes, there is a list of
    properties relevant to people
  • 80:15 - 80:17
    so that you don't have to guess.
  • 80:17 - 80:19
    But a surprising
    amount of the time
  • 80:19 - 80:23
    you can just start typing
    and get the right properties
  • 80:23 - 80:25
    suggested to you.
  • 80:25 - 80:27
    OK.
  • 80:27 - 80:33
    So we taught Wikidata
    something new,
  • 80:33 - 80:39
    and now let's teach Wikidata
    something completely new.
  • 80:39 - 80:39
    Right?
  • 80:39 - 80:42
    So how do we create
    a new Wikidata item?
  • 80:42 - 80:47
    So, like I said, if I
    created a Wikipedia article
  • 80:47 - 80:50
    about something that was
    not previously covered
  • 80:50 - 80:54
    on any other
    Wikipedia, chances are
  • 80:54 - 80:57
    there would not be an already
    existing Wikidata item.
  • 80:57 - 81:03
    Sometimes there might
    be, because Wikidata
  • 81:03 - 81:07
    does have 25 million entities.
  • 81:07 - 81:08
    But sometimes there wouldn't be.
  • 81:08 - 81:10
    So, first of all, I could
    search for it, right?
  • 81:10 - 81:14
    So I could go to Wikidata
    to the search box
  • 81:14 - 81:17
    here and just start typing, and
    search for what I want, right?
  • 81:17 - 81:21
    So if I'm searching for Helen
    DeWitt, I just say Helen,
  • 81:21 - 81:26
    and I can see whether
    or not it exists.
  • 81:26 - 81:29
    And there's a detailed search
    results page, et cetera,
  • 81:29 - 81:33
    where I can find out
    if the item does exist or not.
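The search-before-you-create step can also be done programmatically. A sketch of checking an API response for an existing item, assuming the documented `wbsearchentities` response shape (a top-level `search` list of hit dicts):

```python
def item_exists(search_response, label):
    """Given a parsed JSON response from wbsearchentities, report
    whether any returned entity matches the label exactly, to avoid
    creating a duplicate item."""
    return any(hit.get("label") == label
               for hit in search_response.get("search", []))

# A response as the API might return it (the id here is illustrative):
response = {"search": [{"id": "Q1", "label": "Helen DeWitt"}]}
print(item_exists(response, "Helen DeWitt"))  # → True
```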
  • 81:33 - 81:35
    Excuse me, this reminds me
    of a very important thing
  • 81:35 - 81:37
    I wanted to
    demonstrate, and that
  • 81:37 - 81:43
    is the multilingualism
    of Wikidata.
  • 81:43 - 81:49
    So remember all these
    labels in other languages.
  • 81:49 - 81:54
    Wikidata knows what to call
    Helen DeWitt in Hebrew.
  • 81:54 - 82:01
    And it will show it to Wikidata
    users whose language is Hebrew.
  • 82:01 - 82:04
    Mine is set to
    English, for your sake.
  • 82:04 - 82:09
    But if I change this I go to
    Preferences here and change
  • 82:09 - 82:10
    my language.
  • 82:10 - 82:15
    [INAUDIBLE] All
    right, and I hit Save.
  • 82:15 - 82:20
    Wikidata will start
    talking to me in Hebrew.
  • 82:20 - 82:23
    Now brace yourselves.
  • 82:23 - 82:25
    Are you ready?
  • 82:25 - 82:28
    Don't panic, it's right to left.
  • 82:28 - 82:33
    Oh my god everything
    is topsy-turvy.
  • 82:33 - 82:37
    So this is the same
    article in Hebrew.
  • 82:37 - 82:39
    So the sidebar has
    switched direction,
  • 82:39 - 82:41
    and I know most of
    you cannot read it.
  • 82:41 - 82:42
    Bear with me.
  • 82:42 - 82:45
    This is the label
    that we previously
  • 82:45 - 82:47
    saw in the label box.
  • 82:47 - 82:50
    This is how you spell
    Helen DeWitt in Hebrew.
  • 82:50 - 82:53
    And here is the
    description in Hebrew.
  • 82:53 - 82:55
    It's not the description in
    English, this description,
  • 82:55 - 82:57
    American writer, which
    I was shown previously.
  • 82:57 - 83:01
    Now I'm shown the Hebrew
    description, appropriately.
  • 83:01 - 83:04
    But more interestingly,
    oh my god!
  • 83:04 - 83:08
    All these statements
    are suddenly in Hebrew.
  • 83:08 - 83:09
    How did that happen?
  • 83:12 - 83:16
    Well this tiny word here
    is the very concise way
  • 83:16 - 83:22
    to say in Hebrew, instance of,
    and this word here means human.
  • 83:22 - 83:26
    So these are links to
    the same things, right?
  • 83:26 - 83:28
    It still links to Q5.
  • 83:28 - 83:32
    Q5 is the Wikidata
    entity for human.
  • 83:32 - 83:33
    These are still the same things.
  • 83:33 - 83:38
    But because Wikidata has
    multiple labels for everything,
  • 83:38 - 83:40
    it has multiple
    labels for items.
  • 83:40 - 83:43
    And it also has multiple
    labels for property names.
  • 83:43 - 83:46
    So Wikidata knows how
    to say, instance of,
  • 83:46 - 83:50
    and award received,
    in other languages.
  • 83:50 - 83:54
    That is why it is able to show
    me all this data in Hebrew
  • 83:54 - 84:00
    even if none of that data was
    actually input into Wikidata
  • 84:00 - 84:02
    by a Hebrew speaker.
  • 84:02 - 84:05
    That data could have been
    input by English speakers,
  • 84:05 - 84:08
    but thanks to the
    fact that someone once
  • 84:08 - 84:13
    translated the word
    photo into Hebrew,
  • 84:13 - 84:15
    I can see this field in Hebrew.
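The mechanism described here — every item and property carrying one label per language — means a client just picks the label matching the user's language and falls back when one is missing. A sketch of that lookup (the real Wikibase fallback chains are configured server-side; this simple preference list is an assumption):

```python
def best_label(labels, preferred=("he", "en")):
    """Pick a label in the first available preferred language,
    falling back to any language at all, then to None.
    `labels` mirrors Wikidata's JSON shape:
    {"en": {"language": "en", "value": "Helen DeWitt"}, ...}."""
    for lang in preferred:
        if lang in labels:
            return labels[lang]["value"]
    for entry in labels.values():
        return entry["value"]   # arbitrary-language fallback
    return None

labels = {"en": {"language": "en", "value": "Helen DeWitt"}}
print(best_label(labels))  # → Helen DeWitt (no Hebrew label here)
```

This is why translating a property label once makes it render for every Hebrew-language user on every item that uses the property.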
  • 84:18 - 84:21
    So one of the things you
    can do to help Wikidata,
  • 84:21 - 84:24
    right now, without
    any special knowledge
  • 84:24 - 84:26
    is to help translate
    those labels.
  • 84:26 - 84:29
    Every label only needs to
    be translated just once.
  • 84:29 - 84:31
    So you can see that all
    of these properties, date
  • 84:31 - 84:35
    of birth, name et cetera,
    they all have Hebrew labels.
  • 84:35 - 84:37
    Maybe one of these would not.
  • 84:37 - 84:38
    No, they all have Hebrew labels.
  • 84:38 - 84:39
    Doing pretty good.
  • 84:43 - 84:46
    And I'm able to search
    in my own language.
  • 84:46 - 84:48
    I'm able to click Add.
  • 84:48 - 84:50
    This word is Add,
    so I click this,
  • 84:50 - 84:52
    and now I have the Add screen.
  • 84:52 - 84:56
    It all speaks my language,
    and it's awesome.
  • 84:56 - 85:00
    And now for your sake I
    will switch back to English,
  • 85:00 - 85:03
    but it is important
    to know you can
  • 85:03 - 85:06
    edit Wikidata in any language.
  • 85:06 - 85:09
    And it is far more multilingual
    and multilingual-friendly
  • 85:09 - 85:13
    than, for example, Commons, which
    is also a project we all share.
  • 85:13 - 85:18
    But Commons has some limitations
    on how multilingual it is.
  • 85:18 - 85:21
    For example, the category
    names, et cetera.
  • 85:21 - 85:23
    OK.
  • 85:23 - 85:26
    So we were beginning
    to discuss creating
  • 85:26 - 85:27
    something completely new.
  • 85:27 - 85:29
    AUDIENCE: Quick
    questions, if that's OK?
  • 85:29 - 85:31
    So there's two questions on IRC.
  • 85:31 - 85:34
    The first one is, can you
    show search for something
  • 85:34 - 85:35
    like getting the list of things?
  • 85:35 - 85:38
    I want to learn how to search
    for something properly like,
  • 85:38 - 85:44
    show me all the items with
    this value of this property.
  • 85:44 - 85:45
    ASAF BARTOV: Yes.
  • 85:45 - 85:48
    That is part of
    this talk, but I'll
  • 85:48 - 85:49
    get to that in a
    little bit later.
  • 85:49 - 85:52
    There's a whole section where I
    will demonstrate the very, very
  • 85:52 - 85:55
    powerful query
    system of Wikidata
  • 85:55 - 85:57
    where I will cash
    that check that I gave
  • 85:57 - 85:59
    at the beginning of
    all these painters
  • 85:59 - 86:01
    who are sons of painters
    queries et cetera
  • 86:01 - 86:03
    So I will demonstrate
    how to do that.
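As a preview of that query system, the "painters who are sons of painters" question is typically phrased in SPARQL against the public endpoint at query.wikidata.org. A sketch, using the commonly cited IDs P106 (occupation), Q1028181 (painter), and P22 (father) — worth double-checking on wikidata.org before relying on them:

```python
# SPARQL for "painters whose father is also a painter", held as a
# Python string ready to send to the Wikidata query service.
QUERY = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P106 wd:Q1028181 .   # the person is a painter
  ?person wdt:P22 ?father .        # the person has a father
  ?father wdt:P106 wd:Q1028181 .   # the father is a painter too
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

def endpoint_params(query):
    """Parameters for a GET request to the public SPARQL endpoint
    (https://query.wikidata.org/sparql)."""
    return {"query": query, "format": "json"}
```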
  • 86:03 - 86:04
    AUDIENCE: Other question.
  • 86:04 - 86:07
    How does Wikidata deal
    with link rot, and other issues
  • 86:07 - 86:10
    stemming from its URL refs?
  • 86:14 - 86:16
    ASAF BARTOV: URLs break.
  • 86:16 - 86:19
    We call that link rot.
  • 86:19 - 86:22
    Wikidata doesn't have
    any particular magic
  • 86:22 - 86:25
    around link rot,
    just like Wikipedia.
  • 86:25 - 86:29
    So if you do use a bare
    URL it may well rot.
  • 86:29 - 86:34
    But you can add qualifiers
    with backup URLs elsewhere,
  • 86:34 - 86:38
    on the Internet Archive or
    another mirroring service.
  • 86:38 - 86:43
    And potentially that could be
    a software feature for Wikidata
  • 86:43 - 86:47
    to automatically save
    or ensure that something
  • 86:47 - 86:49
    is saved on Internet
    Archive, but I don't
  • 86:49 - 86:51
    know that it is doing so now.
  • 86:51 - 86:56
    So, just like Wikipedia, if
    it is a bare URL it may rot.
  • 86:56 - 87:00
    And may need to be
    replaced, possibly by bot.
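The backup-URL idea mentioned above is easy because Wayback Machine snapshot URLs follow a fixed scheme, `web.archive.org/web/<timestamp>/<original-url>`. A sketch of building one (the timestamp shown is illustrative, and a given snapshot is not guaranteed to exist):

```python
def wayback_url(original_url, timestamp):
    """Build an Internet Archive Wayback Machine snapshot URL for a
    possibly-rotting reference. `timestamp` is YYYYMMDDhhmmss,
    per the Wayback URL scheme."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"

# Illustrative use only:
print(wayback_url("http://example.com/ref", "20160101000000"))
# → https://web.archive.org/web/20160101000000/http://example.com/ref
```

A bot replacing rotted references could generate these and record them as qualifiers alongside the original URL.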
  • 87:00 - 87:01
    Other questions?
  • 87:10 - 87:13
    All right, so let's
    talk about how you
  • 87:13 - 87:15
    create a completely new item.
  • 87:15 - 87:16
    It's very simple.
  • 87:16 - 87:22
    You go to Wikidata and you
    click here on the side.
  • 87:22 - 87:30
    There's a link, create new item,
    which gives you this screen.
  • 87:30 - 87:35
    And let's create an
    item about a book
  • 87:35 - 87:40
    that I'm reading right now
    by this Bulgarian writer.
  • 87:40 - 87:44
    So we have an article about this
    writer guy named Deyan Enev.
  • 87:44 - 87:49
    But we don't have an
    article or a Wikidata item
  • 87:49 - 88:08
    about one of his famous
    books called Circus Bulgaria.
  • 88:08 - 88:10
    That's the book I'm reading,
    his first collection
  • 88:10 - 88:11
    of short stories in English.
  • 88:11 - 88:14
    Circus Bulgaria came out
    in 2010, Portobello Books,
  • 88:14 - 88:17
    translated by Kapka Kassabova.
  • 88:17 - 88:18
    So that's the book I'm reading.
  • 88:18 - 88:21
    As you can see it's not
    a link on Wikipedia.
  • 88:21 - 88:23
    There's no article about
    it, and there's not even
  • 88:23 - 88:26
    a Wikidata entity item about it.
  • 88:26 - 88:32
    But we can totally create
    it, even without a Wikipedia
  • 88:32 - 88:33
    article.
  • 88:33 - 88:35
    So let's create this new item.
  • 88:35 - 88:37
    Let's create it in
    English for the purposes
  • 88:37 - 88:39
    of our demonstration.
  • 88:39 - 88:45
    The name of the item
    is Circus Bulgaria.
  • 88:45 - 88:48
    Circus Bulgaria,
    that's the name.
  • 88:48 - 88:51
    Not Circus Bulgaria
    parentheses book,
  • 88:51 - 88:54
    or anything you may be
    used to from Wikipedia.
  • 88:54 - 88:57
    It's the actual
    name of the book,
  • 88:57 - 89:00
    and the description,
    again, remember,
  • 89:00 - 89:03
    the description field
    is just to kind of help
  • 89:03 - 89:09
    tell apart this Circus Bulgaria
    from any other potential Circus
  • 89:09 - 89:09
    Bulgaria.
  • 89:09 - 89:11
    Maybe there's a
    film or something.
  • 89:11 - 89:20
    So it's enough to just say
    something like short story
  • 89:20 - 89:23
    collection.
  • 89:23 - 89:28
    I might add by Deyan Enev
    and if just in case, again,
  • 89:28 - 89:32
    some future other short story
    collection by some other author
  • 89:32 - 89:34
    happens to have that same name.
  • 89:34 - 89:36
    That should be
    disambiguating enough.
  • 89:36 - 89:37
    OK.
  • 89:37 - 89:40
    Short story collection
    by Deyan Enev.
  • 89:40 - 89:42
    I could have aliases for this.
  • 89:42 - 89:47
    The aliases assist findability.
  • 89:47 - 89:51
    This particular book has just
    this one name, so that's fine.
  • 89:51 - 89:52
    And I click Create.
  • 89:52 - 89:53
    That's it.
  • 89:53 - 89:56
    I just start with a
    label, and a description.
  • 89:56 - 89:59
    I click Create.
  • 89:59 - 90:04
    I have a brand new Q number
    for my new Wikidata item.
  • 90:04 - 90:06
    And Wikidata knows
    what to call it.
  • 90:06 - 90:09
    And a description in
    one language at least.
  • 90:09 - 90:12
    And that's it, and I
    can start populating it.
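Creating an item this way maps directly onto the `wbeditentity` API module with `new=item`: just a label and a description per language, exactly like the form. A minimal sketch of the data it takes (the module and field names are real; a live call would also need a logged-in session and a CSRF token, which are omitted here):

```python
def new_item_payload(label, description, language="en"):
    """Data for wbeditentity with new=item: a label and a
    description in one language, mirroring the Create form."""
    return {
        "labels": {language: {"language": language, "value": label}},
        "descriptions": {language: {"language": language,
                                    "value": description}},
    }

payload = new_item_payload("Circus Bulgaria",
                           "short story collection by Deyan Enev")
```

The API responds with the freshly assigned Q number, after which statements can be added to the new item.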
  • 90:12 - 90:15
    As you can see, it
    has no site links,
  • 90:15 - 90:17
    but it's ready to be taught.
  • 90:17 - 90:20
    So, for example, I
    can start by teaching
  • 90:20 - 90:25
    it the name of the book
    in another language
  • 90:25 - 90:26
    that I happened to speak.
  • 90:29 - 90:32
    Now it has two labels
    in English and Hebrew.
  • 90:32 - 90:37
    I could also look up
    the original Bulgarian
  • 90:37 - 90:40
    label for this book.
  • 90:40 - 90:42
    Seems relevant.
  • 90:42 - 90:43
    Again, I do not speak Bulgarian.
  • 90:43 - 90:50
    But I can go to the Bulgarian
    Wikipedia through the interwiki link.
  • 90:50 - 90:52
    This is this gentleman.
  • 90:52 - 90:55
    And I could find--
  • 90:55 - 90:59
    I can read Cyrillic so
    I could easily find--
  • 90:59 - 91:00
    when I say easily--
  • 91:03 - 91:06
    when I say easily--
  • 91:06 - 91:13
    maybe not so easy, but
    I can search for it.
  • 91:21 - 91:22
    Here we go.
  • 91:22 - 91:25
    Tsirk Bulgaria.
  • 91:25 - 91:28
    That is the name of the book.
  • 91:28 - 91:29
    Tsirk, as in circus.
  • 91:29 - 91:30
    No problem.
  • 91:30 - 91:33
    So I just copy this right here.
  • 91:35 - 91:38
    And I go back to my new item.
  • 91:38 - 91:46
    My new item, which is here,
    and I edit the Bulgarian field.
  • 91:48 - 91:50
    And here it is.
  • 91:50 - 91:51
    Awesome.
  • 91:51 - 91:51
    All right.
  • 91:51 - 91:55
    But I still haven't told
    Wikidata anything about this.
  • 91:55 - 91:57
    I know I'm talking about a book.
  • 91:57 - 91:59
    Wikidata doesn't
    know that yet.
  • 91:59 - 92:03
    So let's start by
    adding some statements.
  • 92:03 - 92:05
    First of all, I click Add.
  • 92:05 - 92:07
    Wikidata sensibly
    says, how about we
  • 92:07 - 92:09
    start with instance of.
  • 92:09 - 92:11
    Tell me what kind of animal--
    no, not kind of animal.
  • 92:11 - 92:14
    What kind of thing are you
    trying to describe here?
  • 92:14 - 92:18
    Well it's an instance of a book.
  • 92:18 - 92:21
    Not in Hebrew, please.
  • 92:21 - 92:22
    So it's an instance of a book.
  • 92:22 - 92:24
    I could even be a
    little more specific
  • 92:24 - 92:32
    and say it's an instance of
    a short story collection.
  • 92:32 - 92:35
    There we go, short
    story collection.
  • 92:35 - 92:37
    I hit Save.
  • 92:37 - 92:37
    Awesome.
  • 92:37 - 92:40
    So now we know what
    kind of thing it is.
  • 92:40 - 92:43
    It's not a human, it's not a
    mountain, it's not a concept.
  • 92:43 - 92:45
    It's a short story collection.
  • 92:45 - 92:46
    Now I can add some other things.
  • 92:46 - 92:49
    See, Wikidata is
    already working for me.
  • 92:49 - 92:51
    Because it's a short
    story collection
  • 92:51 - 92:54
    it's offering me to populate
    these properties, and not
  • 92:54 - 92:55
    other ones.
  • 92:55 - 92:57
    Publication date,
    original language,
  • 92:57 - 93:00
    genre, country of origin,
    these are all relevant, right?
  • 93:00 - 93:04
    So let's start with original
    language of the work
  • 93:04 - 93:07
    is Bulgarian.
  • 93:07 - 93:10
    Not Bulgaria, Bulgarian.
  • 93:10 - 93:12
    This is the item I want to link.
  • 93:12 - 93:22
    Hit Save, and whatever.
  • 93:22 - 93:23
    Author.
  • 93:23 - 93:27
    Let's identify the author.
  • 93:27 - 93:29
    So the author, the main
    creator of the work,
  • 93:29 - 93:32
    is that gentleman Deyan Enev.
  • 93:32 - 93:35
    And remember, he has
    a Wikipedia article.
  • 93:35 - 93:37
    He also has a Wikidata entity.
  • 93:37 - 93:40
    So Wikidata does know about him.
  • 93:40 - 93:49
    So I hit Save, and I can add
    something about the translator.
  • 93:53 - 93:54
    And what was that lady's name?
  • 93:58 - 94:00
    Kapka Kassabova.
  • 94:00 - 94:05
    Now it so happens that Wikidata
    already knows about this lady.
  • 94:08 - 94:09
    See?
  • 94:09 - 94:12
    So I can just start typing
    and then just link to it.
  • 94:12 - 94:13
    Awesome.
  • 94:13 - 94:14
    But what if it didn't?
  • 94:14 - 94:16
    What if it was translated
    by someone who isn't
  • 94:16 - 94:18
    already covered on Wikidata?
  • 94:18 - 94:22
    Well I could just type
    the name as a string,
  • 94:22 - 94:26
    but ideally I could
    create a Wikidata entity
  • 94:26 - 94:29
    about this translator so
    that there is a possibility
  • 94:29 - 94:30
    to link to her.
  • 94:34 - 94:37
    Now I might actually
    add a qualifier here
  • 94:37 - 94:40
    because, she's not the
    translator of the book, right?
  • 94:40 - 94:44
    She's the translator of
    the book into English.
  • 94:44 - 94:44
    Right.
  • 94:44 - 94:50
    So the language that she
    translated into is English.
  • 94:50 - 94:51
    Right?
  • 94:51 - 94:54
    This book-- remember
    I'm describing the book.
  • 94:54 - 94:55
    The item is about the book.
  • 94:55 - 94:57
    So the book would have
    a different translator
  • 94:57 - 94:59
    into Polish.
  • 94:59 - 95:02
    So this is an example of
    a property or a statement
  • 95:02 - 95:06
    that doesn't make sense without
    one of those qualifiers.
  • 95:06 - 95:08
    It's just not correct.
  • 95:08 - 95:11
    It doesn't make sense to
    just say "the translator is."
  • 95:11 - 95:15
    She's the English translator, or
    even just this English translator.
  • 95:15 - 95:18
    In 50 years maybe there would
    be an additional English
  • 95:18 - 95:19
    translation.
  • 95:19 - 95:25
    So that's an example of
    needing that qualifier.
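In Wikidata's data model the qualifier hangs off the statement itself, not the item. A simplified sketch of this translator statement as a Python dict (the shape loosely follows Wikidata's JSON; P655 "translator" and P407 "language of work or name" are the usual property IDs, but confirm them before relying on them, and a real statement would point at the translator's item rather than a string):

```python
# "translator: Kapka Kassabova, qualified by language: English"
statement = {
    "property": "P655",            # translator (verify on wikidata.org)
    "value": "Kapka Kassabova",    # ideally an item id, not a string
    "qualifiers": {
        "P407": "English",         # language of work or name
    },
}

def describe(st):
    """Render a statement with its qualifiers for display."""
    quals = ", ".join(f"{k}={v}" for k, v in st["qualifiers"].items())
    return f"{st['property']} -> {st['value']} ({quals})"
```

Without the `qualifiers` entry the statement would wrongly claim she is *the* translator of the book, rather than its translator into English.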
  • 95:25 - 95:27
    And of course I could go on
    and populate the other fields.
  • 95:27 - 95:30
    We don't have to
    do that right now.
  • 95:30 - 95:33
    Publication date, country
    of origin, et cetera.
  • 95:33 - 95:35
    So this is already beginning
    to look like all those items
  • 95:35 - 95:38
    that we already saw, but just
    a moment ago it didn't exist.
  • 95:38 - 95:44
    Just a moment ago Wikidata
    had no concept of this work.
  • 95:44 - 95:46
    This happens to be one
    of his notable works.
  • 95:46 - 95:52
    So I could actually go to the
    item about Deyan Enev which
  • 95:52 - 95:56
    has all this information
    already, occupation, languages,
  • 95:56 - 95:59
    and add a property.
  • 95:59 - 96:01
    Remember, I'm not
    limited to these.
  • 96:01 - 96:06
    I can add a property
    called notable works,
  • 96:06 - 96:09
    and mention my new item.
  • 96:09 - 96:12
    Circus Bulgaria.
  • 96:12 - 96:13
    See?
  • 96:13 - 96:15
    My new item is
    showing up, and thanks
  • 96:15 - 96:19
    to this description that I
    wrote, short story collection,
  • 96:19 - 96:23
    it's already appearing here in
    the dropdown very conveniently.
  • 96:23 - 96:24
    So I linked to this.
  • 96:24 - 96:25
    I hit Save.
  • 96:29 - 96:32
    Ideally again I should find
    some references showing
  • 96:32 - 96:35
    that this is a
    notable work by him,
  • 96:35 - 96:37
    but we won't spend
    time on that right now.
  • 96:37 - 96:39
    But the point is we
    created a new item.
  • 96:39 - 96:40
    We populated it a little bit.
  • 96:40 - 96:44
    We linked to it so that it's
    more discoverable by mentioning
  • 96:44 - 96:48
    it in the author name, and
    of course the book item
  • 96:48 - 96:51
    itself mentions the author
    and links to the author.
  • 96:51 - 96:53
    So that's all good.
  • 96:53 - 96:58
    One last thing we shall do is
    give it some useful identifier
  • 96:58 - 97:03
    so let's add, say, the
    Library of Congress record
  • 97:03 - 97:04
    for this book.
  • 97:04 - 97:04
    OK.
  • 97:04 - 97:08
    So I have prepared
    this in advance.
  • 97:08 - 97:09
    Ooh.
  • 97:09 - 97:13
    Just in time, with 80 seconds to
    go before it's giving up on me.
  • 97:13 - 97:14
    Oh it has already
    given up on me.
  • 97:14 - 97:15
    That is very unfortunate.
  • 97:23 - 97:29
    So I go to the Library of
    Congress and I find this book.
  • 97:29 - 97:33
    I find this entry, right?
  • 97:33 - 97:37
    In the Library of Congress
    database about this book.
  • 97:37 - 97:39
    And it has a permalink.
  • 97:39 - 97:43
    It has a kind of guaranteed
    to be permanent link.
  • 97:43 - 97:48
    I can just copy that link,
    go back to my little book,
  • 97:48 - 97:56
    and say the Library of Congress.
  • 97:56 - 98:01
    Yeah, LCCN, that's what they
    call their IDs, the control
  • 98:01 - 98:02
    number.
  • 98:02 - 98:07
    And I paste it here.
  • 98:07 - 98:08
    I actually don't need the URL.
  • 98:08 - 98:09
    I need just a number.
  • 98:12 - 98:14
    And there we go.
  • 98:14 - 98:17
    I have added it,
    and now Wikidata
  • 98:17 - 98:21
    knows how to find bibliographic
    information about this book.
  • 98:21 - 98:25
    And any re-user of
    Wikidata, some program,
  • 98:25 - 98:29
    some tool that connects
    books to authors
  • 98:29 - 98:33
    or does statistical analysis or
    whatever, some future yet to be
  • 98:33 - 98:35
    imagined tool
    could automatically
  • 98:35 - 98:39
    find additional metadata on the
    Library of Congress site thanks
  • 98:39 - 98:42
    to this connection
    that I just made.
  • 98:42 - 98:44
    And of course I could
    add many other IDs
  • 98:44 - 98:46
    to other catalogs
    around the world,
  • 98:46 - 98:48
    and we won't do that right now.
  • 98:48 - 98:52
    You can see that it's now
    showing up under identifiers.
  • 98:52 - 98:56
    So this is how we created
    a brand new piece of data.
  • 98:56 - 99:00
    Questions about this,
    about creating new items?
  • 99:18 - 99:19
    Yeah, all right.
  • 99:19 - 99:26
    So we've seen how to contribute
    to Wikidata on our own,
  • 99:26 - 99:26
    kind of through--
  • 99:26 - 99:28
    directly through Wikidata.
  • 99:31 - 99:35
    Now you may be
    thinking, but Asaf, this
  • 99:35 - 99:40
    sounds like a ton
    of work recording
  • 99:40 - 99:44
    all of these little tiny bits of
    information about every person
  • 99:44 - 99:47
    and every book and every town.
  • 99:47 - 99:51
    And if you think that,
    you would be correct.
  • 99:51 - 99:53
    That is a ton of work.
  • 99:53 - 99:55
    It's a lot of work.
  • 99:55 - 100:00
    However, it is centralized, so
    it is reusable on other wikis
  • 100:00 - 100:04
    and we will show in just a
    moment how we pull information
  • 100:04 - 100:07
    from Wikidata into
    Wikipedia or other projects.
  • 100:11 - 100:14
    We will show that
    in just a moment.
  • 100:14 - 100:19
    But here's an
    awesome little game
  • 100:19 - 100:23
    that a Wikidata
    volunteer, Magnus Manske,
  • 100:23 - 100:31
    has authored, called the
    Wikidata game, in which he
  • 100:31 - 100:32
    tricks people--
  • 100:32 - 100:36
    sorry, helps people
    make contributions
  • 100:36 - 100:42
    to Wikidata in a very,
    very easy and pleasant way.
  • 100:42 - 100:44
    Let's look at the Wikidata game.
  • 100:44 - 100:48
    So the first thing you need
    to do in that Wikidata game
  • 100:48 - 100:51
    is to log in,
    because the Wikidata
  • 100:51 - 100:53
    game makes edits in your name.
  • 100:53 - 100:55
    So we need to authorize it.
  • 100:55 - 100:57
    It's perfectly safe.
  • 100:57 - 101:01
    And after you do that you
    can go to the Wikidata game.
  • 101:01 - 101:02
    So this is the game.
  • 101:02 - 101:04
    Now I'm logged in.
  • 101:04 - 101:05
    And the Wikidata game
    actually includes
  • 101:05 - 101:07
    a number of different games.
  • 101:07 - 101:09
    Let's start with a person game.
  • 101:09 - 101:14
    So Wikidata shows you--
  • 101:14 - 101:21
    shows you an item, and asks
    you a very simple question.
  • 101:21 - 101:23
    Person, or not a person?
  • 101:26 - 101:31
    So Wikidata goes through
    Wikidata entities
  • 101:31 - 101:36
    that don't even have the
    instance of property.
  • 101:36 - 101:38
    Which is why Wikidata
    doesn't know,
  • 101:38 - 101:41
    literally doesn't know, if this
    is a person, or a mountain,
  • 101:41 - 101:44
    or a city, or a country,
    or anything else.
  • 101:44 - 101:47
    So it asks you, because this
    is the kind of question that
  • 101:47 - 101:50
    Wikidata cannot
    decide on its own,
  • 101:50 - 101:55
    but for us humans it's generally
    trivial to be able to say
  • 101:55 - 101:58
    whether something that we're
    looking at is a person or not.
  • 101:58 - 102:04
    It gets slightly trickier when
    the information is in Javanese,
  • 102:04 - 102:06
    as it is here,
    rather than English.
  • 102:06 - 102:10
    So this item happens to
    be described in Javanese.
  • 102:10 - 102:14
    My Javanese, spoken in
    Indonesia, is very weak.
  • 102:14 - 102:20
    However, I can tell that
    this is not a person.
  • 102:20 - 102:21
    How can I tell?
  • 102:21 - 102:23
    Without understanding
    a word of Javanese
  • 102:23 - 102:26
    I see that it mentions
    1000 kilometers
  • 102:26 - 102:29
    and square kilometers, see?
  • 102:29 - 102:33
    So this is about a
    place, or an area,
  • 102:33 - 102:36
    or a region, or whatever,
    but not a person.
  • 102:36 - 102:39
    So this is an
    example of how even
  • 102:39 - 102:41
    without understanding
    language you can sometimes
  • 102:41 - 102:42
    make a determination.
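That trick — spotting measurement units to rule out a person — can be written down as a naive heuristic. Purely illustrative: the real game deliberately leaves the decision to a human, and a check like this would only ever be a hint:

```python
# Unit strings that suggest a place or region rather than a person.
UNIT_HINTS = ("km", "kilometers", "square kilometers", "km2")

def probably_not_person(text):
    """Naive guess: text quoting distances or areas is likely about
    a place, not a person. A human must still decide."""
    lowered = text.lower()
    return any(unit in lowered for unit in UNIT_HINTS)

print(probably_not_person("area of 1000 square kilometers"))  # → True
print(probably_not_person("born in Madrid in 1577"))          # → False
```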
  • 102:42 - 102:45
    However, of course,
    you should be sure.
  • 102:45 - 102:48
    This is definitely not
    what the Wikipedia article
  • 102:48 - 102:49
    about a person looks like.
  • 102:49 - 102:50
    So this is not a person.
  • 102:50 - 102:53
    I just click it and I'm
    shown the next item.
  • 102:57 - 103:00
    This item is in another
    language I do not speak,
  • 103:00 - 103:01
    and I just don't know.
  • 103:01 - 103:04
    I do not know if this is
    about a person or not.
  • 103:04 - 103:07
    So I click Not Sure.
  • 103:07 - 103:11
    This is in Swedish, and
    it's about Sulawesi, still
  • 103:11 - 103:14
    Indonesia.
  • 103:14 - 103:17
    And it is not about a person.
  • 103:17 - 103:18
    I have enough Swedish for that.
  • 103:18 - 103:22
    So I click not a person.
  • 103:22 - 103:24
    Now, you may say,
    well, do I really
  • 103:24 - 103:28
    have to deal with all these
    languages that I don't speak?
  • 103:28 - 103:29
    The answer is no.
  • 103:29 - 103:31
    You don't have to.
  • 103:31 - 103:33
    Here at the bottom
    of the Wikidata game
  • 103:33 - 103:34
    there are settings.
  • 103:34 - 103:38
    You can click that
    and tell Wikidata,
  • 103:38 - 103:42
    I cannot even read
    Chinese or Japanese,
  • 103:42 - 103:45
    so please don't show me
    items in those languages.
  • 103:45 - 103:47
    Because I wouldn't
    even be able to guess.
  • 103:47 - 103:50
    I prefer these languages in
    which I can relatively easily
  • 103:50 - 103:51
    make determinations.
  • 103:51 - 103:55
    And I can even tell Wikidata to
    only show me these languages.
  • 103:55 - 103:55
    You see?
  • 103:55 - 103:57
    This was not selected,
    which is why I
  • 103:57 - 104:01
    was shown some other languages.
  • 104:01 - 104:04
    I could say, only use
    these languages, and save.
  • 104:04 - 104:06
    And now I can try
    this game again.
  • 104:06 - 104:08
    However, that can
    slow it down a little.
  • 104:08 - 104:09
    So here we go.
  • 104:09 - 104:12
    Here's a Spanish-- which
    is one of the languages I
  • 104:12 - 104:15
    told Wikidata game it can use.
  • 104:15 - 104:16
    This is a Spanish item.
  • 104:16 - 104:19
    Now is it about a person or not?
  • 104:22 - 104:23
    It is not about a person.
  • 104:26 - 104:27
    Is it about a person?
  • 104:29 - 104:30
    No.
  • 104:33 - 104:35
    Yes, it is right?
  • 104:35 - 104:39
    A Cistercian monk, Pedro
    de Oviedo Falconi.
  • 104:39 - 104:41
    That sounds like a person.
  • 104:41 - 104:43
    Fray Pedro Nasser.
  • 104:43 - 104:45
    Yeah, he was born
    in Madrid 1577.
  • 104:45 - 104:46
    This is a person.
  • 104:46 - 104:47
    OK.
  • 104:47 - 104:50
    So I click person.
  • 104:50 - 104:52
    Again, if you're not
    sure, click not sure.
  • 104:52 - 104:55
    The point is, just by clicking
    person and as you can see
  • 104:55 - 104:58
    this would work
    very well on mobile,
  • 104:58 - 105:01
    which is why I said you can
    contribute on your commute.
  • 105:01 - 105:04
    You can just hold your
    phone or tablet or whatever,
  • 105:04 - 105:06
    and just tap.
  • 105:06 - 105:07
    Person, not a person.
  • 105:07 - 105:09
    Person, not a person.
  • 105:09 - 105:12
    The amazing thing is that just
    tapping person has actually
  • 105:12 - 105:16
    made an edit to Wikidata
    on my behalf, which
  • 105:16 - 105:22
    I can find out, like every
    wiki, by clicking contributions.
  • 105:22 - 105:24
    And as you can see in addition
    to the stuff about circus
  • 105:24 - 105:28
    Bulgaria, my latest edit is in
    fact about this Pedro de Ovideo
  • 105:28 - 105:30
    Falconi person.
  • 105:30 - 105:32
    And the edit was, you can--
  • 105:32 - 105:38
    I hope you can see this, created
    the claim instance of human.
  • 105:38 - 105:39
    So I added--
  • 105:39 - 105:43
    I mean Wikidata game
    added for me the statement
  • 105:43 - 105:44
    instance of human.
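Under the hood, that one tap is a single API edit. The `wbcreateclaim` module is the documented way to add such a statement; a sketch of its parameters for "instance of (P31) human (Q5)" (a real request additionally needs a CSRF token and an authorized session, which the game obtains via the login step shown earlier):

```python
import json

def instance_of_human_params(entity_id):
    """Parameters for MediaWiki's wbcreateclaim API to assert
    P31 (instance of) = Q5 (human) on an item, as the Wikidata
    game does on the player's behalf."""
    return {
        "action": "wbcreateclaim",
        "entity": entity_id,
        "property": "P31",
        "snaktype": "value",
        "value": json.dumps({"entity-type": "item", "numeric-id": 5}),
        "format": "json",
    }

params = instance_of_human_params("Q4115189")  # illustrative item id
```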
  • 105:44 - 105:48
    Now, the awesome thing is
    that it was super easy to do.
  • 105:48 - 105:52
    I didn't have to go into that
    entity, click the Add button,
  • 105:52 - 105:57
    choose the instance of property,
    choose human, hit Save.
  • 105:57 - 105:59
    Instead of all these
    operations I just
  • 105:59 - 106:04
    tapped on my screen,
    person, not a person.
  • 106:04 - 106:10
    And I can do hundreds of
    edits during my daily commute.
  • 106:10 - 106:12
    There are other games,
    like the gender game.
  • 106:12 - 106:15
    So this is about--
  • 106:15 - 106:17
    this is when Wikidata
    already knows
  • 106:17 - 106:20
    that this item is a
    person, but it doesn't
  • 106:20 - 106:22
    know the gender of this person.
  • 106:22 - 106:25
    Which is another one of
    the more basic items.
  • 106:25 - 106:28
    And this is taking a long
    time because of the language
  • 106:28 - 106:30
    limitations that I set on it.
  • 106:30 - 106:33
    I guess the less exotic
    languages have already
  • 106:33 - 106:35
    been exhausted in the game.
  • 106:35 - 106:37
    We don't have to
    wait all this time.
  • 106:40 - 106:45
    We can try something else.
  • 106:45 - 106:46
    How about occupation?
  • 106:46 - 106:47
    The occupation game.
  • 106:47 - 106:49
    Here we go, this is in Russian.
  • 106:49 - 106:56
    And what is the occupation
    of this gentleman?
  • 106:56 - 106:59
    Well he is an [INAUDIBLE].
  • 106:59 - 107:01
    He's a church person.
  • 107:01 - 107:04
    However, so the
    occupation game is
  • 107:04 - 107:06
    where Wikidata game
    will automatically
  • 107:06 - 107:11
    pull likely occupations
    from the article text
  • 107:11 - 107:14
    and ask for confirmation.
  • 107:14 - 107:17
    So if he-- if this person
    really is a deacon,
  • 107:17 - 107:18
    I should click that.
  • 107:18 - 107:20
    But I'm not sure.
  • 107:20 - 107:25
    I'm not clear on the Russian
    church's distinctions between--
  • 107:25 - 107:27
    I mean [INAUDIBLE]
    is pretty senior,
  • 107:27 - 107:29
    but I don't know if that
    automatically also means
  • 107:29 - 107:30
    he's a deacon or not.
  • 107:30 - 107:33
    And [INAUDIBLE] is
    not listed here.
  • 107:33 - 107:36
    So I will click not listed.
  • 107:36 - 107:40
    Also, these guesses
    are not always correct.
  • 107:40 - 107:43
    So, this guy for
    example, is in Russian.
  • 107:43 - 107:43
    I can read this.
  • 107:43 - 107:44
    He's a philologist.
  • 107:44 - 107:45
    He's a linguist.
  • 107:45 - 107:49
    So I can confirm it
    and click linguist.
  • 107:49 - 107:49
    All right?
  • 107:49 - 107:52
    And again, if we look
    at my contributions
  • 107:52 - 107:56
    we can see the Wikidata
    game on my behalf
  • 107:56 - 108:00
    created occupation linguist.
  • 108:00 - 108:02
    OK.
  • 108:02 - 108:04
    Just by typing linguist there.
  • 108:04 - 108:07
    Now if it's taken
    from the article,
  • 108:07 - 108:10
    why would it ever be wrong?
  • 108:10 - 108:16
    Well Jesus was the
    son of a carpenter.
  • 108:16 - 108:19
    The word carpenter
    appears in the text.
  • 108:19 - 108:23
    That doesn't mean it's correct
    to say Jesus was a carpenter.
  • 108:23 - 108:23
    OK?
  • 108:23 - 108:25
    Just a trivial example, right?
  • 108:25 - 108:30
    So many, many articles will say,
    you know, born to a physician.
  • 108:30 - 108:33
    And so the word physician
    could be guessed,
  • 108:33 - 108:36
    but it wouldn't be correct
    unless the son is also
  • 108:36 - 108:38
    a physician.
  • 108:38 - 108:44
    So I hope it gives
    you the gist of it.
  • 108:44 - 108:48
    There is also a
    distributed Wikidata game,
  • 108:48 - 108:49
    which is pretty awesome.
  • 108:51 - 108:54
    Here we go, which
    has additional games.
  • 108:54 - 109:03
    So, for example, the
    key on game gives you,
  • 109:03 - 109:07
    maybe it gives you,
    some items to play with.
  • 109:17 - 109:17
    Yes?
  • 109:17 - 109:18
    No?
  • 109:18 - 109:18
    OK.
  • 109:18 - 109:21
    So it gives you
    this little card,
  • 109:21 - 109:28
    and asks you to confirm is this
    instance of human settlement?
  • 109:28 - 109:30
    That is, is it a village,
    town, city, whatever.
  • 109:30 - 109:33
    Is it a kind of human
    settlement or not?
  • 109:33 - 109:34
    Or maybe it's a book.
  • 109:34 - 109:36
    Maybe it's a poem.
  • 109:36 - 109:39
    Again, so, is it an
    English settlement?
  • 109:39 - 109:42
    And you can click the languages
    here to see the information.
  • 109:42 - 109:43
    So I can click English.
  • 109:43 - 109:45
    And indeed the article--
  • 109:45 - 109:46
    I mean the actual
    Wikipedia article
  • 109:46 - 109:49
    says Camigji is a
    town and territory
  • 109:49 - 109:51
    in this district in the Congo.
  • 109:51 - 109:55
    So yes, this is an instance
    of human settlement.
  • 109:55 - 109:58
    So I clicked yes.
  • 109:58 - 110:00
    And just clicking yes
    again went to that item,
  • 110:00 - 110:03
    and added the property instance
    of human settlement.
  • 110:03 - 110:06
    Now the point of
    all these games is
  • 110:06 - 110:08
    these are tools,
    written by programmers,
  • 110:08 - 110:12
    making kind of semi-educated
    guesses about these fairly
  • 110:12 - 110:14
    basic properties.
  • 110:14 - 110:18
    And they are meant to
    semi-automate, to assist,
  • 110:18 - 110:24
    in the accumulation of all
    these important pieces of data.
  • 110:24 - 110:27
    Now every single
    click here helps
  • 110:27 - 110:31
    Wikidata give better
    results, richer results
  • 110:31 - 110:32
    in future queries.
  • 110:32 - 110:38
    Again, as of right now
    Wikidata can include Camigji
  • 110:38 - 110:43
    if I ask it, you know, what
    are some towns in Congo?
  • 110:43 - 110:44
    Until now it could not.
  • 110:44 - 110:47
    Because it literally
    didn't know.
  • 110:47 - 110:52
    So every time we click male,
    female, person, not a person,
  • 110:52 - 110:57
    make these decisions,
    we help improve Wikidata
  • 110:57 - 111:02
    and enrich the results
    that we could receive.
  • 111:02 - 111:05
    Any questions about this, about
    kind of micro-contributions
  • 111:05 - 111:07
    through the Wikidata game?
  • 111:07 - 111:10
    If that looks
    appealing I encourage
  • 111:10 - 111:13
    you to go and visit
    the Wikidata game
  • 111:13 - 111:15
    and start contributing
    in that way.
  • 111:20 - 111:22
    There is a question here.
  • 111:22 - 111:25
    If I make an article about
    Circus Bulgaria, how should
  • 111:25 - 111:27
    I correctly connect them?
  • 111:27 - 111:29
    That is an excellent question.
  • 111:29 - 111:33
    So once-- so now there is a
    Wikidata item about that book,
  • 111:33 - 111:38
    but there is no Wikipedia
    article anywhere.
  • 111:38 - 111:41
    Now suppose I write one
    in Bulgarian, maybe--
  • 111:41 - 111:43
    you go to Wikidata.
  • 111:43 - 111:45
    You find the item by searching.
  • 111:45 - 111:49
    You find the item, and then
    the empty site links section
  • 111:49 - 111:51
    right at the bottom there--
  • 111:51 - 111:52
    where are we?
  • 111:52 - 111:53
    We have this?
  • 111:53 - 111:55
    Circus Bulgaria.
  • 111:55 - 111:56
    Let's demonstrate this.
  • 111:56 - 111:58
    So here is the item
    about the book.
  • 111:58 - 112:01
    Let's say that now
    there is an article
  • 112:01 - 112:04
    because I just created it.
  • 112:04 - 112:07
    I can go here to the empty
    Wikipedia link section,
  • 112:07 - 112:12
    click Edit, type the
    name of the wiki,
  • 112:12 - 112:16
    let's say English, and then
    type the name of the page
  • 112:16 - 112:18
    that I just created.
  • 112:18 - 112:21
    Circus-- right?
  • 112:21 - 112:23
    And again, it offers
    me auto-complete
  • 112:23 - 112:25
    for my convenience.
  • 112:25 - 112:28
    Now we don't actually
    have the article created,
  • 112:28 - 112:30
    but let's just
    say this was the article.
  • 112:30 - 112:33
    I can just click this,
    hit Save, and that
  • 112:33 - 112:36
    would associate the
    new Wikipedia article
  • 112:36 - 112:38
    with this Wikidata item.
  • 112:38 - 112:42
    That is the beginning of the
    inter-wiki list for this item.
  • 112:42 - 112:44
    I will not click
    Save Now, because we
  • 112:44 - 112:45
    didn't have the article yet.
  • 112:45 - 112:47
    So I hope that
    answers that question.
  • 112:47 - 112:50
    Was there another question
    that I missed here?
  • 112:50 - 112:51
    No.
  • 112:51 - 112:53
    OK.
  • 112:53 - 112:55
    Any questions about
    the Wikidata game?
  • 112:55 - 113:01
    About this idea of
    micro-contributions?
  • 113:01 - 113:05
    If not then we can move
    on to embedding data,
  • 113:05 - 113:07
    and after that we
    can discuss queries,
  • 113:07 - 113:12
    how to get at all this
    data from Wikidata.
  • 113:12 - 113:16
    So the short version of how
    to embed data from Wikidata
  • 113:16 - 113:20
    is that there is this
    little magic incantation.
  • 113:20 - 113:25
    Curly brace, curly brace,
    hash mark, property.
  • 113:25 - 113:30
    It looks like a template, but
    it isn't because of that hash.
  • 113:30 - 113:31
    And that is magic.
  • 113:31 - 113:34
    Take a look at this little
    demo that I prepared.
  • 113:34 - 113:38
    This page, which is off
    my user page on meta,
  • 113:38 - 113:40
    but it could be on any wiki.
  • 113:40 - 113:42
    OK.
  • 113:42 - 113:49
    Says, since San Francisco
    is item Q62 in Wikidata,
  • 113:49 - 113:55
    and since population is
    property P1082, I can tell you
  • 113:55 - 113:59
    that according to Wikidata the
    population of San Francisco
  • 113:59 - 114:02
    is this.
  • 114:02 - 114:08
    And this bolded number here was
    produced with this incantation.
  • 114:08 - 114:14
    Curly brace, curly brace,
    hash mark, property P1082,
  • 114:14 - 114:19
    that's population,
    then from what item?
  • 114:19 - 114:19
    Right?
  • 114:19 - 114:22
    Cause I'm pulling
    an arbitrary number.
  • 114:22 - 114:24
    I could put any
    property in any item
  • 114:24 - 114:27
    here, and kind of include
    it, embedded, into my text.
  • 114:27 - 114:30
    This isn't even about-- you
    notice this is my user page.
  • 114:30 - 114:32
    This isn't even the article
    about San Francisco.
  • 114:32 - 114:35
    I just want to pull that
    number into this thing
  • 114:35 - 114:36
    that I'm writing.
  • 114:36 - 114:39
    So it's fairly simple.
  • 114:39 - 114:41
    I identify the property.
  • 114:41 - 114:43
    I identify the item
    to take it from.
  • 114:43 - 114:47
    And Wikidata will,
    I mean Wikipedia,
  • 114:47 - 114:50
    or the wiki I'm on, in this
    case meta, will go to Wikidata
  • 114:50 - 114:53
    and fetch it for me.
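As a sketch, the population example walked through above would look like this in wikitext, using the property and item numbers quoted in the talk (P1082 = population, Q62 = San Francisco):

```wikitext
<!-- pulls the population value of San Francisco from Wikidata -->
{{#property:P1082|from=Q62}}
```

On wikis with Wikidata access enabled, saving this on any page renders the number currently stored on the San Francisco item, with no local copy of the value in the page source.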
  • 114:53 - 114:56
    Likewise, since Denny Vrandecic,
    the designer of Wikidata
  • 114:56 - 115:01
    is item 18618629, right?
  • 115:01 - 115:05
    I mean, he's a notable person,
    so he has a Wikidata entity.
  • 115:05 - 115:09
    And since occupation is property
    106, and date of birth is 569,
  • 115:09 - 115:12
    and place of birth
    is 19, because
  • 115:12 - 115:15
    of all that I can tell you
    that Vrandecic was born
  • 115:15 - 115:19
    in Stuttgart, on this date,
    and is a researcher, programmer,
  • 115:19 - 115:21
    and computer scientist.
  • 115:21 - 115:25
    If you look at the source for
    this page, click Edit Source,
  • 115:25 - 115:29
    you can see that the word
    Stuttgart does not appear here,
  • 115:29 - 115:31
    because it came from Wikidata.
  • 115:31 - 115:34
    I did not write this into
    my little demo page here.
  • 115:34 - 115:35
    See?
  • 115:35 - 115:37
    Place of birth is--
  • 115:37 - 115:38
    where is it?
  • 115:38 - 115:38
    Here.
  • 115:38 - 115:44
    Born in property 19 from
    Q number so-and-so.
  • 115:44 - 115:47
    That is how easy
    it is to pull stuff
  • 115:47 - 115:52
    into a wiki from Wikidata.
  • 115:52 - 115:55
    OK now there's
    some nuance to it.
  • 115:55 - 115:57
    And there are
    some additional parameters
  • 115:57 - 115:58
    you can give.
  • 115:58 - 116:00
    And you can ask
    Wikidata to give you
  • 116:00 - 116:04
    not just the text of the values,
    but actually make it links.
  • 116:07 - 116:15
    So, for example, if I change
    this from property to values--
  • 116:26 - 116:29
    No, that did not work at all.
  • 116:29 - 116:30
    Wasn't it values?
  • 116:30 - 116:30
    What was it?
  • 116:33 - 116:35
    Values and then--
  • 117:19 - 117:20
    Oh, statements.
  • 117:20 - 117:21
    My bad, sorry.
  • 117:21 - 117:23
    The Magic word is statements.
  • 117:23 - 117:24
    Statements.
  • 117:24 - 117:29
    So going back here.
  • 117:29 - 117:35
    If I change the word property
    to the word statements
  • 117:35 - 117:41
    here then this same value--
  • 117:41 - 117:43
    that did not work at all.
  • 117:43 - 117:47
    Oh, because I'm on meta.
  • 117:47 - 117:49
    So because I'm on
    meta, meta doesn't
  • 117:49 - 117:52
    have an article named
    researcher, programmer,
  • 117:52 - 117:54
    or computer scientist.
  • 117:54 - 117:55
    But Wikipedia does.
  • 117:55 - 118:00
    If I included this same
    syntax in Wikipedia,
  • 118:00 - 118:03
    like English Wikipedia,
    for example--
  • 118:03 - 118:05
    So let's go there right now.
  • 118:11 - 118:13
    And go-- go to my--
  • 118:19 - 118:19
    Go to my sandbox.
  • 118:23 - 118:28
    If I just brutally paste
    this on my sandbox here--
  • 118:33 - 118:36
    So, see, these became links.
  • 118:36 - 118:40
    Because Wikipedia has an article
    called programmer and computer
  • 118:40 - 118:41
    scientist.
  • 118:41 - 118:43
    So, like I said, there's
    some additional nuance
  • 118:43 - 118:45
    to the embedding.
  • 118:45 - 118:47
    The important thing
    is that this is
  • 118:47 - 118:51
    the key to delivering on that
    first problem that I mentioned.
  • 118:51 - 118:56
    How to get data from
    a central location
  • 118:56 - 118:59
    onto your wiki in your language.
  • 118:59 - 119:04
    Basically using property and
    statements magic incantations.
  • 119:04 - 119:07
    And of course,
    usually, this would be
  • 119:07 - 119:10
    in the context of an info box.
  • 119:10 - 119:14
    Some wikis-- English Wikipedia
    is not leading the way there.
  • 119:14 - 119:16
    Some smaller wikis
    are more advanced
  • 119:16 - 119:22
    actually in integrating
    Wikidata embeddings like this
  • 119:22 - 119:25
    into their info boxes.
  • 119:25 - 119:26
    So that instead of
    the info box just
  • 119:26 - 119:31
    being a template on the wiki
    with field equals value,
  • 119:31 - 119:32
    field equals value.
  • 119:32 - 119:36
    That template of the
    info box on the wiki
  • 119:36 - 119:40
    pulls the values, the birthdate,
    the languages, et cetera,
  • 119:40 - 119:44
    pulls them from Wikidata.
  • 119:44 - 119:50
    So basically just-- I just
    demonstrated single calls
  • 119:50 - 119:53
    to this, but of course
    an info box template
  • 119:53 - 119:56
    would include maybe
    20 or 40 such embeds,
  • 119:56 - 119:58
    and that is not a problem.
  • 119:58 - 120:01
    Of course, before you go and
    edit the English Wikipedia's
  • 120:01 - 120:06
    info box person and replace
    it all with Wikidata embeds,
  • 120:06 - 120:09
    you should discuss it with the
    English Wikipedia community.
  • 120:09 - 120:12
    These discussions have
    already been taking place.
  • 120:12 - 120:14
    There are some
    concerns about how
  • 120:14 - 120:17
    to patrol this, how to keep
    it newbie friendly, et cetera.
  • 120:17 - 120:21
    So there are legitimate concerns
    with just moving everything
  • 120:21 - 120:23
    to be embedded from Wikidata.
  • 120:23 - 120:26
    But the communities are
    gradually handling this.
  • 120:26 - 120:29
    I mean this ability to embed
    from Wikidata is not very old.
  • 120:29 - 120:32
    It's been around
    for about a year.
  • 120:32 - 120:35
    So communities are
    still working on kind
  • 120:35 - 120:38
    of integrating that technology.
  • 120:38 - 120:40
    But that is kind of
    just the basics of how
  • 120:40 - 120:44
    to pull data, individual bits
    of data, that's not querying,
  • 120:44 - 120:47
    that's not asking those sweeping
    questions that I was talking
  • 120:47 - 120:49
    about yet.
  • 120:49 - 120:51
    We'll get to that.
    Right now, this is
  • 120:51 - 120:55
    how to pull a specific datum,
    a specific piece of data,
  • 120:55 - 120:57
    from Wikidata.
  • 121:02 - 121:03
    OK.
  • 121:03 - 121:07
    So here's another quick
    thing to demonstrate
  • 121:07 - 121:10
    before we go to
    queries, and that
  • 121:10 - 121:12
    is the article placeholder.
  • 121:12 - 121:15
    The article placeholder
    is a feature
  • 121:15 - 121:20
    that is being tested on the
    Esperanto Wikipedia, and maybe
  • 121:20 - 121:22
    another wiki, I don't remember.
  • 121:22 - 121:28
    And it is using the
    potential of Wikidata
  • 121:28 - 121:33
    to offer a placeholder
    for an article.
  • 121:33 - 121:38
    An automatically generated,
    Wikidata-powered
  • 121:38 - 121:42
    placeholder for articles
    that don't yet
  • 121:42 - 121:46
    exist on Esperanto.
  • 121:46 - 121:50
    So let's go to the
    Esperanto Wikipedia.
  • 121:50 - 121:52
    I don't speak Esperanto.
  • 121:52 - 121:57
    But let's look for Helen
    Dewitt, our friend,
  • 121:57 - 121:58
    in Esperanto Wikipedia.
  • 121:58 - 122:00
    Now Esperanto is not
    one of the Wikipedias
  • 122:00 - 122:03
    that have an article
    about Helen Dewitt.
  • 122:03 - 122:05
    And so it tells me that, right?
  • 122:05 - 122:07
    There is no Helen Dewitt.
  • 122:07 - 122:09
    Maybe you were looking
    for Helena Dewitt.
  • 122:09 - 122:10
    No, I was not.
  • 122:10 - 122:14
    You can start an article
    about Helen Dewitt.
  • 122:14 - 122:15
    You can search.
  • 122:15 - 122:18
    You know, there's
    all this stuff.
  • 122:18 - 122:24
    But there is also this
    little option here, hiding,
  • 122:24 - 122:31
    which tells me that the
    Esperanto Wikipedia is--
  • 122:31 - 122:32
    what's happening here?
  • 122:35 - 122:36
    Yes.
  • 122:36 - 122:41
    The Esperanto Wikipedia is
    ready to give me this page.
  • 122:41 - 122:44
    This page, as you can see, it's
    on the Esperanto Wikipedia,
  • 122:44 - 122:46
    but it's not an article.
  • 122:46 - 122:47
    See, it's a special page.
  • 122:47 - 122:50
    It's machine generated.
  • 122:50 - 122:52
    You can see the URL as well.
  • 122:52 - 122:54
    It's not, you know,
    slash Helen Dewitt.
  • 122:54 - 122:58
    It's slash specialio,
    about topic,
  • 122:58 - 123:02
    and then the Wikidata
    ID of Helen Dewitt.
  • 123:02 - 123:04
    And what I get here--
  • 123:04 - 123:06
    I get an English
    description, by the way,
  • 123:06 - 123:08
    because there is no
    Esperanto description.
  • 123:08 - 123:10
    Wikidata can't make it up.
  • 123:10 - 123:14
    But what it can do is
    offer me these pieces
  • 123:14 - 123:17
    of data in my language,
    in this case Esperanto.
  • 123:17 - 123:19
    I'm on the Esperanto Wikipedia.
  • 123:19 - 123:19
    OK.
  • 123:19 - 123:23
    So it tells me that she's
    American, for example,
  • 123:23 - 123:26
    and it tells me
    that in Esperanto.
  • 123:26 - 123:29
    OK and it tells me
    that she speaks Latin.
  • 123:29 - 123:32
    Remember we taught
    Wikidata that?
  • 123:32 - 123:36
    It tells me that she
    was educated in Oxford,
  • 123:36 - 123:38
    you know, and gives me the
    references to the extent
  • 123:38 - 123:39
    that they exist.
  • 123:39 - 123:42
    I mean this is not an article.
  • 123:42 - 123:47
    It's not, you know, paragraphs
    of fluent Esperanto text.
  • 123:47 - 123:50
    But it is information
    that I can understand
  • 123:50 - 123:52
    if I speak this language.
  • 123:52 - 123:55
    And it's better than nothing.
  • 123:55 - 124:00
    And remember Helen Dewitt was
    not a very detailed article.
  • 124:00 - 124:04
    If I were to ask about, I
    don't know, some politician,
  • 124:04 - 124:08
    or popular singer that
    has more data in Wikidata,
  • 124:08 - 124:13
    then this machine-generated
    thing would have been richer.
  • 124:13 - 124:16
    So this feature is available
    and is under beta testing
  • 124:16 - 124:20
    right now, but generally if
    this sounds interesting for you
  • 124:20 - 124:22
    especially if you come
    from a smaller wiki that
  • 124:22 - 124:25
    is missing a lot of articles
    that people may want to learn
  • 124:25 - 124:28
    about, you can contact
    the Wikimedia Foundation
  • 124:28 - 124:33
    and ask for article placeholder
    to be enabled on your wiki.
  • 124:33 - 124:35
    And again, this
    is a placeholder.
  • 124:35 - 124:38
    Of course, it exists only
    until someone actually
  • 124:38 - 124:43
    writes a proper Esperanto
    article about Helen Dewitt.
  • 124:43 - 124:45
    So I hope this is clear.
  • 124:45 - 124:51
    This is all coming from
    Wikidata on the fly.
  • 124:51 - 124:51
    In real time.
  • 124:51 - 124:58
    As you can see it includes my
    latest edits to Helen Dewitt.
  • 124:58 - 124:59
    OK.
  • 124:59 - 125:05
    Questions about the-- questions
    about the article placeholder?
  • 125:05 - 125:10
    If there are, try and
    put them on the channel.
  • 125:10 - 125:13
    And this brings us to one of
    the main courses of this talk,
  • 125:13 - 125:15
    which is querying Wikidata.
  • 125:15 - 125:19
    So I've explained
    how Wikidata works.
  • 125:19 - 125:20
    We've walked through it.
  • 125:20 - 125:21
    We've added to it.
  • 125:21 - 125:23
    We've created a new item.
  • 125:23 - 125:26
    We learned how to contribute
    during our commutes.
  • 125:26 - 125:30
    And all this-- well, you
    kept promising us,
  • 125:30 - 125:32
    Asaf, that this would be--
  • 125:32 - 125:35
    this would enable
    these amazing queries.
  • 125:35 - 125:38
    So time to make good on that.
  • 125:38 - 125:43
    The URL you need to remember
    is query.wikidata.org.
  • 125:43 - 125:49
    And that will take you
    to a query system that
  • 125:49 - 125:53
    uses a language called SPARQL.
  • 125:53 - 125:58
    SPARQL, spelt with
    a Q. This language
  • 125:58 - 126:02
    is not a Wikimedia creation.
  • 126:02 - 126:06
    It's a standardized language
    used for querying linked data
  • 126:06 - 126:08
    sources.
  • 126:08 - 126:11
    And because of that
    there
  • 126:11 - 126:15
    are certain usability prices
    that we pay for using SPARQL,
  • 126:15 - 126:16
    for using a standard language.
  • 126:16 - 126:20
    It's not completely custom
    made for querying Wikidata,
  • 126:20 - 126:22
    and we'll see that
    in just a moment.
  • 126:22 - 126:24
    The principle to
    remember about Wikidata
  • 126:24 - 126:28
    query is that Wikidata will
    tell you everything it knows,
  • 126:28 - 126:29
    but no more.
  • 126:29 - 126:32
    I have anticipated this
    several times already, right?
  • 126:32 - 126:36
    Until this moment when
    we taught Wikidata
  • 126:36 - 126:39
    that Helen Dewitt
    speaks Latin, she
  • 126:39 - 126:42
    would not have appeared
    in query results
  • 126:42 - 126:46
    asking who are American
    writers who speak Latin?
  • 126:46 - 126:47
    She would not have appeared.
  • 126:47 - 126:49
    But as of this
    afternoon, she will
  • 126:49 - 126:53
    appear because I've added
    that piece of information.
  • 126:53 - 127:01
    So a result of that principle
    is that you can never say,
  • 127:01 - 127:06
    well I ran a Wikidata
    query and this
  • 127:06 - 127:12
    is the list of Flemish painters
    who are sons of painters.
  • 127:12 - 127:12
    The list.
  • 127:12 - 127:14
    That these are all
    the Flemish painters
  • 127:14 - 127:15
    who are sons of painters.
  • 127:15 - 127:19
    That is never something you can
    say based on a Wikidata query,
  • 127:19 - 127:22
    because of course, maybe
    not all the Flemish painters
  • 127:22 - 127:26
    who are sons of painters have
    been expressed in Wikidata
  • 127:26 - 127:27
    yet.
  • 127:27 - 127:29
    Wikidata doesn't know
    about some of them,
  • 127:29 - 127:30
    or maybe it knows
    about all of them
  • 127:30 - 127:32
    but doesn't know
    the important fact
  • 127:32 - 127:35
    that this person is
    the son of that person,
  • 127:35 - 127:39
    because those properties
    have not been added.
  • 127:39 - 127:41
    And so they cannot be
    included in the results.
  • 127:41 - 127:43
    So the results of
    a Wikidata query
  • 127:43 - 127:47
    are never the definitive set.
  • 127:47 - 127:50
    What you can say about
    a Wikidata query is here
  • 127:50 - 127:53
    are some Flemish painters
    who are sons of painters.
  • 127:53 - 127:56
    Here are some cities
    with female mayors.
  • 127:56 - 127:58
    Whatever it is
    you're querying about
  • 127:58 - 128:01
    is never guaranteed
    to be complete
  • 128:01 - 128:04
    because Wikidata,
    like Wikipedia, is
  • 128:04 - 128:06
    a work in progress.
  • 128:06 - 128:13
    And of course, the more
    we teach Wikidata the
  • 128:13 - 128:16
    more useful it becomes.
  • 128:16 - 128:23
    OK, so let's go and
    see those queries.
  • 128:23 - 128:26
    So this is query.wikidata.org.
  • 128:26 - 128:29
    It's not the wiki.
  • 128:29 - 128:30
    All right?
  • 128:30 - 128:33
    So this isn't like some
    page on the wiki itself.
  • 128:33 - 128:35
    This is kind of an
    external system.
  • 128:35 - 128:36
    So it's not a wiki.
  • 128:36 - 128:38
    You can see I don't
    have a user page here.
  • 128:38 - 128:40
    I don't have a history tab.
  • 128:40 - 128:41
    This isn't a wiki page.
  • 128:41 - 128:45
    This is a special kind
    of tool or system.
  • 128:45 - 128:51
    And it invites me to
    input a SPARQL query.
  • 128:51 - 128:55
    Now most of us do
    not speak SPARQL.
  • 128:55 - 129:00
    It's a technical language.
  • 129:00 - 129:02
    It's a query language.
  • 129:02 - 129:07
    Some of you may be thinking
    about SQL, the database query
  • 129:07 - 129:08
    language.
  • 129:08 - 129:13
    SPARQL is named with kind
    of a wink, or a nod, to SQL.
  • 129:13 - 129:17
    But, I warn you, if
    you are comfortable in
  • 129:17 - 129:23
    SQL don't expect to carry
    over your knowledge of SQL
  • 129:23 - 129:24
    into SPARQL.
  • 129:24 - 129:26
    They're not the same.
  • 129:26 - 129:28
    They are superficially similar.
  • 129:28 - 129:28
    Right?
  • 129:28 - 129:32
    So they both use
    the keyword select,
  • 129:32 - 129:35
    and they use the word where,
    and they use things like limit,
  • 129:35 - 129:36
    and order.
  • 129:36 - 129:38
    So again, if you know
    this already from SQL
  • 129:38 - 129:40
    those mean roughly
    the same things,
  • 129:40 - 129:45
    but don't expect it to
    behave just like SQL.
  • 129:45 - 129:50
    You do need to spend some time
    understanding how SPARQL works.
  • 129:50 - 129:53
    So, by all means, I
    invite you to go and read
  • 129:53 - 129:56
    one of the many fine
    SPARQL tutorials that
  • 129:56 - 130:00
    are out there on the web, or
    to click the Help button here,
  • 130:00 - 130:04
    which also includes
    help about SPARQL.
  • 130:04 - 130:08
    But I also know
    that most of us when
  • 130:08 - 130:13
    we want to do some advanced
    formatting on wiki,
  • 130:13 - 130:16
    for example, we don't go
    and read the help page
  • 130:16 - 130:18
    on templates, right?
  • 130:18 - 130:21
    We go to a page that already
    does what we want to do,
  • 130:21 - 130:27
    and adopt and adapt the code
    from that other page, right?
  • 130:27 - 130:31
    So we just take something that
    does roughly what we want,
  • 130:31 - 130:33
    and just copy it over and
    change what we need to change.
  • 130:33 - 130:36
    That is a very pragmatic
    and reasonable way
  • 130:36 - 130:37
    to do things which is why--
  • 130:37 - 130:40
    and the Wikidata
    engineers know this,
  • 130:40 - 130:43
    which is why they prepared
    this very handy button for us
  • 130:43 - 130:46
    called examples.
  • 130:46 - 130:48
    We click the examples button.
  • 130:48 - 130:52
    And, oh my god, there is a ton
    of-- well, there are 312 example
  • 130:52 - 130:56
    queries for us to choose from.
  • 130:56 - 130:57
    And we can just
    pick something that
  • 130:57 - 131:00
    is roughly like what
    we're trying to find out,
  • 131:00 - 131:03
    and then just change
    what needs changing.
  • 131:03 - 131:05
    So let's take a very simple one.
  • 131:05 - 131:07
    The cats query.
  • 131:07 - 131:10
    Maybe one of the simplest
    you could possibly have.
  • 131:10 - 131:14
    And let's run it first
    and then I'll kind of
  • 131:14 - 131:16
    walk you through it.
  • 131:16 - 131:18
    The goal here is not
    to teach you SPARQL,
  • 131:18 - 131:21
    but to get you to be kind
    of literate in SPARQL.
  • 131:21 - 131:24
    To kind of understand why
    this does what it does.
  • 131:24 - 131:26
    So let's run this query first.
  • 131:26 - 131:31
    We click Run and here I
    have results at the bottom.
  • 131:31 - 131:34
    The item, which is
    just a Wikidata item,
  • 131:34 - 131:35
    which of course is a number.
  • 131:35 - 131:39
    Remember, Wikidata thinks
    of items as Q numbers.
  • 131:39 - 131:41
    And the label,
    because we're humans
  • 131:41 - 131:43
    and we prefer words to numbers.
  • 131:43 - 131:50
    So these 114 results
    are all the cats
  • 131:50 - 131:53
    that Wikidata knows about.
  • 131:53 - 131:55
    Is this all the
    cats in the world?
  • 131:55 - 131:57
    No of course not, remember?
  • 131:57 - 132:00
    It's all the cats Wikidata
    knows about, which
  • 132:00 - 132:01
    means they're somehow notable.
  • 132:01 - 132:05
    I mean someone bothered to
    describe them on Wikidata.
  • 132:05 - 132:13
    And Wikidata was told this
    item is an instance of cat.
  • 132:13 - 132:14
    Right?
  • 132:14 - 132:17
    So these are those cats.
  • 132:17 - 132:19
    And we can click any of them.
  • 132:19 - 132:20
    I don't know,
    Pixel, for example.
  • 132:20 - 132:22
    Click the Wikidata item.
  • 132:22 - 132:24
    And here is the Wikidata
    item about Pixel
  • 132:24 - 132:26
    with the Q number.
  • 132:26 - 132:29
    And he is a tortoiseshell cat.
  • 132:29 - 132:33
    And as you can see
    instance of cat.
  • 132:33 - 132:34
    OK.
  • 132:34 - 132:37
    And he is five inches high.
  • 132:37 - 132:42
    And he is apparently documented
    in Indonesian, in Bahasa.
  • 132:42 - 132:45
    Right here this is Pixel.
  • 132:45 - 132:50
    And he is apparently somehow
    related to the Guinness World
  • 132:50 - 132:52
    Records book.
  • 132:52 - 132:55
    I don't speak Bahasa, so
    I don't know exactly why
  • 132:55 - 132:56
    this cat is so notable.
  • 132:56 - 132:59
    But, of course, cats
    can become notable
  • 132:59 - 133:00
    for all kinds of reasons.
  • 133:00 - 133:02
    Maybe they're a
    YouTube sensation,
  • 133:02 - 133:04
    you know, maybe
    they were involved
  • 133:04 - 133:05
    in some historical event.
  • 133:05 - 133:09
    I like this cat named Gladstone.
  • 133:09 - 133:17
    This cat named Gladstone is--
  • 133:17 - 133:20
    he has position
    held Chief Mouser
  • 133:20 - 133:22
    to Her Majesty's Treasury.
  • 133:22 - 133:25
    This is an official
    cat with a job.
  • 133:25 - 133:29
    And he has been holding this
    job, mind you, since the 28th
  • 133:29 - 133:32
    of June this past year.
  • 133:32 - 133:33
    That's the start time.
  • 133:33 - 133:36
    And there is no end time
    which means he currently
  • 133:36 - 133:39
    holds the position
    of Chief Mouser
  • 133:39 - 133:40
    to Her Majesty's Treasury.
  • 133:40 - 133:43
    His employer is Her
    Majesty's Treasury.
  • 133:43 - 133:44
    He's a male creature.
  • 133:44 - 133:47
    And Wikidata knows
    that this cat is
  • 133:47 - 133:53
    named after William Gladstone,
    the Victorian prime minister.
  • 133:53 - 133:55
    Of course if I don't
    know who this person is
  • 133:55 - 133:58
    I can click through
    and learn that he
  • 133:58 - 134:02
    was a liberal politician
    and prime minister, right?
  • 134:02 - 134:03
    He even has a Twitter account.
  • 134:03 - 134:06
    And Wikidata sends
    me right to it.
  • 134:06 - 134:08
    The treasury cat
    Twitter account.
  • 134:08 - 134:11
    And he has articles in
    German, and English,
  • 134:11 - 134:16
    and of course Japanese,
    because he's a cat.
  • 134:16 - 134:16
    All right.
  • 134:16 - 134:20
    So this was a very simple query.
  • 134:20 - 134:21
    Let's find out why it works.
  • 134:21 - 134:22
    OK.
  • 134:22 - 134:26
    So what did we actually
    tell Wikidata to do for us?
  • 134:26 - 134:32
    We said, please select
    some items for us
  • 134:32 - 134:34
    along with their labels.
  • 134:34 - 134:34
    OK?
  • 134:34 - 134:36
    Along with their
    human-readable labels
  • 134:36 - 134:42
    because if I remove this
    label what I get is, see,
  • 134:42 - 134:44
    just a list of item numbers.
  • 134:44 - 134:45
    That's not as fun.
  • 134:45 - 134:47
    So that's what this
    little bit did.
  • 134:47 - 134:50
    I just said, give me the
    items, but also their
  • 134:50 - 134:52
    human-readable label.
  • 134:52 - 134:55
    And I want you to
    select a bunch of items,
  • 134:55 - 134:57
    but not just any
    random bunch of items,
  • 134:57 - 135:01
    I want to select items where
    a certain condition holds.
  • 135:01 - 135:03
    What is the condition?
  • 135:03 - 135:06
    The condition is that the
    item that I want you to select
  • 135:06 - 135:14
    needs to have property
    31 with a value of Q146.
  • 135:14 - 135:16
    Well, that's helpful.
  • 135:16 - 135:18
    If I hover over these numbers--
  • 135:18 - 135:20
    Again, I get the human
    readable version.
  • 135:20 - 135:24
    So I'm looking for
    items that have property
  • 135:24 - 135:29
    instance of with the value cat.
  • 135:29 - 135:29
    Right?
  • 135:29 - 135:31
    Because that's literally
    what I want, right?
  • 135:31 - 135:34
    I want all the items that have
    a property, a statement, that
  • 135:34 - 135:37
    says instance of cat.
  • 135:37 - 135:38
    That's the condition.
  • 135:38 - 135:42
    I'm not interested in items
    that are instance of book,
  • 135:42 - 135:43
    or instance of human.
  • 135:43 - 135:46
    I'm interested in
    instance of cat.
  • 135:46 - 135:51
    That is the only condition
    here in this query.
  • 135:51 - 135:56
    This complicated line I ask
    you to basically ignore.
  • 135:56 - 135:58
    This is one of those
    sacrifices that we
  • 135:58 - 136:01
    make for using a standard
    language like SPARQL.
  • 136:01 - 136:03
    But the role of this
    complicated line
  • 136:03 - 136:05
    is to basically
    ensure that we get
  • 136:05 - 136:08
    the English label for that cat.
  • 136:08 - 136:09
    OK?
  • 136:09 - 136:10
    So don't worry about that.
  • 136:10 - 136:12
    Just leave it there.
  • 136:12 - 136:13
    And we run the query
    and we get the list
  • 136:13 - 136:17
    of cats with their English
    labels, and that is awesome.
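For reference, the query being walked through on screen is the standard "cats" sample from the Wikidata Query Service; a minimal sketch:

```sparql
# All items that are an instance of (P31) cat (Q146),
# together with their human-readable labels.
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  # The "complicated line": the label service fills in ?itemLabel.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Changing "en" to another language code ("he", "ja") swaps the label language, as demonstrated next.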
  • 136:17 - 136:22
    By the way, if I change EN,
    without really understanding
  • 136:22 - 136:27
    this line, if I change
    EN to HE, for Hebrew,
  • 136:27 - 136:30
    I get the same results
    with a Hebrew label.
  • 136:30 - 136:34
    Of course, these cats,
    nobody bothered to give them
  • 136:34 - 136:36
    Hebrew labels unfortunately.
  • 136:36 - 136:38
So I get the Q number.
  • 136:38 - 136:43
    But if I changed
    it to Japanese, JA,
  • 136:43 - 136:45
I would still get a bunch of
Q numbers where there
  • 136:45 - 136:47
    isn't a Japanese label,
    but I would get the labels
  • 136:47 - 136:49
    in Japanese.
  • 136:49 - 136:49
    OK?
  • 136:49 - 136:51
    So this is an example
    of how you don't even
  • 136:51 - 136:55
    need to understand all
    the syntax of this query
  • 136:55 - 136:56
    to adapt it to your needs.
  • 136:56 - 136:58
    If you want this
    query as is, but you
  • 136:58 - 137:00
    want the labels in
    Japanese, you can just
  • 137:00 - 137:03
    change the language code here.
  • 137:03 - 137:07
    OK so that is all
    this query does.
  • 137:07 - 137:09
    Again, just give
    me the items that
  • 137:09 - 137:18
    have property 31, instance of,
    with a value 146, which is cat.
  • 137:18 - 137:20
    Let's take a question just
    about this very simple query
  • 137:20 - 137:26
    before we advance to
    more complicated queries.
  • 137:26 - 137:29
    Any questions just about this?
  • 137:29 - 137:33
    Like, did anyone kind of
    really lose me talking
  • 137:33 - 137:35
    about this simple query?
  • 137:35 - 137:39
    Again, this query just tells
    Wikidata, get me all the items
  • 137:39 - 137:41
    that somewhere among
    their statements
  • 137:41 - 137:44
    have instance of cat.
  • 137:44 - 137:47
    That's the only condition.
  • 137:47 - 137:48
    No questions.
  • 137:48 - 137:50
    OK, feel free to ask if
    you'd come up with one.
  • 137:50 - 137:55
    So let's complicate
    things a little.
  • 137:55 - 137:59
    Let's ask only for male cats.
  • 138:02 - 138:03
    OK.
  • 138:03 - 138:07
    Remember this cat
    Gladstone is male,
  • 138:07 - 138:10
    and we know this because
    he has a property called
  • 138:10 - 138:14
    sex or gender, and the value
    is male creature, right?
  • 138:14 - 138:18
    So let's add another
    condition right here
  • 138:18 - 138:20
    under the first condition.
  • 138:20 - 138:21
    OK?
  • 138:21 - 138:23
    This is a new line.
  • 138:23 - 138:25
    And I'm adding a new
    condition to the query.
  • 138:25 - 138:31
    I'm saying, not only do I
    want this item that you return
  • 138:31 - 138:35
    to be instance of cat, I
    also want this same item
  • 138:35 - 138:39
    to have another property,
    the property sex or gender.
  • 138:39 - 138:40
    Right?
  • 138:40 - 138:43
    And I need to refer to
    the property by number.
  • 138:43 - 138:46
    But don't worry,
    Wikidata will help you.
  • 138:46 - 138:50
    So you start with this
prefix, WDT.
  • 138:53 - 138:55
    Again, just ignore
    that prefix it's
  • 138:55 - 138:59
    one of the features of SPARQL
    that we need to respect.
  • 138:59 - 139:03
    WDT colon, and then I can
    just type control space
  • 139:03 - 139:04
    to do a search, to
    do an auto complete.
  • 139:04 - 139:08
    So I can just type sex
    and Wikidata helpfully
  • 139:08 - 139:12
    offers me a drop down
    with relevant properties.
  • 139:12 - 139:15
    So I click property 21, which
    is the sex or gender property.
  • 139:15 - 139:18
    And then I say, so I want
    the sex or gender property
  • 139:18 - 139:20
    to have the Wikidata value.
  • 139:20 - 139:22
    Again, control space.
  • 139:22 - 139:25
    And I can just
    say male creature.
  • 139:25 - 139:26
    See?
  • 139:26 - 139:31
    There's a different item
for male, as in human,
  • 139:31 - 139:34
    and a different one for
    male creature, for reasons
  • 139:34 - 139:35
    that we won't go into.
  • 139:35 - 139:37
    Let's pick male
    creature, because we're
  • 139:37 - 139:38
    talking about cats here.
  • 139:38 - 139:39
    All right.
  • 139:39 - 139:42
    And add a period here at
    the end and click Run.
  • 139:42 - 139:48
    And instead of 114 cats, we get,
this time, 43 results.
  • 139:48 - 139:53
    Including our friend Gladstone
    who is a male creature cat.
  • 139:53 - 139:59
    So that means all the
    rest are female, right?
  • 139:59 - 140:00
    Wrong.
  • 140:00 - 140:01
    Wrong.
  • 140:01 - 140:03
    That does not mean that at all.
  • 140:03 - 140:07
    What it means is of
    the 114 items that
  • 140:07 - 140:12
    have instance of cat,
    only 43 have explicitly
  • 140:12 - 140:15
    sex male creature.
  • 140:15 - 140:18
    The rest of them do not.
  • 140:18 - 140:22
    Maybe because they have
    sex female creature,
  • 140:22 - 140:26
    but maybe because they don't
    have that property at all.
  • 140:26 - 140:28
    I'm emphasizing
    this to kind of help
  • 140:28 - 140:32
    you train yourself to
    correctly interpret
  • 140:32 - 140:34
    the results of
    queries from Wikidata.
  • 140:34 - 140:37
    Don't jump into this kind
    of simplistic conclusion,
  • 140:37 - 140:42
    OK there's 114 total, 43 male,
    therefore the rest are female.
  • 140:42 - 140:44
    That is not correct.
  • 140:44 - 140:45
    OK?
  • 140:45 - 140:50
    But 43 of those explicitly
    had another statement, sex
  • 140:50 - 140:53
    or gender, male creature.
  • 140:53 - 140:55
    So I just added
    another condition,
  • 140:55 - 140:58
    and now my query is
    asking two separate things
  • 140:58 - 141:00
    about the results.
  • 141:00 - 141:04
    They need to be a cat
    and a male creature.
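Adding the second condition described above gives, roughly, the following (Q44148, the "male organism"/"male creature" item, is my assumption for the value picked from the auto-complete):

```sparql
# Items that are an instance of (P31) cat (Q146)
# AND have sex or gender (P21) male creature (Q44148, assumed).
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  ?item wdt:P21 wd:Q44148 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```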
  • 141:04 - 141:06
    AUDIENCE: Maybe we
    should see how many
  • 141:06 - 141:08
    cats have Twitter accounts.
  • 141:08 - 141:11
    But there is a
    question from YouTube,
  • 141:11 - 141:14
    which is will you talk about
    the export possibilities
  • 141:14 - 141:17
    of the result of the query?
  • 141:17 - 141:18
    ASAF BARTOV: Absolutely.
  • 141:18 - 141:21
    Absolutely I will in
    just a little bit.
  • 141:21 - 141:23
    I mean there is, in
    addition to just getting
  • 141:23 - 141:28
    this kind of table, I can get
    these results in other formats.
  • 141:28 - 141:30
    And I can also
    download these results.
  • 141:30 - 141:33
    I can click the Download
    button and get them
  • 141:33 - 141:35
    as a comma separated
    file, tab separated
  • 141:35 - 141:39
    file, a JSON file, which is
    useful for programmatic uses.
  • 141:39 - 141:41
    I can also get a link.
  • 141:41 - 141:42
    So I can get a
    link to this query.
  • 141:42 - 141:46
    I mean, I spent all this time
    designing this beautiful query.
  • 141:46 - 141:50
    I can get a short URL that was
    generated especially for me
  • 141:50 - 141:52
    right now with a tiny URL.
  • 141:52 - 141:55
    I can just paste this
    into Twitter and go,
  • 141:55 - 141:59
    hey people look at all the male
    cats that Wikidata knows about.
  • 141:59 - 142:01
    OK, this is not a
    very exciting query.
  • 142:01 - 142:04
    But once I get to a really
    complicated exciting query
  • 142:04 - 142:08
    I can totally share that
    very easily through this.
  • 142:08 - 142:10
    And we will get to more
    interesting queries
  • 142:10 - 142:12
    in just a second.
  • 142:12 - 142:16
    Any questions on this kind
    of basic querying so far?
  • 142:16 - 142:18
    OK.
  • 142:18 - 142:25
    So that was a very
    simple example.
  • 142:25 - 142:30
    Let's spend a moment exploring.
  • 142:30 - 142:39
    So this cat Gladstone was
    named after this dude, William
  • 142:39 - 142:44
    Gladstone, who was an
    important British politician.
  • 142:44 - 142:46
    I'm sure he's not the
    only thing out there
  • 142:46 - 142:49
    in the universe that's named
    after Gladstone, right?
  • 142:49 - 142:52
    I mean there has got
    to be, I don't know,
  • 142:52 - 142:55
    park benches,
    planets, asteroids,
  • 142:55 - 143:00
    something other than the
    cat, named after this guy.
  • 143:00 - 143:04
    So we can ask Wikidata
    to tell us all the things
  • 143:04 - 143:07
    that, you know, without
    saying instance of something.
  • 143:07 - 143:11
    Like, I don't know, anything
    named after William Gladstone.
  • 143:11 - 143:13
    So how do I do that?
  • 143:13 - 143:15
    Same principle.
  • 143:15 - 143:20
    Instead of asking about the
    property instance of, property
  • 143:20 - 143:25
    31, instead of that, I
    will ask about the property
  • 143:25 - 143:27
    named after--
  • 143:27 - 143:29
    sorry, named after--
  • 143:29 - 143:31
    I don't need to
    remember the number.
  • 143:31 - 143:32
    I have auto-complete.
  • 143:32 - 143:35
    Named after is property 138.
  • 143:35 - 143:37
    And I want anything
    at all that is
  • 143:37 - 143:42
    named after this person,
    William Gladstone.
  • 143:42 - 143:44
    Here we go.
  • 143:44 - 143:46
    Which is 160852.
  • 143:46 - 143:47
    Whatever.
  • 143:47 - 143:48
    OK.
  • 143:48 - 143:51
    You notice I removed
    instance of cat.
  • 143:51 - 143:52
    I remove the male creature.
  • 143:52 - 143:55
    I'm only asking,
    get me all the items
  • 143:55 - 143:59
    that are somehow named after
    that particular politician.
  • 143:59 - 144:01
    And I run the query,
    and it turns out
  • 144:01 - 144:05
    the Wikidata knows
    about three such things.
  • 144:05 - 144:07
    Does that mean that's
    the only-- these
  • 144:07 - 144:09
    are the only three things
    named after him in the world?
  • 144:09 - 144:10
    Of course not.
  • 144:10 - 144:12
    But these are the only three
    items that are in Wikidata
  • 144:12 - 144:18
    and explicitly have the
    property named after Gladstone.
  • 144:18 - 144:20
    For all I know, there
    may be a village
  • 144:20 - 144:24
    in England called Gladstone
    named after this person.
  • 144:24 - 144:27
    But if nobody added the
    property, named after, linking
  • 144:27 - 144:31
to the person, it wouldn't show
up in the results of my query.
  • 144:31 - 144:34
    So Wikidata knows about
    three such things.
  • 144:34 - 144:36
    One of them is something
    called the Gladstone Professor
  • 144:36 - 144:37
    of Government.
  • 144:37 - 144:40
    I can click through and see
    that it's a chair at Oxford
  • 144:40 - 144:41
    University, right?
  • 144:41 - 144:43
    So it's a position.
  • 144:43 - 144:50
    And another is the William
    Gladstone school number 18.
  • 144:50 - 144:51
    William Gladstone
    school number 18.
  • 144:51 - 144:53
    Where is that?
  • 144:53 - 144:55
    That is in Sofia, Bulgaria.
  • 144:55 - 144:56
    Again.
  • 144:56 - 144:59
    All right, so that's a
    particular school in Bulgaria
  • 144:59 - 145:03
    named after William Gladstone.
  • 145:03 - 145:07
    And finally, the third
    result is, of course, our pal
  • 145:07 - 145:10
Gladstone the Chief Mouser.
  • 145:10 - 145:13
    If I click through,
    that's the cat.
  • 145:13 - 145:14
    All right, so that
    was an example.
  • 145:14 - 145:16
    I mean, you saw how easy it was.
  • 145:16 - 145:19
    I just named the property and
    the value that I care about,
  • 145:19 - 145:21
    and I get the results.
  • 145:21 - 145:23
    Again, I mean, it's
    kind of a silly example,
  • 145:23 - 145:24
    but think about it.
  • 145:24 - 145:28
    This is-- how else can
    you answer that question?
  • 145:28 - 145:30
    There's no reference desk,
    even at a great University
  • 145:30 - 145:34
    of Oxford, where you can
    walk in and say, give me
  • 145:34 - 145:37
    a list of things
    named after Gladstone.
  • 145:37 - 145:41
    There's no easy way to
    answer that unless you happen
  • 145:41 - 145:45
    to have a very large
    structured and linked
  • 145:45 - 145:48
    data store, like Wikidata.
  • 145:48 - 145:51
    All right, so that
    was a silly example.
  • 145:51 - 145:51
    Let's take some--
  • 145:51 - 145:53
    AUDIENCE: There's a
    bunch of stuff on there.
  • 145:53 - 145:54
ASAF BARTOV: Oh, OK.
  • 145:54 - 145:57
    AUDIENCE: Can you show
    easy query on the video?
  • 145:57 - 146:02
    And somebody needs to know
    how to just do property
  • 146:02 - 146:06
    exists without giving
    a specific value.
  • 146:06 - 146:11
    And then once you show easy
    query you reload the page and--
  • 146:11 - 146:13
ASAF BARTOV: I don't know easy query.
  • 146:13 - 146:16
    So is that a gadget?
  • 146:16 - 146:17
    I don't know what easy query is.
  • 146:17 - 146:20
    I don't use it.
  • 146:20 - 146:25
    So someone can maybe
    send a link or something?
  • 146:25 - 146:26
    Oh it is a gadget.
  • 146:26 - 146:27
    I don't have it enabled.
  • 146:32 - 146:32
    That is nice.
  • 146:32 - 146:42
    So now, what I just did by hand,
    by formulating the query named
  • 146:42 - 146:45
    after Gladstone--
  • 146:45 - 146:48
    I guess this is the--
  • 146:48 - 146:49
    Is it?
  • 146:53 - 146:54
    Yeah.
  • 146:54 - 146:56
    So this-- I just
    clicked the three--
  • 146:56 - 146:57
    the ellipsis here.
  • 146:57 - 146:58
    Right after the name.
  • 146:58 - 147:00
    You see this?
  • 147:00 - 147:03
    This was just added by
    enabling easy query,
  • 147:03 - 147:05
    which I just learned about.
  • 147:05 - 147:08
    So you just click this
    and it auto-magically
  • 147:08 - 147:10
    made this kind of trivial query.
  • 147:10 - 147:12
    Of course, if I want a more
    complicated query like,
  • 147:12 - 147:15
    I don't know, give me
    all the things that
  • 147:15 - 147:18
    are named after Lincoln
    but are a school,
  • 147:18 - 147:22
    I will still need to kind
    of edit a custom query.
  • 147:22 - 147:23
    But this is a super
    easy and very nice
  • 147:23 - 147:29
    way of just doing a very super
    quick query for exactly this.
  • 147:29 - 147:29
    Right?
  • 147:29 - 147:33
Like, what other items have
    exactly this property and value
  • 147:33 - 147:36
    named after William Gladstone?
  • 147:36 - 147:39
    So, thank you to whoever
    made this suggestion
  • 147:39 - 147:42
    to demonstrate that, and
    I'm glad I learned something
  • 147:42 - 147:45
    too today.
  • 147:45 - 147:49
    Let's move to
    another sample query.
  • 147:49 - 147:50
    Here's a fun example.
  • 147:50 - 147:57
    Popular surnames among
    fictional characters.
  • 147:57 - 147:59
    Think about that for a second.
  • 147:59 - 148:03
    Popular surnames among
    fictional characters.
  • 148:03 - 148:07
    So we're asking Wikidata
    to go through all
  • 148:07 - 148:10
    the fictional
    characters you know,
  • 148:10 - 148:14
    and of those look through
    their surnames, group
  • 148:14 - 148:16
    them so that you can count
    them, the repetitions
  • 148:16 - 148:18
    of the surnames,
    and give me the most
  • 148:18 - 148:22
    popular surnames among them.
  • 148:22 - 148:26
    Additionally, I want you to
    awesomely present the results
  • 148:26 - 148:28
    as a bubble chart.
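A sketch of such a query, assuming P734 (family name) and Q95074 (fictional character) are the property and class used on screen:

```sparql
#defaultView:BubbleChart
SELECT ?surnameLabel ?count WHERE {
  {
    # Count fictional characters per family name.
    SELECT ?surname (COUNT(?character) AS ?count) WHERE {
      ?character wdt:P31 wd:Q95074 .   # instance of: fictional character (assumed)
      ?character wdt:P734 ?surname .   # family name (assumed)
    }
    GROUP BY ?surname
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?count)
```

The `#defaultView:BubbleChart` comment is what tells the query service to render the result as a bubble chart.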
  • 148:28 - 148:29
    Oh, yeah.
  • 148:29 - 148:31
    Wikidata can do that.
  • 148:31 - 148:34
    And I run the query.
  • 148:34 - 148:37
    And check it out.
  • 148:37 - 148:41
    The most popular names
    among fictional characters
  • 148:41 - 148:46
that Wikidata knows about are
Jones, Smith, Taylor, et cetera.
  • 148:46 - 148:48
    I mean for all we know,
    the most popular name
  • 148:48 - 148:51
    among fictional characters
    actually in the world
  • 148:51 - 148:52
    may be Wu.
  • 148:52 - 148:55
    Or something in Chinese
    for all we know.
  • 148:55 - 148:58
    But if that has not been
    modeled in Wikidata,
  • 148:58 - 149:01
    we're not going to get that.
  • 149:01 - 149:04
    So Taylor, Smith,
    Jones, Williams,
  • 149:04 - 149:07
    seem to be the
    most popular names.
  • 149:07 - 149:08
    And again, I could limit this.
  • 149:08 - 149:12
    I could make the
    same query but add,
  • 149:12 - 149:14
    only among works whose
    original language
  • 149:14 - 149:19
    was Italian, for example, to get
    more interesting results if I
  • 149:19 - 149:21
    only care about
    Italian literature.
  • 149:21 - 149:25
    But this is an example of
    how I got awesome bubble
  • 149:25 - 149:28
    charts for free, and
    I can just plug this
  • 149:28 - 149:31
    into an awesome
    presentation that I make.
  • 149:31 - 149:34
    Of course I can still
    look at the raw table.
  • 149:34 - 149:38
    So the query still resulted
    in a bunch of data, right?
  • 149:38 - 149:42
    So Smith repeats 41 times,
    Jones 38 times, Taylor 34 times,
  • 149:42 - 149:44
    et cetera, et cetera.
  • 149:44 - 149:49
    And down that list.
  • 149:49 - 149:52
    And I could, again, I could
    export this into a file
  • 149:52 - 149:56
    and load it up in a spreadsheet,
    and do additional processing
  • 149:56 - 149:57
    on it.
  • 149:57 - 149:59
    I can link to it.
  • 149:59 - 150:03
    I can do all kinds of
    awesome things with it.
  • 150:03 - 150:05
    So that's another awesome query.
  • 150:05 - 150:08
    We don't have to go into
    every line by line analysis
  • 150:08 - 150:12
    here of why this
    works the way it does.
  • 150:12 - 150:16
    I want to show you some
    other queries first.
  • 150:16 - 150:22
    Let's look at-- this is just
    fun, overall causes of death.
  • 150:22 - 150:25
    Again a bubble
    chart just looking
  • 150:25 - 150:28
    at people who died
    of things, and have
  • 150:28 - 150:31
    a cause of death listed.
  • 150:31 - 150:34
    And we learn that the most
    commonly listed cause of death
  • 150:34 - 150:40
    is myocardial infarction,
    pneumonitis, cerebral vascular,
  • 150:40 - 150:43
    lung cancer, et
    cetera, et cetera.
  • 150:43 - 150:45
    And again, in a bubble chart.
  • 150:45 - 150:50
    And so how does that work?
  • 150:50 - 150:53
    So just very briefly, the
    important parts of this query
  • 150:53 - 150:59
    are I'm looking for something,
    for some person, who
  • 150:59 - 151:04
    is instance of 31, instance
    of Q5, which is human.
  • 151:04 - 151:05
    So a human.
  • 151:05 - 151:07
    Again, just to kind
    of limit the query.
  • 151:07 - 151:11
    I'm not interested in
    books or mountains.
  • 151:11 - 151:14
    I'm looking for humans
    who have that same person,
  • 151:14 - 151:21
    that same variable PID,
should have a P509, meaning--
  • 151:21 - 151:22
    Hello.
  • 151:22 - 151:25
    Why don't I have the--
  • 151:25 - 151:25
    Yeah.
  • 151:25 - 151:28
A P509, which is cause of death.
  • 151:28 - 151:32
    And that cause of death
    is another variable,
  • 151:32 - 151:33
    that I'm calling CID.
  • 151:33 - 151:35
    Now, previously
    we were saying you
  • 151:35 - 151:37
    know I want things
    that are named
  • 151:37 - 151:40
    after Gladstone specifically.
  • 151:40 - 151:42
    Only things that have
    that particular value.
  • 151:42 - 151:44
    Here I'm saying I'm
    looking for things
  • 151:44 - 151:47
    that have some cause of death.
  • 151:47 - 151:49
    Not a specific one.
  • 151:49 - 151:50
    I just wanted to
    get everything that
  • 151:50 - 151:55
    has a statement with some
    value about property 509
  • 151:55 - 151:57
    cause of death.
  • 151:57 - 151:58
    OK?
  • 151:58 - 152:04
    And then this other bit of
    magic here, the group by,
  • 152:04 - 152:08
    tells Wikidata I'm not
    actually interested
  • 152:08 - 152:09
    in every individual thing.
  • 152:09 - 152:12
    I want you to group those
    causes, and then count them
  • 152:12 - 152:14
    and give me the top ones.
  • 152:14 - 152:16
    So that's how this query works.
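Putting those pieces together, the cause-of-death query is roughly:

```sparql
#defaultView:BubbleChart
SELECT ?cidLabel ?count WHERE {
  {
    # Group humans by their listed cause of death and count them.
    SELECT ?cid (COUNT(?pid) AS ?count) WHERE {
      ?pid wdt:P31 wd:Q5 .     # instance of: human
      ?pid wdt:P509 ?cid .     # cause of death: any value
    }
    GROUP BY ?cid
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?count)
```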
  • 152:21 - 152:22
    Here's that query I promised.
  • 152:22 - 152:26
    Painters whose fathers
    were also painters.
  • 152:26 - 152:29
    I can only think of a couple.
  • 152:29 - 152:32
    I mean, Monet and Vogel.
  • 152:32 - 152:35
    But I'm sure Wikidata
    knows many more.
  • 152:35 - 152:39
    So let's run this query.
  • 152:39 - 152:40
    And I have 100 results.
  • 152:40 - 152:43
    By the way, I have limited
    it to 100 results just
  • 152:43 - 152:45
    to keep it kind of snappy.
  • 152:45 - 152:48
    But actually, we could
    maybe try removing the limit
  • 152:48 - 152:50
    and see if Wikidata
    could tell us
  • 152:50 - 152:54
    the total number in Wikidata.
  • 152:54 - 152:55
    Yeah, that wasn't too bad.
  • 152:55 - 152:58
    So 1,270 results.
  • 152:58 - 152:59
    OK.
  • 152:59 - 153:04
    Wikidata, already at this
early date in its progress,
  • 153:04 - 153:08
    already knows about
    more than 1,200 painters
  • 153:08 - 153:11
    who are sons of painters.
  • 153:11 - 153:16
    Sons of male painters, like
    their father is a painter.
  • 153:16 - 153:18
    There may be
    additional painters who
  • 153:18 - 153:21
    are sons of female painters
    not included in this query.
  • 153:21 - 153:25
    Again, always remember what
    exactly you are asking.
  • 153:25 - 153:28
    In this query I was
    asking about the father.
  • 153:28 - 153:30
    I'm leaving out any
    possible painters who
  • 153:30 - 153:33
    are sons of mother painters.
  • 153:33 - 153:33
    OK?
  • 153:33 - 153:35
    So how does this work?
  • 153:35 - 153:40
    I'm asking for the painter
    along with the human label,
  • 153:40 - 153:43
    and the father along
    with the human label.
  • 153:43 - 153:48
    So Michel Monet is the
    son of Claude Monet.
  • 153:48 - 153:54
    And Domenico Tintoretto is the
    son of the famous Tintoretto
  • 153:54 - 153:57
    whose label, you know, is just
    Tintoretto like Michelangelo.
  • 153:57 - 154:00
    You know, you don't always
    have to have the full name
  • 154:00 - 154:02
    in the common label.
  • 154:02 - 154:07
    Paloma Picasso is the
    daughter of Pablo Picasso.
  • 154:07 - 154:08
    OK.
  • 154:08 - 154:11
    So Wikidata knows about
    all these results.
  • 154:11 - 154:15
    Of course Holbein the Younger
    son of Holbein the Elder.
  • 154:15 - 154:16
    And how did we get there?
  • 154:16 - 154:21
    Well we asked Wikidata
    to look for something,
  • 154:21 - 154:27
    let's call it painter, which
    has 106, which is occupation,
  • 154:27 - 154:31
    with a value painter.
  • 154:31 - 154:32
    Right?
  • 154:32 - 154:35
    This unwieldy number
    1028181, that's painter.
  • 154:35 - 154:40
    So I'm asking for any item
    that has occupation painter.
  • 154:40 - 154:43
    And let's call
    that item painter.
  • 154:43 - 154:50
    I also want that painter to have
    a property 22, which is father.
  • 154:50 - 154:51
    OK.
  • 154:51 - 154:52
    Father.
  • 154:52 - 154:55
    And I want it to
    have some value.
  • 154:55 - 154:59
    OK, I'm putting it into
    another variable called father.
  • 154:59 - 155:01
    I could have called
    it, you know, frog.
  • 155:01 - 155:04
    That doesn't change
    anything, just to be clear.
  • 155:04 - 155:07
    What matters is that this
    is the property father.
  • 155:07 - 155:10
    I could have called
    it anything I want.
  • 155:10 - 155:14
    So, and then, I have
    a third condition.
  • 155:14 - 155:18
    That the father, like whatever
    it says here in property 22,
  • 155:18 - 155:23
    I want that father to have
    himself a property 106
  • 155:23 - 155:28
    occupation with a value painter.
  • 155:28 - 155:29
    OK?
  • 155:29 - 155:31
    These conditions
    combined to give me
  • 155:31 - 155:36
    a list of people who have
    a father and that father
  • 155:36 - 155:38
    has occupation painter as well.
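Those three conditions, as a query:

```sparql
# Painters (occupation P106 = painter Q1028181) whose father (P22)
# also has occupation painter.
SELECT ?painter ?painterLabel ?father ?fatherLabel WHERE {
  ?painter wdt:P106 wd:Q1028181 .
  ?painter wdt:P22 ?father .
  ?father wdt:P106 wd:Q1028181 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
```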
  • 155:38 - 155:41
    Of course, if I suddenly,
    or if you suddenly,
  • 155:41 - 155:44
    are consumed by
    curiosity to know
  • 155:44 - 155:51
    who are some politicians
    who are sons of carpenters?
  • 155:51 - 155:53
    You could just
    change that, right?
  • 155:53 - 155:57
    Change the first value
    from painter to politician.
  • 155:57 - 156:03
    Change the third line's value
    from painter to carpenter.
  • 156:03 - 156:04
    Maybe that list
    will be very short
  • 156:04 - 156:07
    because carpenters don't
    tend to be notable,
  • 156:07 - 156:09
    so they wouldn't be
    represented on Wikidata.
  • 156:09 - 156:12
    That's why this works relatively
    well with painters, right?
  • 156:12 - 156:14
    Because most of
    them are notable.
  • 156:14 - 156:16
    But generally you
    could do that, right?
  • 156:16 - 156:18
    That's an example of
    how you can take a query
  • 156:18 - 156:22
    and just replace one of those
    values, or even the language.
  • 156:22 - 156:27
    So again, I could ask
    for these same painters.
  • 156:27 - 156:28
    It's limited again.
  • 156:28 - 156:31
    These same painters,
    but with Arabic labels.
  • 156:31 - 156:35
    Same query, but I have Arabic
    labels for these painters.
  • 156:35 - 156:37
    And of course where
    there is no Arabic label
  • 156:37 - 156:40
I get the Q number.
  • 156:40 - 156:41
    OK?
  • 156:41 - 156:44
    So that's that query
    that I promised you,
  • 156:44 - 156:48
painters who are sons of painters
    can be done by Wikidata
  • 156:48 - 156:50
    in under one second.
  • 156:50 - 156:51
    How awesome is that?
  • 156:51 - 156:53
    We can also get some statistics.
  • 156:53 - 156:56
    So how about counting
    total articles
  • 156:56 - 157:00
    in a given wiki by gender.
  • 157:00 - 157:02
    This is what we call
    the content gender
  • 157:02 - 157:07
    gap, as distinct from the
    participation gender gap.
  • 157:07 - 157:10
    This is the gender gap in
    what we cover on Wikipedia.
  • 157:10 - 157:11
    So let's take one of these.
  • 157:16 - 157:18
    So this is a query.
  • 157:18 - 157:23
    Articles about women in
    some given Wikipedia.
  • 157:23 - 157:24
    All right.
  • 157:24 - 157:26
    So let's take--
  • 157:26 - 157:26
    I don't know.
  • 157:26 - 157:30
    Let's take the Tamil Wikipedia.
  • 157:30 - 157:32
    That's language code TA.
  • 157:32 - 157:35
    So I just put TA here.
  • 157:35 - 157:39
    And I click Run, and
    I get this count.
  • 157:39 - 157:40
    That's all I wanted.
  • 157:40 - 157:42
    I'm not actually
    interested in the items,
  • 157:42 - 157:45
    like in the list of women
    on the Tamil Wikipedia.
  • 157:45 - 157:46
    I just want the number.
  • 157:46 - 157:49
    So I selected the count here.
  • 157:49 - 157:53
    And this number
    turns out to be 2159.
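A sketch of this counting query, using the sitelink vocabulary the query service exposes (the exact shape on screen may differ slightly):

```sparql
# Count articles on the Tamil Wikipedia about humans
# whose sex or gender (P21) is female (Q6581072).
SELECT (COUNT(?article) AS ?count) WHERE {
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 wd:Q6581072 .
  ?article schema:about ?item ;
           schema:isPartOf <https://ta.wikipedia.org/> .
}
```

For men, Q6581072 becomes Q6581097, the "unwieldy number" mentioned below.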
  • 157:53 - 157:57
So there are about 2,000
    articles about women
  • 157:57 - 158:02
on the Tamil Wikipedia that
    Wikidata knows to be female.
  • 158:02 - 158:03
    Right?
  • 158:03 - 158:06
    I'm asking about the gender
    field, property 21 again.
  • 158:06 - 158:09
    Remember, if there's some
    article about a woman in Tamil
  • 158:09 - 158:12
Wikipedia, but Wikidata
doesn't have
  • 158:12 - 158:14
    a statement about the
    gender, that person
  • 158:14 - 158:16
    will not be counted here.
  • 158:16 - 158:18
    So again, be careful
    about kind of stating
  • 158:18 - 158:23
that this is exactly the number
of articles about women on Tamil
  • 158:23 - 158:23
    Wikipedia.
  • 158:23 - 158:25
    That's probably not true.
  • 158:25 - 158:28
    I'm sure some of those
    articles are missing
  • 158:28 - 158:31
a sex or gender property.
  • 158:31 - 158:33
    But for raw statistics,
    that's probably good,
  • 158:33 - 158:36
    because some men are also
    missing the sex or gender
  • 158:36 - 158:38
property.
  • 158:38 - 158:42
    So we could take the
    same query for men.
  • 158:42 - 158:43
    It's essentially the exact same.
  • 158:43 - 158:49
    It just has this unwieldy
    number for males, 6581097.
  • 158:49 - 158:53
    I can change this language
    code again to TA for Tamil.
  • 158:53 - 158:59
    And how many men are covered
on Tamil Wikipedia? 14,649.
  • 158:59 - 159:00
    OK.
  • 159:00 - 159:07
    So women, 2,100, men,
    about seven times as many.
  • 159:07 - 159:07
    Right?
  • 159:07 - 159:12
    So that's the approximate
    size of the content gender
  • 159:12 - 159:15
    gap on Tamil Wikipedia.
  • 159:15 - 159:19
    And again, I can complicate
    this query as much as I want.
  • 159:19 - 159:21
    For example, I can
    try and find out
  • 159:21 - 159:30
    if this gender gap is wider
    or narrower among musicians,
  • 159:30 - 159:31
    just as an example.
  • 159:31 - 159:36
    I could just add a line here
    that says occupation musician,
  • 159:36 - 159:38
    and then I'm only
    counting articles
  • 159:38 - 159:41
    on Tamil Wikipedia about
    musicians who are female
  • 159:41 - 159:43
    versus articles
    on Tamil Wikipedia
  • 159:43 - 159:45
    about musicians who are male.
  • 159:45 - 159:48
    And I can kind of
    compare the gender--
  • 159:48 - 159:54
    the content gender gap across
    occupations on Tamil Wikipedia.
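The occupation breakdown described above is one added triple; assuming Q639669 is the musician item:

```sparql
# Women with occupation musician covered on the Tamil Wikipedia.
SELECT (COUNT(?article) AS ?count) WHERE {
  ?item wdt:P31 wd:Q5 .
  ?item wdt:P21 wd:Q6581072 .    # female
  ?item wdt:P106 wd:Q639669 .    # occupation: musician (assumed item)
  ?article schema:about ?item ;
           schema:isPartOf <https://ta.wikipedia.org/> .
}
```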
  • 159:54 - 159:56
    Do you see the
    important point here?
  • 159:56 - 159:58
    Is that this is not just
    kind of a one purpose query.
  • 159:58 - 160:01
    I can just with a single
    additional conditional suddenly
  • 160:01 - 160:04
    make it a much more interesting
    query, because I break it down
  • 160:04 - 160:06
    by occupation.
  • 160:06 - 160:08
    Or I break it down by century.
  • 160:08 - 160:13
    Do we have more of the coverage
    gap in 19th century people
  • 160:13 - 160:14
    than in 21st century people?
  • 160:14 - 160:16
    I mean, I sure hope so, right?
  • 160:16 - 160:18
    The patriarchy is
    weakening somewhat.
  • 160:18 - 160:22
    So I wouldn't be surprised if
    there are many more notable men
  • 160:22 - 160:23
covered from the 19th century.
  • 160:23 - 160:26
    But if we are also covering--
  • 160:26 - 160:27
I mean, if the
    gender gap is just
  • 160:27 - 160:30
    as wide for 21st century
    people, that would
  • 160:30 - 160:31
    be a little disappointing.
  • 160:31 - 160:36
    Again that's something I
    can fairly easily find out
  • 160:36 - 160:39
    on Wikidata query.
  • 160:39 - 160:42
    Any questions so far, or
    are you just sharing links?
  • 160:42 - 160:43
AUDIENCE: Yep, there is one.
  • 160:43 - 160:47
    So somebody is wondering if you
    can demonstrate, or at least
  • 160:47 - 160:50
give a short answer to the
latter part of this question.
  • 160:50 - 160:53
Is it possible, using
Wikidata SPARQL,
  • 160:53 - 160:56
to find specific
Wikipedia articles, e.g.
  • 160:56 - 160:59
    featured articles, of a
    certain language which do not
  • 160:59 - 161:01
exist in another language?
  • 161:01 - 161:04
    I know it is possible
    to find category based
  • 161:04 - 161:06
results using the PetScan tool.
  • 161:06 - 161:09
    But can we specify
    that by selecting e.g.
  • 161:09 - 161:10
    featured articles?
  • 161:10 - 161:11
    ASAF BARTOV: Yes.
  • 161:11 - 161:13
    Excellent question.
  • 161:13 - 161:14
    It is possible, indeed.
  • 161:14 - 161:18
    And I will demonstrate
    one such query.
  • 161:18 - 161:19
Another query that
I already mentioned:
  • 161:19 - 161:25
    largest cities in the
    world with a female mayor.
  • 161:25 - 161:29
    This query-- let's
    close some of these tabs
  • 161:29 - 161:30
    before my browser chokes.
  • 161:34 - 161:37
    So this query lists
    the major world cities
  • 161:37 - 161:39
    run by women currently.
  • 161:39 - 161:46
And the answer is Mumbai, Mexico
City, Tokyo, and a bunch of others.
  • 161:49 - 161:52
    And wait-- that's not it at all.
  • 161:52 - 161:53
    I clicked the wrong one.
  • 161:53 - 161:55
    That's the map of paintings.
  • 161:55 - 161:56
    OK.
  • 161:56 - 161:57
    Let's demonstrate
    that for a second.
  • 161:57 - 162:00
    So this is the map
    of all paintings
  • 162:00 - 162:04
    for which we know a location
    with the count per location.
  • 162:04 - 162:08
    And the results are
    awesomely presented on a map.
  • 162:08 - 162:09
    OK.
  • 162:09 - 162:12
    Again, under the hood this is
    a table, of course, of results.
  • 162:12 - 162:16
    But, awesomely, I can
    browse it as a map.
  • 162:16 - 162:20
    So here is a map of the
    world with all the paintings
  • 162:20 - 162:22
    that Wikidata knows about.
  • 162:22 - 162:24
    Not just knows
    about the paintings,
  • 162:24 - 162:28
    but knows about their
    location in a museum.
  • 162:28 - 162:31
Not surprisingly,
    Europe is much better
  • 162:31 - 162:36
    covered than Russia or Africa.
  • 162:36 - 162:40
    There is a huge gap in
    contribution to Wikidata
  • 162:40 - 162:42
    from these countries.
  • 162:42 - 162:44
    And some of it can be fixed.
  • 162:44 - 162:48
    And of course there is much more
    documentation, and much more
  • 162:48 - 162:50
    art in Europe.
  • 162:50 - 162:54
    But if we zoom in, I
    don't know, Rome probably
  • 162:54 - 162:56
    has a few paintings.
  • 162:56 - 162:56
    Right?
  • 163:00 - 163:02
    Hello.
  • 163:02 - 163:04
    Sorry.
  • 163:04 - 163:10
    It's-- Yes.
  • 163:10 - 163:13
    Vatican City sounds
    like a good bet, right?
  • 163:13 - 163:14
    I can zoom in here.
  • 163:14 - 163:16
    And I can just click
    one of these dots
  • 163:16 - 163:21
    and see in this point
    there are two paintings.
  • 163:21 - 163:25
    And in this one there is one
    and it's the Archbasilica
  • 163:25 - 163:27
    of St. John Lateran.
  • 163:27 - 163:31
    Let's see, this is the
    actual St. Peter, right?
  • 163:31 - 163:34
    Sistine Chapel has 23 paintings.
  • 163:34 - 163:34
    What?
  • 163:34 - 163:37
    The Sistine Chapel has way
    more than 23 paintings.
  • 163:37 - 163:40
    Correct, but 23 of them
    are documented on Wikidata.
  • 163:40 - 163:43
    Have their own item
    for the painting, not
  • 163:43 - 163:45
    the Sistine Chapel,
    the painting has
  • 163:45 - 163:50
    an item that lists its
    being in the Sistine Chapel.
  • 163:50 - 163:51
    There are 23 of those.
  • 163:51 - 163:52
    OK.
  • 163:52 - 163:54
    There is definitely
    room to document
  • 163:54 - 163:57
    the rest of the artworks
    in the Sistine Chapel.
  • 163:57 - 164:00
    So, again, this is just
    not the kind of query
  • 164:00 - 164:03
    you were able to
    make before Wikidata,
  • 164:03 - 164:08
    and it's a fairly simple
    query, as you can see.
  • 164:08 - 164:13
    There are examples using
    maps like airports within 100
  • 164:13 - 164:15
    kilometers of Berlin.
  • 164:15 - 164:18
    Again using the coordinates
    as a useful data point.
  • 164:18 - 164:22
    And here is a map showing me
    only airports within a 100
  • 164:22 - 164:26
    kilometer radius from Berlin.
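The radius search uses the query service's geospatial extension. A sketch of roughly what such a query looks like; the talk's example may differ in detail (Q64 = Berlin, Q1248784 = airport, P625 = coordinate location):

```sparql
# Airports within 100 km of Berlin
SELECT ?airport ?airportLabel ?location WHERE {
  wd:Q64 wdt:P625 ?berlinLoc .               # Berlin's coordinates
  SERVICE wikibase:around {                  # geospatial "around" service
    ?airport wdt:P625 ?location .
    bd:serviceParam wikibase:center ?berlinLoc ;
                    wikibase:radius "100" .  # radius in kilometres
  }
  ?airport wdt:P31/wdt:P279* wd:Q1248784 .   # instance of (a subclass of) airport
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```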
  • 164:26 - 164:29
    But I wanted to show
    you the mayors query.
  • 164:29 - 164:35
    Let's click the-- oh I just
    have the wrong link here.
  • 164:35 - 164:41
    But I can still find it
    here by typing mayor.
  • 164:41 - 164:45
    Here we go, largest
    cities with female mayor.
  • 164:45 - 164:47
    So this is a slightly
    more complicated query.
  • 164:47 - 164:53
    But if I run it, I get the top
    10, because I set limit to 10.
  • 164:53 - 164:55
    I get the top 10
    cities in the world,
  • 164:55 - 165:00
by population size, that
are currently run by women.
  • 165:00 - 165:03
    Tokyo, Mumbai, Yokohama,
    Caracas, et cetera.
  • 165:03 - 165:08
    And one interesting thing that
    you may want to notice here
  • 165:08 - 165:11
    is that I'm asking for cities.
  • 165:11 - 165:14
    I mean items, that
    are instance of city.
  • 165:14 - 165:16
    And that have a
    head of government,
  • 165:16 - 165:19
    that have some
    statement about who
  • 165:19 - 165:28
is in charge, and that person
has sex or gender listed up here
  • 165:28 - 165:30
    as female.
  • 165:30 - 165:32
    Don't worry about
    the syntax right now.
  • 165:32 - 165:35
    I just want to show you
    some specific angle here.
  • 165:35 - 165:38
    And I'm further
    filtering these results.
  • 165:38 - 165:45
    I only want those items where
    there is not the property
  • 165:45 - 165:49
with the qualifier end time.
  • 165:49 - 165:50
    Why is that important?
  • 165:50 - 165:57
    Because if a city once
    had a female mayor,
  • 165:57 - 166:00
    but that mayor is not the mayor
    anymore, because mayors change,
  • 166:00 - 166:02
    I don't want them in this query.
  • 166:02 - 166:05
I want a query of
    cities currently having
  • 166:05 - 166:06
    a female mayor.
  • 166:06 - 166:08
    And of course Wikidata
    may have historical data
  • 166:08 - 166:10
    with start and
    end time, as we've
  • 166:10 - 166:15
    seen, that documents this
    person was the mayor of Tokyo
  • 166:15 - 166:17
    or San Francisco
    between these years.
  • 166:17 - 166:19
    But if there is no
end time, that means
  • 166:19 - 166:22
    they are currently the mayor.
  • 166:22 - 166:24
    So that's an example of
    asking about a qualifier
  • 166:24 - 166:28
of a statement, again, to get
    the results we actually want.
  • 166:28 - 166:32
    If we want current mayors it's
    important to put this filter.
  • 166:32 - 166:35
    If we don't, we will get
    historical female mayors
  • 166:35 - 166:36
    as well.
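The qualifier filter being described looks roughly like this. A sketch based on the well-known example query, not necessarily the exact one on screen (Q515 = city, P6 = head of government, P1082 = population, pq:P582 = the end-time qualifier):

```sparql
# Largest cities currently run by a female head of government
SELECT ?city ?cityLabel ?mayor ?mayorLabel ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;       # instance of (a subclass of) city
        wdt:P1082 ?population ;           # population, used for ranking
        p:P6 ?statement .                 # head-of-government statement
  ?statement ps:P6 ?mayor .
  ?mayor wdt:P21 wd:Q6581072 .            # sex or gender: female
  FILTER NOT EXISTS { ?statement pq:P582 ?end . }  # no end time: still in office
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY DESC(?population)
LIMIT 10
```

Without the `FILTER NOT EXISTS` line, historical female mayors would be counted too.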
  • 166:40 - 166:40
    All right.
  • 166:40 - 166:45
    So these are some
    example queries.
  • 166:45 - 166:49
    Questions about that?
  • 166:52 - 166:53
    Oh, the featured
    article example.
  • 166:58 - 167:02
    So let's look at that.
  • 167:07 - 167:13
    So I have prepared
    such a query recently.
  • 167:13 - 167:15
    Here we go.
  • 167:15 - 167:19
    So this is a query.
  • 167:19 - 167:20
    I just saved it here
    on my user page.
  • 167:20 - 167:22
    I mean, this is
    not Wikidata query.
  • 167:22 - 167:25
    This is just a meta page
    containing the query usefully.
  • 167:28 - 167:34
    And let's run this.
  • 167:34 - 167:38
    So this query, it's actually
    not very complicated.
  • 167:38 - 167:40
It just has a long
    list of countries,
  • 167:40 - 167:42
    because I'm asking
    about African countries.
  • 167:42 - 167:43
    OK.
  • 167:43 - 167:45
    I'm looking for human
    females from one
  • 167:45 - 167:51
    of these countries that
    have an article in English.
  • 167:51 - 167:53
    That's what this line means.
  • 167:53 - 167:56
    But not in French.
  • 167:56 - 167:58
    That's what this part means.
  • 167:58 - 167:59
    OK.
  • 167:59 - 168:02
    This part, these
    two lines together.
  • 168:02 - 168:03
    But not in French.
  • 168:03 - 168:06
    And this is what's
    called a badge.
  • 168:06 - 168:09
    That's Wikidata's concept of
    good and featured articles.
  • 168:09 - 168:11
    It's called a badge.
  • 168:11 - 168:16
    So I want them to have some
    badge on English Wikipedia.
  • 168:16 - 168:17
    OK?
  • 168:17 - 168:22
    So again, this query is
    asking for the top 100 women
  • 168:22 - 168:26
    from Africa who are documented
    on English Wikipedia,
  • 168:26 - 168:29
    in a featured or
    good article status.
  • 168:29 - 168:31
    But not on French Wikipedia.
  • 168:31 - 168:33
    So this is a query that's
    a to-do query, right?
  • 168:33 - 168:36
    That's a query
    for French editors
  • 168:36 - 168:40
    to consider what they might
    usefully translate or create
  • 168:40 - 168:41
    in French.
  • 168:41 - 168:49
And if we run this, we see
we have three results.
  • 168:49 - 168:51
    I mean, we have many
    women from Africa
  • 168:51 - 168:52
    covered on English Wikipedia.
  • 168:52 - 168:58
    But only three articles
    have featured or good status
  • 168:58 - 169:03
    among those that do not have
    French Wikipedia coverage.
  • 169:03 - 169:05
    Let me rephrase that.
  • 169:05 - 169:08
    Among the English Wikipedia
    articles about African women
  • 169:08 - 169:11
    that don't have a
    French counterpart,
  • 169:11 - 169:15
    only three are featured or good.
  • 169:15 - 169:17
    OK?
  • 169:17 - 169:18
    Do you see this?
  • 169:18 - 169:20
    The badge is good article.
  • 169:20 - 169:24
    This little incantation
    here is what allows
  • 169:24 - 169:26
    you to ask about the badge.
  • 169:26 - 169:29
    This here.
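The "incantation" is the wikibase:badge predicate on the sitelink. A sketch of the shape of this query; the country list here is abbreviated and illustrative, the real query lists all African countries (Q17437796 = featured article badge, Q17437798 = good article badge, P27 = country of citizenship):

```sparql
# African women with a badged article on English Wikipedia but none on French Wikipedia
SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5 ;                      # human
          wdt:P21 wd:Q6581072 ;                # female
          wdt:P27 ?country .                   # country of citizenship
  VALUES ?country { wd:Q1033 wd:Q117 }         # e.g. Nigeria, Ghana (abbreviated list)
  ?enArticle schema:about ?person ;
             schema:isPartOf <https://en.wikipedia.org/> ;
             wikibase:badge ?badge .           # the article carries a badge
  VALUES ?badge { wd:Q17437796 wd:Q17437798 }  # featured or good article
  FILTER NOT EXISTS {                          # ...but no French article exists
    ?frArticle schema:about ?person ;
               schema:isPartOf <https://fr.wikipedia.org/> .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
```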
  • 169:29 - 169:33
    And, by the way, the slides
will be uploaded to Commons.
  • 169:33 - 169:39
    And we will-- how shall we make
    it available on the YouTube
  • 169:39 - 169:40
    thing as well?
  • 169:43 - 169:43
    No, no.
  • 169:43 - 169:46
    But, I mean, for people who
    will later watch this video.
  • 169:52 - 169:54
    Oh yeah, we can add it to
    the YouTube description
  • 169:54 - 169:55
    and the comments description.
  • 169:55 - 169:58
    So in the-- if you're
    watching this video later,
  • 169:58 - 170:01
    in the description, we will
    add a link to this query
  • 170:01 - 170:01
    specifically.
  • 170:01 - 170:03
    Because it's not in
    the slides right now.
  • 170:03 - 170:04
    It will be.
  • 170:07 - 170:08
    OK.
  • 170:08 - 170:10
    So.
  • 170:10 - 170:14
    Questions so far?
  • 170:14 - 170:15
    We're almost done.
  • 170:15 - 170:16
    We have a few minutes left.
  • 170:16 - 170:18
    So questions about queries?
  • 170:18 - 170:20
    I mean, I'm sure
    there's tons of things
  • 170:20 - 170:22
    you don't know how to do yet.
  • 170:22 - 170:25
And maybe you didn't really
get a feel for SPARQL.
  • 170:25 - 170:27
    It's something you need
    to really do on your own
  • 170:27 - 170:28
    on your computer.
  • 170:28 - 170:29
    See how it works.
  • 170:29 - 170:30
    Fiddle with it.
  • 170:30 - 170:31
    Change something.
  • 170:31 - 170:33
    See that it breaks
    and complains.
  • 170:33 - 170:37
    But, very importantly-- oh I
    had this in the other questions
  • 170:37 - 170:38
    slide.
  • 170:38 - 170:42
    Remember Wikidata project chat.
  • 170:42 - 170:46
    That's kind of the Wikidata
    equivalent of the village pump.
  • 170:46 - 170:48
    It's the page on Wikidata
    where you can just
  • 170:48 - 170:50
    show up and ask a question.
  • 170:50 - 170:52
    In my experience, the
    Wikidata community
  • 170:52 - 170:55
    is very nice, very
    welcoming, and very eager
  • 170:55 - 171:00
    to help newer people integrate
    and learn how to do things.
  • 171:00 - 171:02
    There's also an IRC channel.
  • 171:02 - 171:04
    If you know what IRC is and
    how to use it, by all means,
  • 171:04 - 171:08
go to the #wikidata IRC channel.
  • 171:08 - 171:09
    There's people
    there all the time,
  • 171:09 - 171:11
    and you can just ask a question.
  • 171:11 - 171:13
    If you're trying to do a
    query, and you don't quite
  • 171:13 - 171:16
    understand the syntax, or you're
    not sure how to get the result
  • 171:16 - 171:17
    you want.
  • 171:17 - 171:20
    There are people there who
    will gladly help you do that.
  • 171:20 - 171:23
    There is also a
    Wikidata newsletter
  • 171:23 - 171:26
    published by the Wikidata team,
    which is centered in Germany
  • 171:26 - 171:27
at Wikimedia Deutschland.
  • 171:27 - 171:32
    And they send out a newsletter
    in English with Wikidata news.
  • 171:32 - 171:34
    You know, new
    properties, new items,
  • 171:34 - 171:35
    new things in the project.
  • 171:35 - 171:37
    But also sample queries.
  • 171:37 - 171:39
    So once a week there is
    kind of an awesome query
  • 171:39 - 171:43
    to learn from, if you want
    to learn that way instead
  • 171:43 - 171:46
    of reading like a
    whole manual on SPARQL.
  • 171:46 - 171:48
    So I'm just encouraging
    you to get help
  • 171:48 - 171:49
    in one of those channels.
  • 171:49 - 171:51
    Of course you can write to me.
  • 171:51 - 171:56
    Just reach out to me and
    ask me questions as well.
  • 171:56 - 171:59
    I hope by now you agree
    that Wikidata is love,
  • 171:59 - 172:03
    and Wikidata data is awesome.
  • 172:03 - 172:06
    If there are no questions,
    we do have a tiny bit of time
  • 172:06 - 172:12
    to demonstrate one
    more tool but that's--
  • 172:12 - 172:12
    no?
  • 172:12 - 172:13
    No questions.
  • 172:13 - 172:18
    OK so let's talk about--
  • 172:18 - 172:19
well, the Reasonator
    is kind of nice,
  • 172:19 - 172:23
    but it's a little like
    the article placeholder.
  • 172:23 - 172:26
    So this is not Wikidata
    this is a tool again
  • 172:26 - 172:27
    built by Magnus Manske--
  • 172:27 - 172:29
    AUDIENCE: There's also one
    final question to you in case--
  • 172:29 - 172:30
    ASAF BARTOV: Oh,
    there is a question.
  • 172:30 - 172:30
    AUDIENCE: Yeah.
  • 172:30 - 172:32
ASAF BARTOV: What are the
advantages and disadvantages
  • 172:32 - 172:35
of creating an item
    before an article is
  • 172:35 - 172:38
    done on English Wikipedia?
  • 172:38 - 172:42
    Well, I mean, this example
that I just made, right?
  • 172:42 - 172:47
    I'm reading this book
    by a notable author.
  • 172:47 - 172:48
    OK.
  • 172:48 - 172:51
    I want this to
    exist on Wikidata,
  • 172:51 - 172:53
    and to be mentioned
    on Wikidata, so
  • 172:53 - 172:57
    that when people look up
    that author in Wikidata
  • 172:57 - 172:59
    they will know about one
    of his notable works.
  • 172:59 - 173:02
    But I'm not prepared to
    put in the time investment
  • 173:02 - 173:06
    to build a whole article
    on English Wikipedia.
  • 173:06 - 173:07
    Either because I don't
    have the time, or I
  • 173:07 - 173:09
    don't have good sources.
  • 173:09 - 173:12
    Or maybe my English
    is not good enough,
  • 173:12 - 173:15
    but it is good enough to just
    record these very basic facts
  • 173:15 - 173:18
    and point to the Library of
    Congress records et cetera.
  • 173:18 - 173:20
    So that it's better
    than nothing.
  • 173:20 - 173:23
    So that's one reason
    to maybe do it.
  • 173:23 - 173:27
    Another reason is to
    be able to link to it.
  • 173:27 - 173:30
    So remember that
    translator lady already
  • 173:30 - 173:33
    had an item on Wikidata, but if
    she hadn't we could have just
  • 173:33 - 173:39
    created a very, very basic
    rudimentary item about her just
  • 173:39 - 173:42
    saying, you know,
    this name is human.
  • 173:42 - 173:43
    Country, Bulgaria.
  • 173:43 - 173:45
    Occupation, translator.
  • 173:45 - 173:49
Even just that
would have been something,
  • 173:49 - 173:52
    and would have enabled me
    to link to this person.
  • 173:52 - 173:57
    So these are legitimate reasons
    to create Wikidata entities
  • 173:57 - 174:02
    without, or at least before,
    creating a Wikipedia article.
  • 174:02 - 174:03
    If you are going to create--
  • 174:03 - 174:05
I mean if you're at an
    edit-a-thon or something,
  • 174:05 - 174:08
    and you have come to
    create Wikipedia articles,
  • 174:08 - 174:11
    by all means, first create
    the Wikipedia article,
  • 174:11 - 174:14
then create the Wikidata
item and link to it.
  • 174:18 - 174:20
    I hope that answers
    the question.
  • 174:20 - 174:25
So the Reasonator
    is simply a kind
  • 174:25 - 174:31
    of prettier view of
    items in Wikidata.
  • 174:31 - 174:36
    So you can just type the name
    of an item or the number.
  • 174:36 - 174:39
    Let's pick just a
    random number, 42.
  • 174:39 - 174:40
    Say 42.
  • 174:43 - 174:46
    Which happens to
    be, maybe you've
  • 174:46 - 174:51
    heard of this guy,
    Douglas Adams.
  • 174:51 - 174:55
    He happened to have received
the Q number 42.
  • 174:55 - 174:59
    I'm sure it's a
    cosmic coincidence
  • 174:59 - 175:01
    of infinite improbability.
  • 175:01 - 175:03
    And this is a view--
  • 175:03 - 175:06
    this is a tool that
    is not Wikidata.
  • 175:06 - 175:10
    It's a tool built on top of
Wikidata called Reasonator.
  • 175:10 - 175:15
    And it gives us the information
    from Q42, that is from the--
  • 175:15 - 175:19
    this item in Wikidata, which
    looks like an item in Wikidata.
  • 175:19 - 175:21
    But it gives it to us in a
    slightly more rational kind
  • 175:21 - 175:22
of layout.
  • 175:22 - 175:24
    It even kind of
    generates a little bit
  • 175:24 - 175:28
    of pseudo article text for us.
  • 175:28 - 175:30
    You know, Douglas Adams was
    a British writer, playwright,
  • 175:30 - 175:32
    screenwriter,
    bla-bla-bla, an author.
  • 175:32 - 175:36
    He was born on this date, in
    this place, to these people.
  • 175:36 - 175:39
    He studied at this place
    between these years.
  • 175:39 - 175:41
    That's all machine generated.
  • 175:41 - 175:42
    Nobody wrote this text.
  • 175:42 - 175:46
    That's all taken from those
    statements in Wikidata,
  • 175:46 - 175:51
and generates this reasonably
readable summary paragraph.
  • 175:51 - 175:54
    And then it gives us this
    little table of relatives.
  • 175:54 - 175:56
    It's all taken from Wikidata.
  • 175:56 - 175:58
    But as you can see,
    this is already
  • 175:58 - 176:02
    a little more accessible than
    the essentially arbitrary
  • 176:02 - 176:05
    ordering of statements
    on Wikidata.
  • 176:05 - 176:06
    And that's OK.
  • 176:06 - 176:08
    I mean, that's
    kind of by design.
  • 176:08 - 176:10
    Wikidata is the platform.
  • 176:10 - 176:12
    There is going to
    be-- there are going
  • 176:12 - 176:16
    to be many new applications,
    and platforms, and tools,
  • 176:16 - 176:19
    and visual interfaces
    on top of Wikidata
  • 176:19 - 176:23
to browse Wikidata in more
    friendly or more customized
  • 176:23 - 176:24
    ways.
  • 176:24 - 176:27
    For example, one of the
things that Reasonator
  • 176:27 - 176:32
    does for us is give us pictures
    and maps and a timeline.
  • 176:32 - 176:33
Check this out.
  • 176:33 - 176:39
The timeline is machine generated,
    just from dates and points
  • 176:39 - 176:44
    in time, mentioned in the
    relatively rich Wikidata
  • 176:44 - 176:47
    item about Douglas Adams.
  • 176:47 - 176:48
    Right?
  • 176:48 - 176:50
    So this timeline, for example
    again, completely machine
  • 176:50 - 176:51
    generated.
  • 176:51 - 176:53
    But he was educated
    between these years,
  • 176:53 - 176:55
    so I can put it on the timeline.
  • 176:55 - 176:57
    And this is the year he was
nominated for a Hugo Award,
  • 176:57 - 177:00
    so I can put that in a timeline.
  • 177:00 - 177:01
    Et cetera.
  • 177:01 - 177:03
    So that's just a super
    quick demonstration
  • 177:03 - 177:07
of that tool, the Reasonator.
  • 177:07 - 177:10
    Links are all here
    in the slides.
  • 177:10 - 177:13
    And the final tool I wanted
    to mention very quickly
  • 177:13 - 177:16
is the Mix'n'match tool.
  • 177:16 - 177:22
    You remember my explanation
about Wikidata as a nexus,
  • 177:22 - 177:27
    as connection point between many
    databases, many data sources.
  • 177:27 - 177:31
    Those depend on
    these equivalencies.
  • 177:31 - 177:35
    On Wikidata being taught
    that this item is like that
  • 177:35 - 177:38
    ID in this other database.
  • 177:38 - 177:42
And Mix'n'match is a tool,
again by Magnus Manske.
  • 177:42 - 177:45
    Maybe you're detecting
    a pattern here.
  • 177:45 - 177:47
    It's a tool by Magnus
    that is designed
  • 177:47 - 177:50
    to enable us to kind
    of take a foreign,
  • 177:50 - 177:55
    an external data set, put
    it alongside Wikidata,
  • 177:55 - 177:57
    and kind of try and align them.
  • 177:57 - 177:59
    So this item in this
    external dataset,
  • 177:59 - 178:01
    is that already
    covered in Wikidata?
  • 178:01 - 178:03
If so, by what Q number?
  • 178:03 - 178:04
    By what item?
  • 178:04 - 178:06
    If not, maybe we need
    to create a Wikidata
  • 178:06 - 178:08
    item to represent it.
  • 178:08 - 178:10
    Or maybe it's a
    duplicate, or something.
  • 178:10 - 178:16
So the Mix'n'match tool has
    a list of external data sets,
  • 178:16 - 178:18
    as you can see.
  • 178:18 - 178:21
    The Art and Architecture
    Thesaurus by the Getty Research
  • 178:21 - 178:22
    Institute.
  • 178:22 - 178:27
    Or the Australian
    Dictionary of Biography.
  • 178:27 - 178:29
    All kinds of external
    data sets here.
  • 178:32 - 178:40
    Somewhere here I had a specific
    link to the Royal Society.
  • 178:40 - 178:42
    It can also give
    me some statistics.
  • 178:42 - 178:47
    So there is an external data set
    of all the Fellows of the Royal
  • 178:47 - 178:48
    Society.
  • 178:48 - 178:48
    Right?
  • 178:48 - 178:55
    The oldest academic
    learned society in England.
  • 178:55 - 178:57
    And the internet is tired.
  • 179:03 - 179:05
    Here we go.
  • 179:05 - 179:07
    Nope.
  • 179:07 - 179:08
    Did that work?
  • 179:13 - 179:15
    Fellows of the Royal
    Society, here we go.
  • 179:15 - 179:18
    So this one is complete.
  • 179:18 - 179:21
    I mean, people have manually
    gone over every single item
  • 179:21 - 179:24
    there and either
    matched it to Wikidata
  • 179:24 - 179:27
    or declared that it was not
    in scope, or a duplicate
  • 179:27 - 179:29
    or whatever.
  • 179:29 - 179:31
    But let's look at site stats.
  • 179:31 - 179:35
    This is a fun kind of
    aspect of this tool.
  • 179:35 - 179:39
    But that is not working.
  • 179:39 - 179:41
    Or it's taking too long.
  • 179:41 - 179:44
    So let's just demonstrate
    how this works.
  • 179:44 - 179:46
    Maybe Britannica?
  • 179:46 - 179:47
    Is that done already?
  • 179:53 - 179:54
    Here we go.
  • 179:54 - 179:55
    Encyclopedia Britannica.
  • 179:55 - 179:56
    Yeah.
  • 179:56 - 180:02
    So the Encyclopedia
    Britannica has
  • 180:02 - 180:06
40% of its items
not yet processed.
  • 180:06 - 180:08
    So let's process one of them.
  • 180:08 - 180:16
    For example there is an item
    in the Encyclopedia Britannica
  • 180:16 - 180:20
    called Boston, England.
  • 180:20 - 180:23
As you know,
all American place names
  • 180:23 - 180:26
    are totally stolen
    from elsewhere.
  • 180:26 - 180:29
    So there is a Boston
    in England, though it's
  • 180:29 - 180:31
    no longer the famous one.
  • 180:31 - 180:36
And the Mix'n'match
tool has automatically
  • 180:36 - 180:40
    matched it based on
the label to Q
  • 180:40 - 180:44
100, which is Boston, the big
city in the United States.
  • 180:44 - 180:46
    And that is incorrect, right?
  • 180:46 - 180:49
    That's kind of naive computer
    going, well this is Boston,
  • 180:49 - 180:51
    and this other thing
    is also Boston.
  • 180:51 - 180:56
    And it is asking me to
    confirm this match or not.
  • 180:56 - 180:57
    You see?
  • 180:57 - 181:01
    So this is the Boston,
    England from Britannica.
  • 181:01 - 181:05
    And the tool is asking
    me, is this the same as
  • 181:05 - 181:07
Boston, Q100, in America?
  • 181:07 - 181:08
    The answer is no.
  • 181:08 - 181:10
    I removed this.
  • 181:10 - 181:12
    I remove this match.
  • 181:12 - 181:15
    And now this Boston,
    England is unmatched.
  • 181:15 - 181:23
    And I can match it to the
    correct one in England.
  • 181:23 - 181:27
    I can do this by searching
    English Wikipedia,
  • 181:27 - 181:29
    or searching Wikidata.
  • 181:29 - 181:32
    I mean, it has
    these handy links.
  • 181:32 - 181:37
    So the English town
    is in Lincolnshire.
  • 181:37 - 181:38
    Boston, Lincolnshire.
  • 181:38 - 181:46
    So I can go there and then
    get the Wikidata item number.
  • 181:46 - 181:50
See, this is not
Q100, Boston in the States,
  • 181:50 - 181:53
this is Q311975,
the town in Lincolnshire.
  • 181:53 - 181:57
I can get this Q
    number, go back to the mix
  • 181:57 - 181:58
    and match tool--
  • 181:58 - 181:59
    Where was that?
  • 181:59 - 182:00
    Here we are.
  • 182:00 - 182:02
And set Q.
  • 182:02 - 182:09
    I can tell the tool that this is
    the right Boston, and click OK.
  • 182:09 - 182:15
    And now this town
    in Lincolnshire,
  • 182:15 - 182:17
    you can see this here,
this item, Q311975,
  • 182:17 - 182:21
    is linked to Britannica.
  • 182:21 - 182:23
    What does this mean?
  • 182:23 - 182:24
    Well, if we go there.
  • 182:24 - 182:25
    If we actually go
    to the Wikidata
  • 182:25 - 182:29
    entity you will see
    that in addition
  • 182:29 - 182:34
    to the few statements that
    it already had, it now has,
  • 182:34 - 182:39
    thanks to my clicking, it now
    has another identifier here.
  • 182:39 - 182:39
    See?
  • 182:39 - 182:44
    Encyclopedia Britannica
    Online ID, with this link.
  • 182:44 - 182:49
    And if we click it, we
    will indeed reach this page
  • 182:49 - 182:52
    in the Britannica
    online, which is indeed
  • 182:52 - 182:54
    about this town in Lincolnshire.
  • 182:54 - 182:55
    You see?
  • 182:55 - 182:59
    So I've contributed one
    of those mappings, one
  • 182:59 - 183:02
    of those identifiers,
    into Wikidata.
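In query terms, the mapping just added is an ordinary statement that can now be retrieved. A sketch, assuming the item number read out in the demo (P1417 = Encyclopædia Britannica Online ID):

```sparql
# Fetch the Britannica ID just linked to Boston, Lincolnshire
SELECT ?town ?townLabel ?britannicaId WHERE {
  VALUES ?town { wd:Q311975 }        # the item from the demo
  ?town wdt:P1417 ?britannicaId .    # Encyclopædia Britannica Online ID
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```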
  • 183:02 - 183:05
    And I didn't have
    to do it manually.
  • 183:05 - 183:08
This tool kind of prompted
me to either confirm or correct.
  • 183:08 - 183:09
If it had been correct,
I could have just
  • 183:09 - 183:12
clicked confirm. Since
it wasn't correct,
  • 183:12 - 183:17
    I corrected it manually, but
    it made this edit on my behalf.
  • 183:17 - 183:21
    So that's another tool that
    encourages us to systematically
  • 183:21 - 183:24
    teach Wikidata more things.
  • 183:24 - 183:26
    And we're out of time.
  • 183:26 - 183:29
Go edit Wikidata. Now
    that you have the power,
  • 183:29 - 183:31
    you know the deal.
  • 183:31 - 183:32
    Use it for good,
    and not for evil.
  • 183:32 - 183:36
    If you have questions,
    this is my email address.
  • 183:36 - 183:39
    If you're watching this video
not live, the description
  • 183:39 - 183:42
    will have links to the
    slides, and to a bunch
  • 183:42 - 183:45
    of other useful
    pieces of information.
  • 183:45 - 183:50
    Any last questions on IRC?
  • 183:50 - 183:53
    If not, thank you
    for your attention.
  • 183:53 - 183:56
    And if you like this, and if you
    feel that you now get Wikidata,
  • 183:56 - 183:58
    and you get what it's
    good for, and you're
  • 183:58 - 184:02
    inspired to contribute, I have
    only one request from you.
  • 184:02 - 184:05
    I mean, in addition to using
    it for good not for evil,
  • 184:05 - 184:08
    I ask that you spread the word.
  • 184:08 - 184:10
    Show this video--
    share this video
  • 184:10 - 184:13
    with other people in your
    community, or around you.
  • 184:13 - 184:16
    Teach this yourself
    once you're comfortable
  • 184:16 - 184:18
    with these concepts.
  • 184:18 - 184:21
    Feel free to use my slides.
  • 184:21 - 184:24
    Yeah, and edit Wikidata.
  • 184:24 - 184:27
    Thank you very
    much, and goodbye.
Video Language:
English
Duration:
03:04:32
