36C3 Wikipaka WG: Querying Linked Data with SPARQL and the Wikidata Query Service

  • 0:00 - 0:22
    36C3 preroll music
  • 0:22 - 0:30
    Okay so now to our speaker, he’s Lucas.
    He's a SPARQL magician I'm told, so and he
  • 0:30 - 0:35
    will introduce you to his favorite
    querying language, SPARQL, and give you a
  • 0:35 - 0:40
    little introduction and in the second part
    he will do some live coding which is
  • 0:40 - 0:46
    always really interesting and funny and
    you can give him some things that he's
  • 0:46 - 0:50
    querying for you and I'm sure we'll have
    lots of fun and interesting learning stuff
  • 0:50 - 0:54
    here so give a warm round of applause to
    Lucas.
  • 0:54 - 0:56
    [Applause]
  • 1:01 - 1:09
    [inaudible]
  • 1:09 - 1:13
    Is this better? Aha! It's a bit too loud
    so I'll just talk a bit until they have
  • 1:13 - 1:19
    figured it out. Yeah so this is going to
    be kind of two parts but not really that
  • 1:19 - 1:22
    separate but in the second part I'm
    basically going to write the queries that
  • 1:22 - 1:27
    you suggest so if you – if you see what
    I'm going to do here and then think oh I
  • 1:27 - 1:31
    have a great idea for something we could
    perhaps query then just remember that and
  • 1:31 - 1:35
    we'll get back to that hopefully because
    otherwise the second half is going to be
  • 1:35 - 1:40
    really short if I don't get any ideas from
    you. But yeah, so this is about querying
  • 1:40 - 1:46
    linked data which allows you to do all
    kinds of crazy things and answer all kinds
  • 1:46 - 1:51
    of crazy questions such as I think I had
    on the slides something like "what are the
  • 1:51 - 1:54
    largest cities with a female mayor?" and
    if you wanted to find that out
  • 1:54 - 1:59
    traditionally you could like go through
    Wikipedia and try to find all the largest
  • 1:59 - 2:03
    cities and see which ones have a female
    mayor and which ones don't or perhaps
  • 2:03 - 2:07
    there's a category with all the cities
    with a female mayor but then you have to
  • 2:07 - 2:12
    sort them by population and it's a whole
    mess and with linked data you can find
  • 2:12 - 2:17
    that out much more easily and also all
    kinds of other things but let's start with
  • 2:17 - 2:25
    some simple fantasy linked data so this is
    a tiny snippet of linked data, some data
  • 2:25 - 2:30
    graph. It's just composed of a load of
    nodes which are these ovals and rectangles
  • 2:30 - 2:35
    here and they're connected with arrows and
    each of these forms kind of a triple
  • 2:35 - 2:40
    consisting of the start node and then the
    arrow and then the end node and that's how
  • 2:40 - 2:45
    we represent all the information you have
    in there, in this linked database. So for
  • 2:45 - 2:48
    example we can read this as this talk
    right now happens in the Esszimmer or the
  • 2:48 - 2:52
    dining room which is the name of this
    stage here and it's going to be followed
  • 2:52 - 2:56
    by the live querying session which also
    happens in Esszimmer and the live querying
  • 2:56 - 3:01
    session in turn follows this talk again
    and the Esszimmer, the dining room, is
  • 3:01 - 3:06
    next to the kitchen, the Küche, and the
    kitchen is next to the dining room again
  • 3:06 - 3:10
    and both of them are part of the
    Wikipaka-WG which is part of 36C3 and the
  • 3:10 - 3:17
    talk happens right now and at the same
    time there's also some talk about how
  • 3:17 - 3:22
    state elections are climate elections or
    something in the Chaos West stage, starts
  • 3:22 - 3:26
    at the same time, Chaos West stage is part
    of the Chaos West Assembly which is part
  • 3:26 - 3:32
    of 36C3 as well and so this graph has a
    few important properties, for example
  • 3:32 - 3:36
    there's some redundant connections here,
    you could see, you could say, if this talk
  • 3:36 - 3:39
    is followed by the live querying then you
    don't really need to know that live
  • 3:39 - 3:44
    querying follows this talk, it's kind of
    redundant information. You already know
  • 3:44 - 3:49
    it, but it doesn't hurt to have it, and it
    often makes your life easier if you have a
  • 3:49 - 3:54
    little bit of redundancy in your graph and
    then if you find that one half of this
  • 3:54 - 3:57
    connection is missing for example you can
    still investigate what's going on and also
  • 3:57 - 4:02
    in here we have kind of bi-directional
    connection so Esszimmer is next to Küche
  • 4:02 - 4:08
    which is next to Esszimmer but this is two
    separate arrows and could also be that
  • 4:08 - 4:12
    only one of them is there so you don't
    have arrows which go into-, in both
  • 4:12 - 4:16
    directions at once in this data model, it
    has to be, if you want something like this
  • 4:16 - 4:19
    you have to have two separate arrows
    because that keeps the data model very
  • 4:19 - 4:26
    simple. You just have subject predicate
    object and that's everything you have, and
  • 4:26 - 4:33
    then to query this graph, you kind of
    select a tiny part of it and then you
  • 4:33 - 4:39
    remove some part that you don't know about
    for example we know that this talk is
  • 4:39 - 4:44
    followed by live querying and if we remove
    the live querying part, then we can ask
  • 4:44 - 4:51
    something like... Okay, I did it the other
    way around. Never mind, this way. This
  • 4:51 - 4:54
    talk is followed by which talk? and then
    you have a question but because you've
  • 4:54 - 5:00
    left out this part and then if you ask
    this question to a query service it can,
  • 5:00 - 5:07
    kind of, you can think of this like a,
    err, damn, I only know the German word for
  • 5:07 - 5:12
    this one, a, Schablone, template, so you
    put this over the graph and this has to
  • 5:12 - 5:16
    match the existing node this has to match
    the existing arrow and then you see which
  • 5:16 - 5:20
    nodes can you put in here and in this case
    that's only the live querying or the other
  • 5:20 - 5:27
    way around which talk follows this one so
    the beginning of the triple
  • 5:27 - 5:31
    can be a variable like this one or the end
    of the triple can be a variable like in
  • 5:31 - 5:39
    this case and you can also have more
    complicated patterns like, no there's not
  • 5:39 - 5:42
    a more complicated pattern, this is the
    same pattern. You have the question which
  • 5:42 - 5:46
    talk happens in Esszimmer and you have two
    answers: this talk happens in Esszimmer
  • 5:46 - 5:52
    and live querying happens in Esszimmer.
    But you can also combine more graph nodes
  • 5:52 - 5:58
    like this, for example, which talk happens
    in some room, which is part of the
  • 5:58 - 6:02
    Wikipaka-WG. So we have one free part here
    and one free part here. But we know that
  • 6:02 - 6:06
    these two have to be connected with,
    "happens in", and then this has to be
  • 6:06 - 6:11
    connected with "is part of" to the
    Wikipaka-WG. And you can kind of
  • 6:11 - 6:17
    construct– if you can phrase your question
    as a kind of graph like this, where some
  • 6:17 - 6:19
    parts are predetermined that you already
    know about and the other parts that you
  • 6:19 - 6:26
    want to find. Those are these kind of
    variables which are here indicated with
  • 6:26 - 6:31
    just dashed lines. Then you can ask that
    question to the graph and find the
  • 6:31 - 6:36
    matching results. In this case, you have
    these two matches, this talk happens in
  • 6:36 - 6:40
    Esszimmer as part of Wikipaka-WG and live
    querying happens in Esszimmer, is part of
  • 6:40 - 6:47
    Wikidata– Wikipaka-WG. And then, if you–
    if we had more information in this graph
  • 6:47 - 6:52
    here, we might also have other rooms. For
    example, there's this library over there
  • 6:52 - 6:56
    which also is going to have some talks. If
    we had the whole schedule in here, we
  • 6:56 - 7:01
    would find those as well. And we could
    also adapt the query so that we don't even
  • 7:01 - 7:07
    make the Wikipaka-WG part fixed. We could
    ask for anything that happens in 36C3. So
  • 7:07 - 7:11
    that would be some variable, happens in
    some room, is part of some assembly, is
  • 7:11 - 7:16
    part of 36C3. And then we would find this
    thing as well because it fits the same
  • 7:16 - 7:22
    kind of pattern: happens in, is part of,
    is part of 36C3. Does that make sense?
  • 7:22 - 7:32
    Hopefully. I'm seeing a lot of nodding
    heads. OK, that's great. So then we can
  • 7:32 - 7:38
    try to move ahead to actually ask some of
    these questions to a real query system.
  • 7:38 - 7:43
    Because in reality, you're not going to
    actually draw these graphs, but you have
  • 7:43 - 7:48
    some kind of language where you phrase
    them instead, which looks a bit like this.
  • 7:48 - 7:53
    So you have the part: SELECT anything
    WHERE, that is kind of like SQL, and then
  • 7:53 - 7:58
    everything else is not like SQL. Forget
    SQL! I hear this is easier to understand
  • 7:58 - 8:03
    if you don't know SQL. I didn't know SQL
    that much when I learned SPARQL, and I
  • 8:03 - 8:09
    think it helped me, apparently. But what
    you write down here is these, is this kind
  • 8:09 - 8:14
    of description of the graph, and these
    dashed parts, which are the variables
  • 8:14 - 8:18
    which you don't yet know. Those are marked
    with a question mark because that's kind
  • 8:18 - 8:21
    of what you use to ask a question. In this
    case, I've just called it "?talk", but it
  • 8:21 - 8:27
    could be any name, basically. And then
    instead of "happens in" as two words, I've
  • 8:27 - 8:33
    just written "happensIn" as one and then
    with the prefix "36C3" and it happens in
  • 8:33 - 8:38
    the 36C3 Esszimmer because I don't really
    have a separate dining room at home, but a
  • 8:38 - 8:43
    lot of people do. So if we just wrote it
    happens in Esszimmer, that would be pretty
  • 8:43 - 8:48
    ambiguous and no one would know which
    which dining room you're talking about.
  • 8:48 - 8:53
    And by adding this prefix we know we're
    talking about just the dining room in
  • 8:53 - 8:59
    this, at thirty– 36C3. I think, I assume
    there's no other assembly that has
  • 8:59 - 9:01
    something called the dining room. If it
    does, then we would have to add something
  • 9:01 - 9:06
    else here to make it clear. And I've used
    the same prefix for "happensIn" to make
  • 9:06 - 9:10
    clear which kind of "happens in" relation
    we're talking about, that it's one
  • 9:10 - 9:16
    specific to Congress events. And then you
    could ask this to a query service which
  • 9:16 - 9:22
    has this example graph in it, and you
    might get the response that it's these two
  • 9:22 - 9:28
    talks. And at the end, you have this
    period here because if you read the whole
  • 9:28 - 9:33
    thing, it's kind of like a sentence again.
    Because the talk happens in Esszimmer. And
  • 9:33 - 9:37
    if you have two sentences, then you have
    two periods. So the talk happens in some
  • 9:37 - 9:41
    room. And this room is part of the
    Wikipaka-WG. And because we've used the
  • 9:41 - 9:48
    same variable name here and down here,
    this has to be the same room. And it
  • 9:48 - 9:51
    couldn't just be two different things. So
    if we use two different variable names
  • 9:51 - 9:56
    here, room and something else, then we
    would just get all the combinations of
  • 9:56 - 9:59
    talks happening somewhere and rooms being
    part of Wikipaka-WG without them being
  • 9:59 - 10:03
    connected in any way, but because they use the
    same variable name they have to be
  • 10:03 - 10:09
    connected like this. And then you would
    get these results we've seen earlier. What
  • 10:09 - 10:14
    you can also do is leave out the room. So
    when I translate this into English, I
  • 10:14 - 10:18
    could say, the talk happens in the room
    and the room is part of Wikipaka-WG. But I
  • 10:18 - 10:23
    could also say the talk happens in some
    room, which is part of the Wikipaka-WG,
  • 10:23 - 10:26
    as kind of a– I don't know what that's
    called in English kind of a relative
  • 10:26 - 10:33
    sentence sub-something-clause where we
    don't really talk about the room in itself
  • 10:33 - 10:37
    just as a part of this larger sentence.
    And you can write that in SPARQL as well.
  • 10:37 - 10:44
    And then it looks like this. And these
    square brackets kind of describe what the
  • 10:44 - 10:48
    room looks like without giving it names.
    So in this case, you can only select the
  • 10:48 - 10:52
    talk up here and we don't have a room
    variable. But if you don't care about what
  • 10:52 - 10:56
    the room is, then that can be very useful.
    I've also changed something else here.
  • 10:56 - 11:04
    I've replaced the 36C3 in "isPartOf" with
    schema, which is another prefix and schema
  • 11:04 - 11:09
    is kind of this collection of useful
    prefixes and other nodes that you can
  • 11:09 - 11:14
    reuse, for example, if you're describing
    things you have on your website, you might
  • 11:14 - 11:19
    say you have an article with a
    schema:title and a schema:publicationDate.
  • 11:19 - 11:23
    So this was mainly introduced by Google
    and some other search engines. But we can
  • 11:23 - 11:28
    use the same vocabulary to talk about our
    talks because "isPartOf" is one of these
  • 11:28 - 11:36
    standard terms we can use for that. And
    what else do I have. OK, the next thing I
  • 11:36 - 11:41
    have is actual queries. So I think I'm
    just going to– I'm almost going to switch
  • 11:41 - 11:45
    to Wikidata, so I should talk a bit about
    Wikidata. So all these examples here were
  • 11:45 - 11:53
    just on some example graph, which I made
    up here and threw on a slide with a lot of
  • 11:53 - 11:58
    probably overengineered TikZ LaTeX magic,
    which I shouldn't have spent that much
  • 11:58 - 12:04
    time on. But it looks nice. And… but if
    we want to write real queries, we could
  • 12:04 - 12:07
    load this thing into a query service, but
    it wouldn't be that interesting because
  • 12:07 - 12:12
    it's kind of small. But there are a lot of
    real data graphs out there that you can
  • 12:12 - 12:17
    query with this query language, SPARQL.
    And one of the coolest ones, at least in
  • 12:17 - 12:21
    my opinion, is called Wikidata or
    Wikidata. There's some kind of discussion
  • 12:21 - 12:28
    about how it's pronounced. And it's kind
    of a free database of anything that's
  • 12:28 - 12:34
    relevant. And it's part of the same family
    of projects as Wikipedia and Wikimedia
  • 12:34 - 12:38
    Commons and other things. And it's also
    maintained by the same community of
  • 12:38 - 12:42
    volunteers. And you can find all kinds of
    really interesting and cool and funny data
  • 12:42 - 12:46
    there. So all of these example queries,
    which I have here, we're just going to ask
  • 12:46 - 12:57
    to Wikidata. But first, I will just give
    you one or two minutes to try to imagine
  • 12:57 - 13:04
    what this question would look like, either
    in the graph format or in the SPARQL
  • 13:04 - 13:09
    format. Just try to figure out how you
    would formulate: "which software is
  • 13:09 - 13:15
    written in bash" as a kind of, this kind
    of graph query. And then we can see what
  • 13:15 - 13:23
    we can come up with. So. I didn't think
    this through. I need some waiting loop
  • 13:23 - 13:36
    music now. Does anyone have a kind of idea
    of what the graph looks like, because I'm
  • 13:36 - 13:41
    going to uncover it now and then you can
    compare, if it looks the same way. So it
  • 13:41 - 13:46
    would look like, this at least using the
    Wikidata terminology. So instead of "is
  • 13:46 - 13:52
    written in", the property is called
    probing– programming language. And this
  • 13:52 - 13:56
    could also, this could be called "bash" or
    "Bourne Again Shell" or "GNU bash" or
  • 13:56 - 14:02
    something. Doesn't really matter. And in
    SPARQL, it looks like this, which is a lot
  • 14:02 - 14:07
    less readable, unfortunately, because one
    of the things about Wikidata is that it's
  • 14:07 - 14:14
    multilingual. So instead of saying
    "programming language", we say "P277". And
  • 14:14 - 14:18
    I think that's beautiful, haha. No, but
    this is a property ID and you can look up
  • 14:18 - 14:23
    what this property is called in English or
    in German or in any other language. So if
  • 14:23 - 14:31
    we look at Wikidata.org and look for – I
    think I forgot to zoom in. Yeah. There we
  • 14:31 - 14:40
    go. I hope that's readable. Property P,
    what was it? 277. That is the property
  • 14:40 - 14:45
    "programming language", at least in… okay,
    you can't read that. There you go. At
  • 14:45 - 14:48
    least in English. In German it's
    "Programmiersprache", and it has tons of
  • 14:48 - 14:52
    other languages too. So you can use
    Wikidata in any language you want, which
  • 14:52 - 14:57
    is very nice. I could also show this page
    in a different language and then all of
  • 14:57 - 15:01
    this would look different. The downside is
    that the SPARQL query is not quite as
  • 15:01 - 15:07
    readable because you have to use all these
    numeric identifiers, but you don't have to
  • 15:07 - 15:15
    memorize them at least. So let's… oops,
    try to write this query. SELECT * WHERE
  • 15:15 - 15:25
    and we have the software, which is… which
    has the programming language "bash", and
  • 15:25 - 15:31
    then we have to add these prefixes first,
    so bash is going to be a Wikidata item. So
  • 15:31 - 15:36
    we abbreviate that with "wd" and that's a
    prefix. And then if I press control space,
  • 15:36 - 15:42
    or I think on Macs command space works as
    well, then it searches for bash and shows
  • 15:42 - 15:47
    me these suggestions and then I can just
    select the right one. In this case, "GNU
  • 15:47 - 15:51
    bash", and then I have the ID, and if I
    move the mouse over it again, then I can
  • 15:51 - 15:56
    see what this ID refers to. So it's not
    quite as bad as– so on the PDF slides, you
  • 15:56 - 16:01
    just see the ID. But if you're actually on
    the query.wikidata.org website… let me
  • 16:01 - 16:05
    make that a bit larger so you can all see
    it. And if you want to try that out on
  • 16:05 - 16:09
    your laptop, I don't know, here it's a bit…
    [audio outage] And for the programming
  • 16:09 - 16:17
    language, we use a slightly different
    prefix, which is "wdt", which stands for
  • 16:17 - 16:21
    "truthy". So we're only interested in
    "truthy" information and not all the
  • 16:21 - 16:29
    information. And then we find this
    property P277. And if we run this query
  • 16:29 - 16:35
    with control-enter or with this button
    here, then we get a collection of other
  • 16:35 - 16:40
    IDs. Yeah. Does anyone want to get
    software which is written in bash? This
  • 16:40 - 16:51
    one has a very low ID that is going to be…
    Loading. There we go. Autopackage. Some
  • 16:51 - 16:55
    package management system that I haven't
    even heard of, but it's written in bash.
  • 16:55 - 17:01
    OK, so… wait. Er, so here you can see all
    these statements and "programming
  • 17:01 - 17:08
    language: GNU Bash" is the one we looked
    for. And unfortunately… so this is not a
  • 17:08 - 17:12
    very useful list. So one thing we can do
    in the Wikidata Query Service, which is
  • 17:12 - 17:17
    pretty specific to Wikidata, is to add the
    so-called label service, which is
  • 17:17 - 17:21
    basically magic that you don't need to
    understand. But you write something like
  • 17:21 - 17:26
    "serv" or "service" and then with
    control+space again for autocompletion.
  • 17:26 - 17:31
    And it suggests you this thing. And you
    just keep that in your query at all times,
  • 17:31 - 17:35
    basically. And then you say, I would like
    to have not just a software, but also the
  • 17:35 - 17:41
    software label. And then we get down here,
    the label of the software. And I can also
  • 17:41 - 17:46
    add the software description. And then we
    also see what, what is described. At least
  • 17:46 - 17:53
    if it has a description and then the query
    results are already a lot more usable. And
  • 17:53 - 17:59
    I'm just going to rename this to "item"
    and then we can edit this query however we
  • 17:59 - 18:04
    want and the variable name will always
    kind of match. Because the next query
  • 18:04 - 18:08
    won't be about software anymore. So it'll
    be confusing if you just still call it
  • 18:08 - 18:13
    "software". But, yeah, there is some
    software here like Apache Yetus, Ruby
  • 18:13 - 18:19
    Version Manager, Wikidata missing
    pictures, Pi-hole, all written in Bash.
  • 18:19 - 18:28
    OK, I have several more example queries
    here, which are kind of simple, should I
  • 18:28 - 18:34
    skip ahead or is it good if I do a few
    more simple examples? Skip ahead? Is that
  • 18:34 - 18:41
    OK? OK, then let's. So who was born at sea
    is not all that interesting. Just "place of
  • 18:41 - 18:45
    birth: at sea". We have a special value for
    that and it's not a very interesting list.
  • 18:45 - 18:49
    I think a few results, just five or so,
    because most people are going to have
  • 18:49 - 18:52
    "place of birth: Atlantic Ocean" or
    something. Which places are located on the
  • 18:52 - 18:57
    White Elster, just something for the
    Leipzig people. And where does the
  • 18:57 - 19:01
    Neverending Story take place? This
    is actually kind of cute. Let's do that.
  • 19:01 - 19:06
    Also, this is a bit interesting because in
    this case, the variable is in the last
  • 19:06 - 19:13
    place and not the first one. So that… and
    then we have the Neverending Story in the
  • 19:13 - 19:20
    beginning and narrative location. And then
    the item is at the end instead of at the
  • 19:20 - 19:25
    beginning of a triple. And it works just
    as well, except that a lot of these don't
  • 19:25 - 19:32
    have a label in English. So let's add
    German as a fallback language. And then we
  • 19:32 - 19:38
    get all of these places which someone
    added to Wikidata at some point. Let's see
  • 19:38 - 19:42
    if there's any useful information about
    them. So they all have IDs in the same
  • 19:42 - 19:48
    range. So it looks like they were all
    created at the same time because the IDs
  • 19:48 - 19:52
    are just increasing all the time. So the
    Gelichterland is a place from the
  • 19:52 - 19:55
    Neverending Story; it's a fictional
    country. It has a capital, which
  • 19:55 - 20:01
    is this fictional place. It's located on
    the… this terrain feature, it's present in
  • 20:01 - 20:06
    the Neverending Story. And it depicts
    horror fiction. I'm not sure about that,
  • 20:06 - 20:12
    but let's leave it alone for now. OK,
    yeah. And skip to a slightly more
  • 20:12 - 20:20
    interesting query, which is this one,
    which popes had children. So what is the
  • 20:20 - 20:25
    graph going to look like for this? How
    many, how many triples are we going to
  • 20:25 - 20:29
    have? A triple is a node, an arrow, and
    another node. How many triples would you
  • 20:29 - 20:36
    need for "Pope has a child"? Let's do a
    raising hands. Who thinks you need zero
  • 20:36 - 20:43
    triples, OK? Who thinks you need one
    triple? Who thinks you need two triples?
  • 20:43 - 20:48
    That's more people. Does anyone think you
    need three triples? No. OK, so mostly two,
  • 20:48 - 20:54
    but some people think one. So the one… the
    people who think it might need one triple,
  • 20:54 - 21:03
    perhaps are thinking of something like the
    Pope, which is the leader of the worldwide
  • 21:03 - 21:11
    Catholic Church, has a child, this child
    or it's called item, but that's not going
  • 21:11 - 21:15
    to have any results. Or it could be the
    other way around. And you could say that…
  • 21:15 - 21:26
    oh let's just comment this out. The item
    has "father: the pope". And that doesn't
  • 21:26 - 21:31
    work. Because the items are not… the
    children are not directly connected to the
  • 21:31 - 21:35
    item for the office of the pope, instead
    it's going to be two levels. It's going to
  • 21:35 - 21:40
    say the child has a father, some person,
    and then the person has the office pope or
  • 21:40 - 21:45
    has the position pope or is a pope or
    something. So you need this level of
  • 21:45 - 21:49
    indirection. So in the graph that looks
    either like this or it could be the other
  • 21:49 - 21:55
    way around. So either the child has a
    father pope, which has "position held:
  • 21:55 - 22:01
    pope" or the pope has a child and also a
    "position held", so that's kind of an
  • 22:01 - 22:04
    example of the redundancy I mentioned
    earlier, we have the two directions
  • 22:04 - 22:11
    "child" and also "father"/"mother", and-
    so you can ask your query in two ways, and
  • 22:11 - 22:14
    it doesn't really make that much of a
    difference, assuming that the data is
  • 22:14 - 22:20
    complete. And I think someone occasionally
    runs queries to check if any of these
  • 22:20 - 22:25
    circles are missing. So let's try one of
    them, let's just stay with this one, so
  • 22:25 - 22:32
    the item does not have "pope" as father,
    it has some pope, and then this pope has
  • 22:32 - 22:43
    "position held: pope". And then let's add
    the "pope" label and… yeah, pope label is
  • 22:43 - 22:50
    enough, and then we get 24 results! So we
    have a Duke of Parma which, who was the
  • 22:50 - 22:55
    son of Paul III. Paul III had three
    children. Let's sort by this. Wow,
  • 22:55 - 23:04
    Alexander VI was very busy. And some of
    them just have, oh oh oh, we have
  • 23:04 - 23:09
    duplicates, Giovanni Borgia and Giovanni
    Borgia. Should I demonstrate Wikidata
  • 23:09 - 23:14
    editing now or do we just ignore this? So,
    yeah, someone imported a lot of
  • 23:14 - 23:19
    information from this peerage database and
    apparently we have some duplicate items
  • 23:19 - 23:24
    here, let's just leave those alone for
    now. In fact, I think this and this also
  • 23:24 - 23:30
    looks suspiciously similar. Giovanni
    Borgia, unless he had two children of that
  • 23:30 - 23:38
    name. I mean, he could have. So this… we
    have a date of birth 1470s… 1498. No, that
  • 23:38 - 23:45
    might actually be different children. OK,
    not a very creative father when it came to names.
  • 23:45 - 23:53
    Yeah. And wait, that's a pope who's a
    child of another pope. Very interesting!
  • 23:53 - 23:56
    And another one. And another one. We have
    three popes who are children of other
  • 23:56 - 24:02
    popes. Let's search for those! So we would
    also need for that, that the item has
  • 24:02 - 24:11
    "position held: Pope", and I could copy
    paste this, but just do this. So the item
  • 24:11 - 24:14
    should be… child should have a "father:
    pope" and the item should have "position
  • 24:14 - 24:18
    held: Pope", and the pope should also have
    "position held: pope". And in this case,
  • 24:18 - 24:23
    it would probably be less confusing to
    call these "child" and "father", because
  • 24:23 - 24:26
    this is also a pope now, but… variable
    names. One of the three hardest problems
  • 24:26 - 24:30
    in computer science, right? Yeah, we have
    three children who are… three popes who
  • 24:30 - 24:37
    are children of other popes. Wow. I'm
    actually going to save this query, popes
  • 24:37 - 24:42
    who were children of other popes. But
    actually, we can future-proof this a
  • 24:42 - 24:48
    little bit, because right now we've only
    said that the father should be a pope. But
  • 24:48 - 24:51
    in case there's ever a female pope, let's
    just switch this around and say that the
  • 24:51 - 24:59
    pope should have the child… item and then
    it's going to work, even if the pope
  • 24:59 - 25:03
    happens to be female and is a mother
    instead of a father. There we go, same
  • 25:03 - 25:13
    three results. OK, and let's keep that,
    and open a new tab for next queries. Yeah.
  • 25:13 - 25:18
    Which Microsoft software runs on Linux?
    OK. That's not that funny. So perhaps we
  • 25:18 - 25:23
    can just skip it… I don't know. That joke
    kind of ran out of steam a while ago.
  • 25:23 - 25:27
    Basically looks like this and it's like
    Visual Studio Code and three other
  • 25:27 - 25:31
    programs, meh. What are some compositions
    for organ and orchestra? This isn't funny
  • 25:31 - 25:36
    at all, but I just find it very nice
    because it's just an awesome sound. And so
  • 25:36 - 25:41
    that would be… the composition has the
    instrumentation "organ" and also
  • 25:41 - 25:53
    "orchestra", which we can write as… item,
    item label… composition… instrumentation,
  • 25:53 - 26:12
    this one, orchestra. And also,
    "composition… organ". And then, oops,
  • 26:12 - 26:18
    yeah, this should be "item"… and also I
    forgot to add the label service. There we
  • 26:18 - 26:28
    go. And we have 12 results, which is nice
    if you want to listen to any of those. We
  • 26:28 - 26:39
    could also check if any of them have an
    audio file on Commons. Let's see. One, OK,
  • 26:39 - 26:46
    and I think we've heard this one already.
    So, but… one thing that's kind of annoying
  • 26:46 - 26:50
    here, I should have mentioned this in the
    last query, I think. So I had to repeat
  • 26:50 - 26:53
    the item and the property ID, which is a
    bit annoying and makes the query difficult
  • 26:53 - 26:58
    to read. And what you can do is leave that
    out and you can also do this in the
  • 26:58 - 27:05
    previous case. So let's actually go one
    slide back. So here I didn't write twice
  • 27:05 - 27:07
    that it's the software which should have
    the developer, and also the operating
  • 27:07 - 27:11
    system. I just wrote the software has
    "developer: Microsoft" and also with a
  • 27:11 - 27:17
    semicolon at the end instead of a period,
    it has "operating system: Linux". So if
  • 27:17 - 27:19
    you read this as English it's just one
    sentence where you don't repeat the
  • 27:19 - 27:22
    subject twice. The software has
    "developer: Microsoft" and "operating
  • 27:22 - 27:26
    system: Linux", instead of "software has
    developer: Microsoft" and "software has
  • 27:26 - 27:31
    operating system: Linux". And if you… if
    the property here is also the same thing,
  • 27:31 - 27:36
    then you can even leave that out and add a
    comma at the end and just list the two
  • 27:36 - 27:41
    values and you don't even have to repeat
    the instrumentation. So let's do that here
  • 27:41 - 27:47
    and abbreviate this query. And it has the
    exact same 12 results, just slightly more
  • 27:47 - 27:55
    convenient to read and… to write at least,
    hopefully also to read. I don't know. But
  • 27:55 - 27:57
    you don't use the comma that much. The
    semicolon is pretty useful, like we could
  • 27:57 - 28:07
    have written this as, the pope has, er,
    the child and also position held like
  • 28:07 - 28:11
    this. It means exactly the same, but you
    can immediately see that both of these
  • 28:11 - 28:18
    refer to the pope because there's just a
    bunch of blank space here. Yeah, so then
  • 28:18 - 28:28
    we have this one. This isn't funny at all,
    but there are a lot of people who used to
  • 28:28 - 28:33
    be in the Nazi Party during World War 2
    and then who later just went back into a
  • 28:33 - 28:37
    civil life and even received the
    Bundesverdienstkreuz, the order of merit
  • 28:37 - 28:42
    of the Federal Republic of Germany. And
    you can find those… in this case I've done
  • 28:42 - 28:47
    it with three triples, which is, the
    person was a member of this political
  • 28:47 - 28:52
    party and received this award. And also
    I've added that they're "instance of:
  • 28:52 - 28:55
    human", because we also have a lot of
    fictional data on Wikidata. You already
  • 28:55 - 28:58
    saw that with the Neverending Story stuff
    earlier. So there might also be a
  • 28:58 - 29:02
    fictional character who was a member of
    this political party and who received the
  • 29:02 - 29:07
    award, and we're not really interested in
    those. So we add "instance of: human", and
  • 29:07 - 29:11
    then we are certain that we only get real
    results and not fictional results. And it
  • 29:11 - 29:14
    doesn't really cost us anything because
    the Query Service can optimize that pretty
  • 29:14 - 29:22
    well. So let's write that… actually, let's
    do that here. So the item should be
  • 29:22 - 29:32
    "instance of: human", which is Q5, because
    it's a very common item, and "member of
  • 29:32 - 29:40
    political party". And you can see I can
    search by the German abbreviation and find
  • 29:40 - 29:44
    this, even though it's not a label,
    because there are search aliases. And also
  • 29:44 - 29:49
    "award received", the
    Bundesverdienstkreuz, because I can't be
  • 29:49 - 29:54
    bothered to type in the whole English
    name. There we go. And we find, I think…
  • 29:54 - 30:04
    how many results? Eleven results. Yeah.
    And this actually isn't quite correct,
  • 30:04 - 30:10
    because in theory, you don't get this
    order, this order has like 11 parts or
  • 30:10 - 30:15
    something. You can get the Grand Cross
    with Distinction or you can get the Star
  • 30:15 - 30:19
    or whatever. I think it's listed somewhere
    here. Yeah, you can get the Grand Cross
  • 30:19 - 30:23
    Special Class, you can get the Grand Cross
    Special Issue, you can get the Grand Cross
  • 30:23 - 30:27
    First Class, blah blah blah. And so, in
    theory, any of these people should have
  • 30:27 - 30:34
    one of these awards and not just "order of
    merit". But I think when I checked, all of
  • 30:34 - 30:42
    them just had… all the results, just had
    directly "order of merit". But actually,
  • 30:42 - 30:48
    no we can try to search for the correct
    ones instead. So it would not be part of
  • 30:48 - 30:54
    this directly, it would be… "award
    received" would be some award, such as
  • 30:54 - 31:03
    this one, and then this award is part of
    the order of merit, so "award"… "part of"…
  • 31:03 - 31:15
    Let's see if that finds any results. Oh.
    Oh. Oh, dear. Yeah, that, that… that's a
  • 31:15 - 31:21
    lot of results. "Herbert von Karajan".
    That's that's depressing. OK, yeah. OK, so
  • 31:21 - 31:24
    I think I… when I tried this out and
    didn't find any results, I just did
  • 31:24 - 31:30
    something wrong, because this way we find
    a lot more results. And if we… so we don't
  • 31:30 - 31:36
    actually select the award here, because we
    don't care what kind of award they got. So
  • 31:36 - 31:42
    we could also use this abbreviation again,
    like this. So we just say they got some
  • 31:42 - 31:47
    award, which is part of the order of
    merit. And in this case, we could even
  • 31:47 - 31:54
    abbreviate that further and say, we put a
    slash here. And then, that kind of
  • 31:54 - 31:58
    describes a path that you have to take
    from this item to this item and you have
  • 31:58 - 32:04
    to first get to some award received. And
    then that has to be part of something
  • 32:04 - 32:08
    else. And you can add as many elements
    here as you want. And then we get the
  • 32:08 - 32:18
    exact same 802 results… and… lots of well-
    known names here. And if we want to find
  • 32:18 - 32:22
    the original 11 ones that directly had the
    order of merit as the award received, we
  • 32:22 - 32:26
    can add a question mark here, which is
    just like in a regular expression, it says
  • 32:26 - 32:32
    this part is optional. They can have
    directly received this award or they can
  • 32:32 - 32:36
    have received some award, which is part of
    the order of merit. And then we should get
  • 32:36 - 32:48
    813. Yeah, 813 results, so 802, plus the
    11 from earlier. And… I'm starting this
  • 32:48 - 32:53
    with "instance of: human", which… and the
    Query Service is going to re-order this
  • 32:53 - 32:57
    because searching for all the humans and
    then filtering for the ones who are in
  • 32:57 - 33:01
    this political party and so on wouldn't be
    efficient. So I don't have to worry about
  • 33:01 - 33:06
    that. I could write it in this order, or I
    could shuffle it around. Doesn't make any
  • 33:06 - 33:10
    difference. The Query Service already
    knows in which order to do these things.
  • 33:10 - 33:14
    So you don't have to worry about that. You
    can just start with "is a human" and then
  • 33:14 - 33:23
    add everything else. I think I have one
    more complicated query here. Yeah, so
  • 33:23 - 33:28
    that's one of the examples I mentioned
    earlier, the largest cities by population
  • 33:28 - 33:33
    with a female mayor. So the graph for that
    is, I think the largest one I prepared for
  • 33:33 - 33:38
    the slides, except the one in the
    beginning. And it looks like this. We
  • 33:38 - 33:41
    should have a city which is a city,
    "instance of: city", and it has a certain
  • 33:41 - 33:46
    population, and it has… so for the mayor,
    we use the same property as for head of
  • 33:46 - 33:52
    government. And if you don't know that,
    you could look at some city like Berlin
  • 33:52 - 33:59
    and maybe you know what the mayor of
    Berlin is called… what was it? Something
  • 33:59 - 34:05
    "Müller", I think. Yeah. And then you can
    see, aha, the property for the mayor is
  • 34:05 - 34:14
    "head of government". Or you could also
    search for, the city should have a mayor,
  • 34:14 - 34:19
    and then you'll still find "head of
    government", the right property. And that
  • 34:19 - 34:25
    mayor should be a human and she should
    have the gender "female". Oops. There's a
  • 34:25 - 34:28
    question mark there for no reason at all.
    That's not a variable. That should be the
  • 34:28 - 34:37
    fixed value. Sorry. So let's put that
    there. We have a city which is "instance
  • 34:37 - 34:50
    of: city", and it also has a population
    which we're going to use later and it also
  • 34:50 - 34:55
    has a head of government. No, that's
    wrong. Not the "office held by head of
  • 34:55 - 34:59
    government", the "head of government"
    itself, which we call the mayor and then
  • 34:59 - 35:18
    the mayor is "instance of: human" and
    gender should be female… come on… female.
  • 35:18 - 35:28
    And let's select the city, cityLabel,
    mayorLabel and also the population. And
  • 35:28 - 35:31
    then we find some 83 results. That's not
    yet the largest cities with a female
  • 35:31 - 35:37
    mayor. That's just all of them. And in
    Wikidata we know about 83, apparently. And
  • 35:37 - 35:42
    if your local hometown has a female mayor,
    just go ahead and add it to Wikidata and
  • 35:42 - 35:47
    it's probably relevant. It's not– So the
    relevance criteria are not as strict as on
  • 35:47 - 35:53
    Wikipedia fortunately. But if we want just
    the most populous ones, we can go a bit
  • 35:53 - 36:00
    back into SQL land and say we want to
    ORDER BY the population and in SQL you
  • 36:00 - 36:03
    would write DESC afterwards and in SPARQL
    it's different. You write
  • 36:03 - 36:10
    DESC(?population). Erm, I think it's nicer
    that way. But perhaps it would have been
  • 36:10 - 36:14
    nicer to just stick with the SQL syntax. I
    don't know. And we want to limit this to
  • 36:14 - 36:19
    just the ten most populous cities, for
    example. And here we go. Tokyo is
  • 36:19 - 36:26
    currently the biggest one, then Hong Kong,
    Baghdad, Surabaya, Rome. Yeah. And, oh.
  • 36:26 - 36:37
    This doesn't make that much sense, Caracas
    has two mayors. Anyone… yeah, exactly. So
  • 36:37 - 36:44
    we're only supposed to get the current
    mayor. Head of government… yeah. Does
  • 36:44 - 36:52
    anyone know which one is the current one?
    Or we could just check Wikipedia… Caracas,
  • 36:52 - 36:56
    which hopefully doesn't get its
    information from Wikidata yet. So it's not
  • 36:56 - 37:08
    circular. And the mayor is… Carolina,
    Carolina Cestari… Cestari, I don't know.
  • 37:12 - 37:15
    laughter
  • 37:15 - 37:25
    OK, so let's add a new one. Ah…? Doesn't
    have an item yet, is that… is that the
  • 37:25 - 37:31
    mayor, or is chief of government something
    else? Doesn't occur anywhere else on the
  • 37:31 - 37:45
    page, of course. Local government… mayor…
    no. OK, so let's just… I don't know,
  • 37:45 - 37:55
    doesn't she have a Wikipedia article? No.
    Just appears in some lists and then she
  • 37:55 - 38:01
    doesn't have a Wikidata item yet? No.
    Then… I don't know. We'll do some live
  • 38:01 - 38:05
    Wikidata editing. It wasn't part of this
    talk, but let's just do it. Carolina
  • 38:05 - 38:17
    Cestari… what country is that? Venezuela.
    Venezuelan politician, and that sounds
  • 38:17 - 38:23
    like a female name, so I'm just going to
    guess and check that after the talk. So
  • 38:23 - 38:29
    she's definitely a human. And gender is
    female and that is going to be enough for
  • 38:29 - 38:38
    our query. Do this search again. There we
    go. And set this to preferred rank. So
  • 38:38 - 38:41
    that's how the Query Service knows that
    this is the current value and it should
  • 38:41 - 38:44
    only return this one. And ideally, one of
    the head of government values should have
  • 38:44 - 38:50
    this preferred rank to mark it as the
    correct current value. And then all the
  • 38:50 - 38:54
    other ones are additional data that you
    can use if you want. But it's not the main
  • 38:54 - 39:01
    value and we are not going to get it in a
    simple query. And then there's some error
  • 39:01 - 39:06
    because Caracas isn't some kind of
    political territorial entity and it should
  • 39:06 - 39:13
    have a start time. I don't care right now.
    OK, so we run this query again and
  • 39:13 - 39:21
    hopefully get just one result for Caracas
    this time. No. Uhm, we have to wait a bit
  • 39:21 - 39:26
    until the Query Service is updated.
    Because it's kind of asynchronous. It just
  • 39:26 - 39:34
    keeps watching for changes and eventually
    it will get the new data, but… okay. It
  • 39:34 - 39:42
    might take a bit longer. Anyways. That's
    how that query works. Does that make kind
  • 39:42 - 39:52
    of sense? OK, great. Yeah, I think this is
    almost exactly what I wrote here. Yeah.
  • 39:52 - 39:56
    Except with some labels and the label
    service. Yeah. There is one problem here,
  • 39:56 - 40:02
    which is, for example, I happen to know
    that Mexico City is a very large city with
  • 40:02 - 40:11
    a population of… population: almost 9
    million. So it should be right after Tokyo
  • 40:11 - 40:19
    in front of Hong Kong. And the head of
    government is a Claudia Sheinbaum or
  • 40:19 - 40:24
    something, which sounds like a woman. So
    we should get this result in the query.
  • 40:24 - 40:29
    The reason we don't is that Mexico City is
    an instance of "big city" and we have
  • 40:29 - 40:35
    searched for "instance of: city". And
    there's some debate about does this class
  • 40:35 - 40:40
    even make sense at all? I think this is
    actually the German classification of, a
  • 40:40 - 40:44
    big city is one with 100,000 inhabitants,
    and in other languages or countries, a big
  • 40:44 - 40:49
    city might be something else, but for now
    that… the data is what it is. Fortunately,
  • 40:49 - 40:54
    what we have here is the information, a
    "big city" is a subclass of a city/town,
  • 40:54 - 41:05
    which is a subclass of "locality", which
    is a subclass of… wait. We should arrive
  • 41:05 - 41:08
    at city at some point, but I think we've
    already gone past that. It's also an
  • 41:08 - 41:12
    instance of capital. Let's go down that
    instead. A capital is a subclass of city,
  • 41:12 - 41:17
    there we go. So if we can tell the Query
    Service to follow these subclass
  • 41:17 - 41:23
    connections, then we should find these
    cities. And one way to do that… to make it
  • 41:23 - 41:30
    work for Mexico City would be to say, it
    has to be "instance of", some, with the
  • 41:30 - 41:37
    path again, "subclass of: city" and then
    we would find Mexico City, but we would
  • 41:37 - 41:43
    not find all the… oh, we would still find
    Tokyo because it's still a capital, I
  • 41:43 - 41:47
    guess. But we've missed a lot of other
    cities, I think which we used to have…
  • 41:47 - 41:54
    yeah. Rome, for example, is gone. Because
    it's… that's just an instance of city
  • 41:54 - 41:57
    directly. And we've now made the subclass
    mandatory. What we should do is make it
  • 41:57 - 42:02
    optional, or even better, we would– we
    should say there can be any number of this
  • 42:02 - 42:07
    element. So there… it can be an instance
    of city or it can be an instance of a
  • 42:07 - 42:11
    subclass of city, it can be an instance of
    a subclass of a subclass of city. You can
  • 42:11 - 42:14
    follow any number of elements; that's what
    this star means, just
  • 42:14 - 42:19
    like in a regular expression. And then we
    probably have to say we only want the
  • 42:19 - 42:24
    distinct ones because there are like five
    different ways to go through the subclass
  • 42:24 - 42:30
    tree until you've found "city". And we're
    not interested in the different ways. But
  • 42:30 - 42:35
    now we should get Tokyo and Mexico City.
    And Rome is also here and Caracas is
  • 42:35 - 42:39
    completely gone because we found enough
    other cities which we were missing
  • 42:39 - 42:46
    earlier. So you kind of have to watch out
    and sometimes use elements like this…
  • 42:46 - 42:52
    "subclass of"-tree is pretty common, or
    with a, something… order of merit, we had
  • 42:52 - 42:57
    to use this "part of". You have to watch
    out if the results are plausible, or
  • 42:57 - 43:01
    ideally, you know some item that should be
    in the results, and then you check, is it
  • 43:01 - 43:06
    there? Why is it not there? And
    investigate like that. But that's a fixed
  • 43:06 - 43:11
    version of the query. And… yeah, if we
    were not interested in the mayor, we could
  • 43:11 - 43:15
    do the same trick again. But, yeah. It
    doesn't make that much of a difference.
  • 43:15 - 43:19
    And I think… yeah, that was almost the
    only difference. Yeah, except that I
  • 43:19 - 43:23
    removed the population from the selected variables: you can order by
    a variable that you don't select in the
  • 43:23 - 43:34
    end if you want. And I think I am out of
    slides. So, yeah, if you want to see more
  • 43:34 - 43:38
    queries, you can look at these Twitter or
    social media accounts. There's a huge list
  • 43:38 - 43:43
    of example queries on Wikidata, which is
    so big that it's getting too big for a
  • 43:43 - 43:46
    wiki page, and people had to move some
    queries out there and it's kind of just
  • 43:46 - 43:51
    grown since 2015 or something. And there's
    a lot of garbage there, but also a lot of
  • 43:51 - 43:56
    useful queries if you want to look at
    that. And I had two more queries in the
  • 43:56 - 44:01
    talk description which we haven't talked
    about yet, and I think we have the time. I
  • 44:01 - 44:04
    can just try to open these. "Which films
    starred more than one future head of
  • 44:04 - 44:15
    government?" Does that work? It doesn't.
    Can I copy the URL here? Yeah, copy link
  • 44:15 - 44:21
    address. So that's a kind of longer query,
    which is why it didn't really fit on one
  • 44:21 - 44:26
    slide. But the important film is you have…
    er, the important part is you have some
  • 44:26 - 44:32
    film… instance of, or subclass of film, it
    has a publication date and a cast member,
  • 44:32 - 44:41
    which is the head of government. And the
    head of government held some position,
  • 44:41 - 44:47
    some head of government, er, some subclass
    of head of government. And that should be
  • 44:47 - 44:53
    after the film was published. And then you
    get a bunch of results. I think this takes
  • 44:53 - 45:00
    like 11 seconds or something. And you get
    like films with Schwarzenegger and one
  • 45:00 - 45:06
    other actor who became US governor. I
    don't remember the name. And you also get
  • 45:06 - 45:10
    a lot of… or several films from World War
    II with future French heads of government,
  • 45:10 - 45:16
    which is really cool. So, like a film that
    was shot about the liberation of Paris,
  • 45:16 - 45:20
    where it's… it's kind of a stretch to call
    them cast members, but they're definitely
  • 45:20 - 45:26
    in the film. And if we get the result,
    then I can tell you what the film is
  • 45:26 - 45:35
    called. Yeah, it might be busy right now,
    so you get up to 60 seconds in the Query
  • 45:35 - 45:40
    Service and then in the end your query is
    killed if it takes longer than that. So
  • 45:40 - 45:43
    sometimes it can be a bit of a struggle to
    make the query work within 60 seconds.
  • 45:43 - 45:48
    There we go, 50 seconds. That was close.
    So there's yeah, there's a "La Libération
  • 45:48 - 45:52
    de Paris" with Charles de Gaulle, who was
    president of the Council and president of
  • 45:52 - 45:58
    the provisional government, and also
    Georges Bidault, I think, who was prime
  • 45:58 - 46:03
    minister and president of the Council, and
    other stuff. We have several Indian films
  • 46:03 - 46:10
    with people who went on to become chief
    ministers. And then down here there's some
  • 46:10 - 46:14
    Canadian politicians, apparently. And then
    here's Arnold Schwarzenegger and Jesse
  • 46:14 - 46:21
    Ventura, who both became governors and
    also starred in several films. And the
  • 46:21 - 46:26
    other thing was, we have a lot of data
    about the British government because a lot
  • 46:26 - 46:32
    of volunteers have just been slaving away
    at that data and adding and adding more
  • 46:32 - 46:39
    information. I think they've… they have
    all their parliaments, complete with party
  • 46:39 - 46:43
    affiliations and everything for at least
    the last 100 years and some partial data
  • 46:43 - 46:47
    for a lot more than that, because they
    have a very long parliamentary history.
  • 46:47 - 46:51
    And then you can do queries like "how many
    people named John are there in
  • 46:51 - 46:56
    parliament", and "how many women with any
    name". And you can see when the women were
  • 46:56 - 47:02
    finally more than just the men who are
    named "John". And it's kind of an amusing
  • 47:02 - 47:08
    graph. Or not so amusing. Takes a while as
    well. I hope it doesn't take 50 seconds,
  • 47:08 - 47:14
    but it looks like the Query Service might
    be busy at the moment. But I think it was
  • 47:14 - 47:20
    something like in 1991 or so is the
    crossover point. Oh yeah. And I should
  • 47:20 - 47:24
    mention anyway, so everything we saw right
    now was just a lot of tables. But you can
  • 47:24 - 47:31
    also show results in different ways, such
    as a line chart. There we go. So in 1992,
  • 47:31 - 47:35
    this was the first parliament which had
    more women than Johns. And then the Johns
  • 47:35 - 47:41
    have slightly declined and the women have
    gone up to 220. How many people are in the
  • 47:41 - 47:48
    House of Commons in total? Does anyone
    know? No. So I don't know what percentage
  • 47:48 - 47:52
    this is. Uh, but, this was… yeah, this
    latest election from 12 December already
  • 47:52 - 48:03
    in there. Yeah. [inaudible] What?
    So the query looks like this. So this one
  • 48:03 - 48:06
    is broken into several parts. We first
    find all the members of parliament, so
  • 48:06 - 48:11
    they should be human, again, no fictional
    people, and then they should have some
  • 48:11 - 48:16
    "position held", which is a subclass of
    "member of parliament" in the House of
  • 48:16 - 48:22
    Commons. And then there should also be,
    um, a parliamentary term on that, so that
  • 48:22 - 48:28
    we know which parliament it is and when it
    starts. And then down here, we import all
  • 48:28 - 48:35
    those MPs and filter for just the ones
    with the "given name: John". And then we
  • 48:35 - 48:40
    filter for just the ones with "gender:
    female". And there's an optional "subclass
  • 48:40 - 48:44
    of" in here, because currently the data
    model is that there is a separate item for
  • 48:44 - 48:49
    transgender female and someone can have
    "gender: transfemale– transgender female",
  • 48:49 - 48:53
    which is a subclass of "female". And there
    is a discussion right now to get rid of
  • 48:53 - 48:57
    that and have a separate property for that
    instead. And then all the trans people
  • 48:57 - 48:59
    just have "gender:", their right gender,
    and you don't have to mess with subclass.
  • 48:59 - 49:04
    But right now we still… well, we need it
    in theory, I don't think there are any MPs
  • 49:04 - 49:09
    in practice. But, you know, you know, you
    can just keep it in there. And then we
  • 49:09 - 49:15
    import the results and get them here
    either as a line chart or as a table, if
  • 49:15 - 49:21
    you want to sort it by the time… yeah, the
    data starts in 1919, apparently. So we
  • 49:21 - 49:25
    have exactly a hundred years of history
    there. We can also show it as a bar chart,
  • 49:25 - 49:31
    if that makes more sense. No it doesn't.
    That makes no sense. Line chart is the
  • 49:31 - 49:35
    right one. Oh, right, but if you show the
    line chart again, then it breaks for some
  • 49:35 - 49:39
    reason, there's some bug there. So let's
    just show it again. There we go. That's
  • 49:39 - 49:47
    the right… chart. Yeah, and I guess… oh
    wow, it's already… 50 minutes, so I guess
  • 49:47 - 49:55
    this is the point where we start moving to
    the live querying part, and I was told I
  • 49:55 - 49:59
    should make at least a short break for the
    stream, so the Angels know where to cut
  • 49:59 - 50:03
    between. But we could also take a 10
    minute break and then start the next
  • 50:03 - 50:09
    talk on time. Does that sound OK? Or is 10
    minutes too long? Uhm, if you're going to
  • 50:09 - 50:14
    stay here, which would be very nice, then
    please think of some example queries that
  • 50:14 - 50:17
    you think we could write, and then I can
    try to write them, because otherwise I'm
  • 50:17 - 50:22
    not going to have much to do. But yeah,
    let's do a 10 minute break and see you
  • 50:22 - 50:25
    then. Thank you so far.
  • 50:25 - 50:27
    Applause
  • 50:27 - 50:32
    Postroll Music
  • 50:32 - 50:55
    Subtitles created by c3subtitles.de
    in the year 2021. Join, and help us!
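Around 26:46 to 28:18 the talk explains that the semicolon and the comma only abbreviate triple patterns: `;` repeats the subject, `,` repeats both the subject and the predicate. As a rough illustration (not how any SPARQL engine is actually implemented), this Python sketch expands the shorthand into full triples. The property and value names are placeholders, not real Wikidata IDs.

```python
# Toy model of SPARQL's predicate-object list shorthand.
# Placeholder names only; these are not Wikidata property or item IDs.
#   ?item instrumentation orchestra, organ .        # ',' repeats subject and predicate
#   ?software developer Microsoft ; os Linux .      # ';' repeats only the subject

Triple = tuple[str, str, str]


def expand(subject: str, predicate_objects: list[tuple[str, list[str]]]) -> list[Triple]:
    """Expand 'subject p1 o1, o2 ; p2 o3 .' into full (subject, predicate, object) triples.

    Each (predicate, objects) pair is one ';'-separated part; every object after
    the first in a pair corresponds to one ','.
    """
    return [
        (subject, predicate, obj)
        for predicate, objects in predicate_objects
        for obj in objects
    ]


# Abbreviated: ?item instrumentation orchestra, organ .
print(expand("?item", [("instrumentation", ["orchestra", "organ"])]))
# Abbreviated: ?software developer Microsoft ; operatingSystem Linux .
print(expand("?software", [("developer", ["Microsoft"]), ("operatingSystem", ["Linux"])]))
```

Both spellings describe the same graph pattern, which is why the abbreviated orchestra/organ query returns the same 12 results as the long form.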
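The order-of-merit example (about 31:36 to 32:36) uses two property-path operators: `/` chains two properties through an unnamed intermediate item, and `?` makes a step optional, as in a regular expression. A minimal Python model on a made-up four-triple graph (placeholder names, not the real Wikidata items or counts) shows the shape of the result: the plain sequence finds only people with a sub-award, and adding `?` also finds people who received the order directly, which is how 802 results became 813.

```python
# Toy evaluation of the property paths from the order-of-merit example:
#   ?person awardReceived/partOf  orderOfMerit .   ("/" = sequence of two steps)
#   ?person awardReceived/partOf? orderOfMerit .   ("?" = zero or one partOf step)
# The graph is made up; names are placeholders, not Wikidata IDs.

Triple = tuple[str, str, str]

GRAPH: set[Triple] = {
    ("personA", "awardReceived", "grandCross"),
    ("grandCross", "partOf", "orderOfMerit"),
    ("personB", "awardReceived", "orderOfMerit"),
    ("personC", "awardReceived", "someOtherAward"),
}


def step(graph: set[Triple], nodes: set[str], predicate: str) -> set[str]:
    """Follow exactly one edge labelled `predicate` from every node in `nodes`."""
    return {o for (s, p, o) in graph if s in nodes and p == predicate}


def zero_or_one(graph: set[Triple], nodes: set[str], predicate: str) -> set[str]:
    """'predicate?': either stay where you are or follow one edge."""
    return nodes | step(graph, nodes, predicate)


def matches(graph: set[Triple], target: str, optional_part_of: bool) -> list[str]:
    """Subjects whose awardReceived/partOf (or awardReceived/partOf?) path reaches `target`."""
    result = []
    for subject in sorted({s for (s, _, _) in graph}):
        awards = step(graph, {subject}, "awardReceived")
        if optional_part_of:
            ends = zero_or_one(graph, awards, "partOf")
        else:
            ends = step(graph, awards, "partOf")
        if target in ends:
            result.append(subject)
    return result


print(matches(GRAPH, "orderOfMerit", optional_part_of=False))  # ['personA']
print(matches(GRAPH, "orderOfMerit", optional_part_of=True))   # ['personA', 'personB']
```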
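The `*` operator from the city example (about 41:23 to 42:35) means "zero or more steps". The sketch below, again over a small, simplified and made-up class hierarchy, compares plain `instanceOf`, a mandatory single `subclassOf` step, and `subclassOf*`. It also shows one way duplicate rows can arise: under SPARQL 1.1's sequence-path semantics (as the editor understands them), `instanceOf/subclassOf*` behaves like a join over a hidden intermediate class, so a city with two matching classes appears twice until `SELECT DISTINCT` removes the repeat.

```python
# Toy evaluation of  ?city instanceOf/subclassOf* city .
# The hierarchy is a simplified, made-up stand-in for Wikidata's real one.

Triple = tuple[str, str, str]

GRAPH: set[Triple] = {
    ("Rome", "instanceOf", "city"),
    ("Tokyo", "instanceOf", "capital"),
    ("MexicoCity", "instanceOf", "capital"),
    ("MexicoCity", "instanceOf", "bigCity"),
    ("capital", "subclassOf", "city"),
    ("bigCity", "subclassOf", "city"),
}


def objects(graph: set[Triple], subject: str, predicate: str) -> set[str]:
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def zero_or_more(graph: set[Triple], start: str, predicate: str) -> set[str]:
    """'predicate*': every node reachable from `start` in zero or more steps (cycle-safe)."""
    reached, frontier = {start}, {start}
    while frontier:
        frontier = {o for n in frontier for o in objects(graph, n, predicate)} - reached
        reached |= frontier
    return reached


def classes_reached(graph: set[Triple], cls: str, mode: str) -> set[str]:
    if mode == "one":   # instanceOf/subclassOf   (exactly one subclass step)
        return objects(graph, cls, "subclassOf")
    if mode == "star":  # instanceOf/subclassOf*  (zero or more subclass steps)
        return zero_or_more(graph, cls, "subclassOf")
    return {cls}        # instanceOf              (no subclass step)


def cities(graph: set[Triple], mode: str, distinct: bool = True) -> list[str]:
    rows = []
    for subject in sorted({s for (s, _, _) in graph}):
        # Each instanceOf value is a separate binding of the hidden join variable.
        for cls in sorted(objects(graph, subject, "instanceOf")):
            if "city" in classes_reached(graph, cls, mode):
                rows.append(subject)
    return sorted(set(rows)) if distinct else rows


print(cities(GRAPH, "direct"))                # ['Rome']
print(cities(GRAPH, "one"))                   # ['MexicoCity', 'Tokyo']
print(cities(GRAPH, "star", distinct=False))  # ['MexicoCity', 'MexicoCity', 'Rome', 'Tokyo']
print(cities(GRAPH, "star"))                  # ['MexicoCity', 'Rome', 'Tokyo']
```

The single-step version loses Rome, mirroring what happened in the live demo, while the starred version keeps every city in this toy graph.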
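For reference, the finished "largest cities with a female mayor" query (about 33:23 to 43:23) might be written like this, wrapped in a small Python helper that only builds the HTTP request and does not send it. Q5 is the ID used in the talk; the other IDs (P31, P279, Q515, P1082, P6, P21, Q6581072), the endpoint URL and the JSON `Accept` type come from the editor's recollection of the public Wikidata Query Service and should be checked there. The query relies on the prefixes (`wd:`, `wdt:`, `wikibase:`, `bd:`) that the Query Service predefines, and the `User-Agent` string is a hypothetical placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed public endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

LARGEST_CITIES_WITH_FEMALE_MAYOR = """
SELECT DISTINCT ?city ?cityLabel ?mayorLabel WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515;   # instance of (a subclass of) city
        wdt:P1082 ?population;       # population
        wdt:P6 ?mayor.               # head of government
  ?mayor wdt:P31 wd:Q5;              # instance of: human
         wdt:P21 wd:Q6581072.        # sex or gender: female
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)           # SPARQL writes DESC(?var), unlike SQL's "col DESC"
LIMIT 10
"""


def build_request(query: str) -> Request:
    """Build (but do not send) a GET request asking the endpoint for JSON results."""
    url = WDQS_ENDPOINT + "?" + urlencode({"query": query})
    return Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical placeholder; identify your own client here.
            "User-Agent": "example-sparql-sketch/0.1 (replace with contact info)",
        },
    )


request = build_request(LARGEST_CITIES_WITH_FEMALE_MAYOR)
print(request.get_method(), request.full_url[:80])
# To actually run it (needs network access; queries are killed after 60 seconds):
#   import json, urllib.request
#   with urllib.request.urlopen(request, timeout=70) as response:
#       rows = json.load(response)["results"]["bindings"]
```

As noted in the talk, `ORDER BY` can use `?population` even though it is not in the `SELECT` list.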
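The Caracas fix (about 38:38 to 39:01) relies on statement ranks. A minimal sketch, assuming the usual Wikidata rule for `wdt:` ("truthy") triples: preferred-rank values win if there are any, otherwise the normal-rank values are used, and deprecated values are never returned. The other statements stay available through the full statement model (the `p:`/`ps:` prefixes), which this sketch does not cover. The mayor names are placeholders.

```python
from dataclasses import dataclass

# Toy model of which values a wdt: triple pattern sees for one item and property.
# Mayor names are placeholders, not real data.


@dataclass(frozen=True)
class Statement:
    value: str
    rank: str  # "preferred", "normal" or "deprecated"


def truthy_values(statements: list[Statement]) -> list[str]:
    """Best-rank values: preferred ones if any exist, else normal ones; never deprecated."""
    preferred = [s.value for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s.value for s in statements if s.rank == "normal"]


before = [Statement("mayor A", "normal"), Statement("mayor B", "normal")]
after = [Statement("mayor A", "normal"), Statement("mayor B", "preferred")]

print(truthy_values(before))  # ['mayor A', 'mayor B']  (two rows for the city)
print(truthy_values(after))   # ['mayor B']             (one row)
```

In the live demo the change did not show up immediately, because the Query Service picks up edits asynchronously.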
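The House of Commons chart (about 48:03 to 49:09) counts, per parliamentary term, the MPs named John and the MPs whose gender is "female" or, through an optional "subclass of" step, a subclass of it. This Python sketch reproduces only that counting logic on a handful of invented rows; it is not the actual query and says nothing about the real numbers.

```python
from collections import Counter

# Toy version of the House of Commons comparison. Every row is invented.
# The subclass table mirrors the data model described in the talk, where
# "transgender female" is modelled as a subclass of "female".
SUBCLASS_OF = {"transgender female": "female"}

TERMS = ["term 1", "term 2"]
MEMBERS = [
    # (member, parliamentary term, given name, gender)
    ("m1", "term 1", "John", "male"),
    ("m2", "term 1", "John", "male"),
    ("m3", "term 1", "Mary", "female"),
    ("m4", "term 2", "John", "male"),
    ("m5", "term 2", "Anne", "female"),
    ("m6", "term 2", "Alex", "transgender female"),
]


def is_female(gender: str, follow_subclass: bool = True) -> bool:
    """Like 'gender/subclassOf? female': the value itself, or one subclass step up."""
    return gender == "female" or (follow_subclass and SUBCLASS_OF.get(gender) == "female")


def counts_per_term(members, terms, follow_subclass=True):
    """Map each term to (number of Johns, number of women)."""
    johns = Counter(term for _, term, name, _ in members if name == "John")
    women = Counter(
        term for _, term, _, gender in members if is_female(gender, follow_subclass)
    )
    return {term: (johns[term], women[term]) for term in terms}


def first_crossover(members, terms, follow_subclass=True):
    """First term (in the given order) with more women than Johns, or None."""
    for term, (johns, women) in counts_per_term(members, terms, follow_subclass).items():
        if women > johns:
            return term
    return None


print(counts_per_term(MEMBERS, TERMS))  # {'term 1': (2, 1), 'term 2': (1, 2)}
print(first_crossover(MEMBERS, TERMS))  # term 2
```

With `follow_subclass=False`, the second term has one John and one woman, so `first_crossover` returns `None`; in this toy data the optional subclass step is what changes the answer.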
Title:
36C3 Wikipaka WG: Querying Linked Data with SPARQL and the Wikidata Query Service
Video Language:
English
Duration:
50:55

English subtitles
