[36C3 preroll music]

Okay, so now to our speaker. He's Lucas. He's a SPARQL magician, I'm told, and he will introduce you to his favorite querying language, SPARQL, and give you a little introduction, and in the second part he will do some live coding, which is always really interesting and funny, and you can give him some things that he's querying for you. I'm sure we'll have lots of fun and interesting learning stuff here, so give a warm round of applause to Lucas. [Applause]

[inaudible] Is this better? Aha! It's a bit too loud, so I'll just talk a bit until they have figured it out. Yeah, so this is going to be kind of two parts, but not really that separate. In the second part I'm basically going to write the queries that you suggest, so if you see what I'm going to do here and then think, "oh, I have a great idea for something we could perhaps query", then just remember that and we'll get back to that, hopefully, because otherwise the second half is going to be really short if I don't get any ideas from you. But yeah, so this is about querying linked data, which allows you to do all kinds of crazy things and answer all kinds of crazy questions, such as, I think I had on the slides, something like "what are the largest cities with a female mayor?" If you wanted to find that out traditionally, you could go through Wikipedia and try to find all the largest cities and see which ones have a female mayor and which ones don't, or perhaps there's a category with all the cities with a female mayor, but then you have to sort them by population and it's a whole mess. With linked data you can find that out much more easily, and also all kinds of other things. But let's start with some simple fantasy linked data. So this is a tiny snippet of linked data, some data graph.
It's just composed of a load of nodes which are these ovals and rectangles here and they're connected with arrows and each of these forms kind of a triple consisting of the start node and then the arrow and then the end node and that's how we represent all the information you have in there, in this linked database. So for example we can read this as this talk right now happens in the Esszimmer or the dining room which is the name of this stage here and it's going to be followed by the live querying session which also happens in Esszimmer and the live querying session in turn follows this talk again and the Esszimmer, the dining room, is next to the kitchen, the Küche, and the kitchen is next to the dining room again and both of them are part of the WikipakaWG which is part of 36C3 and the talk happens right now and at the same time there's also some talk about how state elections are climate elections or something in the Chaos West stage, starts at the same time, Chaos West stage is part of the Chaos West Assembly which is part of 36C3 as well and so this graph has a few important properties, for example there's some redundant connections here, you could see, you could say, if this talk is followed by the live querying then you don't really need to know that live querying follows this talk, it's kind of redundant information. 
You already know it, but it doesn't hurt to have it, and it often makes your life easier if you have a little bit of redundancy in your graph and then if you find that one half of this connection is missing for example you can still investigate what's going on and also in here we have kind of bi-directional connection so Esszimmer is next to Küche which is next to Esszimmer but this is two separate arrows and could also be that only one of them is there so you don't have arrows which go into-, in both directions at once in this data model, it has to be, if you want something like this you have to have two separate arrows because that keeps the data model very simple. You just have subject predicate object and that's everything you have, and then to query this graph, you kind of select a tiny part of it and then you remove some part that you don't know about for example we know that this talk is followed by live querying and if we remove the live querying part, then we can ask something like... Okay, I did it the other way around. Never mind, this way. This talk is followed by which talk? and then you have a question but because you've left out this part and then if you ask this question to a query service it can, kind of, you can think of this like a, err, damn, I only know the German word for this one, a, Schablone, template, so you put this over the graph and this has to match the existing node this has to match the existing arrow and then you see which nodes can you put in here and in this case that's only the live querying or the other way around which talk follows this one so you can have the beginning of the triple can be a variable like this one or the end of the triple can be a variable like in this case and you can also have more complicated patterns like, no there's not a more complicated pattern, this is the same pattern. 
You have the question "which talk happens in Esszimmer?" and you have two answers: this talk happens in Esszimmer and live querying happens in Esszimmer. But you can also combine more graph nodes like this, for example: which talk happens in some room which is part of the Wikipaka-WG? So we have one free part here and one free part here. But we know that these two have to be connected with "happens in", and then this has to be connected with "is part of" to the Wikipaka-WG. So if you can phrase your question as a kind of graph like this, where some parts are predetermined, the ones you already know about, and the other parts are the ones you want to find (those are the variables, which are indicated here with dashed lines), then you can ask that question to the graph and find the matching results. In this case, you have these two matches: this talk happens in Esszimmer, which is part of Wikipaka-WG, and live querying happens in Esszimmer, which is part of Wikipaka-WG. And then, if we had more information in this graph here, we might also have other rooms. For example, there's this library over there, which is also going to have some talks. If we had the whole schedule in here, we would find those as well. And we could also adapt the query so that we don't even make the Wikipaka-WG part fixed. We could ask for anything that happens in 36C3. So that would be: some variable happens in some room, which is part of some assembly, which is part of 36C3. And then we would find this thing as well because it fits the same kind of pattern: happens in, is part of, is part of 36C3. Does that make sense? Hopefully. I'm seeing a lot of nodding heads. OK, that's great. So then we can try to move ahead to actually ask some of these questions to a real query system. Because in reality, you're not going to actually draw these graphs, but you have some kind of language where you phrase them instead, which looks a bit like this.
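The slide is not part of the transcript. A minimal sketch of the kind of query being introduced, for the question "which talk happens in Esszimmer?", could look like the following. The prefix `c3:` and its IRI are made up for illustration: the slide used "36C3" as the prefix, but a prefix name in standard SPARQL has to start with a letter.

```sparql
# Hypothetical namespace for the example graph; not a real vocabulary.
PREFIX c3: <https://example.org/36c3/>

SELECT ?talk WHERE {
  # One triple pattern: subject (unknown), predicate, object.
  ?talk c3:happensIn c3:Esszimmer .
}
```

Against the example graph this would be expected to match the two talks that happen in Esszimmer.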
So you have the part "SELECT anything WHERE", which is kind of like SQL, and then everything else is not like SQL. Forget SQL! I hear this is easier to understand if you don't know SQL. I didn't know SQL that much when I learned SPARQL, and I think it helped me, apparently. But what you write down here is this kind of description of the graph, and these dashed parts are the variables, which you don't yet know. Those are marked with a question mark because that's kind of what you use to ask a question. In this case, I've just called it "?talk", but it could be any name, basically. And then instead of "happens in" as two words, I've just written "happensIn" as one, and then with the prefix "36C3", and it happens in the 36C3 Esszimmer, because I don't really have a separate dining room at home, but a lot of people do. So if we just wrote "it happens in Esszimmer", that would be pretty ambiguous and no one would know which dining room you're talking about. And by adding this prefix we know we're talking about just the dining room at 36C3. I assume there's no other assembly that has something called the dining room. If there is, then we would have to add something else here to make it clear. And I've used the same prefix for "happensIn" to make clear which kind of "happens in" relation we're talking about, that it's one specific to Congress events. And then you could ask this to a query service which has this example graph in it, and you might get the response that it's these two talks. And at the end, you have this period here because if you read the whole thing, it's kind of like a sentence again: the talk happens in Esszimmer. And if you have two sentences, then you have two periods. So the talk happens in some room. And this room is part of the Wikipaka-WG. And because we've used the same variable name here and down here, this has to be the same room. It couldn't just be two different things.
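Written out, the two-sentence version described here might look roughly like this. As before, `c3:` is a stand-in prefix with an invented IRI, and `c3:WikipakaWG` is an assumed name for the node on the slide.

```sparql
# Hypothetical namespace for the example graph.
PREFIX c3: <https://example.org/36c3/>

SELECT ?talk ?room WHERE {
  ?talk c3:happensIn ?room .        # first sentence, first period
  ?room c3:isPartOf c3:WikipakaWG . # second sentence, second period
}
```

Because `?room` appears in both patterns, a result only counts if the same node fits in both places.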
So if we use two different variable names here, room and something else, then we would just get all the combinations of talks happening somewhere and rooms being part of Wikipaka-WG without them being connected anyway, but because they use the same variable name they have to be connected like this. And then you would get these results we've seen earlier. What you can also do is leave out the room. So when I translate this into English, I could say, the talk happens in the room and the room is part of Wikipaka-WG. But I could also say the talk happens in some room, which is part of the Wikipaka-WG, as kind of a– I don't know what that's called in English kind of a relative sentence sub-something-clause where we don't really talk about the room in itself just as a part of this larger sentence. And you can write that in SPARQL as well. And then it looks like this. And these square brackets kind of describe what the room looks like without giving it names. So in this case, you can only select the talk up here and we don't have a room variable. But if you don't care about what the room is, then that can be very useful. I've also changed something else here. I've replaced the 36C3 in "isPartOf" with schema, which is another prefix and schema is kind of this collection of useful prefixes and other nodes that you can reuse, for example, if you're describing things you have on your website, you might say you have an article with a schema:title and a schema:publicationDate. So this was mainly introduced by Google and some other search engines. But we can use the same vocabulary to talk about our talks because "isPartOf" is one of these standard terms we can use for that. And what else do I have. OK, the next thing I have is actual queries. So I think I'm just going to– I'm almost going to switch to Wikidata, so I should talk a bit about Wikidata. 
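The bracket form mentioned above, with `schema:isPartOf` in place of the talk-specific property, could be sketched as follows. The `c3:` prefix and node names are again invented for the example; `schema:` is the schema.org vocabulary.

```sparql
PREFIX c3: <https://example.org/36c3/>   # hypothetical
PREFIX schema: <http://schema.org/>

SELECT ?talk WHERE {
  # The square brackets describe "some room" without naming it,
  # so there is no ?room variable to select.
  ?talk c3:happensIn [ schema:isPartOf c3:WikipakaWG ] .
}
```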
So all these examples here were just on some example graph, which I made up and threw on a slide with a lot of probably overengineered TikZ LaTeX magic, which I shouldn't have wasted that much time on. But it looks nice. And if we want to write real queries, we could load this thing into a query service, but it wouldn't be that interesting because it's kind of small. But there are a lot of real data graphs out there that you can query with this query language, SPARQL. And one of the coolest ones, at least in my opinion, is called Wikidata, or Wikidata; there's some kind of discussion about how it's pronounced. It's kind of a free database of anything that's relevant. It's part of the same family of projects as Wikipedia and Wikimedia Commons and other things, and it's also maintained by the same community of volunteers. And you can find all kinds of really interesting and cool and funny data there. So all of these example queries which I have here, we're just going to ask to Wikidata. But first, I will just give you one or two minutes to try to imagine what this question would look like, either in the graph format or in the SPARQL format. Just try to figure out how you would formulate "which software is written in bash" as this kind of graph query. And then we can see what we can come up with. So. I didn't think this through. I need some waiting loop music now. Does anyone have a kind of idea of what the graph looks like? Because I'm going to uncover it now, and then you can compare if it looks the same way. So it would look like this, at least using the Wikidata terminology. So instead of "is written in", the property is called "programming language". And this could be called "bash" or "Bourne Again Shell" or "GNU bash" or something. Doesn't really matter. And in SPARQL, it looks like this, which is a lot less readable, unfortunately, because one of the things about Wikidata is that it's multilingual.
So instead of saying "programming language", we say "P277". And I think that's beautiful, haha. No, but this is a property ID, and you can look up what this property is called in English or in German or in any other language. So if we look at Wikidata.org and look for… I think I forgot to zoom in. Yeah. There we go. I hope that's readable. Property P, what was it? 277. That is the property "programming language", at least in… okay, you can't read that. There you go. At least in English. In German it's "Programmiersprache", and it has tons of other languages too. So you can use Wikidata in any language you want, which is very nice. I could also show this page in a different language and then all of this would look different. The downside is that the SPARQL query is not quite as readable because you have to use all these numeric identifiers, but you don't have to memorize them at least. So let's… oops, try to write this query. SELECT * WHERE, and we have the software, which has the programming language "bash", and then we have to add these prefixes first. So bash is going to be a Wikidata item, so we abbreviate that with "wd", and that's a prefix. And then if I press control-space, or I think on Macs command-space works as well, then it searches for bash and shows me these suggestions, and then I can just select the right one, in this case "GNU bash". And then I have the ID, and if I move the mouse over it again, then I can see what this ID refers to. So it's not quite as bad as on the PDF slides, where you just see the ID, if you're actually on the query.wikidata.org website… let me make that a bit larger so you can all see it. And if you want to try that out on your laptop, I don't know, here it's a bit [audio outage] And for the programming language, we use a slightly different prefix, which is "wdt", which stands for "truthy". So we're only interested in "truthy" information and not all the information. And then we find this property P277.
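The query typed here would come out roughly as below. On query.wikidata.org the `wd:` and `wdt:` prefixes are predefined, so no PREFIX lines are needed there. The item ID for GNU Bash is quoted from memory and should be confirmed with the control-space autocompletion; P277 is the property named in the talk.

```sparql
SELECT * WHERE {
  # ?software has "programming language" (P277) GNU Bash.
  # Q189248 is assumed to be the GNU Bash item; verify before relying on it.
  ?software wdt:P277 wd:Q189248 .
}
```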
And if we run this query with control-enter or with this button here, then we get a collection of other IDs. Yeah. Does anyone want to get software which is written in bash? This one has a very low ID that is going to be… Loading. There we go. Autopackage. Some package management system that I haven't even heard of, but it's written in bash. OK, so… wait. Er, so here you can see all these statements and "programming language: GNU Bash" is the one we looked for. And unfortunately… so this is not a very useful list. So one thing we can do in the Wikidata Query Service, which is pretty specific to Wikidata, is to add the so-called label service, which is basically magic that you don't need to understand. But you write something like "serv" or "service" and then with control+space again for autocompletion. And it suggests you this thing. And you just keep that in your query at all times, basically. And then you say, I would like to have not just a software, but also the software label. And then we get down here, the label of the software. And I can also add the software description. And then we also see what, what is described. At least if it has a description and then the query results are already a lot more usable. And I'm just going to rename this to "item" and then we can edit this query however we want and the variable name will always kind of match. Because the next query won't be about software anymore. So it'll be confusing if you just still call it "software". But, yeah, there is some software here like Apache Yetus, Ruby Version Manager, Wikidata missing pictures, Pi-hole, all written in Bash. OK, I have several more examples queries here, which are kind of simple, should I skip ahead or is it good if I do a few more simple examples. Skip ahead? Is that OK? OK, then let's. So who was born at sea is not all that interesting. Just Place of birth at sea. We have a special value for that and it's not a very interesting list. 
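For reference, the bash query with the label service added and the variable renamed to `?item`, as described above, might look like this. The `SERVICE` line is the one that autocompletion suggests on the Wikidata Query Service; the Bash item ID is again quoted from memory.

```sparql
SELECT ?item ?itemLabel ?itemDescription WHERE {
  ?item wdt:P277 wd:Q189248 .   # programming language: GNU Bash (ID assumed)
  # Label service: fills ?itemLabel and ?itemDescription for any
  # selected variable with the matching suffix.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
```

Adding a second language code to the string, such as `"[AUTO_LANGUAGE],en,de"`, is how a fallback language would be specified.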
I think a few results, just five or so, because most people are going to have "place of birth: Atlantic Ocean" or something. Which places are located on the White Elster? Just something for the Leipzig people. And where does the Neverending Story take place? This is actually kind of cute. Let's do that. Also, this is a bit interesting because in this case, the variable is in the last place and not the first one. So we have the Neverending Story in the beginning, then "narrative location", and then the item is at the end instead of at the beginning of the triple. And it works just as well, except that a lot of these don't have a label in English. So let's add German as a fallback language. And then we get all of these places which someone added to Wikidata at some point. Let's see if there's any useful information about them. So they all have IDs in the same range. So it looks like they were all created at the same time, because they are just increasing all the time. So the Gelichterland is a place from the Neverending Story, it's a fictional country. It has a capital, which is this fictional place. It's located on this terrain feature, it's present in the Neverending Story. And it depicts horror fiction. I'm not sure about that, but let's leave it alone for now. OK, yeah. And skip to a slightly more interesting query, which is this one: which popes had children? So what is the graph going to look like for this? How many triples are we going to have? So a triple is node, arrow, and another node; how many triples would you need for "Pope has a child"? Let's do a raising of hands. Who thinks you need zero triples? OK. Who thinks you need one triple? Who thinks you need two triples? That's more people. Does anyone think you need three triples? No. OK, so mostly two, but some people think one.
So the one… the people who think it might need one triple, perhaps are thinking of something like the Pope, which is the leader of the worldwide Catholic Church, has a child, this child or it's called item, but that's not going to have any results. Or it could be the other way around. And you could say that… oh let's just comment this out. The item has "father: the pope". And that doesn't work. Because the items are not… the children are not directly connected to the item for the office of the pope, instead it's going to be two levels. It's going to say the child has a father, some person, and then the person has the office pope or has the position pope or is a pope or something. So you need this level of indirection. So in the graph that looks either like this or it could be the other way around. So either the child has a father pope, which has "position held: pope" or the pope has a child and also a "position held", so that's kind of an example of the redundancy I mentioned earlier, we have the two directions "child" and also "father"/"mother", and- so you can ask your query in two ways, and it doesn't really make that much of a difference, assuming that the data is complete. And I think someone occasionally runs queries to check if any of these circles are missing. So let's try one of them, let's just stay with this one, so the item does not have "pope" as father, it has some pope, and then this pope has "position held: pope". And then let's add the "pope" label and… yeah, pope label is enough, and then we get 24 results! So we have a Duke of Parma which, who was the son of Paul III. Paul III had three children. Let's sort by this. Wow, Alexander VI was very busy. And some of them just have, oh oh oh, we have duplicates, Giovanni Borgia and Giovanni Borgia. Should I demonstrate Wikidata editing now or do we just ignore this? 
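The two-triple version that produced these results could be sketched like this, using the "father" direction. The IDs are the ones I believe Wikidata uses (P22 father, P39 position held, Q19546 pope); treat them as assumptions and check them with autocompletion.

```sparql
SELECT ?item ?itemLabel ?popeLabel WHERE {
  ?item wdt:P22 ?pope .        # the child has some father ...
  ?pope wdt:P39 wd:Q19546 .    # ... who held the position "pope"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
```

The one-triple attempt fails because no child is linked directly to the item for the office itself; the extra variable `?pope` is the level of indirection.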
So, yeah, someone imported a lot of information from this peerage database and apparently we have some duplicate items here, let's just leave those alone for now. In fact, I think this and this also looks suspiciously similar. Giovanni Borgia, unless he had two children of that name. I mean, he could have. So this… we have a date of birth 1470s… 1498. No, that might actually be different children. OK, not a very creative father in the names. Yeah. And wait, that's a pope who's a child of another pope. Very interesting! And another one. And another one. We have three popes who are children of other popes. Let's search for those! So we would also need for that, that the item has "position held: Pope", and I could copy paste this, but just do this. So the item should be… child should have a "father: pope" and the item should have "position held: Pope", and the pope should also have "position held: pope". And in this case, it would probably be less confusing to call these "child" and "father", because this is also a pope now, but… variable names. One of the three hardest problems in computer science, right? Yeah, we have three children who are… three popes who are children of other popes. Wow. I'm actually going to save this query, popes who were children of other popes. But actually, we can future-proof this a little bit, because right now we've only said that the father should be a pope. But in case there's ever a female pope, let's just switch this around and say that the pope should have the child… item and then it's going to work, even if the pope happens to be female and is a mother instead of a father. There we go, same three results. OK, and let's keep that, and open a new tab for next queries. Yeah. Which Microsoft software runs on Linux. OK. That's not that funny. So perhaps we can just skip it… I don't know. That joke kind of ran out of steam a while ago. Basically looks like this and it's like Visual Studio Code and three other programs, meh. 
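The saved "popes who were children of other popes" query, in its switched-around form that goes from parent to child, might look roughly like this. P40 (child), P39 (position held) and Q19546 (pope) are quoted from memory and are assumptions to verify.

```sparql
SELECT ?item ?itemLabel ?popeLabel WHERE {
  ?pope wdt:P40 ?item .        # the pope has a child (works for father or mother)
  ?pope wdt:P39 wd:Q19546 .    # the parent held the position "pope"
  ?item wdt:P39 wd:Q19546 .    # and so did the child
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
```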
What are some compositions for organ and orchestra. This isn't funny at all, but I just find it very nice because it's just an awesome sound. And so that would be… the composition has the instrumentation "organ" and also "orchestra", which we can write as… item, item label… composition… instrumentation, this one, orchestra. And also, "composition… organ". And then, oops, yeah, this should be "item"… and also I forgot to add the label service. There we go. And we have 12 results, which is nice if you want to listen to any of those. We could also check if any of them have an audio file on Commons. Let's see. One, OK, and I think we've heard this one already. So, but… one thing that's kind of annoying here, I should have mentioned this in the last query, I think. So I had to repeat the item and the property ID, which is a bit annoying and makes the query difficult to read. And what you can do is leave that out and you can also do this in the previous case. So let's actually go one slide back. So here I didn't write twice that it's the software which should have the developer, and also the operating system. I just wrote the software has "developer: Microsoft" and also with a semicolon at the end instead of a period, it has "operating system: Linux". So if you read this as English it's just one sentence where you don't repeat the subject twice. The software has "developer: Microsoft" and "operating system: Linux", instead of "software has developer: Microsoft" and "software has operating system: Linux". And if you… if the property here is also the same thing, then you can even leave that out and add a comma at the end and just list the two values and you don't even have to repeat the instrumentation. So let's do that here and abbreviate this query. And it has the exact same 12 results, just slightly more convenient to read and… to write at least, hopefully also to read. I don't know. But you don't use the comma that much. 
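The two abbreviations can be sketched side by side. All IDs below are quoted from memory (P178 developer, Q2283 Microsoft, P306 operating system, Q388 Linux, P870 instrumentation, Q1444 organ, Q42998 orchestra) and should be treated as assumptions, not as the exact queries from the slides.

With a semicolon, the subject is stated once and each line adds another property:

```sparql
SELECT ?software ?softwareLabel WHERE {
  ?software wdt:P178 wd:Q2283 ;   # developer: Microsoft
            wdt:P306 wd:Q388 .    # operating system: Linux
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
```

With a comma, subject and property are both stated once and only the values are listed:

```sparql
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P870 wd:Q1444, wd:Q42998 .   # instrumentation: organ and orchestra
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
```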
The semicolon is pretty useful, like we could have written this as, the pope has, er, the child and also position held like this. It means exactly the same, but you can immediately see that both of these refer to the pope because there's just a bunch of blank space here. Yeah, so then we have this one. This isn't funny at all, but there are a lot of people who used to be in the Nazi Party during World War 2 and then who later just went back into a civil life and even received the Bundesverdienstkreuz, the order of merit of the Federal Republic of Germany. And you can find those… in this case I've done it with three triples, which is, the person was a member of this political party and received this award. And also I've added that they're "instance of: human", because we also have a lot of fictional data on Wikidata. You already saw that with the Neverending Story stuff earlier. So there might also be a fictional character who was a member of this political party and who received the award, and we're not really interested in those. So we add "instance of: human", and then we are certain that we only get real results and not fictional results. And it doesn't really cost us anything because the Query Service can optimize that pretty well. So let's write that… actually, let's do that here. So the item should be "instance of: human", which is Q5, because it's a very common item, and "member of political party". And you can see I can search by the German abbreviation and find this, even though it's not a label, because there are search aliases. And also "award received", the Bundesverdienstkreuz, because I can't be bothered to type in the whole English name. There we go. And we find, I think… how many results? Eleven results. Yeah. And this actually isn't quite correct, because in theory, you don't get this order, this order has like 11 parts or something. You can get the Grand Cross with Distinction or you can get the Star or whatever. 
I think it's listed somewhere here. Yeah, you can get the Grand Cross Special Class, you can get the Grand Cross Special Issue, you can get the Grand Cross First Class, blah blah blah. And so, in theory, any of these people should have one of these awards and not just "order of merit". But I think when I checked, all the results just had "order of merit" directly. But actually, no, we can try to search for the correct ones instead. So it would not be this directly; "award received" would be some award, such as this one, and then this award is part of the order of merit, so "award"… "part of"… Let's see if that finds any results. Oh. Oh. Oh, dear. Yeah, that's a lot of results. "Herbert von Karajan". That's depressing. OK, yeah. OK, so I think when I tried this out and didn't find any results, I just did something wrong, because this way we find a lot more results. And we don't actually select the award here, because we don't care what kind of award they got. So we could also use this abbreviation again, like this. So we just say they got some award which is part of the order of merit. And in this case, we could even abbreviate that further and put a slash here. That kind of describes a path that you have to take from this item to this item: you first have to get to some award received, and then that has to be part of something else. And you can add as many elements here as you want. And then we get the exact same 802 results… and… lots of well-known names here. And if we also want to find the original 11 ones that directly had the order of merit as the award received, we can add a question mark here, which is just like in a regular expression: it says this part is optional. They can have directly received this award, or they can have received some award which is part of the order of merit. And then we should get 813. Yeah, 813 results, so 802 plus the 11 from earlier.
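The final form with the path and the optional step might look like the sketch below. The IDs are quoted from memory and are assumptions: P31 instance of, Q5 human, P102 member of political party, Q7320 for the Nazi Party, P166 award received, P361 part of, and Q10905334 for the Order of Merit of the Federal Republic of Germany. The last one in particular should be checked with autocompletion.

```sparql
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q5 ;                          # instance of: human
        wdt:P102 wd:Q7320 ;                      # member of political party
        # Path: "award received", then optionally one "part of" step.
        # Without the trailing "?", only the indirect awards match.
        wdt:P166/wdt:P361? wd:Q10905334 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
```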
And… I'm starting this with "instance of: human", which… and the Query Service is going to re-order this because searching for all the humans and then filtering for the ones who are in this political party and so on wouldn't be efficient. So I don't have to worry about that. I could write it in this order, or I could shuffle it around. Doesn't make any difference. The Query Service already knows in which order to do these things. So you don't have to worry about that. You can just start with "is a human" and then add everything else. I think I have one more complicated query here. Yeah, so that's one of the examples I mentioned earlier, the largest cities by population with a female mayor. So the graph for that is, I think the largest one I prepared for the slides, except the one in the beginning. And it looks like this. We should have a city which is a city, "instance of: city", and it has a certain population, and it has… so for the mayor, we use the same property as for head of government. And if you don't know that, you could look at some city like Berlin and maybe you know what the mayor of Berlin is called… what was it?. Something "Müller", I think. Yeah. And then you can see, aha, the property for the mayor is "head of government". Or you could also search for, the city should have a mayor, and then you'll still find "head of government", the right property. And that mayor should be a human and she should have the gender "female". Oops. There's a question mark there for no reason at all. That's not a variable. That should be the fixed value. Sorry. So let's put that there. We have a city which is "instance of: city", and it also has a population which we're going to use later and it also has a head of government. No, that's wrong. Not the "office held by head of government", the "head of government" itself, which we call the mayor and then the mayor is "instance of: human" and gender should be female… come on… female. 
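Written out, the pattern described here could look roughly like this. IDs are as I recall them (Q515 city, P1082 population, P6 head of government, P21 sex or gender, Q6581072 female) and should be verified; the variable names follow the ones used in the talk.

```sparql
SELECT ?city ?cityLabel ?mayorLabel ?population WHERE {
  ?city wdt:P31 wd:Q515 ;          # instance of: city
        wdt:P1082 ?population ;    # population, used for sorting later
        wdt:P6 ?mayor .            # head of government
  ?mayor wdt:P31 wd:Q5 ;           # instance of: human
         wdt:P21 wd:Q6581072 .     # gender: female (a fixed value, not a variable)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
```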
And let's select the city, cityLabel, mayorLabel and also the population. And then we find some 83 results. That's not yet the largest cities with a female mayor. That's just all of them. And in Wikidata we know about 83, apparently. And if your local hometown has a female mayor, just go ahead and add it to Wikidata, and it's probably relevant. The relevance criteria are not as strict as on Wikipedia, fortunately. But if we want just the most populous ones, we can go a bit back into SQL land and say we want to ORDER BY the population, and in SQL you would write DESC afterwards, and in SPARQL it's different. You write DESC(?population). Erm, I think it's nicer that way. But perhaps it would have been nicer to just stick with the SQL syntax. I don't know. And we want to limit this to just the ten most populous cities, for example. And here we go. Tokyo is currently the biggest one, then Hong Kong, Baghdad, Surabaya, Rome. Yeah. And, oh. This doesn't make that much sense, Caracas has two mayors. Anyone… yeah, exactly. So we're only supposed to get the current mayor. Head of government… yeah. Does anyone know which one is the current one? Or we could just check Wikipedia… Caracas, which hopefully doesn't get its information from Wikidata yet. So it's not circular. And the mayor is… Carolina, Carolina Cestari… Cestari, I don't know. [laughter] OK, so let's add a new one. Ah…? Doesn't have an item yet, is that… is that the mayor, or is chief of government something else? Doesn't occur anywhere else on the page, of course. Local government… mayor… no. OK, so let's just… I don't know, doesn't she have a Wikipedia article? No. Just appears in some lists, and then she doesn't have a Wikidata item yet? No. Then… I don't know. We'll do some live Wikidata editing. It wasn't part of this talk, but let's just do it. Carolina Cestari… what country is that? Venezuela.
Venezuelan politician, and that sounds like a female name, so I'm just going to guess and check that after the talk. So she's definitely a human. And gender is female and that is going to be enough for our query. Do this search again. There we go. And set this to preferred rank. So that's how the Query Service knows that this is the current value and it should only return this one. And ideally, one of the head of government values should have this preferred rank to mark it as the correct current value. And then all the other ones are additional data that you can use if you want. But it's not the main value and we are not going to get it in a simple query. And then there's some error because Caracas isn't some kind of political territorial entity and it should have a start time. I don't care right now. OK, so we run this query again and hopefully get just one result for Caracas this time. No. Uhm, we have to wait a bit until the Query Service is updated. Because it's kind of asynchronous. It just keeps watching for changes and eventually it will get the new data, but… okay. It might take a bit longer. Anyways. That's how that query works. Does that make kind of sense? OK, great. Yeah, I think this is almost exactly what I wrote here. Yeah. Except with some labels and the label service. Yeah. There is one problem here, which is, for example, I happen to know that Mexico City is a very large city with a population of… population: almost 9 million. So it should be right after Tokyo in front of Hong Kong. And the head of government is a Claudia Sheinbaum or something, which sounds like a woman. So we should get this result in the query. The reason we don't is that Mexico City is an instance of "big city" and we have searched for "instance of: city". And there's some debate about does this class even make sense at all? 
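One way to investigate a missing result like this is to ask directly which classes the item is an instance of. The sketch below looks the item up by its English label instead of by item ID, so it may also return other items that happen to be labelled "Mexico City"; that lookup and the variable names are illustrative choices, not something shown in the talk.

```sparql
SELECT ?city ?class ?classLabel WHERE {
  ?city rdfs:label "Mexico City"@en;   # find the item by its exact English label
        wdt:P31 ?class.                # list every "instance of" value it has
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

If "city" itself is not among the classes returned, a plain "instance of: city" pattern cannot match the item.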
I think this is actually the German classification, where a big city is one with 100,000 inhabitants, and in other languages or countries, a big city might be something else, but for now that… the data is what it is. Fortunately, what we have here is the information that a "big city" is a subclass of a city/town, which is a subclass of "locality", which is a subclass of… wait. We should arrive at city at some point, but I think we've already gone past that. It's also an instance of capital. Let's go down that instead. A capital is a subclass of city, there we go. So if we can tell the Query Service to follow these subclass connections, then we should find these cities. And one way to do that… to make it work for Mexico City would be to say, it has to be "instance of" something which is, with a path again, "subclass of: city", and then we would find Mexico City, but we would not find all the… oh, we would still find Tokyo because it's still a capital, I guess. But we've missed a lot of other cities, I think, which we used to have… yeah. Rome, for example, is gone. Because it's… that's just an instance of city directly. And we've now made the subclass mandatory. What we should do is make it optional, or even better, we should say there can be any number of this element. So it can be an instance of city, or it can be an instance of a subclass of city, or it can be an instance of a subclass of a subclass of city. You can follow any number of elements, that's what this star means, just like in a regular expression. And then we probably have to say we only want the distinct ones because there are like five different ways to go through the subclass tree until you've found "city". And we're not interested in the different ways. But now we should get Tokyo and Mexico City. And Rome is also here and Caracas is completely gone because we found enough other cities which we were missing earlier. 
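Put together, the fixed query could look roughly like this. It is a sketch using the same assumed identifiers as before, plus P279 "subclass of"; the label service line is the standard one offered by the Query Service, and "en" is just an example language.

```sparql
SELECT DISTINCT ?city ?cityLabel ?mayor ?mayorLabel ?population WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515;   # instance of city, or of any (sub)subclass of city
        wdt:P1082 ?population;
        wdt:P6 ?mayor.
  ?mayor wdt:P31 wd:Q5;
         wdt:P21 wd:Q6581072.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)           # SPARQL wraps the variable, unlike SQL's trailing DESC
LIMIT 10
```

Writing `wdt:P31/wdt:P279` without the star makes exactly one subclass step mandatory, which is why Rome disappeared in the intermediate version. DISTINCT collapses the duplicate rows that come from several subclass routes leading to "city".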
So you kind of have to watch out and sometimes use elements like this… the "subclass of" tree is pretty common, or with the, something… order of merit, we had to use this "part of". You have to watch out if the results are plausible, or ideally, you know some item that should be in the results, and then you check, is it there? Why is it not there? And investigate like that. But that's a fixed version of the query. And… yeah, if we were not interested in the mayor, we could do the same trick again. But, yeah. It doesn't make that much of a difference. And I think… yeah, that was almost the only difference. Yeah, except that I removed the population, so you can order by a variable that you don't select in the end if you want. And I think I am out of slides. So, yeah, if you want to see more queries, you can look at these Twitter or social media accounts. There's a huge list of example queries on Wikidata, which is so big that it's getting too big for a wiki page, and people had to move some queries out of there, and it's kind of just grown since 2015 or something. And there's a lot of garbage there, but also a lot of useful queries if you want to look at that. And I had two more queries in the talk description which we haven't talked about yet, and I think we have the time. I can just try to open these. "Which films starred more than one future head of government?" Does that work? It doesn't. Can I copy the URL here? Yeah, copy link address. So that's a kind of longer query, which is why it didn't really fit on one slide. But the important part is you have some film… instance of, or subclass of film, it has a publication date and a cast member, which is the head of government. And the head of government held some position, some head of government, er, some subclass of head of government. And the start of that should be after the film was published. And then you get a bunch of results. I think this takes like 11 seconds or something. 
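A reduced sketch of that query, reconstructed from the description and not copied from the linked original, might look like this. The identifiers are assumptions worth checking on Wikidata before relying on them: Q11424 "film", P577 "publication date", P161 "cast member", P39 "position held", P580 "start time" and Q2285706 "head of government". The `p:`/`ps:`/`pq:` prefixes reach the full statement and its qualifiers, which is how the start time of the position is found.

```sparql
SELECT ?film ?filmLabel (COUNT(DISTINCT ?person) AS ?futureHeads) WHERE {
  ?film wdt:P31/wdt:P279* wd:Q11424;        # film, or an instance of a subclass of film
        wdt:P577 ?published;                # publication date
        wdt:P161 ?person.                   # cast member
  ?person p:P39 ?statement.                 # full "position held" statement
  ?statement ps:P39/wdt:P279* wd:Q2285706;  # position is a (subclass of) head of government
             pq:P580 ?start.                # qualifier: start time of that position
  FILTER(?start > ?published)               # took office after the film came out
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?film ?filmLabel
HAVING (COUNT(DISTINCT ?person) > 1)        # more than one future head of government
```

Because it follows the subclass tree twice, a query of this shape is slow and may run close to the Query Service time limit.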
And you get like films with Schwarzenegger and one other actor who became US governor. I don't remember the name. And you also get a lot of… or several films from World War II with future French heads of government, which is really cool. So, like a film that was shot about the liberation of Paris, where it's… it's kind of a stretch to call them cast members, but they're definitely in the film. And if we get the result, then I can tell you what the film is called. Yeah, it might be busy right now, so you get up to 60 seconds in the Query Service and then in the end your query is killed if it takes longer than that. So sometimes it can be a bit of a struggle to make the query work within 60 seconds. There we go, 50 seconds. That was close. So there's yeah, there's a "La Libération de Paris" with Charles de Gaulle, who was president of the Council and president of the provisional government, and also Georges Bidault, I think, who was prime minister and president of the Council, and other stuff. We have several Indian films with people who went on to become chief ministers. And then down here there's some Canadian politicians, apparently. And then here's Arnold Schwarzenegger and Jesse Ventura, who both became governors and also starred in several films. And the other thing was, we have a lot of data about the British government because a lot of volunteers have just been slaving away at that data and adding and adding more information. I think they've… they have all their parliaments, complete with party affiliations and everything for at least the last 100 years and some partial data for a lot more than that, because they have a very long parliamentary history. And then you can do queries like "how many people named John are there in parliament", and "how many women with any name". And you can see when the women were finally more than just the men who are named "John". And it's kind of an amusing graph. Or not so amusing. Takes a while as well. 
I hope it doesn't take 50 seconds, but it looks like the Query Service might be busy at the moment. But I think it was something like in 1991 or so is the crossover point. Oh yeah. And I should mention anyway, so everything we saw right now was just a lot of tables. But you can also show results in different ways, such as a line chart. There we go. So in 1992, this was the first parliament which had more women than Johns. And then the Johns have slightly declined and the women have gone up to 220. How many people are in the House of Commons in total? Does anyone know? No. So I don't know what percentage this is. Uh, but… yeah, the latest election from 12 December is already in there. Yeah. [inaudible] What? So the query looks like this. So this one is broken into several parts. We first find all the members of parliament, so they should be human, again, no fictional people, and then they should have some "position held", which is a subclass of "member of parliament" in the House of Commons. And then there should also be, um, a parliamentary term on that, so that we know which parliament it is and when it starts. And then down here, we import all those MPs and filter for just the ones with the "given name: John". And then, separately, we filter for just the ones with "gender: female". And there's an optional "subclass of" in here, because currently the data model is that there is a separate item for transgender female and someone can have "gender: transgender female", which is a subclass of "female". And there is a discussion right now to get rid of that and have a separate property for that instead. And then all the trans people just have "gender:", their right gender, and you don't have to mess with subclass. But right now we still… well, we need it in theory, I don't think there are any MPs in practice. But, you know, you can just keep it in there. 
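The several-parts structure described above can be sketched with named subqueries (`WITH { … } AS %name` and `INCLUDE %name`), which are an extension of the Blazegraph engine behind the Query Service and not standard SPARQL. This is a reconstruction from the description, not the query shown on screen, and several identifiers are assumptions to verify before use: Q16707842 for the UK "Member of Parliament" position, P2937 "parliamentary term", P735 "given name", Q4925477 for the given name John, and P580 as the start time of a term.

```sparql
#defaultView:LineChart
SELECT ?start ?count ?group
WITH {
  # Part 1: all MPs with the parliamentary term they sat in
  SELECT DISTINCT ?mp ?start WHERE {
    ?mp wdt:P31 wd:Q5;                          # human, no fictional people
        p:P39 ?statement.
    ?statement ps:P39/wdt:P279* wd:Q16707842;   # assumed ID: UK Member of Parliament
               pq:P2937 ?term.                  # assumed ID: parliamentary term qualifier
    ?term wdt:P580 ?start.                      # when that parliament started
  }
} AS %mps
WITH {
  # Part 2: of those, the ones named John
  SELECT ?start (COUNT(DISTINCT ?mp) AS ?count) ("Johns" AS ?group) WHERE {
    INCLUDE %mps.
    ?mp wdt:P735 wd:Q4925477.                   # assumed ID: given name John
  } GROUP BY ?start
} AS %johns
WITH {
  # Part 3: of those, the women
  SELECT ?start (COUNT(DISTINCT ?mp) AS ?count) ("women" AS ?group) WHERE {
    INCLUDE %mps.
    ?mp wdt:P21/wdt:P279? wd:Q6581072.          # female, with an optional subclass step
  } GROUP BY ?start
} AS %women
WHERE {
  { INCLUDE %johns. } UNION { INCLUDE %women. }
}
ORDER BY ?start
```

The first comment line, `#defaultView:LineChart`, asks the Query Service to open the result as a line chart instead of a table. The question mark in `wdt:P279?` means zero or one "subclass of" step.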
And then we import the results and get them here either as a line chart or as a table, if you want to sort it by the time… yeah, the data starts in 1919, apparently. So we have exactly a hundred years of history there. We can also show it as a bar chart, if that makes more sense. No it doesn't. That makes no sense. Line chart is the right one. Oh, right, but if you show the line chart again, then it breaks for some reason, there's some bug there. So let's just show it again. There we go. That's the right… chart. Yeah, and I guess… oh wow, it's already… 50 minutes, so I guess this is the point where we start moving to the live querying part, and I was told I should make at least a short break for the stream, so the Angels know where to cut between. But we could also take a 10-minute break and then start the next talk on time. Does that sound OK? Or is 10 minutes too long? Uhm, if you're going to stay here, which would be very nice, then please think of some example queries that you think we could write, and then I can try to write them, because otherwise I'm not going to have much to do. But yeah, let's do a 10-minute break and see you then. Thank you so far. [Applause] Postroll Music Subtitles created by c3subtitles.de in the year 2021. Join, and help us!