Music
Herald: We'll do some live querying with him.
So you were told to think of some ideas that
we could search for in Wikidata, and when
we get to that point I would ask you to
raise your hand and wait till I get to you
with the microphone, so that the people in
the stream can also hear what we're
talking about. So that's the thing. I'll go
back to Lucas, and we still have
translations. If you want to hear it in German,
we still have translators who are doing their best
to tell it to you again in German. So have a listen.
Lucas: Does anyone have a query?
Yes, in the front there. We have a question already.
Audience member: Is it possible to find all circular family trees?
L: All circular what?
AM: Family trees
L: It's certainly possible to find
some. Finding all is probably going to
be a timeout, but it would be something
like select, probably child would be
the simplest, so item, child plus, item
again. If we put the star like earlier,
then every tree would
match that, but the plus means it
has to be at least one child link or more.
And let's just add a "limit 1", because I'm
not that optimistic that this is even
going to find one, and I'm pretty sure we
cannot find all of them. But let's see if
we can find one, and this might just take a
while. I don't think there is a good
way to do this otherwise, unless you
download one of the dumps, either the JSON
data dumps or the RDF dumps, which is the
same data format used here, and then you
can do it locally without a timeout.
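The query itself is not captured in the transcript. A minimal sketch of what it might look like, wrapped in Python so it can be generated with different limits: it assumes that `P40` is the "child" property (as named later in the talk) and that the `wdt:` prefix is predefined, as it is on the Wikidata Query Service. The function name `cycle_query` is made up for illustration.

```python
def cycle_query(limit: int = 1) -> str:
    """Build a SPARQL query for items that reach themselves via child links."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return (
        "SELECT ?item WHERE {\n"
        # '+' means one or more P40 (child) links; '*' would also allow zero
        # links, so every item would trivially match itself.
        "  ?item wdt:P40+ ?item .\n"
        "}\n"
        f"LIMIT {limit}"
    )


print(cycle_query())
```

The printed text can be pasted into the query service as is; as said above, it may well time out there.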
I don't think there's much I can optimize
about this; the query is pretty short. Unless,
like, I had an idea that people named John
are more likely to have these kinds of
cycles, then we could filter it down first,
but, I mean... I'm afraid that is not going
to work, it looks like. Yes, timeout. And
you can see the thing is written in Java
[unclear]. One thing we
can do with this "P40+" is something like:
start with a certain mythical
creature such as King Arthur. I hope I can
find him like this. [unclear]
There we go, that's the
legendary British or Welsh king. And then
we are searching for an item who is
definitely a real human and who has a date
of birth, and we say the date of birth
should be greater than, let's say, 1950, and
this is a dateTime value, and let's
even say 1980. I think that might be
more efficient. There we go. No
results, okay. I thought King Arthur had
some real descendants. Though then it was
some other mythical creature. Let's just
start with any ancestor who has the item
as child, and the ancestor is also an instance
of mythical creature, mythical character.
Let's see if we have any mythical
characters with children who are born
after 1950. Oh, I still have the "limit 1"
here, could make that a "limit 10" probably
or something. But I'm optimistic, I think
there are some people here, especially, I
think, even British MPs. There's one
that's already on the list of example
queries, British MPs with mythical
ancestors, and there are people who have traced
their lineage back to some 6th or 5th
century person, and you have all the
parent links in there, and then it's kind
of tricky to figure out where it starts
being wrong. Oh, that's not working out so
well. Does anyone else have ideas in the
meantime? There, way in the back.
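The ancestor query described in this turn could be sketched roughly as follows. This is a reconstruction, not the query from the screen: it assumes `P31` is "instance of", `P40` is "child", `P569` is "date of birth" and `Q5` is "human". The class ID passed in the example call is only assumed to be "mythical character" and should be checked on the item page before use; the function name is hypothetical.

```python
def descendants_query(ancestor_class: str, born_after_year: int, limit: int = 10) -> str:
    """Build a SPARQL query for humans descended from instances of a class."""
    return (
        "SELECT ?ancestor ?item WHERE {\n"
        f"  ?ancestor wdt:P31 wd:{ancestor_class} ;  # instance of the given class\n"
        "            wdt:P40+ ?item .  # one or more child links down to ?item\n"
        "  ?item wdt:P31 wd:Q5 ;  # definitely a real human\n"
        "        wdt:P569 ?born .  # has a date of birth\n"
        f'  FILTER(?born > "{born_after_year}-01-01T00:00:00Z"^^xsd:dateTime)\n'
        "}\n"
        f"LIMIT {limit}"
    )


# "Q4271324" is assumed to be the "mythical character" class; verify it first.
print(descendants_query("Q4271324", 1950))
```

A later cutoff year such as 1980 narrows the filter, which is the change tried above in the hope of a faster result.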
Someone: Thank you
Audience member: We all know that stupid
game in Wikipedia where you try to find
the Adolf Hitler page by only clicking
links, so can you find the number of pages
that are directly connected to the Adolf
Hitler page in Wikipedia?
L: You can. Oh, that was a timeout, dammit.
So that would be kind of... One funny
story about that, for example: the first
example we
have here is cats, and why do we have cats
and not dogs? Because if you search for
dogs, the second result, no, it's the
fourth result by now, but that's the
dog of Hitler, and we don't really want
that, so we usually use cats
as the example instead. But let's just
search for anything where the item has any
connection, and we don't care which
property it is, to Adolf Hitler, like that,
and we are going to find 920 results. Okay,
some of these are sitelinks, so we also
want the item to have some label, which
uses this new namespace, and we want only
the English label, so the language of the
label should be English, and then we
just select the item and the label, and
hopefully that's still pretty efficient.
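A sketch of the "anything connected to this item" query described here, under a few assumptions: the label is matched through `rdfs:label`, the property is left as an unconstrained variable, and `Q352` is taken to be the item for Adolf Hitler. The helper name `linked_items_query` is invented for this example.

```python
def linked_items_query(target_qid: str, language: str = "en") -> str:
    """Build a SPARQL query for labelled items with any direct link to a target."""
    return (
        "SELECT ?item ?label WHERE {\n"
        f"  ?item ?property wd:{target_qid} .  # any property pointing at the target\n"
        "  ?item rdfs:label ?label .  # drops subjects without a label\n"
        f'  FILTER(LANG(?label) = "{language}")\n'
        "}"
    )


# Q352 is assumed to be Adolf Hitler.
print(linked_items_query("Q352"))
```

Requiring a label is what filters out the sitelink rows mentioned above, since those subjects are page URLs rather than items with labels.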
Here we go. NSDAP membership number,
that's actually a property, but I assume it
has him as the example. Yep, there's a property
example here, NSDAP number 1. World War
II. Cause of death: do we
have him as an example on cause of death,
really? And we have nitric acid poisoning,
stroke, cholera, shot to the head, cyanide
poisoning, hanging. That's a very pleasant
list. Do we need to have that many
[unclear]? Yeah, then we
have Nazi Party, Klara Hitler, I don't
know who that is, 1936 Summer Olympics,
all kinds of things. So that's how
you can find all the things with a direct
connection to Hitler. Any other
examples? Yes, over there on the right, or
was there already someone back
there that I missed?
AM: Can you find the
cheapest public infrastructure projects in
Germany?
L: The cheapest public infrastructure
what?
AM: Projects
L: Projects
AM: Like a bridge building
L: I don't think we're going to have a
full dataset about that but you can try.
Let's start with a more expensive one and
[crackling noise]
L: see - perhaps move away from the box,
that might help. Let's start with a very
expensive project and see just what the
data model looks like. So what does an
infrastructure project look like,
what was the cost? The cost is probably
going to be in euro, and I don't know how
it's written here... over there, okay, it's a
property called cost, in euro. And does it
have something like instance of?
International airport, building under
construction, greenfield airport, proposed
airport being built. So we could check
first: is Berlin Brandenburg Airport, is
that an instance of some subclass of
public infrastructure? Is that a thing?
That looks like the wrong item. What is
this? This is nothing. Okay. Is there anything
linked to this item? No, nothing
[unclear]. Okay. So it could be: an
international airport is a subclass of
airport, which is a subclass of
aerodrome, which is an architectural
structure, and we can search for
architectural structures. So the structure
would be an instance of a subclass of
architectural structure, and it would have
a cost, and order by descending cost,
limit 10. And we're probably going to
get things in, like, yen or some other
currency where this number is just going
to be very high, because we're not taking
any conversions into account right now, but
let's see if we find something there. What
is it doing? Okay... not sure why this is
taking so long. Let's try a second version
in the meantime, where the quantity amount
is the cost and the quantity's unit
should be the euro. That's still running,
and, yeah, let's see if this works any
better or not. Okay, this was a timeout. This
looks like it's going to be a timeout as
well. I don't know, we can just search for
the most expensive things at all. Remove
this part, there we go. This costs 55
billion euros. What is this thing? Power
of Siberia, natural gas pipeline. That's,
that's in euro, the cost? Apparently. And
then this is 15 billion euros, and then
8.77... [unclear] that's the Channel, oh,
the Channel Tunnel is expensive. The
Brenner Tunnel was also expensive.
laughter
And Stuttgart 21, [unclear],
was also, or is projected to be, expensive.
Do we have one cost or several? Okay, in
2018 we have a cost of 7 billion. Yeah, so
let's sort by the ascending cost instead,
because that was what we actually wanted,
and then we get... okay, now we're going to
get a lot of things that aren't really
infrastructure projects. We have
"The Hot and Energetic Universe". Does
that mean it's a no-budget film or what?
Okay. So we would need some kind of ...
Let's say, let's do duck typing instead of
saying it is an infrastructure project,
let's say it has, I don't know, a
coordinate location. And if it has a
coordinate location, we're going to call
it some kind of infrastructure project, or
at least it's not going to be a
documentary film. Perhaps that works
better. Yeah, so 21,000 euros cost this
thing which was in France. Oh, okay,
right, it should also be country Germany.
Here we go. That's 400,000 euros for
a fountain in Stuttgart. Does that count? I
guess. And that's an instance of something
that doesn't even have a German la-, an
English label, just a German one. Wait...
Oh, so this is the class of all the
fountains with exactly this name, which are
a subclass of well and are all named after
this goddess, okay, cool. Yeah, so then we
have some of these cheap projects, which
is… this public square… a bridge, oh
yeah, there's this tiny bridge, a
footbridge, even has an image, that's what
it looks like, and it costs, what was it,
1.6 million euros already. Wow. And then
we have another public square. Yeah. So,
"cheap public infrastructure projects".
And also probably "infrastructure" in
quotes, because we're really just saying
it has a location and "Country: Germany".
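Putting the pieces of this turn together, the final "duck typing" query might look something like the sketch below. It is a reconstruction under assumptions: `P2130` is taken to be the "cost" property, `P625` "coordinate location", `P17` "country", `Q4916` the euro and `Q183` Germany, and the unit is read from the full quantity value via `wikibase:quantityAmount` and `wikibase:quantityUnit`. The function name is hypothetical.

```python
def cheapest_projects_query(country_qid: str, unit_qid: str, limit: int = 10) -> str:
    """Build a SPARQL query for the cheapest located things in one country."""
    return (
        "SELECT ?thing ?thingLabel ?amount WHERE {\n"
        "  ?thing p:P2130/psv:P2130 ?cost .  # full value node of the cost\n"
        "  ?cost wikibase:quantityAmount ?amount ;\n"
        f"        wikibase:quantityUnit wd:{unit_qid} .  # only this currency\n"
        "  ?thing wdt:P625 ?coordinates ;  # duck typing: it has a location\n"
        f"         wdt:P17 wd:{country_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }\n'
        "}\n"
        "ORDER BY ASC(?amount)\n"
        f"LIMIT {limit}"
    )


# Q183 is assumed to be Germany and Q4916 the euro.
print(cheapest_projects_query("Q183", "Q4916"))
```

Filtering on the unit avoids comparing euro amounts with yen amounts, which was the concern raised earlier about unconverted currencies. The German label fallback is added because one of the results above had no English label.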
And, yeah, I can send this query around
afterwards. And this didn't work, this
didn't work. Okay, any other ideas? That's
bad news. We could try to continue with
some of these. Was there something? Oh,
from the Camera Angel!
AM: I have a question! I saw that with
Wikidata Query Service we can draw these
nice trees and have images in them, and
one example that came to my mind was all
the programming patterns – programming
design patterns, but grouped by their
kind, like they're structural patterns,
convenience patterns, and so on, and like,
can we draw a graph and maybe put an image
in them.
L: We can try that. So let's see how
that's modeled, I don't know, with the
visitor pattern for example. That's a
design pattern; what kind of statements
does it have? It's a subclass of behavioral
pattern, is this a programming thing or
already…? Oh yeah, it's a soft… okay, it's a
software design pattern. So we should say ...
We're going to have a pattern with
its label and a pattern kind with its
label, and the pattern is going to be a
subclass of the pattern kind, which is
going to be some subclass of – what was
it? Of software design pattern – and I'm
just going to copy this ID so it's the
right one – label service, and say, I
would like to see this by default in the
graph view. Here we go. Well that looks
not as bad as I thought. We have a lot of
structural patterns, behavioral patterns,
one architectural pattern, a few
creational patterns, and one fundamental
pattern. Yeah. And… yeah what we could
also do is, if we do this, then we should
also see connections of all of these.
Now we have the tree rooted at
software design patterns, we have monads,
and fundamental pattern is a kind of
software design pattern. Structural
pattern… and it's all linked there and
this is working… very well, I… That's much
better than I expected. I expected a huge
mess, because it sometimes gets
difficult to determine when you should use
"instance of" and when you should use
"subclass of". Like, if it's software or
patterns like this, I expected we would
have to account for both of these, but
this looks very good to me. I think we
don't need to do anything with this query.
Yeah, so that is, uhm, software design
patterns by a pattern tree.
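The pattern tree query could be sketched along these lines. It assumes `P279` is "subclass of" and uses the `#defaultView:Graph` comment to select the graph view by default. The root ID in the example call is only assumed to be "software design pattern"; as in the talk, it is safer to copy it from the item page. The function name is made up.

```python
def pattern_tree_query(root_qid: str) -> str:
    """Build a SPARQL graph query linking each pattern to its pattern kind."""
    return (
        "#defaultView:Graph\n"
        "SELECT ?pattern ?patternLabel ?kind ?kindLabel WHERE {\n"
        "  ?pattern wdt:P279 ?kind .  # pattern is a subclass of its kind\n"
        f"  ?kind wdt:P279* wd:{root_qid} .  # kind is the root or below it\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


# "Q181156" is assumed to be "software design pattern"; verify before use.
print(pattern_tree_query("Q181156"))
```

Because of the star, `?kind` can also be the root itself, so the kinds appear as nodes linked to the root, which would give the rooted tree described above.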
Okay. Any other ideas? Or I can try to
keep optimizing this one
AM: Which cities have applied to be host
city of the Eurovision Song Contest the
most times but were never successful?
L: Oh!
Laughter from Audience
L: That's a very good question. I don't
know if we have– do you know who applied
for this year or for some year? But I
could check if that's modeled
anywhere. Uhm, I need some example cities,
so… let's check ESC 2018 if it has
information on where it took place, which
one won the bid, but also who was
nominated or something, or who applied… We
have "presenters", we have "followed by",
"start time", "end time", "participants",
we have the winner, do we have a location
at all? Oh yeah, there it is. Okay, we
have a country, and a location, but I'm
not seeing any other countries here, and I
assume that information is not going to be
on the country item. It's possible that we
have some separate item for "Eurovision
2018 Bid" or… Well wait, it would have to
be "which city", because the country is
determined by the winner isn't it? So the
city, but I suspect we don't have that
information. We have a list of host
cities, but that's just… a Wikipedia list
article.
Interference noise
Do we have to switch to the other mic? Oh
no, that sounds great! Okay. Yeah, so we
don't have any of the structured
information here. It's just linking all of
these Wikipedia articles together, and
then here is the actual list with the
different venues. But I don't think we
have that information in Wikidata at the
moment. We could add it, you'd have to
figure out the data model, but it would
probably be relevant enough, I think.
I wonder if we have that for the Olympic
Games. So, Olympics 2020, do we have the
process of who applied to host those? Uhm.
We have a location. We have parts. Let's
check. Perhaps English Wikipedia has a
separate article about the selection
process for the 2020 Summer Olympics.
Doesn't look like it. "Host city
selection". No I don't see a main… oh no,
there! "Bids for the 2020 Summer
Olympics", that's the Wikipedia article.
Does that have any useful information on
Wikidata? "Bids for Olympic Games"... no.
Damn it. So you can see when these bids
all happened, but we don't have the
bidding countries and cities, apparently, on
Wikidata, at least not as far as I can
see. Bids for the… 2012 for example…
No, sadly, we don't have that information
yet. Did this one run, by the way? No.
Any other questions?
Herald: Our translation angels had a question.
H: They want to know if you can give them the
countries with the most colorful flags.
L: Yes! That [interference noise] should
be possible. So "select country", and the
"count of the colors as counts"
[interference noise] the country has,
oops, has a flag, not the "flag image", a
flag, and the flag has color. And it
should be "color" and not "colors", and
then we group by country so this is a bit
like a [noise] grouping and aggregate
functions
[Interference noise]
Do we need to use the other microphone?
[Noise] Okay [Noise] But then you can't
really walk around anymore.
H: Hello hello? Hello hello? Do we need to
do anything else there?
L: Okay so now… This could be really fun!
Yeah, so we are searching for countries
with flags, and hope that the flags have
colors, and we're counting them. And what I
didn't do is… what's this? Do I want to
know? Okay, okay, at least it's not
the straight pride flag, I guess. Does
this have 14 colors? No, what was it? No,
eight, I guess, one, two, three, four,
five, six, seven, eight, yeah. That's
accurate. Yeah, I didn't filter for
countries here, the thing is, country is
really a stupidly complicated term, so
what I did was… queries… I have a
pre-prepared query for the UN member states
somewhere, which I just copy all the time.
And this is now going to be called a
state, and then we only get state flags,
uhm, and there's exactly– oh, right. I
need to group by "state" and "state label"
and copy these up here as well, and then
it will hopefully work, and we will find
out that… the United Kingdom has… 12?
I suspect that's because it has four flags,
which all have the same rank, or a no– no
it should be five, right? United Kingdom
and Northern Ireland, Scotland, Wales and
England. Let's search for "flag". Flag is
the flag of the United Kingdom, no? Why
does it have 12 colors? It has blue, red,
white… what? I see. But that still doesn't
explain the 12. Let's count only the
distinct colors, "distinct", there's
auto-completion, thank God, perhaps that helps.
Though I don't know why it would have…
oh, it would have had the state multiple
times, because it's a sovereign state
multiple times, probably. Let's check. Yeah,
the United Kingdom is, it's a Commonwealth
realm, and an island nation, and a
sovereign state, and that's probably why
we got it multiple times, and, yeah that
looks more reasonable. South Africa,
Ecuador, South Sudan, and what we can also
do is, add the, of the flag, the image and
call that I, because I can't be bothered
to type the whole thing, and add that
here, and also add it to the "group by",
because otherwise it's not the right
aggregate and I can't be bothered to write
"sample" with one hand, and then we can
hopefully also see it. Oh, we get two
images of the flag of South Africa. That
also looks like one of them should be
"preferred rank", but anyways, we can
switch to image grid, and then we can see
all these colorful flags. One, two, three,
four, five, six, yeah. That's six. And
this is more than six, so I guess, I would
say that should actually be two separate
items, for this old flag and– no, this old
flag and the new flag, but… This is six…
is that only six colors? I'll believe it.
This is six colors, six, and then we have
five colors, yeah. So here are the, let's
just add a comment there, and I will tweet
this out later as well, "colorful state
flags". Yeah. And also we can use
the image grid as the default view.
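For reference, a sketch of how the finished flag query might read. The pre-prepared UN member states part is not shown in the transcript, so "member of the United Nations" (`P463` with `Q1065`) is used here as an assumed stand-in. It is further assumed that `P163` is "flag", `P462` is "color" and `P18` is the image of the flag item. The constant name is invented.

```python
COLORFUL_FLAGS_QUERY = """#defaultView:ImageGrid
# colorful state flags
SELECT ?state ?stateLabel ?image (COUNT(DISTINCT ?color) AS ?colors) WHERE {
  ?state wdt:P463 wd:Q1065 ;  # assumed stand-in for "UN member state"
         wdt:P163 ?flag .     # the flag item, not the flag image
  ?flag wdt:P462 ?color ;
        wdt:P18 ?image .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?state ?stateLabel ?image
ORDER BY DESC(?colors)
"""

print(COLORFUL_FLAGS_QUERY)
```

`COUNT(DISTINCT ?color)` is what keeps a color from being counted once per duplicate row, which is the effect seen with the United Kingdom above. Every selected variable that is not aggregated has to appear in `GROUP BY`, hence the image there as well.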
We probably have time for one more question,
if it's a short one. Though I won't be
able to type very fast. Yes, let's
hope this works. Otherwise I can repeat it
for the stream if I hear you.
AM: So does it work? Yep seems so. I don't
know if it's possible, but the smallest
images that are on Wikipedia? So, by image
size?
L: That would not be possible with the
Query Service I think. But I think on
Commons you can search… can you search?
Whoops, I don't have that search shortcut
set up here. Can you search by image size?
I think that might be possible. Advanced
search, file type, sorting order… No.
You could probably sort by file size in an
SQL query. Which is not a thing from the
Wikidata Query Service, but it's possible
with something else, and as it happens I am
going to have another talk later, where I
talk about, among other things, how you
can write SQL queries against the
Wikipedia databases, and then we might be
able to find a solution for that. And
that's, I think, at 6 p.m. today over in the
Esszimmer, or you come over to me after
the talk and then I can try to figure it out there.
H: A last emergency idea that we have to
try out?
H: I'm muted. Do you have ano– one more
idea? A small idea maybe we could do, but
other than that I think we have filled
the time quite well.
L: Yeah I think we're done. But if you
have any other ideas, you can always
contact me on Twitter @wikidatafacts, or
on Mastodon as well, and then I will see
what I can do for you. Yeah. Thanks.
H: Thank you very much, Lucas, that was a
great introduction to Wikidata querying!
Music
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!