WEBVTT
00:00:00.000 --> 00:00:09.465
intro music
00:00:14.815 --> 00:00:18.081
Herald: Wikidata for (Data) Journalists
by Elisabeth Giesemann.
00:00:19.501 --> 00:00:25.520
Elisabeth Giesemann: So our agenda for
today is that we will have a look on key
00:00:25.520 --> 00:00:32.697
points of data journalism. We will quickly
explain what Wikidata is, what tools you
00:00:32.697 --> 00:00:39.489
can use inside of Wikidata for data
visualization, what other third party
00:00:39.489 --> 00:00:46.477
tools are there for your research. Then we
have a look at critical research done with
00:00:46.477 --> 00:00:52.589
Wikidata. And finally, we have a critical
look on the data of Wikidata itself.
00:00:57.259 --> 00:01:02.979
Key points of data journalism are that you
want to interview a dataset, so you want
00:01:02.979 --> 00:01:08.746
to find connections, correlations and
causalities behind the data. Also, you
00:01:08.746 --> 00:01:16.786
want to visualize the data in a compelling
way and you want to write your own story.
00:01:16.786 --> 00:01:23.987
You want to find a new spin
and a new look on- at the facts
00:01:23.987 --> 00:01:26.482
and all of these things
you can do with Wikidata.
00:01:31.752 --> 00:01:35.442
At Wikimedia Deutschland, we want
to support evidence-based reporting
00:01:35.442 --> 00:01:40.390
that's why we want to support you
in using Wikidata.
00:01:40.390 --> 00:01:49.623
Also data journalism helps you to tailor
your story to the users or your readers.
00:01:49.623 --> 00:01:55.970
Data journalism helps you to create visual
storytelling instead of walls of text.
00:01:55.970 --> 00:02:03.994
And this, again, helps you to convey facts
faster and way more easy
00:02:03.994 --> 00:02:06.292
and that makes your story
way more inclusive.
00:02:10.553 --> 00:02:13.602
So how do you get to a story
with Wikidata?
00:02:13.602 --> 00:02:19.359
You want to find and recognize patterns
in a dataset, you can search for geographical
00:02:19.359 --> 00:02:25.644
data, you can search for similarities and
differences in the data, and you can also
00:02:25.644 --> 00:02:31.959
search for missing data, because that also
exists in Wikidata. You can visualize your
00:02:31.959 --> 00:02:37.731
findings with the tools that you find in
the Wikidata Query Service. And what's
00:02:37.731 --> 00:02:43.210
most important is you can connect to the
Wikidata community and find people who are
00:02:43.210 --> 00:02:48.592
working on a similar subject or have a
similar research- research question to the
00:02:48.592 --> 00:03:00.320
one that you have. So I included this
visualization to show you that data is
00:03:00.320 --> 00:03:08.640
only the beginning of your story and the
path that you will take. We want you to
00:03:08.640 --> 00:03:17.120
use the data in Wikidata for- to create a
compelling story and therefore contribute
00:03:17.680 --> 00:03:29.787
value and your idea about what's in the
data. Because data is a lot, but it's not
00:03:29.787 --> 00:03:34.960
everything, as we've seen in the last
month, many people aren't convinced by
00:03:34.960 --> 00:03:43.440
facts. Also, there is a lack of time and
there is a lack of data- data literacy in
00:03:43.440 --> 00:03:49.200
our society. It's not always easy to
understand the complexity of historical
00:03:49.200 --> 00:03:55.280
events and developments, to understand the
complexity of medical data or demographic
00:03:55.280 --> 00:04:03.040
changes. So it is important to have a
storytelling aspect to your data, have
00:04:03.040 --> 00:04:08.000
good visualizations and an easy to
understand approach to convey the
00:04:08.000 --> 00:04:14.320
significance of your data and your story.
And finally, it is important to remain
00:04:14.320 --> 00:04:27.758
transparent and clear about the use and
analysis of the data. So what is Wikidata?
00:04:27.758 --> 00:04:33.589
Wikidata is a free linked database that
can be read and edited by both humans and
00:04:33.589 --> 00:04:39.518
machines, so it is a database of linked
open data. It- that means that the data
00:04:39.518 --> 00:04:46.247
doesn't just sit there in tables. It can
be connected and combined with other data,
00:04:46.247 --> 00:04:56.269
found on Wikidata. As such, it is a
realization of the semantic web as dreamt
00:04:56.269 --> 00:05:04.884
by Tim Berners-Lee and also Wikidata won a
prize for its realization of the semantic
00:05:04.884 --> 00:05:12.864
web. We just celebrated Wikidata- data's
8th birthday. It currently holds 90
00:05:12.864 --> 00:05:20.985
million items and has 44,000 active users
and contributors, which makes it the most
00:05:20.985 --> 00:05:31.692
edited Wikimedia project. It was initially
used to or thought of to support the
00:05:31.692 --> 00:05:39.070
projects of the other projects of the
Wikimedia ecosystem and seen as a central
00:05:39.070 --> 00:05:46.162
storage for the structured data of the
sister projects like Wikivoyage,
00:05:46.162 --> 00:05:57.767
Wikisource and the most famous Wikimedia
project, Wikipedia. But it also has
00:05:57.767 --> 00:06:04.509
another function, which means- which is to
provide free and open data to the
00:06:04.509 --> 00:06:12.841
Internet, and that became really huge. As
already said, we now have more than 80- 90
00:06:12.841 --> 00:06:18.921
million data items on Wikidata. A
colleague of mine created this map and you
00:06:18.921 --> 00:06:28.312
can see here the geolocation data that is
in Wikidata and we are very proud that
00:06:28.312 --> 00:06:33.901
it's distributed all over the world but
it's also- we also take it with a grain of
00:06:33.901 --> 00:06:40.960
salt, because as you can see, it's very
bright in Europe and on the east and west
00:06:40.960 --> 00:06:51.170
coasts of the US, but there are very dark
spots where we can't record the knowledge
00:06:51.170 --> 00:06:55.632
in the same way as we do in our Western
societies and that brings us to the
00:06:55.632 --> 00:07:02.314
question of what is knowledge equity and
how can we actually best serve everybody
00:07:02.314 --> 00:07:15.600
in our global society? So how does it
work? Wikidata items, which are real
00:07:15.600 --> 00:07:22.000
things or concepts in the real world, like
Berlin, Barack Obama, helium, and these
00:07:22.000 --> 00:07:36.058
items are identified with an ID, the QID.
So Q76 or Q... I don't, I can't read the
00:07:36.058 --> 00:07:43.296
number now, so these items have labels,
descriptions, aliases and sitelinks.
00:07:43.296 --> 00:07:49.840
Labels, that means it's described in all
of the languages that Wikidata holds
00:07:49.840 --> 00:07:59.246
currently, those are around 300.
Descriptions are forms to describe what
00:07:59.246 --> 00:08:10.000
the item holds and aliases, sometimes one
item has several names, etc, etc. An item
00:08:10.000 --> 00:08:16.800
also has properties, those are used to
label the data, like a person is born
00:08:16.800 --> 00:08:22.640
somewhere, its date of birth or death or
the location of a specific building.
00:08:24.720 --> 00:08:32.240
Statements hold information in
properties, so P47 shares the border with
00:08:32.240 --> 00:08:42.320
another, like, country or the population.
Statements also have qualifiers to expand
00:08:42.320 --> 00:08:48.320
the information and then also they have
references which is very important because
00:08:50.080 --> 00:08:59.697
for scientific research, you want to have
those references. So here we see again our
00:08:59.697 --> 00:09:22.080
item, Berlin, Q64. The property is the
population of 3.7 million. So what's new
00:09:22.080 --> 00:09:29.200
about research with Wikidata is that you
can ask your own questions. Before, you
00:09:29.200 --> 00:09:34.480
would go to a library and some- the
librarians - librarians are awesome, but
00:09:34.480 --> 00:09:41.120
they would give you books with specific
facts in them and you would consume them
00:09:41.120 --> 00:09:48.240
and try to use them for your research. At
Wikidata you can ask very specific
00:09:48.240 --> 00:09:56.080
questions that nobody else came up with
before. So for your research, you want to
00:09:56.080 --> 00:10:01.440
do your own Wikidata queries, that's what
we have the Wikidata Query Service for.
00:10:03.120 --> 00:10:08.320
The good news is that you don't have to
learn Python or R or become a data
00:10:08.320 --> 00:10:17.280
scientist, but you want to learn a bit of
SPARQL. We included a few resources here
00:10:17.280 --> 00:10:22.720
in this presentation and there's also
going to be a talk given by my colleague
00:10:22.720 --> 00:10:33.360
Lucas on the 29th on how to query Wikidata
with SPARQL. We also have a guided tour on
00:10:33.360 --> 00:10:47.217
Wikidata on our website which I can
recommend. OK, so, um, as said, once you
00:10:47.217 --> 00:10:56.150
queried your data, you can visualize your
results for more compelling storytelling
00:10:56.150 --> 00:11:00.090
and there are several ways of doing this
and I'm going to show you some of this
00:11:00.090 --> 00:11:09.920
just to give you an idea. You could, for
instance, ask the query service to show
00:11:09.920 --> 00:11:17.760
you airports that are named after a person
and color code them according to their
00:11:17.760 --> 00:11:32.227
gender. Gender of the person, not the
airport, obviously. You can ask the query
00:11:32.227 --> 00:11:45.872
service, show me everything connected to
the item Berlin. You can ask it to show
00:11:45.872 --> 00:11:52.218
you the population of the countries that
are bordering Germany and how it
00:11:52.218 --> 00:12:03.187
developed. You can also ask the query
service to show you the most common cause
00:12:03.187 --> 00:12:17.360
of death among noble people. Or here it
shows you an- an historical overview of
00:12:17.360 --> 00:12:42.511
space probes. Or all of the children and
grandchildren of Genghis Khan. So we had a
00:12:42.511 --> 00:12:48.220
look on the visualizations inside of
Wikidata's Query Service, but there are
00:12:48.220 --> 00:12:55.381
also tools that use Wikidata's data for
their own visualizations. And I'm going to
00:12:55.381 --> 00:13:05.280
show you some of them now. So here is
Histropedia, which makes time beams of
00:13:05.280 --> 00:13:15.563
historical events using data from
Wikidata. This is Inventaire. Basically,
00:13:15.563 --> 00:13:24.132
it lets you create your own private
library and then uses the data from
00:13:24.132 --> 00:13:35.280
Wikidata to describe the publications.
Here is "Ask me anything". That's done by
00:13:35.280 --> 00:13:43.200
different researchers in Europe, and it
lets you pose questions in natural
00:13:43.200 --> 00:13:52.560
language to Wikidata so you don't have to
use the query service. That's a way that
00:13:53.200 --> 00:14:01.840
to use Wikidata that's also used by a lot
of voice assistants like Siri and Alexa.
00:14:04.800 --> 00:14:10.640
And here you have Scholia, which is
basically a platform for scientific
00:14:10.640 --> 00:14:18.960
publications that are published under open
access and collected, and it can answer
00:14:18.960 --> 00:14:27.840
your questions like who published what
paper, with whom, who and when or who
00:14:27.840 --> 00:14:37.489
wrote the first paper on COVID, when was
it published, etc. And here we have "Sum
00:14:37.489 --> 00:14:44.563
of All Paintings". Basically, it's a
database that creates all of the paintings
00:14:44.563 --> 00:14:50.884
in the world and lists their metadata so
you can combine it in your own specific
00:14:50.884 --> 00:15:06.117
way. So I showed you a couple of examples,
what you could do, and I want to hint at
00:15:06.117 --> 00:15:15.273
other researchers who did great stuff with
Wikidata and used it for very cool
00:15:15.273 --> 00:15:32.009
storytelling. If my slides work, OK, here
we go. So, um, "Women's representation and
00:15:32.009 --> 00:15:37.487
voice in media coverage of the coronavirus
crisis", that's the- that's a study done
00:15:37.487 --> 00:15:45.504
by a researcher called Laura Jones
regarding the representation of female
00:15:45.504 --> 00:15:53.616
experts within the coverage of
coronavirus. It uses evaluations of
00:15:53.616 --> 00:16:03.600
Wikipedia and Wikidata to show- to show
how much representation was there, of
00:16:03.600 --> 00:16:21.745
female experts. And, as we see, it's not a
lot. Finally, there is another great
00:16:21.745 --> 00:16:29.672
example I want to tell you about, it's a
project called Enslaved.org. It's a linked
00:16:29.672 --> 00:16:37.652
open data platform based on Wikibase,
which is the software behind Wikidata and
00:16:37.652 --> 00:16:45.970
it basically shows or it collects and
connects data related to the transatlantic
00:16:45.970 --> 00:16:53.059
slave trade. So, people who suffered under
the slave trade and the records that were
00:16:53.059 --> 00:17:03.122
done by the people active in this slave
trade, those data is collected. It has
00:17:03.122 --> 00:17:12.552
been collected in several databases and
Enslaved built one large database to
00:17:12.552 --> 00:17:21.946
connect them and rebuild the stories,
which I think is a really great idea to or
00:17:21.946 --> 00:17:30.133
really great way to humanize people who
have been dehumanized with data. Like you
00:17:30.133 --> 00:17:40.560
can see here, they collect- they collect
data from newspapers and from the
00:17:40.560 --> 00:17:56.123
slaveholders to recount a story of
individuals. So finally, I also want to
00:17:56.123 --> 00:18:02.720
talk to you about one thing in Wikidata
that is always on our minds, which is that
00:18:03.600 --> 00:18:09.680
Wikidata is not perfect. I highly
recommend the talk by Os Keyes
00:18:09.680 --> 00:18:15.920
"Questioning Wikidata" in which it is
explained that all classification systems
00:18:15.920 --> 00:18:22.640
are inherently dangerous and Wikidata is a
large encyclopedic wiki classification
00:18:22.640 --> 00:18:30.720
system which makes choices, ethical and
political choices, about what is notable,
00:18:31.280 --> 00:18:43.120
about how to categorize information. And
these choices, they reduce complexity and
00:18:43.120 --> 00:18:54.080
reduce also specific forms of- of history,
like oral history. This reduction has
00:18:54.080 --> 00:19:03.440
consequences. As you know, Wikidata is
used by many programs, apps, voice
00:19:03.440 --> 00:19:17.084
assistants and what- what and how we store
information in Wikidata really matters. So
00:19:17.084 --> 00:19:27.280
we ask ourselves, what is encyclopedic
knowledge? And how can we organize it in a
00:19:27.280 --> 00:19:34.134
more inclusive way? Encyclopedic knowledge
is a Western concept, and we can and must
00:19:34.134 --> 00:19:45.896
do better than just use our own Western
view to organize the world. But then also
00:19:45.896 --> 00:19:52.240
the wiki principle applies, we have a huge
community behind Wikidata that helps us to
00:19:52.240 --> 00:19:59.760
make these decisions, and you can also
become a part of this by researching
00:19:59.760 --> 00:20:11.646
Wikidata, using it for your work and also
contributing your research. So once again,
00:20:11.646 --> 00:20:17.927
I want to tell you, you can use Wikidata
as a tool for your storytelling. Wikidata
00:20:17.927 --> 00:20:24.162
can help you find connections between
data. Wikidata can help you find- can help
00:20:24.162 --> 00:20:30.406
you build visualization in its query
service. You can ask questions about
00:20:30.406 --> 00:20:38.080
historical data correlations more
critically than you could- than you could
00:20:38.080 --> 00:20:45.360
before. And- but there are also downsides
to- downsides to Wikidata because it is an
00:20:45.360 --> 00:20:55.256
encyclopedic way of organizing Western
knowledge. So this was only a start. I'm
00:20:55.256 --> 00:21:02.739
looking forward to our Q&A session now and
if you have further questions, concerns or
00:21:02.739 --> 00:21:08.021
have ideas, you can contact me and my
colleagues and you can also contact me
00:21:08.021 --> 00:21:18.572
individually. Thank you.
00:21:18.572 --> 00:21:23.520
Herald: Hello and welcome to Elisabeth.
Thank you very much for your interesting
00:21:23.520 --> 00:21:29.520
talk. That was a very great introduction.
Elisabeth: Hi. Yeah, thanks for having me.
00:21:30.320 --> 00:21:36.240
I'm happy that I was able to talk a bit
about Wikidata and how you could do
00:21:36.240 --> 00:21:43.040
storytelling with it. I wanted to add
that, obviously, you can ask me questions
00:21:43.040 --> 00:21:50.640
now, but also I want to hint at the great
introduction of Wikidata that one of my
00:21:50.640 --> 00:21:57.120
colleagues gave yesterday, two of my
colleagues, which is already online, and
00:21:57.120 --> 00:22:03.040
tomorrow there will be a query service
workshop where you can learn a bit more
00:22:03.040 --> 00:22:09.040
in-depth how to query Wikidata.
Herald: Yeah, that's a very good hint.
00:22:09.040 --> 00:22:13.280
There's actually there's two questions in
the chat right now. The first one is, are
00:22:13.280 --> 00:22:17.840
your slides going to be published because
people are interested in your links to the
00:22:17.840 --> 00:22:22.320
tutorials, obviously.
Elisabeth: Yes, that was, uh, I asked
00:22:22.320 --> 00:22:29.840
before, I think the talk will be published
and the slides. Is there a Wikipaka board
00:22:29.840 --> 00:22:36.320
where I can put it? Otherwise, I can also
put a link on our Twitter account,
00:22:36.320 --> 00:22:43.600
Wikimedia Deutschland. And yeah...
Herald: I think Twitter for now would
00:22:43.600 --> 00:22:48.160
probably be the best idea, I actually have
to check on the Wikipaka board, but we
00:22:48.160 --> 00:22:50.400
will let you know where you can find
everything.
00:22:50.400 --> 00:23:01.880
Elisabeth: I put it on the Wikimedia
Deutschland Twitter. It's @wmde I think
00:23:01.880 --> 00:23:05.280
Herald: We will also retweet it
obviously. You will find it, I promise.
00:23:05.280 --> 00:23:08.720
Elisabeth: OK.
Herald: There's another question. What
00:23:08.720 --> 00:23:12.720
resources would you recommend for self-
studying the writing of queries for
00:23:12.720 --> 00:23:19.200
query.wikidata.org?
Elisabeth: Mhm. Um, I put some links in
00:23:19.200 --> 00:23:27.600
the- in the slides. There is... yeah, we
have, like, a few tutorials on Wikidata.
00:23:27.600 --> 00:23:35.040
There was also a couple of months ago, a
very nice and very easy tutorial published
00:23:35.040 --> 00:23:41.600
by Wikimedia Israel. And I- so we didn't
do it, but I can recommend it, it's a very
00:23:42.640 --> 00:23:47.730
low-key introduction to your first
queries.
00:23:47.730 --> 00:23:54.400
Herald: OK. We will also publish that
somehow. I have a question for you as
00:23:54.400 --> 00:23:58.800
well. You mentioned that Wikidata is like
a great way for meeting other people that
00:23:58.800 --> 00:24:05.120
are working on similar topics. So is there
some kind of like greater community of
00:24:05.120 --> 00:24:13.120
journalists using Wikidata?
Elisabeth: So far, the community is mostly
00:24:13.120 --> 00:24:19.280
research based. That's also why we wanted
to reach out here. So I would recommend
00:24:19.280 --> 00:24:26.480
getting in touch with the community on
there regarding the research topics that
00:24:26.480 --> 00:24:35.360
you have. And you can also get in touch
with us and we connect you. I have a noise
00:24:35.360 --> 00:24:41.440
in my ear, but I hope it's only me.
Herald: Well, I don't have it, so it might
00:24:42.400 --> 00:24:47.200
just be you, but I feel like there might
be also an echo on the stream, that's what
00:24:47.200 --> 00:24:51.280
people on the chat are saying.
Elisabeth: Oh, OK.
00:24:51.280 --> 00:24:56.160
Herald: So I don't have any other questions
in the chat and since there seems to be an
00:24:56.160 --> 00:25:02.240
echo on the stream, I don't want to annoy
people any further. So I would suggest for
00:25:02.240 --> 00:25:07.760
everyone who has further questions to you
that you can meet in our BigBlueButton
00:25:07.760 --> 00:25:15.840
meetup room that I will be posting in the
chat right now and we will continue our
00:25:15.840 --> 00:25:22.560
program here at 2:20 with another talk
about Flutter by "The one with the braid",
00:25:22.560 --> 00:25:29.200
so I'm saying bye for now.
Elisabeth: Thanks, bye.
00:25:29.200 --> 00:25:30.251
Herald: Bye.
00:25:30.251 --> 00:25:33.601
outro music
00:25:33.601 --> 00:25:40.000
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!