1
00:00:00,000 --> 00:00:09,465
intro music
2
00:00:14,815 --> 00:00:18,081
Herald: Wikidata for (Data) Journalists
by Elisabeth Giesemann.
3
00:00:19,501 --> 00:00:25,520
Elisabeth Giesemann: So our agenda for
today is that we will have a look at key
4
00:00:25,520 --> 00:00:32,697
points of data journalism. We will quickly
explain what Wikidata is, what tools you
5
00:00:32,697 --> 00:00:39,489
can use inside of Wikidata for data
visualization, what other third party
6
00:00:39,489 --> 00:00:46,477
tools there are for your research. Then we
have a look at critical research done with
7
00:00:46,477 --> 00:00:52,589
Wikidata. And finally, we have a critical
look at the data of Wikidata itself.
8
00:00:57,259 --> 00:01:02,979
Key points of data journalism are that you
want to interview a dataset, so you want
9
00:01:02,979 --> 00:01:08,746
to find connections, correlations and
causalities behind the data. Also, you
10
00:01:08,746 --> 00:01:16,786
want to visualize the data in a compelling
way and you want to write your own story.
11
00:01:16,786 --> 00:01:23,987
You want to find a new spin
and a new look at the facts
12
00:01:23,987 --> 00:01:26,482
and all of these things
you can do with Wikidata.
13
00:01:31,752 --> 00:01:35,442
At Wikimedia Deutschland, we want
to support evidence-based reporting;
14
00:01:35,442 --> 00:01:40,390
that's why we want to support you
in using Wikidata.
15
00:01:40,390 --> 00:01:49,623
Also, data journalism helps you to tailor
your story to the users or your readers.
16
00:01:49,623 --> 00:01:55,970
Data journalism helps you to create visual
storytelling instead of walls of text.
17
00:01:55,970 --> 00:02:03,994
And this, again, helps you to convey facts
faster and much more easily
18
00:02:03,994 --> 00:02:06,292
and that makes your story
way more inclusive.
19
00:02:10,553 --> 00:02:13,602
So how do you get to a story
with Wikidata?
20
00:02:13,602 --> 00:02:19,359
You want to find and recognize patterns
in a dataset. You can search for geographical
21
00:02:19,359 --> 00:02:25,644
data, you can search for similarities and
differences in the data, and you can also
22
00:02:25,644 --> 00:02:31,959
search for missing data, because that also
exists in Wikidata. You can visualize your
23
00:02:31,959 --> 00:02:37,731
findings with the tools that you find in
the Wikidata Query Service. And what's
24
00:02:37,731 --> 00:02:43,210
most important is you can connect to the
Wikidata community and find people who are
25
00:02:43,210 --> 00:02:48,592
working on a similar subject or have a
similar research question to the
26
00:02:48,592 --> 00:03:00,320
one that you have. So I included this
visualization to show you that data is
27
00:03:00,320 --> 00:03:08,640
only the beginning of your story and the
path that you will take. We want you to
28
00:03:08,640 --> 00:03:17,120
use the data in Wikidata to create a
compelling story and therefore contribute
29
00:03:17,680 --> 00:03:29,787
value and your idea about what's in the
data. Because data is a lot, but it's not
30
00:03:29,787 --> 00:03:34,960
everything. As we've seen in recent
months, many people aren't convinced by
31
00:03:34,960 --> 00:03:43,440
facts. Also, there is a lack of time and
there is a lack of data literacy in
32
00:03:43,440 --> 00:03:49,200
our society. It's not always easy to
understand the complexity of historical
33
00:03:49,200 --> 00:03:55,280
events and developments, to understand the
complexity of medical data or demographic
34
00:03:55,280 --> 00:04:03,040
changes. So it is important to have a
storytelling aspect to your data, have
35
00:04:03,040 --> 00:04:08,000
good visualizations and an
easy-to-understand approach to convey the
36
00:04:08,000 --> 00:04:14,320
significance of your data and your story.
And finally, it is important to remain
37
00:04:14,320 --> 00:04:27,758
transparent and clear about the use and
analysis of the data. So what is Wikidata?
38
00:04:27,758 --> 00:04:33,589
Wikidata is a free linked database that
can be read and edited by both humans and
39
00:04:33,589 --> 00:04:39,518
machines, so it is a database of linked
open data. That means that the data
40
00:04:39,518 --> 00:04:46,247
doesn't just sit there in tables. It can
be connected and combined with other data
41
00:04:46,247 --> 00:04:56,269
found on Wikidata. As such, it is a
realization of the semantic web as dreamt
42
00:04:56,269 --> 00:05:04,884
by Tim Berners-Lee and also Wikidata won a
prize for its realization of the semantic
43
00:05:04,884 --> 00:05:12,864
web. We just celebrated Wikidata's
8th birthday. It currently holds 90
44
00:05:12,864 --> 00:05:20,985
million items and has 44,000 active users
and contributors, which makes it the most
45
00:05:20,985 --> 00:05:31,692
edited Wikimedia project. It was initially
thought of as a way to support the
46
00:05:31,692 --> 00:05:39,070
other projects of the
Wikimedia ecosystem and seen as a central
47
00:05:39,070 --> 00:05:46,162
storage for the structured data of the
sister projects like Wikivoyage,
48
00:05:46,162 --> 00:05:57,767
Wikisource and the most famous Wikimedia
project, Wikipedia. But it also has
49
00:05:57,767 --> 00:06:04,509
another function, which is to
provide free and open data to the
50
00:06:04,509 --> 00:06:12,841
Internet, and that became really huge. As
already said, we now have more than 90
51
00:06:12,841 --> 00:06:18,921
million data items on Wikidata. A
colleague of mine created this map and you
52
00:06:18,921 --> 00:06:28,312
can see here the geolocation data that is
in Wikidata and we are very proud that
53
00:06:28,312 --> 00:06:33,901
it's distributed all over the world but
we also take it with a grain of
54
00:06:33,901 --> 00:06:40,960
salt, because as you can see, it's very
bright in Europe and on the east and west
55
00:06:40,960 --> 00:06:51,170
coasts of the US, but there are very dark
spots where we can't record the knowledge
56
00:06:51,170 --> 00:06:55,632
in the same way as we do in our Western
societies and that brings us to the
57
00:06:55,632 --> 00:07:02,314
question of what is knowledge equity and
how can we actually best serve everybody
58
00:07:02,314 --> 00:07:15,600
in our global society? So how does it
work? Wikidata has items, which are real
59
00:07:15,600 --> 00:07:22,000
things or concepts in the real world, like
Berlin, Barack Obama, helium, and these
60
00:07:22,000 --> 00:07:36,058
items are identified with an ID, the QID.
So Q76 or Q... I don't, I can't read the
61
00:07:36,058 --> 00:07:43,296
number now, so these items have labels,
descriptions, aliases and sitelinks.
62
00:07:43,296 --> 00:07:49,840
Labels, that means it's described in all
of the languages that Wikidata holds
63
00:07:49,840 --> 00:07:59,246
currently, those are around 300.
Descriptions are forms to describe what
64
00:07:59,246 --> 00:08:10,000
the item holds. Aliases: sometimes one
item has several names, etc. An item
65
00:08:10,000 --> 00:08:16,800
also has properties; those are used to
label the data, like a person being born
66
00:08:16,800 --> 00:08:22,640
somewhere, their date of birth or death or
the location of a specific building.
67
00:08:24,720 --> 00:08:32,240
Statements hold information in
properties, like P47, sharing a border with
68
00:08:32,240 --> 00:08:42,320
another country, or the population.
Statements also have qualifiers to expand
69
00:08:42,320 --> 00:08:48,320
the information and then also they have
references which is very important because
70
00:08:50,080 --> 00:08:59,697
for scientific research, you want to have
those references. So here we see again our
71
00:08:59,697 --> 00:09:22,080
item, Berlin, Q64. The property is the
population, with a value of 3.7 million. So what's new
72
00:09:22,080 --> 00:09:29,200
about research with Wikidata is that you
can ask your own questions. Before, you
73
00:09:29,200 --> 00:09:34,480
would go to a library and the
librarians - librarians are awesome, but
74
00:09:34,480 --> 00:09:41,120
they would give you books with specific
facts in them and you would consume them
75
00:09:41,120 --> 00:09:48,240
and try to use them for your research. At
Wikidata you can ask very specific
76
00:09:48,240 --> 00:09:56,080
questions that nobody else came up with
before. So for your research, you want to
77
00:09:56,080 --> 00:10:01,440
do your own Wikidata queries, that's what
we have the Wikidata Query Service for.
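A minimal sketch of what such a query could look like, using Python only to assemble the SPARQL text and the request URL. It asks for the countries bordering Germany and their populations, one of the examples shown later in the talk. P47 is the property named in the talk; Q183 (Germany), P1082 (population) and the function names are assumptions added for illustration and were not part of the slides.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def neighbours_population_query(country_qid: str = "Q183") -> str:
    """Build a SPARQL query listing the countries that border
    `country_qid` together with their population.

    Q183 (Germany) is an illustrative default, not taken from the talk.
    """
    return f"""
SELECT ?country ?countryLabel ?population WHERE {{
  wd:{country_qid} wdt:P47 ?country .   # P47: shares border with
  ?country wdt:P1082 ?population .      # P1082: population
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?population)
""".strip()


def request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    query = neighbours_population_query()
    print(query)
    print(request_url(query))
```

The printed query text can be pasted into the editor at query.wikidata.org, where the result can then be switched to a chart or map view. The sketch only builds strings and makes no network request.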
78
00:10:03,120 --> 00:10:08,320
The good news is that you don't have to
learn Python or R or become a data
79
00:10:08,320 --> 00:10:17,280
scientist, but you want to learn a bit of
SPARQL. We included a few resources here
80
00:10:17,280 --> 00:10:22,720
in this presentation and there's also
going to be a talk given by my colleague
81
00:10:22,720 --> 00:10:33,360
Lucas on the 29th on how to query Wikidata
with SPARQL. We also have a guided tour on
82
00:10:33,360 --> 00:10:47,217
Wikidata on our website which I can
recommend. OK, so, as I said, once you've
83
00:10:47,217 --> 00:10:56,150
queried your data, you can visualize your
results for more compelling storytelling
84
00:10:56,150 --> 00:11:00,090
and there are several ways of doing this
and I'm going to show you some of them
85
00:11:00,090 --> 00:11:09,920
just to give you an idea. You could, for
instance, ask the query service to show
86
00:11:09,920 --> 00:11:17,760
you airports that are named after a person
and color code them according to their
87
00:11:17,760 --> 00:11:32,227
gender. Gender of the person, not the
airport, obviously. You can ask the query
88
00:11:32,227 --> 00:11:45,872
service, show me everything connected to
the item Berlin. You can ask it to show
89
00:11:45,872 --> 00:11:52,218
you the population of the countries that
are bordering Germany and how it
90
00:11:52,218 --> 00:12:03,187
developed. You can also ask the query
service to show you the most common cause
91
00:12:03,187 --> 00:12:17,360
of death among noble people. Or here it
shows you a historical overview of
92
00:12:17,360 --> 00:12:42,511
space probes. Or all of the children and
grandchildren of Genghis Khan. So we had a
93
00:12:42,511 --> 00:12:48,220
look at the visualizations inside of
Wikidata's Query Service, but there are
94
00:12:48,220 --> 00:12:55,381
also tools that use Wikidata's data for
their own visualizations. And I'm going to
95
00:12:55,381 --> 00:13:05,280
show you some of them now. So here is
Histropedia, which makes timelines of
96
00:13:05,280 --> 00:13:15,563
historical events using data from
Wikidata. This is Inventaire. Basically,
97
00:13:15,563 --> 00:13:24,132
it lets you create your own private
library and then uses the data from
98
00:13:24,132 --> 00:13:35,280
Wikidata to describe the publications.
Here is "Ask me anything". That's done by
99
00:13:35,280 --> 00:13:43,200
different researchers in Europe, and it
lets you pose questions in natural
100
00:13:43,200 --> 00:13:52,560
language to Wikidata so you don't have to
use the query service. That's a way
101
00:13:53,200 --> 00:14:01,840
to use Wikidata that's also used by a lot
of voice assistants like Siri and Alexa.
102
00:14:04,800 --> 00:14:10,640
And here you have Scholia, which is
basically a platform for scientific
103
00:14:10,640 --> 00:14:18,960
publications that are published under open
access and collected, and it can answer
104
00:14:18,960 --> 00:14:27,840
your questions like who published what
paper, with whom and when, or who
105
00:14:27,840 --> 00:14:37,489
wrote the first paper on COVID, when was
it published, etc. And here we have "Sum
106
00:14:37,489 --> 00:14:44,563
of All Paintings". Basically, it's a
database that collects all of the paintings
107
00:14:44,563 --> 00:14:50,884
in the world and lists their metadata so
you can combine it in your own specific
108
00:14:50,884 --> 00:15:06,117
way. So I showed you a couple of examples of
what you could do, and I want to point to
109
00:15:06,117 --> 00:15:15,273
other researchers who did great stuff with
Wikidata and used it for very cool
110
00:15:15,273 --> 00:15:32,009
storytelling. If my slides work, OK, here
we go. So, um, "Women's representation and
111
00:15:32,009 --> 00:15:37,487
voice in media coverage of the coronavirus
crisis", that's a study done
112
00:15:37,487 --> 00:15:45,504
by a researcher called Laura Jones
regarding the representation of female
113
00:15:45,504 --> 00:15:53,616
experts within the coverage of
coronavirus. It uses evaluations of
114
00:15:53,616 --> 00:16:03,600
Wikipedia and Wikidata to show
how much representation there was of
115
00:16:03,600 --> 00:16:21,745
female experts. And, as we see, it's not a
lot. Finally, there is another great
116
00:16:21,745 --> 00:16:29,672
example I want to tell you about, it's a
project called Enslaved.org. It's a linked
117
00:16:29,672 --> 00:16:37,652
open data platform based on Wikibase,
which is the software behind Wikidata and
118
00:16:37,652 --> 00:16:45,970
it basically shows or it collects and
connects data related to the transatlantic
119
00:16:45,970 --> 00:16:53,059
slave trade. So, people who suffered under
the slave trade and the records that were
120
00:16:53,059 --> 00:17:03,122
made by the people active in this slave
trade: that data is collected. It has
121
00:17:03,122 --> 00:17:12,552
been collected in several databases and
Enslaved built one large database to
122
00:17:12,552 --> 00:17:21,946
connect them and rebuild the stories,
which I think is a really great idea or a
123
00:17:21,946 --> 00:17:30,133
really great way to humanize people who
have been dehumanized with data. Like you
124
00:17:30,133 --> 00:17:40,560
can see here, they collect
data from newspapers and from the
125
00:17:40,560 --> 00:17:56,123
slaveholders to recount a story of
individuals. So finally, I also want to
126
00:17:56,123 --> 00:18:02,720
talk to you about one thing in Wikidata
that is always on our minds, which is that
127
00:18:03,600 --> 00:18:09,680
Wikidata is not perfect. I highly
recommend the talk by Os Keyes
128
00:18:09,680 --> 00:18:15,920
"Questioning Wikidata" in which it is
explained that all classification systems
129
00:18:15,920 --> 00:18:22,640
are inherently dangerous and Wikidata is a
large encyclopedic wiki classification
130
00:18:22,640 --> 00:18:30,720
system which makes choices, ethical and
political choices, about what is notable,
131
00:18:31,280 --> 00:18:43,120
about how to categorize information. And
these choices, they reduce complexity and
132
00:18:43,120 --> 00:18:54,080
also reduce specific forms of history,
like oral history. This reduction has
133
00:18:54,080 --> 00:19:03,440
consequences. As you know, Wikidata is
used by many programs, apps, voice
134
00:19:03,440 --> 00:19:17,084
assistants, and what and how we store
information in Wikidata really matters. So
135
00:19:17,084 --> 00:19:27,280
we ask ourselves, what is encyclopedic
knowledge? And how can we organize it in a
136
00:19:27,280 --> 00:19:34,134
more inclusive way? Encyclopedic knowledge
is a Western concept, and we can and must
137
00:19:34,134 --> 00:19:45,896
do better than just use our own Western
view to organize the world. But then also
138
00:19:45,896 --> 00:19:52,240
the wiki principle applies, we have a huge
community behind Wikidata that helps us to
139
00:19:52,240 --> 00:19:59,760
make these decisions, and you can also
become a part of this by researching
140
00:19:59,760 --> 00:20:11,646
Wikidata, using it for your work and also
contributing your research. So once again,
141
00:20:11,646 --> 00:20:17,927
I want to tell you, you can use Wikidata
as a tool for your storytelling. Wikidata
142
00:20:17,927 --> 00:20:24,162
can help you find connections between
data. Wikidata can help
143
00:20:24,162 --> 00:20:30,406
you build visualizations in its query
service. You can ask questions about
144
00:20:30,406 --> 00:20:38,080
historical data correlations more
critically than you could
145
00:20:38,080 --> 00:20:45,360
before. But there are also downsides
to Wikidata because it is an
146
00:20:45,360 --> 00:20:55,256
encyclopedic way of organizing Western
knowledge. So this was only a start. I'm
147
00:20:55,256 --> 00:21:02,739
looking forward to our Q&A session now and
if you have further questions, concerns or
148
00:21:02,739 --> 00:21:08,021
have ideas, you can contact me and my
colleagues and you can also contact me
149
00:21:08,021 --> 00:21:18,572
individually. Thank you.
150
00:21:18,572 --> 00:21:23,520
Herald: Hello and welcome to Elisabeth.
Thank you very much for your interesting
151
00:21:23,520 --> 00:21:29,520
talk. That was a very great introduction.
Elisabeth: Hi. Yeah, thanks for having me.
152
00:21:30,320 --> 00:21:36,240
I'm happy that I was able to talk a bit
about Wikidata and how you could do
153
00:21:36,240 --> 00:21:43,040
storytelling with it. I wanted to add
that, obviously, you can ask me questions
154
00:21:43,040 --> 00:21:50,640
now, but also I want to hint at the great
introduction of Wikidata that one of my
155
00:21:50,640 --> 00:21:57,120
colleagues gave yesterday (actually two of my
colleagues), which is already online, and
156
00:21:57,120 --> 00:22:03,040
tomorrow there will be a query service
workshop where you can learn a bit more
157
00:22:03,040 --> 00:22:09,040
in-depth how to query Wikidata.
Herald: Yeah, that's a very good hint.
158
00:22:09,040 --> 00:22:13,280
There are actually two questions in
the chat right now. The first one is, are
159
00:22:13,280 --> 00:22:17,840
your slides going to be published because
people are interested in your links to the
160
00:22:17,840 --> 00:22:22,320
tutorials, obviously.
Elisabeth: Yes, I asked
161
00:22:22,320 --> 00:22:29,840
before. I think the talk will be published
and the slides. Is there a Wikipaka board
162
00:22:29,840 --> 00:22:36,320
where I can put it? Otherwise, I can also
put a link on our Twitter account,
163
00:22:36,320 --> 00:22:43,600
Wikimedia Deutschland. And yeah...
Herald: I think Twitter for now would
164
00:22:43,600 --> 00:22:48,160
probably be the best idea, I actually have
to check on the Wikipaka board, but we
165
00:22:48,160 --> 00:22:50,400
will let you know where you can find
everything.
166
00:22:50,400 --> 00:23:01,880
Elisabeth: I'll put it on the Wikimedia
Deutschland Twitter. It's @wmde, I think.
167
00:23:01,880 --> 00:23:05,280
Herald: We will also retweet it,
obviously. You will find it, I promise.
168
00:23:05,280 --> 00:23:08,720
Elisabeth: OK.
Herald: There's another question. What
169
00:23:08,720 --> 00:23:12,720
resources would you recommend for self-
studying the writing of queries for
170
00:23:12,720 --> 00:23:19,200
query.wikidata.org?
Elisabeth: Mhm. Um, I put some links in
171
00:23:19,200 --> 00:23:27,600
the- in the slides. There is... yeah, we
have, like, a few tutorials on Wikidata.
172
00:23:27,600 --> 00:23:35,040
There was also, a couple of months ago, a
very nice and very easy tutorial published
173
00:23:35,040 --> 00:23:41,600
by Wikimedia Israel. So we didn't
do it, but I can recommend it; it's a very
174
00:23:42,640 --> 00:23:47,730
low-key introduction to your first
queries.
175
00:23:47,730 --> 00:23:54,400
Herald: OK. We will also publish that
somehow. I have a question for you as
176
00:23:54,400 --> 00:23:58,800
well. You mentioned that Wikidata is like
a great way for meeting other people that
177
00:23:58,800 --> 00:24:05,120
are working on similar topics. So is there
some kind of like greater community of
178
00:24:05,120 --> 00:24:13,120
journalists using Wikidata?
Elisabeth: So far, the community is mostly
179
00:24:13,120 --> 00:24:19,280
research based. That's also why we wanted
to reach out here. So I would recommend
180
00:24:19,280 --> 00:24:26,480
getting in touch with the community on
there regarding the research topics that
181
00:24:26,480 --> 00:24:35,360
you have. And you can also get in touch
with us and we will connect you. I have a noise
182
00:24:35,360 --> 00:24:41,440
in my ear, but I hope it's only me.
Herald: Well, I don't have it, so it might
183
00:24:42,400 --> 00:24:47,200
just be you, but I feel like there might
be also an echo on the stream, that's what
184
00:24:47,200 --> 00:24:51,280
people on the chat are saying.
Elisabeth: Oh, OK.
185
00:24:51,280 --> 00:24:56,160
Herald: So I don't have any other questions
in the chat and since there seems to be an
186
00:24:56,160 --> 00:25:02,240
echo on the stream, I don't want to annoy
people any further. So I would suggest for
187
00:25:02,240 --> 00:25:07,760
everyone who has further questions to you
that you can meet in our Big Blue Button
188
00:25:07,760 --> 00:25:15,840
meetup room that I will be posting in the
chat right now and we will continue our
189
00:25:15,840 --> 00:25:22,560
program here at 2:20 with another talk
about Flutter by "The one with the braid",
190
00:25:22,560 --> 00:25:29,200
so I'm saying bye for now.
Elisabeth: Thanks, bye.
191
00:25:29,200 --> 00:25:30,251
Herald: Bye.
192
00:25:30,251 --> 00:25:33,601
outro music
193
00:25:33,601 --> 00:25:40,000
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!