WEBVTT 00:00:00.000 --> 00:00:09.465 intro music 00:00:14.815 --> 00:00:18.081 Herald: Wikidata for (Data) Journalists by Elisabeth Giesemann. 00:00:19.501 --> 00:00:25.520 Elisabeth Giesemann: So our agenda for today is that we will have a look at key 00:00:25.520 --> 00:00:32.697 points of data journalism. We will quickly explain what Wikidata is, what tools you 00:00:32.697 --> 00:00:39.489 can use inside of Wikidata for data visualization, and what other third-party 00:00:39.489 --> 00:00:46.477 tools there are for your research. Then we have a look at critical research done with 00:00:46.477 --> 00:00:52.589 Wikidata. And finally, we have a critical look at the data of Wikidata itself. 00:00:57.259 --> 00:01:02.979 Key points of data journalism are that you want to interview a dataset, so you want 00:01:02.979 --> 00:01:08.746 to find connections, correlations and causalities behind the data. Also, you 00:01:08.746 --> 00:01:16.786 want to visualize the data in a compelling way and you want to write your own story. 00:01:16.786 --> 00:01:23.987 You want to find a new spin and a new look at the facts, 00:01:23.987 --> 00:01:26.482 and all of these things you can do with Wikidata. 00:01:31.752 --> 00:01:35.442 At Wikimedia Deutschland, we want to support evidence-based reporting; 00:01:35.442 --> 00:01:40.390 that's why we want to support you in using Wikidata. 00:01:40.390 --> 00:01:49.623 Also, data journalism helps you to tailor your story to the users or your readers. 00:01:49.623 --> 00:01:55.970 Data journalism helps you to create visual storytelling instead of walls of text. 00:01:55.970 --> 00:02:03.994 And this, again, helps you to convey facts faster and much more easily, 00:02:03.994 --> 00:02:06.292 and that makes your story much more inclusive. 00:02:10.553 --> 00:02:13.602 So how do you get to a story with Wikidata? 
00:02:13.602 --> 00:02:19.359 You want to find and recognize patterns in a dataset, you can search for geographical 00:02:19.359 --> 00:02:25.644 data, you can search for similarities and differences in the data, and you can also 00:02:25.644 --> 00:02:31.959 search for missing data, because that also exists in Wikidata. You can visualize your 00:02:31.959 --> 00:02:37.731 findings with the tools that you find in the Wikidata Query Service. And what's 00:02:37.731 --> 00:02:43.210 most important is you can connect to the Wikidata community and find people who are 00:02:43.210 --> 00:02:48.592 working on a similar subject or have a similar research question to the 00:02:48.592 --> 00:03:00.320 one that you have. So I included this visualization to show you that data is 00:03:00.320 --> 00:03:08.640 only the beginning of your story and the path that you will take. We want you to 00:03:08.640 --> 00:03:17.120 use the data in Wikidata to create a compelling story and thereby contribute 00:03:17.680 --> 00:03:29.787 value and your idea about what's in the data. Because data is a lot, but it's not 00:03:29.787 --> 00:03:34.960 everything. As we've seen in the last month, many people aren't convinced by 00:03:34.960 --> 00:03:43.440 facts. Also, there is a lack of time and there is a lack of data literacy in 00:03:43.440 --> 00:03:49.200 our society. It's not always easy to understand the complexity of historical 00:03:49.200 --> 00:03:55.280 events and developments, to understand the complexity of medical data or demographic 00:03:55.280 --> 00:04:03.040 changes. So it is important to have a storytelling aspect to your data, have 00:04:03.040 --> 00:04:08.000 good visualizations and an easy-to-understand approach to convey the 00:04:08.000 --> 00:04:14.320 significance of your data and your story. And finally, it is important to remain 00:04:14.320 --> 00:04:27.758 transparent and clear about the use and analysis of the data. So what is Wikidata? 
00:04:27.758 --> 00:04:33.589 Wikidata is a free linked database that can be read and edited by both humans and 00:04:33.589 --> 00:04:39.518 machines, so it is a database of linked open data. That means that the data 00:04:39.518 --> 00:04:46.247 doesn't just sit there in tables. It can be connected and combined with other data 00:04:46.247 --> 00:04:56.269 found on Wikidata. As such, it is a realization of the semantic web as dreamt 00:04:56.269 --> 00:05:04.884 of by Tim Berners-Lee, and Wikidata also won a prize for its realization of the semantic 00:05:04.884 --> 00:05:12.864 web. We just celebrated Wikidata's 8th birthday. It currently holds 90 00:05:12.864 --> 00:05:20.985 million items and has 44,000 active users and contributors, which makes it the most 00:05:20.985 --> 00:05:31.692 edited Wikimedia project. It was initially thought of as a way to support the 00:05:31.692 --> 00:05:39.070 other projects of the Wikimedia ecosystem and seen as a central 00:05:39.070 --> 00:05:46.162 storage for the structured data of the sister projects like Wikivoyage, 00:05:46.162 --> 00:05:57.767 Wikisource and the most famous Wikimedia project, Wikipedia. But it also has 00:05:57.767 --> 00:06:04.509 another function, which is to provide free and open data to the 00:06:04.509 --> 00:06:12.841 Internet, and that became really huge. As already said, we now have more than 90 00:06:12.841 --> 00:06:18.921 million data items on Wikidata. 
A colleague of mine created this map and you 00:06:18.921 --> 00:06:28.312 can see here the geolocation data that is in Wikidata, and we are very proud that 00:06:28.312 --> 00:06:33.901 it's distributed all over the world, but we also take it with a grain of 00:06:33.901 --> 00:06:40.960 salt, because as you can see, it's very bright in Europe and on the east and west 00:06:40.960 --> 00:06:51.170 coasts of the US, but there are very dark spots where we can't record the knowledge 00:06:51.170 --> 00:06:55.632 in the same way as we do in our Western societies, and that brings us to the 00:06:55.632 --> 00:07:02.314 question of what is knowledge equity and how can we actually best serve everybody 00:07:02.314 --> 00:07:15.600 in our global society? So how does it work? Wikidata has items, which are real 00:07:15.600 --> 00:07:22.000 things or concepts in the real world, like Berlin, Barack Obama, helium, and these 00:07:22.000 --> 00:07:36.058 items are identified with an ID, the QID. So Q76 or Q... I don't, I can't read the 00:07:36.058 --> 00:07:43.296 number now. So these items have labels, descriptions, aliases and sitelinks. 00:07:43.296 --> 00:07:49.840 Labels, that means the item is named in all of the languages that Wikidata holds 00:07:49.840 --> 00:07:59.246 currently, those are around 300. Descriptions are short texts that describe what 00:07:59.246 --> 00:08:10.000 the item is about, and aliases, sometimes one item has several names, etc., etc. An item 00:08:10.000 --> 00:08:16.800 also has properties; those are used to label data, like where a person was born, 00:08:16.800 --> 00:08:22.640 their date of birth or death, or the location of a specific building. 00:08:24.720 --> 00:08:32.240 Statements hold information in properties, so P47, "shares border with", 00:08:32.240 --> 00:08:42.320 links to another country, for example, or the population. 
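The item structure described above (a QID, labels, descriptions, aliases, sitelinks, and statements made of a property and a value) can be sketched as a small data model. This is only an illustration under stated assumptions: the class and field names `Item`, `Statement` and `values_of` are hypothetical and are not Wikidata's own software; Germany (Q183), France (Q142) and Poland (Q36) are items chosen here as an example for P47 and do not come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim on an item: a property ID pointing at a value (hypothetical model)."""
    property_id: str  # e.g. "P47" (shares border with)
    value: str        # a QID or a literal, kept as plain text in this sketch


@dataclass
class Item:
    """A thing or concept identified by its QID (hypothetical model)."""
    qid: str
    labels: dict = field(default_factory=dict)        # language code -> name
    descriptions: dict = field(default_factory=dict)  # language code -> short text
    aliases: dict = field(default_factory=dict)       # language code -> list of other names
    sitelinks: dict = field(default_factory=dict)     # wiki name -> page title
    statements: list = field(default_factory=list)

    def values_of(self, property_id):
        """Return every value stated for one property, in insertion order."""
        return [s.value for s in self.statements if s.property_id == property_id]


# Example item chosen for illustration, not taken from the talk.
germany = Item(
    qid="Q183",
    labels={"en": "Germany", "de": "Deutschland"},
    statements=[
        Statement("P47", "Q142"),  # shares border with France
        Statement("P47", "Q36"),   # shares border with Poland
    ],
)
print(germany.values_of("P47"))
```

The point of the sketch is that a statement's value can itself be another item's QID, which is what makes the data linked rather than tabular.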
Statements also have qualifiers to expand 00:08:42.320 --> 00:08:48.320 the information, and then they also have references, which is very important because 00:08:50.080 --> 00:08:59.697 for scientific research, you want to have those references. So here we see again our 00:08:59.697 --> 00:09:22.080 item, Berlin, Q64. The property is the population, with a value of 3.7 million. So what's new 00:09:22.080 --> 00:09:29.200 about research with Wikidata is that you can ask your own questions. Before, you 00:09:29.200 --> 00:09:34.480 would go to a library, and the librarians, librarians are awesome, but 00:09:34.480 --> 00:09:41.120 they would give you books with specific facts in them and you would consume them 00:09:41.120 --> 00:09:48.240 and try to use them for your research. With Wikidata you can ask very specific 00:09:48.240 --> 00:09:56.080 questions that nobody else has come up with before. So for your research, you want to 00:09:56.080 --> 00:10:01.440 do your own Wikidata queries; that's what we have the Wikidata Query Service for. 00:10:03.120 --> 00:10:08.320 The good news is that you don't have to learn Python or R or become a data 00:10:08.320 --> 00:10:17.280 scientist, but you want to learn a bit of SPARQL. We included a few resources here 00:10:17.280 --> 00:10:22.720 in this presentation and there's also going to be a talk given by my colleague 00:10:22.720 --> 00:10:33.360 Lucas on the 29th on how to query Wikidata with SPARQL. We also have a guided tour of 00:10:33.360 --> 00:10:47.217 Wikidata on our website, which I can recommend. OK, so, as said, once you 00:10:47.217 --> 00:10:56.150 have queried your data, you can visualize your results for more compelling storytelling, 00:10:56.150 --> 00:11:00.090 and there are several ways of doing this, and I'm going to show you some of them 00:11:00.090 --> 00:11:09.920 just to give you an idea. 
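To make "ask your own questions" concrete, here is a minimal sketch of a SPARQL query for the Berlin example above (item Q64, property population). The query itself is plain SPARQL and can be pasted into the query service as it is; Python is used here only to build the query text and a request URL, and the helper names `population_query` and `request_url` are hypothetical. The sketch assumes P1082 as the population property and the public endpoint at `https://query.wikidata.org/sparql`; it does not send any request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def population_query(qid):
    """Build a SPARQL query asking for the population (P1082) of one item."""
    return (
        "SELECT ?population WHERE { "
        f"wd:{qid} wdt:P1082 ?population . "
        "}"
    )


def request_url(query):
    """Return a GET URL that asks the endpoint for results in JSON."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


berlin_query = population_query("Q64")  # Q64 is the item for Berlin
berlin_url = request_url(berlin_query)
print(berlin_query)
print(berlin_url)
```

The `wdt:` prefix asks only for the best-ranked value of a statement, so this form returns the current figure rather than the full history with qualifiers and references.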
You could, for instance, ask the query service to show 00:11:09.920 --> 00:11:17.760 you airports that are named after a person and color code them according to their 00:11:17.760 --> 00:11:32.227 gender. Gender of the person, not the airport, obviously. You can ask the query 00:11:32.227 --> 00:11:45.872 service to show you everything connected to the item Berlin. You can ask it to show 00:11:45.872 --> 00:11:52.218 you the population of the countries that are bordering Germany and how it 00:11:52.218 --> 00:12:03.187 developed. You can also ask the query service to show you the most common cause 00:12:03.187 --> 00:12:17.360 of death among noble people. Or here it shows you a historical overview of 00:12:17.360 --> 00:12:42.511 space probes. Or all of the children and grandchildren of Genghis Khan. So we had a 00:12:42.511 --> 00:12:48.220 look at the visualizations inside of Wikidata's Query Service, but there are 00:12:48.220 --> 00:12:55.381 also tools that use Wikidata's data for their own visualizations. And I'm going to 00:12:55.381 --> 00:13:05.280 show you some of them now. So here is Histropedia, which makes timelines of 00:13:05.280 --> 00:13:15.563 historical events using data from Wikidata. This is Inventaire. Basically, 00:13:15.563 --> 00:13:24.132 it lets you create your own private library and then uses the data from 00:13:24.132 --> 00:13:35.280 Wikidata to describe the publications. Here is "Ask me anything". That's done by 00:13:35.280 --> 00:13:43.200 different researchers in Europe, and it lets you pose questions in natural 00:13:43.200 --> 00:13:52.560 language to Wikidata so you don't have to use the query service. That's a way 00:13:53.200 --> 00:14:01.840 to use Wikidata that's also used by a lot of voice assistants like Siri and Alexa. 
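Returning to the query-service examples above, the one about the countries bordering Germany can be sketched as follows. This is an assumption-laden illustration, not the query shown in the talk: it uses Q183 for Germany, P47 for "shares border with" and P1082 for population, and it returns only the current population, not how it developed over time. The helper `rows_from_results` is hypothetical; it flattens the standard SPARQL JSON result layout. The sample response is hand-written placeholder data, not real Wikidata figures.

```python
import json

# Plain SPARQL: neighbours of Germany (Q183) with their current population.
NEIGHBOURS_QUERY = """
SELECT ?country ?countryLabel ?population WHERE {
  wd:Q183 wdt:P47 ?country .
  ?country wdt:P1082 ?population .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
"""


def rows_from_results(raw_json):
    """Flatten SPARQL JSON results into a list of plain dicts (variable -> value)."""
    data = json.loads(raw_json)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


# Hand-written placeholder response, NOT real Wikidata figures.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["country", "countryLabel", "population"]},
    "results": {"bindings": [
        {
            "country": {"type": "uri", "value": "http://www.wikidata.org/entity/Q142"},
            "countryLabel": {"type": "literal", "value": "France"},
            "population": {"type": "literal", "value": "1000"},
        },
    ]},
})

rows = rows_from_results(SAMPLE_RESPONSE)
print(rows[0]["countryLabel"], rows[0]["population"])
```

Rows in this flat form are easy to hand to a charting tool or a spreadsheet, which is the step from query to visualization that the examples above illustrate.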
00:14:04.800 --> 00:14:10.640 And here you have Scholia, which is basically a platform for scientific 00:14:10.640 --> 00:14:18.960 publications that are published under open access and collected, and it can answer 00:14:18.960 --> 00:14:27.840 your questions like who published what paper, with whom and when, or who 00:14:27.840 --> 00:14:37.489 wrote the first paper on COVID, when it was published, etc. And here we have "Sum 00:14:37.489 --> 00:14:44.563 of All Paintings". Basically, it's a database that collects all of the paintings 00:14:44.563 --> 00:14:50.884 in the world and lists their metadata so you can combine it in your own specific 00:14:50.884 --> 00:15:06.117 way. So I showed you a couple of examples of what you could do, and I want to point to 00:15:06.117 --> 00:15:15.273 other researchers who did great stuff with Wikidata and used it for very cool 00:15:15.273 --> 00:15:32.009 storytelling. If my slides work, OK, here we go. So, "Women's representation and 00:15:32.009 --> 00:15:37.487 voice in media coverage of the coronavirus crisis", that's a study done 00:15:37.487 --> 00:15:45.504 by a researcher called Laura Jones regarding the representation of female 00:15:45.504 --> 00:15:53.616 experts within the coverage of coronavirus. It uses evaluations of 00:15:53.616 --> 00:16:03.600 Wikipedia and Wikidata to show how much representation of 00:16:03.600 --> 00:16:21.745 female experts there was. And, as we see, it's not a lot. Finally, there is another great 00:16:21.745 --> 00:16:29.672 example I want to tell you about. It's a project called Enslaved.org. It's a linked 00:16:29.672 --> 00:16:37.652 open data platform based on Wikibase, which is the software behind Wikidata, and 00:16:37.652 --> 00:16:45.970 it basically collects and connects data related to the transatlantic 00:16:45.970 --> 00:16:53.059 slave trade. 
So data on people who suffered under the slave trade, and the records that were 00:16:53.059 --> 00:17:03.122 made by the people active in this slave trade, are collected. The data has 00:17:03.122 --> 00:17:12.552 been collected in several databases, and Enslaved built one large database to 00:17:12.552 --> 00:17:21.946 connect them and rebuild the stories, which I think is a 00:17:21.946 --> 00:17:30.133 really great way to humanize people who have been dehumanized with data. Like you 00:17:30.133 --> 00:17:40.560 can see here, they collect data from newspapers and from the 00:17:40.560 --> 00:17:56.123 slaveholders to recount a story of individuals. So finally, I also want to 00:17:56.123 --> 00:18:02.720 talk to you about one thing in Wikidata that is always on our minds, which is that 00:18:03.600 --> 00:18:09.680 Wikidata is not perfect. I highly recommend the talk by Os Keyes, 00:18:09.680 --> 00:18:15.920 "Questioning Wikidata", in which it is explained that all classification systems 00:18:15.920 --> 00:18:22.640 are inherently dangerous, and Wikidata is a large encyclopedic wiki classification 00:18:22.640 --> 00:18:30.720 system which makes choices, ethical and political choices, about what is notable, 00:18:31.280 --> 00:18:43.120 about how to categorize information. And these choices reduce complexity and 00:18:43.120 --> 00:18:54.080 also reduce specific forms of history, like oral history. This reduction has 00:18:54.080 --> 00:19:03.440 consequences. As you know, Wikidata is used by many programs, apps, voice 00:19:03.440 --> 00:19:17.084 assistants, and what and how we store information in Wikidata really matters. So 00:19:17.084 --> 00:19:27.280 we ask ourselves, what is encyclopedic knowledge? And how can we organize it in a 00:19:27.280 --> 00:19:34.134 more inclusive way? 
Encyclopedic knowledge is a Western concept, and we can and must 00:19:34.134 --> 00:19:45.896 do better than just use our own Western view to organize the world. But then also 00:19:45.896 --> 00:19:52.240 the wiki principle applies: we have a huge community behind Wikidata that helps us to 00:19:52.240 --> 00:19:59.760 make these decisions, and you can also become a part of this by researching 00:19:59.760 --> 00:20:11.646 Wikidata, using it for your work and also contributing your research. So once again, 00:20:11.646 --> 00:20:17.927 I want to tell you, you can use Wikidata as a tool for your storytelling. Wikidata 00:20:17.927 --> 00:20:24.162 can help you find connections between data. Wikidata can help 00:20:24.162 --> 00:20:30.406 you build visualizations in its query service. You can ask questions about 00:20:30.406 --> 00:20:38.080 historical data correlations more critically than you could 00:20:38.080 --> 00:20:45.360 before. But there are also downsides to Wikidata, because it is an 00:20:45.360 --> 00:20:55.256 encyclopedic way of organizing Western knowledge. So this was only a start. I'm 00:20:55.256 --> 00:21:02.739 looking forward to our Q&A session now, and if you have further questions, concerns or 00:21:02.739 --> 00:21:08.021 ideas, you can contact me and my colleagues, and you can also contact me 00:21:08.021 --> 00:21:18.572 individually. Thank you. 00:21:18.572 --> 00:21:23.520 Herald: Hello and welcome, Elisabeth. Thank you very much for your interesting 00:21:23.520 --> 00:21:29.520 talk. That was a very great introduction. Elisabeth: Hi. Yeah, thanks for having me. 00:21:30.320 --> 00:21:36.240 I'm happy that I was able to talk a bit about Wikidata and how you could do 00:21:36.240 --> 00:21:43.040 storytelling with it. 
I wanted to add that, obviously, you can ask me questions 00:21:43.040 --> 00:21:50.640 now, but I also want to point to the great introduction to Wikidata that two of my 00:21:50.640 --> 00:21:57.120 colleagues gave yesterday, which is already online, and 00:21:57.120 --> 00:22:03.040 tomorrow there will be a query service workshop where you can learn a bit more 00:22:03.040 --> 00:22:09.040 in depth how to query Wikidata. Herald: Yeah, that's a very good hint. 00:22:09.040 --> 00:22:13.280 There are actually two questions in the chat right now. The first one is, are 00:22:13.280 --> 00:22:17.840 your slides going to be published, because people are interested in your links to the 00:22:17.840 --> 00:22:22.320 tutorials, obviously. Elisabeth: Yes, I asked 00:22:22.320 --> 00:22:29.840 before; I think the talk will be published, and the slides. Is there a Wikipaka board 00:22:29.840 --> 00:22:36.320 where I can put it? Otherwise, I can also put a link on our Twitter account, 00:22:36.320 --> 00:22:43.600 Wikimedia Deutschland. And yeah... Herald: I think Twitter for now would 00:22:43.600 --> 00:22:48.160 probably be the best idea. I actually have to check on the Wikipaka board, but we 00:22:48.160 --> 00:22:50.400 will let you know where you can find everything. 00:22:50.400 --> 00:23:01.880 Elisabeth: I'll put it on the Wikimedia Deutschland Twitter. It's @wmde, I think. 00:23:01.880 --> 00:23:05.280 Herald: We will also retweet it, obviously. You will find it, I promise. 00:23:05.280 --> 00:23:08.720 Elisabeth: OK. Herald: There's another question. What 00:23:08.720 --> 00:23:12.720 resources would you recommend for self-studying the writing of queries for 00:23:12.720 --> 00:23:19.200 query.wikidata.org? Elisabeth: Mhm. I put some links in 00:23:19.200 --> 00:23:27.600 the slides. We have a few tutorials on Wikidata. 
00:23:27.600 --> 00:23:35.040 A couple of months ago, there was also a very nice and very easy tutorial published 00:23:35.040 --> 00:23:41.600 by Wikimedia Israel. So we didn't do it, but I can recommend it; it's a very 00:23:42.640 --> 00:23:47.730 low-key introduction to your first queries. 00:23:47.730 --> 00:23:54.400 Herald: OK. We will also publish that somehow. I have a question for you as 00:23:54.400 --> 00:23:58.800 well. You mentioned that Wikidata is a great way of meeting other people that 00:23:58.800 --> 00:24:05.120 are working on similar topics. So is there some kind of larger community of 00:24:05.120 --> 00:24:13.120 journalists using Wikidata? Elisabeth: So far, the community is mostly 00:24:13.120 --> 00:24:19.280 research-based. That's also why we wanted to reach out here. So I would recommend 00:24:19.280 --> 00:24:26.480 getting in touch with the community there regarding the research topics that 00:24:26.480 --> 00:24:35.360 you have. And you can also get in touch with us and we will connect you. I have a noise 00:24:35.360 --> 00:24:41.440 in my ear, but I hope it's only me. Herald: Well, I don't have it, so it might 00:24:42.400 --> 00:24:47.200 just be you, but I feel like there might also be an echo on the stream, that's what 00:24:47.200 --> 00:24:51.280 people in the chat are saying. Elisabeth: Oh, OK. 00:24:51.280 --> 00:24:56.160 Herald: So I don't have any other questions in the chat, and since there seems to be an 00:24:56.160 --> 00:25:02.240 echo on the stream, I don't want to annoy people any further. So I would suggest that 00:25:02.240 --> 00:25:07.760 everyone who has further questions for you can meet you in our Big Blue Button 00:25:07.760 --> 00:25:15.840 meetup room that I will be posting in the chat right now, and we will continue our 00:25:15.840 --> 00:25:22.560 program here at 2:20 with another talk about Flutter by "The one with the braid", 00:25:22.560 --> 00:25:29.200 so I'm saying bye for now. 
Elisabeth: Thanks, bye. 00:25:29.200 --> 00:25:30.251 Herald: Bye. 00:25:30.251 --> 00:25:33.601 outro music 00:25:33.601 --> 00:25:40.000 Subtitles created by c3subtitles.de in the year 2021. Join, and help us!