1 00:00:00,000 --> 00:00:09,465 intro music 2 00:00:14,815 --> 00:00:18,081 Herald: Wikidata for (Data) Journalists by Elisabeth Giesemann. 3 00:00:19,501 --> 00:00:25,520 Elisabeth Giesemann: So our agenda for today is that we will have a look at key 4 00:00:25,520 --> 00:00:32,697 points of data journalism. We will quickly explain what Wikidata is, what tools you 5 00:00:32,697 --> 00:00:39,489 can use inside of Wikidata for data visualization, and what other third-party 6 00:00:39,489 --> 00:00:46,477 tools there are for your research. Then we have a look at critical research done with 7 00:00:46,477 --> 00:00:52,589 Wikidata. And finally, we take a critical look at the data of Wikidata itself. 8 00:00:57,259 --> 00:01:02,979 Key points of data journalism are that you want to interview a dataset, so you want 9 00:01:02,979 --> 00:01:08,746 to find connections, correlations and causalities behind the data. Also, you 10 00:01:08,746 --> 00:01:16,786 want to visualize the data in a compelling way and you want to write your own story. 11 00:01:16,786 --> 00:01:23,987 You want to find a new spin and a new look at the facts, 12 00:01:23,987 --> 00:01:26,482 and all of these things you can do with Wikidata. 13 00:01:31,752 --> 00:01:35,442 At Wikimedia Deutschland, we want to support evidence-based reporting; 14 00:01:35,442 --> 00:01:40,390 that's why we want to support you in using Wikidata. 15 00:01:40,390 --> 00:01:49,623 Also, data journalism helps you to tailor your story to your users or readers. 16 00:01:49,623 --> 00:01:55,970 Data journalism helps you to create visual storytelling instead of walls of text. 17 00:01:55,970 --> 00:02:03,994 And this, again, helps you to convey facts faster and much more easily, 18 00:02:03,994 --> 00:02:06,292 and that makes your story much more inclusive. 19 00:02:10,553 --> 00:02:13,602 So how do you get to a story with Wikidata?
20 00:02:13,602 --> 00:02:19,359 You want to find and recognize patterns in a dataset. You can search for geographical 21 00:02:19,359 --> 00:02:25,644 data, you can search for similarities and differences in the data, and you can also 22 00:02:25,644 --> 00:02:31,959 search for missing data, because that also exists in Wikidata. You can visualize your 23 00:02:31,959 --> 00:02:37,731 findings with the tools that you find in the Wikidata Query Service. And what's 24 00:02:37,731 --> 00:02:43,210 most important is you can connect to the Wikidata community and find people who are 25 00:02:43,210 --> 00:02:48,592 working on a similar subject or have a similar research question to the 26 00:02:48,592 --> 00:03:00,320 one that you have. So I included this visualization to show you that data is 27 00:03:00,320 --> 00:03:08,640 only the beginning of your story and the path that you will take. We want you to 28 00:03:08,640 --> 00:03:17,120 use the data in Wikidata to create a compelling story and thereby contribute 29 00:03:17,680 --> 00:03:29,787 value and your idea about what's in the data. Because data is a lot, but it's not 30 00:03:29,787 --> 00:03:34,960 everything. As we've seen in the last month, many people aren't convinced by 31 00:03:34,960 --> 00:03:43,440 facts. Also, there is a lack of time and there is a lack of data literacy in 32 00:03:43,440 --> 00:03:49,200 our society. It's not always easy to understand the complexity of historical 33 00:03:49,200 --> 00:03:55,280 events and developments, or to understand the complexity of medical data or demographic 34 00:03:55,280 --> 00:04:03,040 changes. So it is important to have a storytelling aspect to your data, to have 35 00:04:03,040 --> 00:04:08,000 good visualizations and an easy-to-understand approach to convey the 36 00:04:08,000 --> 00:04:14,320 significance of your data and your story.
And finally, it is important to remain 37 00:04:14,320 --> 00:04:27,758 transparent and clear about the use and analysis of the data. So what is Wikidata? 38 00:04:27,758 --> 00:04:33,589 Wikidata is a free linked database that can be read and edited by both humans and 39 00:04:33,589 --> 00:04:39,518 machines, so it is a database of linked open data. That means that the data 40 00:04:39,518 --> 00:04:46,247 doesn't just sit there in tables. It can be connected and combined with other data 41 00:04:46,247 --> 00:04:56,269 found on Wikidata. As such, it is a realization of the semantic web as dreamt 42 00:04:56,269 --> 00:05:04,884 by Tim Berners-Lee, and Wikidata also won a prize for its realization of the semantic 43 00:05:04,884 --> 00:05:12,864 web. We just celebrated Wikidata's 8th birthday. It currently holds 90 44 00:05:12,864 --> 00:05:20,985 million items and has 44,000 active users and contributors, which makes it the most 45 00:05:20,985 --> 00:05:31,692 edited Wikimedia project. It was initially thought of as a way to support the 46 00:05:31,692 --> 00:05:39,070 other projects of the Wikimedia ecosystem and seen as a central 47 00:05:39,070 --> 00:05:46,162 storage for the structured data of sister projects like Wikivoyage, 48 00:05:46,162 --> 00:05:57,767 Wikisource and the most famous Wikimedia project, Wikipedia. But it also has 49 00:05:57,767 --> 00:06:04,509 another function, which is to provide free and open data to the 50 00:06:04,509 --> 00:06:12,841 Internet, and that became really huge. As already said, we now have more than 90 51 00:06:12,841 --> 00:06:18,921 million data items on Wikidata.
A colleague of mine created this map and you 52 00:06:18,921 --> 00:06:28,312 can see here the geolocation data that is in Wikidata, and we are very proud that 53 00:06:28,312 --> 00:06:33,901 it's distributed all over the world, but we also take it with a grain of 54 00:06:33,901 --> 00:06:40,960 salt, because as you can see, it's very bright in Europe and on the east and west 55 00:06:40,960 --> 00:06:51,170 coasts of the US, but there are very dark spots where we can't record the knowledge 56 00:06:51,170 --> 00:06:55,632 in the same way as we do in our Western societies, and that brings us to the 57 00:06:55,632 --> 00:07:02,314 question of what knowledge equity is and how we can actually best serve everybody 58 00:07:02,314 --> 00:07:15,600 in our global society. So how does it work? Wikidata has items, which are real 59 00:07:15,600 --> 00:07:22,000 things or concepts in the real world, like Berlin, Barack Obama or helium, and these 60 00:07:22,000 --> 00:07:36,058 items are identified with an ID, the QID. So Q76 or Q... I can't read the 61 00:07:36,058 --> 00:07:43,296 number now. These items have labels, descriptions, aliases and sitelinks. 62 00:07:43,296 --> 00:07:49,840 Labels means that the item is named in the languages that Wikidata holds 63 00:07:49,840 --> 00:07:59,246 currently; those are around 300. Descriptions are ways to describe what 64 00:07:59,246 --> 00:08:10,000 the item is, and aliases exist because sometimes one item has several names, etc. An item 65 00:08:10,000 --> 00:08:16,800 also has properties; those are used to label data, like where a person was born, 66 00:08:16,800 --> 00:08:22,640 their date of birth or death, or the location of a specific building. 67 00:08:24,720 --> 00:08:32,240 Statements hold information in properties, so P47, "shares border with", points to 68 00:08:32,240 --> 00:08:42,320 another country, or there is the population.
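The item model described here (a QID, labels per language, aliases, and statements keyed by a property) can be pictured as a small data structure. The following is only an illustrative sketch in Python, not Wikidata's actual storage format: the class names, field names and the `values_of` helper are made up for this example, and the entry for Germany is filled in by hand. The identifiers Q183 (Germany), Q142 (France), Q36 (Poland) and P47 ("shares border with") are real.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim about an item: a property ID plus a value."""
    property_id: str  # e.g. "P47" ("shares border with")
    value: str        # here a QID; real statements can also hold numbers, dates, etc.


@dataclass
class Item:
    """Hypothetical, simplified picture of a Wikidata item."""
    qid: str
    labels: dict[str, str] = field(default_factory=dict)         # language code -> name
    descriptions: dict[str, str] = field(default_factory=dict)   # language code -> short text
    aliases: dict[str, list[str]] = field(default_factory=dict)  # language code -> other names
    statements: list[Statement] = field(default_factory=list)

    def values_of(self, property_id: str) -> list[str]:
        """Return the values of all statements that use the given property."""
        return [s.value for s in self.statements if s.property_id == property_id]


# Hand-filled example entry (not fetched from Wikidata).
germany = Item(
    qid="Q183",
    labels={"en": "Germany", "de": "Deutschland"},
    descriptions={"en": "country in Central Europe"},
    aliases={"en": ["Federal Republic of Germany"]},
    statements=[Statement("P47", "Q142"), Statement("P47", "Q36")],
)

print(germany.values_of("P47"))
```

The point of the sketch is that a property such as P47 can appear in several statements on the same item, which is why statements are kept as a list rather than as one value per property.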
Statements also have qualifiers to expand 69 00:08:42,320 --> 00:08:48,320 the information, and they also have references, which is very important because 70 00:08:50,080 --> 00:08:59,697 for scientific research, you want to have those references. So here we see again our 71 00:08:59,697 --> 00:09:22,080 item, Berlin, Q64. The property is the population, with a value of 3.7 million. So what's new 72 00:09:22,080 --> 00:09:29,200 about research with Wikidata is that you can ask your own questions. Before, you 73 00:09:29,200 --> 00:09:34,480 would go to a library. Librarians are awesome, but 74 00:09:34,480 --> 00:09:41,120 they would give you books with specific facts in them and you would consume them 75 00:09:41,120 --> 00:09:48,240 and try to use them for your research. With Wikidata you can ask very specific 76 00:09:48,240 --> 00:09:56,080 questions that nobody else came up with before. So for your research, you want to 77 00:09:56,080 --> 00:10:01,440 do your own Wikidata queries; that's what we have the Wikidata Query Service for. 78 00:10:03,120 --> 00:10:08,320 The good news is that you don't have to learn Python or R or become a data 79 00:10:08,320 --> 00:10:17,280 scientist, but you want to learn a bit of SPARQL. We included a few resources here 80 00:10:17,280 --> 00:10:22,720 in this presentation and there's also going to be a talk given by my colleague 81 00:10:22,720 --> 00:10:33,360 Lucas on the 29th on how to query Wikidata with SPARQL. We also have a guided tour of 82 00:10:33,360 --> 00:10:47,217 Wikidata on our website, which I can recommend. OK, so, as said, once you 83 00:10:47,217 --> 00:10:56,150 have queried your data, you can visualize your results for more compelling storytelling, 84 00:10:56,150 --> 00:11:00,090 and there are several ways of doing this, and I'm going to show you some of them 85 00:11:00,090 --> 00:11:09,920 just to give you an idea.
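As a rough idea of what "a bit of SPARQL" looks like, the Berlin example from this passage (item Q64, property P1082 for population) can be written as a one-line query. The sketch below wraps it in Python only so that it can be run and checked; pasting the query text into query.wikidata.org works without any Python at all. The function names, the User-Agent string and the sample response are assumptions made for this illustration, and the sample value is the rounded figure from the talk, not live data.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def population_query(qid: str) -> str:
    """Build a SPARQL query asking for the population (P1082) of one item."""
    return f"SELECT ?population WHERE {{ wd:{qid} wdt:P1082 ?population . }}"


def extract_values(response: dict, variable: str) -> list[str]:
    """Pull one variable out of a result in the SPARQL JSON results format."""
    rows = response["results"]["bindings"]
    return [row[variable]["value"] for row in rows if variable in row]


def run_query(query: str) -> dict:
    """Send the query to the endpoint (needs network access; not called below)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Hypothetical User-Agent; replace it with something that identifies you.
    request = urllib.request.Request(url, headers={"User-Agent": "wikidata-talk-example/0.1"})
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Hand-written sample in the shape the endpoint returns (value is illustrative).
sample_response = {
    "head": {"vars": ["population"]},
    "results": {"bindings": [{"population": {"type": "literal", "value": "3700000"}}]},
}

print(population_query("Q64"))
print(extract_values(sample_response, "population"))
```

To try it against live data, call `extract_values(run_query(population_query("Q64")), "population")`; the result depends on what is stored in Wikidata at that moment.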
You could, for instance, ask the query service to show 86 00:11:09,920 --> 00:11:17,760 you airports that are named after a person and color-code them according to their 87 00:11:17,760 --> 00:11:32,227 gender. Gender of the person, not the airport, obviously. You can ask the query 88 00:11:32,227 --> 00:11:45,872 service to show you everything connected to the item Berlin. You can ask it to show 89 00:11:45,872 --> 00:11:52,218 you the population of the countries that are bordering Germany and how it 90 00:11:52,218 --> 00:12:03,187 developed. You can also ask the query service to show you the most common cause 91 00:12:03,187 --> 00:12:17,360 of death among noble people. Or here it shows you a historical overview of 92 00:12:17,360 --> 00:12:42,511 space probes. Or all of the children and grandchildren of Genghis Khan. So we had a 93 00:12:42,511 --> 00:12:48,220 look at the visualizations inside of Wikidata's Query Service, but there are 94 00:12:48,220 --> 00:12:55,381 also tools that use Wikidata's data for their own visualizations. And I'm going to 95 00:12:55,381 --> 00:13:05,280 show you some of them now. So here is Histropedia, which makes timelines of 96 00:13:05,280 --> 00:13:15,563 historical events using data from Wikidata. This is Inventaire. Basically, 97 00:13:15,563 --> 00:13:24,132 it lets you create your own private library and then uses the data from 98 00:13:24,132 --> 00:13:35,280 Wikidata to describe the publications. Here is "Ask me anything". That's done by 99 00:13:35,280 --> 00:13:43,200 different researchers in Europe, and it lets you pose questions in natural 100 00:13:43,200 --> 00:13:52,560 language to Wikidata so you don't have to use the query service. That's a way 101 00:13:53,200 --> 00:14:01,840 to use Wikidata that's also used by a lot of voice assistants like Siri and Alexa.
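The earlier example of asking for the population of the countries bordering Germany, and how it developed, might look roughly like the query below. This is a sketch under stated assumptions rather than the query shown on the slide: it uses P47 ("shares border with"), P1082 (population) and the qualifier P585 ("point in time"), and the helper names and the placeholder sample rows ("Country A", "Country B") are invented for this illustration.

```python
from collections import defaultdict


def border_population_query(country_qid: str, language: str = "en") -> str:
    """Build a SPARQL query for dated population figures of all neighbors."""
    return f"""
SELECT ?neighbor ?neighborLabel ?date ?population WHERE {{
  wd:{country_qid} wdt:P47 ?neighbor .
  ?neighbor p:P1082 ?statement .
  ?statement ps:P1082 ?population ;
             pq:P585 ?date .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}
}}
ORDER BY ?neighborLabel ?date
""".strip()


def to_time_series(response: dict) -> dict[str, list[tuple[int, int]]]:
    """Group SPARQL JSON result rows into (year, population) points per label."""
    series = defaultdict(list)
    for row in response["results"]["bindings"]:
        label = row["neighborLabel"]["value"]
        year = int(row["date"]["value"][:4])  # dates look like 2010-01-01T00:00:00Z
        population = int(float(row["population"]["value"]))
        series[label].append((year, population))
    return {label: sorted(points) for label, points in series.items()}


def _row(label: str, date: str, population: str) -> dict:
    """Build one placeholder result row (made-up values, not real data)."""
    return {
        "neighborLabel": {"value": label},
        "date": {"value": date},
        "population": {"value": population},
    }


sample_response = {
    "results": {
        "bindings": [
            _row("Country A", "2010-01-01T00:00:00Z", "120"),
            _row("Country A", "2000-01-01T00:00:00Z", "100"),
            _row("Country B", "2010-01-01T00:00:00Z", "250"),
        ]
    }
}

print(border_population_query("Q183"))
print(to_time_series(sample_response))
```

Going through the full statement (`p:`, `ps:`, `pq:`) instead of the short `wdt:` form is what gives access to the qualifier, so each population figure comes with the date it refers to and can be drawn as a line chart.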
102 00:14:04,800 --> 00:14:10,640 And here you have Scholia, which is basically a platform for scientific 103 00:14:10,640 --> 00:14:18,960 publications that are published under open access and collected, and it can answer 104 00:14:18,960 --> 00:14:27,840 questions like who published what paper, with whom and when, or who 105 00:14:27,840 --> 00:14:37,489 wrote the first paper on COVID, when it was published, etc. And here we have "Sum 106 00:14:37,489 --> 00:14:44,563 of All Paintings". Basically, it's a database that collects all of the paintings 107 00:14:44,563 --> 00:14:50,884 in the world and lists their metadata so you can combine it in your own specific 108 00:14:50,884 --> 00:15:06,117 way. So I showed you a couple of examples of what you could do, and I want to point to 109 00:15:06,117 --> 00:15:15,273 other researchers who did great stuff with Wikidata and used it for very cool 110 00:15:15,273 --> 00:15:32,009 storytelling. If my slides work, OK, here we go. So, "Women's representation and 111 00:15:32,009 --> 00:15:37,487 voice in media coverage of the coronavirus crisis", that's a study done 112 00:15:37,487 --> 00:15:45,504 by a researcher called Laura Jones regarding the representation of female 113 00:15:45,504 --> 00:15:53,616 experts within the coverage of coronavirus. It uses evaluations of 114 00:15:53,616 --> 00:16:03,600 Wikipedia and Wikidata to show how much representation there was of 115 00:16:03,600 --> 00:16:21,745 female experts. And, as we see, it's not a lot. Finally, there is another great 116 00:16:21,745 --> 00:16:29,672 example I want to tell you about. It's a project called Enslaved.org. It's a linked 117 00:16:29,672 --> 00:16:37,652 open data platform based on Wikibase, which is the software behind Wikidata, and 118 00:16:37,652 --> 00:16:45,970 it basically collects and connects data related to the transatlantic 119 00:16:45,970 --> 00:16:53,059 slave trade.
So data on people who suffered under the slave trade, and the records that were 120 00:16:53,059 --> 00:17:03,122 made by the people active in this slave trade, is collected. It had 121 00:17:03,122 --> 00:17:12,552 been collected in several databases, and Enslaved built one large database to 122 00:17:12,552 --> 00:17:21,946 connect them and rebuild the stories, which I think is a really great 123 00:17:21,946 --> 00:17:30,133 way to use data to humanize people who have been dehumanized. As you 124 00:17:30,133 --> 00:17:40,560 can see here, they collect data from newspapers and from the 125 00:17:40,560 --> 00:17:56,123 slaveholders to recount the stories of individuals. So finally, I also want to 126 00:17:56,123 --> 00:18:02,720 talk to you about one thing in Wikidata that is always on our minds, which is that 127 00:18:03,600 --> 00:18:09,680 Wikidata is not perfect. I highly recommend the talk by Os Keyes, 128 00:18:09,680 --> 00:18:15,920 "Questioning Wikidata", in which it is explained that all classification systems 129 00:18:15,920 --> 00:18:22,640 are inherently dangerous and Wikidata is a large encyclopedic wiki classification 130 00:18:22,640 --> 00:18:30,720 system which makes choices, ethical and political choices, about what is notable and 131 00:18:31,280 --> 00:18:43,120 about how to categorize information. And these choices reduce complexity and 132 00:18:43,120 --> 00:18:54,080 also reduce specific forms of history, like oral history. This reduction has 133 00:18:54,080 --> 00:19:03,440 consequences. As you know, Wikidata is used by many programs, apps and voice 134 00:19:03,440 --> 00:19:17,084 assistants, and what and how we store information in Wikidata really matters. So 135 00:19:17,084 --> 00:19:27,280 we ask ourselves, what is encyclopedic knowledge? And how can we organize it in a 136 00:19:27,280 --> 00:19:34,134 more inclusive way?
Encyclopedic knowledge is a Western concept, and we can and must 137 00:19:34,134 --> 00:19:45,896 do better than just use our own Western view to organize the world. But then 138 00:19:45,896 --> 00:19:52,240 the wiki principle also applies: we have a huge community behind Wikidata that helps us to 139 00:19:52,240 --> 00:19:59,760 make these decisions, and you can also become a part of this by researching 140 00:19:59,760 --> 00:20:11,646 Wikidata, using it for your work and also contributing your research. So once again, 141 00:20:11,646 --> 00:20:17,927 I want to tell you, you can use Wikidata as a tool for your storytelling. Wikidata 142 00:20:17,927 --> 00:20:24,162 can help you find connections between data. Wikidata can help 143 00:20:24,162 --> 00:20:30,406 you build visualizations in its query service. You can ask questions about 144 00:20:30,406 --> 00:20:38,080 historical data correlations more critically than you could 145 00:20:38,080 --> 00:20:45,360 before. But there are also downsides to Wikidata because it is a 146 00:20:45,360 --> 00:20:55,256 Western, encyclopedic way of organizing knowledge. So this was only a start. I'm 147 00:20:55,256 --> 00:21:02,739 looking forward to our Q&A session now, and if you have further questions, concerns or 148 00:21:02,739 --> 00:21:08,021 ideas, you can contact me and my colleagues and you can also contact me 149 00:21:08,021 --> 00:21:18,572 individually. Thank you. 150 00:21:18,572 --> 00:21:23,520 Herald: Hello and welcome, Elisabeth. Thank you very much for your interesting 151 00:21:23,520 --> 00:21:29,520 talk. That was a great introduction. Elisabeth: Hi. Yeah, thanks for having me. 152 00:21:30,320 --> 00:21:36,240 I'm happy that I was able to talk a bit about Wikidata and how you could do 153 00:21:36,240 --> 00:21:43,040 storytelling with it.
I wanted to add that, obviously, you can ask me questions 154 00:21:43,040 --> 00:21:50,640 now, but I also want to point to the great introduction to Wikidata that two of my 155 00:21:50,640 --> 00:21:57,120 colleagues gave yesterday, which is already online, and 156 00:21:57,120 --> 00:22:03,040 tomorrow there will be a query service workshop where you can learn a bit more 157 00:22:03,040 --> 00:22:09,040 in depth how to query Wikidata. Herald: Yeah, that's a very good hint. 158 00:22:09,040 --> 00:22:13,280 There are actually two questions in the chat right now. The first one is, are 159 00:22:13,280 --> 00:22:17,840 your slides going to be published, because people are interested in your links to the 160 00:22:17,840 --> 00:22:22,320 tutorials, obviously. Elisabeth: Yes, I asked 161 00:22:22,320 --> 00:22:29,840 before; I think the talk and the slides will be published. Is there a Wikipaka board 162 00:22:29,840 --> 00:22:36,320 where I can put it? Otherwise, I can also put a link on our Twitter account, 163 00:22:36,320 --> 00:22:43,600 Wikimedia Deutschland. And yeah... Herald: I think Twitter for now would 164 00:22:43,600 --> 00:22:48,160 probably be the best idea. I actually have to check on the Wikipaka board, but we 165 00:22:48,160 --> 00:22:50,400 will let you know where you can find everything. 166 00:22:50,400 --> 00:23:01,880 Elisabeth: I'll put it on the Wikimedia Deutschland Twitter. It's @wmde, I think. 167 00:23:01,880 --> 00:23:05,280 Herald: We will also retweet it, obviously. You will find it, I promise. 168 00:23:05,280 --> 00:23:08,720 Elisabeth: OK. Herald: There's another question. What 169 00:23:08,720 --> 00:23:12,720 resources would you recommend for 170 00:23:12,720 --> 00:23:19,200 self-studying the writing of queries for query.wikidata.org? Elisabeth: Mhm. I put some links in 171 00:23:19,200 --> 00:23:27,600 the slides. There is... yeah, we have a few tutorials on Wikidata.
172 00:23:27,600 --> 00:23:35,040 A couple of months ago, there was also a very nice and very easy tutorial published 173 00:23:35,040 --> 00:23:41,600 by Wikimedia Israel. We didn't do it, but I can recommend it; it's a very 174 00:23:42,640 --> 00:23:47,730 low-key introduction to your first queries. 175 00:23:47,730 --> 00:23:54,400 Herald: OK. We will also publish that somehow. I have a question for you as 176 00:23:54,400 --> 00:23:58,800 well. You mentioned that Wikidata is a great way of meeting other people who 177 00:23:58,800 --> 00:24:05,120 are working on similar topics. So is there some kind of larger community of 178 00:24:05,120 --> 00:24:13,120 journalists using Wikidata? Elisabeth: So far, the community is mostly 179 00:24:13,120 --> 00:24:19,280 research-based. That's also why we wanted to reach out here. So I would recommend 180 00:24:19,280 --> 00:24:26,480 getting in touch with the community there regarding the research topics that 181 00:24:26,480 --> 00:24:35,360 you have. And you can also get in touch with us and we will connect you. I have a noise 182 00:24:35,360 --> 00:24:41,440 in my ear, but I hope it's only me. Herald: Well, I don't have it, so it might 183 00:24:42,400 --> 00:24:47,200 just be you, but I feel like there might also be an echo on the stream; that's what 184 00:24:47,200 --> 00:24:51,280 people in the chat are saying. Elisabeth: Oh, OK. 185 00:24:51,280 --> 00:24:56,160 Herald: So I don't have any other questions in the chat, and since there seems to be an 186 00:24:56,160 --> 00:25:02,240 echo on the stream, I don't want to annoy people any further.
So I would suggest that 187 00:25:02,240 --> 00:25:07,760 everyone who has further questions for you can meet you in our Big Blue Button 188 00:25:07,760 --> 00:25:15,840 meetup room, which I will be posting in the chat right now, and we will continue our 189 00:25:15,840 --> 00:25:22,560 program here at 2:20 with another talk about Flutter by "The one with the braid", 190 00:25:22,560 --> 00:25:29,200 so I'm saying bye for now. Elisabeth: Thanks, bye. 191 00:25:29,200 --> 00:25:30,251 Herald: Bye. 192 00:25:30,251 --> 00:25:33,601 outro music 193 00:25:33,601 --> 00:25:40,000 Subtitles created by c3subtitles.de in the year 2021. Join, and help us!