1 00:00:06,370 --> 00:00:08,540 Hello, everyone. 2 00:00:08,540 --> 00:00:11,636 It's awesome that you're all here, so many of you. 3 00:00:11,647 --> 00:00:13,298 It's really, really great. 4 00:00:14,659 --> 00:00:19,541 So Lea already talked a lot about this event, 5 00:00:19,541 --> 00:00:22,875 and I'm going to talk a bit about Wikidata itself 6 00:00:22,875 --> 00:00:26,255 and what has been happening around it over the last year 7 00:00:26,255 --> 00:00:28,151 and where we are going. 8 00:00:28,663 --> 00:00:32,974 So... what is this? Sorry. 9 00:00:40,118 --> 00:00:44,329 So... where are we? Where are we going? 10 00:00:44,950 --> 00:00:49,680 Over the last year there has been so much to celebrate 11 00:00:49,680 --> 00:00:52,329 and I want to highlight some of that 12 00:00:52,329 --> 00:00:55,125 because sometimes it goes unnoticed. 13 00:00:56,855 --> 00:01:03,864 And first I want to take you through some statistics around editors 14 00:01:03,985 --> 00:01:07,119 and our content and how our data is used. 15 00:01:10,376 --> 00:01:14,976 Over the last year, we have grown our community 16 00:01:14,976 --> 00:01:16,720 which is amazing. 17 00:01:16,724 --> 00:01:21,248 We have around 3,000 new people 18 00:01:21,248 --> 00:01:25,963 who edit once or more in 30 days. 19 00:01:26,133 --> 00:01:30,276 So that's 3,000 new Wikidatans, yay! 20 00:01:31,617 --> 00:01:36,544 Now if you look at people who do more, like five edits in 30 days, 21 00:01:36,544 --> 00:01:40,727 we've got an additional 1,200 roughly. 22 00:01:40,995 --> 00:01:44,202 And if you look at the people who do 100 edits or more-- 23 00:01:44,202 --> 00:01:47,366 I hope many of you in this room-- 24 00:01:47,366 --> 00:01:48,996 we have 300 more. 25 00:01:49,277 --> 00:01:51,450 Raise your hand if you're in this last group. 26 00:01:52,733 --> 00:01:56,049 Woot! You're awesome! 
27 00:01:58,059 --> 00:02:04,436 And while the number of edits is usually not something 28 00:02:04,436 --> 00:02:08,592 we pay a lot of attention to, 29 00:02:08,592 --> 00:02:12,683 we did cross the 1 billion edits mark this year. 30 00:02:12,967 --> 00:02:14,597 (applause) 31 00:02:21,347 --> 00:02:23,224 Alright, let's look at content. 32 00:02:27,610 --> 00:02:31,222 So, we're now at 65 million items, 33 00:02:31,462 --> 00:02:34,093 so entities to describe the world, 34 00:02:34,093 --> 00:02:40,541 and we're doing this with around 6,700 properties. 35 00:02:43,667 --> 00:02:48,079 Of those, around 4,300 are external identifiers, 36 00:02:48,079 --> 00:02:53,328 which gives us a lot of linking to other catalogues, databases, 37 00:02:53,328 --> 00:02:55,607 websites and more 38 00:02:55,927 --> 00:02:59,024 and really makes Wikidata the central place 39 00:02:59,024 --> 00:03:01,594 in a linked open data web. 40 00:03:02,453 --> 00:03:07,241 So using those properties and items, 41 00:03:07,241 --> 00:03:11,990 we have around 800 million statements now, 42 00:03:11,990 --> 00:03:15,892 and compared to last year, we know about half a statement more 43 00:03:15,892 --> 00:03:18,365 about every single item. 44 00:03:18,550 --> 00:03:20,480 (laughter) 45 00:03:22,595 --> 00:03:25,144 So, yeah, Wikidata got smarter. 46 00:03:26,914 --> 00:03:29,694 But we don't just have items and properties, 47 00:03:29,724 --> 00:03:33,704 we also have new stuff like lexemes 48 00:03:33,866 --> 00:03:39,825 and we are now at 204,000 lexemes that describe words 49 00:03:39,825 --> 00:03:41,860 in many different languages. 50 00:03:41,939 --> 00:03:43,241 It's very cool. 51 00:03:43,668 --> 00:03:47,661 I will talk more about this in a session later today. 52 00:03:48,860 --> 00:03:52,690 Last, the latest addition are entity schemas 53 00:03:52,690 --> 00:03:58,503 that help us figure out how to consistently model data 54 00:03:58,503 --> 00:04:00,971 across a certain area. 
55 00:04:02,171 --> 00:04:04,462 And of those, we have around 140 now. 56 00:04:07,571 --> 00:04:11,432 Now numbers aren't everything around content, right, 57 00:04:11,432 --> 00:04:14,697 amount of content--we also care about quality of the content. 58 00:04:15,613 --> 00:04:21,976 And what we've done now is we've trained a machine learning system 59 00:04:21,976 --> 00:04:25,287 to judge the quality of an item. 60 00:04:25,822 --> 00:04:29,531 Now this is far from perfect, but it gives you an idea. 61 00:04:29,916 --> 00:04:35,011 So every item in Wikidata gets a score between 1 and 5. 62 00:04:35,011 --> 00:04:37,895 One is pretty terrible; five is amazing. 63 00:04:38,446 --> 00:04:41,901 And it looks at things like how many statements does it have, 64 00:04:41,901 --> 00:04:44,031 how many external identifiers does it have, 65 00:04:44,031 --> 00:04:45,922 how many references are there, 66 00:04:45,922 --> 00:04:49,414 how many different labels are there in different languages, 67 00:04:49,414 --> 00:04:50,604 and so on. 68 00:04:50,727 --> 00:04:55,118 And then we looked at Wikidata over time, 69 00:04:55,118 --> 00:04:59,751 and as you can see, based on these measures, 70 00:04:59,751 --> 00:05:03,918 we went from pretty terrible to much better. 71 00:05:03,918 --> 00:05:05,238 (laughter) 72 00:05:05,649 --> 00:05:07,068 So that's good. 73 00:05:07,303 --> 00:05:11,961 But what you can also see, there's still a lot of room to 5. 74 00:05:13,664 --> 00:05:20,171 Now I don't think this is where we will get to, right? 75 00:05:20,380 --> 00:05:23,263 Not every item will be absolutely perfect 76 00:05:23,266 --> 00:05:26,087 according to these measures that we have taken. 77 00:05:26,354 --> 00:05:30,569 But I'm really happy to see that consistently the quality of our data 78 00:05:30,569 --> 00:05:32,387 is getting better and better. 79 00:05:36,709 --> 00:05:43,111 Okay, but creating that data isn't enough. 
80 00:05:44,428 --> 00:05:46,734 We want this--we do this for a reason. 81 00:05:46,734 --> 00:05:48,749 We want it to be used. 82 00:05:48,749 --> 00:05:55,450 And now we looked at how many articles 83 00:05:55,450 --> 00:06:00,770 on each of the other Wikimedia projects use data from Wikidata, 84 00:06:02,040 --> 00:06:06,762 and we looked at the percentage of all articles on those projects. 85 00:06:07,395 --> 00:06:09,554 Now if you look across all of Wikimedia 86 00:06:09,554 --> 00:06:11,989 and all of the articles there, 87 00:06:11,989 --> 00:06:18,768 then 56.35% of them today make use of some data from Wikidata. 88 00:06:20,054 --> 00:06:21,815 Which I think is pretty good, 89 00:06:21,815 --> 00:06:27,378 but of course, there's still a lot of room to 100. 90 00:06:29,085 --> 00:06:33,811 And then I looked at which projects are actually making most use 91 00:06:33,811 --> 00:06:36,188 of Wikidata's data, 92 00:06:36,188 --> 00:06:39,401 and I split this by language versions and so on. 93 00:06:39,606 --> 00:06:44,997 And now what do you think the top five projects-- 94 00:06:45,577 --> 00:06:48,254 which ones are all of them? 95 00:06:48,254 --> 00:06:50,834 Which project family do they belong to? 96 00:06:51,036 --> 00:06:53,177 (several in audience) Commons. 97 00:06:53,278 --> 00:06:56,607 Okay, that's pretty uniformly Commons. 98 00:06:57,216 --> 00:06:58,903 You would actually be wrong. 99 00:06:59,112 --> 00:07:01,684 All of the top five are Wikivoyage. 100 00:07:02,084 --> 00:07:03,650 (audience) Oh! 101 00:07:03,692 --> 00:07:05,044 (laughter) 102 00:07:05,439 --> 00:07:08,345 So yeah, applause to Wikivoyage. 103 00:07:08,937 --> 00:07:10,741 (applause) 104 00:07:17,070 --> 00:07:20,383 If you would like to check where Commons actually is 105 00:07:20,383 --> 00:07:22,053 and where all of your other projects are, 106 00:07:22,053 --> 00:07:23,521 there is a dashboard. 107 00:07:23,521 --> 00:07:25,443 Come to me and we can check it out. 
108 00:07:28,049 --> 00:07:32,016 Of course, inside Wikimedia is not the only place where our data is used. 109 00:07:32,016 --> 00:07:34,606 It's also used outside, and so much has happened. 110 00:07:34,966 --> 00:07:39,256 I can't begin to mention it all, but to highlight some, 111 00:07:39,518 --> 00:07:44,028 there are great uses of our data at the Met, at the Wellcome Trust, 112 00:07:44,030 --> 00:07:45,687 at the Library of Congress, 113 00:07:45,687 --> 00:07:47,848 in GeneWiki and so many more. 114 00:07:47,951 --> 00:07:51,296 And if you go through some of the sessions later in the program, 115 00:07:51,296 --> 00:07:53,292 you will hear about some of them. 116 00:07:56,635 --> 00:07:59,608 Alright, enough statistics. 117 00:07:59,977 --> 00:08:02,171 Let's look at some other highlights. 118 00:08:02,644 --> 00:08:06,897 So we already talked about data quality improving, 119 00:08:06,897 --> 00:08:10,646 and when you look at data quality, there are a lot of dimensions 120 00:08:10,646 --> 00:08:16,426 that you can look at, and we've improved on some of those, 121 00:08:16,482 --> 00:08:18,980 like how accurate is the data, 122 00:08:18,980 --> 00:08:20,751 how trustworthy is the data, 123 00:08:20,751 --> 00:08:22,515 how referenced is it, 124 00:08:22,515 --> 00:08:24,865 how consistently is it modeled, 125 00:08:26,351 --> 00:08:28,992 how complete is it and so on. 126 00:08:31,263 --> 00:08:35,746 Just to pick out one-- for consistency, for example, 127 00:08:35,746 --> 00:08:42,355 we have created the ability to store entity schemas now in Wikidata 128 00:08:42,355 --> 00:08:46,553 so that you can describe how certain domains should be modeled.
129 00:08:46,806 --> 00:08:49,172 So you can find-- 130 00:08:49,557 --> 00:08:53,902 you can create an entity schema, say, for Dutch painters, 131 00:08:53,902 --> 00:08:56,492 and then you can look how-- 132 00:08:56,492 --> 00:08:59,359 which items that are for Dutch painters 133 00:08:59,359 --> 00:09:02,470 do not, for example, have a date of birth but should 134 00:09:02,470 --> 00:09:05,235 and similar things like that. 135 00:09:05,557 --> 00:09:10,011 And I hope that a lot more WikiProjects and so on 136 00:09:10,011 --> 00:09:13,291 will be able to make use of entity schemas to take good care 137 00:09:13,291 --> 00:09:15,925 of their data, and if you want to learn how to do that, 138 00:09:15,925 --> 00:09:18,055 there's a session later in the program as well 139 00:09:18,055 --> 00:09:23,072 by people who know all about this and will make this less 140 00:09:23,072 --> 00:09:24,858 of a black box for you. 141 00:09:27,575 --> 00:09:28,745 Alright. 142 00:09:30,899 --> 00:09:34,701 Another thing that really got traction 143 00:09:34,774 --> 00:09:37,819 over the last year is the Wikibase ecosystem, right? 144 00:09:38,087 --> 00:09:44,015 This idea that not all open data should or has to happen 145 00:09:44,015 --> 00:09:47,490 in Wikidata, but instead, we want a thriving ecosystem 146 00:09:47,490 --> 00:09:51,151 of different places, of different actors, 147 00:09:51,151 --> 00:09:53,513 like institutions, companies, 148 00:09:53,513 --> 00:09:56,929 volunteer projects opening up their data in a similar way 149 00:09:56,929 --> 00:10:00,372 to how Wikidata does it and then connecting all of it, 150 00:10:00,372 --> 00:10:03,317 exchanging data between those, linking that data.
151 00:10:04,282 --> 00:10:08,808 And over the last year, the interest in that 152 00:10:08,808 --> 00:10:11,760 and the interest in institutions and people running 153 00:10:11,760 --> 00:10:14,977 their own Wikibase instance has really exploded, 154 00:10:14,977 --> 00:10:20,466 and especially in the sector of libraries. 155 00:10:23,009 --> 00:10:26,210 There's a lot of testing, evaluating, 156 00:10:26,226 --> 00:10:28,787 and to be honest, trailblazing, 157 00:10:28,787 --> 00:10:33,536 going on there at the moment where adventurous institutions 158 00:10:33,536 --> 00:10:38,872 work with us to really figure out how Wikibase can work 159 00:10:38,872 --> 00:10:42,243 for their collections, for their catalogues and so on. 160 00:10:42,539 --> 00:10:45,024 Among them, the German National Library, 161 00:10:45,024 --> 00:10:46,419 the French National Library, 162 00:10:46,419 --> 00:10:49,194 OCLC and it's really exciting to see. 163 00:10:55,278 --> 00:10:57,360 One of the reasons why I think this is so exciting 164 00:10:57,360 --> 00:11:02,868 is that we are helping these institutions open up data in a way that is 165 00:11:02,868 --> 00:11:07,914 not just putting it on a website and someone can access it 166 00:11:07,926 --> 00:11:11,947 but really thinking about this-- the next step after that, right? 167 00:11:11,947 --> 00:11:15,229 Letting people help you maintain that data, augment that data, 168 00:11:15,229 --> 00:11:20,449 enrich it, and that's really a shift 169 00:11:20,450 --> 00:11:24,526 that I hope will bring good things. 170 00:11:26,041 --> 00:11:27,859 And the other thing it helps us with 171 00:11:27,859 --> 00:11:31,203 is that it lets experts curate the data 172 00:11:31,203 --> 00:11:37,474 in their space, keep it in good shape so that we can then set up 173 00:11:37,474 --> 00:11:42,317 synchronizing processes to Wikidata, for example, 174 00:11:42,317 --> 00:11:45,604 instead of having to take care of it ourselves all the time. 
175 00:11:46,519 --> 00:11:50,223 And at the end of the day, I hope it will take some pressure 176 00:11:50,223 --> 00:11:53,776 off of Wikidata to be that place where everything has to go. 177 00:11:58,040 --> 00:12:00,450 Lexicographical data-- 178 00:12:01,962 --> 00:12:06,997 Over the last year, people started describing words 179 00:12:07,060 --> 00:12:12,264 in their language in Wikidata so that we can build things 180 00:12:12,264 --> 00:12:14,713 like automated translation tools, 181 00:12:16,413 --> 00:12:21,019 and we are at the point where in some languages 182 00:12:21,019 --> 00:12:25,500 we are starting to get nearer to reaching that critical mass 183 00:12:25,500 --> 00:12:29,143 that is needed to actually build a serious application. 184 00:12:29,527 --> 00:12:32,614 In a lot of languages, we still have a long way to go, 185 00:12:32,614 --> 00:12:35,411 but in some, we're really starting to get there, 186 00:12:35,411 --> 00:12:37,265 and that's really great to see. 187 00:12:38,621 --> 00:12:41,430 If you want to know more about this, come to my session later today. 188 00:12:46,064 --> 00:12:48,954 And, of course, not to forget, 189 00:12:48,954 --> 00:12:50,955 structured data on Commons. 190 00:12:51,150 --> 00:12:52,384 (audience member whistles) 191 00:12:52,440 --> 00:12:54,052 Yes! (laughs) 192 00:12:54,216 --> 00:12:55,941 (applause) 193 00:12:59,324 --> 00:13:01,515 The Structured Data on Commons team at the foundation 194 00:13:01,515 --> 00:13:05,571 has really gotten... 195 00:13:07,121 --> 00:13:11,459 everything together and made it possible 196 00:13:11,459 --> 00:13:15,479 to add statements to files on Commons over the last year, 197 00:13:15,526 --> 00:13:18,586 and people are starting to add those statements to images 198 00:13:18,586 --> 00:13:22,770 to then make them easier to find, to build better applications on top of them, 199 00:13:22,770 --> 00:13:24,292 and so much more.
200 00:13:24,292 --> 00:13:26,852 It's really exciting to see how that is growing, 201 00:13:26,852 --> 00:13:29,988 and I think what's really important 202 00:13:29,988 --> 00:13:32,959 for the Wikidata community to understand here 203 00:13:32,959 --> 00:13:36,555 is that when you see "depicts" 204 00:13:36,555 --> 00:13:41,577 or "house cat" or "sitting," "lizard" and "wall" here, 205 00:13:41,577 --> 00:13:44,867 those are links to Wikidata items and properties. 206 00:13:45,425 --> 00:13:49,620 That means when we create items and properties, 207 00:13:49,620 --> 00:13:54,031 those are no longer just providing the vocabulary for Wikidata itself. 208 00:13:54,031 --> 00:13:57,749 They are providing the vocabulary for Commons as well. 209 00:13:57,904 --> 00:14:00,695 And this will only get more and more so, 210 00:14:00,695 --> 00:14:02,929 so we have to pay a lot more attention 211 00:14:02,929 --> 00:14:06,550 to how our ontology, our vocabulary 212 00:14:06,550 --> 00:14:09,777 is actually used in other places than we had before. 213 00:14:13,589 --> 00:14:19,905 And the last one I have is that we've started building stronger bridges 214 00:14:19,905 --> 00:14:21,902 to the other Wikimedia projects. 215 00:14:23,281 --> 00:14:26,159 My team and I are working on a project called the Wikidata Bridge, 216 00:14:26,159 --> 00:14:28,849 and you should totally come to the UX booth 217 00:14:28,849 --> 00:14:32,904 and do some testing of the current state 218 00:14:32,904 --> 00:14:36,240 that will have for example Wikipedia editors 219 00:14:36,240 --> 00:14:38,970 edit Wikidata directly from their projects 220 00:14:38,976 --> 00:14:40,988 without having to go to Wikidata 221 00:14:40,988 --> 00:14:43,958 and having to understand everything around it. 222 00:14:43,958 --> 00:14:50,571 I hope that this will take away one more hurdle that makes it difficult 223 00:14:50,571 --> 00:14:54,498 for Wikimedia projects to adopt more data from Wikidata. 
224 00:14:57,165 --> 00:15:01,012 Alright, now to strategies and where are we going? 225 00:15:03,005 --> 00:15:07,179 Since December, the Wikidata team at Wikimedia Deutschland 226 00:15:07,179 --> 00:15:12,262 and people from the Wikimedia Foundation have been working on strategy 227 00:15:12,262 --> 00:15:14,675 papers around Wikidata. 228 00:15:14,675 --> 00:15:16,101 It's basically writing down 229 00:15:16,101 --> 00:15:19,526 what a lot of us have been talking about already 230 00:15:19,526 --> 00:15:22,958 over the last four or five years. 231 00:15:23,995 --> 00:15:29,492 And I don't know if all of you have read those papers. 232 00:15:29,492 --> 00:15:33,887 They're published on Meta, open for comments until the end of the month. 233 00:15:33,887 --> 00:15:35,806 It would be great, if you haven't read them, 234 00:15:35,806 --> 00:15:39,019 to go read them, leave your comments and so on. 235 00:15:40,062 --> 00:15:44,338 Now the very quick overview of what is in there 236 00:15:44,338 --> 00:15:50,991 is that we think about Wikidata and Wikibase in three pieces. 237 00:15:51,506 --> 00:15:55,442 The first one is Wikidata as a platform. 238 00:15:55,442 --> 00:15:57,023 You can see it in the lower corner, 239 00:15:57,301 --> 00:16:03,876 and that is really around Wikidata enabling every person 240 00:16:03,876 --> 00:16:06,273 to access and share information 241 00:16:06,273 --> 00:16:09,038 regardless of their language and technology, 242 00:16:09,038 --> 00:16:14,479 and we do that by providing general purpose data about the world. 243 00:16:14,479 --> 00:16:18,161 So basically what you do every day. 244 00:16:21,282 --> 00:16:25,497 The second thing is the Wikibase ecosystem part 245 00:16:25,497 --> 00:16:30,047 where Wikibase, the software running Wikidata, powers 246 00:16:30,047 --> 00:16:34,993 not just Wikidata, but a thriving open data web that is the backbone 247 00:16:35,008 --> 00:16:36,817 of free and open knowledge.
248 00:16:38,126 --> 00:16:43,005 And the third and last thing is Wikidata for the Wikimedia projects 249 00:16:43,005 --> 00:16:47,011 at the top where Wikidata is there 250 00:16:47,011 --> 00:16:49,594 to help the Wikimedia projects-- 251 00:16:50,750 --> 00:16:53,559 help make them ready for the future. 252 00:16:57,597 --> 00:17:02,973 Concretely, what does that mean for the near or midterm future? 253 00:17:03,898 --> 00:17:06,235 Wikidata as a platform-- 254 00:17:06,669 --> 00:17:10,700 We want to have better data quality, so we will continue working 255 00:17:10,700 --> 00:17:14,195 on better tools, improving the tools we have and so on. 256 00:17:14,633 --> 00:17:18,899 We need to make our data more accessible 257 00:17:18,899 --> 00:17:23,864 through better APIs, a more robust SPARQL endpoint 258 00:17:23,864 --> 00:17:27,315 but also things like more consistently modeling our data 259 00:17:27,315 --> 00:17:31,235 so it actually is easy to reuse in applications. 260 00:17:31,867 --> 00:17:37,203 And the last thing I had was setting up feedback processes 261 00:17:37,203 --> 00:17:38,769 with our partners. 262 00:17:40,399 --> 00:17:43,905 Unlike Wikipedia, Wikidata is not 263 00:17:43,905 --> 00:17:46,142 what I call a destination project, right? 264 00:17:46,142 --> 00:17:49,166 Someone goes to Wikipedia and reads it 265 00:17:49,166 --> 00:17:50,742 whereas Wikidata is usually not 266 00:17:50,742 --> 00:17:53,295 someone goes to Wikidata and reads it. 267 00:17:53,295 --> 00:17:54,309 It would be awesome, 268 00:17:54,309 --> 00:17:57,834 but realistically it's not what it is, right? 269 00:17:57,882 --> 00:18:00,520 A lot of the people who are exposed 270 00:18:00,520 --> 00:18:02,719 to our data are not on Wikidata itself, 271 00:18:02,770 --> 00:18:06,838 but they are seeing it through Wikipedia and many other places. 272 00:18:07,847 --> 00:18:12,238 Now these other places do get feedback on that data, right? 
273 00:18:12,238 --> 00:18:14,635 Their users tell them, "Hey, here's something that's wrong," 274 00:18:16,775 --> 00:18:20,952 and I would like to have that so that we can make it available 275 00:18:20,958 --> 00:18:24,179 to the people who actually edit on Wikidata, meaning you. 276 00:18:24,374 --> 00:18:27,212 And figuring out how to do that in a meaningful way 277 00:18:27,212 --> 00:18:31,679 without overwhelming everyone will be one of the things to do 278 00:18:31,679 --> 00:18:33,143 over the next year. 279 00:18:34,623 --> 00:18:37,127 Alright, Wikibase ecosystem. 280 00:18:37,127 --> 00:18:40,925 There, we will continue to work with the libraries, 281 00:18:41,055 --> 00:18:46,192 but also look into science, for example, and more. 282 00:18:46,278 --> 00:18:51,641 There is a Wikibase showcase later today that you should totally go to 283 00:18:51,641 --> 00:18:52,951 and see what's already there 284 00:18:52,951 --> 00:18:55,852 and what people are already doing with Wikibase. 285 00:18:55,875 --> 00:18:57,281 It's really worth it. 286 00:18:57,682 --> 00:19:00,832 And what's needed there is 287 00:19:00,832 --> 00:19:03,181 also setting up good processes around that. 288 00:19:04,384 --> 00:19:08,138 Helping people figure out who to talk to about what, 289 00:19:08,138 --> 00:19:10,467 where they can find help, 290 00:19:10,467 --> 00:19:11,831 all these kinds of things. 291 00:19:13,474 --> 00:19:17,395 And, of course, making it easier to install and maintain 292 00:19:17,395 --> 00:19:20,309 a Wikibase because that's still a bit of a pain. 293 00:19:21,144 --> 00:19:24,617 And the last thing is federation which is basically 294 00:19:24,617 --> 00:19:27,245 what we've been talking about for Commons earlier 295 00:19:27,245 --> 00:19:30,704 where Commons uses Wikidata's items and properties 296 00:19:30,704 --> 00:19:33,514 but for other Wikibase instances out there 297 00:19:33,514 --> 00:19:36,488 so they can also use Wikidata's vocabulary. 
298 00:19:37,742 --> 00:19:42,107 And that, as I was saying earlier, increases yet again 299 00:19:42,107 --> 00:19:48,228 the need to be mindful of how our vocabulary is used out there 300 00:19:48,228 --> 00:19:51,055 more than we have had to so far. 301 00:19:53,792 --> 00:19:56,556 And Wikidata for the Wikimedia projects-- 302 00:19:57,132 --> 00:20:00,580 of course, tighter integration through the Wikidata Bridge 303 00:20:00,580 --> 00:20:04,154 and helping people edit directly from their projects 304 00:20:04,154 --> 00:20:08,999 and the other thing that we all need to think about together, I think, 305 00:20:08,999 --> 00:20:14,684 is figuring out how to reduce the language barriers. 306 00:20:15,484 --> 00:20:19,096 The more Wikidata is integrated in the Wikimedia projects, 307 00:20:19,096 --> 00:20:22,472 the more people will have a need to talk to each other 308 00:20:22,472 --> 00:20:25,705 about that data without speaking the same language, 309 00:20:25,705 --> 00:20:31,680 and we have to figure out how to deal with that. 310 00:20:33,276 --> 00:20:36,634 If people have smart ideas, I would love to talk to you. 311 00:20:38,790 --> 00:20:41,365 And with that, I come to the end of my talk. 312 00:20:41,618 --> 00:20:44,248 Thank you, everyone, for giving more people more access 313 00:20:44,248 --> 00:20:46,305 to more knowledge every day. 314 00:20:46,688 --> 00:20:48,914 (applause) 315 00:20:58,015 --> 00:20:59,902 We have some time for questions 316 00:20:59,902 --> 00:21:01,774 so if there are any questions in the audience 317 00:21:01,774 --> 00:21:04,975 or if you are remotely watching the livestream--Hi, Mom-- 318 00:21:04,992 --> 00:21:08,072 you can ask the question on the EtherPad 319 00:21:08,072 --> 00:21:11,387 or on the Telegram Channel and we'll do our best. 320 00:21:11,387 --> 00:21:13,233 So anything? 321 00:21:15,516 --> 00:21:16,655 Ah. 
322 00:21:21,133 --> 00:21:25,208 (person 1) Hi, everyone, this is more of a meme than a question, 323 00:21:25,243 --> 00:21:32,341 so when will the time extension also be able to get 324 00:21:32,341 --> 00:21:35,509 hours and minutes and seconds, 325 00:21:35,509 --> 00:21:38,376 because up till now the precision is just to the date. 326 00:21:38,376 --> 00:21:41,610 - I know... it's not my question-- - (laughing) 327 00:21:41,610 --> 00:21:44,230 That's why I said it's a meme. 328 00:21:44,230 --> 00:21:46,093 Every time is always like that, 329 00:21:46,093 --> 00:21:48,738 but it comes always from remote so... 330 00:21:50,001 --> 00:21:53,188 I do not have a very good answer to that. 331 00:21:53,260 --> 00:21:54,443 I'm sorry. 332 00:21:55,678 --> 00:22:01,636 But maybe as some background, people need it even more 333 00:22:01,636 --> 00:22:07,531 to describe images on Commons so it might bubble up the long list 334 00:22:07,531 --> 00:22:11,071 of things that need to be done a bit faster through that. 335 00:22:14,713 --> 00:22:16,236 Any more questions? 336 00:22:24,686 --> 00:22:27,655 (person 2) [Linda] from Wikimedia Foundation's research team-- 337 00:22:27,655 --> 00:22:31,080 I have a question about your thoughts 338 00:22:31,080 --> 00:22:37,763 on patrolling, and that may be related to quality of content on Wikidata, 339 00:22:37,803 --> 00:22:39,756 but if you can speak to that, 340 00:22:39,756 --> 00:22:43,542 like how do you see the near-to-medium-term patrolling efforts changing, 341 00:22:43,542 --> 00:22:45,557 especially with the Bridge project, 342 00:22:45,559 --> 00:22:48,147 which I'm looking forward to going out and trying. 343 00:22:48,147 --> 00:22:49,433 Yeah, thank you. 344 00:22:52,298 --> 00:22:56,812 So as you say, with things like the Wikidata Bridge, 345 00:22:58,812 --> 00:23:03,287 a lot more effort will have to be spent on patrolling, I think.
346 00:23:04,482 --> 00:23:08,554 But we are at a size where it is probably not feasible 347 00:23:08,554 --> 00:23:10,922 to do this by hand, by a human, 348 00:23:10,922 --> 00:23:15,090 so we need to spend a lot more effort on improving, for example, 349 00:23:15,090 --> 00:23:18,387 ORES, the machine learning system, to help us with that, 350 00:23:18,407 --> 00:23:24,588 to help us figure out which edits a human really needs to look at 351 00:23:24,588 --> 00:23:26,493 and which are probably just like, "Yeah, 352 00:23:26,493 --> 00:23:29,792 the regular stuff, I don't need to look at this." 353 00:23:33,777 --> 00:23:38,878 Currently, ORES is not super good at judging what-- 354 00:23:38,878 --> 00:23:41,459 if an edit on Wikidata is good or bad. 355 00:23:41,459 --> 00:23:44,549 There's currently a campaign going on 356 00:23:44,549 --> 00:23:50,280 that is training the machine learning system, 357 00:23:51,062 --> 00:23:52,474 with your help, 358 00:23:53,141 --> 00:23:55,550 to teach it basically what a good edit is 359 00:23:55,550 --> 00:23:57,078 and what a bad edit is, 360 00:23:57,109 --> 00:24:02,774 and we haven't reached the threshold of enough humans teaching it yet 361 00:24:02,774 --> 00:24:08,025 to really improve it, but if you have a few minutes, 362 00:24:08,025 --> 00:24:11,098 it would be great if you help teach ORES 363 00:24:11,098 --> 00:24:13,586 to make better judgements about Wikidata edits. 364 00:24:13,768 --> 00:24:15,837 And it's really simple-- it shows you an edit, 365 00:24:15,842 --> 00:24:17,584 and you say this is a good edit, 366 00:24:17,584 --> 00:24:19,658 this is a bad edit, and that's it. 367 00:24:20,041 --> 00:24:23,193 You can do this in front of the TV in the evening on the couch. 368 00:24:25,588 --> 00:24:27,021 (person 3) Share a link. 369 00:24:28,000 --> 00:24:31,059 We will share a link in the Telegram Group, yes.
370 00:24:32,239 --> 00:24:36,239 And once we've reached the threshold we need-- 371 00:24:36,239 --> 00:24:39,269 I think it's around 7,000, but I might be wrong-- 372 00:24:40,223 --> 00:24:44,359 then we can rerun the training for ORES and then it will be 373 00:24:44,374 --> 00:24:48,484 hopefully considerably better at judging the edits on Wikidata. 374 00:24:49,909 --> 00:24:52,063 And then I hope more of you can use that 375 00:24:52,063 --> 00:24:56,029 to filter recent changes, for example, or your watchlist 376 00:24:56,029 --> 00:24:58,333 for edits that really need your attention. 377 00:24:59,093 --> 00:25:00,227 Yeah. 378 00:25:02,899 --> 00:25:04,004 Hi. 379 00:25:07,116 --> 00:25:09,964 (person 4) I'm just curious to know, and this is a question not from me, 380 00:25:09,964 --> 00:25:12,729 but from partners that I've been working with, 381 00:25:12,729 --> 00:25:16,190 the more partners we have joining Wikidata 382 00:25:16,190 --> 00:25:19,916 and starting to experiment with queries, 383 00:25:19,916 --> 00:25:23,079 the more issues we are having with queries timing out, 384 00:25:23,147 --> 00:25:25,766 so what's happening with that? 385 00:25:27,732 --> 00:25:30,170 So, some people at the Wikimedia Foundation 386 00:25:30,170 --> 00:25:34,355 are looking into that, and--small spoiler-- 387 00:25:34,355 --> 00:25:36,988 be there for the birthday present session. 388 00:25:37,142 --> 00:25:38,181 (laughter) 389 00:25:43,384 --> 00:25:46,201 (person 5) Hello, I'm Bart Magnus from Belgium (PACKED). 390 00:25:46,201 --> 00:25:48,620 I would like to know what the current state of affairs is 391 00:25:48,620 --> 00:25:52,115 regarding federation, so reusing your properties 392 00:25:52,115 --> 00:25:53,752 in your own Wikibase instance-- 393 00:25:53,752 --> 00:25:56,887 is there anything to mention about that?
394 00:25:56,898 --> 00:26:01,425 So over the last year, a lot of people have told us 395 00:26:01,425 --> 00:26:03,996 that they want federation, right? 396 00:26:03,996 --> 00:26:06,866 But the problem was that a lot of people understood 397 00:26:06,866 --> 00:26:09,318 very different things when they said federation. 398 00:26:10,566 --> 00:26:13,533 Some of those things were very easily doable. 399 00:26:13,533 --> 00:26:15,664 Some of those things were really, really hard. 400 00:26:16,934 --> 00:26:22,148 And my team and I have been talking to a lot of people, for example, 401 00:26:22,148 --> 00:26:27,193 the partners we work with at libraries to figure out what is it actually 402 00:26:27,193 --> 00:26:28,836 precisely that they need. 403 00:26:30,111 --> 00:26:33,893 And we finished that now, though, of course, I'm happy 404 00:26:33,893 --> 00:26:37,850 to take more feedback if you want to talk to me about that, 405 00:26:37,850 --> 00:26:41,397 and now I'm at a stage where I'm comfortable to say, 406 00:26:41,397 --> 00:26:43,480 "Okay, we're going to start with that." 407 00:26:44,606 --> 00:26:48,197 And that will happen over the next I would say two or three months 408 00:26:48,197 --> 00:26:51,243 that we actually write the first lines of code 409 00:26:51,243 --> 00:26:53,793 and then hopefully have people able 410 00:26:53,793 --> 00:26:56,533 to test it early next year, I would say. 411 00:26:59,661 --> 00:27:01,023 (presenter) Okay, last questions. 412 00:27:02,457 --> 00:27:05,603 (person 6) Finn Årup Nielsen from Copenhagen, Denmark. 413 00:27:05,973 --> 00:27:09,833 In relation to the other language, there's been a sort of discussion 414 00:27:09,833 --> 00:27:13,617 in the WikiCite community about whether we should continue 415 00:27:13,617 --> 00:27:15,765 to put more scientific papers in there-- 416 00:27:15,768 --> 00:27:19,913 this relates to how much data we can put into Wikidata. 
417 00:27:19,913 --> 00:27:23,032 Timeout in the Wikidata Query Service is one issue 418 00:27:23,032 --> 00:27:24,468 but also the maintaining 419 00:27:24,468 --> 00:27:30,300 so what are your thoughts about... 420 00:27:31,060 --> 00:27:35,173 Is the size of Wikidata beginning to be a problem 421 00:27:35,173 --> 00:27:36,237 in general? 422 00:27:36,237 --> 00:27:38,666 Should we stop putting in lexeme data? 423 00:27:38,666 --> 00:27:41,222 Should we stop putting in scientific data 424 00:27:41,222 --> 00:27:45,717 into Wikidata or do we have any research on this 425 00:27:45,717 --> 00:27:50,053 or technical problems inflating? 426 00:27:50,292 --> 00:27:51,445 Yeah... 427 00:27:53,266 --> 00:27:57,419 Wikidata is definitely coming to some... 428 00:27:58,906 --> 00:28:02,732 scalability boundaries, let's say, 429 00:28:03,740 --> 00:28:05,975 both technically and socially. 430 00:28:05,975 --> 00:28:09,197 And for both we need solutions, right? 431 00:28:09,197 --> 00:28:12,518 Socially, we have things like more editors 432 00:28:12,518 --> 00:28:15,689 and recent changes to the point where it's completely unfeasible 433 00:28:15,689 --> 00:28:19,623 for a human to patrol that because it's simply too much. 434 00:28:21,246 --> 00:28:26,205 But also technically, and we've been addressing some of that. 435 00:28:26,205 --> 00:28:29,958 For example, some database re-architecting 436 00:28:29,958 --> 00:28:33,718 around the wb_terms table, if that says anything to anyone. 437 00:28:35,900 --> 00:28:38,366 But those only get us so far, 438 00:28:38,516 --> 00:28:41,343 and one of the things we want to look at next year 439 00:28:41,343 --> 00:28:45,968 is where the other pain points are and what to do about them 440 00:28:45,968 --> 00:28:47,585 on the technical side. 441 00:28:49,085 --> 00:28:50,728 So that's a general picture.
442 00:28:50,728 --> 00:28:54,455 At the same time, I am very hesitant 443 00:28:54,455 --> 00:28:58,387 to tell anyone, "No, no, no, stop putting data into Wikidata." 444 00:28:58,400 --> 00:29:02,408 That would kind of defeat the purpose. 445 00:29:04,311 --> 00:29:07,061 But, for example, the Wikibase ecosystem 446 00:29:07,061 --> 00:29:09,220 is one way to address that, right, 447 00:29:09,220 --> 00:29:13,952 to not require everything in Wikidata. 448 00:29:13,952 --> 00:29:16,267 That's the whole beauty of linked open data. 449 00:29:16,267 --> 00:29:18,298 You don't have to have it all in the same place. 450 00:29:18,298 --> 00:29:19,642 You can connect different places. 451 00:29:19,642 --> 00:29:20,859 It's amazing. 452 00:29:21,957 --> 00:29:28,309 So around WikiCite specifically, yes-- 453 00:29:29,644 --> 00:29:34,718 okay, WikiCite specifically, I think we need 454 00:29:34,718 --> 00:29:36,256 to look at it in proportion. 455 00:29:36,256 --> 00:29:40,548 I don't have an exact number for what percentage 456 00:29:40,548 --> 00:29:44,511 of the items in Wikidata are around WikiCite topics, 457 00:29:44,511 --> 00:29:46,696 but it's a big percentage. 458 00:29:46,696 --> 00:29:49,869 And maybe that's the thing we need to talk about... 459 00:29:50,356 --> 00:29:52,442 in the break. 460 00:29:53,191 --> 00:29:54,766 Well, thank you very much! 461 00:29:54,845 --> 00:29:56,281 (applause)