WEBVTT 00:00:06.370 --> 00:00:08.540 Hello, everyone. 00:00:08.540 --> 00:00:11.636 It's awesome that you're all here, so many of you. 00:00:11.647 --> 00:00:13.298 It's really, really great. 00:00:14.659 --> 00:00:19.541 So Lea already talked a lot about this event, 00:00:19.541 --> 00:00:22.875 and I'm going to talk a bit about Wikidata itself 00:00:22.875 --> 00:00:26.255 and what has been happening around it over the last year 00:00:26.255 --> 00:00:28.151 and where we are going. 00:00:28.663 --> 00:00:32.974 So... what is this? Sorry. 00:00:40.118 --> 00:00:44.329 So... where are we? Where are we going? 00:00:44.950 --> 00:00:49.680 Over the last year there has been so much to celebrate 00:00:49.680 --> 00:00:52.329 and I want to highlight some of that 00:00:52.329 --> 00:00:55.125 because sometimes it goes unnoticed. 00:00:56.855 --> 00:01:03.864 And first I want to take you through some statistics around editors 00:01:03.985 --> 00:01:07.119 and our content and how our data is used. 00:01:10.376 --> 00:01:14.976 Over the last year, we have grown our community 00:01:14.976 --> 00:01:16.720 which is amazing. 00:01:16.724 --> 00:01:21.248 We have around 3,000 new people 00:01:21.248 --> 00:01:25.963 who edit once or more in 30 days. 00:01:26.133 --> 00:01:30.276 So that's 3,000 new Wikidatans, yay! 00:01:31.617 --> 00:01:36.544 Now if you look at people who do more, like five edits in 30 days, 00:01:36.544 --> 00:01:40.727 we've got an additional 1,200 roughly. 00:01:40.995 --> 00:01:44.202 And if you look at the people who do 100 edits or more-- 00:01:44.202 --> 00:01:47.366 I hope many of you in this room-- 00:01:47.366 --> 00:01:48.996 we have 300 more. 00:01:49.277 --> 00:01:51.450 Raise your hand if you're in this last group. 00:01:52.733 --> 00:01:56.049 Woot! You're awesome! 
00:01:58.059 --> 00:02:04.436 And while the number of edits is usually not something 00:02:04.436 --> 00:02:08.592 we pay a lot of attention to, 00:02:08.592 --> 00:02:12.683 we did cross the 1 billion edits mark this year. 00:02:12.967 --> 00:02:14.597 (applause) 00:02:21.347 --> 00:02:23.224 Alright, let's look at content. 00:02:27.610 --> 00:02:31.222 So, we're now at 65 million items, 00:02:31.462 --> 00:02:34.093 so entities to describe the world, 00:02:34.093 --> 00:02:40.541 and we're doing this with around 6,700 properties. 00:02:43.667 --> 00:02:48.079 Of those, around 4,300 are external identifiers, 00:02:48.079 --> 00:02:53.328 which gives us a lot of linking to other catalogues, databases, 00:02:53.328 --> 00:02:55.607 websites and more 00:02:55.927 --> 00:02:59.024 and really makes Wikidata the central place 00:02:59.024 --> 00:03:01.594 in a linked open data web. 00:03:02.453 --> 00:03:07.241 So using those properties and items, 00:03:07.241 --> 00:03:11.990 we have around 800 million statements now, 00:03:11.990 --> 00:03:15.892 and compared to last year, we know about half a statement more 00:03:15.892 --> 00:03:18.365 about every single item. 00:03:18.550 --> 00:03:20.480 (laughter) 00:03:22.595 --> 00:03:25.144 So, yeah, Wikidata got smarter. 00:03:26.914 --> 00:03:29.694 But we don't just have items and properties, 00:03:29.724 --> 00:03:33.704 we also have new stuff like lexemes 00:03:33.866 --> 00:03:39.825 and we are now at 204,000 lexemes that describe words 00:03:39.825 --> 00:03:41.860 in many different languages. 00:03:41.939 --> 00:03:43.241 It's very cool. 00:03:43.668 --> 00:03:47.661 I will talk more about this in a session later today. 00:03:48.860 --> 00:03:52.690 Last, the latest addition are entity schemas 00:03:52.690 --> 00:03:58.503 that help us figure out how to consistently model data 00:03:58.503 --> 00:04:00.971 across a certain area. 00:04:02.171 --> 00:04:04.462 And of those, we have around 140 now. 
00:04:07.571 --> 00:04:11.432 Now numbers aren't everything around content, right, 00:04:11.432 --> 00:04:14.697 amount of content--we also care about quality of the content. 00:04:15.613 --> 00:04:21.976 And what we've done now is we've trained a machine learning system 00:04:21.976 --> 00:04:25.287 to judge the quality of an item. 00:04:25.822 --> 00:04:29.531 Now this is far from perfect, but it gives you an idea. 00:04:29.916 --> 00:04:35.011 So every item in Wikidata gets a score between 1 and 5. 00:04:35.011 --> 00:04:37.895 One is pretty terrible; five is amazing. 00:04:38.446 --> 00:04:41.901 And it looks at things like how many statements does it have, 00:04:41.901 --> 00:04:44.031 how many external identifiers does it have, 00:04:44.031 --> 00:04:45.922 how many references are there, 00:04:45.922 --> 00:04:49.414 how many different labels are there in different languages, 00:04:49.414 --> 00:04:50.604 and so on. 00:04:50.727 --> 00:04:55.118 And then we looked at Wikidata over time, 00:04:55.118 --> 00:04:59.751 and as you can see, based on these measures, 00:04:59.751 --> 00:05:03.918 we went from pretty terrible to much better. 00:05:03.918 --> 00:05:05.238 (laughter) 00:05:05.649 --> 00:05:07.068 So that's good. 00:05:07.303 --> 00:05:11.961 But what you can also see, there's still a lot of room to 5. 00:05:13.664 --> 00:05:20.171 Now I don't think this is where we will get to, right? 00:05:20.380 --> 00:05:23.263 Not every item will be absolutely perfect 00:05:23.266 --> 00:05:26.087 according to these measures that we have taken. 00:05:26.354 --> 00:05:30.569 But I'm really happy to see that consistently the quality of our data 00:05:30.569 --> 00:05:32.387 is getting better and better. 00:05:36.709 --> 00:05:43.111 Okay, but creating that data isn't enough. 00:05:44.428 --> 00:05:46.734 We want this--we do this for a reason. 00:05:46.734 --> 00:05:48.749 We want it to be used. 
00:05:48.749 --> 00:05:55.450 And now we looked at how many articles 00:05:55.450 --> 00:06:00.770 on each of the other Wikimedia projects use data from Wikidata, 00:06:02.040 --> 00:06:06.762 and we looked at the percentage of all articles on those projects. 00:06:07.395 --> 00:06:09.554 Now if you look across all of Wikimedia 00:06:09.554 --> 00:06:11.989 and all of the articles there, 00:06:11.989 --> 00:06:18.768 then 56.35% of them today make use of some data from Wikidata. 00:06:20.054 --> 00:06:21.815 Which I think is pretty good, 00:06:21.815 --> 00:06:27.378 but of course, there's still a lot of room to 100. 00:06:29.085 --> 00:06:33.811 And then I looked at which projects are actually making most use 00:06:33.811 --> 00:06:36.188 of Wikidata's data, 00:06:36.188 --> 00:06:39.401 and I split this by language versions and so on. 00:06:39.606 --> 00:06:44.997 And now what do you think the top five projects-- 00:06:45.577 --> 00:06:48.254 which ones are all of them? 00:06:48.254 --> 00:06:50.834 Which project family do they belong to? 00:06:51.036 --> 00:06:53.177 (several in audience) Commons. 00:06:53.278 --> 00:06:56.607 Okay, that's pretty uniformly Commons. 00:06:57.216 --> 00:06:58.903 You would actually be wrong. 00:06:59.112 --> 00:07:01.684 All of the top five are Wikivoyage. 00:07:02.084 --> 00:07:03.650 (audience) Oh! 00:07:03.692 --> 00:07:05.044 (laughter) 00:07:05.439 --> 00:07:08.345 So yeah, applause to Wikivoyage. 00:07:08.937 --> 00:07:10.741 (applause) 00:07:17.070 --> 00:07:20.383 If you would like to check where Commons actually is 00:07:20.383 --> 00:07:22.053 and where all of your other projects are, 00:07:22.053 --> 00:07:23.521 there is a dashboard. 00:07:23.521 --> 00:07:25.443 Come to me and we can check it out. 00:07:28.049 --> 00:07:32.016 Of course, inside Wikimedia is not the only place where our data is used. 00:07:32.016 --> 00:07:34.606 It's also used outside, and so much has happened. 
00:07:34.966 --> 00:07:39.256 I can't begin to mention it all, but to highlight some 00:07:39.518 --> 00:07:44.028 there are great uses of our data at the Met, at the Wellcome Trust, 00:07:44.030 --> 00:07:45.687 at the Library of Congress, 00:07:45.687 --> 00:07:47.848 in GeneWiki and so many more. 00:07:47.951 --> 00:07:51.296 And if you go through some of the sessions later in the program, 00:07:51.296 --> 00:07:53.292 you will hear about some of them. 00:07:56.635 --> 00:07:59.608 Alright, enough statistics. 00:07:59.977 --> 00:08:02.171 Let's look at some other highlights. 00:08:02.644 --> 00:08:06.897 So we already talked about data quality improving, 00:08:06.897 --> 00:08:10.646 and when you look at data quality, there are a lot of dimensions 00:08:10.646 --> 00:08:16.426 that you can look at, and we've improved on some of those, 00:08:16.482 --> 00:08:18.980 like how accurate is the data, 00:08:18.980 --> 00:08:20.751 how trustworthy is the data, 00:08:20.751 --> 00:08:22.515 how referenced is it, 00:08:22.515 --> 00:08:24.865 how consistently is it modeled, 00:08:26.351 --> 00:08:28.992 how complete is it and so on. 00:08:31.263 --> 00:08:35.746 Just to pick out one-- for consistency for example, 00:08:35.746 --> 00:08:42.355 we have created the ability to store entity schemas now in Wikidata 00:08:42.355 --> 00:08:46.553 so that you can describe how certain domains should be modeled. 00:08:46.806 --> 00:08:49.172 So you can find-- 00:08:49.557 --> 00:08:53.902 you can create an entity schema, say, for Dutch painters, 00:08:53.902 --> 00:08:56.492 and then you can look how-- 00:08:56.492 --> 00:08:59.359 which items that are for Dutch painters 00:08:59.359 --> 00:09:02.470 do not, for example, have a date of birth but should 00:09:02.470 --> 00:09:05.235 and similar things like that. 
00:09:05.557 --> 00:09:10.011 And I hope that a lot more wiki projects and so on 00:09:10.011 --> 00:09:13.291 will be able to make use of entity schemas to take good care 00:09:13.291 --> 00:09:15.925 of their data, and if you want to learn how to do that, 00:09:15.925 --> 00:09:18.055 there's a session later in the program as well 00:09:18.055 --> 00:09:23.072 by people who know all about this and will make this less 00:09:23.072 --> 00:09:24.858 of a black box for you. 00:09:27.575 --> 00:09:28.745 Alright. 00:09:30.899 --> 00:09:34.701 Another thing that really got traction 00:09:34.774 --> 00:09:37.819 over the last year is the Wikibase ecosystem, right? 00:09:38.087 --> 00:09:44.015 This idea that not all open data should and has to happen 00:09:44.015 --> 00:09:47.490 in Wikidata, but instead, we want a thriving ecosystem 00:09:47.490 --> 00:09:51.151 of different places, of different actors, 00:09:51.151 --> 00:09:53.513 like institutions, companies, 00:09:53.513 --> 00:09:56.929 volunteer projects opening up their data in a similar way 00:09:56.929 --> 00:10:00.372 that Wikidata does it and then connecting all of it, 00:10:00.372 --> 00:10:03.317 exchanging data between those, linking that data. 00:10:04.282 --> 00:10:08.808 And over the last year, the interest in that 00:10:08.808 --> 00:10:11.760 and the interest in institutions and people running 00:10:11.760 --> 00:10:14.977 their own Wikibase instance has really exploded, 00:10:14.977 --> 00:10:20.466 and especially in the sector of libraries. 00:10:23.009 --> 00:10:26.210 There's a lot of testing, evaluating, 00:10:26.226 --> 00:10:28.787 and to be honest, trailblazing, 00:10:28.787 --> 00:10:33.536 going on there at the moment where adventurous institutions 00:10:33.536 --> 00:10:38.872 work with us to really figure out how Wikibase can work 00:10:38.872 --> 00:10:42.243 for their collections, for their catalogues and so on. 
00:10:42.539 --> 00:10:45.024 Among them, the German National Library, 00:10:45.024 --> 00:10:46.419 the French National Library, 00:10:46.419 --> 00:10:49.194 OCLC and it's really exciting to see. 00:10:55.278 --> 00:10:57.360 One of the reasons why I think this is so exciting 00:10:57.360 --> 00:11:02.868 is that we are helping these institutions open up data in a way that is 00:11:02.868 --> 00:11:07.914 not just putting it on a website and someone can access it 00:11:07.926 --> 00:11:11.947 but really thinking about this-- the next step after that, right? 00:11:11.947 --> 00:11:15.229 Letting people help you maintain that data, augment that data, 00:11:15.229 --> 00:11:20.449 enrich it, and that's really a shift 00:11:20.450 --> 00:11:24.526 that I hope will bring good things. 00:11:26.041 --> 00:11:27.859 And the other thing it helps us with 00:11:27.859 --> 00:11:31.203 is that it lets experts curate the data 00:11:31.203 --> 00:11:37.474 in their space, keep it in good shape so that we can then set up 00:11:37.474 --> 00:11:42.317 synchronizing processes to Wikidata, for example, 00:11:42.317 --> 00:11:45.604 instead of having to take care of it ourselves all the time. 00:11:46.519 --> 00:11:50.223 And at the end of the day, I hope it will take some pressure 00:11:50.223 --> 00:11:53.776 off of Wikidata to be that place where everything has to go. 00:11:58.040 --> 00:12:00.450 Lexicographical data-- 00:12:01.962 --> 00:12:06.997 Over the last year, people started describing words 00:12:07.060 --> 00:12:12.264 in their language in Wikidata so that we can build things 00:12:12.264 --> 00:12:14.713 like automated translation tools, 00:12:16.413 --> 00:12:21.019 and we are at the point where in some languages 00:12:21.019 --> 00:12:25.500 we are starting to get nearer to reaching that critical mass 00:12:25.500 --> 00:12:29.143 that is needed to actually build a serious application. 
00:12:29.527 --> 00:12:32.614 In a lot of languages, we still have a long way to go, 00:12:32.614 --> 00:12:35.411 but in some, we're really starting to get there, 00:12:35.411 --> 00:12:37.265 and that's really great to see. 00:12:38.621 --> 00:12:41.430 If you want to know more about this, come to my session later today. 00:12:46.064 --> 00:12:48.954 And, of course, not to forget, 00:12:48.954 --> 00:12:50.955 structured data on Commons. 00:12:51.150 --> 00:12:52.384 (audience member whistles) 00:12:52.440 --> 00:12:54.052 Yes! (laughs) 00:12:54.216 --> 00:12:55.941 (applause) 00:12:59.324 --> 00:13:01.515 The structured data on Commons team at the foundation 00:13:01.515 --> 00:13:05.571 has really gotten... 00:13:07.121 --> 00:13:11.459 everything together and made it possible 00:13:11.459 --> 00:13:15.479 to add statements to files on Commons over the last year, 00:13:15.526 --> 00:13:18.586 and people are starting to add those statements to images 00:13:18.586 --> 00:13:22.770 to then make it easier to find, to build better applications on top of it, 00:13:22.770 --> 00:13:24.292 and so much more. 00:13:24.292 --> 00:13:26.852 It's really exciting to see how that is growing, 00:13:26.852 --> 00:13:29.988 and I think what's really important 00:13:29.988 --> 00:13:32.959 for the Wikidata community to understand here 00:13:32.959 --> 00:13:36.555 is that when you see "depicts" 00:13:36.555 --> 00:13:41.577 or "house cat" or "sitting," "lizard" and "wall" here, 00:13:41.577 --> 00:13:44.867 those are links to Wikidata items and properties. 00:13:45.425 --> 00:13:49.620 That means when we create items and properties, 00:13:49.620 --> 00:13:54.031 those are no longer just providing the vocabulary for Wikidata itself. 00:13:54.031 --> 00:13:57.749 They are providing the vocabulary for Commons as well. 
00:13:57.904 --> 00:14:00.695 And this will only get more and more so, 00:14:00.695 --> 00:14:02.929 so we have to pay a lot more attention 00:14:02.929 --> 00:14:06.550 to how our ontology, our vocabulary 00:14:06.550 --> 00:14:09.777 is actually used in other places than we had before. 00:14:13.589 --> 00:14:19.905 And the last one I have is that we've started building stronger bridges 00:14:19.905 --> 00:14:21.902 to the other Wikimedia projects. 00:14:23.281 --> 00:14:26.159 My team and I are working on a project called the Wikidata Bridge, 00:14:26.159 --> 00:14:28.849 and you should totally come to the UX booth 00:14:28.849 --> 00:14:32.904 and do some testing of the current state 00:14:32.904 --> 00:14:36.240 that will have for example Wikipedia editors 00:14:36.240 --> 00:14:38.970 edit Wikidata directly from their projects 00:14:38.976 --> 00:14:40.988 without having to go to Wikidata 00:14:40.988 --> 00:14:43.958 and having to understand everything around it. 00:14:43.958 --> 00:14:50.571 I hope that this will take away one more hurdle that makes it difficult 00:14:50.571 --> 00:14:54.498 for Wikimedia projects to adopt more data from Wikidata. 00:14:57.165 --> 00:15:01.012 Alright, now to strategy and where are we going? 00:15:03.005 --> 00:15:07.179 Since December, the Wikidata team at Wikimedia Deutschland 00:15:07.179 --> 00:15:12.262 and people from the Wikimedia Foundation have been working on strategy 00:15:12.262 --> 00:15:14.675 papers around Wikidata. 00:15:14.675 --> 00:15:16.101 It's basically writing down 00:15:16.101 --> 00:15:19.526 what a lot of us have been talking about already 00:15:19.526 --> 00:15:22.958 over the last four or five years. 00:15:23.995 --> 00:15:29.492 And I don't know if all of you have read those papers. 00:15:29.492 --> 00:15:33.887 They're published on Meta, comments until the end of the month. 
00:15:33.887 --> 00:15:35.806 It would be great if you haven't read them, 00:15:35.806 --> 00:15:39.019 go read them, leave your comments and so on. 00:15:40.062 --> 00:15:44.338 Now the very quick overview of what is in there 00:15:44.338 --> 00:15:50.991 is that we think about Wikidata and Wikibase in three pieces. 00:15:51.506 --> 00:15:55.442 The first one is Wikidata as a platform. 00:15:55.442 --> 00:15:57.023 You can see it in the lower corner, 00:15:57.301 --> 00:16:03.876 and that is really around Wikidata enables every person 00:16:03.876 --> 00:16:06.273 to access and share information 00:16:06.273 --> 00:16:09.038 regardless of their language and technology, 00:16:09.038 --> 00:16:14.479 and we do that by providing general purpose data about the world. 00:16:14.479 --> 00:16:18.161 So basically what you do every day. 00:16:21.282 --> 00:16:25.497 The second thing is the Wikibase ecosystem part 00:16:25.497 --> 00:16:30.047 where Wikibase, the software running Wikidata, powers 00:16:30.047 --> 00:16:34.993 not just Wikidata, but a thriving open data web that is the backbone 00:16:35.008 --> 00:16:36.817 of free and open knowledge. 00:16:38.126 --> 00:16:43.005 And the third and last thing is Wikidata for the Wikimedia projects 00:16:43.005 --> 00:16:47.011 at the top where Wikidata is there 00:16:47.011 --> 00:16:49.594 to help the Wikimedia projects-- 00:16:50.750 --> 00:16:53.559 help make them ready for the future. 00:16:57.597 --> 00:17:02.973 Concretely, what does that mean for the near or midterm future? 00:17:03.898 --> 00:17:06.235 Wikidata as a platform-- 00:17:06.669 --> 00:17:10.700 We want to have better data quality, so we will continue working 00:17:10.700 --> 00:17:14.195 on better tools, improving the tools we have and so on. 
00:17:14.633 --> 00:17:18.899 We need to make our data more accessible 00:17:18.899 --> 00:17:23.864 through better APIs, a more robust SPARQL endpoint 00:17:23.864 --> 00:17:27.315 but also things like more consistently modeling our data 00:17:27.315 --> 00:17:31.235 so it actually is easy to reuse in applications. 00:17:31.867 --> 00:17:37.203 And the last thing I had was setting up feedback processes 00:17:37.203 --> 00:17:38.769 with our partners. 00:17:40.399 --> 00:17:43.905 Unlike Wikipedia, Wikidata is not 00:17:43.905 --> 00:17:46.142 what I call a destination project, right? 00:17:46.142 --> 00:17:49.166 Someone goes to Wikipedia and reads it 00:17:49.166 --> 00:17:50.742 whereas Wikidata is usually not 00:17:50.742 --> 00:17:53.295 someone goes to Wikidata and reads it. 00:17:53.295 --> 00:17:54.309 It would be awesome, 00:17:54.309 --> 00:17:57.834 but realistically it's not what it is, right? 00:17:57.882 --> 00:18:00.520 A lot of the people who are exposed 00:18:00.520 --> 00:18:02.719 to our data are not on Wikidata itself, 00:18:02.770 --> 00:18:06.838 but they are seeing it through Wikipedia and many other places. 00:18:07.847 --> 00:18:12.238 Now these other places do get feedback on that data, right? 00:18:12.238 --> 00:18:14.635 Their users tell them, "Hey, here's something that's wrong," 00:18:16.775 --> 00:18:20.952 and I would like to have that so that we can make it available 00:18:20.958 --> 00:18:24.179 to the people who actually edit on Wikidata, meaning you. 00:18:24.374 --> 00:18:27.212 And figuring out how to do that in a meaningful way 00:18:27.212 --> 00:18:31.679 without overwhelming everyone will be one of the things to do 00:18:31.679 --> 00:18:33.143 over the next year. 00:18:34.623 --> 00:18:37.127 Alright, Wikibase ecosystem. 00:18:37.127 --> 00:18:40.925 There, we will continue to work with the libraries, 00:18:41.055 --> 00:18:46.192 but also look into science, for example, and more. 
00:18:46.278 --> 00:18:51.641 There is a Wikibase showcase later today that you should totally go to 00:18:51.641 --> 00:18:52.951 and see what's already there 00:18:52.951 --> 00:18:55.852 and what people are already doing with Wikibase. 00:18:55.875 --> 00:18:57.281 It's really worth it. 00:18:57.682 --> 00:19:00.832 And what's needed there is 00:19:00.832 --> 00:19:03.181 also setting up good processes around that. 00:19:04.384 --> 00:19:08.138 Helping people figure out who to talk to about what, 00:19:08.138 --> 00:19:10.467 where they can find help, 00:19:10.467 --> 00:19:11.831 all these kinds of things. 00:19:13.474 --> 00:19:17.395 And, of course, making it easier to install and maintain 00:19:17.395 --> 00:19:20.309 a Wikibase because that's still a bit of a pain. 00:19:21.144 --> 00:19:24.617 And the last thing is federation which is basically 00:19:24.617 --> 00:19:27.245 what we've been talking about for Commons earlier 00:19:27.245 --> 00:19:30.704 where Commons uses Wikidata's items and properties 00:19:30.704 --> 00:19:33.514 but for other Wikibase instances out there 00:19:33.514 --> 00:19:36.488 so they can also use Wikidata's vocabulary. 00:19:37.742 --> 00:19:42.107 And that, as I was saying earlier, increases yet again 00:19:42.107 --> 00:19:48.228 the need to be mindful of how our vocabulary is used out there 00:19:48.228 --> 00:19:51.055 more than we have had to so far. 00:19:53.792 --> 00:19:56.556 And Wikidata for the Wikimedia projects-- 00:19:57.132 --> 00:20:00.580 of course, tighter integration through the Wikidata Bridge 00:20:00.580 --> 00:20:04.154 and helping people edit directly from their projects 00:20:04.154 --> 00:20:08.999 and the other thing that we all need to think about together, I think, 00:20:08.999 --> 00:20:14.684 is figuring out how to reduce the language barriers. 
00:20:15.484 --> 00:20:19.096 The more Wikidata is integrated in the Wikimedia projects, 00:20:19.096 --> 00:20:22.472 the more people will have a need to talk to each other 00:20:22.472 --> 00:20:25.705 about that data without speaking the same language, 00:20:25.705 --> 00:20:31.680 and we have to figure out how to deal with that. 00:20:33.276 --> 00:20:36.634 If people have smart ideas, I would love to talk to you. 00:20:38.790 --> 00:20:41.365 And with that, I come to the end of my talk. 00:20:41.618 --> 00:20:44.248 Thank you, everyone, for giving more people more access 00:20:44.248 --> 00:20:46.305 to more knowledge every day. 00:20:46.688 --> 00:20:48.914 (applause) 00:20:58.015 --> 00:20:59.902 We have some time for questions 00:20:59.902 --> 00:21:01.774 so if there are any questions in the audience 00:21:01.774 --> 00:21:04.975 or if you are remotely watching the livestream--Hi, Mom-- 00:21:04.992 --> 00:21:08.072 you can ask the question on the EtherPad 00:21:08.072 --> 00:21:11.387 or on the Telegram Channel and we'll do our best. 00:21:11.387 --> 00:21:13.233 So anything? 00:21:15.516 --> 00:21:16.655 Ah. 00:21:21.133 --> 00:21:25.208 (person 1) Hi, everyone, this is more of a meme than a question, 00:21:25.243 --> 00:21:32.341 so when will the time extension be able to also get 00:21:32.341 --> 00:21:35.509 hours and minutes and seconds 00:21:35.509 --> 00:21:38.376 because up till now the precision is just to date. 00:21:38.376 --> 00:21:41.610 - I know... it's not my question-- - (laughing) 00:21:41.610 --> 00:21:44.230 That's why I said it's a meme. 00:21:44.230 --> 00:21:46.093 Every time is always like that, 00:21:46.093 --> 00:21:48.738 but it comes always from remote so... 00:21:50.001 --> 00:21:53.188 I do not have a very good answer to that. 00:21:53.260 --> 00:21:54.443 I'm sorry. 
00:21:55.678 --> 00:22:01.636 But maybe as some background, people need it even more 00:22:01.636 --> 00:22:07.531 to describe images on Commons so it might bubble up the long list 00:22:07.531 --> 00:22:11.071 of things that need to be done a bit faster through that. 00:22:14.713 --> 00:22:16.236 Any more questions? 00:22:24.686 --> 00:22:27.655 (person 2) [Linda] from Wikimedia Foundation's research team-- 00:22:27.655 --> 00:22:31.080 I have a question about your thoughts 00:22:31.080 --> 00:22:37.763 on patrolling, and that may be related to quality of content on Wikidata, 00:22:37.803 --> 00:22:39.756 but if you can speak to that 00:22:39.756 --> 00:22:43.542 like how do you see the near medium term patrolling efforts changing, 00:22:43.542 --> 00:22:45.557 especially with the Bridge project 00:22:45.559 --> 00:22:48.147 which I'm looking forward to going out and trying it. 00:22:48.147 --> 00:22:49.433 Yeah, thank you. 00:22:52.298 --> 00:22:56.812 So as you say, with things like Wikidata Bridge, 00:22:58.812 --> 00:23:03.287 a lot more effort will have to be spent on patrolling, I think. 00:23:04.482 --> 00:23:08.554 But we are at a size where this is probably not feasible 00:23:08.554 --> 00:23:10.922 to do it by hand, by a human, 00:23:10.922 --> 00:23:15.090 so we need to spend a lot more effort on improving, for example, 00:23:15.090 --> 00:23:18.387 ORES, the machine learning system to help us with that, 00:23:18.407 --> 00:23:24.588 to help us figure out which edits a human really needs to look at 00:23:24.588 --> 00:23:26.493 and which is probably just like yeah, 00:23:26.493 --> 00:23:29.792 the regular stuff I don't need to look at this. 00:23:33.777 --> 00:23:38.878 Currently, ORES is not super good at judging what-- 00:23:38.878 --> 00:23:41.459 if an edit on Wikidata is good or bad. 
00:23:41.459 --> 00:23:44.549 There's currently a campaign going on 00:23:44.549 --> 00:23:50.280 that is training the machine learning system, 00:23:51.062 --> 00:23:52.474 with your help, 00:23:53.141 --> 00:23:55.550 to teach it basically what a good edit is 00:23:55.550 --> 00:23:57.078 and what a bad edit is, 00:23:57.109 --> 00:24:02.774 and we haven't reached the threshold of enough humans teaching it yet 00:24:02.774 --> 00:24:08.025 to really improve it, but if you have a few minutes, 00:24:08.025 --> 00:24:11.098 it would be great if you help teach ORES 00:24:11.098 --> 00:24:13.586 make better judgements about Wikidata edits. 00:24:13.768 --> 00:24:15.837 And it's really simple-- it shows you an edit, 00:24:15.842 --> 00:24:17.584 and you say this is a good edit, 00:24:17.584 --> 00:24:19.658 this is a bad edit, and that's it. 00:24:20.041 --> 00:24:23.193 You can do this in front of the TV in the evening on the couch. 00:24:25.588 --> 00:24:27.021 (person 3) Share a link. 00:24:28.000 --> 00:24:31.059 We will share a link in the Telegram Group, yes. 00:24:32.239 --> 00:24:36.239 And once we've reached the threshold we need-- 00:24:36.239 --> 00:24:39.269 I think it's around 7,000, but I might be wrong-- 00:24:40.223 --> 00:24:44.359 then we can rerun the training for ORES and then it will be 00:24:44.374 --> 00:24:48.484 hopefully considerably better at judging the edits on Wikidata. 00:24:49.909 --> 00:24:52.063 And then I hope more of you can use that 00:24:52.063 --> 00:24:56.029 to filter recent changes, for example, or your watch list 00:24:56.029 --> 00:24:58.333 for edits that really need your attention. 00:24:59.093 --> 00:25:00.227 Yeah. 00:25:02.899 --> 00:25:04.004 Hi. 
00:25:07.116 --> 00:25:09.964 (person 4) I'm just curious to know, and this is a question not from me, 00:25:09.964 --> 00:25:12.729 but from partners that I've been working with, 00:25:12.729 --> 00:25:16.190 the more partners we have joining Wikidata 00:25:16.190 --> 00:25:19.916 and starting to experiment with queries, 00:25:19.916 --> 00:25:23.079 the more issues we are having with timeout of queries 00:25:23.147 --> 00:25:25.766 so what's happening with that? 00:25:27.732 --> 00:25:30.170 So, some people at the Wikimedia Foundation 00:25:30.170 --> 00:25:34.355 are looking into that, and--small spoiler-- 00:25:34.355 --> 00:25:36.988 be there for the birthday present session. 00:25:37.142 --> 00:25:38.181 (laughter) 00:25:43.384 --> 00:25:46.201 (person 5) Hello, I'm Bart Magnus from Belgium (PACKED). 00:25:46.201 --> 00:25:48.620 I would like to know what the current state of affairs is 00:25:48.620 --> 00:25:52.115 regarding federation so raising your properties 00:25:52.115 --> 00:25:53.752 in your own Wikibase instance-- 00:25:53.752 --> 00:25:56.887 is there anything to mention about that? 00:25:56.898 --> 00:26:01.425 So over the last year, a lot of people have told us 00:26:01.425 --> 00:26:03.996 that they want federation, right? 00:26:03.996 --> 00:26:06.866 But the problem was that a lot of people understood 00:26:06.866 --> 00:26:09.318 very different things when they said federation. 00:26:10.566 --> 00:26:13.533 Some of those things were very easily doable. 00:26:13.533 --> 00:26:15.664 Some of those things were really, really hard. 00:26:16.934 --> 00:26:22.148 And my team and I have been talking to a lot of people, for example, 00:26:22.148 --> 00:26:27.193 the partners we work with at libraries to figure out what is it actually 00:26:27.193 --> 00:26:28.836 precisely that they need. 
00:26:30.111 --> 00:26:33.893 And we finished that now, though, of course, I'm happy 00:26:33.893 --> 00:26:37.850 to take more feedback if you want to talk to me about that, 00:26:37.850 --> 00:26:41.397 and now I'm at a stage where I'm comfortable to say, 00:26:41.397 --> 00:26:43.480 "Okay, we're going to start with that." 00:26:44.606 --> 00:26:48.197 And that will happen over the next I would say two or three months 00:26:48.197 --> 00:26:51.243 that we actually write the first lines of code 00:26:51.243 --> 00:26:53.793 and then hopefully have people able 00:26:53.793 --> 00:26:56.533 to test it early next year, I would say. 00:26:59.661 --> 00:27:01.023 (presenter) Okay, last questions. 00:27:02.457 --> 00:27:05.603 (person 6) Finn Årup Nielsen from Copenhagen, Denmark. 00:27:05.973 --> 00:27:09.833 In relation to the other language, there's been a sort of discussion 00:27:09.833 --> 00:27:13.617 in the WikiCite community about whether we should continue 00:27:13.617 --> 00:27:15.765 to put more scientific papers in there-- 00:27:15.768 --> 00:27:19.913 this relates to how much data we can put into Wikidata. 00:27:19.913 --> 00:27:23.032 Timeout in the Wikidata Query Service is one issue 00:27:23.032 --> 00:27:24.468 but also the maintaining 00:27:24.468 --> 00:27:30.300 so what are your thoughts about... 00:27:31.060 --> 00:27:35.173 Is the size of Wikidata beginning to be a problem 00:27:35.173 --> 00:27:36.237 in general? 00:27:36.237 --> 00:27:38.666 Should we stop putting in lexeme data? 00:27:38.666 --> 00:27:41.222 Should we stop putting in scientific data 00:27:41.222 --> 00:27:45.717 into Wikidata or do we have any research on this 00:27:45.717 --> 00:27:50.053 or technical problems inflating? 00:27:50.292 --> 00:27:51.445 Yeah... 00:27:53.266 --> 00:27:57.419 Wikidata is definitely coming to some... 00:27:58.906 --> 00:28:02.732 scalability boundaries, let's say, 00:28:03.740 --> 00:28:05.975 both technically and socially. 
00:28:05.975 --> 00:28:09.197 And for both we need solutions, right? 00:28:09.197 --> 00:28:12.518 Socially, we have things like more editors 00:28:12.518 --> 00:28:15.689 and recent changes to the point where it's completely unfeasible 00:28:15.689 --> 00:28:19.623 for a human to patrol that because it's simply too much. 00:28:21.246 --> 00:28:26.205 But also technically, and we've been addressing some of that. 00:28:26.205 --> 00:28:29.958 For example, some database re-architecting 00:28:29.958 --> 00:28:33.718 around the wb_terms table, if that says anything for anyone. 00:28:35.900 --> 00:28:38.366 But those only get us so far, 00:28:38.516 --> 00:28:41.343 and one of the things we want to look at next year 00:28:41.343 --> 00:28:45.968 is where the other pain points are and what to do about them 00:28:45.968 --> 00:28:47.585 on the technical side. 00:28:49.085 --> 00:28:50.728 So that's a general picture. 00:28:50.728 --> 00:28:54.455 At the same time, I am very hesitant 00:28:54.455 --> 00:28:58.387 to tell anyone, "No, no, no, stop putting data into Wikidata." 00:28:58.400 --> 00:29:02.408 That would kind of defeat the purpose. 00:29:04.311 --> 00:29:07.061 But, for example, the Wikibase ecosystem 00:29:07.061 --> 00:29:09.220 is one way to address that, right, 00:29:09.220 --> 00:29:13.952 to not require everything in Wikidata. 00:29:13.952 --> 00:29:16.267 That's the whole beauty of linked open data. 00:29:16.267 --> 00:29:18.298 You don't have to have it all in the same place. 00:29:18.298 --> 00:29:19.642 You can connect different places. 00:29:19.642 --> 00:29:20.859 It's amazing. 00:29:21.957 --> 00:29:28.309 So around WikiCite specifically, yes-- 00:29:29.644 --> 00:29:34.718 okay, WikiCite specifically, I think we need 00:29:34.718 --> 00:29:36.256 to look at it in proportion. 
00:29:36.256 --> 00:29:40.548 I don't have an exact percentage of what percentage 00:29:40.548 --> 00:29:44.511 of the items in Wikidata are around WikiCite topics, 00:29:44.511 --> 00:29:46.696 but it's a big percentage. 00:29:46.696 --> 00:29:49.869 And maybe that's the thing we need to talk about... 00:29:50.356 --> 00:29:52.442 in the break. 00:29:53.191 --> 00:29:54.766 Well, thank you very much! 00:29:54.845 --> 00:29:56.281 (applause)