Hello, everyone. It's awesome that you're all here, so many of you. It's really, really great. So Lea already talked a lot about this event, and I'm going to talk a bit about Wikidata itself, what has been happening around it over the last year, and where we are going. So... what is this? Sorry. So... where are we? Where are we going?

Over the last year there has been so much to celebrate, and I want to highlight some of that, because sometimes it goes unnoticed. First I want to take you through some statistics around our editors, our content, and how our data is used. Over the last year, we have grown our community, which is amazing. We have around 3,000 new people who edit once or more in 30 days. So that's 3,000 new Wikidatans, yay! Now if you look at people who do more, like five edits in 30 days, we've got an additional 1,200, roughly. And if you look at the people who do 100 edits or more-- I hope many of you in this room-- we have 300 more. Raise your hand if you're in this last group. Woot! You're awesome! And while the number of edits is usually not something we pay a lot of attention to, we did cross the 1 billion edits mark this year. (applause)

Alright, let's look at content. We're now at 65 million items, so entities to describe the world, and we're doing this with around 6,700 properties. Of those, around 4,300 are external identifiers, which gives us a lot of linking to other catalogues, databases, websites and more, and really makes Wikidata the central place in a linked open data web. Using those properties and items, we have around 800 million statements now, and compared to last year, we know on average about half a statement more about every single item. (laughter) So, yeah, Wikidata got smarter. But we don't just have items and properties, we also have new stuff like lexemes, and we are now at 204,000 lexemes that describe words in many different languages. It's very cool. I will talk more about this in a session later today. Last, the latest addition is entity schemas, which help us figure out how to consistently model data across a certain area. Of those, we have around 140 now.

Now, numbers aren't everything around content-- we also care about the quality of the content. What we've done now is train a machine learning system to judge the quality of an item. This is far from perfect, but it gives you an idea. Every item in Wikidata gets a score between 1 and 5. One is pretty terrible; five is amazing. It looks at things like how many statements the item has, how many external identifiers it has, how many references there are, how many labels there are in different languages, and so on. And then we looked at Wikidata over time, and as you can see, based on these measures, we went from pretty terrible to much better. (laughter) So that's good. But as you can also see, there's still a lot of room up to 5. Now, I don't think we will get all the way there, right? Not every item will be absolutely perfect according to these measures. But I'm really happy to see that the quality of our data is consistently getting better and better.
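To give a feel for what such a scoring function does, here is a minimal sketch in Python. To be clear, the real system is a trained machine learning model, not a hand-written formula; the function name, weights and caps below are invented for illustration, and only the features (statements, external identifiers, references, labels) come from the description above.

```python
# Toy sketch of an item-quality heuristic. The real scorer is a trained
# ML model; these weights and caps are made up for illustration only.

def toy_item_quality(statements: int, external_ids: int,
                     references: int, labels: int) -> float:
    """Map simple item features onto the 1-5 scale described above."""
    score = 1.0                            # every item starts at "pretty terrible"
    score += min(statements / 10, 1.5)     # breadth: how much we say about the item
    score += min(external_ids / 5, 1.0)    # linking: ties to external catalogues
    score += min(references / 10, 1.0)     # trust: how well sourced the claims are
    score += min(labels / 20, 0.5)         # languages: multilingual coverage
    return round(min(score, 5.0), 2)

# A mid-sized item: a dozen statements, a few identifiers, some references.
print(toy_item_quality(statements=12, external_ids=4, references=6, labels=15))
# -> 4.1
```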
Okay, but creating that data isn't enough. We do this for a reason: we want it to be used. So we looked at how many articles on each of the other Wikimedia projects use data from Wikidata, as a percentage of all articles on those projects. If you look across all of Wikimedia and all of the articles there, then 56.35% of them today make use of some data from Wikidata. Which I think is pretty good, but of course, there's still a lot of room to 100. And then I looked at which projects are actually making the most use of Wikidata's data, split by language versions and so on. Now, what do you think the top five projects are-- which project family do they all belong to? (several in audience) Commons. Okay, that's pretty uniformly Commons. You would actually be wrong. All of the top five are Wikivoyage. (audience) Oh! (laughter) So yeah, applause to Wikivoyage. (applause) If you would like to check where Commons actually is, and where all of your other projects are, there is a dashboard. Come to me and we can check it out.

Of course, inside Wikimedia is not the only place where our data is used. It's also used outside, and so much has happened. I can't begin to mention it all, but to highlight some: there are great uses of our data at the Met, at the Wellcome Trust, at the Library of Congress, in Gene Wiki and so many more. And if you go through some of the sessions later in the program, you will hear about some of them.

Alright, enough statistics. Let's look at some other highlights. We already talked about data quality improving. When you look at data quality, there are a lot of dimensions you can consider, and we've improved on some of those: how accurate the data is, how trustworthy it is, how well referenced it is, how consistently it is modeled, how complete it is, and so on. To pick out one-- consistency, for example-- we have created the ability to store entity schemas in Wikidata, so that you can describe how certain domains should be modeled. You can create an entity schema, say, for Dutch painters, and then check which items about Dutch painters do not, for example, have a date of birth but should, and similar things. I hope a lot more WikiProjects and so on will be able to make use of entity schemas to take good care of their data, and if you want to learn how to do that, there's a session later in the program as well, by people who know all about this and will make it less of a black box for you.
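The "Dutch painters without a date of birth" check just mentioned maps directly onto a SPARQL query against the public query service, which is one way such gaps get surfaced even without the full entity schema machinery. A sketch in Python-- the IDs used (P106 occupation, Q1028181 painter, P27 country of citizenship, Q55 Netherlands, P569 date of birth) are the real Wikidata IDs, but the script itself is just a demo:

```python
# Sketch: list items for Dutch painters that are missing a date of birth.
import requests

QUERY = """
SELECT ?painter ?painterLabel WHERE {
  ?painter wdt:P106 wd:Q1028181 ;               # occupation: painter
           wdt:P27  wd:Q55 .                    # citizenship: Netherlands
  FILTER NOT EXISTS { ?painter wdt:P569 ?dob }  # no date of birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 20
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "entity-schema-gap-demo/0.1"},  # be polite to the service
)
for row in response.json()["results"]["bindings"]:
    print(row["painter"]["value"], "-", row.get("painterLabel", {}).get("value", ""))
```

Keep the LIMIT low; as comes up in the questions later, heavy queries on the public endpoint can run into timeouts.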
Alright. Another thing that really got traction over the last year is the Wikibase ecosystem: this idea that not all open data should or has to live in Wikidata. Instead, we want a thriving ecosystem of different places and different actors-- institutions, companies, volunteer projects-- opening up their data in a similar way to how Wikidata does it, and then connecting all of it, exchanging data between those places, linking that data. Over the last year, the interest in institutions and people running their own Wikibase instance has really exploded, especially in the library sector. There's a lot of testing, evaluating and, to be honest, trailblazing going on there at the moment, where adventurous institutions work with us to figure out how Wikibase can work for their collections, their catalogues and so on. Among them are the German National Library, the French National Library and OCLC, and it's really exciting to see. One of the reasons I think this is so exciting is that we are helping these institutions open up data in a way that is not just putting it on a website for someone to access, but really thinking about the next step after that: letting people help you maintain that data, augment it, enrich it. That's really a shift that I hope will bring good things. The other thing it helps us with is that it lets experts curate the data in their space and keep it in good shape, so that we can then set up synchronizing processes to Wikidata, for example, instead of having to take care of it all ourselves all the time. And at the end of the day, I hope it will take some pressure off Wikidata to be that place where everything has to go.

Lexicographical data-- over the last year, people started describing words in their language in Wikidata, so that we can build things like automated translation tools. In some languages we are starting to get near the critical mass that is needed to actually build a serious application. In a lot of languages, we still have a long way to go, but in some we're really starting to get there, and that's really great to see. If you want to know more about this, come to my session later today.

And, of course, not to forget, structured data on Commons. (audience member whistles) Yes! (laughs) (applause) The Structured Data on Commons team at the Foundation has really gotten everything together and made it possible to add statements to files on Commons over the last year, and people are starting to add those statements to images, to make them easier to find, to build better applications on top of them, and so much more. It's really exciting to see how that is growing. And I think what's really important for the Wikidata community to understand here is that when you see "depicts" or "house cat" or "sitting," "lizard" and "wall" here, those are links to Wikidata items and properties. That means when we create items and properties, those are no longer just providing the vocabulary for Wikidata itself. They are providing the vocabulary for Commons as well. And this will only become more and more the case, so we have to pay a lot more attention to how our ontology, our vocabulary, is actually used in other places than we had to before.
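Those "depicts" statements are ordinary Wikibase statements on a MediaInfo entity, so they can be read with the same API used for Wikidata itself. A sketch, assuming a hypothetical file name (MediaInfo IDs are "M" plus the file's page ID, and P180 is the real "depicts" property):

```python
# Sketch: read the "depicts" (P180) statements of a file on Commons.
import requests

API = "https://commons.wikimedia.org/w/api.php"
TITLE = "File:Cat_on_a_wall.jpg"  # hypothetical example file

# 1. Look up the file's page ID, which determines its MediaInfo M-id.
pages = requests.get(API, params={
    "action": "query", "titles": TITLE, "format": "json",
}).json()["query"]["pages"]
media_id = "M" + next(iter(pages))

# 2. Fetch the MediaInfo entity and print what the image depicts.
entity = requests.get(API, params={
    "action": "wbgetentities", "ids": media_id, "format": "json",
}).json()["entities"][media_id]

for claim in (entity.get("statements") or {}).get("P180", []):
    # Each value is a Wikidata item ID, e.g. Q146 for "house cat".
    print(claim["mainsnak"]["datavalue"]["value"]["id"])
```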
And the last one I have is that we've started building stronger bridges to the other Wikimedia projects. My team and I are working on a project called the Wikidata Bridge, and you should totally come to the UX booth and do some testing of its current state. It will let, for example, Wikipedia editors edit Wikidata directly from their projects, without having to go to Wikidata and understand everything around it. I hope this will take away one more hurdle that makes it difficult for the Wikimedia projects to adopt more data from Wikidata.

Alright, now to strategy, and where are we going? Since December, the Wikidata team at Wikimedia Deutschland and people from the Wikimedia Foundation have been working on strategy papers around Wikidata. It's basically writing down what a lot of us have been talking about already over the last four or five years. I don't know if all of you have read those papers. They're published on Meta, with comments open until the end of the month. If you haven't read them, it would be great if you go read them and leave your comments. Now, the very quick overview of what is in there is that we think about Wikidata and Wikibase in three pieces.

The first one is Wikidata as a platform. You can see it in the lower corner, and that is really about how Wikidata enables every person to access and share information regardless of their language and technology, and we do that by providing general-purpose data about the world. So basically what you do every day. The second is the Wikibase ecosystem part, where Wikibase, the software running Wikidata, powers not just Wikidata but a thriving open data web that is the backbone of free and open knowledge. And the third and last is Wikidata for the Wikimedia projects, at the top, where Wikidata is there to help the Wikimedia projects-- to help make them ready for the future.

Concretely, what does that mean for the near or mid-term future? Wikidata as a platform: we want to have better data quality, so we will continue working on better tools, improving the tools we have, and so on. We need to make our data more accessible, through better APIs and a more robust SPARQL endpoint, but also through things like modeling our data more consistently, so it is actually easy to reuse in applications. And the last thing I had was setting up feedback processes with our partners. Unlike Wikipedia, Wikidata is not what I call a destination project, right? Someone goes to Wikipedia and reads it, whereas with Wikidata, it's usually not that someone goes to Wikidata and reads it. It would be awesome, but realistically that's not what it is. A lot of the people who are exposed to our data are not on Wikidata itself; they are seeing it through Wikipedia and many other places. Now, these other places do get feedback on that data, right? Their users tell them, "Hey, here's something that's wrong," and I would like to have that feedback, so that we can make it available to the people who actually edit on Wikidata, meaning you. Figuring out how to do that in a meaningful way, without overwhelming everyone, will be one of the things to do over the next year.

Alright, the Wikibase ecosystem. There, we will continue to work with the libraries, but also look into science, for example, and more. There is a Wikibase showcase later today that you should totally go to, to see what's already there and what people are already doing with Wikibase. It's really worth it. What's needed there is also setting up good processes around all of that: helping people figure out who to talk to about what, where they can find help, all these kinds of things. And, of course, making it easier to install and maintain a Wikibase, because that's still a bit of a pain. The last thing is federation, which is basically what we were talking about for Commons earlier-- where Commons uses Wikidata's items and properties-- but for other Wikibase instances out there, so they can also use Wikidata's vocabulary. And that, as I was saying earlier, increases yet again the need to be mindful of how our vocabulary is used out there, more than we have had to be so far.

And Wikidata for the Wikimedia projects: of course, tighter integration through the Wikidata Bridge, helping people edit directly from their projects. The other thing that we all need to think about together, I think, is figuring out how to reduce the language barriers. The more Wikidata is integrated into the Wikimedia projects, the more people will need to talk to each other about that data without speaking the same language, and we have to figure out how to deal with that. If you have smart ideas, I would love to talk to you. And with that, I come to the end of my talk. Thank you, everyone, for giving more people more access to more knowledge every day.
(applause)

We have some time for questions, so if there are any questions in the audience, or if you are watching the livestream remotely-- hi, Mom-- you can ask your question on the Etherpad or in the Telegram channel, and we'll do our best. So, anything? Ah.

(person 1) Hi, everyone. This is more of a meme than a question: when will the time datatype also be able to store hours, minutes and seconds? Because up till now, the precision only goes down to the day. - I know... - It's not my question-- (laughing) That's why I said it's a meme. Every time it's like this, and it always comes from remote, so...

I do not have a very good answer to that, I'm sorry. But maybe as some background: people need it even more to describe images on Commons, so it might bubble up the long list of things that need to be done a bit faster through that. Any more questions?

(person 2) [Linda] from the Wikimedia Foundation's research team. I have a question about your thoughts on patrolling, which may be related to the quality of content on Wikidata. Can you speak to how you see patrolling efforts changing in the near to medium term, especially with the Bridge project, which I'm looking forward to trying?

Yeah, thank you. So as you say, with things like the Bridge, a lot more effort will have to be spent on patrolling, I think. But we are at a size where it is probably not feasible to do it all by hand, by a human, so we need to spend a lot more effort on improving, for example, ORES, the machine learning system, to help us with that-- to help us figure out which edits a human really needs to look at, and which are probably just the regular stuff that doesn't need a look. Currently, ORES is not super good at judging whether an edit on Wikidata is good or bad. There's a campaign going on right now that is training the machine learning system, with your help, to teach it basically what a good edit is and what a bad edit is, and we haven't yet reached the threshold of enough human judgements to really improve it. So if you have a few minutes, it would be great if you helped teach ORES to make better judgements about Wikidata edits. It's really simple-- it shows you an edit, and you say this is a good edit or this is a bad edit, and that's it. You can do this in front of the TV in the evening, on the couch. (person 3) Share a link. We will share a link in the Telegram group, yes. And once we've reached the threshold we need-- I think it's around 7,000, but I might be wrong-- then we can rerun the training for ORES, and then it will hopefully be considerably better at judging edits on Wikidata. And then I hope more of you can use that to filter recent changes, for example, or your watchlist, for edits that really need your attention. Yeah.
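For the curious: ORES exposes its scores through a public HTTP API, so tools can already ask how likely a given edit is to be damaging and use that to prioritize patrolling. A sketch-- the revision ID is a placeholder, and the endpoint and response shape shown are as ORES served them around the time of this talk:

```python
# Sketch: ask ORES how likely a Wikidata edit is to be damaging.
import requests

REV_ID = 123456789  # placeholder revision ID
url = f"https://ores.wikimedia.org/v3/scores/wikidatawiki/{REV_ID}/damaging"

result = requests.get(url).json()
score = result["wikidatawiki"]["scores"][str(REV_ID)]["damaging"]["score"]
print(f"P(damaging) = {score['probability']['true']:.2f}")

# A patrolling tool might surface only edits above some threshold,
# leaving "the regular stuff" out of the review queue.
```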
(person 4) Hi, I'm just curious to know-- and this is a question not from me, but from partners I've been working with-- the more partners we have joining Wikidata and starting to experiment with queries, the more issues we are having with queries timing out. So what's happening with that?

So, some people at the Wikimedia Foundation are looking into that, and-- small spoiler-- be there for the birthday present session. (laughter)

(person 5) Hello, I'm Bart Magnus from Belgium (PACKED). I would like to know what the current state of affairs is regarding federation, so reusing properties in your own Wikibase instance-- is there anything to mention about that?

So, over the last year, a lot of people have told us that they want federation, right? But the problem was that a lot of people understood very different things when they said federation. Some of those things were very easily doable; some of those things were really, really hard. My team and I have been talking to a lot of people, for example the partners we work with at libraries, to figure out what it is, precisely, that they actually need. We have finished that now-- though, of course, I'm happy to take more feedback if you want to talk to me about it-- and I'm now at a stage where I'm comfortable saying, "Okay, we're going to start with that." Over the next two or three months, I would say, we will actually write the first lines of code, and then hopefully have people able to test it early next year.

(presenter) Okay, last questions.

(person 6) Finn Årup Nielsen from Copenhagen, Denmark. In relation to the language data, there's been a sort of discussion in the WikiCite community about whether we should continue to put more scientific papers in there-- this relates to how much data we can put into Wikidata. Timeouts in the Wikidata Query Service are one issue, but so is the maintenance. So what are your thoughts? Is the size of Wikidata beginning to be a problem in general? Should we stop putting in lexeme data? Should we stop putting scientific data into Wikidata? Do we have any research on this, or are the technical problems growing?

Yeah... Wikidata is definitely coming up against some... scalability boundaries, let's say, both technically and socially. And for both we need solutions, right? Socially, we have things like more and more edits in recent changes, to the point where it's completely unfeasible for a human to patrol them, because it's simply too much. But also technically, and we've been addressing some of that-- for example, some database re-architecting around the wb_terms table, if that says anything to anyone. But those only get us so far, and one of the things we want to look at next year is where the other pain points are and what to do about them on the technical side. So that's the general picture. At the same time, I am very hesitant to tell anyone, "No, no, no, stop putting data into Wikidata." That would kind of defeat the purpose. But, for example, the Wikibase ecosystem is one way to address this, right-- to not require everything to be in Wikidata. That's the whole beauty of linked open data: you don't have to have it all in the same place. You can connect different places. It's amazing. So around WikiCite specifically-- I think we need to look at proportions. I don't know exactly what percentage of the items in Wikidata are around WikiCite topics, but it's a big percentage. And maybe that's the thing we need to talk about... in the break. Well, thank you very much! (applause)