[inaudible] and I have an effort called WikiLoop, which is what I'm going to introduce to you. We have presented the WikiLoop idea at several Wikimedia-related conferences. How many of you have heard about WikiLoop before? Thanks. And how many of you have interacted with the datasets and tooling we provide? Okay, fairly new, so this will be mostly an introduction. We would like to tell you why we started this initiative, what it intends to do, and how you can get involved.

To begin with, here is an example. This is vandalism that happened on the Italian Wikipedia. I know that most people here are interested in Wikidata; I will tell you why this is relevant to you too. Basically, what we found is that someone vandalized the Italian Wikipedia to say "Bezos, who cannot afford a car." If you think about it, this is blatant, obvious vandalism to a human. But for the machines and algorithms that try to detect vandalism and avoid serving users bad information, how can a computer understand a statement like this? We realized that there are limits to how far algorithms and machines can go.

Here is another example. Say there is a word, a label, or a category on Wikipedia that describes a person as a "Christian Scientist." Given this label, what facts would you infer from it? That the person is a Christian, or that they are a scientist? In this specific case, and it does not apply everywhere, there is a religion called Christian Science, and a person who holds that belief is called a Christian Scientist. Again, how can a machine know this? Even if many people here are big [fans], and even though the more machine-friendly we make our data and knowledge, the easier we can work together and improve overall knowledge accessibility, there will always be things where we believe machines have restrictions.
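To make that ambiguity concrete, here is a deliberately naive sketch in Python; the word lists and function are invented for illustration and are not anything WikiLoop actually runs. It shows how a token-splitting heuristic confidently produces two wrong facts from this one category label.

```python
# Illustrative only: a naive category-to-facts heuristic (all names hypothetical)
# showing why "Christian Scientist" trips up machines. Splitting the label into
# tokens suggests religion="Christian" plus occupation="Scientist", while the
# correct reading is a single religion, "Christian Science".

KNOWN_RELIGIONS = {"Christian", "Buddhist", "Muslim"}
KNOWN_OCCUPATIONS = {"Scientist", "Musician", "Politician"}

def naive_facts_from_category(category: str) -> list[tuple[str, str]]:
    """Guess (property, value) facts by splitting a category label into words."""
    facts = []
    for word in category.split():
        if word in KNOWN_RELIGIONS:
            facts.append(("religion", word))
        if word in KNOWN_OCCUPATIONS:
            facts.append(("occupation", word))
    return facts

# The heuristic confidently produces two wrong facts; a human reviewer
# immediately recognizes the single correct one.
print(naive_facts_from_category("Christian Scientist"))
# [('religion', 'Christian'), ('occupation', 'Scientist')]
# Correct answer: [('religion', 'Christian Science')]
```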
So, all in all, coming from Internet companies that have a strong belief in our technology and in what machines can do, we started to realize that there is always a gap: there is always something for which we need to rely on human beings, and more than that, on communities who are actively contributing, peer-reviewing each other's work, and collaborating with each other.

This is a picture of the background thinking behind WikiLoop. Human beings have the knowledge, we have our domain expertise, and we can cross-check each other, but we only have so much time. Machines can empower much of this, but they have restrictions. So the goal is to empower human editors and improve their productivity. The other side of the formula is that we want to loop the results back to the research and academic efforts that improve how machines can help in these cases.

By a raise of hands, how many of you have used Google? Thank you. And how many of you think that companies like Google and other big knowledge companies should contribute more to the knowledge world? We all know the mission at Google and at similar companies; we have a strong history of leveraging the open knowledge world. In Google's specific case it is to organize the world's information. So we help disseminate information, which in one sense serves the mission of this movement. But only every once in a while do we offer sporadic help, donating knowledge, datasets, and tools, and we want to see if we can make this sustainable, both in the technical sense and in the business sense.

So here is the one-sentence introduction: we want WikiLoop to become an umbrella program for a series of technical projects intended to contribute datasets and tooling, and we hope to make it a community effort, with like-minded people, partners, and institutions joining in.

There are several projects we think would be a good fit, and these are the criteria. First, a project should be about source improvement, broadly speaking. Second, it should maximize neutrality, avoiding picking sides in controversial decisions or discussions, which is something companies like us really cannot do well by ourselves. Third, it should be sustainable in the long term and keep being supported by the industry; we want to see the productivity and scalability of our contributions and efforts.

To explain the first criterion a bit more: for example, we try to extract facts from Wikipedia. We can do the segmentation and labeling fairly well, but past a certain point the bottleneck is no longer how good the machine or algorithm can get; sometimes there is noise in the source, and if we do not remove or minimize that source noise, that is as far as the machine can go. For the second criterion, we do not want to be seen as biased or to introduce potential bias. We want to rely on governance that is peer-reviewed and carried out by the community, so that we avoid picking sides on controversial questions.

The third criterion is probably less intuitive, so let me give you an example of the kind of project we have in mind. Say you are a very active contributor in a smaller, minority language; I heard a very good talk about this earlier this morning. You want to advocate for your culture and support knowledge creation in your language. Companies like Google and other consumer companies have a quality bar for releasing a translation system and making it available: the precision needs to be high enough to serve users. Internally they may have experimental AI models that do not reach that bar because of a lack of training data, so no translation is available. But the community is doing the translation by hand anyway. One of the things we are thinking of is providing some of these experimental systems, which are not good enough for general users but still useful to the community for improving productivity. That would, first, improve the speed at which a community can contribute, and second, what the community is creating anyway could come back as training data that keeps bootstrapping the machines. Over time, we hope this effort produces a model that helps both the human editors and the research that improves AI and other approaches.
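As a rough illustration of the feedback loop just described, here is a minimal Python sketch; every function name in it is a hypothetical placeholder, not a real Google or Wikimedia API.

```python
# A minimal sketch of the bootstrapping loop: an experimental model drafts a
# translation, a community editor post-edits it by hand, and the corrected
# pair flows back as training data for the next round.

def bootstrap_round(articles, draft_translate, human_post_edit, training_set):
    """One round: machine drafts, human corrects, corrections become training data."""
    for source_text in articles:
        draft = draft_translate(source_text)         # experimental model, below launch bar
        final = human_post_edit(source_text, draft)  # editor fixes the draft by hand
        training_set.append((source_text, final))    # the loop: edits retrain the model
    return training_set

# Toy usage with stub callables standing in for the real model and editor:
data = bootstrap_round(
    ["Kaixo mundua"],                            # hypothetical source text
    draft_translate=lambda s: "Hello wrld",      # imperfect machine draft
    human_post_edit=lambda s, d: "Hello world",  # community correction
    training_set=[],
)
print(data)  # [('Kaixo mundua', 'Hello world')]
```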
Here is a big overview of the projects; due to the time limit I will feature only a few.

The first is the WikiLoop Game, which you can look up. It leverages the Wikidata Game, a platform created by Magnus. We provide several datasets there to be played through, reviewed, and committed to Wikidata, but only after human review. Google does not get to contribute data directly to Wikipedia or Wikidata; instead, non-biased individuals review it and decide.

The second one I am going to feature is WikiLoop Battlefield, the counter-vandalism platform you saw just now. It follows the same criteria: it is a source improvement, it empowers machines by looping reviews back as training data, and it keeps companies like us from picking sides by relying on the community's assessment instead.

The third is CitePool, where we are trying to help create a pool of citation candidates, both to improve the productivity of people who want to add citations and to see whether we can turn that work into training data accessible to researchers.

Let me use WikiLoop Battlefield as an example. Try it on your phone: battlefield.wikiloop.org. By the way, I want to highlight that the name is subject to change; some friendly community members have come to me and suggested that "Battlefield" might not be the best name for a project serving the Wikimedia movement. So if you do not like this name, come join us in the discussion and provide your suggestion; we will be very happy to converge on a name that has community consensus and popularity. For now, let's use it as a placeholder.

I do not need to explain the typical counter-vandalism workflow to this group, but if you have tried counter-vandalism work, you know it is not trivial. How many of you have seen vandalism on Wikipedia or Wikidata? Okay, and how many of you have reverted some of it by hand? How many of you have used or gone and found tools to patrol or revert vandalism? Okay, cool; this is the highest density of people who have tried to revert vandalism that I have ever spoken to. Some of you may be very comfortable doing that, but as someone who started editing actively only about three years ago, and who only got serious about vandalism detection and patrolling about a year ago, I found that it is not at all easy in the Wikimedia world. If you look at the existing alternatives, there are tools built for the desktop, and there are tools that require the rollback permission, which is itself a big barrier to obtain.

We want to make this a very easy-to-use platform for all three roles. The first is the user, reviewer, or editor, whatever you want to call them. The second is the researcher trying to create vandalism-detection algorithms or systems. The third is the developer trying to improve the WikiLoop Battlefield tooling itself.

For users, we want it to be effortless: you can pull up your phone, you do not have to install anything, and you can do it on your laptop. We also want to lower the barrier to review. The reason other tools limit access is that a base level of trust is needed; you do not want someone to come to a counter-vandalism tool and vandalize through it. So our approach is to make it very easy to start, but to allow multiple people to label the same edit, and to make it convenient to see the [inaudible], to see other people's labels, all in real time.
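To make the multi-reviewer idea concrete, here is a minimal sketch of majority-vote label aggregation; the label names and vote threshold are assumptions for illustration and do not reflect Battlefield's actual implementation.

```python
# A minimal sketch: several reviewers label the same revision, everyone can
# see the running tally, and a simple majority decides the verdict once
# enough independent reviews have come in.

from collections import Counter

def tally(labels: list[str]) -> Counter:
    """Running tally of labels ('looks_good' / 'should_revert' / 'not_sure')."""
    return Counter(labels)

def verdict(labels: list[str], min_votes: int = 3) -> str:
    """Majority vote once enough independent reviewers have weighed in."""
    counts = tally(labels)
    if sum(counts.values()) < min_votes:
        return "needs_more_reviews"
    winner, _ = counts.most_common(1)[0]
    return winner

print(verdict(["should_revert", "should_revert", "looks_good"]))
# 'should_revert'
```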
For researchers, we also want it to be very easy to use: with one click you can download the labels and start playing with the data to see how it fits your model, and we provide APIs with access to real-time data. For developers, we make it very easy to pick up; with one click you can deploy your own trial instance, and so on.

This is an example of how we build projects under an umbrella like WikiLoop. We want to make sure community trust comes first. We make things open source wherever we can, we avoid proprietary tech and lock-in, and we rely on community approval for certain features. If you have seen this slide, these are the components we rely on. It is still a very early stage, but you can see the principles behind the design.

So what is next? We are trying to grow usage. Hopefully you will try it out yourself. There is a login button, and behind it there will be features that make it easy to revert something directly; currently it still jumps you out to revert. We are also building features that let you choose categories, or the watchlist you follow, so you can patrol the pages you care about. And if you are a researcher working on vandalism detection, try our data and give us feedback.

Let me go quickly through a few other projects we are featuring, and then I will look for your questions and feedback on what we think should be there, what you think should be there, and how we should fix things that do not work right.

The Wikidata Game is a platform built by the community member Magnus, a celebrity in this community, I think. By building on it we are providing datasets, but we also want people to know that we are not reinventing the wheel. When we come up with an idea, we look into the community to see whether existing tools are already there and how we can be part of the ecosystem, rather than building everything independently and separately. And this is the current status: early results show that the few games we released have triggered improvement activity on the related entities, with some follow-up.

One idea that came up as I talked to community members is PreCheck, which would provide a preliminary, sampled check of bulk uploads by community members and use that to generate a report. The report would make it easier to discuss whether a big block of Wikidata items should be uploaded to wikidata.org, or should be fixed and rechecked first.

Another project, mostly a dataset project, is CatFacts. CatFacts is a set of datasets of facts that we generate from categories; the "Christian Scientist" example you saw just now is actually an interesting outlier among the data points from this effort. The goal is to generate facts from categories, which we think are a very rich source of facts online that has been under-leveraged. Before they can be fully leveraged, though, we need to make sure the quality is good enough; there are efforts to put the data onto the Wikidata Game, and we are thinking that building PreCheck would help here as well. It is still at an early stage, so feel free to come and talk to us about other datasets we could provide.
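Here is a sketch of how such a sampled PreCheck report could work; the record format, the review callback, and the 2% threshold are invented for illustration, under stated assumptions rather than any actual PreCheck design.

```python
# A sketch of the PreCheck idea: sample a bulk upload, let community members
# review just the sample, and turn their verdicts into a report that a
# discussion about the whole batch can cite.

import random

def precheck_report(batch: list[dict], review, sample_size: int = 100) -> dict:
    """Review a random sample of a bulk upload and summarize the error rate."""
    sample = random.sample(batch, min(sample_size, len(batch)))
    verdicts = [review(item) for item in sample]          # human review of each sampled item
    error_rate = verdicts.count("problem") / len(sample)  # fraction flagged by reviewers
    return {
        "batch_size": len(batch),
        "sampled": len(sample),
        "error_rate": error_rate,
        "recommendation": "upload" if error_rate < 0.02 else "fix and recheck",
    }

# Toy usage: a stub reviewer that flags items missing a source statement.
report = precheck_report(
    [{"claim": "X", "source": None}, {"claim": "Y", "source": "Z"}] * 50,
    review=lambda item: "problem" if item["source"] is None else "ok",
)
print(report["recommendation"])  # 'fix and recheck'
```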
There is also the Bot, which is a communication tool. We know that bots can do many things, like writing Wikipedia articles, but we promise we will not write actual articles; we mostly use the Bot as a way to communicate, for example through user talk pages, giving us access to large-scale conversations with community members. Explorer will show all our datasets and tooling, their stats, and the queries you can run on them; stay tuned, this one is releasing soon.

We have several other ideas, but let me jump to the overall portfolio. It starts with several dataset and tooling projects: currently Explorer, Battlefield, CatFacts, and PageRank, with upcoming ideas like PreCheck, CitePool, and Bubbles. And this is one of the diagrams I want to show you: we want each project not only to contribute to the community and generate training data for research and academia, but the projects may also work together. For example, CitePool, the system we want to build so people can more easily find citations for Wikipedia articles and Wikidata, would use Explorer to display results, depend on the PageRank scores of datasets to rank the citation pages it recommends, use PreCheck for quality and sanity checks, and maybe create bulk batch reports through the Bot, while PreCheck in turn depends on the Game.

If some of our community friends have been following WikiLoop's progress, we have been through an ice-breaking phase, trying to earn the community's trust, because we know how cautious we need to be when contributing to a movement that relies so heavily on neutrality and non-bias policies. We have gradually developed ideas for tools and datasets and found a direction for how we can possibly make this sustainable. Starting next quarter, we are looking into creating long-term sustainability, internally in terms of getting resources and support, and externally in terms of getting engagement, usage, and contributors.

I want to quote Evan You, the creator of the popular frontend framework Vue.js: "Software development gets tremendously harder when you start to have to convince people instead of just writing the code." This applies to editing Wikipedia and Wikidata too: it is very easy to click a button and edit individual articles, but it is very hard when you need to convince people.

I hope to leave some time for questions, although we only have one or two minutes. Yes, we have about two minutes, so if people want to shout questions out, I will bring the mic over. Hands up, maybe.

(person 1) Where would I go at this moment if I wanted to use this to solve a problem with chemicals? Some Wikipedia pages about chemicals have a chem box about a specific chemical but are otherwise about a class of chemicals. Is that something WikiLoop could help with?

I think that is the individual-domain-expertise part, right? If you are talking about articles associated with specific topics, we might be able to help, but currently we are trying to tackle the more general problem. Overall, the goal is to find ways to empower human editors' productivity and to generate the knowledge, the training data, that can potentially help the algorithms.

(person 2) I think we have time for a very quick one.
(person 3) Are you also going to do this for structured data on Commons?

Yes, we hope to. If you are referring to Battlefield and the counter-vandalism tools, we are planning to expand them to other wiki projects, including Commons and Wikidata.

(person 2) I think that is all the questions we have time for, but please show your appreciation for [Victor]. Thank you. (applause)