WEBVTT 00:00:07.278 --> 00:00:11.778 [inaudible] and I have an effort called WikiLoop, 00:00:11.778 --> 00:00:15.368 and this is what I'm going to introduce to you. 00:00:15.728 --> 00:00:22.604 We have presented WikiLoop, the idea, at several Wikimedia-related conferences. 00:00:22.604 --> 00:00:25.017 How many of you have heard about WikiLoop before? 00:00:26.020 --> 00:00:27.040 Thanks. 00:00:27.040 --> 00:00:31.014 And how many of you have interacted with the datasets and tooling 00:00:31.014 --> 00:00:32.664 that we provided before? 00:00:33.308 --> 00:00:36.870 Okay, fairly new. So this will be mostly an introduction. 00:00:36.870 --> 00:00:42.008 So we would like to tell you why we started this initiative 00:00:42.008 --> 00:00:44.148 and what it intends to do, 00:00:44.148 --> 00:00:48.803 and how you can get involved or where it is headed. 00:00:50.390 --> 00:00:53.810 So, to begin with, we would like to give you an example. 00:00:53.810 --> 00:00:58.409 This is vandalism that happened in Italian... 00:01:00.621 --> 00:01:03.623 that happened on Italian Wikipedia. 00:01:04.142 --> 00:01:06.935 I know that most people here are interested in Wikidata. 00:01:06.935 --> 00:01:09.780 I will tell you why this is relevant too. 00:01:10.137 --> 00:01:11.879 So basically what we found is 00:01:11.879 --> 00:01:15.970 that someone vandalized Italian Wikipedia 00:01:15.970 --> 00:01:20.590 and wrote, "Bezos who cannot afford a car." 00:01:20.809 --> 00:01:22.666 And this is an interesting question, 00:01:23.799 --> 00:01:28.379 if you think about it, this is blatantly obvious vandalism 00:01:28.379 --> 00:01:33.412 but when it comes to machines and algorithms 00:01:33.412 --> 00:01:37.881 which try to detect vandalism and avoid serving users the information, 00:01:38.309 --> 00:01:41.989 how can a computer understand this kind of information, 00:01:41.989 --> 00:01:43.286 like it would be... 
00:01:46.869 --> 00:01:49.180 we realize that sometimes there are limitations 00:01:49.180 --> 00:01:54.083 to how far algorithms and machines can go. 00:01:54.931 --> 00:01:57.666 Another example here is, let's say, 00:01:57.666 --> 00:02:02.044 there is a word, a label, or a category on Wikipedia that says 00:02:02.044 --> 00:02:06.077 someone, a person, is a Christian scientist. 00:02:06.077 --> 00:02:09.627 Now, given this label, what facts do you come up with, 00:02:09.627 --> 00:02:13.815 like what would you infer from this category? 00:02:14.205 --> 00:02:18.586 Do you think it would be a "Christian" or do you think it would be a "scientist"? 00:02:18.981 --> 00:02:21.621 In this specific case-- it does not apply everywhere-- 00:02:21.621 --> 00:02:23.481 but in this specific case, 00:02:23.481 --> 00:02:26.991 there is a religion called "Christian Science," 00:02:26.991 --> 00:02:30.199 and people who hold that belief are called "Christian Scientists." 00:02:31.549 --> 00:02:34.891 And, again, for machines, how can we know, like 00:02:36.272 --> 00:02:40.392 even if many people here are big [fans] 00:02:40.392 --> 00:02:45.242 of the idea that the more machine-friendly we make our data and knowledge, 00:02:45.459 --> 00:02:51.709 the easier we can work and improve the overall knowledge accessibility 00:02:51.709 --> 00:02:54.139 and contribute together, 00:02:54.139 --> 00:02:55.589 but there are always things 00:02:55.589 --> 00:02:58.449 where we believe that machines have restrictions. 
00:03:00.136 --> 00:03:04.479 So all in all, we start to realize 00:03:04.479 --> 00:03:08.307 that, coming from Internet companies 00:03:08.307 --> 00:03:10.690 that have a strong belief in our technology 00:03:10.690 --> 00:03:12.571 and what machines can do, 00:03:12.571 --> 00:03:16.222 there is always a gap or there is always something 00:03:16.222 --> 00:03:18.992 for which we would need to rely on human beings 00:03:18.992 --> 00:03:22.442 and more, we would need to rely on communities 00:03:22.753 --> 00:03:28.383 who are actively contributing, who are doing the peer reviews to our... 00:03:28.383 --> 00:03:30.163 collaborating with each other. 00:03:30.163 --> 00:03:36.082 So this is a picture about the background effort of WikiLoop. 00:03:36.595 --> 00:03:39.945 As for human beings, they have the knowledge, 00:03:40.485 --> 00:03:46.205 we have our domain expertise and we can crosscheck each other 00:03:46.205 --> 00:03:48.503 but we just don't have enough time. 00:03:49.333 --> 00:03:52.803 And there are many things where machines can help with this 00:03:52.803 --> 00:03:56.123 but there are restrictions there. 00:03:56.123 --> 00:03:58.643 So the goal is to empower 00:03:58.643 --> 00:04:03.039 or improve the productivity of human editors. 00:04:03.039 --> 00:04:08.633 But also the other side of the formula is we want to loop that back 00:04:08.634 --> 00:04:13.234 to the research and the academic efforts 00:04:13.234 --> 00:04:17.312 that improve how machines can help in these cases. 00:04:17.875 --> 00:04:22.580 So, by a show of hands, how many of you have used Google? 00:04:23.870 --> 00:04:25.090 Thank you. 00:04:25.090 --> 00:04:26.380 And how many of you 00:04:26.900 --> 00:04:31.455 think that companies like Google and other big knowledge companies 00:04:31.455 --> 00:04:34.202 should contribute more to the knowledge world? 00:04:35.881 --> 00:04:37.707 So what happens is that... 
00:04:37.707 --> 00:04:42.157 we all know that our mission at Google or other similar companies-- 00:04:42.157 --> 00:04:47.647 we have a strong background of leveraging the open knowledge world, 00:04:48.347 --> 00:04:50.107 like in Google's specific case 00:04:50.107 --> 00:04:52.740 it's to organize the world's information. 00:04:52.740 --> 00:04:55.059 So we help disseminate the information, 00:04:56.207 --> 00:04:59.996 which in one sense helps the mission of this movement. 00:04:59.996 --> 00:05:06.358 But only every once in a while we have sporadic help 00:05:07.864 --> 00:05:12.103 trying to donate knowledge and datasets, and tools, 00:05:12.103 --> 00:05:16.223 and we want to see if we can make this sustainable, 00:05:18.323 --> 00:05:21.424 both in the technical sense 00:05:21.424 --> 00:05:23.234 and also in the business sense. 00:05:24.943 --> 00:05:29.639 So this is like a one-sentence introduction. 00:05:29.639 --> 00:05:34.885 We want WikiLoop to become an umbrella program 00:05:34.885 --> 00:05:37.084 for a series of technical projects 00:05:37.084 --> 00:05:39.632 intended to contribute datasets and tooling 00:05:39.632 --> 00:05:44.734 and hopefully make this a community effort with the participation of 00:05:44.734 --> 00:05:50.154 other like-minded people, partners and institutions 00:05:50.154 --> 00:05:52.410 who join this effort. 00:05:52.410 --> 00:05:56.204 There are several projects that we think would be a good fit, 00:05:56.204 --> 00:05:59.204 and these are the criteria. 
00:05:59.204 --> 00:06:04.281 First of all, the idea is that it needs to be a source improvement; 00:06:04.281 --> 00:06:07.251 source improvement, by and large, is a good fit, 00:06:07.251 --> 00:06:10.801 and the second thing, which companies like us 00:06:10.801 --> 00:06:13.941 really cannot do very well by ourselves, 00:06:13.941 --> 00:06:17.691 is to maximize neutrality, to avoid picking sides 00:06:17.691 --> 00:06:21.611 on controversies, decisions or discussions, 00:06:21.611 --> 00:06:26.945 and another thing is to give this long-term sustainability 00:06:26.945 --> 00:06:31.705 and to keep it supported by this industry. 00:06:31.705 --> 00:06:35.017 We want to see the productivity, the scalability 00:06:35.017 --> 00:06:37.632 of our contribution and efforts. 00:06:38.444 --> 00:06:41.078 To explain a little bit more... 00:06:41.584 --> 00:06:43.570 We are always trying to extract... 00:06:43.570 --> 00:06:47.061 for example, we are trying to extract facts from Wikipedia. 00:06:47.417 --> 00:06:52.539 And while we can do several separations, 00:06:52.539 --> 00:06:55.704 we're labeling, fairly well, 00:06:56.315 --> 00:06:59.915 up to a certain point, the bottleneck is no longer 00:06:59.915 --> 00:07:02.475 how good the machine, the algorithm, can get 00:07:02.475 --> 00:07:06.117 but sometimes there is noise in the source, 00:07:06.117 --> 00:07:10.917 and if we do not remove 00:07:10.917 --> 00:07:13.624 or minimize the source noise there, 00:07:13.624 --> 00:07:15.634 that's as far as the machine can go. 00:07:15.634 --> 00:07:18.024 So that's the first criterion. 00:07:18.024 --> 00:07:19.383 And the second criterion is, 00:07:19.383 --> 00:07:24.492 we don't want to be seen as biased or to introduce potential biases. 
00:07:24.492 --> 00:07:29.822 We want to rely on governance that is peer reviewed 00:07:29.822 --> 00:07:32.686 and that is done by the community 00:07:32.686 --> 00:07:36.570 so that we can avoid picking sides in controversial questions. 00:07:37.319 --> 00:07:40.809 And the third thing, which is probably not so intuitive, 00:07:40.809 --> 00:07:43.309 but this is the kind of... I would like... 00:07:43.309 --> 00:07:48.039 Let me give you an example of the projects we have in mind. 00:07:48.435 --> 00:07:51.665 Let's say there is a smaller, minority language. 00:07:51.665 --> 00:07:55.940 I heard a very good talk earlier this morning. 00:07:55.940 --> 00:07:58.460 But one idea we have here is, 00:07:58.460 --> 00:08:02.050 let's say you are a minority language contributor, very active, 00:08:02.050 --> 00:08:07.063 and you want to advocate for your culture and support your knowledge creation. 00:08:07.607 --> 00:08:11.747 But companies like Google or other consumer companies 00:08:11.747 --> 00:08:14.795 have a bar for releasing a translation, 00:08:14.795 --> 00:08:16.165 to make it available. 00:08:16.165 --> 00:08:18.837 They want the precision to be high enough 00:08:18.837 --> 00:08:21.594 so that they can use it to serve users. 00:08:22.568 --> 00:08:26.568 But maybe internally they have AI models that are experimental, 00:08:26.568 --> 00:08:28.914 not good enough to meet the bar 00:08:28.914 --> 00:08:31.494 because of a lack of training data, 00:08:32.734 --> 00:08:34.834 so the translation is not available. 00:08:34.834 --> 00:08:38.080 But the community is doing the translation by hand anyway. 
00:08:39.160 --> 00:08:41.170 Now, one of the things we are thinking of, 00:08:41.170 --> 00:08:45.170 if we can provide some of these experimental things 00:08:45.170 --> 00:08:47.660 that are not good enough to serve general user purposes 00:08:47.660 --> 00:08:50.350 but still good for the community 00:08:50.350 --> 00:08:53.558 and somewhat improve productivity, 00:08:53.811 --> 00:08:55.731 it would be able to, 00:08:55.731 --> 00:09:01.381 one, improve the speed at which a community can contribute, 00:09:01.381 --> 00:09:06.231 and second, what a community is creating anyway can come back as training data 00:09:06.231 --> 00:09:08.881 that keeps bootstrapping the machines. 00:09:10.376 --> 00:09:15.406 So over time, through this effort, we hope to generate a model 00:09:15.673 --> 00:09:19.463 that both helps human beings, the editors, 00:09:19.463 --> 00:09:22.246 but also helps the research 00:09:22.246 --> 00:09:26.765 that improves the AI and other approaches. 00:09:28.489 --> 00:09:31.549 And this is a big overview of a few projects 00:09:31.549 --> 00:09:33.509 we are going to introduce. 00:09:33.509 --> 00:09:36.539 Due to the time limitation I will feature a few. 00:09:36.539 --> 00:09:41.492 The WikiLoop Game, which you can look up, 00:09:41.492 --> 00:09:46.732 is one where we leveraged a platform 00:09:46.732 --> 00:09:50.057 created by Magnus called Wikidata Game. 00:09:50.057 --> 00:09:54.847 We provide several datasets there to be played, to be introduced 00:09:54.847 --> 00:09:56.677 and committed to Wikidata, 00:09:56.677 --> 00:09:58.867 but through human review. 00:09:59.727 --> 00:10:03.947 And Google doesn't get to contribute data directly 00:10:03.947 --> 00:10:06.257 to Wikipedia or Wikidata 00:10:06.257 --> 00:10:12.269 but instead has unbiased individuals who review it do so. 
00:10:12.550 --> 00:10:16.620 And the second one I'm going to feature is WikiLoop Battlefield, 00:10:16.620 --> 00:10:21.420 the one that you have seen just now as a counter-vandalism platform, 00:10:21.420 --> 00:10:25.629 and this one also features the same criteria 00:10:25.629 --> 00:10:28.029 of source improvements, 00:10:29.918 --> 00:10:33.328 of how it can empower machines 00:10:33.328 --> 00:10:38.794 by looping back to the training data 00:10:38.794 --> 00:10:43.064 and also how it keeps companies like us 00:10:43.064 --> 00:10:48.526 from picking sides, allowing us to rely on the community's assessment. 00:10:48.526 --> 00:10:53.517 And the third one is CitePool, which is creating... 00:10:53.517 --> 00:10:58.469 we're trying to help create a citation candidate pool 00:10:58.469 --> 00:11:02.731 to improve the productivity of people who want to add citations 00:11:02.731 --> 00:11:04.721 but also see if we can make that 00:11:04.721 --> 00:11:09.569 into training data accessible to researchers. 00:11:10.010 --> 00:11:13.120 So let me use WikiLoop Battlefield as an example. 00:11:13.120 --> 00:11:18.427 If you have... try it on your phone-- battlefield.wikiloop.org. 00:11:18.427 --> 00:11:21.575 By the way, I want to highlight, the name is subject to change 00:11:21.575 --> 00:11:25.870 because some friendly community members have come to me and suggested 00:11:25.870 --> 00:11:32.224 that Battlefield might not be the best name for a project 00:11:32.224 --> 00:11:34.653 serving the Wikimedia movement. 00:11:34.952 --> 00:11:39.542 So if you don't like this name, come join us in the discussion, 00:11:39.542 --> 00:11:40.984 provide your suggestion, 00:11:40.984 --> 00:11:44.499 we will be very happy to converge on a name 00:11:44.499 --> 00:11:48.111 that has community consensus and popularity. 00:11:48.244 --> 00:11:51.166 But let's use that as a placeholder here. 
00:11:52.885 --> 00:11:56.500 I don't need to introduce this group of people 00:11:56.500 --> 00:11:59.097 to the typical vandalism workflow 00:11:59.820 --> 00:12:03.400 but if you have already... 00:12:04.934 --> 00:12:08.886 tried to conduct some counter-vandalism activity, 00:12:08.886 --> 00:12:11.566 you might know that it's not very trivial. 00:12:11.566 --> 00:12:16.413 How many of you have seen vandalism on Wikipedia and Wikidata? 00:12:16.992 --> 00:12:22.329 Okay, how many of you have reverted, by hand, some of them? 00:12:22.890 --> 00:12:27.680 How many of you have used certain tools or gone ahead and found certain tools 00:12:27.680 --> 00:12:30.875 to patrol or revert vandalism? 00:12:31.407 --> 00:12:32.497 Okay. 00:12:33.474 --> 00:12:36.124 Cool, this is the highest density of people 00:12:36.124 --> 00:12:41.264 who have tried to revert vandalism 00:12:41.264 --> 00:12:43.625 that I have ever spoken to. 00:12:44.336 --> 00:12:48.756 So maybe some of you have been very comfortable doing that 00:12:48.756 --> 00:12:52.966 but for me as someone who started editing actively 00:12:53.808 --> 00:12:57.348 only about three years ago 00:12:57.562 --> 00:13:03.439 and who only started to be very serious about vandalism detection and patrolling 00:13:03.879 --> 00:13:06.191 since about last year, 00:13:06.428 --> 00:13:10.836 I found that doing so is not super easy 00:13:10.836 --> 00:13:14.161 in the world of the Wikimedia movement. 00:13:15.080 --> 00:13:21.761 If we look at the existing alternatives 00:13:21.761 --> 00:13:25.761 there are tools that are built for desktops, 00:13:25.761 --> 00:13:30.748 there are tools that rely on users who have the rollback permission, 00:13:30.748 --> 00:13:33.976 which itself is a big barrier to get. 00:13:35.248 --> 00:13:39.097 We want to make this a super easy-to-use platform 00:13:39.097 --> 00:13:41.637 for all three roles. 
00:13:41.637 --> 00:13:46.017 The first one is the user, reviewer or editor, whatever you call it. 00:13:46.612 --> 00:13:48.460 The second one is the researcher 00:13:48.460 --> 00:13:52.982 who is trying to create vandalism detection algorithms or systems. 00:13:52.982 --> 00:13:54.732 And the third one is the developer 00:13:54.732 --> 00:13:59.573 who is trying to improve this WikiLoop Battlefield tooling. 00:13:59.573 --> 00:14:02.241 We want it to be super easy for users to use. 00:14:02.241 --> 00:14:04.970 You can pull up your phone, you don't have to install it, 00:14:04.970 --> 00:14:07.168 you can do it on your laptop. 00:14:07.168 --> 00:14:10.170 And we also want to lower the barrier to review. 00:14:10.170 --> 00:14:16.650 The reason why other tools are trying to limit the access to the tool 00:14:16.650 --> 00:14:22.250 is because there needs to be a base trust level for people to use them. 00:14:22.250 --> 00:14:26.634 You don't want someone to come to a counter-vandalism tool 00:14:26.634 --> 00:14:28.226 to vandalize. 00:14:29.259 --> 00:14:32.479 So what we are trying to do is that, 00:14:32.479 --> 00:14:34.489 to begin with, we want to make it super easy 00:14:34.489 --> 00:14:39.522 but also we want to allow multiple people to label the same thing. 00:14:39.968 --> 00:14:42.258 Also we want to make it super convenient 00:14:42.258 --> 00:14:48.240 to see the [inaudible], to see other labels, and all in real time. 00:14:48.438 --> 00:14:52.317 We also want to make it super easy for researchers to use. 00:14:52.317 --> 00:14:55.227 With one click you can download the labeling 00:14:55.227 --> 00:15:01.356 and maybe start playing with the data and see how it fits in your model. 00:15:01.502 --> 00:15:06.129 And we provide APIs that have access to real-time data. 
00:15:06.758 --> 00:15:10.448 And for the developer we make it very easy to pick up-- 00:15:10.448 --> 00:15:15.433 we have one click-- you can deploy your trial instances, 00:15:15.433 --> 00:15:16.726 things like that. 00:15:17.100 --> 00:15:20.820 This is an example of building projects 00:15:20.820 --> 00:15:23.191 for an umbrella like WikiLoop. 00:15:23.191 --> 00:15:27.637 We want to make sure community trust comes first. 00:15:27.947 --> 00:15:31.336 We usually need to make it open source as far as we can. 00:15:31.800 --> 00:15:37.478 And we want to avoid proprietary tech, we want to avoid tech lock-in, 00:15:37.778 --> 00:15:42.999 and we rely on community approval for certain features. 00:15:44.366 --> 00:15:49.474 And if you have seen this-- these are the components that we rely on-- 00:15:49.474 --> 00:15:56.207 still very early stage but you get the principles behind the design. 00:15:56.438 --> 00:16:00.288 So what's next, we are trying to grow our usage. 00:16:00.288 --> 00:16:02.458 Hopefully you can try it out by yourself 00:16:02.458 --> 00:16:06.726 and promise me that you won't click on the login. 00:16:07.782 --> 00:16:09.132 There is a login button-- 00:16:09.132 --> 00:16:10.452 there will be some good features 00:16:10.452 --> 00:16:13.292 that make it super easy to even revert something. 00:16:13.292 --> 00:16:15.452 Currently it's still a jump to revert. 00:16:16.714 --> 00:16:18.444 But we are building features, 00:16:18.444 --> 00:16:23.954 and we are also trying to let you choose some categories 00:16:23.954 --> 00:16:26.656 or the watchlist that you will be watching 00:16:26.656 --> 00:16:31.366 and the one that you care about to patrol. 00:16:31.775 --> 00:16:38.069 And also if you are a researcher doing related vandalism detection work, 00:16:38.362 --> 00:16:41.580 try our data and give us feedback. 
00:16:44.411 --> 00:16:47.181 And I will go quickly through a few other projects 00:16:47.181 --> 00:16:48.731 that we are featuring here 00:16:48.731 --> 00:16:52.171 and we will look for questions and feedback from you 00:16:52.171 --> 00:16:57.976 about what we think and what you think should be there 00:16:57.976 --> 00:17:01.550 or how we should fix things if they don't work right. 00:17:01.843 --> 00:17:06.163 Wikidata Game is a platform built by a community member, Magnus, 00:17:06.163 --> 00:17:08.913 a celebrity in this community, I think. 00:17:09.891 --> 00:17:13.371 And by showing this we are providing datasets 00:17:13.371 --> 00:17:19.748 but we also want to let people know that we are not reinventing the wheel, 00:17:19.748 --> 00:17:21.368 that we are not trying to... 00:17:21.368 --> 00:17:24.168 When we come up with some idea, we look into it with the community 00:17:24.168 --> 00:17:27.028 and see if there are existing tools out there 00:17:27.028 --> 00:17:30.198 and how we can be a part of the ecosystem 00:17:30.198 --> 00:17:35.692 rather than building everything independently and everything separately. 00:17:36.661 --> 00:17:38.721 And this is the current status. 00:17:39.624 --> 00:17:42.668 By early results, we show that Wikidata... 00:17:44.945 --> 00:17:47.075 a few games that we released 00:17:47.075 --> 00:17:51.747 have triggered and improved activity on the related entities 00:17:52.546 --> 00:17:54.646 and a few follow-ups. 
00:17:54.646 --> 00:17:57.261 One thing that we have come up with, 00:17:57.261 --> 00:17:59.971 as I have talked to a few community members, 00:17:59.971 --> 00:18:02.388 is the PreCheck idea 00:18:02.388 --> 00:18:09.088 that is basically providing a preliminary check of bulk uploads, 00:18:09.088 --> 00:18:12.268 a sampled preliminary check by community members, 00:18:12.268 --> 00:18:14.478 and using that to generate a report, 00:18:14.478 --> 00:18:16.185 to make discussions easier 00:18:16.185 --> 00:18:20.445 about whether this big block of Wikidata datasets 00:18:20.445 --> 00:18:25.095 should be included or uploaded to wikidata.org 00:18:25.095 --> 00:18:27.484 or it should be rechecked or fixed. 00:18:30.994 --> 00:18:35.884 And there is another project that is mostly a dataset project 00:18:35.884 --> 00:18:37.300 called CatFacts. 00:18:37.572 --> 00:18:42.642 CatFacts consists of datasets that we generate 00:18:42.642 --> 00:18:45.552 about facts from categories, 00:18:45.552 --> 00:18:50.231 the one that you saw, the Christian Scientist, just now 00:18:50.803 --> 00:18:56.495 is actually an interesting outlier among the data points 00:18:56.495 --> 00:18:58.344 from this effort. 00:18:58.344 --> 00:19:01.861 The goal is to generate facts from categories, 00:19:01.861 --> 00:19:07.363 which we think are a very rich source of facts online that people... 00:19:07.731 --> 00:19:10.087 that has been underleveraged. 00:19:10.321 --> 00:19:13.621 But before it can be fully leveraged 00:19:13.621 --> 00:19:17.311 we need to make sure that quality is good enough as well 00:19:17.311 --> 00:19:22.261 and there are efforts to put it onto Wikidata Game 00:19:22.261 --> 00:19:23.861 and we are thinking that 00:19:23.861 --> 00:19:27.110 maybe building PreCheck would help as well. 00:19:27.611 --> 00:19:29.741 And it's still at an early stage. 
00:19:29.741 --> 00:19:34.041 Feel free to come talk to us about other efforts, 00:19:34.041 --> 00:19:37.991 other ideas you have about datasets we could provide. 00:19:38.499 --> 00:19:41.539 The Bot, which is a communication tool. 00:19:41.539 --> 00:19:45.149 We know that bots can do many things, like writing Wikipedia articles, 00:19:45.149 --> 00:19:49.841 but we promise that we won't write actual articles; 00:19:49.841 --> 00:19:52.597 we mostly use it 00:19:52.911 --> 00:19:58.329 as a way to communicate via, let's say, user talk, 00:19:58.817 --> 00:20:04.397 to give us access to large-scale conversations 00:20:04.397 --> 00:20:06.103 with the community members. 00:20:06.416 --> 00:20:09.686 Explorer is going to show all our datasets, 00:20:09.686 --> 00:20:11.879 our tooling, their stats 00:20:11.879 --> 00:20:15.491 and queries you can run on our things. 00:20:15.491 --> 00:20:18.238 Stay tuned, this one is releasing soon. 00:20:18.960 --> 00:20:20.933 And we have several other ideas 00:20:20.933 --> 00:20:24.003 but I will jump to this overall portfolio. 00:20:24.003 --> 00:20:28.443 It would be several projects, beginning with datasets and tooling, 00:20:28.443 --> 00:20:30.338 and what we are doing currently 00:20:30.338 --> 00:20:33.190 is Explorer, Battlefield, CatFacts and PageRank, 00:20:33.190 --> 00:20:39.600 and there are some other upcoming ideas like PreCheck, CitePool and Bubbles. 00:20:41.294 --> 00:20:46.494 And this is one of the diagrams 00:20:46.494 --> 00:20:48.574 that I want to show you. 00:20:48.994 --> 00:20:53.385 We want to not only use one individual project 00:20:53.385 --> 00:20:54.734 to contribute to the community 00:20:54.734 --> 00:20:58.007 and also generate the training data for research and academia; 00:20:58.007 --> 00:21:00.807 we also have an idea 00:21:00.807 --> 00:21:04.519 that these projects may work together. 
00:21:05.676 --> 00:21:08.976 For example, the CitePool, the system that we want to build 00:21:08.976 --> 00:21:15.352 to allow people to more easily find citations for Wikipedia articles or Wikidata 00:21:15.887 --> 00:21:19.316 but also use the Explorer to display the result-- 00:21:19.499 --> 00:21:23.079 it depends on the PageRank score datasets 00:21:23.830 --> 00:21:30.284 to determine how to rank the citation pages that we will recommend 00:21:30.423 --> 00:21:35.630 and use the PreCheck to do quality and sanity checks 00:21:35.630 --> 00:21:40.235 and maybe create bulk batch reports by Bot 00:21:40.235 --> 00:21:44.255 and the PreCheck will depend on the Game as well. 00:21:50.727 --> 00:21:52.566 If some of our community friends 00:21:52.566 --> 00:21:55.476 have been following the progress of WikiLoop, 00:21:55.476 --> 00:21:59.005 we have been through an ice-breaking phase, 00:21:59.655 --> 00:22:02.335 we were trying to earn the community's trust 00:22:02.335 --> 00:22:06.152 because we know how cautious we need to be 00:22:06.152 --> 00:22:09.575 coming to contribute to a movement 00:22:09.575 --> 00:22:14.704 that relies so much on the neutrality and non-bias policies. 00:22:14.999 --> 00:22:19.539 And we have gradually started to have ideas 00:22:19.539 --> 00:22:22.545 about tools and data and find the direction 00:22:22.545 --> 00:22:25.974 of how we can possibly make this sustainable. 00:22:26.231 --> 00:22:31.880 And we are looking into creating long-term sustainability, 00:22:31.880 --> 00:22:34.853 both internally and also externally, 00:22:35.160 --> 00:22:38.654 both in terms of getting resources and getting support, 00:22:39.024 --> 00:22:44.545 also externally of getting engagement, getting usage, and getting contributors, 00:22:45.568 --> 00:22:48.122 starting from next quarter. 
00:22:49.364 --> 00:22:53.066 I want to quote Evan You, who is the creator 00:22:53.066 --> 00:22:58.588 of the popular frontend framework Vue.js, 00:22:58.588 --> 00:23:01.154 "Software development gets tremendously harder 00:23:01.154 --> 00:23:05.504 when you start to have to convince people instead of just writing the code." 00:23:05.504 --> 00:23:08.891 This applies to editing Wikipedia or Wikidata. 00:23:08.891 --> 00:23:13.261 It's very easy to click a button and add individual articles 00:23:13.261 --> 00:23:18.879 but it's very hard when you need to convince people. 00:23:23.330 --> 00:23:27.440 I hope to leave some time for questions, 00:23:27.440 --> 00:23:31.893 although we only have a few, probably one or two minutes. 00:23:33.229 --> 00:23:35.993 Yes, so we have about two minutes. 00:23:35.993 --> 00:23:39.085 So if people want to shout questions out, I'll bring the mic over. 00:23:40.539 --> 00:23:41.969 Hands up maybe. 00:23:45.433 --> 00:23:50.273 (person 1) So where would I go to at this moment if I would like to use this 00:23:50.273 --> 00:23:53.563 to solve some of the problems with chemicals, 00:23:53.563 --> 00:23:56.553 where some Wikipedia pages about chemicals, 00:23:56.553 --> 00:23:59.663 they have a chem box about a specific chemical 00:23:59.663 --> 00:24:03.523 but are otherwise about a class of chemicals. 00:24:03.523 --> 00:24:05.746 Is that something where WikiLoop could help? 00:24:07.750 --> 00:24:12.923 I think that's the individual domain expertise part, right? 00:24:12.923 --> 00:24:15.523 If you are talking about topics of articles 00:24:15.523 --> 00:24:18.701 that are associated with specific topics. 00:24:18.701 --> 00:24:21.131 We are trying to... we might be able to help 00:24:21.131 --> 00:24:26.301 but we are trying to tackle problems that are more general currently. 
00:24:26.301 --> 00:24:32.531 And overall the goal is to find the possibility of 00:24:35.201 --> 00:24:39.354 empowering human beings' productivity 00:24:39.354 --> 00:24:42.204 and also trying to generate the knowledge 00:24:42.204 --> 00:24:44.469 that potentially helps... 00:24:44.469 --> 00:24:47.419 the training data that potentially helps the algorithms. 00:24:49.682 --> 00:24:52.231 (person 2) I think we have time for a very quick one. 00:24:55.292 --> 00:24:58.637 (person 3) Are you also going to do this for search of data on Commons? 00:24:59.522 --> 00:25:01.096 Yeah, we hope to... 00:25:01.096 --> 00:25:05.239 If you are referring to Battlefield or counter-vandalism tools, 00:25:06.451 --> 00:25:11.615 yeah, we are planning to expand it to other Wiki projects, 00:25:11.615 --> 00:25:14.032 including Commons and Wikidata. 00:25:15.280 --> 00:25:17.240 (person 2) I think that's all the questions we have time for 00:25:17.240 --> 00:25:19.800 but if you'd like to show your appreciation for [Victor]. 00:25:19.802 --> 00:25:20.932 Thank you. 00:25:20.932 --> 00:25:24.612 (applause)