-
[inaudible] and I
have an effort called WikiLoop,
-
and this is what I'm going
to introduce to you.
-
We have presented the idea of WikiLoop
at several Wikimedia-related conferences.
-
How many of you have heard
about WikiLoop before?
-
Thanks.
-
And how many of you have interacted
with the datasets and tooling
-
that we provided before?
-
Okay, fairly new.
So this will be mostly an introduction.
-
So we would like to tell you
why we started this initiative,
-
what it intends to do,
-
how you can get involved,
and where it is headed.
-
So, to begin with,
we would like to give you an example.
-
This is an example of vandalism
-
that happened on the Italian Wikipedia.
-
I know that most people here
are interested in Wikidata.
-
I will tell you why this is relevant too.
-
So basically what we found is
-
that someone vandalized
the Italian Wikipedia
-
and wrote, "Bezos, who cannot afford a car."
-
And this is an interesting question:
-
if you think about it,
this is blatant, obvious vandalism,
-
but when it comes to machines
and algorithms,
-
which try to detect vandalism
and avoid serving users this information,
-
how can a computer understand
this kind of content?
-
We realize that sometimes
-
there are limitations
-
to how far algorithms
and machines can go.
-
Another example: let's say
-
there is a word or label,
or a category on Wikipedia that says
-
someone, a person,
is a "Christian Scientist."
-
Now, given this label,
what facts do you come up with?
-
What would you infer
from this category?
-
Do you think the person is a "Christian,"
or do you think they are a "scientist"?
-
In this specific case--
it does not apply everywhere--
-
but in this specific case,
-
there is a religion
called "Christian Science,"
-
and people who hold that belief
are called "Christian Scientists."
-
And, again, how can a machine know this?
-
Even if many people here are big fans
-
of the idea that the more machine-friendly
we make our data and knowledge,
-
the easier we can work to improve
the overall knowledge accessibility
-
and contribute together,
-
there are always things
-
where we believe
machines have restrictions.
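To make that concrete, here is a toy sketch-- a hypothetical illustration, not part of any WikiLoop tooling-- of how a naive, purely compositional reading of a category label goes wrong on "Christian Scientists," and why an explicit exception list or richer knowledge is needed:

```python
# Toy illustration only: a naive, compositional reading of a category label
# versus a small exception list. Not part of any WikiLoop tool.

def naive_facts_from_category(category):
    # Treat each word in the category as its own "is a ..." fact.
    return ["is a " + token.rstrip("s") for token in category.split()]

# Multi-word labels whose meaning is not the sum of their parts.
NON_COMPOSITIONAL = {
    "Christian Scientists": ["adheres to the religion Christian Science"],
}

def facts_from_category(category):
    # Fall back to the naive reading only when no exception is known.
    return NON_COMPOSITIONAL.get(category, naive_facts_from_category(category))

print(naive_facts_from_category("Christian Scientists"))
# ['is a Christian', 'is a Scientist']  <- the wrong inference
print(facts_from_category("Christian Scientists"))
# ['adheres to the religion Christian Science']
```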
-
So, all in all, we started to realize
-
that, coming from Internet companies
-
with a strong belief
in our technology
-
and in what machines can do,
-
there is always a gap,
there is always something
-
for which we need to rely on human beings,
-
and more than that, we need
to rely on communities
-
who are actively contributing,
who are doing peer reviews,
-
who are collaborating with each other.
-
So this is a picture
of the background behind the WikiLoop effort.
-
Human beings have the knowledge,
-
we have our domain expertise
and we can cross-check each other,
-
but we just don't have enough time.
-
And there are many ways
that machines can empower this,
-
but there are restrictions there.
-
So the goal is to empower
-
or improve the productivity
of human editors.
-
But also the other side of the formula
is we want to loop that back
-
to the research and the academic efforts
-
that improve how machines
can help in these cases.
-
So by raise of hand,
how many of you have used Google?
-
Thank you.
-
And how many of you
-
think that companies like Google
and other big knowledge companies
-
should contribute more
to the knowledge world?
-
So what happens is that
-
we all know that at Google
and other similar companies
-
we have a strong background
of leveraging the open knowledge world;
-
in Google's specific case,
-
the mission is to organize
the world's information.
-
So we help disseminate the information,
-
which in one sense helps
the mission of this movement.
-
But only every once in a while
do we provide sporadic help,
-
donating knowledge,
datasets and tools,
-
and we want to see
if we can make this sustainable,
-
both in the technical sense
-
and also in the business sense.
-
So this is like
a one-sentence introduction.
-
We want WikiLoop
to become an umbrella program
-
for a series of technical projects
-
intended to contribute
datasets and tooling,
-
and hopefully make this a community effort
with the participation of
-
other like-minded people,
partners and institutions.
-
There are several projects
that we think would be a good fit,
-
and these are the criteria.
-
First of all, the idea is
that it needs to be about source improvement;
-
source improvements, by and large,
are a good fit.
-
And the second thing,
which companies like us
-
really cannot do very well by ourselves,
-
is to maximize neutrality,
to avoid picking sides
-
in controversies,
decisions or discussions.
-
And another thing is to make this
sustainable in the long term
-
and to keep it
supported by this industry.
-
We want to see
productivity and scalability
-
in our contributions and efforts.
-
To explain a little bit more:
-
for example, we are trying
to extract facts from Wikipedia.
-
And while we can do several steps,
like separation and labeling, fairly well,
-
up to a certain point
the bottleneck is no longer
-
how far the machine
or the algorithm can reach,
-
but the noise in the source;
-
if we do not remove
-
or minimize the noise in the source,
-
that is as far as the machine can go.
-
So that's the first criterion.
-
And the second criterion is,
we don't want to be seen as biased
or to introduce potential bias.
-
We want to rely on
governance that is peer reviewed
-
and that is done by the community
-
so that we can avoid picking sides
in controversial questions.
-
And the third thing
is probably not so intuitive,
-
so let me give you an example
-
of the projects we have in mind.
-
Let's say there is a smaller,
minority language.
-
I have heard a very good talk
earlier this morning.
-
But one idea we have here is,
-
let's say you are a minority
language contributor, very active,
-
and you want to advocate for your culture
and support knowledge creation.
-
But companies like Google,
or other consumer companies,
-
have a bar
for releasing a translation,
-
for making it available.
-
They want the precision to be high enough
-
so that they can use it to serve users.
-
But maybe internally they have AI models
that are experimental,
-
not good enough to meet the bar
-
because of a lack of training data,
-
so the translation is not available.
-
But the community is doing
the translation by hand anyway.
-
Now, one of the things we are thinking is:
-
what if we can provide
some of these experimental things
-
that are not good enough
to serve general users
-
but are still good for the community
-
and somewhat improve its productivity?
-
That would,
-
first, improve the speed at which
a community can contribute,
-
and second, mean that what a community creates
anyway can come back as training data
-
that keeps bootstrapping the machines.
-
So over time, through this effort,
we hope to create a model
-
that both helps
the human beings, the editors,
-
and also helps the research
-
that improves AI and other approaches.
-
And this is a big overview
of a few projects
-
we are going to introduce.
-
Due to the time limitation,
I will feature just a few.
-
The WikiLoop Game, which you can look up,
-
is one where we leverage a platform
-
created by Magnus called the Wikidata Game.
-
We provide several datasets there
to be played and reviewed,
-
and committed to Wikidata,
-
but only through human review.
-
And Google doesn't get
to contribute data directly
-
to Wikipedia or Wikidata;
-
instead, unbiased individuals
review the data and make the contribution.
-
And the second one I'm going to feature
is WikiLoop Battlefield,
-
the one that you have seen just now
as a counter-vandalism platform,
-
and this one also features
the same criteria
-
of source improvements,
-
of how it can empower machines
-
by looping back to the training data
-
and also how it keeps companies like us
-
from picking sides by relying
on the community's assessment.
-
And the third one is CitePool,
where we are trying
-
to help create
a citation candidate pool
-
to improve the productivity of people
who want to add citations,
-
but also to see if we can turn that
-
into training data
accessible to researchers.
-
So let me use WikiLoop Battlefield
as an example.
-
If you want, try it on your phone--
battlefield.wikiloop.org.
-
By the way, I want to highlight,
the name is subject to change
-
because some friendly community members
have come to me and suggested
-
that Battlefield might not be
the best name for a project
-
serving the Wikimedia movement.
-
So if you don't like this name,
come join us in the discussion,
-
provide your suggestion,
-
we will be very happy
to converge to a name
-
that has community consensus
and popularity.
-
But let's use that as a placeholder here.
-
I don't need to introduce
to this group of people
-
the typical vandalism workflow,
-
but if you have already tried
-
to conduct
some counter-vandalism activity,
-
you might know that it's not trivial.
-
How many of you have seen vandalism
on Wikipedia and Wikidata?
-
Okay, how many of you
have reverted, by hand, some of them?
-
How many of you have used certain tools,
or gone out and found certain tools,
-
to patrol or revert vandalism?
-
Okay.
-
Cool, this is
the highest density of people
-
who have tried to revert vandalism
-
that I have ever spoken to.
-
So maybe some of you have been
doing that very comfortably,
-
but for me, as someone
who started editing actively
-
only about three years ago,
-
and who only started to get serious
about vandalism detection and patrolling
-
about a year ago,
-
I found that doing so is not super easy
-
in the world of the Wikimedia movement.
-
If we look at the existing alternatives,
-
there are tools that are built
for the desktop,
-
and there are tools that rely
on users who have rollback permissions,
-
which is itself a big barrier to get.
-
We want to make this
a super easy-to-use platform
-
for all three roles.
-
The first one is user, reviewer or editor,
whatever you call it.
-
The second one is the researcher
-
who is trying to create
vandalism detection algorithms or systems.
-
And the third one is the developer
-
who is trying to improve
the WikiLoop Battlefield tooling itself.
-
We want it to be
super easy for users to use.
-
You can pull up your phone,
you don't have to install anything,
-
you can do it on your laptop.
-
And we also want
to lower the barrier to reviewing.
-
The reason why other tools
try to limit access
-
is because there needs to be
a base trust level for people to use them.
-
You don't want someone
to come to a counter-vandalism tool
-
and use it to vandalize.
-
So what we are trying to do is,
-
to begin with, we want
to make it super easy,
-
but we also want to allow multiple people
to label the same thing.
-
Also we want to make it super convenient
-
to see the [inaudible],
to see others' labels, all in real time.
-
We also want to make it
super easy for researchers to use.
-
With one click you can download the labels,
-
and maybe start playing with the data
and see how it fits into your model.
-
And we provide APIs
that give access to real-time data.
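As a sketch of how a researcher might consume such an API-- the endpoint path, parameters and response fields below are assumptions for illustration, not the documented WikiLoop Battlefield API:

```python
# Hypothetical sketch: the endpoint path and response fields are placeholders,
# not the documented WikiLoop Battlefield API.
import requests

BASE_URL = "https://battlefield.wikiloop.org"

def fetch_recent_labels(limit=50):
    # Download recent human review labels for offline analysis (assumed endpoint).
    resp = requests.get(f"{BASE_URL}/api/labels", params={"limit": limit}, timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    labels = fetch_recent_labels()
    # Count how many revisions reviewers judged to be vandalism (assumed field name).
    flagged = sum(1 for item in labels if item.get("judgement") == "shouldRevert")
    print(f"{flagged} of {len(labels)} labeled revisions were flagged as vandalism")
```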
-
And for developers
we make it very easy to pick up--
-
with one click
you can deploy your own trial instance,
-
things like that.
-
This is an example
of how we build projects
-
under an umbrella like WikiLoop.
-
We want to make sure
that community trust comes first.
-
We believe making it open source
is usually the best approach.
-
And we want to avoid proprietary tech,
we want to avoid technology lock-in,
-
and we rely on community approval
for certain features.
-
And if you have seen this--
these are the components that we rely on--
-
it's still a very early stage, but you get
the principles behind the design.
-
So, what's next? We are trying
to grow our usage.
-
Hopefully you can try it out by yourself
-
and promise to me
that you don't click on the login.
-
There is a login button--
-
there will be some good features
-
that make it super easy
to even revert something.
-
Currently it still takes a jump away to revert.
-
But we are building features,
-
and we are also trying
to let you choose the categories
-
or the watchlist
that you will be watching,
-
the ones that you care about patrolling.
-
And also, if you are a researcher
working on vandalism detection,
-
try our data and give us feedback.
-
And I will quickly go through
a few other projects
-
that we are featuring here,
-
and we will look for questions
and feedback from you
-
about what we think
and what you think should be there,
-
or how we should fix things
if they don't work right.
-
Wikidata Game is a platform
built by community member Magnus,
-
a celebrity in this community, I think.
-
And by showing this
we are providing datasets,
-
but we also want to let people know
that we are not reinventing the wheel.
-
When we come up with an idea,
we look into it with the community
-
and see if there are
existing tools already there
-
and how we can be
a part of the ecosystem
-
rather than building everything
independently and separately.
-
And this is the current status.
-
Early results show that
-
a few games that we released
-
have triggered improved activity
on the related entities
-
and a few follow-ups.
-
One thing that we have come up with,
-
as I have talked
to a few community members,
-
is the PreCheck idea,
-
which basically provides a preliminary
check of bulk uploads,
-
a sampled preliminary check
by community members,
-
and uses that to generate a report,
-
making it easier to have discussions
-
about whether a big block
of Wikidata datasets
-
should be included
or uploaded to wikidata.org,
-
or whether it should be rechecked or fixed.
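A rough sketch of what such a sampled PreCheck could look like-- the sample size, field names and report shape here are illustrative assumptions, not a specification:

```python
import random

def precheck_sample(batch, sample_size=100, seed=42):
    # Draw a reproducible random sample from a proposed bulk upload for human review.
    rng = random.Random(seed)
    return rng.sample(batch, min(sample_size, len(batch)))

def precheck_report(reviewed):
    # Summarize reviewer verdicts ("ok" / "problem") into a simple report dict.
    problems = [item for item in reviewed if item.get("verdict") == "problem"]
    return {
        "sampled": len(reviewed),
        "problem_rate": len(problems) / len(reviewed) if reviewed else 0.0,
        "examples": problems[:5],  # a few concrete issues to anchor the discussion
    }
```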
-
And there is another project
that is mostly a dataset project
-
called CatFacts.
-
CatFacts is a dataset that we generate
-
of facts derived from categories;
-
the "Christian Scientist" example
that you saw just now
-
is actually an interesting outlier
among the data points
-
from this effort.
-
The goal is to generate
facts from categories,
-
which we think are a very rich
source of facts online
-
that has been under-leveraged.
-
But before it can be fully leveraged
-
we need to make sure
that quality is good enough as well
-
and there are efforts
to put it onto the Wikidata Game,
-
and we are thinking
-
that maybe building PreCheck
would help as well.
-
And it's still at an early stage.
-
Feel free to come and talk to us
about other efforts,
-
or other ideas you have
about datasets we could provide.
-
The Bot is a communication tool.
-
We know that bots can do many things,
like writing Wikipedia articles,
-
but we promise
that we don't write actual articles;
-
we mostly use it
-
as a way to communicate,
for example through user talk pages,
-
to give us access
to large-scale conversations
-
with the community members.
-
Explorer is going to show
all our datasets,
-
our tools, their stats,
-
and queries you can run on them.
-
Stay tuned, this one is releasing soon.
-
And we have several other ideas
-
but I will jump
to this overall portfolio.
-
It will be several projects,
beginning with datasets and tooling,
-
and what we are doing currently
-
is Explorer, Battlefield,
CatFacts and PageRank,
-
and there are some other upcoming ideas
like PreCheck, CitePool and Bubbles.
-
And this is one of the diagrams
-
that I want to show you.
-
We want to not only use
each individual project
-
to contribute to the community
-
and generate training data
for research and academia,
-
we also have an idea
-
that these projects may work together.
-
For example, CitePool,
the system that we want to build
-
to allow people to find citations more easily
for Wikipedia articles or Wikidata,
-
would also use Explorer
to display the results;
-
it depends on the PageRank
scores of the datasets
-
to determine how to rank the citation pages
that we will recommend,
-
it uses PreCheck
to do quality and sanity checks,
-
and it maybe creates
bulk batch reports via the Bot,
-
and PreCheck will depend
on the Game as well.
-
Some of our community friends
-
have been following
the progress of WikiLoop:
-
we have been through an ice-breaking phase,
-
where we were trying to earn the community's trust,
-
because we know how cautious
we need to be
-
when coming to contribute to a movement
-
that relies so much
on neutrality and non-bias policies.
-
And we have gradually started to have ideas
-
about tools and data,
and to find a direction
-
for how we can possibly
make this sustainable.
-
And we are looking into creating
long-term sustainability,
-
both internally and externally:
-
internally, in terms of getting resources
and getting support,
-
and externally, in terms of getting engagement,
getting usage, and getting contributors,
-
starting from next quarter.
-
I want to quote Evan You,
who is the creator
-
of the popular frontend framework Vue.js:
-
"Software development
gets tremendously harder
-
when you start to have to convince people
instead of just writing the code."
-
This applies to editing
Wikipedia or Wikidata.
-
It's very easy to click a button
and add individual articles
-
but it's also very hard
when you need to convince people.
-
I hope to leave some time for questions,
-
although we only have a few minutes,
probably one or two.
-
Yes, so we have about two minutes.
-
So if people want to shout questions out,
I'll bring the mic over.
-
Hands up maybe.
-
(person 1) So where would I go
at this moment if I would like to use this
-
to solve some of the problems
with chemicals,
-
where some Wikipedia pages
about chemicals
-
have a chembox
about a specific chemical
-
but are otherwise about
a class of chemicals?
-
Is that something
where WikiLoop could help?
-
I think that's the individual
domain expertise part, right?
-
If you are talking
about articles
-
that are associated with specific topics.
-
We might be able to help,
-
but currently we are trying
to tackle problems that are more general.
-
And overall the goal is
to find the possibility of
-
empowering human beings' productivity
-
and also trying to generate the knowledge--
-
the training data-- that potentially
helps the algorithms.
-
(person 2) I think we have time
for a very quick one.
-
(person 3) Are you also going to do this
for structured data on Commons?
-
Yeah, we hope to...
-
If you are referring to Battlefield
or counter-vandalism tools,
-
yeah, we are planning
to expand it to other Wiki projects,
-
including Commons in Wikidata.
-
(person 2) I think that's all the questions
we have time for
-
but if you'd like to show
your appreciation for [Victor.]
-
Thank you.
-
(applause)