-
[inaudible] and I
have an effort called WikiLoop,
-
and this is what I'm going
to introduce to you.
-
We have presented the idea of WikiLoop
at several Wikimedia-related conferences.
-
How many of you have heard
about WikiLoop before?
-
Thanks.
-
And how many of you have interacted
with the datasets and tooling
-
that we provided before?
-
Okay, fairly new.
So this will be mostly an introduction.
-
So we would like to tell you
why we started this initiative,
-
what it intends to do,
-
how you can get involved,
and where it is headed.
-
So, to begin with,
we would like to give you an example.
-
This is an example of vandalism
-
that happened on the Italian Wikipedia.
-
I know that most people here
are interested in Wikidata.
-
I will tell you why this is relevant too.
-
So basically what we found is
-
that someone vandalized
the Italian Wikipedia
-
and wrote, "Bezos, who cannot afford a car."
-
And this is an interesting question:
-
if you think about it,
this is blatant, obvious vandalism,
-
but when it comes to machines
and algorithms,
-
which try to detect vandalism
and avoid serving users this information,
-
how can a computer understand
this kind of content?
-
We realize that sometimes
-
there are limitations
-
to how far algorithms
and machines can go.
-
Another example: let's say
-
there is a word or label,
or a category on Wikipedia that says
-
someone, a person,
is a "Christian Scientist."
-
Now, given this label,
what facts do you come up with?
-
What would you infer
from this category?
-
Do you think the person is a "Christian,"
or do you think they are a "scientist"?
-
In this specific case--
it does not apply everywhere--
-
but in this specific case,
-
there is a religion
called "Christian Science,"
-
and people who hold that belief
are called "Christian Scientists."
-
And, again, how can a machine know this?
-
Even if many people here are big fans
-
of the idea that the more machine-friendly
we make our data and knowledge,
-
the easier we can work to improve
the overall knowledge accessibility
-
and contribute together,
-
there are always things
-
where we believe
machines have restrictions.
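To make that concrete, here is a toy sketch-- a hypothetical illustration, not part of any WikiLoop tooling-- of how a naive, purely compositional reading of a category label goes wrong on "Christian Scientists," and why an explicit exception list or richer knowledge is needed:

```python
# Toy illustration only: a naive, compositional reading of a category label
# versus a small exception list. Not part of any WikiLoop tool.

def naive_facts_from_category(category):
    # Treat each word in the category as its own "is a ..." fact.
    return ["is a " + token.rstrip("s") for token in category.split()]

# Multi-word labels whose meaning is not the sum of their parts.
NON_COMPOSITIONAL = {
    "Christian Scientists": ["adheres to the religion Christian Science"],
}

def facts_from_category(category):
    # Fall back to the naive reading only when no exception is known.
    return NON_COMPOSITIONAL.get(category, naive_facts_from_category(category))

print(naive_facts_from_category("Christian Scientists"))
# ['is a Christian', 'is a Scientist']  <- the wrong inference
print(facts_from_category("Christian Scientists"))
# ['adheres to the religion Christian Science']
```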
-
So, all in all, we started to realize
-
that, coming from Internet companies
-
with a strong belief
in our technology
-
and in what machines can do,
-
there is always a gap,
there is always something
-
for which we need to rely on human beings,
-
and more than that, we need
to rely on communities
-
who are actively contributing,
who are doing peer reviews,
-
who are collaborating with each other.
-
So this is a picture
of the background behind the WikiLoop effort.
-
Human beings have the knowledge,
-
we have our domain expertise
and we can cross-check each other,
-
but we just don't have enough time.
-
And there are many ways
that machines can empower this,
-
but there are restrictions there.
-
So the goal is to empower
-
or improve the productivity
of human editors.
-
But also the other side of the formula
is we want to loop that back
-
to the research and the academic efforts
-
that improve how machines
can help in these cases.
-
So by raise of hand,
how many of you have used Google?
-
Thank you.
-
And how many of you
-
think that companies like Google
and other big knowledge companies
-
should contribute more
to the knowledge world?
-
So what happens is that
-
we all know that at Google
and other similar companies
-
we have a strong background
of leveraging the open knowledge world;
-
in Google's specific case,
-
the mission is to organize
the world's information.
-
So we help disseminate the information,
-
which in one sense helps
the mission of this movement.
-
But only every once in a while
do we provide sporadic help,
-
donating knowledge,
datasets and tools,
-
and we want to see
if we can make this sustainable,
-
both in the technical sense
-
and also in the business sense.
-
So this is like
a one-sentence introduction.
-
We want WikiLoop
to become an umbrella program
-
for a series of technical projects
-
intended to contribute
datasets and tooling,
-
and hopefully make this a community effort
with the participation of
-
other like-minded people,
partners and institutions.
-
There are several projects
that we think would be a good fit,
-
and these are the criteria.
-
First of all, the idea is
that it needs to be about source improvement;
-
source improvements, by and large,
are a good fit.
-
And the second thing,
which companies like us
-
really cannot do very well by ourselves,
-
is to maximize neutrality,
to avoid picking sides
-
in controversies,
decisions or discussions.
-
And another thing is to make this
sustainable in the long term
-
and to keep it
supported by this industry.
-
We want to see
productivity and scalability
-
in our contributions and efforts.
-
To explain a little bit more:
-
for example, we are trying
to extract facts from Wikipedia.
-
And while we can do several steps,
like separation and labeling, fairly well,
-
up to a certain point
the bottleneck is no longer
-
how far the machine
or the algorithm can reach,
-
but the noise in the source;
-
if we do not remove
-
or minimize the noise in the source,
-
that is as far as the machine can go.
-
So that's the first criterion.
-
And the second criterion is,
we don't want to be seen as biased
or to introduce potential bias.
-
We want to rely on
governance that is peer reviewed
-
and that is done by the community
-
so that we can avoid picking sides
in controversial questions.
-
And the third thing
is probably not so intuitive,
-
so let me give you an example
-
of the projects we have in mind.
-
Let's say there is a smaller,
minority language.
-
I have heard a very good talk
earlier this morning.
-
But one idea we have here is,
-
let's say you are a minority
language contributor, very active,
-
and you want to advocate for your culture
and support knowledge creation.
-
But companies like Google,
or other consumer companies,
-
have a bar
for releasing a translation,
-
for making it available.
-
They want the precision to be high enough
-
so that they can use it to serve users.
-
But maybe internally they have AI models
that are experimental,
-
not good enough to meet the bar
-
because of a lack of training data,
-
so the translation is not available.
-
But the community is doing
the translation by hand anyway.
-
Now, one of the things we are thinking is:
-
what if we can provide
some of these experimental things
-
that are not good enough
to serve general users
-
but are still good for the community
-
and somewhat improve its productivity?
-
That would,
-
first, improve the speed at which
a community can contribute,
-
and second, mean that what a community creates
anyway can come back as training data
-
that keeps bootstrapping the machines.
-
So over time, through this effort,
we hope to create a model
-
that both helps
the human beings, the editors,
-
and also helps the research
-
that improves AI and other approaches.
-
And this is a big overview
of a few projects
-
we are going to introduce.
-
Due to the time limitation,
I will feature just a few.
-
The WikiLoop Game, which you can look up,
-
is one where we leverage a platform
-
created by Magnus called the Wikidata Game.
-
We provide several datasets there
to be played and reviewed,
-
and committed to Wikidata,
-
but only through human review.
-
And Google doesn't get
to contribute data directly
-
to Wikipedia or Wikidata;
-
instead, unbiased individuals
review the data and make the contribution.
-
And the second one I'm going to feature
is WikiLoop Battlefield,
-
the one that you have seen just now
as a counter-vandalism platform,
-
and this one also features
the same criteria
-
of source improvements,
-
of how it can empower machines
-
by looping back to the training data
-
and also how it keeps companies like us
-
from picking sides by relying
on the community's assessment.
-
And the third one is CitePool,
where we are trying
-
to help create
a citation candidate pool
-
to improve the productivity of people
who want to add citations,
-
but also to see if we can turn that
-
into training data
accessible to researchers.
-
So let me use WikiLoop Battlefield
as an example.
-
If you want, try it on your phone--
battlefield.wikiloop.org.
-
By the way, I want to highlight,
the name is subject to change
-
because some friendly community members
have come to me and suggested
-
that Battlefield might not be
the best name for a project
-
serving the Wikimedia movement.
-
So if you don't like this name,
come join us in the discussion,
-
provide your suggestion,
-
we will be very happy
to converge to a name
-
that has community consensus
and popularity.
-
But let's use that as a placeholder here.
-
I don't need to introduce
to this group of people
-
the typical vandalism workflow,
-
but if you have already tried
-
to conduct
some counter-vandalism activity,
-
you might know that it's not trivial.
-
How many of you have seen vandalism
on Wikipedia and Wikidata?
-
Okay, how many of you
have reverted, by hand, some of them?
-
How many of you have used certain tools,
or gone out and found certain tools,
-
to patrol or revert vandalism?
-
Okay.
-
Cool, this is
the highest density of people
-
who have tried to revert vandalism
-
that I have ever spoken to.
-
So maybe some of you have been
doing that very comfortably,
-
but for me, as someone
who started editing actively
-
only about three years ago,
-
and who only started to get serious
about vandalism detection and patrolling
-
about a year ago,
-
I found that doing so is not super easy
-
in the world of the Wikimedia movement.
-
If we look at the existing alternatives,
-
there are tools that are built
for the desktop,
-
and there are tools that rely
on users who have rollback permissions,
-
which is itself a big barrier to get.
-
We want to make this
a super easy-to-use platform
-
for all three roles.
-
The first one is user, reviewer or editor,
whatever you call it.
-
The second one is the researcher
-
who is trying to create
vandalism detection algorithms or systems.
-
And the third one is the developer
-
who is trying to improve
the WikiLoop Battlefield tooling itself.
-
We want it to be
super easy for users to use.
-
You can pull up your phone,
you don't have to install anything,
-
you can do it on your laptop.
-
And we also want
to lower the barrier to reviewing.
-
The reason why other tools
try to limit access
-
is because there needs to be
a base trust level for people to use them.
-
You don't want someone
to come to a counter-vandalism tool
-
and use it to vandalize.
-
So what we are trying to do is,
-
to begin with, we want
to make it super easy,
-
but we also want to allow multiple people
to label the same thing.
-
Also we want to make it super convenient
-
to see the [inaudible],
to see others' labels, all in real time.
-
We also want to make it
super easy for researchers to use.
-
With one click you can download the labels,
-
and maybe start playing with the data
and see how it fits into your model.
-
And we provide APIs
that give access to real-time data.
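As a sketch of how a researcher might consume such an API-- the endpoint path, parameters and response fields below are assumptions for illustration, not the documented WikiLoop Battlefield API:

```python
# Hypothetical sketch: the endpoint path and response fields are placeholders,
# not the documented WikiLoop Battlefield API.
import requests

BASE_URL = "https://battlefield.wikiloop.org"

def fetch_recent_labels(limit=50):
    # Download recent human review labels for offline analysis (assumed endpoint).
    resp = requests.get(f"{BASE_URL}/api/labels", params={"limit": limit}, timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    labels = fetch_recent_labels()
    # Count how many revisions reviewers judged to be vandalism (assumed field name).
    flagged = sum(1 for item in labels if item.get("judgement") == "shouldRevert")
    print(f"{flagged} of {len(labels)} labeled revisions were flagged as vandalism")
```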
-
And for developers
we make it very easy to pick up--
-
with one click
you can deploy your own trial instance,
-
things like that.
-
This is an example
of how we build projects
-
under an umbrella like WikiLoop.
-
We want to make sure
that community trust comes first.
-
We believe making it open source
is usually the best approach.
-
And we want to avoid proprietary tech,
we want to avoid technology lock-in,
-
and we rely on community approval
for certain features.
-
And if you have seen this--
these are the components that we rely on--
-
it's still a very early stage, but you get
the principles behind the design.
-
So, what's next? We are trying
to grow our usage.
-
Hopefully you can try it out by yourself
-
and promise to me
that you don't click on the login.
-
There is a login button--
-
there will be some good features
-
that make it super easy
to even revert something.
-
Currently it still takes a jump away to revert.
-
But we are building features,
-
and we are also trying
to let you choose the categories
-
or the watchlist
that you will be watching,
-
the ones that you care about patrolling.
-
And also, if you are a researcher
working on vandalism detection,
-
try our data and give us feedback.
-
And I will quickly go through
a few other projects
-
that we are featuring here,
-
and we will look for questions
and feedback from you
-
about what we think
and what you think should be there,
-
or how we should fix things
if they don't work right.
-
Wikidata Game is a platform
built by community member Magnus,
-
a celebrity in this community, I think.
-
And by showing this
we are providing datasets,
-
but we also want to let people know
that we are not reinventing the wheel.
-
When we come up with an idea,
we look into it with the community
-
and see if there are
existing tools already there
-
and how we can be
a part of the ecosystem
-
rather than building everything
independently and separately.
-
And this is the current status.
-
Early results show that
-
a few games that we released
-
have triggered improved activity
on the related entities
-
and a few follow-ups.
-
One thing that we have come up with,
-
as I have talked
to a few community members,
-
is the PreCheck idea,
-
which basically provides a preliminary
check of bulk uploads,
-
a sampled preliminary check
by community members,
-
and uses that to generate a report,
-
making it easier to have discussions
-
about whether a big block
of Wikidata datasets
-
should be included
or uploaded to wikidata.org,
-
or whether it should be rechecked or fixed.
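A rough sketch of what such a sampled PreCheck could look like-- the sample size, field names and report shape here are illustrative assumptions, not a specification:

```python
import random

def precheck_sample(batch, sample_size=100, seed=42):
    # Draw a reproducible random sample from a proposed bulk upload for human review.
    rng = random.Random(seed)
    return rng.sample(batch, min(sample_size, len(batch)))

def precheck_report(reviewed):
    # Summarize reviewer verdicts ("ok" / "problem") into a simple report dict.
    problems = [item for item in reviewed if item.get("verdict") == "problem"]
    return {
        "sampled": len(reviewed),
        "problem_rate": len(problems) / len(reviewed) if reviewed else 0.0,
        "examples": problems[:5],  # a few concrete issues to anchor the discussion
    }
```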
-
And there is another project
that is mostly a dataset project
-
called CatFacts.
-
CatFacts is a dataset that we generate
-
of facts derived from categories;
-
the "Christian Scientist" example
that you saw just now
-
is actually an interesting outlier
among the data points
-
from this effort.
-
The goal is to generate
facts from categories,
-
which we think are a very rich
source of facts online
-
that has been under-leveraged.
-
But before it can be fully leveraged
-
we need to make sure
that quality is good enough as well
-
and there are efforts
to put it onto the Wikidata Game,
-
and we are thinking
-
that maybe building PreCheck
would help as well.
-
And it's still at an early stage.
-
Feel free to come and talk to us
about other efforts,
-
or other ideas you have
about datasets we could provide.
-
The Bot is a communication tool.
-
We know that bots can do many things,
like writing Wikipedia articles,
-
but we promise
that we don't write actual articles;
-
we mostly use it
-
as a way to communicate,
for example through user talk pages,
-
to give us access
to large-scale conversations
-
with the community members.
-
Explorer is going to show
all our datasets,
-
our tools, their stats,
-
and queries you can run on them.
-
Stay tuned, this one is releasing soon.
-
And we have several other ideas
-
but I will jump
to this overall portfolio.
-
It will be several projects,
beginning with datasets and tooling,
-
and what we are doing currently
-
is Explorer, Battlefield,
CatFacts and PageRank,
-
and there are some other upcoming ideas
like PreCheck, CitePool and Bubbles.
-
And this is one of the diagrams
-
that I want to show you.
-
We want to not only use
each individual project
-
to contribute to the community
-
and generate training data
for research and academia,
-
we also have an idea
-
that these projects may work together.
-
For example, CitePool,
the system that we want to build
-
to allow people to find citations more easily
for Wikipedia articles or Wikidata,
-
would also use Explorer
to display the results;
-
it depends on the PageRank
scores of the datasets
-
to determine how to rank the citation pages
that we will recommend,
-
it uses PreCheck
to do quality and sanity checks,
-
and it maybe creates
bulk batch reports via the Bot,
-
and PreCheck will depend
on the Game as well.
-
Some of our community friends
-
have been following
the progress of WikiLoop:
-
we have been through an ice-breaking phase,
-
where we were trying to earn the community's trust,
-
because we know how cautious
we need to be
-
when coming to contribute to a movement
-
that relies so much
on neutrality and non-bias policies.
-
And we have gradually started to have ideas
-
about tools and data,
and to find a direction
-
for how we can possibly
make this sustainable.
-
And we are looking into creating
long-term sustainability,
-
both internally and externally:
-
internally, in terms of getting resources
and getting support,
-
and externally, in terms of getting engagement,
getting usage, and getting contributors,
-
starting from next quarter.
-
I want to quote Evan You,
who is the creator
-
of the popular frontend framework Vue.js:
-
"Software development
gets tremendously harder
-
when you start to have to convince people
instead of just writing the code."
-
This applies to editing
Wikipedia or Wikidata.
-
It's very easy to click a button
and add individual articles
-
but it's also very hard
when you need to convince people.
-
I hope to leave some time for questions,
-
although we only have a few minutes,
probably one or two.
-
Yes, so we have about two minutes.
-
So if people want to shout questions out,
I'll bring the mic over.
-
Hands up maybe.
-
(person 1) So where would I go
at this moment if I would like to use this
-
to solve some of the problems
with chemicals,
-
where some Wikipedia pages
about chemicals
-
have a chembox
about a specific chemical
-
but are otherwise about
a class of chemicals?
-
Is that something
where WikiLoop could help?
-
I think that's the individual
domain expertise part, right?
-
If you are talking
about articles
-
that are associated with specific topics.
-
We might be able to help,
-
but currently we are trying
to tackle problems that are more general.
-
And overall the goal is
to find the possibility of
-
empowering human beings' productivity
-
and also trying to generate the knowledge--
-
the training data-- that potentially
helps the algorithms.
-
(person 2) I think we have time
for a very quick one.
-
(person 3) Are you also going to do this
for structured data on Commons?
-
Yeah, we hope to...
-
If you are referring to Battlefield
or counter-vandalism tools,
-
yeah, we are planning
to expand it to other Wiki projects,
-
including Commons in Wikidata.
-
(person 2) I think that's all the questions
we have time for
-
but if you'd like to show
your appreciation for [Victor.]
-
Thank you.
-
(applause)