-
Hi, I am Satdeep. I work
with the Foundation in Ben's team.
-
Here's my friend from India, Bodhi.
-
He's working with the Centre
for Internet and Society,
-
but he's here in his volunteer capacity.
-
So, we're going to talk about
knowledge gaps and Wikidata today.
-
So what are knowledge gaps?
-
As the name suggests, it's a gap
in our existing knowledge.
-
But in terms of Wikidata,
we look at knowledge gaps
-
from two different angles.
-
One is, how can Wikidata help us
in filling the knowledge gaps
-
in other Wikimedia projects?
-
And the second is, how do we fill
the knowledge gaps within Wikidata?
-
For the first one,
"Filling knowledge gaps with Wikidata."
-
Wikidata is helping in a number of ways
-
in filling knowledge gaps
on different Wikimedia projects,
-
for example, ArticlePlaceholder,
-
or another tool called Scribe
is being built,
-
Wikidata Infoboxes, all of them are--
-
(audience reacts)
-
Yes, there was a session about it
early this morning or in the afternoon.
-
And there are also a lot
of different templates
-
which use Wikidata.
-
And then there are new templates
called [inaudible],
-
which along with this here
are used to make lists like these.
-
And if you click on one
of the topics on this list,
-
you get this draft article.
-
There was a presentation about this
in this same room by [inaudible].
-
So you get a draft article
with some sentences
-
and the infoboxes from Wikidata.
-
But this is not what we're going
to talk about here today.
-
We're going to talk about how, in India,
-
we first have to fill
the knowledge gaps within Wikidata,
-
so then we can do
all these amazing things.
-
So there are knowledge gaps
in localization.
-
We need to add a lot more labels
in different languages.
-
We need to build local data
about local places and people,
-
so that we can do
all those awesome things.
-
But the main aspect of this
is building community capacity
-
to do all that.
-
So, that's where we come
to The Indic Case Study,
-
which is what this talk is all about.
-
And how did it all start?
-
There is a person
sitting right there, Asaf.
-
He is responsible for all this--
-
for bringing Wikidata to India.
-
So there was the first community
capacity development training
-
with the Tamil community in 2016,
where he introduced Wikidata.
-
And then a bunch
of Wikidatans, super users,
-
started contributing to Wikidata.
-
And then, in 2017, on both our requests,
-
he came to India again and did
-
(laughs) Wiki-a-Tra--
-
it's like Wiki travel in India.
-
He did that, he went
to seven different cities,
-
seven different communities
at least, in India,
-
where he did Wikidata workshops,
-
mostly two-day workshops
in all those places.
-
And then, in 2018, again,
an Advanced Wikidata workshop.
-
And that has actually helped in building
some sort of Wikidata community
-
around India.
-
That also got the community engaged,
-
and then we started
building WikiProject India,
-
and then some other projects
related to that,
-
such as WikiProject West Bengal,
Indian Railways, and Kerala,
-
which are some
specific regions and topics in India
-
where the community has been trying
to get involved
-
and do some work around them.
-
And then there have been
some more initiatives to engage newbies
-
such as edit-a-thons,
or labelathons, datathons,
-
with which we've been trying
to get more and more people involved.
-
And some initiatives around education,
-
workshops in education institutions--
Asaf also did one of those.
-
Yeah. Next, Bodhi.
-
So, there have been
so many workshops in India,
-
throughout all of India
from 2017 to 2019.
-
And we're also trying to engage,
as Satdeep said,
-
we are trying to engage
the newbies in different ways.
-
But still, the number of power users
in India is not very high.
-
Only very few, maybe five or six people
are doing the heavy-duty work.
-
So one of the reasons for that:
-
the Wikimedia community in India
is mostly focused on other projects,
-
mainly Wikipedia and, to some extent
right now, Wikisource.
-
So, there are very few editors who are--
-
very few active editors
-
who are contributing
to Wikidata regularly.
-
India is a multilingual country,
-
so there are around
22 Wikimedia projects
-
running in India.
-
So the workforce is totally divided.
-
So, we don't have
a focused group of people
-
who are working
on specific areas of Wikidata
-
because they are so much divided
into different projects,
-
that we have to engage--
we're trying to actively engage them
-
in different ways.
-
And they are spread over a vast region,
-
India is the seventh
largest country in the world,
-
and so it's quite difficult to coordinate
the entire community,
-
the 22 language communities,
to work on only one project.
-
So, we have adopted a different approach.
-
Firstly, we're targeting the data gaps,
-
which is easy because
there are huge data gaps in India
-
on every topic, almost every topic.
-
And...
-
(chuckles)
-
...start locally.
-
Sorry. (laughs)
-
- So, it's 1, 1, 1--
- Everything is a priority!
-
(laughter)
-
Anyway. So we start locally.
-
So we thought that for the entire country--
-
the data ingestion
for the entire country is quite difficult.
-
And there are huge databases for India,
-
for example, the census databases,
the election databases.
-
And if we work on the entire country,
-
then it'd be really impossible
for five or six heavy-duty users.
-
So we target one place at a time.
-
So that is the map of India,
-
and you can see the bright pink color
that is West Bengal.
-
So from October 2018 to May 2019,
many things happened there.
-
Lots of data
was ingested in that region.
-
This map was generated
-
with a tool
called Wikidata Analysis--
-
built by [inaudible], user: [inaudible].
-
And after we got this map,
-
we shared this with other communities.
-
saying, "We have done this for West Bengal,
you can do it for your country.
-
And this is really cool."
-
And people have started working--
-
that was a direct effect.
-
WikiProject Kerala was built
just at that time,
-
and they started working
on the schools of India--
-
schools of Kerala--
and Kerala is situated right here--
-
and I couldn't [locate]
that on the map right now
-
because the tool is down right now.
-
So we just started locally.
-
We're trying to inspire people
from other parts of the country
-
to contribute.
-
And that's what happened in West Bengal:
-
around 40,000 villages
with 2001 and 2011 census data.
-
The data was ingested--
that's complete data.
-
Almost complete data
-
which could have been
ingested in Wikidata.
-
And there were 11,000
government hospitals with coordinates
-
which were ingested,
-
and there was [inaudible] approach
to close to 1 million Bengali labels.
-
And so on.
-
There were many things happening,
but these were the things
-
which we've done
in West Bengal at that time.
-
So we also tried
to create cool visualizations
-
from those works we've done
-
because census and elections,
these are boring data.
-
These are not paintings,
and also so we cannot--
-
like these are also not GLAM data
and other things.
-
So these are boring data.
-
So we need to find some way
to make it interesting for people.
-
So, we have tried some cool queries.
-
This is one of them.
There are many others.
-
So this is the population growth
in West Bengal
-
across villages--
around 36,000 villages
-
between 2001 and 2011.
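The growth figures behind a map like this come down to simple arithmetic on two census counts per village. Below is a minimal Python sketch, assuming the 2001 and 2011 populations have already been pulled out of Wikidata into a dictionary; the function names and village keys are hypothetical, and the talk does not say how the actual query computes growth.

```python
def decadal_growth(pop_2001: int, pop_2011: int) -> float:
    """Percentage change in population between the 2001 and 2011 censuses."""
    if pop_2001 <= 0:
        raise ValueError("2001 population must be positive to compute growth")
    return (pop_2011 - pop_2001) / pop_2001 * 100.0


def growth_by_village(populations: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Map each (hypothetical) village key to its 2001-2011 growth, rounded to 0.1%."""
    return {
        village: round(decadal_growth(p2001, p2011), 1)
        for village, (p2001, p2011) in populations.items()
        # Skip villages with no usable 2001 count instead of dividing by zero.
        if p2001 > 0
    }
```

For example, `growth_by_village({"village-a": (1000, 1100)})` returns `{"village-a": 10.0}`. Villages without a 2001 figure are silently dropped, which is one possible choice; a real pipeline might prefer to flag them as gaps.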
-
And not only villages,
we have uploaded census data
-
about every administrative hierarchy,
-
like community development blocks,
districts, municipalities, wards, etc.,
-
cities, towns.
-
This is a new tool, InteGraality,
-
and you can see
-
that this is a count of hospitals
-
in the world,
-
and India is right now
leading in Wikidata--
-
13,466 hospitals.
-
The blue color shows data completeness.
-
But the funny thing is--
it's only one area of India.
-
It's West Bengal,
-
there are 11,642 hospitals right now.
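A completeness figure of the kind InteGraality displays can be read as the share of items in a group that carry a given property. A minimal sketch of that idea in Python, assuming each item has already been reduced to a (group, has_property) pair; this is not InteGraality's own code, and the function name and group labels are hypothetical.

```python
from collections import defaultdict


def completeness_by_group(items: list[tuple[str, bool]]) -> dict[str, float]:
    """Percentage of items per group (e.g. per country) that have the property."""
    totals: dict[str, int] = defaultdict(int)
    filled: dict[str, int] = defaultdict(int)
    for group, has_property in items:
        totals[group] += 1
        if has_property:
            filled[group] += 1
    # Every group in `totals` has at least one item, so no division by zero.
    return {g: round(100.0 * filled[g] / totals[g], 1) for g in totals}
```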
-
So if we complete all these states,
and there are more--
-
if we complete all those states,
-
there will be a huge amount
of data about hospitals
-
with coordinates
which will be there in Wikidata,
-
and we have a plan to build an app
based on that data,
-
so that when a person gets ill,
-
using that app, they can find
-
the nearest hospital.
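The app is only a plan in the talk, but its core lookup is easy to sketch. Assuming each hospital's coordinates have been exported from Wikidata as decimal latitude and longitude, a great-circle (haversine) distance plus a linear scan is enough to find the nearest one; the `Hospital` type and the function names are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Hospital:
    name: str
    lat: float  # latitude in decimal degrees
    lon: float  # longitude in decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_hospital(lat: float, lon: float, hospitals: list[Hospital]) -> tuple[Hospital, float]:
    """Return the closest hospital and its distance, scanning every candidate."""
    if not hospitals:
        raise ValueError("no hospitals to search")
    best = min(hospitals, key=lambda h: haversine_km(lat, lon, h.lat, h.lon))
    return best, haversine_km(lat, lon, best.lat, best.lon)
```

A linear scan over roughly 11,000 points is cheap on a phone; a much larger dataset would call for a spatial index instead.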
-
So these hospitals are ranging
from Primary Health Centers
-
to [inaudible] Health Cares,
-
with all sorts of facilities
available for them.
-
So we've tried to ingest
all those data in Wikidata,
-
if possible.
-
And after completing this task,
if we build some app,
-
then maybe someone,
a sick person in an emergency,
-
can find the nearest government hospital.
-
- This is another--
- (Satdeep) Go back.
-
(Bodhi) Oh, sorry.
-
Okay. So this is the work
which was done for Indian Railways.
-
It was started there,
also from West Bengal.
-
And you can check the color--
-
the blue color is more complete data
-
and the green color
is slightly less complete,
-
but it's going to get completed soon.
-
And there are right now,
9,000 Indian railway stations
-
with coordinates, obviously,
because they are on the map.
-
Right now, they're being connected
with Pakistan and Bangladesh railways.
-
So we have a plan to connect
all Asian railways one day--
-
someday, maybe.
-
(laughs)
-
But, yeah, we'll do it.
-
Anyway. So, right now on the table,
-
we are in the second position
after Japan, obviously.
-
And-- yeah. So this is another cool query.
-
Visualization showing
the flight connections--
-
international and domestic
flight connections from India,
-
to and from India.
-
So it's like kind of messy,
but we can filter it
-
for domestic connections
or international connections.
-
So, anyway.
-
We have also completed
-
everything about 2014
Indian General Election data.
-
The Indian general election is a
complex set of data
-
because there are
so many political parties,
-
so many elections--
not like a two-party election.
-
So there were 6,000 political parties
which participated in the Indian--
-
I think 600 or something.
-
So, anyway.
-
So, yeah.
-
And there were so many candidates,
you can imagine.
-
And some of them have the same name.
-
Like in one constituency,
-
there were three people
with the same name.
-
(laughs)
-
So that was like a funny thing.
-
But we completed those data--
uploading those data in Wikidata.
-
Right now, only the 2014 Indian
general election has been done.
-
We don't have many users in Wikidata--
heavy-duty users in Wikidata in India.
-
So currently we're uploading
geoshape files of the constituencies.
-
In West Bengal, we have
already uploaded 43 constituencies,
-
geoshape files of the constituencies,
and also the [inaudible].
-
Other parts of India
have not been done yet,
-
so when those are completed--
-
and when we upload other elections,
-
like 2009 or before that,
-
we'll create cool animations
-
showing how voters
have changed their minds
-
from centrist to rightist
or leftist to rightist, anyway.
-
So in the pipeline, there are schools,
-
bank branches, post offices, geoshapes,
elections, and many more.
-
- (man 1) Cinema.
- Cinema, yeah.
-
(laughs)
-
- Of course, cinema.
- And monuments.
-
And monuments.
-
And most of them will be completed
within a few months.
-
And in a not so distant future,
we'll try to upload weather data.
-
There aren't many good properties
for weather in Wikidata right now,
-
that's why we're not
touching it yet,
-
but we'll do it.
-
Also, bibliographic data
-
for Indian literature
is very sparse in Wikidata.
-
And there will be
some institutional partnerships.
-
There have been some preliminary talks already,
-
and maybe we'll have
some good news in the future.
-
So other ways to engage.
-
We have created
some subpages of WikiProject India.
-
We have created a skillshare initiative--
started a skillshare initiative
-
where people who have
slightly more knowledge in Wikidata
-
can share something with other people,
-
on a one-to-one basis,
either online or offline.
-
We have also started a newsletter,
a quarterly newsletter,
-
the first issue has been published
in October [2018],
-
and we are showcasing
cool visualizations in social media
-
on the Facebook and Twitter channels
of Wikidata India, every day.
-
So these are the links.
-
You can find them there.
-
Thank you so much for the...
-
As most of you can already guess,
-
Bodhi is from that part of India,
West Bengal,
-
where they've done all that work.
-
(laughs)
-
So the West Bengali community in India
has been really doing this amazing work,
-
and this needs to go
to other parts of India
-
which need more capacity development,
-
which need more trainings,
also more coordination in India.
-
And, okay, I would like to end this
-
with how you can help in identifying
some of the knowledge gaps
-
and taking that conversation forward,
-
which is not directly
related with this topic.
-
But there is a Wiki project,
"Identifying knowledge gaps,"
-
you can join that and share your thoughts.
-
We are also trying to use--
how can we use property P5008,
-
which puts an item on the focus list
of a specific Wikimedia project--
-
how we can use that to surface
certain topics for contests
-
or other events.
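As one hypothetical way to surface such topics, a query can list items that carry P5008 for a given project but still lack a label in a target language. The sketch below only builds the SPARQL text in Python; the project item ID is a placeholder, since the talk does not name one, and it assumes the `wd:`, `wdt:` and `rdfs:` prefixes that the Wikidata Query Service predefines. The function name is hypothetical.

```python
import re


def build_focus_gap_query(project_qid: str, lang: str, limit: int = 100) -> str:
    """SPARQL for items on a project's focus list (P5008) with no label in `lang`."""
    if not re.fullmatch(r"Q[1-9]\d*", project_qid):
        raise ValueError(f"not a Wikidata item ID: {project_qid!r}")
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    # Doubled braces are literal SPARQL braces inside the f-string.
    return f"""SELECT ?item WHERE {{
  ?item wdt:P5008 wd:{project_qid} .
  FILTER NOT EXISTS {{
    ?item rdfs:label ?label .
    FILTER(LANG(?label) = "{lang}")
  }}
}}
LIMIT {int(limit)}"""
```

The resulting text can be pasted into the query service; swapping the language code (for example `bn` or `ta`) gives a per-community list of gaps for a labelathon.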
-
And in the end, we'd like to thank you.
-
Also, we'd like to thank Asaf
and Mahir and Tito,
-
who are two other
power users of Wikidata.
-
We'd like to sincerely thank everyone.
-
Thank you so much.
-
(applause)
-
Questions.
-
(woman 1) Mark here says, "Hi."
-
(laughs)
-
(moderator) So we have only five minutes
for questions and answers.
-
There. There's a question there.
-
(woman 1) Do I need the microphone?
-
(woman 1) Thank you so much
for your presentation.
-
Is this census data--
what exactly kind of data is that,
-
that you've been ingesting?
-
It's not for individuals, is it?
-
It's more like populations
and stuff like that?
-
It's population data, mainly.
Demographic data.
-
(woman 1) Are there any other things
that have been asked in the census?
-
(man 1) For village, gender--
-
(man 2) I was a little involved with that,
so I remember what the data looks like.
-
Per settlement in India, per village or town,
-
you have the total population,
the male versus female population,
-
the literate versus illiterate population.
-
Within that, you have also
a separation by gender,
-
so you know how many
illiterate males there are
-
versus so many
illiterate females there are.
-
It's actually quite detailed.
-
There are hundreds and hundreds
of pieces of data per village.
-
Only some of them
have been modeled on Wikidata.
-
Just, of course,
no individual census data.
-
(woman 1) Sometimes countries get weird.
-
(woman 2) So I wanted to ask you
about the label ingestion
-
or the translations of labels you do.
-
How did you do that? Do you use tools?
-
How do you get people to add them
in their native language
-
and translate the labels?
-
So, mostly TABernacle,
-
and QuickStatements.
-
Those we can use, QuickStatements.
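For bulk label work, QuickStatements accepts tab-separated commands in which `L` followed by a language code sets a label. A minimal sketch that turns a mapping of item IDs to translated labels into such commands, assuming the V1 command format; the function name is hypothetical, and the IDs and labels would come from your own translation sheet.

```python
import re


def label_commands(labels: dict[str, str], lang: str) -> str:
    """Build QuickStatements V1 lines of the form: QID <tab> L<lang> <tab> "label"."""
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    lines = []
    for qid, text in labels.items():
        if not re.fullmatch(r"Q[1-9]\d*", qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
        # Conservatively reject characters that would break the tab/quote syntax.
        if any(ch in text for ch in '"\t\n'):
            raise ValueError(f"label for {qid} contains a quote, tab or newline")
        lines.append(f'{qid}\tL{lang}\t"{text.strip()}"')
    return "\n".join(lines)
```

Rejecting awkward labels instead of trying to escape them keeps the sketch simple; such items can be fixed by hand.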
-
(woman 2) Alright. Cool.
-
But also at the same time,
like using labelathons as an activity
-
to engage more and more people
to do that activity.
-
Asaf.
-
The hero.
-
(Asaf) A note on TABernacle.
-
I just want to mention, for anyone
who may not be aware,
-
all of us here use Wikidata-related tools
-
which means all of us
have used tools by Magnus,
-
the amazing tool builder.
-
I just wanted to point out
that he's here at the conference.
-
So if you haven't had a chance yet
-
to thank him for his amazing work
that enables so much impact--
-
do so today.
-
I'm not sure he is into hugs,
but you can just thank him.
-
(laughs)
-
(man 3) Is the skillshare working?
-
What do you do? What are the results?
-
So, the response is [still no].
-
But, yeah. Five or six people
have already requested it,
-
and we have completed those.
-
(Satdeep) That's going on--
-
Like, we just need to surface
the value of Wikidata.
-
I think we haven't
really been able to do that.
-
Also, we haven't been able
to connect with other projects
-
that they are already doing,
-
like, for example,
Wikisource or Wikipedia.
-
Like how we need to communicate
that in a better way
-
to the larger community
who is contributing.
-
It was just like getting up
and creating a Wiki periodical.
-
Like how do we involve them
and bring them here.
-
That's still a problem.
-
And Bodhi is showing the census data.
-
Bodhi, can you please explain?
-
(Bodhi) So this is population data
from the 2011 census,
-
5007 in 2001 in the census data.
-
This is one village.
-
So there are like 36,000 villages
or 40,000 villages.
-
This is the male population,
female population,
-
number of households,
-
illiterate population with male,
female, population qualifiers,
-
literate population
and illiterate populations,
-
and so on.
-
And this is the census code
for 2001 and 2011.
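Census counts like these are commonly stored as population (P1082) statements qualified with a point in time (P585), one per census year. A minimal sketch that emits QuickStatements V1 lines for that pattern, assuming year-precision dates (`/9`); the male/female, household and literacy breakdowns shown on the item use further properties that are not modelled here, and the function name and any item ID are hypothetical.

```python
import re


def population_commands(qid: str, counts: dict[int, int]) -> list[str]:
    """One P1082 statement per census year, qualified with P585 at year precision."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    out = []
    for year, population in sorted(counts.items()):
        if population < 0:
            raise ValueError(f"negative population for {year}")
        out.append(f"{qid}\tP1082\t{population}\tP585\t+{year:04d}-00-00T00:00:00Z/9")
    return out
```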
-
(woman 3) Okay. I just want to say
that I loved your presentation,
-
and I wanted to talk about nearly
the same thing tomorrow,
-
so it'll be great because tomorrow--
I will just [stay] watch from this one,
-
so making my life easier.
-
What I wanted to do or to talk about--
-
but I think the WikiProject
you're starting on Wikidata
-
will do that--
-
is to engage people
not working on India directly,
-
but, like, I have tools for names,
but I don't deal with Indian names
-
because I am not sure I understand
everything about them,
-
and I don't want to do
something massively wrong,
-
so better to be careful.
-
But I just need to ask someone
who understands all the problems,
-
and I can add that to an automated tool
-
and deal with thousands
upon thousands of items.
-
And I think there are many, many tools
-
already doing some automated description
and things like that
-
for which we don't actually
need people every day,
-
we just need like 10 minutes time
for someone to tell me
-
or to list family names in those languages,
-
and then they just get added to the tool.
-
And you probably know
[automated] description tool,
-
but if you just ask the people
who are using it massively
-
to just add Indian languages,
-
then you have all Wikidatans
doing the same work for you,
-
and actually, it is a problem.
-
I am helping an African community
build up their Wikidata
-
in Wikipedia, so it's not
the same problem,
-
but nearly the same problem.
-
And that's the problem we have
-
which is actually bridging the gap
-
between the biggest Wikidatans--
-
I am doing work in languages
I don't know a word of,
-
but it's this kind of adoption system,
-
like I need a native speaker to tell me
-
what I can do with all the problems
on all the complicated cases.
-
And everything
that I can automate, I will automate.
-
And it's just an idea,
but do you think it will be like
-
a good idea to create not so specific
Wiki knowledge gap on Wikidata,
-
but a matching system
-
like, "Hey I am working on this subject,
do you want to ask me for that?"
-
- Like, yeah, a matching tool, like to--
- Connect people.
-
- (woman 3) To connect people
across languages.
-
Yeah. So that was my idea
because I think
-
some of the African communities
I am helping,
-
would really, really love
what you're doing,
-
but none of them speak Indian languages,
and we just need to have pivot people
-
to create the link
-
and make all this even more powerful.
-
And I really, really love
what you're doing. So thank you.
-
Thank you so much.
-
Thanks to Bodhi for all the awesome work.
-
(laughs)
-
And the larger Indian community.
-
But that's a really good idea,
I think we should take that up.
-
As a movement, we have not been doing
the sharing thing very well.
-
We need to figure out how to do that.
-
Because there are awesome tools:
-
one community builds them, but the others
don't know about them.
-
That's a larger problem,
-
and that's a piece that fits
into the larger problem.
-
We should be solving it somewhere.
-
Let's figure out where we can do that.
-
Thank you.
-
(applause)