Hi, I am Satdeep. I work
with the Foundation in Ben's team.
Here's my friend from India, Bodhi.
He's working with the Centre
for Internet and Society,
but he's here in his volunteer capacity.
So, we're going to talk about
knowledge gaps and Wikidata today.
So what are knowledge gaps?
As the name suggests, it's a gap
in our existing knowledge.
But in terms of Wikidata,
we're looking at knowledge gaps
in two different aspects.
One is, how can Wikidata help us
in filling the knowledge gaps
in other Wikimedia projects?
And the second is, how do we fill
the knowledge gaps within Wikidata?
For the first one,
"Filling knowledge gaps with Wikidata."
Wikidata is helping in a number of ways
in filling knowledge gaps
on different Wikimedia projects,
for example, ArticlePlaceholder,
or another tool called Scribe
is being built,
Wikidata Infoboxes, all of them are--
(audience reacts)
Yes, there was a session about it
early this morning or in the afternoon.
And there are also a lot
of different templates
which use Wikidata.
And then there are new templates
called [inaudible],
which along with this here
are used to make lists like these.
And if you click on one
of the topics on this list,
you get this draft article.
There was a presentation about this
in this same room by [inaudible].
So you get a draft article
with some sentences
and the infoboxes from Wikidata.
But this is not what we're going
to talk about here today.
We're going to talk about how, in India,
we first have to fill
the knowledge gaps within Wikidata,
so then we can do
all these amazing things.
So there are knowledge gaps
in localization.
We need to add a lot more labels
in different languages.
We need to build local data
about local places and people,
so that we can do
all those awesome things.
But the main aspect here
is to build community capacity
to do all that stuff.
So, that's where we come
to The Indic Case Study,
which this is all about.
And how did it all start?
There is a person
sitting right there, Asaf.
He is responsible for all this--
for bringing Wikidata to India.
So there was the first community
capacity development training
with the Tamil community in 2016,
where he introduced Wikidata.
And then there was like a bunch
of Wikidatans, super users
who started contributing to Wikidata.
And then, in 2017, at our request,
he came to India again and did
(laughs) Wiki-a-Tra--
it's like Wiki travel in India.
He did that, he went
to seven different cities,
seven different communities
at least, in India,
where he did Wikidata workshops,
mostly two-day workshops
in all those places.
And then, in 2018, again,
an Advanced Wikidata workshop.
And that has actually helped in building
some sort of Wikidata community
around India.
That also got the community engaged,
and then we started
building WikiProject India,
and then some other projects
related to that,
such as WikiProject West Bengal,
Indian Railways, and Kerala,
which are like some
specific regions in India
where the community has been trying
to engage themselves
and doing some work around it.
And then there have been
some more initiatives to engage newbies
such as edit-a-thons,
or labelathons, datathons,
with which we've been trying
to get more and more people involved.
And some initiatives around education,
workshops in education institutions--
Asaf also did one of those.
Yeah. Next, Bodhi.
So, there have been
so many workshops in India,
throughout all of India
from 2017 to 2019.
And we're also trying to engage,
as Satdeep said,
we are trying to engage
the newbies in different ways.
But still, the number of power users
is not very high in India.
Only very few, maybe five or six people
are doing the heavy-duty work.
So one of the reasons for that:
the Wikimedia community in India
is mostly focused on other projects,
mainly Wikipedia and,
right now, Wikisource.
So, there are very few editors who are--
very few active editors
who are contributing
to Wikidata regularly.
India is a multilingual country,
so there are around
22 Wikimedia projects
running in India.
So the workforce is totally divided.
So, we don't have
a focused group of people
who are working
on specific areas of Wikidata
because they are so much divided
into different projects,
that we have to engage--
we're trying to actively engage them
in different ways.
And they are spread over a vast region,
India is the seventh
largest country in the world,
and so it's quite difficult to coordinate
the 22 language communities
to work on only one project.
So, we have adopted a different approach.
Firstly, we're targeting the data gaps,
which is easy because
there are huge data gaps in India
on every topic, almost every topic.
And...
(chuckles)
...start locally.
Sorry. (laughs)
- So, it's 1, 1, 1--
- Everything is a priority!
(laughter)
Anyway. So we start locally.
So we thought that the entire country--
data ingestion
for the entire country is quite difficult.
And there are huge databases for India,
for example, the census databases,
the election databases.
And if we work on the entire country,
then it'd be really impossible
for five or six heavy-duty users.
So we target one place at a time.
So that is the map of India,
and you can see the bright pink color
that is West Bengal.
So from October 2018 to May 2019,
many things happened there.
So lots of data
were ingested in that part.
And after this map was generated,
there is a tool for that
called Wikidata Analysis--
built by [inaudible], user: [inaudible].
And after we got this map,
we shared this with other communities.
That "We have done this for West Bengal,
you can do it for your state.
And this is really cool."
And people have started working--
that was a direct effect.
WikiProject Kerala was built
just at that time,
and they started working
on the schools of India--
schools of Kerala--
and Kerala is situated right here--
and I couldn't [locate]
that in the map right now
because the tool is right now down.
So we just started locally.
We're trying to inspire people
from other parts of the country
to contribute.
And that's what happened in West Bengal:
data for around 40,000 villages
from the 2001 and 2011 censuses
was ingested--
that's complete data.
Almost complete data
which could have been
ingested in Wikidata.
And there were 11,000
government hospitals with coordinates
which were ingested,
and there was [inaudible] approach
to close to 1 million Bengali labels.
And so on.
There were many things happening,
but these were the things
which we've done
in West Bengal at that time.
So we also tried
to create cool visualizations
from the work we've done
because census and elections,
these are boring data.
These are not paintings,
and also so we cannot--
like these are also not GLAM data
and other things.
So these are boring data.
So we need to find some way
to make it interesting for people.
So, we have tried some cool queries.
This is one of them.
There are many others.
So this is the population growth
in West Bengal
across villages--
around 36,000 villages
between 2001 and 2011.
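The visualization itself is built from a query over Wikidata. As a rough illustration of the arithmetic behind a growth map like this one, here is a minimal Python sketch that computes each village's percentage change between the 2001 and 2011 censuses and ranks the villages. The `VillageCensus` record, the function names, and the sample villages and figures are hypothetical, not taken from the actual query or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VillageCensus:
    """Census population of one village in two census years (hypothetical record)."""
    name: str
    pop_2001: int
    pop_2011: int


def decadal_growth_pct(v: VillageCensus) -> float:
    """Percentage change in population between the 2001 and 2011 censuses."""
    if v.pop_2001 <= 0:
        # Growth is undefined without a positive 2001 baseline.
        raise ValueError(f"{v.name}: 2001 population must be positive")
    # Multiply before dividing so round figures stay exact in floating point.
    return (v.pop_2011 - v.pop_2001) * 100 / v.pop_2001


def rank_by_growth(villages: list[VillageCensus]) -> list[tuple[str, float]]:
    """Villages sorted from fastest- to slowest-growing, skipping empty baselines."""
    rows = [(v.name, decadal_growth_pct(v)) for v in villages if v.pop_2001 > 0]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical villages and figures, for illustration only.
    sample = [
        VillageCensus("Village A", 1000, 1150),
        VillageCensus("Village B", 2000, 1900),
    ]
    for name, growth in rank_by_growth(sample):
        print(f"{name}: {growth:+.1f}%")
```

Villages with no 2001 figure are skipped rather than shown as infinite growth, which is one possible choice; a real map might instead colour them separately.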
And not only villages,
we have uploaded census data
about every administrative hierarchy,
like community development blocks,
districts, municipalities, wards, etc.,
cities, towns.
This is a new tool, InteGraality,
and you can see
that this is a count of hospitals
in the world,
and India is right now
leading in Wikidata--
13,466 hospitals.
The blue colors show data completeness.
But the funny thing is--
it's only one area of India.
It's West Bengal,
there are 11,642 hospitals right now.
So if we complete all these steps
and there are more--
if we complete all those steps,
there will be a huge amount
of data about hospitals
with coordinates
which will be there in Wikidata,
and we have a plan to build an app
based on that data,
so that when a person gets ill,
using that app, they can find
the nearest hospital.
So these hospitals are ranging
from Primary Health Centers
to [inaudible] Health Cares,
with all sorts of facilities
available for them.
So we've tried to ingest
all those data in Wikidata,
if possible.
And after completing this task,
if we build some app,
then maybe someone,
a sick person in an emergency,
can find the nearest government hospital.
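The app is only a plan, so the following is just a minimal sketch of its core step, assuming each hospital item's coordinate location (P625) has already been fetched from Wikidata: pick the hospital with the smallest great-circle (haversine) distance to the user. The `Hospital` class, the helper names, and any coordinates used with them are illustrative assumptions.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Hospital:
    name: str
    lat: float  # degrees, e.g. from the item's coordinate location
    lon: float  # degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 so rounding error near antipodal points cannot break asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_hospital(lat: float, lon: float,
                     hospitals: list[Hospital]) -> tuple[Hospital, float]:
    """Return the closest hospital and its straight-line distance in km."""
    if not hospitals:
        raise ValueError("no hospitals to search")
    best = min(hospitals, key=lambda h: haversine_km(lat, lon, h.lat, h.lon))
    return best, haversine_km(lat, lon, best.lat, best.lon)
```

A linear scan over roughly 11,000 to 13,000 points is cheap; a spatial index would only matter at much larger scale. Road distance and the type of facility, which matter in a real emergency, are ignored here.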
- This is another--
- (Satdeep) Go back.
(Bodhi) Oh, sorry.
Okay. So this is the work
which was done for Indian Railways.
It was started there,
also from West Bengal.
And you can check the color--
the blue color is more complete data
and the green color
is slightly less complete,
but it's going to get completed soon.
And there are right now,
9,000 Indian railway stations
with coordinates, obviously,
because they are on the map.
Right now, they're being connected
with Pakistan and Bangladesh railways.
So we have a plan to connect
all Asian railways one day--
someday, maybe.
(laughs)
But, yeah, we'll do it.
Anyway. So, right now on the table,
we are in the second position
after Japan, obviously.
And-- yeah. So this is another cool query.
Visualization showing
the flight connections--
international and domestic
flight connections from India,
to and from India.
So it's like kind of messy,
but we can filter it
for domestic connections
or international connections.
So, anyway.
We have also completed
everything about the 2014
Indian general election data.
Indian general election data
is quite complex
because there are
so many political parties,
so many election--
not like two-party elections.
So there were 6,000 political parties
which participated in the Indian--
I think 600 or something.
So, anyway.
So, yeah.
And there were so many candidates,
you can imagine.
And some of them have the same name.
Like in one constituency,
there were like three people
with the same name.
(laughs)
So that was like a funny thing.
But we completed those data--
uploading those data in Wikidata.
Right now, only the 2014 Indian
general election has been done.
We don't have many users in Wikidata--
heavy-duty users in Wikidata in India.
So currently we're uploading
geoshape files of the constituencies.
In West Bengal, we have
already uploaded 43 constituencies,
geoshape files of the constituencies,
and also the [inaudible].
There is another part of India
that has not been done,
so when it's completed,
and when we upload other elections,
like 2009 or before that,
we'll create cool animations
showing how the voters
have changed their minds
from like centrist to rightist
or leftist to rightist, anyway.
So in the pipeline, there are schools,
bank branches, post offices, geoshapes,
elections, and many more.
- (man 1) Cinema.
- Cinema, yeah.
(laughs)
- Of course, cinema.
- And monuments.
And monuments.
And most of them will be completed
within a few months.
And in a not so distant future,
we'll try to upload weather data.
There aren't many good properties
for weather in Wikidata right now,
that's why we're not
touching it right now,
but we'll do it.
Bibliographic data
for Indian literature
is also very sparse in Wikidata.
And there will be
some institutional partnerships.
There have already been some preliminary talks,
and maybe we'll have
some good news in the future.
So other ways to engage.
We have created
some subpages of WikiProject India.
We have created a skillshare initiative--
started a skillshare initiative
where people who have
slightly more knowledge in Wikidata
can share something with other people,
on a one-to-one basis,
online or offline.
We have also started a newsletter,
a quarterly newsletter,
the first issue has been published
in October [2018],
and we are showcasing
cool visualizations on social media,
on the Facebook and Twitter channels
of Wikidata India, every day.
So these are the links.
You can find them there.
Thank you so much for the...
As most of you can already guess,
Bodhi is from that part of India,
the West Bengal,
where they've done all that work.
(laughs)
So the West Bengal community in India
has been really doing this amazing work,
and this needs to go
to other parts of India
which need more capacity development,
which need more trainings,
also more coordination in India.
And, okay, I would like to end this
with how you can help in identifying
some of the knowledge gaps
and taking that conversation forward,
which is not directly
related with this topic.
But there is a Wiki project,
"Identifying knowledge gaps,"
you can join that and share your thoughts.
We are also trying to use--
how can we use property P5008,
"on focus list of Wikimedia project,"
for a specific project--
how we can use that to surface
certain topics for contests
or other events.
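As a sketch of how P5008 could surface topics for an event, the Python below builds a query that lists items on a project's focus list that still lack a label in a given language, for example as a work list for a labelathon. It assumes the standard `wdt:`, `wd:` and `rdfs:` prefixes available on the Wikidata Query Service and its `format=json` parameter; the project Q-id is a placeholder, since the talk does not give one, and the function names are hypothetical.

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def focus_list_gap_query(project_qid: str, lang: str) -> str:
    """SPARQL for items on a project's focus list (P5008) lacking a label in `lang`."""
    # Validate inputs so they can be pasted into the query text safely.
    if not re.fullmatch(r"Q[1-9][0-9]*", project_qid):
        raise ValueError(f"not an item id: {project_qid!r}")
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", lang):
        raise ValueError(f"not a language code: {lang!r}")
    return f"""SELECT ?item WHERE {{
  ?item wdt:P5008 wd:{project_qid} .
  FILTER NOT EXISTS {{
    ?item rdfs:label ?label .
    FILTER(LANG(?label) = "{lang}")
  }}
}}"""


def query_url(sparql: str) -> str:
    """GET URL for the Wikidata Query Service, asking for JSON results."""
    params = urlencode({"query": sparql, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"
```

Swapping the label filter for a missing-property filter would turn the same focus list into a work list for a datathon instead.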
And in the end, we'd like to thank you.
Also, we'd like to thank Asaf
and Mahir and Tito,
who are two other
power users of Wikidata.
We'd like to sincerely thank everyone.
Thank you so much.
(applause)
Questions.
(woman 1) Mark here says, "Hi."
(laughs)
(moderator) So we have only five minutes
for questions and answers.
There. There's a question there.
(woman 1) Do I need the microphone?
(woman 1) Thank you so much
for your presentation.
Is this census data--
what exactly kind of data is that,
that you've been ingesting?
It's not for individuals, is it?
It's more like populations
and stuff like that?
It's population data, mainly.
Demographic data.
(woman 1) Are there any other things
that have been asked in the census?
(man 1) For village, gender--
(man 2) I was a little involved with that,
so I remember what the data looks like.
Per settlement in India, per village or town.
You have the total population,
the male versus female population,
the literate versus illiterate population.
Within that, you have also
a separation by gender,
so you know how many
illiterate males there are
versus how many
illiterate females there are.
It's actually quite detailed.
There are hundreds and hundreds
of pieces of data per village.
Only some of them
have been modeled on Wikidata.
Just, of course,
no individual census data.
(woman 1) Sometimes countries get weird.
(woman 2) So I wanted to ask you
about the label ingestion
or the translations of labels you do.
How did you do that? Do you use tools?
How do you get people to add it
in their native language
and translate the labels?
So, mostly TABernacle
and QuickStatements.
(woman 2) Alright. Cool.
But also at the same time,
like using labelathons as an activity
to engage more and more people
to do that activity.
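For the QuickStatements route, a labelathon sheet of item IDs and labels can be turned into a batch in the tab-separated QuickStatements (V1) format, where a column such as `Lbn` sets the Bengali label. This minimal sketch assumes that format; the helper names are hypothetical, and it refuses labels containing tabs, newlines, or double quotes rather than guessing at an escaping rule.

```python
import re

QID = re.compile(r"Q[1-9][0-9]*")
LANG = re.compile(r"[a-z]{2,3}(-[a-z0-9]+)*")


def label_command(qid: str, lang: str, label: str) -> str:
    """One QuickStatements (V1) line setting the label of `qid` in `lang`."""
    if not QID.fullmatch(qid):
        raise ValueError(f"not an item id: {qid!r}")
    if not LANG.fullmatch(lang):
        raise ValueError(f"not a language code: {lang!r}")
    label = label.strip()
    # Tabs and newlines would break the tab-separated format; double quotes
    # are refused rather than escaped, since no escaping rule is assumed.
    if not label or any(ch in label for ch in '\t\n"'):
        raise ValueError(f"unusable label: {label!r}")
    return f'{qid}\tL{lang}\t"{label}"'


def batch(rows: list[tuple[str, str]], lang: str) -> str:
    """Join (qid, label) rows, e.g. from a labelathon sheet, into one batch."""
    return "\n".join(label_command(qid, lang, label) for qid, label in rows)
```

Failing loudly on a bad row keeps one typo in a shared sheet from silently producing a malformed batch.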
Asaf.
The hero.
(Asaf) A note on TABernacle.
I just want to mention for anyone
who may be not aware,
all of us here use Wikidata-related tools
which means all of us
have used tools by Magnus,
the amazing tool builder.
I just wanted to point out
that he's here at the conference.
So if you haven't had a chance yet
to thank him for his amazing work
that enables so much impact--
do so today.
I'm not sure he is into hugs,
but you can just thank him.
(laughs)
(man 3) Is the skillshare working?
What do you do? What are the results?
So, the response is [still no].
But, yeah. Five or six people
have already requested it,
and we have completed those.
(Satdeep) That's going on--
Like, we just need to surface
the value of Wikidata.
I think we haven't
really been able to do that.
Also, we haven't been able
to connect with other projects
that people are already working on,
like, for example,
Wikisource or Wikipedia.
Like how we need to communicate
that in a better way
to the larger community
who is contributing.
It was just like getting up
and creating a Wiki periodical.
Like how do we involve them
and bring them here.
That's still a problem.
And Bodhi is showing the census data.
Bodhi, can you please explain?
(Bodhi) So this is population data
from the 2011 census,
5007 in 2001 in the census data.
This is one village.
So there are like 36,000 villages
or 40,000 villages.
This is the male population,
female population,
number of households,
illiterate population with male,
female, population qualifiers,
literate population
and illiterate populations,
and so on.
And this is the census code
for 2001 and 2011.
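A census figure like the ones on this item is essentially a statement plus a qualifier. The sketch below writes one population (P1082) statement per census year, qualified with point in time (P585) at year precision (9), in QuickStatements (V1) syntax. It is an assumption about how such an upload could be prepared, not the script the West Bengal community used; the male, female, household, and literacy figures would need their own properties, which are not shown.

```python
import re

POPULATION = "P1082"     # Wikidata property "population"
POINT_IN_TIME = "P585"   # Wikidata property "point in time"
YEAR_PRECISION = 9       # Wikidata time precision for "year"


def population_command(qid: str, year: int, population: int) -> str:
    """QuickStatements (V1) line: population with a point-in-time qualifier."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not an item id: {qid!r}")
    if population < 0:
        raise ValueError("population cannot be negative")
    date = f"+{year:04d}-01-01T00:00:00Z/{YEAR_PRECISION}"
    return "\t".join([qid, POPULATION, str(population), POINT_IN_TIME, date])


def village_commands(qid: str, counts: dict[int, int]) -> list[str]:
    """One line per census year for a village, oldest census first."""
    return [population_command(qid, year, pop)
            for year, pop in sorted(counts.items())]
```

Keeping each census year as a separate qualified statement, rather than overwriting one value, is what lets a query compare 2001 with 2011 later.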
(woman 3) Okay. I just want to say
that I loved your presentation,
and I wanted to talk about
nearly the same thing tomorrow,
so it'll be great because tomorrow--
I will just [stay] watch from this one,
so making my life easier.
What I wanted to do or to talk about--
but I think the WikiProject
you're starting on Wikidata
will do that--
is all to engage people
not working on India directly,
but like I have tools, names,
but I don't deal with Indian names
because I am not sure I understand
all there is to them,
and I don't want to do
something massively wrong,
so better to be careful.
But I just need to ask someone
who understands all the problems,
and then I can add it to an automated tool
and deal with thousands
upon thousands of items.
And I think there are many, many tools
already doing some automated description
and things like that
for which we don't actually
need people every day,
we just need like 10 minutes
for someone to tell me
or to say family names in those languages,
and then it just gets added to the tool.
And you probably know
[automated] description tool,
but if you just ask the people
who are using it massively
to just add Indian languages,
then you have all Wikidatans
doing the same work for you,
and actually, it is a problem.
I am helping an African community
build up their Wikidata
in Wikipedia, so it's not
the same problem,
but nearly the same problem.
And that's the problem we have
which is actually bridging the gap
between the biggest Wikidatans--
I am doing works in languages
I don't know a word of,
but it's this kind of adoption system,
like I need a native speaker to tell me
what I can do with all the problems
on all the complicated cases.
And everything
that I can automate, I will automate.
And it's just an idea,
but do you think it will be like
a good idea to create not so specific
Wiki knowledge gap on Wikidata,
but a matching system
like, "Hey I am working on this subject,
do you want to ask me for that?"
- Like, yeah, a matching tool, like to--
- Connect people.
- (woman 3) To connect people
across languages.
Yeah. So that was my idea
because I think
some of the African communities
I am helping,
would really, really love
what you're doing,
but none of them speak Indian languages,
and we just need to have pivot people
to create the link
and make all this even more powerful.
And I really, really love
what you're doing. So thank you.
Thank you so much.
Thanks to Bodhi for all the awesome work.
(laughs)
And the larger Indian community.
But that's a really good idea,
I think we should take that up.
As a movement, we have not been doing
the sharing thing very well.
We need to figure out how to do that.
Because there are awesome tools,
one community builds them, but the others
don't know about them.
That's a larger problem,
and that's a piece that fits
into the larger problem.
We should be solving it someplace.
Let's figure out where we can do that.
Thank you.
(applause)