Hi, I am Satdeep. I work with the Foundation in Ben's team. Here's my friend from India, Bodhi. He's working with the Centre for Internet and Society, but he's here in his volunteer capacity. So, we're going to talk about knowledge gaps and Wikidata today.
So what are knowledge gaps? As the name suggests, they are gaps in our existing knowledge. But in terms of Wikidata, we're looking at knowledge gaps from two different angles. One is, how can Wikidata help us fill the knowledge gaps in other Wikimedia projects? And the second is, how do we fill the knowledge gaps within Wikidata?
For the first one, "Filling knowledge gaps with Wikidata." Wikidata is helping in a number of ways to fill knowledge gaps on different Wikimedia projects: for example, ArticlePlaceholder, or another tool called Scribe that is being built, Wikidata Infoboxes, all of them are-- (audience reacts) Yes, there was a session about it early this morning or in the afternoon. And there are also a lot of different templates which use Wikidata. And then there are new templates called [inaudible], which along with this here are used to make lists like these. And if you click on one of the topics on this list, you get this draft article. There was a presentation about this in this same room by [inaudible]. So you get a draft article with some sentences and the infoboxes from Wikidata.
But this is not what we're going to talk about here today. We're going to talk about how, in India, we first have to fill the knowledge gaps within Wikidata, so that we can then do all these amazing things. So there are knowledge gaps in localization: we need to add a lot more labels in different languages. We need to build local data about local places and people, so that we can do all those awesome things. But the main aspect here is building community capacity to do all that stuff. So, that's where we come to the Indic case study, which is what this is all about. And how did it all start?
There is a person sitting right there, Asaf. He is responsible for all this-- for bringing Wikidata to India. So there was the first community capacity development training with the Tamil community in 2016, where he introduced Wikidata. And then there was a bunch of Wikidatans, super users, who started contributing to Wikidata. And then, in 2017, at the request of both of us, he came to India again and did (laughs) Wiki-a-Tra-- it's like Wiki travel in India. He went to seven different cities, at least seven different communities, in India, where he did Wikidata workshops, mostly two-day workshops, in all those places. And then, in 2018, again, an Advanced Wikidata workshop. And that has actually helped in building some sort of Wikidata community around India.
That also got the community engaged, and then we started building WikiProject India, and then some other projects related to that, such as WikiProject West Bengal, Indian Railways, and Kerala, which are like some specific regions in India where the community has been trying to engage and do some work. And then there have been some more initiatives to engage newbies, such as edit-a-thons, labelathons, and datathons, with which we've been trying to get more and more people involved. And some initiatives around education, workshops in educational institutions-- Asaf also did one of those. Yeah. Next, Bodhi.
(Bodhi) So, there have been so many workshops throughout India from 2017 to 2019. And, as Satdeep said, we are also trying to engage the newbies in different ways. But still, there are not very many power users in India. Only very few, maybe five or six people, are doing the heavy-duty work. One of the reasons for that: the Wikimedia community in India is mostly focused on other projects, mostly on Wikipedia and, to some extent right now, on Wikisource.
So, there are very few editors who are-- very few active editors who are contributing to Wikidata regularly. India is a multilingual country, so there are around 22 Wikimedia projects running in India. So the workforce is totally divided. We don't have a focused group of people working on specific areas of Wikidata, because they are so divided among different projects that we have to engage-- we're trying to actively engage them in different ways. And they are spread over a vast region: India is the seventh largest country in the world, so it's quite difficult to coordinate the entire community, the 22 language communities, to work on only one project.
So, we have adopted a different approach. Firstly, we're targeting the data gaps, which is easy because there are huge data gaps in India on almost every topic. And... (chuckles) ...start locally. Sorry. (laughs)
- So, it's 1, 1, 1--
- Everything is a priority! (laughter)
Anyway. So we start locally. We thought that data ingestion for the entire country is quite difficult. There are huge databases for India, for example, the census databases and the election databases. And if we work on the entire country, then it'd be really impossible for five or six heavy-duty users. So we target one place at a time.
So that is the map of India, and you can see the bright pink color: that is West Bengal. From October 2018 to May 2019, many things happened there, and lots of data were ingested in that part. This map was generated with a tool called Wikidata Analysis-- built by [inaudible], user: [inaudible]. And after we got this map, we shared it with other communities, saying, "We have done this for West Bengal, you can do it for your country. And this is really cool." And people have started working-- that was a direct effect.
WikiProject Kerala was built just at that time, and they started working on the schools of India-- schools of Kerala-- and Kerala is situated right here-- and I couldn't [locate] that in the map right now because the tool is down right now. So we just started locally. We're trying to inspire people from other parts of the country to contribute.
And that's what happened in West Bengal: around 40,000 villages with 2001 and 2011 census data were ingested-- that's complete data. Almost all the data that could have been ingested in Wikidata. And there were 11,000 government hospitals with coordinates which were ingested, and there was [inaudible] approach to close to 1 million Bengali labels. And so on. There were many things happening, but these were the things which we did in West Bengal at that time.
So we also tried to create cool visualizations from the work we've done, because census and elections, these are boring data. These are not paintings, and these are also not GLAM data and other things. So these are boring data, and we need to find some way to make them interesting for people. So, we have tried some cool queries. This is one of them. There are many others. This is the population growth in West Bengal villages-- around 36,000 villages-- between 2001 and 2011. And not only villages: we have uploaded census data about every level of the administrative hierarchy, like community development blocks, districts, municipalities, wards, cities, towns, etc.
This is a new tool, InteGraality, and you can see that this is a count of hospitals in the world, and India is right now leading in Wikidata-- 13,466 hospitals. The blue colors show the data completeness. But the funny thing is-- it's only one area of India. It's West Bengal, where there are 11,642 hospitals right now.
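[Editor's illustration, not part of the talk: one thing that coordinates on hospital items make possible is a nearest-facility lookup. The following is only a minimal sketch, assuming the hospitals have already been fetched as (name, latitude, longitude) tuples. The function names are hypothetical, and the sample names and coordinates are invented for illustration; they are not taken from Wikidata or from the speakers' work.]

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    # Clamp to 1.0 so rounding error can never push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def nearest_hospital(lat, lon, hospitals):
    """Return (name, distance_km) of the closest hospital, or None if the list is empty."""
    if not hospitals:
        return None
    name, hlat, hlon = min(
        hospitals, key=lambda h: haversine_km(lat, lon, h[1], h[2])
    )
    return name, haversine_km(lat, lon, hlat, hlon)


# Invented sample data (assumption): name, latitude, longitude.
HOSPITALS = [
    ("Hospital A", 22.57, 88.36),
    ("Hospital B", 23.23, 87.86),
    ("Hospital C", 26.72, 88.43),
]

if __name__ == "__main__":
    print(nearest_hospital(22.60, 88.40, HOSPITALS))
```

[A linear scan like this is fine for a few thousand points; a real app over the full data set would more likely use a spatial index or a geographic query service.]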
So if we complete all these steps and there are more-- if we complete all those steps, there will be a huge amount of data about hospitals with coordinates in Wikidata, and we have a plan to build an app based on that data, so that when a person gets ill, they can use that app to find the nearest hospital. These hospitals range from Primary Health Centers to [inaudible] Health Cares, with all sorts of facilities available. So we've tried to ingest all those data in Wikidata, if possible. And after completing this task, if we build some app, then maybe a sick person in a life-threatening emergency can find the nearest government hospital.
- This is another--
- (Satdeep) Go back.
(Bodhi) Oh, sorry. Okay. So this is the work which was done for Indian Railways. It also started from West Bengal. And you can check the color-- the blue color is more complete data and the green color is slightly less complete, but it's going to get completed soon. And there are right now 9,000 Indian railway stations with coordinates, obviously, because they are on the map. Right now, they're being connected with the Pakistan and Bangladesh railways. So we have a plan to connect all Asian railways one day-- someday, maybe. (laughs) But, yeah, we'll do it. Anyway. So, right now on the table, we are in the second position, after Japan, obviously. And-- yeah.
So this is another cool query: a visualization showing the flight connections-- international and domestic flight connections to and from India. It's kind of messy, but we can filter it for domestic connections or international connections. So, anyway.
We have also completed everything about the 2014 Indian general election data. The Indian general election is a complex set of data because there are so many political parties, so many election-- it's not like a two-party election.
So there were 6,000 political parties which participated in the Indian-- I think 600 or something. So, anyway. So, yeah. And there were so many candidates, you can imagine. And some of them have the same name. Like in one constituency, there were three people with the same name. (laughs) So that was a funny thing. But we completed uploading those data to Wikidata. Right now, only the 2014 Indian general election has been done. We don't have many users-- heavy-duty users of Wikidata in India. So currently we're uploading geoshape files of the constituencies. In West Bengal, we have already uploaded geoshape files for 43 constituencies, and also the [inaudible]. There is another part of India that has not been done yet, so when that is completed-- when we upload other elections, like 2009 or before that, we'll create cool animations showing how the voters have changed their minds, like from centrist to rightist or leftist to rightist. Anyway.
So in the pipeline, there are schools, bank branches, post offices, geoshapes, elections, and many more.
- (man 1) Cinema.
- Cinema, yeah. (laughs)
- Of course, cinema.
- And monuments. And monuments.
And most of them will be completed within a few months. And in the not-so-distant future, we'll try to upload weather data. There are not many good properties for weather in Wikidata right now, that's why we're not touching it yet, but we'll do it. Also, bibliographical data for Indian literature is very sparse in Wikidata. And there will be some institutional partnerships. There were some preliminary talks already, and maybe we'll have some good news in the future.
So, other ways to engage. We have created some subpages of WikiProject India.
We have created-- started a skillshare initiative where people who have slightly more knowledge of Wikidata can share something with other people on a one-to-one basis, online or offline. We have also started a quarterly newsletter; the first issue was published in October [2018]. And we are showcasing cool visualizations on social media, on the Facebook and Twitter channels of Wikidata India, every day. So these are the links. You can find them there. Thank you so much for the...
(Satdeep) As most of you can already guess, Bodhi is from that part of India, West Bengal, where they've done all that work. (laughs) So the West Bengal community in India has really been doing this amazing work, and this needs to go to other parts of India, which need more capacity development, more trainings, and also more coordination. And, okay, I would like to end this with how you can help in identifying some of the knowledge gaps and taking that conversation forward, which is not directly related to this topic. There is a WikiProject, "Identifying knowledge gaps"; you can join that and share your thoughts. We are also looking at how we can use property P5008, which is "on focus list of Wikimedia project"-- how we can use that to surface certain topics for contests or other events. And in the end, we'd like to thank you. Also, we'd like to thank Asaf, and Mahir and Tito, who are two other power users of Wikidata. We'd like to sincerely thank everyone. Thank you so much. (applause)
Questions.
(woman 1) Mark here says, "Hi." (laughs)
(moderator) So we have only five minutes for questions and answers. There. There's a question there.
(woman 1) Do I need the microphone? Thank you so much for your presentation. Is this census data-- what exactly kind of data is that, that you've been ingesting? It's not for individuals, is it? It's more like populations and stuff like that?
It's population data, mainly. Demographic data.
(woman 1) Are there any other things that have been asked in the census?
(man 1) For village, gender--
(man 2) I was a little involved with that, so I remember what the data looks like. Per settlement in India, per village or town, you have the total population, the male versus female population, the literate versus illiterate population. Within that, you also have a separation by gender, so you know how many illiterate males there are versus how many illiterate females there are. It's actually quite detailed. There are hundreds and hundreds of pieces of data per village. Only some of them have been modeled on Wikidata. Just, of course, no individual census data.
(woman 1) Sometimes countries get weird.
(woman 2) So I wanted to ask you about the label ingestion or the translations of labels you do. How did you do that? Do you use tools? How do you get people to add labels in their native language and translate them?
So, mostly TABernacle and QuickStatements. Those we can use, QuickStatements.
(woman 2) Alright. Cool.
But also, at the same time, using labelathons as an activity to engage more and more people to do that.
Asaf. The hero.
(Asaf) A note on TABernacle. I just want to mention, for anyone who may not be aware: all of us here use Wikidata-related tools, which means all of us have used tools by Magnus, the amazing tool builder. I just wanted to point out that he's here at the conference. So if you haven't had a chance yet to thank him for his amazing work that enables so much impact-- do so today. I'm not sure he is into hugs, but you can just thank him. (laughs)
(man 3) Was the skillshare working? What do you do? What are the results?
So, the response is [still no]. But, yeah. Five or six people have already requested it, and we have completed those.
(Satdeep) That's going on-- Like, we just need to surface the value of Wikidata. I think we haven't really been able to do that.
Also, we haven't been able to connect with the other projects that they are already working on, like, for example, Wikisource or Wikipedia. We need to communicate that in a better way to the larger community that is contributing. It was just like getting up and creating a Wiki periodical. Like how do we involve them and bring them here. That's still a problem. And Bodhi is showing the census data. Bodhi, can you please explain?
(Bodhi) So this is population data from the 2011 census, 5007 in 2001 in the census data. This is one village. So there are like 36,000 villages or 40,000 villages. This is the male population, female population, number of households, illiterate population with male and female population qualifiers, literate population and illiterate population, and so on. And this is the census code for 2001 and 2011.
(woman 3) Okay. I just want to say that I loved your presentation, and I wanted to talk about nearly the same thing tomorrow, so it'll be great because tomorrow-- I will just [stay] watch from this one, so making my life easier. What I wanted to do or to talk about-- but I think the WikiProject you're starting on Wikidata will do that-- is to engage people not working on India directly. Like, I have tools for names, but I don't deal with Indian names because I am not sure I understand everything about them, and I don't want to do something massively wrong, so better to be careful. But I just need to check with someone who understands all the problems, and then I can add them to an automated tool and deal with thousands upon thousands of items. And I think there are many, many tools already doing automated descriptions and things like that, for which we don't actually need people every day; we just need like 10 minutes of someone's time to tell me how to say family names in those languages, and then it's just added to the tool.
And you probably know the [automated] description tool, but if you just ask the people who are using it massively to add Indian languages, then you have all Wikidatans doing the same work for you. And actually, it is a problem: I am helping an African community build up their Wikidata in Wikipedia, so it's not the same problem, but nearly the same problem. And that's the problem we have, which is actually bridging the gap between the biggest Wikidatans-- I am doing work in languages I don't know a word of, but it's this kind of adoption system: I need a native speaker to tell me what I can do about all the problems and all the complicated cases. And everything that I can automate, I will automate. And it's just an idea, but do you think it would be a good idea to create not such a specific Wiki knowledge gaps project on Wikidata, but a matching system, like, "Hey, I am working on this subject, do you want to ask me for that?"
- Like, yeah, a matching tool, like to--
- Connect people.
- (woman 3) To connect people across languages. Yeah. So that was my idea, because I think some of the African communities I am helping would really, really love what you're doing, but none of them speak Indian languages, and we just need to have pivot people to create the link and make all this even more powerful. And I really, really love what you're doing. So thank you.
Thank you so much. Thanks to Bodhi for all the awesome work. (laughs) And the larger Indian community. But that's a really good idea, I think we should take that up. As a movement, we have not been doing the sharing thing very well. We need to figure out how to do that. Because there are awesome tools: one gets built, but others don't know about it. That's a larger problem, and this is a piece that fits into the larger problem. We should be solving it someplace. Let's figure out where we can do that. Thank you. (applause)
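[Editor's illustration, not part of the talk: the label question in the Q&A was answered with TABernacle and QuickStatements. As a rough idea of what a label batch for QuickStatements looks like, the sketch below builds V1-format commands, where each line is an item ID, then `L` plus a language code, then the quoted label, separated by tabs. The function name is hypothetical, the item IDs are Wikidata sandbox items used as placeholders, and the Bengali strings simply mean "example one" and "example two". Labels containing quotes, tabs, or newlines are rejected here as a conservative assumption rather than escaped.]

```python
def label_commands(labels, lang):
    """Build QuickStatements V1 lines that set one label per item in `lang`.

    `labels` maps item IDs (e.g. "Q4115189") to label text.
    Blank labels are skipped; malformed IDs or unsafe labels raise ValueError.
    """
    lines = []
    for qid, label in labels.items():
        if not (qid.startswith("Q") and qid[1:].isdigit()):
            raise ValueError(f"not an item ID: {qid!r}")
        text = label.strip()
        if not text:
            continue  # nothing to add for this item
        if any(ch in text for ch in '"\t\n'):
            # Conservative choice (assumption): refuse rather than guess escaping.
            raise ValueError(f"label needs manual review: {text!r}")
        lines.append(f'{qid}\tL{lang}\t"{text}"')
    return lines


if __name__ == "__main__":
    sample = {"Q4115189": "উদাহরণ এক", "Q13406268": "উদাহরণ দুই"}
    print("\n".join(label_commands(sample, "bn")))
```

[The resulting text would be pasted into QuickStatements and reviewed by a speaker of the language before running; the sketch only formats commands and does not touch Wikidata.]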