WEBVTT

00:00:05.929 --> 00:00:09.349
Hi, I am Satdeep. I work with the Foundation in Ben's team.

00:00:10.315 --> 00:00:12.851
Here's my friend from India, Bodhi.

00:00:12.851 --> 00:00:15.461
He's working with the Centre for Internet and Society,

00:00:15.461 --> 00:00:18.541
but he's here in his volunteer capacity.

00:00:19.523 --> 00:00:24.769
So, we're going to talk about knowledge gaps and Wikidata today.

00:00:25.340 --> 00:00:27.080
So what are knowledge gaps?

00:00:27.651 --> 00:00:31.651
As the name suggests, it's a gap in our existing knowledge.

00:00:31.651 --> 00:00:36.421
But in terms of Wikidata, we're looking at knowledge gaps

00:00:36.421 --> 00:00:38.261
in two different aspects.

00:00:38.261 --> 00:00:43.101
One is, how can Wikidata help us in filling the knowledge gaps

00:00:43.101 --> 00:00:45.141
in other Wikimedia projects?

00:00:45.141 --> 00:00:50.253
And the second is, how do we fill the knowledge gaps within Wikidata?

00:00:52.287 --> 00:00:57.187
For the first one, "Filling knowledge gaps with Wikidata":

00:00:57.187 --> 00:00:58.887
Wikidata is helping in a number of ways

00:00:58.887 --> 00:01:01.177
in filling knowledge gaps on different Wikimedia projects,

00:01:01.177 --> 00:01:02.767
for example, ArticlePlaceholder,

00:01:02.767 --> 00:01:06.147
or Scribe, another tool that is being built,

00:01:06.147 --> 00:01:08.908
Wikidata Infoboxes, all of them are--

00:01:08.908 --> 00:01:09.908
(audience reacts)

00:01:09.908 --> 00:01:14.218
Yes, there was a session about it early this morning or in the afternoon.

00:01:14.713 --> 00:01:18.713
And there are also a lot of different templates

00:01:19.700 --> 00:01:22.080
which use Wikidata.

00:01:22.429 --> 00:01:28.079
And then there are new templates called [inaudible],

00:01:28.773 --> 00:01:34.242
which along with this here are used to make lists like these.

00:01:34.242 --> 00:01:37.342
And if you click on one of the topics on this list,

00:01:37.702 --> 00:01:39.522
you get this draft article.

00:01:39.522 --> 00:01:42.988
There was a presentation about this in this same room by [inaudible].

00:01:42.988 --> 00:01:46.792
So you get a draft article with some sentences

00:01:46.792 --> 00:01:50.152
and the infoboxes from Wikidata.

00:01:51.194 --> 00:01:54.988
But this is not what we're going to talk about here today.

00:01:55.705 --> 00:01:59.275
We're going to talk about how, in India,

00:01:59.275 --> 00:02:03.203
we first have to fill the knowledge gaps within Wikidata,

00:02:03.203 --> 00:02:05.563
so that we can then do all these amazing things.

00:02:05.563 --> 00:02:09.223
So there are knowledge gaps in localization.

00:02:09.223 --> 00:02:12.323
We need to add a lot more labels in different languages.

00:02:13.256 --> 00:02:16.306
We need to build local data about local places and people,

00:02:16.306 --> 00:02:18.576
so that we can do all those awesome things.

00:02:18.576 --> 00:02:23.188
But the main aspect of this is to build community capacity

00:02:23.188 --> 00:02:24.858
to do all that stuff.

00:02:24.858 --> 00:02:28.888
So, that's where we come to the Indic case study,

00:02:28.888 --> 00:02:30.648
which is what this is all about.

00:02:30.648 --> 00:02:33.238
And how did it all start?

00:02:33.618 --> 00:02:37.133
There is a person sitting right there, Asaf.

00:02:37.133 --> 00:02:39.610
He is responsible for all this--

00:02:40.463 --> 00:02:42.826
for bringing Wikidata to India.

00:02:43.528 --> 00:02:48.127
So there was the first community capacity development training

00:02:49.376 --> 00:02:53.086
with the Tamil community in 2016, where he introduced Wikidata.

00:02:53.086 --> 00:02:58.117
And then there was a bunch of Wikidatans, super users,

00:02:58.117 --> 00:03:00.447
who started contributing to Wikidata.

00:03:00.447 --> 00:03:04.970
And then, in 2017, at both our requests,

00:03:04.970 --> 00:03:07.940
he came to India again and did

00:03:07.940 --> 00:03:09.697
(laughs) Wiki-a-Tra--

00:03:09.697 --> 00:03:12.709
it's like Wiki travel in India.

00:03:12.709 --> 00:03:16.366
He did that, he went to seven different cities,

00:03:16.366 --> 00:03:19.419
seven different communities at least, in India,

00:03:19.419 --> 00:03:22.772
where he did Wikidata workshops,

00:03:23.192 --> 00:03:26.052
mostly two-day workshops in all those places.

00:03:26.052 --> 00:03:30.277
And then, in 2018, again, an advanced Wikidata workshop.

00:03:30.277 --> 00:03:34.737
And that has actually helped in building some sort of Wikidata community

00:03:34.737 --> 00:03:36.709
around India.

00:03:39.848 --> 00:03:42.498
That also got the community engaged,

00:03:42.498 --> 00:03:45.258
and then we started building WikiProject India,

00:03:45.258 --> 00:03:47.328
and then some other projects related to that,

00:03:47.328 --> 00:03:51.468
such as WikiProject West Bengal, Indian Railways, and Kerala,

00:03:51.468 --> 00:03:53.748
which are like some specific regions in India

00:03:53.748 --> 00:03:56.038
where the community has been trying to engage

00:03:56.038 --> 00:03:58.338
and doing some work around it.

00:03:58.338 --> 00:04:02.868
And then there have been some more initiatives to engage newbies,

00:04:02.868 --> 00:04:07.598
such as edit-a-thons, labelathons, datathons,

00:04:08.655 --> 00:04:11.925
with which we've been trying to get more and more people involved.

00:04:11.925 --> 00:04:14.895
And some initiatives around education,

00:04:14.895 --> 00:04:18.455
workshops in educational institutions-- Asaf also did one of those.

00:04:20.697 --> 00:04:22.252
Yeah. Next, Bodhi.

00:04:23.042 --> 00:04:26.432
So, there have been so many workshops in India,

00:04:26.432 --> 00:04:30.361
throughout all of India from 2017 to 2019.

00:04:30.361 --> 00:04:33.121
And we're also trying to engage, as Satdeep said,

00:04:33.121 --> 00:04:35.431
we are trying to engage the newbies in different ways.

00:04:35.431 --> 00:04:40.955
But still, the number of power users in India is not very high.

00:04:41.915 --> 00:04:48.166
Only a very few, maybe five or six people, are doing the heavy-duty work.

00:04:48.975 --> 00:04:51.795
So one of the reasons for that:

00:04:52.891 --> 00:04:58.961
the Wikimedia community in India is mostly focused on other projects,

00:04:58.961 --> 00:05:04.183
mostly on Wikipedia and, right now, to some extent on Wikisource.

00:05:04.183 --> 00:05:08.344
So, there are very few editors who are--

00:05:08.344 --> 00:05:09.644
very few active editors

00:05:09.644 --> 00:05:12.344
who are contributing to Wikidata regularly.

00:05:13.264 --> 00:05:15.506
India is a multilingual country,

00:05:15.506 --> 00:05:18.914
so there are around 22 Wikimedia projects

00:05:18.914 --> 00:05:20.203
running in India.

00:05:20.203 --> 00:05:22.935
So the workforce is totally divided.

00:05:23.630 --> 00:05:28.530
So, we don't have a focused group of people

00:05:28.530 --> 00:05:31.439
who are working on specific areas of Wikidata,

00:05:31.439 --> 00:05:34.540
because they are so divided across different projects

00:05:34.540 --> 00:05:39.090
that we have to engage-- we're trying to actively engage them

00:05:39.090 --> 00:05:40.442
in different ways.

00:05:40.442 --> 00:05:42.013
And they are spread over a vast region,

00:05:42.013 --> 00:05:44.292
India is the seventh-largest country in the world,

00:05:44.292 --> 00:05:48.462
and so it's quite difficult to coordinate across communities,

00:05:48.462 --> 00:05:53.882
the 22 language communities, to work on only one project.

00:05:56.246 --> 00:05:59.898
So, we have adopted a different approach.

00:05:59.898 --> 00:06:02.868
Firstly, we're targeting the data gaps,

00:06:02.868 --> 00:06:06.736
which is easy because there are huge data gaps in India

00:06:07.478 --> 00:06:10.209
on every topic, almost every topic.

00:06:10.209 --> 00:06:11.486
And...

00:06:12.681 --> 00:06:14.356
(chuckles)

00:06:14.356 --> 00:06:15.598
...start locally.

00:06:16.358 --> 00:06:17.770
Sorry. (laughs)

00:06:17.770 --> 00:06:21.568
- So, it's 1, 1, 1--
- Everything is a priority!

00:06:21.568 --> 00:06:23.678
(laughter)

00:06:25.368 --> 00:06:28.028
Anyway. So we start locally.

00:06:28.766 --> 00:06:34.666
So we thought that for the entire country--

00:06:34.666 --> 00:06:37.727
data ingestion for the entire country is quite difficult.

00:06:37.727 --> 00:06:41.087
And there are huge databases for India,

00:06:41.087 --> 00:06:44.596
for example, the census databases, the election databases.

00:06:44.596 --> 00:06:48.916
And if we work on the entire country,

00:06:48.916 --> 00:06:54.369
then it'd be really impossible for five or six heavy-duty users.

00:06:54.369 --> 00:06:58.009
So we target one place at a time.

00:06:58.009 --> 00:07:01.779
So this is the map of India,

00:07:01.779 --> 00:07:06.315
and the bright pink area you can see is West Bengal.

00:07:06.315 --> 00:07:12.243
So from October 2018 to May 2019, many things happened there.

00:07:12.243 --> 00:07:17.288
So a lot of data was ingested for that region.

00:07:17.288 --> 00:07:20.928
And this map was generated

00:07:20.928 --> 00:07:24.184
with a tool called Wikidata Analysis--

00:07:24.184 --> 00:07:26.314
built by [inaudible], user: [inaudible].

00:07:26.314 --> 00:07:32.023
And after we got this map,

00:07:32.973 --> 00:07:36.323
we shared it with other communities,

00:07:36.323 --> 00:07:39.603
saying, "We have done this for West Bengal, you can do it for your region.

00:07:39.603 --> 00:07:41.043
And this is really cool."

00:07:41.043 --> 00:07:43.983
And people have started working--

00:07:44.855 --> 00:07:46.476
that was a direct effect.

00:07:46.476 --> 00:07:50.656
WikiProject Kerala was started just at that time,

00:07:50.656 --> 00:07:54.656
and they started working on the schools of India--

00:07:54.656 --> 00:07:58.048
schools of Kerala-- and Kerala is situated right here--

00:07:58.048 --> 00:08:01.086
and I couldn't [locate] that on the map right now

00:08:01.086 --> 00:08:05.796
because the tool is down right now.

00:08:07.316 --> 00:08:10.280
So we just started locally.

00:08:11.245 --> 00:08:14.525
We're trying to inspire people from other parts of the country

00:08:14.525 --> 00:08:15.729
to contribute.

00:08:16.833 --> 00:08:19.962
And that's what happened in West Bengal:

00:08:19.962 --> 00:08:25.431
around 40,000 villages, with the 2001 and 2011 census.

00:08:25.431 --> 00:08:28.860
The data was ingested-- that's complete data.

00:08:28.860 --> 00:08:30.639
Almost complete data,

00:08:30.639 --> 00:08:33.869
everything which could be ingested into Wikidata.

00:08:34.435 --> 00:08:38.677
And there were 11,000 government hospitals with coordinates

00:08:38.677 --> 00:08:40.317
which were ingested,

00:08:40.317 --> 00:08:45.260
and there was [inaudible] approach to close to 1 million Bengali labels.

00:08:46.191 --> 00:08:47.261
And so on.

00:08:47.261 --> 00:08:50.668
There were many things happening, but these were the things

00:08:50.668 --> 00:08:53.843
which we did in West Bengal at that time.

00:08:53.843 --> 00:08:58.243
So we also tried to create cool visualizations

00:08:58.243 --> 00:09:00.261
from the work we've done,

00:09:00.261 --> 00:09:02.929
because census and elections, these are boring data.

00:09:02.929 --> 00:09:07.429
These are not paintings, and also so we cannot--

00:09:07.429 --> 00:09:11.239
like, these are also not GLAM data and other things.

00:09:11.239 --> 00:09:13.029
So these are boring data.

00:09:13.029 --> 00:09:19.460
So we need to find some way to make it interesting for people.

00:09:19.460 --> 00:09:22.790
So, we have tried some cool queries.

00:09:22.790 --> 00:09:24.897
This is one of them. There are many others.

00:09:25.321 --> 00:09:27.423
So this is the population growth in West Bengal

00:09:27.423 --> 00:09:31.651
across villages-- around 36,000 villages--

00:09:31.651 --> 00:09:34.037
between 2001 and 2011.

00:09:34.658 --> 00:09:37.778
And not only villages, we have uploaded census data

00:09:37.778 --> 00:09:42.547
for every level of the administrative hierarchy,

00:09:42.547 --> 00:09:48.353
like community development blocks, districts, municipalities, wards, etc.,

00:09:48.353 --> 00:09:49.575
cities, towns.

00:09:51.669 --> 00:09:56.893
This is a new tool, InteGraality,

00:09:57.722 --> 00:10:00.292
and you can see

00:10:00.292 --> 00:10:05.487
that this is a count of hospitals

00:10:06.970 --> 00:10:08.160
in the world,

00:10:08.160 --> 00:10:11.310
and India is right now leading in Wikidata--

00:10:11.310 --> 00:10:15.500
13,466 hospitals.

00:10:18.168 --> 00:10:21.218
The blue colors show the data completeness.

00:10:22.491 --> 00:10:29.539
But the funny thing is-- it's only one area of India.

00:10:29.539 --> 00:10:30.995
It's West Bengal,

00:10:30.995 --> 00:10:33.759
which has 11,642 hospitals right now.

00:10:33.759 --> 00:10:38.168
So if we complete all these states, and there are more--

00:10:40.130 --> 00:10:41.740
if we complete all those states,

00:10:41.740 --> 00:10:45.280
there will be a huge amount of data about hospitals

00:10:45.280 --> 00:10:48.030
with coordinates in Wikidata,

00:10:48.030 --> 00:10:54.916
and we have a plan to build an app based on that data,

00:10:55.934 --> 00:10:59.134
so that when a person gets ill,

00:11:02.801 --> 00:11:04.591
using that app, they can find

00:11:04.591 --> 00:11:08.718
the nearest hospital.

00:11:15.268 --> 00:11:19.268
So these hospitals range from Primary Health Centres

00:11:19.268 --> 00:11:20.966
to [inaudible] Health Cares,

00:11:21.941 --> 00:11:26.881
with all sorts of facilities available.

00:11:26.881 --> 00:11:31.191
So we've tried to ingest all that data into Wikidata,

00:11:31.191 --> 00:11:32.541
where possible.

00:11:33.251 --> 00:11:37.390
And after completing this task, if we build some app,

00:11:37.390 --> 00:11:41.689
then maybe someone, a sick person in an emergency,

00:11:41.689 --> 00:11:45.366
can find the nearest government hospital.

00:11:48.795 --> 00:11:52.255
- This is another--
- (Satdeep) Go back.

00:11:52.255 --> 00:11:53.743
(Bodhi) Oh, sorry.

00:11:54.914 --> 00:12:01.573
Okay. So this is the work which was done for Indian Railways.

00:12:01.573 --> 00:12:04.188
It also started from West Bengal.

00:12:04.737 --> 00:12:08.994
And you can check the color--

00:12:08.994 --> 00:12:11.094
the blue color is more complete data

00:12:11.094 --> 00:12:14.818
and the green color is slightly less complete,

00:12:14.818 --> 00:12:18.368
but it's going to get completed soon.

00:12:18.626 --> 00:12:22.863
And right now there are 9,000 Indian railway stations

00:12:22.863 --> 00:12:25.764
with coordinates, obviously, because they are on the map.

00:12:25.764 --> 00:12:29.504
Right now, they're being connected with Pakistan and Bangladesh railways.

00:12:29.504 --> 00:12:34.682
So we have a plan to connect all Asian railways one day--

00:12:34.682 --> 00:12:35.992
someday, maybe.

00:12:35.992 --> 00:12:37.042
(laughs)

00:12:37.042 --> 00:12:38.760
But, yeah, we'll do it.

00:12:39.334 --> 00:12:44.314
Anyway. So, right now on the table,

00:12:44.314 --> 00:12:48.854
we are in second position, after Japan, obviously.

00:12:49.947 --> 00:12:53.477
And-- yeah. So this is another cool query.

00:12:54.012 --> 00:12:59.487
A visualization showing the flight connections--

00:12:59.726 --> 00:13:01.968
international and domestic flight connections from India,

00:13:01.968 --> 00:13:03.146
to and from India.

00:13:03.146 --> 00:13:05.803
So it's kind of messy, but we can filter it

00:13:05.803 --> 00:13:09.573
for domestic connections or international connections.

00:13:09.573 --> 00:13:11.053
So, anyway.

00:13:13.061 --> 00:13:14.641
We have also completed

00:13:14.641 --> 00:13:18.141
all the data about the 2014 Indian general election.

00:13:18.141 --> 00:13:22.914
The Indian general election is a complex set of data

00:13:22.914 --> 00:13:25.314
because there are so many political parties,

00:13:25.314 --> 00:13:27.646
so many-- it's not like a two-party election.

00:13:28.617 --> 00:13:32.407
So there were 6,000 political parties which participated in the Indian--

00:13:33.064 --> 00:13:36.449
I think 600 or something.

00:13:36.449 --> 00:13:38.826
So, anyway.

00:13:38.826 --> 00:13:40.336
So, yeah.

00:13:41.461 --> 00:13:44.531
And there were so many candidates, you can imagine.

00:13:45.239 --> 00:13:48.949
And some of them have the same name.

00:13:49.522 --> 00:13:51.195
Like, in one constituency,

00:13:51.195 --> 00:13:53.365
there were three people with the same name.

00:13:53.365 --> 00:13:54.445
(laughs)

00:13:54.445 --> 00:13:56.633
So that was a funny thing.

00:13:58.456 --> 00:14:04.213
But we completed that data-- uploading that data to Wikidata.

00:14:04.213 --> 00:14:07.501
Right now, only the 2014 Indian general election has been done.

00:14:07.501 --> 00:14:13.324
We don't have many users on Wikidata-- heavy-duty Wikidata users in India.

00:14:13.805 --> 00:14:18.624
So currently we're uploading geoshape files of the constituencies.

00:14:19.036 --> 00:14:23.884
In West Bengal, we have already uploaded 43 constituencies,

00:14:23.884 --> 00:14:28.519
geoshape files of the constituencies, and also the [inaudible].

00:14:28.519 --> 00:14:31.539
Other parts of India have not been done yet,

00:14:31.539 --> 00:14:33.569
so when that is completed--

00:14:33.569 --> 00:14:34.860
when it'll be--

00:14:35.185 --> 00:14:38.861
when we upload other elections, like--

00:14:38.861 --> 00:14:42.529
like 2009 or before that,

00:14:42.529 --> 00:14:45.588
we'll create cool animations

00:14:45.588 --> 00:14:51.190
showing how the voters have changed their minds,

00:14:51.190 --> 00:14:54.953
from centrist to rightist or leftist to rightist, anyway.

00:14:54.953 --> 00:14:58.668
So in the pipeline, there are schools,

00:14:58.668 --> 00:15:02.818
bank branches, post offices, geoshapes, elections, and many more.

00:15:04.124 --> 00:15:06.164
- (man 1) Cinema.
- Cinema, yeah.

00:15:06.164 --> 00:15:07.224
(laughs)

00:15:07.224 --> 00:15:09.039
- Of course, cinema.
- And monuments.

00:15:09.039 --> 00:15:10.529
And monuments.

00:15:11.211 --> 00:15:14.761
And most of them will be completed within a few months.

00:15:16.272 --> 00:15:21.872
And in the not-so-distant future, we'll try to upload weather data.

00:15:22.974 --> 00:15:27.764
There aren't many good properties for weather in Wikidata right now,

00:15:28.274 --> 00:15:31.110
so that's why we're not touching it right now,

00:15:31.110 --> 00:15:32.448
but we'll do it.

00:15:32.862 --> 00:15:35.988
Also, bibliographical data

00:15:36.415 --> 00:15:40.215
for Indian literature is very sparse in Wikidata.

00:15:41.485 --> 00:15:46.244
And there will be some institutional partnerships.

00:15:46.746 --> 00:15:48.606
There were some preliminary talks already,

00:15:48.606 --> 00:15:51.596
and maybe we'll have some good news in the future.

00:15:54.140 --> 00:15:55.550
So, other ways to engage.

00:15:55.550 --> 00:16:00.013
We have created some subpages of WikiProject India.

00:16:00.406 --> 00:16:05.256
We have created a skillshare initiative-- started a skillshare initiative

00:16:05.256 --> 00:16:08.646
where people who have slightly more knowledge of Wikidata

00:16:08.646 --> 00:16:11.876
can share something with other people

00:16:11.876 --> 00:16:16.756
on a one-to-one basis, online or offline.

00:16:16.756 --> 00:16:20.073
We have also started a newsletter, a quarterly newsletter,

00:16:20.073 --> 00:16:25.739
and the first issue was published in October [2018],

00:16:26.431 --> 00:16:30.512
and we are showcasing cool visualizations on social media,

00:16:30.512 --> 00:16:34.478
on the Facebook and Twitter channels of Wikidata India, every day.

00:16:34.990 --> 00:16:38.420
So these are the links.

00:16:38.420 --> 00:16:40.286
You can find them there.

00:16:41.602 --> 00:16:43.502
Thank you so much for the...

00:16:45.483 --> 00:16:48.853
As most of you can already guess,

00:16:49.784 --> 00:16:54.514
Bodhi is from that part of India, West Bengal,

00:16:54.514 --> 00:16:56.554
where they've done all that work.

00:16:57.040 --> 00:16:59.100
(laughs)

00:16:59.100 --> 00:17:04.752
So the West Bengal community in India has really been doing this amazing work,

00:17:04.752 --> 00:17:07.922
and this needs to spread to other parts of India

00:17:07.922 --> 00:17:10.598
which need more capacity development,

00:17:10.598 --> 00:17:15.704
which need more training, and also more coordination in India.

00:17:16.181 --> 00:17:19.591
And, okay, I would like to end this

00:17:19.591 --> 00:17:22.550
with how you can help in identifying some of the knowledge gaps

00:17:22.550 --> 00:17:24.765
and taking that conversation forward,

00:17:24.765 --> 00:17:26.685
which is not directly related to this topic.

00:17:26.685 --> 00:17:30.535
But there is a WikiProject, "Identifying knowledge gaps,"

00:17:30.535 --> 00:17:34.225
you can join that and share your thoughts.

00:17:34.225 --> 00:17:38.859
We are also trying to use-- how can we use property P5008,

00:17:38.859 --> 00:17:42.749
which is "on focus list of Wikimedia project"--

00:17:42.749 --> 00:17:48.149
how we can use that to surface certain topics for contests

00:17:48.149 --> 00:17:50.129
or other events.

00:17:50.898 --> 00:17:54.543
And in the end, we'd like to thank you.

00:17:55.248 --> 00:18:01.034
Also, we'd like to thank Asaf, and Mahir and Tito,

00:18:01.034 --> 00:18:05.941
who are two other power users of Wikidata.

00:18:05.941 --> 00:18:08.131
We'd like to sincerely thank everyone.

00:18:08.131 --> 00:18:09.731
Thank you so much.

00:18:09.731 --> 00:18:11.621
(applause)

00:18:15.014 --> 00:18:16.490
Questions.

00:18:17.003 --> 00:18:18.860
(woman 1) Mark here says, "Hi."

00:18:18.860 --> 00:18:20.090
(laughs)

00:18:20.090 --> 00:18:23.089
(moderator) So we have only five minutes for questions and answers.

00:18:28.457 --> 00:18:30.137
There. There's a question there.

00:18:30.137 --> 00:18:31.777
(woman 1) Do I need the microphone?

00:18:33.626 --> 00:18:36.062
(woman 1) Thank you so much for your presentation.

00:18:36.062 --> 00:18:39.248
This census data-- what kind of data exactly is that,

00:18:39.248 --> 00:18:40.498
that you've been ingesting?

00:18:40.498 --> 00:18:42.930
It's not for individuals, is it?

00:18:42.930 --> 00:18:45.733
It's more like populations and stuff like that?

00:18:46.148 --> 00:18:49.538
It's population data, mainly. Demographic data.

00:18:49.842 --> 00:18:52.952
(woman 1) Are there any other things that have been asked in the census?

00:18:56.347 --> 00:18:59.257
(man 1) For villages, gender--

00:19:00.921 --> 00:19:04.251
(man 2) I was a little involved with that, so I remember what the data looks like.

00:19:04.251 --> 00:19:07.603
Per settlement in India, per village or town,

00:19:07.603 --> 00:19:11.926
you have the total population, the male versus female population,

00:19:12.481 --> 00:19:15.564
the literate versus illiterate population.

00:19:15.564 --> 00:19:18.944
Within that, you also have a breakdown by gender,

00:19:18.944 --> 00:19:20.964
so you know how many illiterate males there are

00:19:20.964 --> 00:19:23.093
versus how many illiterate females there are.

00:19:23.093 --> 00:19:24.603
It's actually quite detailed.

00:19:24.603 --> 00:19:28.923
There are hundreds and hundreds of pieces of data per village.

00:19:29.338 --> 00:19:33.228
Only some of them have been modeled on Wikidata.

00:19:36.937 --> 00:19:39.517
But, of course, no individual census data.

00:19:41.990 --> 00:19:43.793
(woman 1) Sometimes countries get weird.

00:19:45.108 --> 00:19:48.628
(woman 2) So I wanted to ask you about the label ingestion,

00:19:48.628 --> 00:19:50.868
or the translations of labels you do.

00:19:51.622 --> 00:19:53.932
How did you do that? Do you use tools?

00:19:53.932 --> 00:19:56.582
How do you get people to add them in their native language

00:19:56.582 --> 00:19:58.367
and translate the labels?

00:19:59.425 --> 00:20:02.797
So, mostly TABernacle,

00:20:03.638 --> 00:20:06.927
and QuickStatements.

00:20:06.927 --> 00:20:08.297
Those are what we use, QuickStatements.

00:20:08.297 --> 00:20:10.497
(woman 2) Alright. Cool.

00:20:10.497 --> 00:20:13.823
But at the same time, we also use labelathons as an activity

00:20:13.823 --> 00:20:17.530
to engage more and more people in that work.

00:20:19.757 --> 00:20:21.234
Asaf.

00:20:21.234 --> 00:20:22.286
The hero.

00:20:24.698 --> 00:20:26.238
(Asaf) A note on TABernacle.

00:20:26.238 --> 00:20:29.266
I just want to mention, for anyone who may not be aware,

00:20:30.349 --> 00:20:33.399
all of us here use Wikidata-related tools,

00:20:33.399 --> 00:20:36.607
which means all of us have used tools by Magnus,

00:20:36.607 --> 00:20:38.297
the amazing tool builder.

00:20:38.297 --> 00:20:41.317
I just wanted to point out that he's here at the conference.

00:20:41.317 --> 00:20:43.067
So if you haven't had a chance yet

00:20:43.067 --> 00:20:47.780
to thank him for his amazing work that enables so much impact--

00:20:47.780 --> 00:20:48.817
do so today.

00:20:48.817 --> 00:20:51.757
I'm not sure he is into hugs, but you can just thank him.

00:20:51.757 --> 00:20:53.407
(laughs)

00:21:05.300 --> 00:21:08.455
(man 3) Is the skillshare working?

00:21:08.455 --> 00:21:11.297
What do you do? What are the results?

00:21:11.635 --> 00:21:13.935
So, the response is [still no].

00:21:15.471 --> 00:21:21.766
But, yeah. Five or six people have already requested it,

00:21:21.766 --> 00:21:23.386
and we have completed those.

00:21:24.404 --> 00:21:25.614
(Satdeep) That's going on--

00:21:25.614 --> 00:21:29.726
Like, we just need to surface the value of Wikidata.

00:21:29.726 --> 00:21:31.911
I think we haven't really been able to do that.

00:21:31.911 --> 00:21:34.841
Also, we haven't been able to connect with other projects

00:21:34.841 --> 00:21:36.532
that people are already working on,

00:21:36.532 --> 00:21:38.602
like, for example, Wikisource or Wikipedia.

00:21:38.602 --> 00:21:42.402
Like, how we need to communicate that in a better way

00:21:42.402 --> 00:21:45.312
to the larger community that is contributing.

00:21:45.312 --> 00:21:49.312
It was just like getting up and creating a Wiki periodical.

00:21:49.312 --> 00:21:52.102
Like, how do we involve them and bring them here?

00:21:52.102 --> 00:21:53.886
That's still a problem.

00:21:54.378 --> 00:21:56.810
And Bodhi is showing the census data.

00:21:56.810 --> 00:21:58.574
Bodhi, can you please explain?

00:22:02.150 --> 00:22:07.213
(Bodhi) So this is population data from the 2011 census,

00:22:07.213 --> 00:22:12.486
5,007 in 2001 in the census data.

00:22:13.126 --> 00:22:14.709
This is one village.

00:22:14.709 --> 00:22:17.824
So there are like 36,000 villages or 40,000 villages.

00:22:18.520 --> 00:22:21.022
This is the male population, female population,

00:22:21.829 --> 00:22:23.129
number of households,

00:22:23.129 --> 00:22:28.050
illiterate population with male and female population qualifiers,

00:22:29.358 --> 00:22:31.648
literate and illiterate populations,

00:22:31.648 --> 00:22:32.798
and so on.

00:22:32.798 --> 00:22:35.702
And this is the census code for 2001 and 2011.

00:22:40.319 --> 00:22:43.557
(woman 3) Okay. I just want to say that I loved your presentation,

00:22:43.557 --> 00:22:47.447
and I wanted to talk about nearly the same thing tomorrow,

00:22:47.447 --> 00:22:51.650
so it'll be great because tomorrow-- I will just [stay] watch from this one,

00:22:51.650 --> 00:22:54.230
so that makes my life easier.

00:22:55.544 --> 00:22:58.446
What I wanted to do or to talk about--

00:22:58.446 --> 00:23:02.276
but I think the WikiProject you're starting on Wikidata

00:23:02.276 --> 00:23:03.671
will do that--

00:23:03.671 --> 00:23:07.671
is how to engage people not working on India directly.

00:23:07.671 --> 00:23:12.607
Like, I have tools for names, but I don't deal with Indian names

00:23:13.139 --> 00:23:17.139
because I am not sure I understand all there is to them,

00:23:17.139 --> 00:23:19.725
and I don't want to do something massively wrong,

00:23:19.725 --> 00:23:21.375
so better to be careful.

00:23:21.375 --> 00:23:26.624
But I just need to ask someone who understands all the problems,

00:23:26.624 --> 00:23:28.984
and I can add an automated tool

00:23:28.984 --> 00:23:32.811
and deal with thousands upon thousands of items.

00:23:33.047 --> 00:23:35.727
And I think there are many, many tools

00:23:35.727 --> 00:23:39.687
already doing some automated descriptions and things like that,

00:23:39.687 --> 00:23:46.104
for which we don't actually need people every day,

00:23:46.104 --> 00:23:50.918
we just need 10 minutes of someone's time to tell me

00:23:50.918 --> 00:23:54.778
how to say family names in those languages,

00:23:54.778 --> 00:23:57.278
and then it's just added to the tool.

00:23:57.278 --> 00:24:01.917
And you probably know the [automated] description tool,

00:24:02.520 --> 00:24:06.750
but if you just ask the people who are using it massively

00:24:06.750 --> 00:24:08.810
to just add Indian languages,

00:24:08.810 --> 00:24:12.810
then you have all Wikidatans doing the same work for you,

00:24:12.810 --> 00:24:15.409
and actually, it is a problem.

00:24:16.969 --> 00:24:20.970
I am helping an African community build up their Wikidata

00:24:20.970 --> 00:24:23.819
in Wikipedia, so it's not the same problem,

00:24:23.819 --> 00:24:25.737
but nearly the same problem.

00:24:26.317 --> 00:24:29.347
And that's the problem we have,

00:24:29.347 --> 00:24:32.257
which is actually bridging the gap

00:24:32.257 --> 00:24:35.119
between the biggest Wikidatans--

00:24:35.813 --> 00:24:39.313
I am doing work in languages I don't know a word of,

00:24:40.843 --> 00:24:44.526
but it's this kind of adoption system,

00:24:44.526 --> 00:24:48.566
like, I need a native speaker to tell me

00:24:48.566 --> 00:24:51.854
what I can do with all the problems in all the complicated cases.

00:24:51.854 --> 00:24:55.752
And everything that I can automate, I will automate.

00:24:55.752 --> 00:24:59.752
And it's just an idea, but do you think it would be

00:24:59.752 --> 00:25:04.752
a good idea to create not so specific Wiki knowledge gap on Wikidata,

00:25:04.752 --> 00:25:09.462
but a matching system,

00:25:09.462 --> 00:25:15.821
like, "Hey, I am working on this subject, do you want to ask me for that?"

00:25:15.821 --> 00:25:20.521
- Like, yeah, a matching tool, like to--
- Connect people.

00:25:20.521 --> 00:25:22.980
- (woman 3) To connect people across languages.

00:25:24.119 --> 00:25:26.910
Yeah. So that was my idea, because I think

00:25:27.253 --> 00:25:30.273
some of the African communities I am helping

00:25:30.273 --> 00:25:33.177
would really, really love what you're doing,

00:25:33.177 --> 00:25:39.253
but none of them speak Indian languages, and we just need to have pivot people

00:25:39.746 --> 00:25:41.266
to create the link

00:25:41.523 --> 00:25:44.504
and make all this even more powerful.

00:25:44.504 --> 00:25:47.341
And I really, really love what you're doing. So thank you.

00:25:47.341 --> 00:25:48.551
Thank you so much.

00:25:48.551 --> 00:25:50.619
Thanks to Bodhi for all the awesome work.

00:25:51.341 --> 00:25:52.728
(laughs)

00:25:52.728 --> 00:25:54.498
And the larger Indian community.

00:25:54.498 --> 00:25:57.113
But that's a really good idea, I think we should take that up.

00:25:57.683 --> 00:26:03.090
As a movement, we have not been doing the sharing thing very well.

00:26:03.090 --> 00:26:04.840
We need to figure out how to do that.

00:26:04.840 --> 00:26:06.727
Because there are awesome tools,

00:26:06.727 --> 00:26:09.028
one community builds them, but the others don't know about them.

00:26:09.028 --> 00:26:10.864
That's a larger problem,

00:26:10.864 --> 00:26:14.197
and that's a piece that fits into the larger problem.

00:26:14.197 --> 00:26:16.061
We should be solving it someplace.

00:26:16.061 --> 00:26:18.050
Let's figure out where we can do that.

00:26:19.421 --> 00:26:20.867
Thank you.

00:26:20.867 --> 00:26:23.027
(applause)