0:00:06.303,0:00:07.362 (Lydia) Thank you so much. 0:00:07.362,0:00:11.244 So, this conference,[br]one of the big themes is languages. 0:00:14.220,0:00:18.508 I want to give you an overview[br]of where we actually are currently 0:00:18.508,0:00:19.812 when it comes to languages 0:00:20.264,0:00:22.167 and where we can go from here. 0:00:29.036,0:00:32.580 Wikidata is all about giving more people[br]more access to more knowledge, 0:00:32.580,0:00:37.168 and language is such an important part[br]of making that a reality, 0:00:38.205,0:00:43.291 especially since more and more[br]of our lives depends on technology. 0:00:44.114,0:00:48.873 And as our keynote speaker[br]earlier today was talking, 0:00:49.723,0:00:51.588 some of the technology[br]leaves people behind 0:00:51.588,0:00:55.020 simply because they can't speak[br]a certain language, 0:00:55.320,0:00:57.573 and that's not okay. 0:00:58.633,0:01:02.097 So we want to do something about that. 0:01:02.927,0:01:05.841 And in order to change that,[br]you need at least two things. 0:01:06.411,0:01:11.270 One is you need to provide content[br]to the people in their language, 0:01:11.270,0:01:12.955 and the second thing you need 0:01:12.955,0:01:15.910 is to provide them[br]with interaction in their language 0:01:15.910,0:01:19.189 in those applications[br]or whatever it is you have. 0:01:20.367,0:01:25.277 And Wikidata helps with both of those. 0:01:25.277,0:01:28.408 And the first thing,[br]content in your language, 0:01:28.408,0:01:30.879 that is basically what we have[br]in items and properties, 0:01:31.319,0:01:33.082 how we describe the world. 0:01:33.082,0:01:35.085 Now, this is certainly[br]not everything you need, 0:01:35.085,0:01:39.294 but it gets you quite far ahead. 
0:01:39.764,0:01:41.847 The other thing[br]is interaction in your language, 0:01:41.847,0:01:46.389 and that's where lexemes come into play 0:01:46.389,0:01:49.382 If you want to talk[br]to your digital personal assistant 0:01:49.382,0:01:54.918 or if you want to have your device[br]translate a text and things like that. 0:01:56.404,0:01:59.254 Alright, let's look into[br]content in your language. 0:01:59.254,0:02:03.396 So what we have in items and properties. 0:02:05.406,0:02:09.696 For this, the labels in those items[br]and properties are crucial. 0:02:10.236,0:02:14.866 We need to know what this entity[br]is called that we're talking about. 0:02:15.656,0:02:19.987 And instead of talking about Q5, 0:02:19.987,0:02:22.180 someone who speaks English[br]knows that's a "human," 0:02:22.180,0:02:24.706 someone who speaks German[br]knows that's a "mensch," 0:02:24.706,0:02:26.374 and similar things. 0:02:26.374,0:02:29.742 So those labels on items and properties 0:02:29.742,0:02:33.619 are bridging the gap[br]between humans and machines. 0:02:33.619,0:02:35.439 And humans and humans 0:02:35.439,0:02:40.115 making more existing knowledge[br]accessible to them. 0:02:43.270,0:02:46.290 Now, that's a nice aspiration. 0:02:46.290,0:02:48.342 What does it actually look like? 0:02:48.342,0:02:49.607 It looks like this. 0:02:50.947,0:02:52.416 What you're seeing here 0:02:52.416,0:02:58.496 is that most of the items[br]on Wikidata have two labels, 0:02:58.496,0:03:00.767 so labels in two languages. 0:03:01.697,0:03:03.851 And after that, it's one, and then three, 0:03:03.851,0:03:06.115 and then it becomes very sad. 0:03:06.781,0:03:08.581 (quiet laughter) 0:03:10.047,0:03:12.713 I think we need to do better than this. 0:03:14.185,0:03:15.319 But, on the other hand, 0:03:15.319,0:03:17.478 I was actually expecting this[br]to be even worse. 0:03:17.478,0:03:19.560 I was expecting the average to be one. 0:03:19.560,0:03:22.503 So I was quite happy [br]to see two. 
(chuckles) 0:03:24.921,0:03:26.186 Alright. 0:03:27.156,0:03:29.527 But it's not just interesting to know 0:03:29.527,0:03:33.742 how many labels our items[br]and properties have. 0:03:33.742,0:03:36.565 It's also interesting to see[br]in which languages. 0:03:38.045,0:03:43.764 Here you see a graph of the languages 0:03:43.764,0:03:46.838 that we have labels for on Items. 0:03:46.838,0:03:50.669 So the biggest part there is Other. 0:03:51.229,0:03:53.863 So I just took the top 100 languages 0:03:54.533,0:03:58.902 and everything else is Other[br]to make this graph readable. 0:03:59.542,0:04:02.142 And then there's English and Dutch, 0:04:03.002,0:04:04.254 French, 0:04:05.924,0:04:09.129 and not to forget, Asturian. 0:04:09.659,0:04:11.889 - (person 1) Whoo![br]- Whoo-hoo, yes! 0:04:13.899,0:04:16.954 So what you see here is quite an imbalance 0:04:16.954,0:04:20.114 and still quite a lot of focus on English. 0:04:21.236,0:04:24.367 Another thing is if you look[br]at the same thing for Properties, 0:04:24.367,0:04:25.999 it's actually looking better. 0:04:27.399,0:04:32.750 And I think part of that constituted[br]just being way less properties. 0:04:32.750,0:04:36.770 So even smaller communities[br]have a chance to keep up with that. 0:04:36.770,0:04:39.173 But it's also a pretty[br]important part of Wikidata 0:04:39.173,0:04:41.159 to localize into your language. 0:04:41.159,0:04:42.384 So that's good. 0:04:45.752,0:04:47.842 What I want to highlight[br]here with Asturian 0:04:47.842,0:04:53.698 is that a small community[br]can really make a huge difference 0:04:54.448,0:04:57.085 with some dedication and work, 0:04:57.085,0:04:58.420 and that's really cool. 0:05:01.846,0:05:03.530 A small quiz for you. 0:05:03.530,0:05:05.493 If you take all the properties on Wikidata 0:05:05.493,0:05:07.687 that are not external identifiers, 0:05:07.687,0:05:10.358 which one has the most labels,[br]like the most languages? 
0:05:10.977,0:05:13.847 (audience) [inaudible] 0:05:13.847,0:05:16.786 I hear some agreement on instance of? 0:05:17.506,0:05:19.443 You would be wrong. 0:05:19.983,0:05:22.210 It's image. (chuckles) 0:05:23.230,0:05:26.366 So, yeah, that tells you,[br]if you speak one of the languages 0:05:26.366,0:05:28.621 where instance of[br]doesn't yet have a label, 0:05:28.621,0:05:30.190 you might want to add it. 0:05:32.102,0:05:35.676 So it has 148 labels currently. 0:05:37.688,0:05:41.249 But that's just another slide. 0:05:42.631,0:05:44.162 This graph tells us something 0:05:44.162,0:05:49.321 about how much content we are making[br]available in a certain language 0:05:49.321,0:05:52.042 and how much of that content[br]is actually used. 0:05:52.042,0:05:55.448 So what you're seeing is basically a curve 0:05:55.448,0:06:00.987 with most content having English labels,[br]being available in English, 0:06:01.507,0:06:04.295 and being used a lot. 0:06:04.295,0:06:06.449 And then it kind of goes down. 0:06:06.449,0:06:09.436 But, again, what you can see are outliers 0:06:09.436,0:06:15.333 who have a lot more content[br]than you would necessarily expect, 0:06:16.903,0:06:19.539 and that is really, really good. 0:06:20.839,0:06:24.945 The problem still is it's not used a lot. 0:06:25.565,0:06:28.742 Asturian and Dutch should be higher, 0:06:28.742,0:06:31.994 and I think helping those communities 0:06:33.266,0:06:35.563 increase the use[br]of the data they collected 0:06:35.563,0:06:37.682 is a really useful thing to do. 0:06:42.910,0:06:48.110 What this analysis and others[br]showed us is also a good thing though 0:06:48.300,0:06:51.378 is that we are seeing[br]that highly used items 0:06:51.378,0:06:55.295 also tend to have more labels 0:06:55.295,0:06:58.188 or the other way around--[br]it's not entirely clear. 0:07:02.513,0:07:04.376 And then the question is, 0:07:04.806,0:07:07.009 are we serving[br]just the powerful languages? 
0:07:07.899,0:07:11.147 Or are we serving everyone? 0:07:12.757,0:07:17.743 And what you see here[br]is a grouping of languages. 0:07:17.743,0:07:21.832 The languages that are grouped together[br]tend to have labels together. 0:07:26.042,0:07:28.599 And you see it clustering. 0:07:28.599,0:07:34.065 Now here's a similar clustering, colored, 0:07:34.065,0:07:39.475 based on how alive, how used, 0:07:40.455,0:07:43.156 how endangered the language is. 0:07:43.156,0:07:44.642 And a good thing you're seeing here 0:07:44.642,0:07:49.566 is that safe languages[br]and endangered languages 0:07:49.566,0:07:53.773 do not form two different clusters. 0:07:53.773,0:07:58.872 But they're all mixed together, 0:08:00.262,0:08:04.625 which is much better than it would be[br]the other way around 0:08:04.625,0:08:09.377 where the safe languages,[br]the powerful languages 0:08:10.197,0:08:12.164 are just helping each other out. 0:08:12.744,0:08:14.356 No, that's not the case. 0:08:14.356,0:08:17.417 And it's a really good thing. 0:08:17.417,0:08:20.042 When I saw this, [br]I thought this was very good. 0:08:23.474,0:08:25.169 Here's a similar thing 0:08:26.239,0:08:28.800 where we looked at 0:08:30.230,0:08:34.222 the languages' status 0:08:34.222,0:08:36.225 and how many labels it has. 0:08:39.367,0:08:42.937 What you're seeing[br]is a clear win for safe languages, 0:08:42.937,0:08:44.248 as is expected. 0:08:45.508,0:08:46.693 But what you're also seeing 0:08:46.693,0:08:54.407 is that the languages in category 2[br]and 3 and maybe even 4 0:08:54.407,0:08:59.280 are not that bad, actually, 0:08:59.280,0:09:02.367 in terms of their representation[br]in Wikidata and others. 0:09:03.287,0:09:06.408 It's a really good thing to find. 
0:09:07.646,0:09:09.129 Now, if you look at the same thing 0:09:09.129,0:09:12.418 for how much of that content[br]of those labels 0:09:12.418,0:09:15.495 is actually used[br]on Wikipedia, for example, 0:09:17.455,0:09:22.563 then we see a similar[br]picture emerging again. 0:09:23.603,0:09:29.813 And it tells us that those communities[br]are actually making good use of their time 0:09:29.813,0:09:34.504 by filling in labels[br]for higher used items, for example. 0:09:36.410,0:09:40.493 There are outliers[br]where I think we can help, 0:09:41.683,0:09:48.202 to help those communities find the places[br]where their work would be most valuable. 0:09:49.312,0:09:52.663 But, overall, I'm happy with this picture. 0:09:54.823,0:09:59.844 Now, that was the items[br]and properties part of Wikidata. 0:10:00.714,0:10:03.033 Now, let's look at interaction[br]in your languages. 0:10:03.033,0:10:05.203 So the lexeme parts of Wikidata 0:10:05.203,0:10:09.394 where we describe words[br]and their forms and their meanings. 0:10:10.167,0:10:13.301 We've been doing this now[br]since May last year, 0:10:16.461,0:10:19.127 and content has been growing. 0:10:20.114,0:10:22.149 You can see here in blue the lexemes, 0:10:22.149,0:10:25.938 and then in red, [br]the forms on those lexemes 0:10:25.938,0:10:29.910 and yellow, the senses[br]on those lexemes. 0:10:30.991,0:10:34.451 So some communities--[br]we'll get to that later-- 0:10:34.451,0:10:39.793 have spent a lot of time creating forms[br]and senses for their lexemes, 0:10:39.793,0:10:42.753 which is really useful 0:10:42.753,0:10:48.243 because that builds[br]the core of the data set that you need. 0:10:50.562,0:10:55.133 Now, we looked at all the languages 0:10:55.133,0:10:57.906 that have lexemes on Wikidata. 0:10:57.906,0:11:01.003 So words we have, 0:11:01.713,0:11:04.404 those are right now 310 languages. 
0:11:04.884,0:11:08.290 Now, what do you think is the top language 0:11:08.290,0:11:11.949 when it comes to the number[br]of lexemes currently in Wikidata? 0:11:12.933,0:11:14.700 (audience) [inaudible] 0:11:19.183,0:11:20.216 Huh? 0:11:20.216,0:11:21.741 (person 2) German. 0:11:21.741,0:11:24.252 Sorry, I've heard it before. 0:11:24.252,0:11:25.651 It's Russian. 0:11:28.011,0:11:29.754 Russian is quite ahead. 0:11:31.897,0:11:33.832 And just to give you some perspective, 0:11:35.652,0:11:36.816 there's different opinions 0:11:36.816,0:11:42.231 but I've read, for example,[br]that 1,000 to 3,000 words 0:11:42.231,0:11:45.450 gets you to conversation level,[br]roughly, in another language, 0:11:45.450,0:11:49.461 and 4,000 to 10,000 words[br]to an advanced level. 0:11:51.591,0:11:55.282 So, we still have a bit to catch up there. 0:11:58.483,0:12:03.279 One thing I want you[br]to pay attention to is Basque here 0:12:03.279,0:12:07.744 with 10,000, roughly, lexemes. 0:12:09.244,0:12:13.003 Now, if you look at the number[br]of forms for those lexemes, 0:12:14.163,0:12:16.497 Basque is way up there, 0:12:18.257,0:12:20.006 which is really cool, 0:12:20.006,0:12:24.930 and you should go to a talk that explains[br]to you why that is the case. 0:12:27.341,0:12:31.175 Now, if you look at the number[br]of senses, so what do words mean, 0:12:32.015,0:12:35.081 Basque even gets to the top of the list. 0:12:35.081,0:12:37.102 I think that deserves an applause. 0:12:37.102,0:12:38.921 (applause) 0:12:45.678,0:12:47.118 Another short quiz. 0:12:47.118,0:12:50.181 What's the lexeme[br]with the most translations currently? 0:12:50.651,0:12:55.414 (audience) Cats, cats, [inaudible], [br]Douglas Adams, [inaudible] 0:12:56.766,0:13:00.014 All good guesses, but no. 0:13:01.012,0:13:04.137 It's this, the Russian word for "water." 
0:13:09.571,0:13:12.253 Alright, so now we talked a lot 0:13:12.253,0:13:16.412 about how many lexemes,[br]forms, and senses we have, 0:13:16.412,0:13:20.493 but that's just one thing you need. 0:13:20.493,0:13:21.515 The other thing you need 0:13:21.515,0:13:25.161 is actually describing those lexemes,[br]forms, and senses 0:13:25.161,0:13:27.647 in a machine-readable way. 0:13:27.647,0:13:30.039 And for that you have statements,[br]like on items. 0:13:31.479,0:13:36.362 And one of the properties[br]you use is usage example. 0:13:36.362,0:13:38.582 So whoever is using that data 0:13:38.582,0:13:42.089 can understand how to use[br]that word in context, 0:13:42.089,0:13:44.158 so that could be a quote, for example. 0:13:45.396,0:13:47.113 And here, Polish rocks. 0:13:47.900,0:13:49.764 Good job, Polish speakers. 0:13:54.219,0:13:57.680 Another property[br]that's really useful is IPA, 0:13:57.680,0:14:00.186 so how do you pronounce this word. 0:14:00.876,0:14:07.497 Russian apparently needs[br]lots of IPA statements. 0:14:10.419,0:14:13.314 But, again, Polish, second. 0:14:17.148,0:14:20.753 And last but not least[br]we have pronunciation audio. 0:14:20.753,0:14:23.372 So those are links to files on Commons 0:14:23.372,0:14:25.959 where someone speaks the word, 0:14:25.959,0:14:29.913 so you can hear a native speaker[br]pronounce the word 0:14:29.913,0:14:32.871 in case you can't read IPA, for example. 0:14:34.959,0:14:39.205 And there's actually a really nice[br]Wikibase-powered project 0:14:39.205,0:14:40.474 called Lingua Libre 0:14:40.884,0:14:45.173 where you can go and help record[br]words in your language 0:14:45.173,0:14:47.836 that then can be added[br]to lexemes on Wikidata, 0:14:48.446,0:14:52.103 so other people can understand[br]how to pronounce your words. 0:14:53.663,0:14:55.694 (person 2) [inaudible] 0:14:55.694,0:14:57.665 If you search for "Lingua Libre," 0:14:57.665,0:15:00.981 and I'm sure someone can post it[br]in the Telegram channel.
0:15:03.138,0:15:04.621 Those guys rock. 0:15:04.621,0:15:06.726 They did really cool stuff with Wikibase. 0:15:09.416,0:15:10.617 Alright. 0:15:12.706,0:15:17.285 Then the question is,[br]where do we go from here? 0:15:19.165,0:15:22.010 Based on the numbers I've just shown you, 0:15:23.030,0:15:25.172 we've come a long way 0:15:25.172,0:15:28.430 towards giving more people[br]more access to more knowledge 0:15:28.430,0:15:31.240 when looking at languages on Wikidata. 0:15:32.530,0:15:36.392 But there is also still[br]a lot of work ahead of us. 0:15:38.992,0:15:42.341 Some of the things[br]you can do to help, for example, 0:15:42.341,0:15:44.921 is run label-a-thons 0:15:44.921,0:15:50.124 like get people together[br]to label items in Wikidata 0:15:50.914,0:15:55.121 or do an edit-a-thon[br]around lexemes in your language 0:15:55.121,0:15:59.212 to get the most used words[br]in your language into Wikidata. 0:16:00.773,0:16:03.285 Or you can use a tool like Terminator 0:16:03.285,0:16:08.493 that helps you find the most[br]important items in your language 0:16:08.493,0:16:11.549 that are still missing a label. 0:16:13.274,0:16:18.359 Most important being measured[br]by how often it is used 0:16:18.359,0:16:22.553 in other Wikidata items[br]as links in statements. 0:16:25.768,0:16:30.022 And, of course, for the lexeme part, 0:16:31.342,0:16:35.169 now that we've got[br]a basic coverage of those lexemes, 0:16:35.169,0:16:41.163 it's also about building them out,[br]adding more statements to them 0:16:41.163,0:16:44.401 so that they actually can build the base 0:16:44.401,0:16:47.421 for meaningful applications[br]to build on top of that. 0:16:48.141,0:16:50.795 Because we're getting closer[br]to that critical mass, 0:16:50.795,0:16:53.616 but we're still away from that, 0:16:53.616,0:16:56.624 that you can build[br]serious applications on top of it. 0:16:58.277,0:17:01.680 And I hope all of you[br]will join us in doing that. 
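The ranking idea described above (an item matters more the more often other items link to it in statements) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Terminator is actually implemented: the function name `rank_missing_labels`, the triple layout, and the toy data are all hypothetical.

```python
from collections import Counter


def rank_missing_labels(statements, labels, language):
    """Rank items lacking a label in `language` by how often they are linked.

    statements: iterable of (subject, property, target_item) triples (toy layout).
    labels: dict mapping item id -> {language code: label}.
    """
    # Count how many statements point at each item.
    usage = Counter(target for _subject, _prop, target in statements)
    # Keep only the items that have no label in the requested language.
    missing = [
        (item, count)
        for item, count in usage.items()
        if language not in labels.get(item, {})
    ]
    # Most-linked first; ties broken by item id for a stable order.
    return sorted(missing, key=lambda pair: (-pair[1], pair[0]))


# Toy data: the subjects are placeholders, not real Wikidata items.
STATEMENTS = [
    ("person-1", "P31", "Q5"),
    ("person-2", "P31", "Q5"),
    ("person-3", "P31", "Q5"),
    ("person-1", "P27", "Q183"),
    ("person-2", "P27", "Q183"),
    ("person-3", "P19", "Q64"),
]
LABELS = {
    "Q5": {"en": "human", "de": "Mensch"},
    "Q183": {"en": "Germany"},
    "Q64": {"en": "Berlin"},
}

for item, count in rank_missing_labels(STATEMENTS, LABELS, "de"):
    print(item, count)
```

In this toy data, Q5 is linked most often but already has a German label, so it drops out and the two unlabeled items are listed in order of how often they are linked.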
0:17:02.583,0:17:07.103 And that already brings me 0:17:07.103,0:17:09.843 to a little help from our friends, 0:17:09.843,0:17:12.812 and Bruno, do you want to come over 0:17:13.882,0:17:16.854 and talk to us about lexical masks? 0:17:17.541,0:17:18.567 (Bruno) Thank you, Lydia, 0:17:18.567,0:17:21.519 thank you for giving me[br]this short period of time 0:17:21.519,0:17:24.150 to present this work[br]that we are doing at Google 0:17:24.150,0:17:29.635 with Denny, whom most of you[br]probably have heard of or know. 0:17:30.126,0:17:32.030 At Google, I'm a linguist, 0:17:32.030,0:17:36.150 so I'm very happy to be here[br]amongst other language enthusiasts. 0:17:36.620,0:17:39.278 We are also building some lexicons, 0:17:39.278,0:17:41.766 and we have built this technology 0:17:41.766,0:17:45.589 or this approach that we think[br]can be useful for you. 0:17:46.369,0:17:48.455 Just to give you[br]a little bit of background, 0:17:48.455,0:17:52.068 this is my lexicographic[br]background talking here. 0:17:52.788,0:17:54.347 When we build lexicon databases, 0:17:54.347,0:17:58.623 it is very hard to maintain them,[br]to keep them consistent 0:17:58.623,0:18:00.125 and to exchange data, 0:18:00.125,0:18:02.027 as you probably know. 0:18:02.517,0:18:05.927 There are several attempts[br]to unify the features and the properties 0:18:05.927,0:18:09.184 that are describing[br]those lexemes and those forms, 0:18:09.184,0:18:10.936 and it's not a solved problem, 0:18:10.936,0:18:13.958 but there are some[br]unification attempts on that side.
0:18:13.958,0:18:15.209 But what is really missing-- 0:18:15.209,0:18:18.732 and this is a problem we had[br]at the beginning of our project at Google-- 0:18:18.732,0:18:21.607 is to try to have an internal structure 0:18:22.197,0:18:25.910 that describes what[br]a lexical entry should look like, 0:18:25.910,0:18:28.581 what kind of data[br]or what kind of information we have 0:18:28.581,0:18:32.237 and the specifications that are expected. 0:18:32.237,0:18:38.187 So, this is what we came up with:[br]this thing called a lexical mask. 0:18:38.897,0:18:44.841 A lexical mask describes[br]what is expected for an entry, 0:18:44.841,0:18:47.329 a lexicographic entry, to be complete, 0:18:47.329,0:18:51.436 both in terms of the number of forms[br]you expect for a lexeme, 0:18:51.436,0:18:55.607 and the number of features[br]you expect for each of those forms. 0:18:56.397,0:18:58.329 Here is an example for Italian adjectives. 0:18:58.329,0:19:02.002 You expect, in Italian, to have[br]four forms for your adjectives, 0:19:02.002,0:19:05.383 and each of these forms[br]has a specific combination 0:19:05.383,0:19:07.946 of gender and number features. 0:19:08.606,0:19:12.672 This is what we expect[br]for the Italian adjectives. 0:19:12.672,0:19:16.176 Of course, you can have[br]extremely complex masks, 0:19:16.176,0:19:20.783 like the French verb conjugation,[br]which is quite extensive, 0:19:20.783,0:19:23.487 and I won't show you[br]any Russian mask 0:19:23.487,0:19:25.378 because it doesn't fit the screen. 0:19:26.308,0:19:29.531 And we also have[br]some detailed specifications 0:19:29.531,0:19:33.421 because we distinguish[br]what is at the form level.
0:19:33.421,0:19:37.544 So here you have Russian nouns[br]that have three numbers 0:19:37.544,0:19:40.048 and a number of cases[br]with different forms, 0:19:40.048,0:19:43.086 but they also have[br]an entry level specification 0:19:43.086,0:19:45.590 that says a noun particularly has 0:19:45.590,0:19:50.133 an inherent gender[br]and an inherent animacy feature 0:19:50.133,0:19:52.488 that is also specified in the mask. 0:19:54.518,0:19:58.779 We also want to distinguish[br]that a mask gives a specification 0:19:58.779,0:20:01.874 for, in general,[br]what an entry should look like. 0:20:01.874,0:20:07.158 But you can have smaller masks[br]for defective aspects of the form 0:20:07.158,0:20:11.282 or defective aspects of the lexeme[br]that happen in language. 0:20:11.282,0:20:14.537 So here is the simplest version[br]of French verbs 0:20:14.537,0:20:19.729 that have only the 3rd person singular[br]for all the weather verbs, 0:20:19.729,0:20:23.969 like "it rains" or "it snows,"[br]like in English. 0:20:24.537,0:20:26.493 So we distinguish these two levels. 0:20:26.923,0:20:29.962 And how we use this at Google 0:20:29.962,0:20:32.643 is that when we have a lexicon[br]that we want to use, 0:20:33.063,0:20:38.309 we use the mask to really[br]literally throw the lexicons, 0:20:38.309,0:20:40.163 all the entries, through the mask 0:20:40.163,0:20:44.303 and see which entry has a problem[br]in terms of structure. 0:20:44.303,0:20:46.523 Are we missing a form?[br]Are we missing a feature? 0:20:46.523,0:20:51.497 And when there is a problem,[br]we do some human validation 0:20:51.497,0:20:53.751 or just to see if it passes the mask. 0:20:53.751,0:20:57.924 So it's an extremely powerful tool[br]to check the quality of the structure. 0:20:59.427,0:21:01.964 So what we are happy to announce today 0:21:01.964,0:21:05.408 is that we get the green light[br]to open source our mask. 0:21:05.948,0:21:07.573 So this is a schema. 
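The idea of throwing entries through a mask can be sketched in Python. This is a minimal illustration under stated assumptions, not Google's internal format and not the ShEx schemas mentioned in the talk: the helper names `make_mask` and `check`, the dictionary layout, and the sample entries are all hypothetical. A mask here lists the entry-level features an entry must carry and the feature combinations for which a form is expected.

```python
from itertools import product


def make_mask(entry_features, **form_features):
    """Build a mask: required entry-level features plus expected form slots.

    Each form slot is one combination of the given form-level feature values,
    e.g. gender x number gives four slots for an Italian adjective.
    """
    slots = {frozenset(combo) for combo in product(*form_features.values())}
    return {"entry_features": set(entry_features), "form_slots": slots}


def check(entry, mask):
    """Return a list of structural problems; an empty list means the entry passes."""
    problems = []
    for feature in sorted(mask["entry_features"] - set(entry["features"])):
        problems.append(f"missing entry-level feature: {feature}")
    present = set(entry["forms"])
    for slot in sorted(mask["form_slots"] - present, key=sorted):
        problems.append("missing form: " + "+".join(sorted(slot)))
    for slot in sorted(present - mask["form_slots"], key=sorted):
        problems.append("unexpected form: " + "+".join(sorted(slot)))
    return problems


def slot(*features):
    """Shorthand for the key of one form slot."""
    return frozenset(features)


# Mask for Italian adjectives: four forms, gender x number, no entry-level feature.
ITALIAN_ADJECTIVE = make_mask(
    [], gender=["masculine", "feminine"], number=["singular", "plural"]
)
# Hypothetical noun mask: gender is inherent to the entry, only number varies.
NOUN_WITH_GENDER = make_mask(["gender"], number=["singular", "plural"])

rosso = {
    "lemma": "rosso",
    "features": {},
    "forms": {
        slot("masculine", "singular"): "rosso",
        slot("feminine", "singular"): "rossa",
        slot("masculine", "plural"): "rossi",
        slot("feminine", "plural"): "rosse",
    },
}
# Same entry with the feminine plural form left out.
rosso_incomplete = {
    "lemma": "rosso",
    "features": {},
    "forms": {k: v for k, v in rosso["forms"].items() if v != "rosse"},
}
# A noun entry whose inherent gender has not been recorded.
libro = {
    "lemma": "libro",
    "features": {},
    "forms": {slot("singular"): "libro", slot("plural"): "libri"},
}

print(check(rosso, ITALIAN_ADJECTIVE))
print(check(rosso_incomplete, ITALIAN_ADJECTIVE))
print(check(libro, NOUN_WITH_GENDER))
```

Entries that come back with a non-empty list are the ones that would go to human validation; a smaller mask for defective entries (such as weather verbs) would simply be another `make_mask` call with fewer slots.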
0:21:07.573,0:21:09.477 If you want that, we can release 0:21:09.477,0:21:13.483 and that we will provide[br]to Wikidata as to ShEx files. 0:21:13.483,0:21:16.688 This is a ShEx file for German nouns, 0:21:16.688,0:21:20.428 and Denny is working on the conversion[br]from our internal specification 0:21:20.428,0:21:23.666 to a more open-source specification. 0:21:23.666,0:21:27.522 We currently cover more than 25 languages. 0:21:27.522,0:21:29.225 So we expect to grow on our side, 0:21:29.225,0:21:34.350 but we also look for this opportunity[br]to collaborate for other languages. 0:21:34.350,0:21:40.728 And one of the ongoing collaborations[br]also that Denny has with Lukas. 0:21:40.728,0:21:45.052 Lukas has these great tools to have a UI 0:21:45.052,0:21:51.061 to help the user or the contributor[br]to add more forms. 0:21:51.061,0:21:54.151 So if you want to add[br]an adjective in French, 0:21:54.151,0:21:59.057 the UI is telling you[br]how many forms are expected 0:21:59.057,0:22:01.562 and what kind of features[br]this form should have. 0:22:01.562,0:22:06.268 So our mask will help the tool[br]to be defined and expanded. 0:22:07.238,0:22:08.385 That's it. 0:22:08.791,0:22:10.358 (Lydia) Thank you so much. 0:22:10.358,0:22:11.993 (applause) 0:22:14.249,0:22:16.891 Alright. Are there questions? 0:22:16.891,0:22:19.381 Do you want to talk more about lexemes? 0:22:19.817,0:22:21.475 - (person 3) Yes.[br]- Yes. (chuckles) 0:22:33.485,0:22:35.380 (person 3) My question,[br]because you were talking 0:22:35.380,0:22:39.106 about giving more access[br]to more people in more languages. 0:22:39.106,0:22:42.444 But there are a lot of languages[br]that can't be used in Wikidata. 0:22:42.444,0:22:44.588 So what solution do you have for that? 0:22:45.889,0:22:47.686 When you say that can't use Wikidata, 0:22:47.686,0:22:50.308 are you talking about entering labels? 0:22:50.308,0:22:52.578 - (person 3) Labels, descriptions.[br]- Right. 
0:22:52.578,0:22:55.498 So, for lexemes, it's a bit different 0:22:55.498,0:22:57.793 because there we don't have[br]that restriction. 0:22:58.923,0:23:05.003 For labels on items and properties,[br]there is some restriction 0:23:05.433,0:23:12.411 because we wanted to make sure[br]that it's not completely 0:23:12.411,0:23:14.229 anyone does anything, 0:23:14.229,0:23:17.769 and it becomes unmanageable. 0:23:19.349,0:23:23.328 Even a small community who wants[br]one language and wants to work on that, 0:23:23.898,0:23:26.787 come talk to us, we will make it happen. 0:23:26.787,0:23:29.202 (person 3) I mean, we did this[br]at the Prague Hackathon in May, 0:23:29.202,0:23:32.459 and it took us until almost August[br]in order to be able to use our language. 0:23:32.459,0:23:35.135 - Yeah.[br]- (person 3) So, it's very slow. 0:23:35.135,0:23:37.854 Yeah, it is, unfortunately, very slow. 0:23:37.854,0:23:39.883 We're currently working[br]with the language Committee 0:23:39.883,0:23:46.048 on solving some fundamental... 0:23:49.537,0:23:55.447 Like, getting agreement on what kind[br]of languages are actually "allowed," 0:23:56.047,0:23:59.398 and that has taken too long, 0:23:59.988,0:24:04.178 which is the reason why your request[br]probably took longer than it should have. 0:24:04.778,0:24:05.963 (person 3) Thanks. 0:24:06.815,0:24:07.950 (person 4) Thank you. 0:24:07.950,0:24:10.938 Lydia, if you remember[br]the statistics that you showed, 0:24:10.938,0:24:12.886 the number of lexemes per language. 0:24:12.886,0:24:17.599 So, did you count[br]all the forms as a data point 0:24:17.599,0:24:20.034 or only lexemes? 0:24:21.289,0:24:22.941 (Lydia) Do you mean this? 0:24:22.941,0:24:24.053 Which one do you mean? 0:24:24.053,0:24:25.529 (person 4) Yes, exactly. 0:24:25.797,0:24:28.341 If you remember,[br]does this number [inaudible] 0:24:28.341,0:24:31.954 all the forms for all the lexemes[br]or just how many lexemes there are? 
0:24:31.954,0:24:33.585 No, this is just a number of lexemes. 0:24:33.585,0:24:35.395 (person 4) Just a number of lexemes, okay. 0:24:35.395,0:24:36.797 So then it is a just statistic 0:24:36.797,0:24:39.390 because if it would then[br]compose the forms-- 0:24:39.390,0:24:40.614 that's why I'm asking-- 0:24:40.614,0:24:42.817 then all the languages[br]with the inflectional morphology, 0:24:42.817,0:24:45.027 like Russian, Serbian,[br]Slovenian and et cetera, 0:24:45.027,0:24:47.616 they have a natural advantage[br]because they have so many. 0:24:47.616,0:24:51.990 So, this kind of kicks in here[br]on this number of forms. 0:24:51.990,0:24:53.851 (person 4) Yeah, that was this one. [br]Thank you. 0:24:56.546,0:25:00.224 (person 5) So, I had[br]a quick question about the... 0:25:00.644,0:25:06.824 When we're talking about[br]the actual items and properties. 0:25:07.124,0:25:08.901 Like as far as I understand, 0:25:08.901,0:25:11.955 there is currently no way[br]to give an actual source 0:25:11.955,0:25:14.726 to any of the labels[br]and descriptions that are given. 0:25:14.726,0:25:18.047 So, for example,[br]because when you're talking 0:25:18.047,0:25:20.920 about an item property, 0:25:20.920,0:25:24.509 like, for example,[br]you can get conflicting labels. 0:25:24.509,0:25:25.739 Yes. 0:25:25.739,0:25:27.662 (person 5) So this person is like... 0:25:28.402,0:25:30.781 We were talking about[br]indigenous things before, for example. 0:25:30.781,0:25:35.965 So this person is a Norwegian artist[br]according to this source, 0:25:35.965,0:25:38.750 and a Sami artist,[br]according to this source. 
0:25:39.550,0:25:42.883 Or, for example, in Estonian,[br]we had an issue 0:25:42.883,0:25:47.729 where we had to change terminology[br]to the official use terminology 0:25:47.729,0:25:49.482 in official lexicons, 0:25:49.482,0:25:52.262 but we have no way to indicate really why, 0:25:52.262,0:25:53.596 like what was the source of this 0:25:53.596,0:25:55.561 and why this was better[br]and what was there before. 0:25:55.561,0:25:57.150 It was just me as a random person 0:25:57.150,0:25:59.615 just switching the thing[br]to anyone who sees it. 0:25:59.615,0:26:02.520 So is there a plan[br]to make this possible in any way 0:26:02.520,0:26:06.355 so that we can actually have[br]proper sources for the language data? 0:26:07.045,0:26:11.568 So, it is partially possible. 0:26:11.568,0:26:15.958 So, for example, when you have[br]an item for a person, 0:26:16.968,0:26:22.720 you have a statement, first name,[br]last name, and so on, of that person, 0:26:22.720,0:26:26.226 and then you can provide[br]the reference for that there. 0:26:28.211,0:26:32.544 I'm quite hesitant to add more complexity 0:26:32.544,0:26:35.557 for references on labels and descriptions, 0:26:35.557,0:26:38.624 but if people really, really think 0:26:38.624,0:26:44.939 this is something that isn't covered[br]by any reference on the statement, 0:26:44.939,0:26:46.803 then let's talk about it. 0:26:49.079,0:26:53.303 But I fear it will add a lot of complexity 0:26:53.303,0:26:56.523 for what I hope are few cases, 0:26:57.393,0:27:00.188 but I'm willing to be convinced otherwise 0:27:00.188,0:27:04.087 if people really feel[br]very strongly about this. 0:27:04.087,0:27:08.177 (person 5) I mean, if it's added[br]it probably shouldn't be the default, 0:27:08.177,0:27:12.452 show to all the users as a beginner,[br]interface, in any case. 0:27:12.452,0:27:16.190 More like, "Click here if you need to say[br]a specific thing about this." 
0:27:17.632,0:27:23.368 Do we have a sense of how many times[br]that would actually matter? 0:27:24.520,0:27:26.423 (person 5) In Estonian, for example-- 0:27:26.423,0:27:28.844 I expect this is true[br]of other languages as well-- 0:27:29.274,0:27:34.203 for example, there is an official name[br]that is the actual legitimate translation, 0:27:34.203,0:27:36.206 for example, into English, 0:27:36.206,0:27:40.314 of, say, a specific kind of municipality. 0:27:40.614,0:27:42.182 That was my use case, for example, 0:27:42.182,0:27:44.409 where we were using the word "parish" 0:27:45.159,0:27:50.885 which the original Estonian word[br]was meant kind of like church parish, 0:27:50.885,0:27:51.899 and that was the origin, 0:27:51.899,0:27:54.809 but that's not the official translation[br]Estonia gets right now. 0:27:55.189,0:27:58.993 In this case, I would just add it[br]as official name statements 0:27:58.993,0:28:00.817 and add the reference there. 0:28:02.032,0:28:03.158 (person 5) Okay. 0:28:05.186,0:28:06.572 More questions, yes? 0:28:07.682,0:28:10.044 (person 6) I have two quick comments. 0:28:10.044,0:28:13.934 You specifically called out Asturian[br]as a language that does well, 0:28:13.934,0:28:16.455 and I think that's a false artifact. 0:28:16.455,0:28:17.724 Tell me about it. 0:28:17.724,0:28:19.748 (person 6) I think it's just a bot 0:28:19.748,0:28:24.068 that pasted person names,[br]like proper names, 0:28:24.068,0:28:27.172 and said, "Well, this is exactly[br]like in French or Spanish," 0:28:27.172,0:28:28.558 and just massively copied it. 0:28:28.558,0:28:33.316 One point of evidence is that[br]you don't see that energy in Asturian 0:28:33.316,0:28:37.205 in things that actually[br]require translation, like property names, 0:28:37.205,0:28:39.648 or names of items[br]that are not proper names. 0:28:39.648,0:28:41.219 Asaf, you break my heart. 
0:28:41.219,0:28:43.198 (person 6) I know,[br]I like raining on parades, 0:28:43.198,0:28:48.458 but I have good news as well,[br]which is about the pronunciation numbers. 0:28:49.408,0:28:53.515 As you probably know,[br]Commons is full of pronunciation files, 0:28:53.515,0:28:54.668 and, for example, 0:28:54.668,0:29:01.102 Dutch has no less than 300,000[br]pronunciation files already on Commons 0:29:01.912,0:29:05.051 that just need to somehow be ingested. 0:29:05.051,0:29:07.697 So if anyone's looking for a side project, 0:29:07.697,0:29:08.997 there's tons and tons 0:29:08.997,0:29:13.280 of classified, categorized[br]pronunciation files on Commons 0:29:13.280,0:29:16.893 under the category[br]"Pronunciation" by language. 0:29:16.893,0:29:22.840 So that's just waiting to be matched[br]to lexemes and put on Lexeme. 0:29:23.180,0:29:25.484 And I was wondering[br]if you could say something 0:29:25.484,0:29:26.585 about the road map, 0:29:26.585,0:29:28.757 something about how much investment 0:29:28.757,0:29:31.995 or what can we expect[br]from Lexeme in the coming year, 0:29:31.995,0:29:34.020 because I, for one, can't wait. 0:29:34.949,0:29:37.044 You can't wait? (chuckles) 0:29:37.044,0:29:39.118 - (person 6) For more.[br]- Yes. (chuckles) 0:29:44.541,0:29:49.523 Right now, we're concentrating[br]more on Wikibase and data quality 0:29:51.493,0:29:55.087 to see how much traction this gets 0:29:55.087,0:30:01.676 and then getting more for feeding off[br]where the pain points are next, 0:30:01.676,0:30:06.003 and then going back to improving[br]lexicographical data further. 0:30:06.903,0:30:09.790 And one of the things[br]I'd love to hear from you 0:30:09.790,0:30:14.136 is where exactly do you see[br]the next steps, 0:30:14.136,0:30:15.966 where do you want to see improvements 0:30:15.966,0:30:20.340 so that we can then figure out[br]how to make that happen. 
0:30:21.125,0:30:22.810 But, of course, you're right, 0:30:22.810,0:30:25.712 there's still so much to do[br]also on the technical side. 0:30:30.573,0:30:35.848 (person 7) Okay, as we were uploading[br]the Basque words with forms, 0:30:35.848,0:30:37.768 and you'll see some[br]of these kinds of things, 0:30:37.768,0:30:41.329 we were both like, last week we said,[br]"Oh, we are the first one in something." 0:30:42.919,0:30:44.928 It appears in press, and it's like, 0:30:44.928,0:30:49.488 "Oh, Basque are the first time in some--[br]they are the first in something, okay." 0:30:49.488,0:30:50.606 (laughs) 0:30:50.606,0:30:53.318 And then people ask,[br]"Okay, but what is this for?" 0:30:54.678,0:30:56.849 We don't have a real good answer. 0:30:56.849,0:30:57.888 I mean it's like, okay, 0:30:57.888,0:31:01.841 this will help computers[br]to understand more our language, yes, 0:31:01.841,0:31:05.279 but what kind of tools[br]can we make in the future? 0:31:05.279,0:31:07.467 And we don't have a good answer for this. 0:31:07.467,0:31:10.625 So I don't know[br]if you have a good answer for this. 0:31:10.625,0:31:12.742 (chuckles) I don't know[br]if I have a good answer, 0:31:12.742,0:31:14.746 but I have an answer. 0:31:15.480,0:31:20.425 So I think right now[br]as I was telling [inaudible], 0:31:20.425,0:31:21.924 we haven't reached that critical mass 0:31:21.924,0:31:25.529 where you can build a lot[br]of the really interesting tools. 0:31:25.529,0:31:27.707 But there are already some tools. 0:31:28.267,0:31:31.912 Just the other day,[br]Esther [Pandelia], for example, 0:31:31.912,0:31:33.817 released a tool where you can see, 0:31:35.837,0:31:38.889 I think it was the words on a globe 0:31:38.889,0:31:41.901 where they're spoken,[br]where they're coming from. 0:31:42.631,0:31:44.090 I'm probably wrong about this, 0:31:44.090,0:31:46.346 but she had answered[br]on the Project chat on Wikidata-- 0:31:46.346,0:31:48.984 you can look it up there.
0:31:49.574,0:31:51.805 So we have seen these first tools, 0:31:51.805,0:31:55.696 just like we've seen[br]back when Wikidata started. 0:31:56.846,0:31:59.602 First some--like just a network, 0:31:59.602,0:32:03.424 and like, "Hey, look, there's this thing[br]that connects to this other thing." 0:32:04.824,0:32:07.059 And as we have more data, 0:32:07.059,0:32:10.352 and as we've reached some critical mass, 0:32:11.852,0:32:14.747 more powerful applications[br]become possible, 0:32:15.677,0:32:17.516 things like Histropedia, 0:32:19.126,0:32:21.988 things like question and answering 0:32:21.988,0:32:26.663 in your digital personal assistant,[br]Platypus, and so on. 0:32:26.663,0:32:29.668 And we're seeing[br]a similar thing with lexemes. 0:32:31.198,0:32:34.650 We're at the stage[br]where you can build like these little, 0:32:34.650,0:32:37.464 hey, look, there's a connection[br]between the two things, 0:32:37.864,0:32:42.738 and there's a translation[br]of this word into that language stage, 0:32:42.738,0:32:47.747 and as we build it out[br]and as we describe more words, 0:32:47.747,0:32:49.533 more becomes possible. 0:32:49.533,0:32:51.795 Now, what becomes possible? 0:32:53.482,0:32:59.483 As Ben, our keynote speaker earlier[br]was talking about translations, 0:33:00.103,0:33:03.455 being able to translate[br]from one language to another. 0:33:03.455,0:33:07.929 And Jens, my colleague,[br]he's always talking about 0:33:07.929,0:33:11.452 the European Union[br]looking for a translator 0:33:11.452,0:33:17.439 who can translate from[br]I think it was Maltese to Swedish-- 0:33:17.439,0:33:19.436 - (person 8) Estonian.[br]- Estonian. 0:33:22.016,0:33:26.211 And that is not a usual combination. 0:33:27.211,0:33:31.735 But once you have all these languages[br]in one machine-readable place, 0:33:31.735,0:33:33.143 you can do that, 0:33:33.143,0:33:36.857 you can get a dictionary 0:33:36.857,0:33:41.735 from Estonian to Maltese and back. 
0:33:42.935,0:33:45.607 So covering language[br]combinations in dictionaries 0:33:45.607,0:33:47.911 that just haven't been covered before 0:33:47.911,0:33:51.050 because there wasn't[br]enough demand for it, for example, 0:33:51.050,0:33:55.540 to make it financially viable[br]and to justify the work. 0:33:55.540,0:33:57.147 Now we can do that. 0:33:59.797,0:34:02.318 Then text generation. 0:34:02.318,0:34:03.653 Lucie was earlier talking 0:34:03.653,0:34:10.136 about how she's working[br]with Hattie on generating text 0:34:10.136,0:34:14.673 to get Wikipedia articles[br]in minority languages started, 0:34:15.423,0:34:19.512 and that needs data about words, 0:34:19.512,0:34:22.589 and you need to understand[br]the language to do that. 0:34:23.769,0:34:28.133 Yeah, and those are just some[br]that come to my mind right now. 0:34:28.693,0:34:30.494 Maybe our audience has more ideas 0:34:30.494,0:34:34.353 what they want to do[br]when we have all the glorious data. 0:34:37.693,0:34:40.892 (person 9) Okay, I will deviate[br]from the lexemes topic. 0:34:40.892,0:34:42.666 I will ask the question, 0:34:42.666,0:34:45.634 how can I as a member of community 0:34:45.634,0:34:50.135 influence that priority is put on task, 0:34:50.135,0:34:56.644 that a new user comes, and he can indicate[br]what languages he wants to see and edit 0:34:56.644,0:35:01.135 without some secret verbal[br]template knowledge. 0:35:02.145,0:35:05.053 Maybe there will be this year[br]this technical wish list 0:35:05.053,0:35:07.040 without Wikipedia topics. 0:35:07.040,0:35:10.119 Maybe there's a hope[br]we can all vote about 0:35:10.119,0:35:14.218 this thing we didn't fix for seven years. 0:35:14.218,0:35:17.607 So do you have any ideas[br]and comments about this? 0:35:18.217,0:35:20.328 So you're talking about the fact 0:35:20.328,0:35:23.518 that someone who is[br]not logged into Wikidata 0:35:23.518,0:35:25.971 can't change their language easily? 
0:35:25.971,0:35:27.839 (person 9) No, for [inaudible] users. 0:35:28.309,0:35:30.689 So, if they are logged in, 0:35:30.689,0:35:34.871 they can just change their language[br]at the top of the page, 0:35:35.891,0:35:38.099 and then it will appear 0:35:39.769,0:35:42.013 where the labels' description[br][inaudible] are, 0:35:42.013,0:35:43.483 and they can edit it. 0:35:45.657,0:35:49.009 (person 9) Well, actually, usually[br]many times the workflow 0:35:49.009,0:35:52.447 is that if you want to have[br]multiple languages, they are available, 0:35:52.447,0:35:55.419 and it's not always the case. 0:35:55.419,0:35:58.584 Okay, maybe we should sit down[br]after this talk and you show me. 0:36:01.562,0:36:04.089 Cool. More questions? 0:36:05.534,0:36:06.536 Yes. 0:36:11.595,0:36:13.196 (person 10) Thanks for the presentation. 0:36:14.106,0:36:15.127 Can you comment 0:36:15.127,0:36:19.307 on the state of the correlation[br]with the Wiktionary community. 0:36:19.307,0:36:22.296 As far as I've seen,[br]there were some discussions 0:36:22.296,0:36:26.051 about importing some elements of the work, 0:36:26.051,0:36:30.843 but there seems to be licensing issues[br]and some disagreements, et cetera. 0:36:30.843,0:36:31.848 Right. 0:36:31.848,0:36:36.330 So, Wiktionary communities[br]have spent a lot of time 0:36:37.320,0:36:39.473 building Wiktionary. 0:36:39.473,0:36:42.643 They have built 0:36:43.193,0:36:47.554 amazingly complicated[br]and complex templates 0:36:47.554,0:36:53.614 to build pretty tables[br]that automatically generate forms for you 0:36:53.614,0:36:56.392 and all kinds of really impressive, 0:36:56.392,0:37:00.683 and kind of crazy stuff,[br]if you think about it. 0:37:02.311,0:37:07.994 And, of course, they have invested[br]a lot of time and effort into that. 0:37:09.364,0:37:11.801 And understandably, 0:37:11.801,0:37:17.116 they don't just want that to be grabbed, 0:37:18.046,0:37:19.102 just like that. 
0:37:19.102,0:37:21.791 So there's some of that coming from there. 0:37:22.761,0:37:25.137 And that's fine, that's okay. 0:37:25.737,0:37:32.092 Now, the first Wiktionary communities[br]are talking about turning out 0:37:32.092,0:37:34.329 and importing some[br]of their data into Wikidata. 0:37:34.329,0:37:39.095 Russian, you have seen,[br]for example, is one of those cases. 0:37:40.375,0:37:42.355 And I expect more of that to happen. 0:37:43.635,0:37:46.800 But it will be a slow process, 0:37:46.800,0:37:49.383 just like adoption[br]of Wikidata's data on Wikipedia 0:37:49.383,0:37:51.909 has been a rather slow process. 0:37:52.849,0:37:56.183 On the other side[br]of making it actually easier 0:37:56.183,0:37:59.132 to use the data that is in lexemes, 0:37:59.132,0:38:02.209 on Wiktionary, so that[br]they can make use of that 0:38:02.209,0:38:05.531 and share data between[br]the language Wiktionaries 0:38:05.531,0:38:08.853 which is super hard[br]to impossible right now, 0:38:08.853,0:38:11.560 which is crazy,[br]just like it was on Wikipedia. 0:38:13.860,0:38:16.325 Wait for the birthday present. (chuckles) 0:38:20.038,0:38:21.182 Yes. 0:38:22.599,0:38:24.827 (person 11) When I was thinking[br]the other way around it, 0:38:24.827,0:38:28.168 I actually didn't want to say it[br]because I think this will be super silly, 0:38:28.168,0:38:32.003 but I think that Wiktionary[br]already has some content, 0:38:32.003,0:38:34.978 and I know that[br]we can't transfer it to Wikidata 0:38:34.978,0:38:37.048 because there's a difference in licenses. 0:38:37.048,0:38:39.631 But I was thinking maybe[br]we can do something about that.
0:38:40.321,0:38:45.913 Maybe, I don't know, we can obtain[br]the communities' permission 0:38:45.913,0:38:51.205 after like, I don't know,[br]having like a public voting 0:38:52.075,0:38:55.642 and for the community,[br]the active members of the community 0:38:55.642,0:39:02.523 to vote and say if they would like[br]or accept or to transfer the content 0:39:02.523,0:39:05.528 for which they may do[br]the Wikidata lexemes. 0:39:06.238,0:39:08.537 Because I just think it is such a waste. 0:39:09.568,0:39:14.443 So, that's definitely[br]a conversation those people 0:39:14.443,0:39:18.249 who are in Wiktionary communities[br]are very welcome to bring up there. 0:39:18.249,0:39:24.647 I think it would be a bit presumptuous[br]for us to go and force that. 0:39:25.917,0:39:31.142 But, yeah, I think it's definitely worth[br]having a conversation. 0:39:31.142,0:39:33.898 But I think it's also important[br]to understand 0:39:33.898,0:39:39.082 that there's a distinction between[br]what is actually legally allowed 0:39:39.082,0:39:43.147 and what we should be doing 0:39:43.147,0:39:45.426 and what those people want or do not want. 0:39:45.736,0:39:47.329 So even if it's legally allowed, 0:39:47.329,0:39:50.640 if some other Wiktionary communities[br]do not want that, 0:39:50.640,0:39:53.537 I would be careful, at least. 0:39:58.886,0:40:02.489 I think you need the mic[br]for the stream. 0:40:04.540,0:40:07.299 (person 12) So, obviously,[br]it's all very exciting, 0:40:07.979,0:40:12.319 and I immediately think[br]how can I take that to my students 0:40:12.319,0:40:15.558 and how can I incorporate it[br]with the courses, 0:40:15.558,0:40:18.531 the work that we're doing,[br]educational settings. 0:40:18.531,0:40:22.271 And I don't have, at the moment, 0:40:22.871,0:40:24.116 first of all, enough knowledge, 0:40:24.116,0:40:27.278 but I think the documentation[br]that we do have 0:40:27.808,0:40:30.082 could be maybe improved.
0:40:30.082,0:40:33.437 So that's a kind of request[br]to make cool videos 0:40:33.437,0:40:35.898 that explain how it works 0:40:35.898,0:40:39.948 because if we have it, we can then use it, 0:40:39.948,0:40:41.985 and we can have students on board, 0:40:41.985,0:40:47.072 and we can make people understand[br]how awesome it all is. 0:40:47.072,0:40:52.001 And yeah, just think about documentation[br]and think about education, please. 0:40:52.001,0:40:54.480 Because I think a lot could be done. 0:40:54.480,0:40:58.585 These are like many tasks[br]that could be done even with... 0:41:00.125,0:41:02.033 well, I wouldn't say primary schools, 0:41:02.033,0:41:05.495 but certainly, even younger students. 0:41:05.915,0:41:10.866 And so I would really like to see[br]that potential being tapped into, 0:41:10.866,0:41:15.272 and, as of now, I personally[br]don't understand enough 0:41:15.272,0:41:19.500 to be able to create tasks[br]or to create like... 0:41:20.430,0:41:22.155 to do something practical with it. 0:41:22.155,0:41:25.772 So any help, any thoughts[br]anyone here has about that, 0:41:25.772,0:41:29.648 I would be very happy to hear[br]your thoughts, and yours as well. 0:41:30.508,0:41:32.129 Yeah, let's talk about that. 0:41:35.473,0:41:37.139 More questions? 0:41:37.809,0:41:39.195 Someone else raised a hand. 0:41:39.195,0:41:40.495 I forgot where it was. 0:41:45.739,0:41:49.996 (person 13) So, if we can't import[br]from Wiktionary, 0:41:49.996,0:41:55.772 is there some concerted effort[br]to find other public domain sources, 0:41:55.772,0:41:57.459 maybe all the data, 0:41:58.769,0:42:03.167 and kind of prefilter it, organize it 0:42:03.167,0:42:08.470 so that it's easy to be checked[br]by people for import? 0:42:09.093,0:42:11.181 So there are first efforts. 0:42:11.181,0:42:14.769 My understanding is that Basque[br]is one of those efforts. 0:42:14.769,0:42:17.474 Maybe you want to say[br]a bit more about it? 
0:42:18.426,0:42:20.130 (person 14) [inaudible] 0:42:23.166,0:42:27.148 Okay, the actual answer[br]is paying for that... 0:42:28.374,0:42:33.381 I mean, we have an agreement[br]with a contractor we usually work with. 0:42:34.801,0:42:38.725 They do dictionaries-- 0:42:40.315,0:42:42.458 lots of stuff, but they do dictionaries. 0:42:42.458,0:42:47.473 So we agreed with them[br]to make free the students' dictionary, 0:42:47.473,0:42:52.782 we would [cast] the most common words[br]and start uploading it 0:42:52.782,0:42:55.590 with an external identifier[br]and the scheme of things. 0:42:56.420,0:43:02.902 But there was some discussion[br]about leaving it on CC0 0:43:03.212,0:43:05.322 because they have[br]the dictionary with CC BY, 0:43:06.537,0:43:10.326 and they understood[br]what the difference was. 0:43:10.326,0:43:13.866 So there was some discussion. 0:43:13.866,0:43:19.709 But I think that we can provide some tools[br]or some examples in the future, 0:43:19.709,0:43:21.761 and I think that there will be[br]other dictionaries 0:43:21.761,0:43:24.016 that we can handle, 0:43:24.016,0:43:29.274 and also I think Wiktionary[br]should start moving in that direction, 0:43:29.274,0:43:32.260 but that's another great discussion. 0:43:33.285,0:43:34.487 And on top of that, 0:43:34.487,0:43:38.839 Lea is also in contact[br]with people from Occitan 0:43:38.839,0:43:41.827 who work on Occitan dictionaries, 0:43:41.827,0:43:45.138 and they're currently working[br]on a Sumerian collaboration. 0:43:51.644,0:43:53.363 More questions? 0:44:01.487,0:44:05.349 (person 15) Hi! We are the people[br]who want to import Occitan data. 0:44:05.349,0:44:06.585 Aha! Perfect! 0:44:06.585,0:44:08.368 (person 15) And we have a small problem. 0:44:09.188,0:44:14.215 We don't know how to represent[br]the variety of all lexemes.
0:44:14.215,0:44:17.893 We have six dialects, 0:44:17.893,0:44:24.014 and we want to indicate for Lexeme[br]in which dialect it's used, 0:44:24.014,0:44:27.285 and we don't have a proper[br]C0 statement to do that. 0:44:27.285,0:44:31.105 So as long as the segment doesn't exist, 0:44:31.635,0:44:34.465 it prevents us from [inaudible] 0:44:34.465,0:44:37.603 because we will need to do it again 0:44:37.603,0:44:42.076 when we will be able[br]to [export] the statement. 0:44:42.076,0:44:44.551 And it's complicated[br]because it's a statement 0:44:44.551,0:44:47.802 which won't be asked by many people 0:44:47.802,0:44:53.444 because it's a statement[br]which concerns mostly minority languages. 0:44:53.444,0:44:56.933 So you will have one person to ask this. 0:44:56.933,0:45:00.022 But as our colleagues Basque, 0:45:00.022,0:45:06.082 it can be one person[br]who will power thousands of others, 0:45:06.082,0:45:10.884 so it might not be asking a lot, 0:45:10.884,0:45:14.136 but it will be very important for us. 0:45:14.874,0:45:17.600 Do you already have[br]a new property proposal up, 0:45:17.600,0:45:19.470 or do you need help creating it? 0:45:21.524,0:45:24.300 (person 15) We asked four months ago. 0:45:24.720,0:45:28.755 Alright, then let's get some people[br]to help out with this property proposal. 0:45:30.159,0:45:33.092 I'm sure there are enough people[br]in this room to make this happen. 0:45:33.360,0:45:35.452 (person 15) Property proposal[br][speaking in French]. 0:45:35.452,0:45:36.965 (person 16) We didn't have an answer. 0:45:36.965,0:45:39.769 (person 15) We didn't have any answer,[br]and we don't know how to do this 0:45:39.769,0:45:42.953 because we aren't[br]in the Wikidata community. 0:45:44.694,0:45:48.817 Yup, so there are people here[br]who can help you. 0:45:48.817,0:45:52.134 Maybe someone raises their hand to take-- 0:45:52.574,0:45:53.644 (person 14) I'm for that.
0:45:53.644,0:45:55.512 But I think this is quite interesting 0:45:55.512,0:45:59.059 that only the variant of form 0:45:59.059,0:46:02.607 also can handle it geographically, 0:46:02.607,0:46:04.995 with coordinates or some kind of mapping. 0:46:05.595,0:46:07.815 Also having different pronunciations, 0:46:07.815,0:46:11.837 and I think this is something[br]that happens in lots of languages. 0:46:12.607,0:46:16.262 We should start making[br]it happen [inaudible], 0:46:16.262,0:46:18.865 and I'm going to search for the property. 0:46:19.782,0:46:20.933 Cool. 0:46:20.933,0:46:24.446 So you will get backing[br]for your property proposal. 0:46:26.136,0:46:27.297 Thank you. 0:46:28.153,0:46:30.261 Alright, more questions? 0:46:32.410,0:46:33.474 Finn. 0:46:33.974,0:46:35.055 Finn is one of those people 0:46:35.055,0:46:38.031 who builds stuff[br]on top of lexicographical data. 0:46:38.031,0:46:40.085 (Finn) It's just a small question, 0:46:40.405,0:46:44.226 and that's about spelling variations. 0:46:44.896,0:46:48.002 It seems to be difficult to put them in... 0:46:48.532,0:46:53.368 You could, of course,[br]have multiple forms for the same word. 0:46:56.327,0:46:58.448 I don't know, it seems to be... 0:46:59.558,0:47:03.535 If you don't do it that way,[br]it seems to be difficult to specify... 0:47:04.771,0:47:05.888 or I don't know whether 0:47:05.888,0:47:09.731 this is just a minor technical issue[br]or whether... 0:47:09.731,0:47:11.252 Let's look at it together. 0:47:11.642,0:47:15.230 I would love to see an example. 0:47:17.478,0:47:18.478 Asaf. 0:47:26.886,0:47:28.396 (Asaf) Thank you. 0:47:29.386,0:47:33.685 I can give a very concrete example[br]from my mother tongue, Hebrew. 0:47:34.205,0:47:38.845 Hebrew has two main variants 0:47:38.845,0:47:42.786 for expressing almost every word 0:47:42.786,0:47:47.640 because the traditional spelling 0:47:47.640,0:47:50.044 leaves out many of the vowels. 
0:47:50.934,0:47:55.207 And, therefore, in modern editions[br]of the Bible and of poetry, 0:47:55.207,0:47:57.461 diacritics are used. 0:47:57.461,0:48:02.670 However, those diacritics[br]are never used for modern prose 0:48:02.670,0:48:05.974 or newspaper writing or street signs. 0:48:05.974,0:48:11.209 So the average daily casual use[br]puts in extra vowels 0:48:12.169,0:48:13.519 and doesn't use the diacritics 0:48:13.519,0:48:15.607 because they are,[br]of course, more cumbersome 0:48:15.607,0:48:17.893 and have all kinds of rules[br]and nobody knows the rules. 0:48:18.633,0:48:20.531 So there are basically two variants. 0:48:20.531,0:48:25.322 There's the everyday casual prose variant, 0:48:25.322,0:48:27.827 and there's the Bible or poetry, 0:48:27.827,0:48:32.200 which always come[br]in this traditional diacriticized text. 0:48:32.200,0:48:33.302 To be useful, 0:48:33.302,0:48:37.428 Lexeme would have to recognize[br]both varieties of every single word 0:48:37.428,0:48:39.747 and every single form[br]of every single word. 0:48:40.677,0:48:43.391 So that's a very comprehensive use case 0:48:43.391,0:48:46.340 for official stable variants. 0:48:46.340,0:48:48.942 It's not dialect, it's not regions, 0:48:49.332,0:48:53.627 it's basically two coexisting[br]morphological systems. 0:48:54.537,0:48:58.926 And I too don't know exactly[br]how to express that in Lexeme today, 0:48:58.926,0:49:02.800 which is one thing that is keeping me[br]in partial answer to Magnus' question 0:49:02.800,0:49:05.238 from uploading the parts that are ready 0:49:05.238,0:49:09.394 from the biggest Hebrew dictionary,[br]which is public domain 0:49:09.394,0:49:13.141 and which I have been digitizing[br]for several years now. 0:49:13.141,0:49:14.803 A good portion of it is ready, 0:49:14.803,0:49:16.549 but I'm not putting it on Lexeme right now 0:49:16.549,0:49:20.245 because I don't know exactly[br]how to solve this problem. 
0:49:20.245,0:49:23.387 Alright, let's solve[br]this problem here. (chuckles) 0:49:24.503,0:49:26.021 That has to be possible. 0:49:30.045,0:49:32.047 Alright, more questions? 0:49:37.173,0:49:39.735 If not, then thank you so much. 0:49:40.605,0:49:42.675 (applause)