WEBVTT 00:00:06.009 --> 00:00:09.069 (host) Hello, everyone. Thank you for coming to these lightning talks. 00:00:09.069 --> 00:00:11.529 Our first speaker, I'm going to run straight into it, 00:00:11.529 --> 00:00:13.781 is going to be Rosie Stephenson-Goodknight. 00:00:13.781 --> 00:00:15.319 Did I get that right? 00:00:15.319 --> 00:00:19.609 Yes. And so she's going to be talking about the Women Writers Project. 00:00:19.609 --> 00:00:22.569 And we're going to-- yeah, is that right? Great. 00:00:22.569 --> 00:00:24.299 And so, we're going to just launch right in, 00:00:24.299 --> 00:00:26.699 and I want to remind you, if there's time for questions, 00:00:26.699 --> 00:00:28.802 to please not speak until you have the microphone. 00:00:28.802 --> 00:00:30.329 Thank you. 00:00:31.589 --> 00:00:34.125 (Rosie) Hi, everyone, and thanks for coming to this session, 00:00:34.125 --> 00:00:36.829 where we're going to talk about Women Writers in Review, 00:00:36.829 --> 00:00:40.329 cultures of reception associated with trans-Atlantic, 00:00:40.329 --> 00:00:43.977 English language women writers, broadly construed. 00:00:44.523 --> 00:00:48.387 Women Writers in Review is an initiative of the Women Writers Project 00:00:48.387 --> 00:00:50.535 of Northeastern University. 00:00:50.535 --> 00:00:55.253 It moved there from Brown University, approximately 15 years ago. 00:00:55.993 --> 00:01:00.287 Women Writers in Review is a collection of 18th- and 19th-century reviews, 00:01:00.287 --> 00:01:04.281 publication notices, literary histories, and other texts 00:01:04.281 --> 00:01:09.511 corresponding to trans-Atlantic-- so, UK and US mostly, 00:01:09.511 --> 00:01:12.953 though a few Canadian-- written works by women. 00:01:13.255 --> 00:01:15.683 It's a project where the two universities, 00:01:15.683 --> 00:01:18.133 Brown University and Northeastern University, 00:01:18.133 --> 00:01:22.645 started collecting the manuscripts of women from this period. 
00:01:23.337 --> 00:01:27.520 And then they started collecting the reviews of these works, 00:01:27.520 --> 00:01:31.593 and then they started scoring these reviews by giving them a rating. 00:01:32.321 --> 00:01:36.144 It's designed to investigate the discourse of reception and connection 00:01:36.144 --> 00:01:39.333 with the changing trans-Atlantic literary landscape 00:01:39.333 --> 00:01:42.664 for the period 1770 to 1830. 00:01:46.143 --> 00:01:49.103 You're going to pardon me if I speak fast, because I've got five minutes 00:01:49.103 --> 00:01:50.646 to go over this. 00:01:50.646 --> 00:01:55.443 It includes 690 English language texts responding to works 00:01:55.443 --> 00:01:59.565 written or translated by 18th- and 19th-century women writers. 00:01:59.593 --> 00:02:04.813 There are 74 authors in the corpus, using 112 different sources, 00:02:04.813 --> 00:02:07.782 or periodicals, or magazines. 00:02:07.782 --> 00:02:10.773 And there are 628 critical reviews. 00:02:11.867 --> 00:02:14.671 Here's a picture that shows you what we're talking about 00:02:14.671 --> 00:02:16.573 in terms of a review. 00:02:16.573 --> 00:02:18.819 And you can also see what kind of scores 00:02:18.819 --> 00:02:25.403 were given by the academics at Northeastern University. 00:02:25.833 --> 00:02:28.922 Most of these are women who were giving scores 00:02:28.922 --> 00:02:34.031 based on the reviews that were done mostly, probably all men, 00:02:34.031 --> 00:02:39.799 back in this time period 1770 to 1830 of works written by women. 00:02:39.799 --> 00:02:43.469 By works, we're talking about plays, and novels, and poems, 00:02:43.469 --> 00:02:46.955 essays, and other kinds of articles. 00:02:48.615 --> 00:02:50.275 So, what are we talking about? 00:02:50.275 --> 00:02:54.676 This required creating items for authors for their works, 00:02:54.676 --> 00:02:57.946 like I said, novels and plays and poems. 
00:02:57.946 --> 00:03:04.938 It required creating new items for this period of time 00:03:05.038 --> 00:03:08.391 where there are defunct periodicals. 00:03:08.391 --> 00:03:12.499 It required creating items for the scholarly articles. 00:03:12.578 --> 00:03:16.900 And then the review scores of each, and the review score by, 00:03:16.943 --> 00:03:19.998 which in this case would be Women Writers in Review, 00:03:19.998 --> 00:03:23.336 and what we still need to add is the described by source. 00:03:25.226 --> 00:03:28.970 This gives you a picture of the kind of spreadsheets, 00:03:28.970 --> 00:03:31.397 Google Spreadsheets, that I have been working on. 00:03:31.397 --> 00:03:34.296 I shouldn't just say I, because I've had a lot of help. 00:03:34.296 --> 00:03:37.546 I've had a lot of people who were working on this project with me. 00:03:37.546 --> 00:03:40.413 And you can see at the top, something about the authors, 00:03:40.413 --> 00:03:41.736 about the works. 00:03:41.736 --> 00:03:45.496 The third group is going to be the periodical, 00:03:45.496 --> 00:03:48.006 and then, how the scores started showing. 00:03:49.203 --> 00:03:52.122 And of course, this is how they look-- 00:03:52.122 --> 00:03:57.396 the beauty of being able to present the preliminary findings. 00:03:57.856 --> 00:04:01.767 Once we have uploaded all of the data, 00:04:02.989 --> 00:04:05.906 and I hope that that's going to be done by the end of this year, 00:04:06.956 --> 00:04:08.496 this will obviously look different. 00:04:09.916 --> 00:04:10.931 Appendix. 00:04:10.931 --> 00:04:15.267 So, here's what the depiction looks like 00:04:15.267 --> 00:04:18.505 at the Northeastern University website. 00:04:19.024 --> 00:04:22.474 I don't think it's quite as clear as what we can do with Wikidata. 
00:04:22.531 --> 00:04:27.351 And so, this was probably the reason why, when I started as a visiting scholar 00:04:27.351 --> 00:04:31.751 in 2017, they asked if this is one of the projects that I could work on. 00:04:31.751 --> 00:04:36.093 They stopped their work the year before, in 2016. 00:04:36.093 --> 00:04:39.073 And I think they just don't have the resources to continue. 00:04:40.251 --> 00:04:43.415 Some parts of this presentation came from another 00:04:43.415 --> 00:04:45.812 that was published in 2016. 00:04:45.812 --> 00:04:49.401 And last but not least, here are links 00:04:49.401 --> 00:04:53.361 to the different parts of the work that I'm doing. 00:04:54.257 --> 00:04:55.561 Thank you very much. 00:04:55.561 --> 00:04:56.845 Questions. 00:04:56.845 --> 00:04:58.754 (applause) 00:05:10.397 --> 00:05:14.665 (woman) So, when you have a work, and you have the review of the work, 00:05:14.665 --> 00:05:17.703 are you looking at a particular edition of the work, 00:05:17.703 --> 00:05:20.665 or are these all reviews of first editions? 00:05:21.271 --> 00:05:22.861 It's a good question. No. 00:05:22.861 --> 00:05:25.601 They are not just reviews of the first edition. 00:05:25.601 --> 00:05:28.601 Some are reviews of the second or third edition. 00:05:30.062 --> 00:05:32.262 I'm going to add something that maybe I should have said 00:05:32.262 --> 00:05:34.951 before I closed and went to question and answers-- 00:05:34.966 --> 00:05:36.800 what's so special about this? 00:05:37.220 --> 00:05:40.461 What's special is nobody else has done this on Wikidata. 00:05:41.454 --> 00:05:45.580 Surely, there are other universities that have their own collections, 00:05:45.580 --> 00:05:51.447 where their scholars have reviewed the reviews of someone's work 00:05:51.800 --> 00:05:53.394 in some language. 
00:05:54.491 --> 00:05:57.389 So, hopefully, once this methodology gets-- 00:05:58.000 --> 00:06:02.390 once I write this up and the project is over and presented again, 00:06:02.390 --> 00:06:05.310 that there will be other universities, other libraries 00:06:05.310 --> 00:06:07.923 that will speak up and say, "We've got data sets, too, 00:06:08.248 --> 00:06:13.020 and we're going to go ahead and upload them into Wikidata ourselves," 00:06:13.020 --> 00:06:15.910 and then it'd be lovely to start doing some comparisons. 00:06:19.572 --> 00:06:22.060 Anyone? Jane. 00:06:22.093 --> 00:06:23.767 (Jane) Do you actually have books? 00:06:24.293 --> 00:06:26.889 Do you actually have the books-- are the books in existence, 00:06:26.889 --> 00:06:28.860 or are you actually doing metadata about books 00:06:28.860 --> 00:06:31.400 where we don't even know where the books are? 00:06:31.780 --> 00:06:34.829 Northeastern University actually has the book, 00:06:34.829 --> 00:06:37.209 or the essay, or the poem. 00:06:39.759 --> 00:06:45.392 And they have the critical review of the book, or the essay, or the poem. 00:06:45.755 --> 00:06:48.820 And they're working on the transcription of these, 00:06:48.820 --> 00:06:51.452 and they're not at 100% yet. 00:06:52.432 --> 00:06:56.256 They're not at 100%, but it's like, all things working on it. 00:07:00.218 --> 00:07:02.043 Any other questions? 00:07:05.697 --> 00:07:07.399 (host) We're going to wrap it up there. 00:07:07.399 --> 00:07:09.063 Thanks for being such a nice audience. 00:07:09.063 --> 00:07:11.677 (applause) 00:07:14.012 --> 00:07:18.581 Lady bug for [inaudible]. 00:08:58.271 --> 00:08:59.372 (man) Finally got that. 00:08:59.372 --> 00:09:02.565 What I'm going to do is I'm just going to click on these to load. 00:09:02.565 --> 00:09:06.091 Just while-- is that new tab there? 00:09:06.946 --> 00:09:08.053 [inaudible] 00:09:08.053 --> 00:09:10.524 The first one? Yeah, perfect. 
00:09:11.024 --> 00:09:13.503 Sorry, my German is not even rusty, 00:09:13.503 --> 00:09:15.251 it's simply non-existent. 00:09:15.663 --> 00:09:19.561 So, I'll just let them load, because then these queries can run 00:09:19.561 --> 00:09:22.728 while I'm sort of introducing what I was talking about and doing. 00:09:22.728 --> 00:09:24.795 So, hi, I'm Nav from Histropedia. 00:09:24.795 --> 00:09:28.169 And basically, for the last quite a few years, 00:09:28.169 --> 00:09:29.710 we've been relatively quiet, 00:09:29.710 --> 00:09:32.423 while we've been sort of working on technology and tools 00:09:32.423 --> 00:09:36.837 that we need to sort of develop, ultimately, Histropedia version 2, 00:09:36.837 --> 00:09:39.433 which is going to be, you know, this huge enhancement 00:09:39.433 --> 00:09:40.771 on the first version. 00:09:40.771 --> 00:09:43.270 Well, it's kind of in progress, but as we do it, 00:09:43.270 --> 00:09:45.236 we've been experimenting with these other tools, 00:09:45.236 --> 00:09:47.387 and building the technology that we're going to need. 00:09:48.132 --> 00:09:51.781 One really crucial part for this is the ability to sort of see 00:09:51.781 --> 00:09:55.085 the whole of history from the billions of years time scale, 00:09:55.085 --> 00:09:58.602 to up to the current day, 00:09:58.602 --> 00:10:00.638 and zooming all the way into single days. 00:10:00.638 --> 00:10:03.433 And ultimately, in the end, down to hours and minutes. 00:10:03.433 --> 00:10:06.517 We've managed to create a [inaudible] of update to our engine. 00:10:06.517 --> 00:10:08.327 Other engines can already do this, 00:10:08.327 --> 00:10:11.122 but unfortunately, they also can't handle the large data sets. 00:10:11.122 --> 00:10:13.269 So, we finally got this update to our engine. 00:10:13.269 --> 00:10:15.392 It allows us to zoom to billions of years. 
00:10:15.392 --> 00:10:19.533 So, recently-- the recently finished update, 00:10:19.533 --> 00:10:22.333 and it's basically, it's an update to our query viewer tool, 00:10:22.333 --> 00:10:24.482 which is like a live version of Histropedia 00:10:24.482 --> 00:10:26.832 just linked straight to Wikidata. 00:10:26.832 --> 00:10:29.092 So, it's literally based on a query, 00:10:29.092 --> 00:10:31.372 a live query, and we see the results of it. 00:10:31.372 --> 00:10:33.883 So, it's sort of separate to our main tool. 00:10:33.883 --> 00:10:37.502 So, I'm going to flick to the first one, which is my first experiment. 00:10:37.502 --> 00:10:39.716 And you'll forgive me, the queries-- 00:10:39.716 --> 00:10:42.181 the code was kind of finished not so long ago, 00:10:42.181 --> 00:10:44.736 and the queries, I've been trying to find out what can I find 00:10:44.736 --> 00:10:47.692 and what's interesting to look at, what's missing. 00:10:47.692 --> 00:10:52.154 So, I started off with a kind of, sort of, well-- 00:10:52.154 --> 00:10:54.241 So, that's not the right-- that's not Life on Earth. 00:10:54.241 --> 00:10:55.699 Is this Life on Earth? 00:10:56.123 --> 00:10:57.467 That will do, anyway. 00:10:57.467 --> 00:11:01.985 So, I started off just trying to look at what sort of things 00:11:01.985 --> 00:11:04.657 are actually in Wikidata. 00:11:04.657 --> 00:11:07.407 And this particular one-- sorry, it's in reverse. 00:11:07.407 --> 00:11:09.829 So, this is the first one I wanted to show you. 00:11:09.829 --> 00:11:12.485 So, this is a kind of a life on Earth query 00:11:12.485 --> 00:11:14.457 that I wanted to develop. 00:11:14.457 --> 00:11:18.410 And basically, what it is is all the taxons in Wikidata 00:11:18.410 --> 00:11:20.157 that have a date. 00:11:20.157 --> 00:11:23.726 And as you can probably see from the panel, there is not many of them. 00:11:23.726 --> 00:11:25.784 But we do have the different taxon ranks. 
00:11:25.784 --> 00:11:27.596 So, you know, is it a species, a class-- 00:11:27.596 --> 00:11:29.725 for a biologist, this makes a lot of sense. 00:11:29.725 --> 00:11:32.446 But if I was just to close that a bit, 00:11:32.596 --> 00:11:35.453 we can see, we are going back to the earliest forms of life here. 00:11:35.453 --> 00:11:37.236 3.5 billion years ago. 00:11:37.236 --> 00:11:42.707 And as we zoom in here, we start to see the more modern forms of life, 00:11:42.746 --> 00:11:47.232 and we see some really interesting things developing, 00:11:47.232 --> 00:11:50.829 but we're still lacking a lot of data in terms of this kind of time range. 00:11:52.250 --> 00:11:55.286 So, my next thought was, "Okay, well, why aren't--" 00:11:55.592 --> 00:11:57.088 "I want to see a Tyrannosaurus Rex." 00:11:57.088 --> 00:11:59.838 That's what I really wanted to see on my query, and it wasn't there. 00:11:59.838 --> 00:12:02.138 So, had a little dig in, and I found out why. 00:12:02.234 --> 00:12:05.284 It's because they're much more being stored 00:12:05.284 --> 00:12:08.696 in terms of the temporal range or time period that they relate to. 00:12:09.065 --> 00:12:11.412 So, on comes the next query, 00:12:11.412 --> 00:12:13.144 where I actually sort of-- 00:12:13.664 --> 00:12:17.641 basically, this query is looking for any item 00:12:17.641 --> 00:12:22.284 that has a temporal range start, and/or a temporal range end. 00:12:22.665 --> 00:12:25.965 Which is basically in the form-- in life forms, it kind of relates 00:12:25.965 --> 00:12:28.644 to when they emerged and when they became extinct. 00:12:28.644 --> 00:12:31.044 So, these are the periods on the side here. 00:12:31.585 --> 00:12:33.190 If I just close that a bit-- 00:12:33.190 --> 00:12:37.364 you can see that we have quite a lot of interesting stuff. 00:12:37.364 --> 00:12:39.834 And there's the Tyrannosaurus that I was looking for. 00:12:39.834 --> 00:12:43.394 So, I finally got that, and I was like, "Yes! 
I've done it!" 00:12:43.394 --> 00:12:46.084 I've got that Triceratops in there for bonus. 00:12:46.084 --> 00:12:48.984 But of course, still loads missing. 00:12:48.984 --> 00:12:50.665 And I'd love to see lots more here. 00:12:50.665 --> 00:12:52.590 But at least, it gives you the idea. 00:12:52.590 --> 00:12:55.794 The nice thing is, here as well, if I star some of these, 00:12:55.794 --> 00:12:58.374 you can see that the time range is shown. 00:12:58.374 --> 00:13:01.027 So, you can start to do what I really wanted to do, is say, 00:13:01.027 --> 00:13:04.004 "Okay, when did this one end, and when did the next one begin? 00:13:04.004 --> 00:13:06.085 When did things start going extinct?" 00:13:06.085 --> 00:13:09.832 So, I was pretty excited, but, still, really hoping for a lot more. 00:13:09.832 --> 00:13:11.619 So, there's a lot of editing to be done 00:13:11.619 --> 00:13:15.098 in terms of these large geological and cosmic time scales. 00:13:15.909 --> 00:13:19.273 You can see on the color code, I can also do extinction period. 00:13:19.273 --> 00:13:23.489 So, I say, I want to find out stuff that went extinct in the late Cretaceous. 00:13:23.489 --> 00:13:25.768 And I now know that two things did that. 00:13:25.768 --> 00:13:27.717 There's obviously quite a few more. 00:13:27.717 --> 00:13:30.483 And I put the taxon rank in there, as well, 00:13:30.483 --> 00:13:31.986 just so that we can also see, 00:13:31.986 --> 00:13:34.588 "Okay, which, what is its species, genus, et cetera." 00:13:35.479 --> 00:13:37.143 So, pretty exciting. 00:13:37.143 --> 00:13:41.192 I was quite happy, but it's unfolding, what needs to be done a lot. 00:13:42.126 --> 00:13:45.447 So I went to the next one, which was-- 00:13:45.447 --> 00:13:48.045 I was thinking, "Well, I can't find all the data I'm looking for. 
00:13:48.045 --> 00:13:49.347 Let's go a bit more general, 00:13:49.347 --> 00:13:53.833 and just look for all of a certain kind of dates in Wikidata that I can find 00:13:53.833 --> 00:13:57.240 that are over 10,000 years old, basically. 00:13:58.219 --> 00:14:00.703 And what type of thing are they?" 00:14:00.762 --> 00:14:04.298 So, this color code is relatively okay, but it might be a bit misleading, 00:14:04.298 --> 00:14:06.264 because some things are multiple types. 00:14:06.264 --> 00:14:08.318 So, therefore, it's a bit random, at times. 00:14:08.318 --> 00:14:11.468 But, you get some really fascinating stuff in here. 00:14:11.468 --> 00:14:14.255 I've got for a start-- I've got all of the millennia 00:14:14.255 --> 00:14:18.238 that we have in Wikidata, which is, you know, there you go. 00:14:18.238 --> 00:14:21.558 Read about everything that happened in all these different millennia. 00:14:21.558 --> 00:14:23.629 No pictures for any of these, unfortunately. 00:14:23.629 --> 00:14:26.670 So, there's nothing to really say what happened in them. 00:14:26.670 --> 00:14:29.203 Taxon, which we were just looking at, which kind of led me on 00:14:29.203 --> 00:14:31.124 to the other queries. 00:14:31.124 --> 00:14:34.079 And of course, that sort of like all of them in one group. 00:14:34.079 --> 00:14:36.875 Interesting stuff. Archaeological cultures. 00:14:36.875 --> 00:14:40.121 And this is like, okay, this is more like up my street. 00:14:40.121 --> 00:14:42.670 This is the sort of things I want to learn about. 00:14:42.670 --> 00:14:45.234 Again, pictures would be nice. 00:14:45.493 --> 00:14:48.781 But it's really showing you something interesting. 00:14:48.781 --> 00:14:50.361 And it's just worth exploring here. 00:14:50.361 --> 00:14:52.534 And of course, there's some that really make me excited 00:14:52.534 --> 00:14:54.048 for what we could be doing. 
00:14:54.048 --> 00:14:57.288 For example, there was something here which was-- 00:14:58.028 --> 00:15:00.888 I mean, system, actually, was quite an interesting one. 00:15:01.794 --> 00:15:04.237 And sorry, that's not actually the one I was thinking about. 00:15:04.237 --> 00:15:05.958 In fact, that means nothing to me at all. 00:15:05.958 --> 00:15:07.613 Someone might know what that means. 00:15:08.057 --> 00:15:10.813 Art movements, archaeological sites, activities. 00:15:10.813 --> 00:15:12.478 There was only two of these, 00:15:12.478 --> 00:15:15.788 but I really like the idea, because-- and they're both the same. 00:15:15.788 --> 00:15:17.658 They're both hunting. 00:15:17.730 --> 00:15:19.390 And of course, there's two of them. 00:15:19.390 --> 00:15:22.360 And the reason is, is because there's a little qualifier on there. 00:15:22.360 --> 00:15:25.143 If we were to just look through, we can see-- 00:15:25.143 --> 00:15:27.735 we can see somewhere down here, will be the start time. 00:15:27.735 --> 00:15:30.690 And the qualifier is talking about when Homo erectus did it, 00:15:30.690 --> 00:15:32.735 and when Homo sapiens did it. 00:15:32.735 --> 00:15:35.513 So that should be in brackets on the query, 00:15:35.513 --> 00:15:39.002 a little extension to do to show you what the two different versions mean. 00:15:39.002 --> 00:15:42.390 But I would love to see all of human skills in here. 00:15:42.390 --> 00:15:44.708 When did we first do farming, when did we first this-- 00:15:44.708 --> 00:15:46.010 when did fire come about? 00:15:46.010 --> 00:15:48.270 All of these things, when did we first extract iron? 00:15:48.270 --> 00:15:50.355 When did we first-- all of these wonderful things 00:15:50.355 --> 00:15:53.607 that developed to modern world that we live in. 00:15:53.607 --> 00:15:56.873 So, really exciting signs of what could be there, 00:15:56.873 --> 00:15:58.112 if it all got populated. 
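The temporal-range query described earlier in this talk (any item with a temporal range start and/or a temporal range end, plus its taxon rank) could be sketched roughly as follows. This is not the speaker's actual query. It assumes the Wikidata properties P523 (temporal range start), P524 (temporal range end) and P105 (taxon rank) and the public Wikidata Query Service endpoint; the variable names, the LIMIT and the `request_url` helper are made up for illustration.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (assumed target for this sketch).
ENDPOINT = "https://query.wikidata.org/sparql"

# Items that have a temporal range start (P523) and/or end (P524).
# The values of these properties are period items (for example a
# geological epoch), so their labels are selected rather than dates.
TEMPORAL_RANGE_QUERY = """
SELECT DISTINCT ?item ?itemLabel ?startLabel ?endLabel ?rankLabel WHERE {
  { ?item wdt:P523 [] . } UNION { ?item wdt:P524 [] . }
  OPTIONAL { ?item wdt:P523 ?start . }
  OPTIONAL { ?item wdt:P524 ?end . }
  OPTIONAL { ?item wdt:P105 ?rank . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 500
"""


def request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Build a GET URL for the query without sending any request."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


if __name__ == "__main__":
    print(request_url(TEMPORAL_RANGE_QUERY))
```

The same query text could presumably be pasted into a timeline tool's query page instead of being fetched directly; how each variable is mapped to a timeline field is tool-specific and not shown here.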
00:15:58.112 --> 00:16:00.210 So, you know, this is what we really need to work on, 00:16:00.210 --> 00:16:02.333 is some of this historical info. 00:16:03.243 --> 00:16:05.060 Last one, I just wanted to just show you, 00:16:05.060 --> 00:16:07.283 which was just an extra bonus one I threw in, 00:16:07.283 --> 00:16:10.875 just to look at the time periods that we actually have, 00:16:10.875 --> 00:16:13.921 the historical ages that we have in Wikidata. 00:16:13.921 --> 00:16:17.524 And so, this is actually just all sub-classes of unit of time. 00:16:17.524 --> 00:16:22.396 And then, this is the actual instance that it was. 00:16:22.396 --> 00:16:23.775 And it's just really interesting. 00:16:23.775 --> 00:16:25.849 This is more the kind of thing-- 00:16:26.979 --> 00:16:29.541 In Histropedia Mark II, these are the kind of things 00:16:29.541 --> 00:16:31.944 that will actually will be displayed more under the timeline 00:16:31.944 --> 00:16:33.984 as a sort of a range or period. 00:16:33.993 --> 00:16:36.436 And so, we are particularly interested in these periods 00:16:36.436 --> 00:16:37.976 being really tight and nice, 00:16:37.976 --> 00:16:40.718 because it helps you to, then, say what happened when, 00:16:40.718 --> 00:16:43.983 and you can sound really clever when you talk about when things happened, 00:16:43.983 --> 00:16:47.263 in the Neolithic or the upper Paleolithic, or whatever. 00:16:47.263 --> 00:16:49.121 I'm still pretty clueless on most of it, 00:16:49.121 --> 00:16:51.918 because I'm just kind of just waiting for the data to be up to scratch. 00:16:51.918 --> 00:16:55.163 Great. I think I can actually round it up there. 00:16:55.163 --> 00:16:57.145 Loads more exciting queries to come. 00:16:57.145 --> 00:17:00.420 A lot more features and cool stuff, actually, just around the corner for us, 00:17:00.420 --> 00:17:02.758 because we've just finished a lot of cool things. 
00:17:02.758 --> 00:17:05.471 But there's a little bit of time to pull it all together. 00:17:05.471 --> 00:17:07.373 So, look out for more. 00:17:07.373 --> 00:17:09.760 If there's any questions, I think I've got one minute. 00:17:09.760 --> 00:17:11.458 So, it would have to be one. 00:17:11.510 --> 00:17:13.253 (host) Yes, Nav. I forgot to introduce you. 00:17:13.253 --> 00:17:16.933 I'm sorry. That's Nav, as he said, Histropedia, Evans. Thank you very much. 00:17:16.933 --> 00:17:17.986 Thank you. Cheers. Yeah. 00:17:17.986 --> 00:17:19.450 (host) Very fast questions. 00:17:19.450 --> 00:17:21.815 Anyone with a very fast question [inaudible]. 00:17:24.654 --> 00:17:29.230 (woman 2) Very quickly, how can I do my own, if I want languages, 00:17:29.230 --> 00:17:30.818 when do we start, for instance. 00:17:30.818 --> 00:17:32.031 Absolutely. Good question. 00:17:32.031 --> 00:17:34.320 So just click on the-- oh, I've shared this. 00:17:34.320 --> 00:17:36.853 It's called cosmic timelines on the URL. 00:17:36.853 --> 00:17:40.911 Should be cosmic and geological, but then it's not a short URL anymore. 00:17:40.911 --> 00:17:43.711 So, you click on this icon in the top corner there, 00:17:43.711 --> 00:17:47.431 and then, you get to the query page, which is like the home page of this tool. 00:17:47.431 --> 00:17:49.311 This is where the query is pasted in. 00:17:49.311 --> 00:17:51.491 So, at the moment, I've got the language there. 00:17:51.491 --> 00:17:53.483 If I want to change it to something else, 00:17:53.483 --> 00:17:56.062 Arabic, or French, or whatever-- 00:17:56.062 --> 00:17:58.271 and here are the-- this is the area 00:17:58.271 --> 00:18:03.092 where you sort of enter in exactly which variables in your query 00:18:03.092 --> 00:18:04.600 you would like to do each thing. 00:18:04.600 --> 00:18:06.781 If you put nothing in, it will try and figure it out. 
00:18:06.781 --> 00:18:09.971 But if you want advanced stuff-- and really important, is the precision, 00:18:09.971 --> 00:18:13.033 because that's not available on the query service timeline. 00:18:13.033 --> 00:18:14.123 So, you get everything-- 00:18:14.123 --> 00:18:16.303 is the first of January 10 billion years ago, 00:18:16.303 --> 00:18:18.363 you know, which is not what we want to see. 00:18:18.363 --> 00:18:20.603 And the rank, which is quite interesting. 00:18:20.603 --> 00:18:24.173 My timelines are all based on a very simple rank of site link count, 00:18:24.173 --> 00:18:27.058 how many different articles there are, or something else. 00:18:27.058 --> 00:18:29.432 But that's how you go and mess around with it with yourself, 00:18:29.432 --> 00:18:32.034 and you put your color codes and your filters in down here. 00:18:32.034 --> 00:18:34.098 Comma separate them, if you would like more, 00:18:34.098 --> 00:18:36.007 and they come up as options in the final tool. 00:18:36.007 --> 00:18:37.836 And I think that pretty much is it, isn't it. 00:18:37.836 --> 00:18:39.863 So, any other questions, do find me afterwards. 00:18:39.863 --> 00:18:41.655 Always happy to get cornered for this stuff. 00:18:41.655 --> 00:18:42.954 I love talking about it. 00:18:42.954 --> 00:18:44.989 Okay. So, thank you very much. Cheers. 00:18:44.989 --> 00:18:46.948 (applause) 00:19:28.344 --> 00:19:30.220 (mumbles) 00:19:30.265 --> 00:19:32.115 So, where is the first one? 00:19:33.854 --> 00:19:35.397 This one, no. 00:19:45.636 --> 00:19:47.132 This? Sorry. 00:19:48.270 --> 00:19:50.090 Is it full screen? 00:19:50.217 --> 00:19:52.129 Yep. Full screen. 00:19:54.747 --> 00:19:56.289 Well, good work. 00:19:58.388 --> 00:19:59.434 [Strike.] 00:19:59.497 --> 00:20:02.312 Yeah, so, okay. Thank you. 00:20:04.752 --> 00:20:07.062 So, hi, I'm Thibaud Senalada. 00:20:07.062 --> 00:20:08.952 As [inaudible] introduced me. 
00:20:09.552 --> 00:20:14.212 I'm a software engineer at the French National Library. 00:20:14.992 --> 00:20:18.349 And I'm here today to talk to you about NOEMI, 00:20:18.979 --> 00:20:23.682 which is a software, a proof of concept, 00:20:23.682 --> 00:20:26.501 and a [inaudible] software 00:20:26.635 --> 00:20:29.961 to the French Library to cataloging. 00:20:30.787 --> 00:20:32.870 Sorry. [inaudible]. 00:20:32.870 --> 00:20:35.359 Sorry for my English. It's a bit of fuzzy. 00:20:36.971 --> 00:20:39.321 And so, what's NOEMI? 00:20:39.321 --> 00:20:41.589 So, NOEMI stands for: 00:20:41.589 --> 00:20:44.591 Nouer les oeuvres, expressions, Manifestations et Items. 00:20:44.591 --> 00:20:46.533 Which, in English, is: 00:20:46.533 --> 00:20:49.891 to link work, expression, manifestation, and items. 00:20:51.086 --> 00:20:58.057 It's based on the FRBR, 00:20:58.057 --> 00:21:00.633 and [inaudible]. 00:21:00.881 --> 00:21:03.105 Yeah. Anyway. 00:21:03.631 --> 00:21:04.839 So, yeah. 00:21:05.244 --> 00:21:09.540 So, this software, we use to produce metadata. 00:21:10.841 --> 00:21:12.201 It will be used 00:21:12.201 --> 00:21:17.831 by 600 people on a daily basis. 00:21:18.911 --> 00:21:24.271 And as I say in the title, it will be based on Wikibase. 00:21:25.415 --> 00:21:31.871 So, there is also a format manager. 00:21:32.388 --> 00:21:39.138 So, people using this software will use like a code editor, 00:21:39.254 --> 00:21:41.817 but for MARC format. 00:21:41.968 --> 00:21:45.178 So, it's [inaudible], things like that. 00:21:46.814 --> 00:21:49.868 A data processing tool, like I said. 00:21:49.959 --> 00:21:53.040 And also, authorization management, 00:21:54.327 --> 00:21:56.378 because they will need a-- 00:21:57.337 --> 00:22:01.417 if there is some data, where it can be modified. 00:22:05.877 --> 00:22:07.840 So, the PoC context. 00:22:08.728 --> 00:22:12.738 So, this software will be replacing an old software, 00:22:12.855 --> 00:22:15.688 called ADCAT02. 
00:22:17.111 --> 00:22:20.964 It is part of the bibliographic transition. 00:22:20.984 --> 00:22:24.554 So, I say the [inaudible]. 00:22:25.359 --> 00:22:29.390 [inaudible]. [inaudible] in English? 00:22:30.254 --> 00:22:31.662 Format. 00:22:32.717 --> 00:22:35.734 And it will be the [inaudible] of the-- 00:22:39.979 --> 00:22:41.090 Sorry. 00:22:42.349 --> 00:22:46.560 It will be [inaudible] all the [inaudible] 00:22:46.560 --> 00:22:49.689 of the BnF with data. 00:22:51.731 --> 00:22:54.124 And so, doing this work, 00:22:54.124 --> 00:22:59.693 we assessed Wikibase to see if it fits our needs. 00:23:01.244 --> 00:23:03.383 And [inaudible] pretty good. 00:23:04.485 --> 00:23:06.930 So, why Wikibase? 00:23:06.930 --> 00:23:08.821 Because of the flexibility of the format. 00:23:08.835 --> 00:23:11.646 We arrive-- 00:23:11.850 --> 00:23:16.388 to inject MARC, INTERMARC for BnF-- 00:23:16.960 --> 00:23:18.350 in the database. 00:23:18.399 --> 00:23:22.803 And use it to-- use this link management 00:23:22.803 --> 00:23:25.529 between entities using Blazegraph, 00:23:25.529 --> 00:23:27.776 so, as Wikibase does. 00:23:29.155 --> 00:23:32.700 We also choose Wikibase, because it was already-- 00:23:35.183 --> 00:23:38.900 it handles history and user account. 00:23:39.941 --> 00:23:42.414 So, it's easiest for us. 00:23:43.106 --> 00:23:48.270 And it also has a good-- it's pretty easy to create bots 00:23:48.270 --> 00:23:51.090 to watch and curate data 00:23:51.840 --> 00:23:53.430 and also to make statistics. 00:23:54.820 --> 00:23:57.170 It's free and open, and sustainable. 00:23:57.908 --> 00:23:59.084 Yeah, so. 00:23:59.610 --> 00:24:02.519 I'm sorry if you don't understand what I say, 00:24:02.519 --> 00:24:04.839 because I know my English is not that good. 00:24:07.720 --> 00:24:12.139 But during this PoC, we encountered some trouble. 00:24:12.802 --> 00:24:13.938 Okay.
00:24:14.790 --> 00:24:21.117 First of all, as a search engine, I think we have to create 00:24:21.117 --> 00:24:24.150 another-- 00:24:24.185 --> 00:24:28.988 not another, a supplementary search engine to use it with, 00:24:29.433 --> 00:24:31.120 to fit our needs. 00:24:31.688 --> 00:24:37.155 Because we need some search 00:24:37.155 --> 00:24:42.366 like faceted search and filters. 00:24:43.755 --> 00:24:47.525 Also we have the [inaudible], 00:24:47.525 --> 00:24:50.407 of using PostgreSQL database. 00:24:50.407 --> 00:24:54.885 And for the moment, I think Wikibase [inaudible]. 00:24:56.436 --> 00:25:01.266 And when we try to use PostgreSQL, it was a bit difficult, 00:25:01.266 --> 00:25:04.394 and will cause some issues. 00:25:05.662 --> 00:25:08.825 And we have also some fear about performance, 00:25:08.825 --> 00:25:15.238 because the catalog is about 20 million entities, 00:25:16.366 --> 00:25:19.146 20 million bibliographic entities. 00:25:19.146 --> 00:25:22.851 That can be more than 20 million entities, actually. 00:25:23.276 --> 00:25:27.771 And we don't know the time that we'll have to inject them 00:25:27.809 --> 00:25:30.765 in the Wikibase, and how to do it. 00:25:32.198 --> 00:25:34.267 So, [inaudible], 00:25:34.324 --> 00:25:39.616 but the real software development has already started. 00:25:43.242 --> 00:25:46.175 We start by creating an interface with Wikibase. 00:25:46.261 --> 00:25:47.711 We're using Java. 00:25:48.091 --> 00:25:50.093 Like PyWikibase. 00:25:51.691 --> 00:25:54.888 - (man) Pywikibot. - Pywikibot. Yeah, thank you. 00:25:56.027 --> 00:25:57.723 The same way, but in Java. 00:25:59.309 --> 00:26:02.831 We also inject already the format into the Wikibase. 00:26:03.540 --> 00:26:09.093 And we do something like the INTERMARC editor, 00:26:09.458 --> 00:26:12.134 [inaudible], et cetera. 00:26:13.672 --> 00:26:14.926 Thank you. 00:26:15.333 --> 00:26:17.135 (applause) 00:26:23.527 --> 00:26:24.749 Yeah.
00:26:27.748 --> 00:26:29.813 (man 2) Faceted search will be a nice feature 00:26:29.813 --> 00:26:31.885 in the Wikidata UI itself. 00:26:31.924 --> 00:26:34.062 So, have you talked to any of the developers, 00:26:34.062 --> 00:26:35.675 or is that something that could be done? 00:26:35.711 --> 00:26:37.108 Sorry, I don't understand. 00:26:37.108 --> 00:26:39.041 (man 2) The faceted search idea. 00:26:39.911 --> 00:26:41.982 It would be nice to be able to search only humans, 00:26:41.982 --> 00:26:44.221 or search only works, or something, right? 00:26:44.321 --> 00:26:47.991 Yeah. I'm sorry, I don't-- I don't-- 00:26:48.131 --> 00:26:50.436 (man 2) Yeah, I mean, so, it would be nice if we had that 00:26:50.436 --> 00:26:52.265 in Wikidata itself in the UI. 00:26:52.822 --> 00:26:53.954 Yeah, yeah, yeah. 00:26:54.088 --> 00:26:56.077 [inaudible] 00:26:56.077 --> 00:26:57.911 Yeah, okay, thank you. 00:26:57.911 --> 00:27:00.026 I'm sorry. (laughs) 00:27:01.186 --> 00:27:03.902 Yeah, yeah. But I think we will-- 00:27:04.506 --> 00:27:07.266 I don't know if we want to do it inside Wikibase, 00:27:07.266 --> 00:27:10.746 or in our next systems. 00:27:10.785 --> 00:27:15.186 For the moment, we don't really solve that. 00:27:15.965 --> 00:27:17.885 For the moment, I think. 00:27:17.885 --> 00:27:19.285 Sorry. 00:27:27.645 --> 00:27:30.644 (man 3) I suppose on the topic of the faceted search, 00:27:32.535 --> 00:27:35.068 Wikidata, SPARQL Query, Wikibase-- 00:27:35.068 --> 00:27:38.965 SPARQL Query is I think, functionally equivalent 00:27:38.965 --> 00:27:41.405 to a facetable search. 00:27:42.105 --> 00:27:44.234 So, it's mostly an interface issue, right? 00:27:44.284 --> 00:27:47.791 I mean, you could build an interface that starts with a query, 00:27:47.791 --> 00:27:51.111 and then, gives you possible facets to filter by. 00:27:51.370 --> 00:27:52.660 And when you click one of them, 00:27:52.660 --> 00:27:55.217 it adds a condition to the SPARQL Query, right? 
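(editor's note) The interface idea in the question above, where each clicked facet adds one condition to a SPARQL query, can be sketched as follows. This is a minimal illustration only, not code from any of the tools discussed. The function name `build_faceted_query` is hypothetical, and the Wikidata identifiers used (P31 "instance of", Q5 "human", P106 "occupation", Q36180 "writer") are just an example facet.

```python
def build_faceted_query(base_patterns, selected_facets, limit=50):
    """Return SPARQL text: the base patterns plus one triple per selected facet.

    base_patterns   -- list of triple-pattern strings that define the starting query
    selected_facets -- list of (property_id, value_id) pairs the user has clicked
    """
    lines = ["SELECT ?item WHERE {"]
    for pattern in base_patterns:
        lines.append(f"  {pattern}")
    for prop, value in selected_facets:
        # Each clicked facet narrows the result set with one more condition on ?item.
        lines.append(f"  ?item wdt:{prop} wd:{value} .")
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Start from "all humans" with no facet selected ...
query = build_faceted_query(["?item wdt:P31 wd:Q5 ."], [])
# ... then the user clicks the facet "occupation: writer".
narrowed = build_faceted_query(["?item wdt:P31 wd:Q5 ."], [("P106", "Q36180")])
print(narrowed)
```

The sketch only builds query text. As the discussion that follows points out, whether such a query is fast enough, and whether it covers string search, is a separate problem.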
00:27:55.664 --> 00:27:58.183 Yeah, but I think the SPARQL-- 00:27:59.157 --> 00:28:04.310 they don't go as detailed as we want, as we have-- 00:28:05.632 --> 00:28:09.631 When we inject the format, we use a statement for-- 00:28:10.525 --> 00:28:13.124 the format is like XML. 00:28:13.223 --> 00:28:15.842 So, it's a zone, subzone, and value. 00:28:16.413 --> 00:28:20.292 And in the [inaudible] statement, we add the subzone, 00:28:20.892 --> 00:28:22.902 because the zone was already there. 00:28:23.002 --> 00:28:28.565 And we want to query some qualifier on this. 00:28:28.659 --> 00:28:35.206 And I don't know if the SPARQL goes through that-- I'm sorry-- 00:28:36.145 --> 00:28:38.277 in a fast way. 00:28:40.025 --> 00:28:46.285 I think we need some index for us to [inaudible]. 00:28:46.925 --> 00:28:48.145 Yeah. 00:28:48.145 --> 00:28:50.250 (man 3) SPARQL doesn't do a query-- 00:28:52.321 --> 00:28:55.703 To do proper string searches in SPARQL is very hard. 00:28:55.703 --> 00:28:57.610 You have to have filters, which are slow, 00:28:57.610 --> 00:28:59.815 and it really doesn't work that well. 00:28:59.815 --> 00:29:02.845 So, it's a different search problem, really. 00:29:06.871 --> 00:29:09.270 More question? If anyone has one? 00:29:12.215 --> 00:29:13.999 - Great. Thank you. - Thank you. 00:29:14.044 --> 00:29:15.895 (applause) 00:29:37.766 --> 00:29:41.960 (host) Nielsen speaking about the tool Ordia. Thank you. 00:30:05.084 --> 00:30:06.460 So, I'm Finn Årup Nielsen, 00:30:06.460 --> 00:30:09.006 and a couple of years ago, I started Scholia 00:30:09.006 --> 00:30:14.611 that displays data from Wikidata via a SPARQL Query 00:30:14.611 --> 00:30:16.359 to the Wikidata Query Service 00:30:16.359 --> 00:30:18.959 so we can generate, for example, a list of publications 00:30:18.959 --> 00:30:20.380 for a specific author. 00:30:20.866 --> 00:30:26.941 Now, last year, Wikidata introduced lexicographic data. 
00:30:29.332 --> 00:30:32.655 And I [inaudible] the idea of Scholia 00:30:32.655 --> 00:30:39.279 that is using Wikidata and the Wikidata Query Service 00:30:39.445 --> 00:30:42.036 to generate overviews of lexicographic data. 00:30:42.585 --> 00:30:46.125 So, Ordia is the example of this one here. 00:30:46.197 --> 00:30:51.998 So, it generates-- it's a web application run from the Toolforge service, 00:30:51.998 --> 00:30:57.198 and for example, it will dynamically generate a page such as-- 00:30:57.234 --> 00:31:01.768 This one here is statistics over what there is of lexicographic data 00:31:01.768 --> 00:31:03.841 in Wikidata. 00:31:03.992 --> 00:31:07.404 For example, the number of lexemes is currently over 200,000. 00:31:08.664 --> 00:31:10.483 So, there's a range of things you can do here. 00:31:10.483 --> 00:31:12.916 You can, for example, look in the aspects of that. 00:31:12.916 --> 00:31:15.560 The menu, there's quite a lot of things here. 00:31:15.560 --> 00:31:18.485 And so, I will search for a specific Danish lexeme. 00:31:19.503 --> 00:31:22.835 "Rød"-- which is "red" in Danish. 00:31:23.376 --> 00:31:27.466 So, you basically get, for the specific lexeme, 00:31:28.286 --> 00:31:30.618 the same type of information that you could see 00:31:30.618 --> 00:31:33.751 in the ordinary part of Wikidata, here. 00:31:34.451 --> 00:31:38.256 Annotations about the lexeme, annotation about the forms, 00:31:39.359 --> 00:31:40.872 single or plural forms. 00:31:41.548 --> 00:31:43.501 Annotation about the senses. 00:31:44.683 --> 00:31:47.678 But what you can't see in ordinary Wikidata 00:31:47.678 --> 00:31:52.150 is sort of aggregating across lexemes. 00:31:52.246 --> 00:31:54.207 And this is, for example, down here-- 00:31:54.207 --> 00:31:55.902 down here with the compound. 00:31:55.902 --> 00:31:57.764 So, in Danish, like in German, 00:31:57.764 --> 00:31:59.950 words can be compounded.
00:31:59.950 --> 00:32:03.478 For example, for "red", we have rødkælk 00:32:03.478 --> 00:32:05.830 which is compounded by two words. 00:32:06.721 --> 00:32:10.085 And we've got, on the second one here, rødvin-- red wine. 00:32:11.060 --> 00:32:15.691 This list here is constructed by a SPARQL Query to the Wikidata Service. 00:32:16.751 --> 00:32:20.406 And also, further down here, we've got a lot of Danish words here. 00:32:20.970 --> 00:32:26.122 Further down here, we should have a graph of the words 00:32:27.426 --> 00:32:29.164 which are compounded from rød. 00:32:29.658 --> 00:32:31.980 We have [rød]-- red here in the middle. 00:32:31.980 --> 00:32:34.372 And for example, around-- somewhere around here, 00:32:34.372 --> 00:32:36.895 which should have, for example, "red cabbage," 00:32:36.936 --> 00:32:40.343 "red cabbage salad," "red cabbage soup," and so on. 00:32:40.434 --> 00:32:43.055 So you can browse around, in this one here, and see it. 00:32:44.204 --> 00:32:51.188 We can go a bit back here, and then look on the main sense 00:32:51.388 --> 00:32:55.030 of the word rød-- red in Danish. 00:32:55.550 --> 00:33:01.610 So, Ordia automatically generates information about hyponyms. 00:33:02.570 --> 00:33:04.400 Subconcepts, for example, 00:33:04.400 --> 00:33:07.400 light red, dark red, pink, purple, and so on, 00:33:07.525 --> 00:33:14.272 are in the-- when we make a Wikidata Query service, SPARQL Query. 00:33:14.576 --> 00:33:20.570 Then we go around in the Wikidata graph, 00:33:20.626 --> 00:33:22.266 and get this information here. 00:33:22.266 --> 00:33:24.786 And we can also get translation automatically, 00:33:24.786 --> 00:33:28.316 even though it's not necessarily stated within the Wikidata lexemes items. 00:33:28.316 --> 00:33:32.679 For example, here, we have translated rød to "red" in English, 00:33:32.679 --> 00:33:36.089 and röd in Swedish, and so on. 00:33:36.107 --> 00:33:38.191 There's not that very many there. 
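(editor's note) The compound list described above is said to come from a SPARQL query to the Wikidata Query Service. A query of that general kind could look like the sketch below. It assumes the Wikidata property P5238 ("combines lexemes") and the `wikibase:lemma` predicate; the exact query Ordia runs is not given in the talk, and the lexeme ID `L123` is a placeholder, not the real ID of "rød".

```python
def compounds_query(lexeme_id):
    """Return SPARQL text listing lexemes that are compounded from the given lexeme."""
    return (
        "SELECT ?compound ?lemma WHERE {\n"
        # P5238 links a compound lexeme to each of the lexemes it is built from.
        f"  ?compound wdt:P5238 wd:{lexeme_id} ;\n"
        "            wikibase:lemma ?lemma .\n"
        "}\n"
        "ORDER BY ?lemma"
    )


# "L123" is a hypothetical placeholder for the lexeme of interest.
print(compounds_query("L123"))
```

The resulting text can be pasted into the Wikidata Query Service with a real lexeme ID substituted.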
00:33:38.747 --> 00:33:40.262 There's a range of other things here. 00:33:40.262 --> 00:33:43.487 Let me show you, for example, this one here-- 00:33:44.387 --> 00:33:51.308 this is veninde- now I go over to this one here. 00:33:54.308 --> 00:33:57.328 -inde, which is a feminine suffix. 00:33:58.058 --> 00:34:00.498 So, this is auto-generated there, 00:34:00.498 --> 00:34:02.641 it's a combination of "instance of"-- 00:34:03.268 --> 00:34:07.171 lexemes that are "instance of" feminine suffixes. 00:34:08.142 --> 00:34:11.519 And for example, for German, we have [inaudible]. 00:34:11.519 --> 00:34:15.373 So, -in would be a feminine suffix in German. 00:34:15.704 --> 00:34:21.291 And I put in sort of the five Danish feminine suffixes 00:34:22.571 --> 00:34:24.206 of Danish. 00:34:25.480 --> 00:34:29.106 Another facility is, for example, if you have a text, 00:34:29.106 --> 00:34:34.021 you can copy and paste it into this Text to lexemes here. 00:34:34.571 --> 00:34:35.911 Let me-- 00:34:37.482 --> 00:34:41.218 "a car crashed into... 00:34:41.864 --> 00:34:44.141 a green house." 00:34:46.485 --> 00:34:48.701 Let me change that to "English". 00:34:49.006 --> 00:34:50.029 Press Submit. 00:34:50.029 --> 00:34:53.355 Now, Ordia will then extract each of the word here, 00:34:53.355 --> 00:34:54.733 in this sentence here, 00:34:54.733 --> 00:34:58.217 and try to see whether they are entered in the specific form, 00:34:58.217 --> 00:35:00.778 a lexeme, are entered in Wikidata. 00:35:00.778 --> 00:35:04.228 And these simple words here are entered in Wikidata. 00:35:04.228 --> 00:35:09.190 But if we, for example, change it to-- there's nothing called "vancar" 00:35:09.190 --> 00:35:13.998 but just let us do that here. 00:35:14.535 --> 00:35:19.532 And you got down here-- it's as a blue link 00:35:20.335 --> 00:35:23.295 that you can create a new Wikidata lexeme item. 
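(editor's note) The "Text to lexemes" step shown above splits a sentence into words and checks whether each word is already recorded as a form. A rough sketch of that check follows. It is not Ordia's implementation: the function `find_missing_forms` is hypothetical, and the set `known_forms` stands in for a real lookup against Wikidata.

```python
import re


def find_missing_forms(text, known_forms):
    """Return the distinct words of `text`, in order, that are not in `known_forms`."""
    # Lowercase the text and keep runs of letters only (no digits or punctuation).
    words = re.findall(r"[^\W\d_]+", text.lower())
    missing = []
    for word in words:
        if word not in known_forms and word not in missing:
            missing.append(word)
    return missing


# Stand-in for the forms already present in Wikidata (an assumption for the example).
known_forms = {"a", "car", "crashed", "into", "green", "house"}

print(find_missing_forms("A car crashed into a green house.", known_forms))
print(find_missing_forms("A vancar crashed into a green house.", known_forms))
```

In the second call only "vancar" is reported, which corresponds to the word that is offered as a link for creating a new lexeme in the demonstration.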
00:35:24.556 --> 00:35:29.097 But there's a range of other things to explore 00:35:29.716 --> 00:35:31.496 in this web application. 00:35:31.496 --> 00:35:35.596 And if there's any suggestions, or comments, or notes, or something, 00:35:35.596 --> 00:35:39.337 you can contact me, or put in an issue on GitHub. 00:35:39.337 --> 00:35:44.856 So, this particular application is developed on GitHub, 00:35:44.856 --> 00:35:50.526 and I'm open for new ideas and ways to represent information there. 00:35:51.306 --> 00:35:52.701 Okay, thank you. 00:35:52.701 --> 00:35:54.661 (applause) 00:35:59.328 --> 00:36:00.906 Questions? 00:36:03.262 --> 00:36:04.524 (woman 3) I love your tool. 00:36:04.524 --> 00:36:09.752 Can you show the languages, that which is awesome for me, I think, 00:36:09.752 --> 00:36:11.731 to show other languages. 00:36:12.183 --> 00:36:14.537 So, this is a bit of statistics over the languages, 00:36:14.537 --> 00:36:17.046 and the Russians have been scraping Wiktionary, 00:36:17.046 --> 00:36:20.327 and that's why they have now 100,000 lexemes. 00:36:24.387 --> 00:36:28.088 There's also a lot of work on Basque here. 00:36:29.566 --> 00:36:32.241 I think there's an organization putting that information in here. 00:36:32.241 --> 00:36:34.932 And you can also see a graph of these-- 00:36:34.932 --> 00:36:37.662 this is Number of forms as functions of number of lexemes. 00:36:38.798 --> 00:36:41.279 And all the way up here-- 00:36:41.279 --> 00:36:45.255 here, this is Russian, down here, Basque, I think. 00:36:45.476 --> 00:36:47.997 And English, perhaps, down here. 00:36:48.953 --> 00:36:50.692 And also in the Number of senses, 00:36:52.473 --> 00:36:58.360 I think Basque, English, and Russian, 00:37:00.184 --> 00:37:02.048 Hebrew, and so on. 00:37:02.048 --> 00:37:03.343 Yeah. 00:37:11.045 --> 00:37:12.950 (man 4) That looks like an incredible tool. 00:37:12.950 --> 00:37:15.097 But I was just wondering, is it all fully live?
00:37:15.097 --> 00:37:18.344 Is it all based on SPARQL Queries and live or are there some things-- 00:37:18.344 --> 00:37:20.458 - Yes. I believe, yes. - Fantastic. 00:37:20.511 --> 00:37:24.961 But as they get more data into Wikidata, 00:37:24.961 --> 00:37:26.100 there's a bit of an issue. 00:37:26.100 --> 00:37:27.328 For example, for Russian here. 00:37:27.328 --> 00:37:31.966 I started out this a year ago when there's not that very many lexemes, 00:37:32.061 --> 00:37:35.503 and so there was no problems with the time-outs. 00:37:35.503 --> 00:37:38.367 But representing it here-- 00:37:38.367 --> 00:37:42.268 but if I press Russian, I think there might be some issues. 00:37:42.268 --> 00:37:44.284 There's a count that works here, 00:37:44.284 --> 00:37:46.101 for example, longest words or phrases. 00:37:46.101 --> 00:37:49.252 But I think the lexemes are sort of loading in. 00:37:49.252 --> 00:37:52.727 I think I'll need to fix that as Wikidata grows here. 00:37:53.258 --> 00:37:55.927 As you see, there's a lot of Russian nouns, apparently. 00:37:56.699 --> 00:37:58.451 And I don't know whether the-- 00:37:59.351 --> 00:38:01.519 apparently, that's what they're working on. 00:38:01.573 --> 00:38:03.960 There seems also to be a bit of time-out there. 00:38:06.705 --> 00:38:08.033 [inaudible], oh, yes. 00:38:08.115 --> 00:38:09.984 The first one there. 00:38:10.832 --> 00:38:16.110 But apparently, the longest words and phrases is a bit too expensive. 00:38:17.931 --> 00:38:20.334 But apparently, it can be loaded there, and it's probably-- 00:38:21.318 --> 00:38:23.167 it's loaded all the 100,000 there, 00:38:23.167 --> 00:38:27.938 so you can click all 10,000 pages. 00:38:36.748 --> 00:38:38.678 (host) If there aren't any other questions-- 00:38:39.564 --> 00:38:40.950 The longest word came now. 00:38:40.950 --> 00:38:43.146 So, it's, yeah.
00:38:44.972 --> 00:38:46.390 Probably-- 00:38:47.855 --> 00:38:49.975 [inaudible] 00:38:50.321 --> 00:38:51.540 What is that? 00:38:51.540 --> 00:38:53.518 - (audience) It's a chemical. - A chemical, yes. 00:38:56.317 --> 00:38:58.303 (host) More questions? Or shall we? 00:38:59.792 --> 00:39:02.332 Alright, alright. Thank you very much. 00:39:02.332 --> 00:39:04.392 (applause) 00:39:23.642 --> 00:39:25.121 (Nicolas) Is it good? 00:39:31.008 --> 00:39:32.346 (host) Awesome. 00:39:34.920 --> 00:39:38.137 Alright, now, to wrap it up, we have Nicolas Vigneron, 00:39:38.137 --> 00:39:40.778 talking about Wikisource and Wikidata. 00:39:41.469 --> 00:39:42.804 (Nicolas) This is good? 00:39:44.542 --> 00:39:46.126 Who knows Wikisource? 00:39:47.582 --> 00:39:48.959 Yay! 00:39:50.740 --> 00:39:53.582 More and more people raising hands every year. 00:39:53.582 --> 00:39:54.957 That's good. 00:39:55.282 --> 00:40:01.462 So, this morning, [Lydia] said that Wikivoyage was the first real user of-- 00:40:03.306 --> 00:40:05.987 [inaudible] 00:40:06.572 --> 00:40:08.347 Wikisource is not that far behind. 00:40:09.230 --> 00:40:13.280 There's a lot to do, and I want to do some basic numbers, 00:40:13.280 --> 00:40:16.964 statistics, about where we are, and where I want to go. 00:40:17.613 --> 00:40:23.409 So first, there will be a lot of questions of what is a book, 00:40:23.409 --> 00:40:25.389 what is bibliographical data. 00:40:25.389 --> 00:40:27.229 People from the BnF can agree with me. 00:40:27.229 --> 00:40:29.969 That can be a nightmare if you go into details. 00:40:30.164 --> 00:40:35.803 But some big numbers that-- Google Books tried to do an estimation 00:40:35.803 --> 00:40:39.676 on how many "books," air quote books, there is in the world, 00:40:39.676 --> 00:40:43.005 and there's 130 million books in the world. 00:40:43.705 --> 00:40:47.279 And, yeah, let's put them all on Wikidata. 00:40:47.650 --> 00:40:49.300 Or not. I don't know. 
00:40:49.392 --> 00:40:51.049 But where are we now? 00:40:51.413 --> 00:40:52.468 And why is it books? 00:40:52.468 --> 00:40:55.668 Because for Google Books, everything is scanned, basically. 00:40:55.795 --> 00:40:58.670 They don't have exactly a very clear distinction. 00:40:59.400 --> 00:41:04.350 There's sometimes, two-page books, which [inaudible], Google Books is a book. 00:41:04.714 --> 00:41:10.131 But for many people, you have to have at least 50 pages to be a book. 00:41:10.536 --> 00:41:12.321 So, that's always hard to count. 00:41:12.885 --> 00:41:15.603 But here's what we know on Wikidata. 00:41:15.603 --> 00:41:18.704 This is the graph of what is a book for Wikidata. 00:41:18.704 --> 00:41:21.524 You have-- that's totally [inaudible]-- 00:41:21.524 --> 00:41:23.979 but that's Wikidata, literary work as well. 00:41:23.979 --> 00:41:27.194 And this is all the subclasses, or subclasses of subclasses-- 00:41:27.194 --> 00:41:30.334 or subclasses of subclasses of what is a book. 00:41:30.804 --> 00:41:32.705 So, that's very hard to do. 00:41:32.737 --> 00:41:34.253 I can do a graph like that, 00:41:34.253 --> 00:41:36.833 but SPARQL Query engine doesn't work 00:41:36.833 --> 00:41:41.523 if I want to count everything that is instance of these subclasses, 00:41:41.523 --> 00:41:45.143 and basically, SPARQL says no, time-out. 00:41:45.633 --> 00:41:47.020 So, what's the problem? 00:41:47.020 --> 00:41:50.713 But I know already that there's a lot of subclasses, 00:41:50.713 --> 00:41:52.153 but we need to look into it.
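(editor's note) The count that is described above as timing out, everything that is an instance of "book" or of any of its subclasses, is typically written with a SPARQL property path. The sketch below builds that query text and, for comparison, a narrower one that ignores subclasses. The function name is hypothetical; Q571 ("book"), P31 ("instance of"), and P279 ("subclass of") are the usual Wikidata identifiers, and the speaker's actual query is not shown in the talk.

```python
def count_instances_query(class_id, follow_subclasses=True):
    """Return SPARQL text counting the instances of a class.

    With follow_subclasses=True the path wdt:P31/wdt:P279* also walks the whole
    subclass tree, which is the expensive variant on a large class hierarchy.
    """
    path = "wdt:P31/wdt:P279*" if follow_subclasses else "wdt:P31"
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        f"  ?item {path} wd:{class_id} .\n"
        "}"
    )


full_count = count_instances_query("Q571")            # book and all of its subclasses
direct_count = count_instances_query("Q571", False)   # direct instances of book only
print(full_count)
```

One possible workaround, offered here only as an assumption, is to list the subclasses first and run the cheaper direct count once per subclass.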
00:41:52.153 --> 00:41:57.943 And probably, if you know Wikidata, on the page Wikidata:Statistics, 00:41:58.643 --> 00:42:02.647 you have all the numbers by big classes, 00:42:02.647 --> 00:42:07.047 and you all probably know that the big chunk here 00:42:07.047 --> 00:42:08.642 is scholarly articles, 00:42:08.707 --> 00:42:12.749 which is, thanks to the WikiCite project, in particular, 00:42:14.113 --> 00:42:17.125 which can be books or not, depending on definition. 00:42:19.062 --> 00:42:22.508 You see that there's no subclass books, 00:42:23.032 --> 00:42:26.034 because there's not enough to show. 00:42:26.049 --> 00:42:28.472 It's probably somewhere in the others, 00:42:28.472 --> 00:42:30.127 the purple area is others. 00:42:30.163 --> 00:42:34.115 And there's a lot of things that's under one percent. 00:42:34.162 --> 00:42:38.821 So, basically, we can say that we have less than one percent 00:42:38.821 --> 00:42:42.131 of things identified as books in Wikidata. 00:42:42.551 --> 00:42:46.091 Maybe there is more books, but not identified as such. 00:42:47.842 --> 00:42:49.284 I'm talking about books, 00:42:49.383 --> 00:42:51.768 but when we are talking about bibliographical data, 00:42:51.768 --> 00:42:53.920 there's also the author, person, 00:42:53.920 --> 00:42:58.472 so maybe some of the human here are also authors, surely. 00:43:00.068 --> 00:43:03.221 And we need to do another count, which is another big query to do. 00:43:03.602 --> 00:43:05.301 That times out, so-- 00:43:05.396 --> 00:43:08.015 I have a lot of not number to this, sorry. 00:43:10.619 --> 00:43:14.332 So, yeah, basically, this first slide is about how it's complicated 00:43:14.332 --> 00:43:19.122 to know how much we have of what, and how to count them. 00:43:19.445 --> 00:43:21.091 So, yeah, hard to count.
00:43:21.618 --> 00:43:23.280 What we know-- 00:43:24.133 --> 00:43:26.618 that is we have a lot of properties-- 00:43:27.185 --> 00:43:29.684 700,000, I guess, 00:43:30.208 --> 00:43:31.680 now on Wikidata. 00:43:32.593 --> 00:43:35.952 We know that we have a lot of identifiers among these properties. 00:43:36.721 --> 00:43:42.538 And we know that almost 4,000 are properties for identifiers 00:43:43.146 --> 00:43:45.623 relative to bibliographical, 00:43:45.737 --> 00:43:49.862 like ID at the National Library of France, 00:43:49.862 --> 00:43:52.251 National Library of yada, yada, yada, 00:43:52.251 --> 00:43:56.681 because we love identifier of National Library on Wikidata. 00:43:56.681 --> 00:44:00.271 So, we have almost all libraries, national libraries and more. 00:44:01.101 --> 00:44:03.796 So, we have a lot of properties. I know that. 00:44:05.071 --> 00:44:06.727 And we are widely used. 00:44:06.834 --> 00:44:10.053 I know that, for instance, BnF properties use-- 00:44:10.579 --> 00:44:12.772 BnF is National Library of France-- 00:44:12.772 --> 00:44:18.989 is used 1 million times-- OCLC, VIAF, or the big like that. 00:44:21.001 --> 00:44:24.202 A lot of uses in Wikidata. 00:44:25.426 --> 00:44:28.980 But it's not because we have a lot of uses of various properties 00:44:28.980 --> 00:44:30.666 in Wikidata that it's complete. 00:44:31.266 --> 00:44:33.758 As Thibaud said, there's more than 20 million books, 00:44:33.758 --> 00:44:37.099 [inaudible], which is more as entities. 00:44:37.837 --> 00:44:39.569 And we have only 1 million, 00:44:39.569 --> 00:44:43.538 so we have 19 million still to do.
00:44:45.177 --> 00:44:47.276 Also, what we know from the Wikidata side, 00:44:47.276 --> 00:44:51.918 is that we have a good-- very quite active Wikidata project, 00:44:51.918 --> 00:44:53.840 called WikiProject Books, 00:44:54.332 --> 00:44:58.127 where we have a model we kind of agree on, 00:44:58.181 --> 00:45:00.916 which is not always followed, which is, again, a problem. 00:45:00.956 --> 00:45:02.710 What is a book? You know it. 00:45:03.414 --> 00:45:05.385 I only have five minutes, so, I'll keep going. 00:45:06.090 --> 00:45:08.880 And then, I'm a Wikisourcean, so, Wikisourcer. 00:45:09.426 --> 00:45:11.930 So, I wanted to know the other way around 00:45:11.930 --> 00:45:13.496 what is from Wikisource already, 00:45:13.496 --> 00:45:16.406 because Wikisource is already inside the Wikimedia project. 00:45:16.406 --> 00:45:19.883 A lot of bibliographical records and information. 00:45:19.883 --> 00:45:23.161 So, in the 66 million items on Wikidata, 00:45:23.161 --> 00:45:28.850 already 1 million are linked to Wikisource. 00:45:29.330 --> 00:45:31.890 [inaudible]. 00:45:32.350 --> 00:45:36.080 So, that's very few, but that's quite a lot. 00:45:37.496 --> 00:45:40.174 There's a lot of author. 00:45:40.174 --> 00:45:44.670 There's some books, texts, work, edition, whatever. 00:45:45.271 --> 00:45:48.425 Not always well-arranged. 00:45:48.869 --> 00:45:50.600 And there's a lot of internal pages, 00:45:50.600 --> 00:45:53.150 like categories and templates, and things like that. 00:45:53.194 --> 00:45:54.984 But still, 1 million in total. 00:45:58.329 --> 00:46:01.767 The Wikisource community are often small communities, 00:46:01.767 --> 00:46:05.010 like on the French community Wikisource, 00:46:05.010 --> 00:46:07.537 which is one of the biggest, there's 50 people. 00:46:07.537 --> 00:46:08.787 That's the biggest we have. 00:46:09.047 --> 00:46:12.937 So, we love Wikidata, because, hey, they did a lot of work for us. 
00:46:12.942 --> 00:46:15.131 So, just take it from Wikisource. 00:46:15.131 --> 00:46:19.885 So, in this small community, we love to reuse Wikidata data. 00:46:20.935 --> 00:46:24.076 Right now, we use a lot of a tool which is called WEF-- 00:46:24.358 --> 00:46:27.978 Wikidata Edit Framework-- thank you. 00:46:29.318 --> 00:46:33.098 And we are eager to see how Wikidata Bridge will work. 00:46:33.438 --> 00:46:36.798 And we are trying to do things with a team in Wikidata 00:46:37.638 --> 00:46:40.678 in Wikimedia Deutschland team, [inaudible]. 00:46:41.007 --> 00:46:43.934 And there's a lot of collaboration in the future 00:46:43.934 --> 00:46:46.586 that we want to do: better integrate, 00:46:47.636 --> 00:46:51.068 do everything in one click when you import a first book in Wikisource, 00:46:51.068 --> 00:46:52.465 things like that. 00:46:53.364 --> 00:46:57.664 Better-- do links between edition in Wikidata. 00:46:57.852 --> 00:46:59.492 That needs to be done. 00:47:00.041 --> 00:47:02.282 The Foundation is doing the wish list now, 00:47:02.282 --> 00:47:04.853 and we have a lot of requests about that. 00:47:05.938 --> 00:47:07.342 And yeah, that's it. 00:47:07.342 --> 00:47:09.116 That was just a short overview. 00:47:09.116 --> 00:47:15.272 So, if you have some questions, I'll take them and be available later, 00:47:15.712 --> 00:47:17.112 if you want to. 00:47:17.723 --> 00:47:19.722 (applause) 00:47:25.639 --> 00:47:28.281 Come on, you love Wikisource, you have questions! 00:47:33.989 --> 00:47:35.775 (woman 4) I asked you already this in August, 00:47:35.775 --> 00:47:38.411 and I wonder if this has already changed. 00:47:38.411 --> 00:47:42.337 What is the biggest problem you have in Wikisource right now, 00:47:42.337 --> 00:47:43.761 from your perspective? 00:47:44.167 --> 00:47:45.670 The first one, only?
(chuckles) 00:47:48.314 --> 00:47:54.152 I think because it's a small community, we need efficient tools that work easily, 00:47:54.152 --> 00:47:57.148 because we have very few people, 00:47:57.148 --> 00:47:59.464 so we need tool that are easy to use 00:47:59.464 --> 00:48:04.247 and a one-click solution to [inaudible] a bit, 00:48:04.371 --> 00:48:05.607 that's a big dream. 00:48:05.607 --> 00:48:07.179 I think that's what's most important, 00:48:07.179 --> 00:48:10.485 because that's the threshold in Wikisource, it's a small community. 00:48:11.204 --> 00:48:13.241 I think this is the most important. 00:48:14.615 --> 00:48:15.975 [inaudible] 00:48:16.867 --> 00:48:19.600 (man 5) I'm curious if you can speak to your opinion, 00:48:19.600 --> 00:48:23.154 or the French Wikisource opinion, or maybe you spoke to other communities 00:48:23.154 --> 00:48:29.834 about the notion of not including metadata about all the world's books. 00:48:30.234 --> 00:48:31.635 That was mentioned in the morning. 00:48:31.635 --> 00:48:34.965 Maybe other Wikibases, and other federated databases 00:48:34.965 --> 00:48:38.026 will have that information, and Wikidata won't. 00:48:39.159 --> 00:48:41.494 How does that feel for Wikisource? 00:48:43.981 --> 00:48:45.502 This is my very personal opinion. 00:48:45.502 --> 00:48:47.386 I know that people in the Wikisource community 00:48:47.386 --> 00:48:48.723 disagree with that. 00:48:48.723 --> 00:48:50.537 But I think we need to stay-- 00:48:50.537 --> 00:48:53.194 an external Wikibase is not a good solution, 00:48:53.194 --> 00:48:55.353 because we have Shakespeare on Wikisource, 00:48:55.353 --> 00:48:58.323 and we have Shakespeare on Wikipedia. 00:48:58.564 --> 00:49:01.295 So, we need to interlink, and interlink is there. 00:49:01.295 --> 00:49:04.007 Or like, Romeo and Juliet, we have them both. 00:49:04.007 --> 00:49:07.229 So, we are still pretty close to Wikipedia. 
00:49:07.433 --> 00:49:09.431 And the difference with WikiCites-- 00:49:09.431 --> 00:49:12.515 with WikiCite, we have a lot of items which are small. 00:49:14.372 --> 00:49:16.051 Wikisource is the other way around. 00:49:16.150 --> 00:49:18.281 We have few items, who are big. 00:49:18.281 --> 00:49:20.515 Which can be a scaling problem and everything, 00:49:20.515 --> 00:49:23.615 but it's quite a small subset of data. 00:49:23.683 --> 00:49:27.539 So, my personal opinion is we should stay in the Wikidata. 00:49:28.391 --> 00:49:32.117 Again, because we are not very much a lot of people, 00:49:32.117 --> 00:49:34.287 so we need to stay, with the tool we know, 00:49:34.287 --> 00:49:35.846 don't change too much the tools 00:49:35.846 --> 00:49:37.736 for the small community, please. 00:49:37.769 --> 00:49:39.282 So, that's it. 00:49:39.282 --> 00:49:40.910 But I know that other people disagree. 00:49:40.910 --> 00:49:44.579 You can talk to [Sadeep] if you want. He will have another point of view. 00:49:46.119 --> 00:49:49.319 Thank you. I think, last question, maybe. 00:49:51.234 --> 00:49:54.446 (man 6) Sometimes, I find it difficult to link the Wikidata item 00:49:54.446 --> 00:50:00.976 with a Wikisource article, because there's a Wikisource novel-- 00:50:01.079 --> 00:50:06.128 might be split over several pages, and there's an index page, 00:50:06.128 --> 00:50:08.853 and there's perhaps a front page, or something like that. 00:50:08.853 --> 00:50:12.053 Do you have that problem, or is that a general problem, or-- 00:50:12.092 --> 00:50:16.892 Yeah, that's one of the first ideas on the wish list 00:50:16.892 --> 00:50:19.092 for the Foundation, actually. 00:50:19.092 --> 00:50:20.790 Yeah, because Wikipedia is on the-- 00:50:20.790 --> 00:50:22.772 if you know the [inaudible] organization, 00:50:22.772 --> 00:50:26.598 Wikipedia is on the work level, and Wikisource on the edition level. 00:50:26.598 --> 00:50:28.572 So, already, you have a problem there. 
00:50:28.572 --> 00:50:30.931 And then, we have several editions of the same work, 00:50:30.931 --> 00:50:34.014 and we have sub-chapters and things inside the edition. 00:50:34.014 --> 00:50:41.001 So, yeah, that's a one-to-many problem, which is hard to solve by nature. 00:50:41.555 --> 00:50:44.839 But there's maybe a tool that can help to solve that. 00:50:45.893 --> 00:50:47.469 Hopefully. 00:50:49.172 --> 00:50:51.395 And that's time, ladies and gentlemen. 00:50:51.398 --> 00:50:53.283 So, thank you very much, Nicolas. 00:50:53.335 --> 00:50:55.137 (applause) 00:50:59.010 --> 00:51:01.127 And please join me giving one more round of applause 00:51:01.127 --> 00:51:03.147 to all of our wonderful speakers. 00:51:03.147 --> 00:51:04.901 (applause)