1 00:00:06,009 --> 00:00:09,069 (host) Hello, everyone. Thank you for coming to these lightning talks. 2 00:00:09,069 --> 00:00:11,529 Our first speaker, I'm going to run straight into it, 3 00:00:11,529 --> 00:00:13,781 is going to be Rosie Stephenson-Goodknight. 4 00:00:13,781 --> 00:00:15,319 Did I get that right? 5 00:00:15,319 --> 00:00:19,609 Yes. And so she's going to be talking about the Women Writers Project. 6 00:00:19,609 --> 00:00:22,569 And we're going to-- yeah, is that right? Great. 7 00:00:22,569 --> 00:00:24,299 And so, we're going to just launch right in, 8 00:00:24,299 --> 00:00:26,699 and I want to remind you, if there's time for questions, 9 00:00:26,699 --> 00:00:28,802 to please not speak until you have the microphone. 10 00:00:28,802 --> 00:00:30,329 Thank you. 11 00:00:31,589 --> 00:00:34,125 (Rosie) Hi, everyone, and thanks for coming to this session, 12 00:00:34,125 --> 00:00:36,829 where we're going to talk about Women Writers in Review, 13 00:00:36,829 --> 00:00:40,329 cultures of reception associated with trans-Atlantic, 14 00:00:40,329 --> 00:00:43,977 English language women writers, broadly construed. 15 00:00:44,523 --> 00:00:48,387 Women Writers in Review is an initiative of the Women Writers Project 16 00:00:48,387 --> 00:00:50,535 of Northeastern University. 17 00:00:50,535 --> 00:00:55,253 It moved there from Brown University, approximately 15 years ago. 18 00:00:55,993 --> 00:01:00,287 Women Writers in Review is a collection of 18th- and 19th-century reviews, 19 00:01:00,287 --> 00:01:04,281 publication notices, literary histories, and other texts 20 00:01:04,281 --> 00:01:09,511 corresponding to trans-Atlantic-- so, UK and US mostly, 21 00:01:09,511 --> 00:01:12,953 though a few Canadian-- written works by women. 
22 00:01:13,255 --> 00:01:15,683 It's a project where the two universities, 23 00:01:15,683 --> 00:01:18,133 Brown University and Northeastern University, 24 00:01:18,133 --> 00:01:22,645 started collecting the manuscripts of women from this period. 25 00:01:23,337 --> 00:01:27,520 And then they started collecting the reviews of these works, 26 00:01:27,520 --> 00:01:31,593 and then they started scoring these reviews by giving them a rating. 27 00:01:32,321 --> 00:01:36,144 It's designed to investigate the discourse of reception in connection 28 00:01:36,144 --> 00:01:39,333 with the changing trans-Atlantic literary landscape 29 00:01:39,333 --> 00:01:42,664 for the period 1770 to 1830. 30 00:01:46,143 --> 00:01:49,103 You're going to pardon me if I speak fast, because I've got five minutes 31 00:01:49,103 --> 00:01:50,646 to go over this. 32 00:01:50,646 --> 00:01:55,443 It includes 690 English language texts responding to works 33 00:01:55,443 --> 00:01:59,565 written or translated by 18th- and 19th-century women writers. 34 00:01:59,593 --> 00:02:04,813 There are 74 authors in the corpus, using 112 different sources, 35 00:02:04,813 --> 00:02:07,782 or periodicals, or magazines. 36 00:02:07,782 --> 00:02:10,773 And there are 628 critical reviews. 37 00:02:11,867 --> 00:02:14,671 Here's a picture that shows you what we're talking about 38 00:02:14,671 --> 00:02:16,573 in terms of a review. 39 00:02:16,573 --> 00:02:18,819 And you can also see what kind of scores 40 00:02:18,819 --> 00:02:25,403 were given by the academics at Northeastern University. 41 00:02:25,833 --> 00:02:28,922 Most of these are women who were giving scores 42 00:02:28,922 --> 00:02:34,031 based on the reviews that were done mostly, probably all men, 43 00:02:34,031 --> 00:02:39,799 back in this time period 1770 to 1830 of works written by women.
44 00:02:39,799 --> 00:02:43,469 By works, we're talking about plays, and novels, and poems, 45 00:02:43,469 --> 00:02:46,955 essays, and other kinds of articles. 46 00:02:48,615 --> 00:02:50,275 So, what are we talking about? 47 00:02:50,275 --> 00:02:54,676 This required creating items for authors for their works, 48 00:02:54,676 --> 00:02:57,946 like I said, novels and plays and poems. 49 00:02:57,946 --> 00:03:04,938 It required creating new items for this period of time 50 00:03:05,038 --> 00:03:08,391 where there are defunct periodicals. 51 00:03:08,391 --> 00:03:12,499 It required creating items for the scholarly articles. 52 00:03:12,578 --> 00:03:16,900 And then the review scores of each, and the review score by, 53 00:03:16,943 --> 00:03:19,998 which in this case would be Women Writers in Review, 54 00:03:19,998 --> 00:03:23,336 and what we still need to add is the described by source. 55 00:03:25,226 --> 00:03:28,970 This gives you a picture of the kind of spreadsheets, 56 00:03:28,970 --> 00:03:31,397 Google Spreadsheets, that I have been working on. 57 00:03:31,397 --> 00:03:34,296 I shouldn't just say I, because I've had a lot of help. 58 00:03:34,296 --> 00:03:37,546 I've had a lot of people who were working on this project with me. 59 00:03:37,546 --> 00:03:40,413 And you can see at the top, something about the authors, 60 00:03:40,413 --> 00:03:41,736 about the works. 61 00:03:41,736 --> 00:03:45,496 The third group is going to be the periodical, 62 00:03:45,496 --> 00:03:48,006 and then, how the scores started showing. 63 00:03:49,203 --> 00:03:52,122 And of course, this is how they look-- 64 00:03:52,122 --> 00:03:57,396 the beauty of being able to present the preliminary findings. 65 00:03:57,856 --> 00:04:01,767 Once we have uploaded all of the data, 66 00:04:02,989 --> 00:04:05,906 and I hope that that's going to be done by the end of this year, 67 00:04:06,956 --> 00:04:08,496 this will obviously look different. 
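The talk lists the statements the project needs: works, periodicals, review scores, the "review score by" qualifier, and "described by source". As a rough illustration only, the sketch below builds those statements in Wikibase JSON, assuming the standard Wikidata properties review score (P444, a string), review score by (P447) and described by source (P1343). The item ID Q222 and the score "4" are hypothetical placeholders, not Women Writers in Review data.

```python
import json


def item_snak(prop: str, qid: str) -> dict:
    """A value snak pointing at a Wikibase item such as 'Q222'."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
            "type": "wikibase-entityid",
        },
        "datatype": "wikibase-item",
    }


def review_score_statement(score: str, scored_by: str) -> dict:
    """'review score' (P444) qualified by 'review score by' (P447)."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": "P444",
            "datavalue": {"value": score, "type": "string"},
            "datatype": "string",
        },
        "qualifiers": {"P447": [item_snak("P447", scored_by)]},
    }


def described_by_source_statement(source: str) -> dict:
    """'described by source' (P1343), the part the talk says is still to add."""
    return {"type": "statement", "rank": "normal", "mainsnak": item_snak("P1343", source)}


# Hypothetical: Q222 stands in for the Women Writers in Review item.
claims = [review_score_statement("4", "Q222"), described_by_source_statement("Q222")]
print(json.dumps({"claims": claims}, indent=2))
```

Keeping the scorer as a qualifier, rather than a separate statement, ties each score to whoever assigned it, so scores from other projects could later sit side by side on the same item.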
68 00:04:09,916 --> 00:04:10,931 Appendix. 69 00:04:10,931 --> 00:04:15,267 So, here's what the depiction looks like 70 00:04:15,267 --> 00:04:18,505 at the Northeastern University website. 71 00:04:19,024 --> 00:04:22,474 I don't think it's quite as clear as what we can do with Wikidata. 72 00:04:22,531 --> 00:04:27,351 And so, this was probably the reason why, when I started as a visiting scholar 73 00:04:27,351 --> 00:04:31,751 in 2017, they asked if this is one of the projects that I could work on. 74 00:04:31,751 --> 00:04:36,093 They stopped their work the year before, in 2016. 75 00:04:36,093 --> 00:04:39,073 And I think they just don't have the resources to continue. 76 00:04:40,251 --> 00:04:43,415 Some parts of this presentation came from another 77 00:04:43,415 --> 00:04:45,812 that was published in 2016. 78 00:04:45,812 --> 00:04:49,401 And last but not least, here are links 79 00:04:49,401 --> 00:04:53,361 to the different parts of the work that I'm doing. 80 00:04:54,257 --> 00:04:55,561 Thank you very much. 81 00:04:55,561 --> 00:04:56,845 Questions. 82 00:04:56,845 --> 00:04:58,754 (applause) 83 00:05:10,397 --> 00:05:14,665 (woman) So, when you have a work, and you have the review of the work, 84 00:05:14,665 --> 00:05:17,703 are you looking at a particular edition of the work, 85 00:05:17,703 --> 00:05:20,665 or are these all reviews of first editions? 86 00:05:21,271 --> 00:05:22,861 It's a good question. No. 87 00:05:22,861 --> 00:05:25,601 They are not just reviews of the first edition. 88 00:05:25,601 --> 00:05:28,601 Some are reviews of the second or third edition. 89 00:05:30,062 --> 00:05:32,262 I'm going to add something that maybe I should have said 90 00:05:32,262 --> 00:05:34,951 before I closed and went to question and answers-- 91 00:05:34,966 --> 00:05:36,800 what's so special about this? 92 00:05:37,220 --> 00:05:40,461 What's special is nobody else has done this on Wikidata. 
93 00:05:41,454 --> 00:05:45,580 Surely, there are other universities that have their own collections, 94 00:05:45,580 --> 00:05:51,447 where their scholars have reviewed the reviews of someone's work 95 00:05:51,800 --> 00:05:53,394 in some language. 96 00:05:54,491 --> 00:05:57,389 So, hopefully, once this methodology gets-- 97 00:05:58,000 --> 00:06:02,390 once I write this up and the project is over and presented again, 98 00:06:02,390 --> 00:06:05,310 that there will be other universities, other libraries 99 00:06:05,310 --> 00:06:07,923 that will speak up and say, "We've got data sets, too, 100 00:06:08,248 --> 00:06:13,020 and we're going to go ahead and upload them into Wikidata ourselves," 101 00:06:13,020 --> 00:06:15,910 and then it'd be lovely to start doing some comparisons. 102 00:06:19,572 --> 00:06:22,060 Anyone? Jane. 103 00:06:22,093 --> 00:06:23,767 (Jane) Do you actually have books? 104 00:06:24,293 --> 00:06:26,889 Do you actually have the books-- are the books in existence, 105 00:06:26,889 --> 00:06:28,860 or are you actually doing metadata about books 106 00:06:28,860 --> 00:06:31,400 where we don't even know where the books are? 107 00:06:31,780 --> 00:06:34,829 Northeastern University actually has the book, 108 00:06:34,829 --> 00:06:37,209 or the essay, or the poem. 109 00:06:39,759 --> 00:06:45,392 And they have the critical review of the book, or the essay, or the poem. 110 00:06:45,755 --> 00:06:48,820 And they're working on the transcription of these, 111 00:06:48,820 --> 00:06:51,452 and they're not at 100% yet. 112 00:06:52,432 --> 00:06:56,256 They're not at 100%, but it's like, all things working on it. 113 00:07:00,218 --> 00:07:02,043 Any other questions? 114 00:07:05,697 --> 00:07:07,399 (host) We're going to wrap it up there. 115 00:07:07,399 --> 00:07:09,063 Thanks for being such a nice audience. 116 00:07:09,063 --> 00:07:11,677 (applause) 117 00:07:14,012 --> 00:07:18,581 Lady bug for [inaudible]. 
118 00:08:58,271 --> 00:08:59,372 (man) Finally got that. 119 00:08:59,372 --> 00:09:02,565 What I'm going to do is I'm just going to click on these to load. 120 00:09:02,565 --> 00:09:06,091 Just while-- is that new tab there? 121 00:09:06,946 --> 00:09:08,053 [inaudible] 122 00:09:08,053 --> 00:09:10,524 The first one? Yeah, perfect. 123 00:09:11,024 --> 00:09:13,503 Sorry, my German is not even rusty, 124 00:09:13,503 --> 00:09:15,251 it's simply non-existent. 125 00:09:15,663 --> 00:09:19,561 So, I'll just let them load, because then these queries can run 126 00:09:19,561 --> 00:09:22,728 while I'm sort of introducing what I was talking about and doing. 127 00:09:22,728 --> 00:09:24,795 So, hi, I'm Nav from Histropedia. 128 00:09:24,795 --> 00:09:28,169 And basically, for the last quite a few years, 129 00:09:28,169 --> 00:09:29,710 we've been relatively quiet, 130 00:09:29,710 --> 00:09:32,423 while we've been sort of working on technology and tools 131 00:09:32,423 --> 00:09:36,837 that we need to sort of develop, ultimately, Histropedia version 2, 132 00:09:36,837 --> 00:09:39,433 which is going to be, you know, this huge enhancement 133 00:09:39,433 --> 00:09:40,771 on the first version. 134 00:09:40,771 --> 00:09:43,270 Well, it's kind of in progress, but as we do it, 135 00:09:43,270 --> 00:09:45,236 we've been experimenting with these other tools, 136 00:09:45,236 --> 00:09:47,387 and building the technology that we're going to need. 137 00:09:48,132 --> 00:09:51,781 One really crucial part for this is the ability to sort of see 138 00:09:51,781 --> 00:09:55,085 the whole of history from the billions of years time scale, 139 00:09:55,085 --> 00:09:58,602 to up to the current day, 140 00:09:58,602 --> 00:10:00,638 and zooming all the way into single days. 141 00:10:00,638 --> 00:10:03,433 And ultimately, in the end, down to hours and minutes. 142 00:10:03,433 --> 00:10:06,517 We've managed to create a [inaudible] of update to our engine. 
143 00:10:06,517 --> 00:10:08,327 Other engines can already do this, 144 00:10:08,327 --> 00:10:11,122 but unfortunately, they also can't handle the large data sets. 145 00:10:11,122 --> 00:10:13,269 So, we finally got this update to our engine. 146 00:10:13,269 --> 00:10:15,392 It allows us to zoom to billions of years. 147 00:10:15,392 --> 00:10:19,533 So, recently-- the recently finished update, 148 00:10:19,533 --> 00:10:22,333 and it's basically, it's an update to our query viewer tool, 149 00:10:22,333 --> 00:10:24,482 which is like a live version of Histropedia 150 00:10:24,482 --> 00:10:26,832 just linked straight to Wikidata. 151 00:10:26,832 --> 00:10:29,092 So, it's literally based on a query, 152 00:10:29,092 --> 00:10:31,372 a live query, and we see the results of it. 153 00:10:31,372 --> 00:10:33,883 So, it's sort of separate to our main tool. 154 00:10:33,883 --> 00:10:37,502 So, I'm going to flick to the first one, which is my first experiment. 155 00:10:37,502 --> 00:10:39,716 And you'll forgive me, the queries-- 156 00:10:39,716 --> 00:10:42,181 the code was kind of finished not so long ago, 157 00:10:42,181 --> 00:10:44,736 and the queries, I've been trying to find out what can I find 158 00:10:44,736 --> 00:10:47,692 and what's interesting to look at, what's missing. 159 00:10:47,692 --> 00:10:52,154 So, I started off with a kind of, sort of, well-- 160 00:10:52,154 --> 00:10:54,241 So, that's not the right-- that's not Life on Earth. 161 00:10:54,241 --> 00:10:55,699 Is this Life on Earth? 162 00:10:56,123 --> 00:10:57,467 That will do, anyway. 163 00:10:57,467 --> 00:11:01,985 So, I started off just trying to look at what sort of things 164 00:11:01,985 --> 00:11:04,657 are actually in Wikidata. 165 00:11:04,657 --> 00:11:07,407 And this particular one-- sorry, it's in reverse. 166 00:11:07,407 --> 00:11:09,829 So, this is the first one I wanted to show you. 
167 00:11:09,829 --> 00:11:12,485 So, this is a kind of a life on Earth query 168 00:11:12,485 --> 00:11:14,457 that I wanted to develop. 169 00:11:14,457 --> 00:11:18,410 And basically, what it is is all the taxons in Wikidata 170 00:11:18,410 --> 00:11:20,157 that have a date. 171 00:11:20,157 --> 00:11:23,726 And as you can probably see from the panel, there is not many of them. 172 00:11:23,726 --> 00:11:25,784 But we do have the different taxon ranks. 173 00:11:25,784 --> 00:11:27,596 So, you know, is it a species, a class-- 174 00:11:27,596 --> 00:11:29,725 for a biologist, this makes a lot of sense. 175 00:11:29,725 --> 00:11:32,446 But if I was just to close that a bit, 176 00:11:32,596 --> 00:11:35,453 we can see, we are going back to the earliest forms of life here. 177 00:11:35,453 --> 00:11:37,236 3.5 billion years ago. 178 00:11:37,236 --> 00:11:42,707 And as we zoom in here, we start to see the more modern forms of life, 179 00:11:42,746 --> 00:11:47,232 and we see some really interesting things developing, 180 00:11:47,232 --> 00:11:50,829 but we're still lacking a lot of data in terms of this kind of time range. 181 00:11:52,250 --> 00:11:55,286 So, my next thought was, "Okay, well, why aren't--" 182 00:11:55,592 --> 00:11:57,088 "I want to see a Tyrannosaurus Rex." 183 00:11:57,088 --> 00:11:59,838 That's what I really wanted to see on my query, and it wasn't there. 184 00:11:59,838 --> 00:12:02,138 So, had a little dig in, and I found out why. 185 00:12:02,234 --> 00:12:05,284 It's because they're much more being stored 186 00:12:05,284 --> 00:12:08,696 in terms of the temporal range or time period that they relate to. 187 00:12:09,065 --> 00:12:11,412 So, on comes the next query, 188 00:12:11,412 --> 00:12:13,144 where I actually sort of-- 189 00:12:13,664 --> 00:12:17,641 basically, this query is looking for any item 190 00:12:17,641 --> 00:12:22,284 that has a temporal range start, and/or a temporal range end. 
191 00:12:22,665 --> 00:12:25,965 Which is basically in the form-- in life forms, it kind of relates 192 00:12:25,965 --> 00:12:28,644 to when they emerged and when they became extinct. 193 00:12:28,644 --> 00:12:31,044 So, these are the periods on the side here. 194 00:12:31,585 --> 00:12:33,190 If I just close that a bit-- 195 00:12:33,190 --> 00:12:37,364 you can see that we have quite a lot of interesting stuff. 196 00:12:37,364 --> 00:12:39,834 And there's the Tyrannosaurus that I was looking for. 197 00:12:39,834 --> 00:12:43,394 So, I finally got that, and I was like, "Yes! I've done it!" 198 00:12:43,394 --> 00:12:46,084 I've got that Triceratops in there for bonus. 199 00:12:46,084 --> 00:12:48,984 But of course, still loads missing. 200 00:12:48,984 --> 00:12:50,665 And I'd love to see lots more here. 201 00:12:50,665 --> 00:12:52,590 But at least, it gives you the idea. 202 00:12:52,590 --> 00:12:55,794 The nice thing is, here as well, if I star some of these, 203 00:12:55,794 --> 00:12:58,374 you can see that the time range is shown. 204 00:12:58,374 --> 00:13:01,027 So, you can start to do what I really wanted to do, is say, 205 00:13:01,027 --> 00:13:04,004 "Okay, when did this one end, and when did the next one begin? 206 00:13:04,004 --> 00:13:06,085 When did things start going extinct?" 207 00:13:06,085 --> 00:13:09,832 So, I was pretty excited, but, still, really hoping for a lot more. 208 00:13:09,832 --> 00:13:11,619 So, there's a lot of editing to be done 209 00:13:11,619 --> 00:13:15,098 in terms of these large geological and cosmic time scales. 210 00:13:15,909 --> 00:13:19,273 You can see on the color code, I can also do extinction period. 211 00:13:19,273 --> 00:13:23,489 So, I say, I want to find out stuff that went extinct in the late Cretaceous. 212 00:13:23,489 --> 00:13:25,768 And I now know that two things did that. 213 00:13:25,768 --> 00:13:27,717 There's obviously quite a few more. 
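The query viewer described here runs a live SPARQL query against Wikidata. Below is a minimal sketch of the temporal-range query, assuming temporal range start (P523) and temporal range end (P524) point at geological period items, and assuming the public endpoint https://query.wikidata.org/sparql. The optional extinction filter takes a period QID that you supply, since none is given in the talk.

```python
import re
from typing import Optional
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def temporal_range_query(lang: str = "en", extinct_in: Optional[str] = None,
                         limit: int = 1000) -> str:
    """Items with a temporal range start (P523) and/or end (P524)."""
    extinction_filter = ""
    if extinct_in is not None:
        if not re.fullmatch(r"Q\d+", extinct_in):
            raise ValueError(f"not an item ID: {extinct_in!r}")
        # Keep only life forms whose temporal range ends in the given period.
        extinction_filter = f"?item wdt:P524 wd:{extinct_in} ."
    return f"""SELECT DISTINCT ?item ?itemLabel ?emerged ?emergedLabel ?extinct ?extinctLabel WHERE {{
  ?item wdt:P523|wdt:P524 [] .
  {extinction_filter}
  OPTIONAL {{ ?item wdt:P523 ?emerged . }}
  OPTIONAL {{ ?item wdt:P524 ?extinct . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang},en" . }}
}}
LIMIT {limit}"""


def wdqs_url(query: str) -> str:
    """GET URL that asks the query service for SPARQL JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


print(wdqs_url(temporal_range_query(lang="fr")))
```

The `wdt:P523|wdt:P524 []` path implements the "start and/or end" condition, and `DISTINCT` removes the duplicate rows it produces for items that have both.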
214 00:13:27,717 --> 00:13:30,483 And I put the taxon rank in there, as well, 215 00:13:30,483 --> 00:13:31,986 just so that we can also see, 216 00:13:31,986 --> 00:13:34,588 "Okay, which, what is its species, genus, et cetera." 217 00:13:35,479 --> 00:13:37,143 So, pretty exciting. 218 00:13:37,143 --> 00:13:41,192 I was quite happy, but it's unfolding, what needs to be done a lot. 219 00:13:42,126 --> 00:13:45,447 So I went to the next one, which was-- 220 00:13:45,447 --> 00:13:48,045 I was thinking, "Well, I can't find all the data I'm looking for. 221 00:13:48,045 --> 00:13:49,347 Let's go a bit more general, 222 00:13:49,347 --> 00:13:53,833 and just look for all of a certain kind of dates in Wikidata that I can find 223 00:13:53,833 --> 00:13:57,240 that are over 10,000 years old, basically. 224 00:13:58,219 --> 00:14:00,703 And what type of thing are they?" 225 00:14:00,762 --> 00:14:04,298 So, this color code is relatively okay, but it might be a bit misleading, 226 00:14:04,298 --> 00:14:06,264 because some things are multiple types. 227 00:14:06,264 --> 00:14:08,318 So, therefore, it's a bit random, at times. 228 00:14:08,318 --> 00:14:11,468 But, you get some really fascinating stuff in here. 229 00:14:11,468 --> 00:14:14,255 I've got for a start-- I've got all of the millennia 230 00:14:14,255 --> 00:14:18,238 that we have in Wikidata, which is, you know, there you go. 231 00:14:18,238 --> 00:14:21,558 Read about everything that happened in all these different millennia. 232 00:14:21,558 --> 00:14:23,629 No pictures for any of these, unfortunately. 233 00:14:23,629 --> 00:14:26,670 So, there's nothing to really say what happened in them. 234 00:14:26,670 --> 00:14:29,203 Taxon, which we were just looking at, which kind of led me on 235 00:14:29,203 --> 00:14:31,124 to the other queries. 236 00:14:31,124 --> 00:14:34,079 And of course, that sort of like all of them in one group. 237 00:14:34,079 --> 00:14:36,875 Interesting stuff. 
Archaeological cultures. 238 00:14:36,875 --> 00:14:40,121 And this is like, okay, this is more like up my street. 239 00:14:40,121 --> 00:14:42,670 This is the sort of things I want to learn about. 240 00:14:42,670 --> 00:14:45,234 Again, pictures would be nice. 241 00:14:45,493 --> 00:14:48,781 But it's really showing you something interesting. 242 00:14:48,781 --> 00:14:50,361 And it's just worth exploring here. 243 00:14:50,361 --> 00:14:52,534 And of course, there's some that really make me excited 244 00:14:52,534 --> 00:14:54,048 for what we could be doing. 245 00:14:54,048 --> 00:14:57,288 For example, there was something here which was-- 246 00:14:58,028 --> 00:15:00,888 I mean, system, actually, was quite an interesting one. 247 00:15:01,794 --> 00:15:04,237 And sorry, that's not actually the one I was thinking about. 248 00:15:04,237 --> 00:15:05,958 In fact, that means nothing to me at all. 249 00:15:05,958 --> 00:15:07,613 Someone might know what that means. 250 00:15:08,057 --> 00:15:10,813 Art movements, archaeological sites, activities. 251 00:15:10,813 --> 00:15:12,478 There was only two of these, 252 00:15:12,478 --> 00:15:15,788 but I really like the idea, because-- and they're both the same. 253 00:15:15,788 --> 00:15:17,658 They're both hunting. 254 00:15:17,730 --> 00:15:19,390 And of course, there's two of them. 255 00:15:19,390 --> 00:15:22,360 And the reason is, is because there's a little qualifier on there. 256 00:15:22,360 --> 00:15:25,143 If we were to just look through, we can see-- 257 00:15:25,143 --> 00:15:27,735 we can see somewhere down here, will be the start time. 258 00:15:27,735 --> 00:15:30,690 And the qualifier is talking about when Homo erectus did it, 259 00:15:30,690 --> 00:15:32,735 and when Homo sapiens did it. 260 00:15:32,735 --> 00:15:35,513 So that should be in brackets on the query, 261 00:15:35,513 --> 00:15:39,002 a little extension to do to show you what the two different versions mean. 
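Searching for dates "over 10,000 years old" runs into a practical snag: Python's `datetime` only covers years 1 to 9999, so deep-time values have to be handled as plain numbers. The sketch below assumes Wikidata's JSON time string layout, which is a sign, the year, then month and day (00 when unknown). The cutoff and the current year are parameters. BCE off-by-one conventions are ignored, which is harmless at this scale.

```python
import re
from datetime import date

# e.g. "+1811-10-30T00:00:00Z" or "-3500000000-00-00T00:00:00Z"
TIME_RE = re.compile(r"^([+-])(\d+)-(\d{2})-(\d{2})T")


def wikidata_year(time_value: str) -> int:
    """Signed year from a Wikidata time string (negative means BCE)."""
    m = TIME_RE.match(time_value)
    if m is None:
        raise ValueError(f"unrecognised time value: {time_value!r}")
    year = int(m.group(2))
    return -year if m.group(1) == "-" else year


def older_than(time_value: str, years: int, current_year: int) -> bool:
    """True if the date lies more than `years` years before `current_year`."""
    return current_year - wikidata_year(time_value) > years


values = ["-3500000000-00-00T00:00:00Z", "-0012000-00-00T00:00:00Z", "+1770-01-01T00:00:00Z"]
print([v for v in values if older_than(v, 10_000, current_year=date.today().year)])
```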
262 00:15:39,002 --> 00:15:42,390 But I would love to see all of human skills in here. 263 00:15:42,390 --> 00:15:44,708 When did we first do farming, when did we first this-- 264 00:15:44,708 --> 00:15:46,010 when did fire come about? 265 00:15:46,010 --> 00:15:48,270 All of these things, when did we first extract iron? 266 00:15:48,270 --> 00:15:50,355 When did we first-- all of these wonderful things 267 00:15:50,355 --> 00:15:53,607 that developed to modern world that we live in. 268 00:15:53,607 --> 00:15:56,873 So, really exciting signs of what could be there, 269 00:15:56,873 --> 00:15:58,112 if it all got populated. 270 00:15:58,112 --> 00:16:00,210 So, you know, this is what we really need to work on, 271 00:16:00,210 --> 00:16:02,333 is some of this historical info. 272 00:16:03,243 --> 00:16:05,060 Last one, I just wanted to just show you, 273 00:16:05,060 --> 00:16:07,283 which was just an extra bonus one I threw in, 274 00:16:07,283 --> 00:16:10,875 just to look at the time periods that we actually have, 275 00:16:10,875 --> 00:16:13,921 the historical ages that we have in Wikidata. 276 00:16:13,921 --> 00:16:17,524 And so, this is actually just all sub-classes of unit of time. 277 00:16:17,524 --> 00:16:22,396 And then, this is the actual instance that it was. 278 00:16:22,396 --> 00:16:23,775 And it's just really interesting. 279 00:16:23,775 --> 00:16:25,849 This is more the kind of thing-- 280 00:16:26,979 --> 00:16:29,541 In Histropedia Mark II, these are the kind of things 281 00:16:29,541 --> 00:16:31,944 that will actually will be displayed more under the timeline 282 00:16:31,944 --> 00:16:33,984 as a sort of a range or period. 
283 00:16:33,993 --> 00:16:36,436 And so, we are particularly interested in these periods 284 00:16:36,436 --> 00:16:37,976 being really tight and nice, 285 00:16:37,976 --> 00:16:40,718 because it helps you to, then, say what happened when, 286 00:16:40,718 --> 00:16:43,983 and you can sound really clever when you talk about when things happened, 287 00:16:43,983 --> 00:16:47,263 in the Neolithic or the upper Paleolithic, or whatever. 288 00:16:47,263 --> 00:16:49,121 I'm still pretty clueless on most of it, 289 00:16:49,121 --> 00:16:51,918 because I'm just kind of just waiting for the data to be up to scratch. 290 00:16:51,918 --> 00:16:55,163 Great. I think I can actually round it up there. 291 00:16:55,163 --> 00:16:57,145 Loads more exciting queries to come. 292 00:16:57,145 --> 00:17:00,420 A lot more features and cool stuff, actually, just around the corner for us, 293 00:17:00,420 --> 00:17:02,758 because we've just finished a lot of cool things. 294 00:17:02,758 --> 00:17:05,471 But there's a little bit of time to pull it all together. 295 00:17:05,471 --> 00:17:07,373 So, look out for more. 296 00:17:07,373 --> 00:17:09,760 If there's any questions, I think I've got one minute. 297 00:17:09,760 --> 00:17:11,458 So, it would have to be one. 298 00:17:11,510 --> 00:17:13,253 (host) Yes, Nav. I forgot to introduce you. 299 00:17:13,253 --> 00:17:16,933 I'm sorry. That's Nav, as he said, Histropedia, Evans. Thank you very much. 300 00:17:16,933 --> 00:17:17,986 Thank you. Cheers. Yeah. 301 00:17:17,986 --> 00:17:19,450 (host) Very fast questions. 302 00:17:19,450 --> 00:17:21,815 Anyone with a very fast question [inaudible]. 303 00:17:24,654 --> 00:17:29,230 (woman 2) Very quickly, how can I do my own, if I want languages, 304 00:17:29,230 --> 00:17:30,818 when do we start, for instance. 305 00:17:30,818 --> 00:17:32,031 Absolutely. Good question. 306 00:17:32,031 --> 00:17:34,320 So just click on the-- oh, I've shared this. 
307 00:17:34,320 --> 00:17:36,853 It's called cosmic timelines on the URL. 308 00:17:36,853 --> 00:17:40,911 Should be cosmic and geological, but then it's not a short URL anymore. 309 00:17:40,911 --> 00:17:43,711 So, you click on this icon in the top corner there, 310 00:17:43,711 --> 00:17:47,431 and then, you get to the query page, which is like the home page of this tool. 311 00:17:47,431 --> 00:17:49,311 This is where the query is pasted in. 312 00:17:49,311 --> 00:17:51,491 So, at the moment, I've got the language there. 313 00:17:51,491 --> 00:17:53,483 If I want to change it to something else, 314 00:17:53,483 --> 00:17:56,062 Arabic, or French, or whatever-- 315 00:17:56,062 --> 00:17:58,271 and here are the-- this is the area 316 00:17:58,271 --> 00:18:03,092 where you sort of enter in exactly which variables in your query 317 00:18:03,092 --> 00:18:04,600 you would like to do each thing. 318 00:18:04,600 --> 00:18:06,781 If you put nothing in, it will try and figure it out. 319 00:18:06,781 --> 00:18:09,971 But if you want advanced stuff-- and really important, is the precision, 320 00:18:09,971 --> 00:18:13,033 because that's not available on the query service timeline. 321 00:18:13,033 --> 00:18:14,123 So, you get everything-- 322 00:18:14,123 --> 00:18:16,303 is the first of January 10 billion years ago, 323 00:18:16,303 --> 00:18:18,363 you know, which is not what we want to see. 324 00:18:18,363 --> 00:18:20,603 And the rank, which is quite interesting. 325 00:18:20,603 --> 00:18:24,173 My timelines are all based on a very simple rank of site link count, 326 00:18:24,173 --> 00:18:27,058 how many different articles there are, or something else. 327 00:18:27,058 --> 00:18:29,432 But that's how you go and mess around with it with yourself, 328 00:18:29,432 --> 00:18:32,034 and you put your color codes and your filters in down here. 
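The precision point can be made concrete. Wikidata stores a precision code with every time value (0 for a billion years up to 11 for a day). A display that ignores it prints "1 January" for a date known only to the nearest billion years. The sketch below shows precision-aware formatting plus the simple sitelink-count ranking mentioned in the talk. The output wording is our own choice, not Histropedia's. The ranking assumes each row carries a `sitelinks` count, which a query could select with `?item wikibase:sitelinks ?sitelinks`.

```python
import calendar

# Wikidata time precision codes: 0 = billion years ... 5 = ten thousand years,
# 6 = millennium, 7 = century, 8 = decade, 9 = year, 10 = month, 11 = day.
COARSE = {0, 1, 2, 3, 4, 5}
ROUNDED = {6: 1000, 7: 100, 8: 10}


def format_at_precision(year: int, precision: int, month: int = 1, day: int = 1) -> str:
    """Render a signed year (negative = BCE) only as precisely as it is known."""
    n = abs(year)
    era = " BCE" if year < 0 else ""
    if precision in COARSE:
        if n >= 10**9:
            return f"{n / 10**9:g} billion years{era}"
        if n >= 10**6:
            return f"{n / 10**6:g} million years{era}"
        return f"{n:,} years{era}"
    if precision in ROUNDED:
        unit = ROUNDED[precision]
        return f"{n // unit * unit}s{era}"
    if precision == 9:
        return f"{n}{era}"
    if precision == 10:
        return f"{calendar.month_name[month]} {n}{era}"
    return f"{day} {calendar.month_name[month]} {n}{era}"  # day or finer


def rank_by_sitelinks(rows: list) -> list:
    """Most-linked items first: a crude but cheap importance ranking."""
    return sorted(rows, key=lambda row: row.get("sitelinks", 0), reverse=True)


print(format_at_precision(-3_500_000_000, 0))
print(format_at_precision(1811, 11, month=10, day=30))
```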
329 00:18:32,034 --> 00:18:34,098 Comma-separate them, if you would like more, 330 00:18:34,098 --> 00:18:36,007 and they come up as options in the final tool. 331 00:18:36,007 --> 00:18:37,836 And I think that pretty much is it, isn't it? 332 00:18:37,836 --> 00:18:39,863 So, any other questions, do find me afterwards. 333 00:18:39,863 --> 00:18:41,655 Always happy to get cornered for this stuff. 334 00:18:41,655 --> 00:18:42,954 I love talking about it. 335 00:18:42,954 --> 00:18:44,989 Okay. So, thank you very much. Cheers. 336 00:18:44,989 --> 00:18:46,948 (applause) 337 00:19:28,344 --> 00:19:30,220 (mumbles) 338 00:19:30,265 --> 00:19:32,115 So, where is the first one? 339 00:19:33,854 --> 00:19:35,397 This one, no. 340 00:19:45,636 --> 00:19:47,132 This? Sorry. 341 00:19:48,270 --> 00:19:50,090 Is it full screen? 342 00:19:50,217 --> 00:19:52,129 Yep. Full screen. 343 00:19:54,747 --> 00:19:56,289 Well, good work. 344 00:19:58,388 --> 00:19:59,434 [Strike.] 345 00:19:59,497 --> 00:20:02,312 Yeah, so, okay. Thank you. 346 00:20:04,752 --> 00:20:07,062 So, hi, I'm Thibaud Senalada. 347 00:20:07,062 --> 00:20:08,952 As [inaudible] introduced me. 348 00:20:09,552 --> 00:20:14,212 I'm a software engineer at the French National Library. 349 00:20:14,992 --> 00:20:18,349 And I'm here today to talk to you about NOEMI, 350 00:20:18,979 --> 00:20:23,682 which is a software, a proof of concept, 351 00:20:23,682 --> 00:20:26,501 and a [inaudible] software 352 00:20:26,635 --> 00:20:29,961 to the French Library to cataloging. 353 00:20:30,787 --> 00:20:32,870 Sorry. [inaudible]. 354 00:20:32,870 --> 00:20:35,359 Sorry for my English. It's a bit of fuzzy. 355 00:20:36,971 --> 00:20:39,321 And so, what's NOEMI? 356 00:20:39,321 --> 00:20:41,589 So, NOEMI stands for: 357 00:20:41,589 --> 00:20:44,591 Nouer les Oeuvres, Expressions, Manifestations et Items.
358 00:20:44,591 --> 00:20:46,533 Which, in English, is: 359 00:20:46,533 --> 00:20:49,891 to link work, expression, manifestation, and items. 360 00:20:51,086 --> 00:20:58,057 It's based on the FRBR, 361 00:20:58,057 --> 00:21:00,633 and [inaudible]. 362 00:21:00,881 --> 00:21:03,105 Yeah. Anyway. 363 00:21:03,631 --> 00:21:04,839 So, yeah. 364 00:21:05,244 --> 00:21:09,540 So, this software, we use to produce metadata. 365 00:21:10,841 --> 00:21:12,201 It will be used 366 00:21:12,201 --> 00:21:17,831 by 600 people on a daily basis. 367 00:21:18,911 --> 00:21:24,271 And as I say in the title, it will be based on Wikibase. 368 00:21:25,415 --> 00:21:31,871 So, there is also a format manager. 369 00:21:32,388 --> 00:21:39,138 So, people using this software will use like a code editor, 370 00:21:39,254 --> 00:21:41,817 but for MARC format. 371 00:21:41,968 --> 00:21:45,178 So, it's [inaudible], things like that. 372 00:21:46,814 --> 00:21:49,868 A data processing tool, like I said. 373 00:21:49,959 --> 00:21:53,040 And also, authorization management, 374 00:21:54,327 --> 00:21:56,378 because they will need a-- 375 00:21:57,337 --> 00:22:01,417 if there is some data, where it can be modified. 376 00:22:05,877 --> 00:22:07,840 So, the PoC context. 377 00:22:08,728 --> 00:22:12,738 So, this software will be replacing an old software, 378 00:22:12,855 --> 00:22:15,688 called ADCAT02. 379 00:22:17,111 --> 00:22:20,964 It is part of the bibliographic transition. 380 00:22:20,984 --> 00:22:24,554 So, I say the [inaudible]. 381 00:22:25,359 --> 00:22:29,390 [inaudible]. [inaudible] in English? 382 00:22:30,254 --> 00:22:31,662 Format. 383 00:22:32,717 --> 00:22:35,734 And it will be the [inaudible] of the-- 384 00:22:39,979 --> 00:22:41,090 Sorry. 385 00:22:42,349 --> 00:22:46,560 It will be [inaudible] all the [inaudible] 386 00:22:46,560 --> 00:22:49,689 of the BnF with data. 
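NOEMI is named after the four FRBR Group 1 entities it links. The toy sketch below models that chain: a work is realized through expressions, an expression is embodied in manifestations, and a manifestation is exemplified by items. It then flattens the chain into subject-predicate-object triples of the kind a graph store such as Blazegraph holds. The identifiers and predicate names are illustrative only. They are not BnF's INTERMARC fields or Wikibase properties.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Item:            # one physical or digital copy
    id: str


@dataclass
class Manifestation:   # an edition or publication
    id: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:      # a particular text, e.g. an original or a translation
    id: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:            # the abstract intellectual creation
    id: str
    expressions: List[Expression] = field(default_factory=list)


Triple = Tuple[str, str, str]


def to_triples(work: Work) -> Iterator[Triple]:
    """Walk the work -> expression -> manifestation -> item chain as triples."""
    for e in work.expressions:
        yield (work.id, "isRealizedThrough", e.id)
        for m in e.manifestations:
            yield (e.id, "isEmbodiedIn", m.id)
            for i in m.items:
                yield (m.id, "isExemplifiedBy", i.id)


def items_of(work: Work) -> List[str]:
    """Every copy held for any expression of the work."""
    return [i.id for e in work.expressions for m in e.manifestations for i in m.items]


# Hypothetical identifiers for a work with an original text and a translation.
work = Work("W1", [
    Expression("E1-fr", [Manifestation("M1-1880", [Item("I1"), Item("I2")])]),
    Expression("E2-en", [Manifestation("M2-1890", [Item("I3")])]),
])
for triple in to_triples(work):
    print(triple)
```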
387 00:22:51,731 --> 00:22:54,124 And so, doing this work, 388 00:22:54,124 --> 00:22:59,693 we assessed Wikibase to see if it fits our needs. 389 00:23:01,244 --> 00:23:03,383 And [inaudible] pretty good. 390 00:23:04,485 --> 00:23:06,930 So, why Wikibase? 391 00:23:06,930 --> 00:23:08,821 Because of the flexibility of the format. 392 00:23:08,835 --> 00:23:11,646 We arrive-- 393 00:23:11,850 --> 00:23:16,388 to inject MARC, INTERMARC for BnF-- 394 00:23:16,960 --> 00:23:18,350 in the database. 395 00:23:18,399 --> 00:23:22,803 And use it to-- use this link management 396 00:23:22,803 --> 00:23:25,529 between entities using Blazegraph, 397 00:23:25,529 --> 00:23:27,776 so, as Wikibase does. 398 00:23:29,155 --> 00:23:32,700 We also choose Wikibase, because it was already-- 399 00:23:35,183 --> 00:23:38,900 it handles history and user account. 400 00:23:39,941 --> 00:23:42,414 So, it's easiest for us. 401 00:23:43,106 --> 00:23:48,270 And it also has a good-- it's pretty easy to create bots 402 00:23:48,270 --> 00:23:51,090 to watch and curate data 403 00:23:51,840 --> 00:23:53,430 and also to make statistics. 404 00:23:54,820 --> 00:23:57,170 It's free and open, and sustainable. 405 00:23:57,908 --> 00:23:59,084 Yeah, so. 406 00:23:59,610 --> 00:24:02,519 I'm sorry if you don't understand what I say, 407 00:24:02,519 --> 00:24:04,839 because I know my English is not that good. 408 00:24:07,720 --> 00:24:12,139 But during this PoC, we encountered some trouble. 409 00:24:12,802 --> 00:24:13,938 Okay. 410 00:24:14,790 --> 00:24:21,117 First of all, as a search engine, I think we have to create 411 00:24:21,117 --> 00:24:24,150 another-- 412 00:24:24,185 --> 00:24:28,988 not another, a supplementary search engine to use it with, 413 00:24:29,433 --> 00:24:31,120 to fit our needs. 414 00:24:31,688 --> 00:24:37,155 Because we need some search 415 00:24:37,155 --> 00:24:42,366 like faceted search and filters.
416 00:24:43,755 --> 00:24:47,525 Also, we have the [inaudible] 417 00:24:47,525 --> 00:24:50,407 of using a PostgreSQL database. 418 00:24:50,407 --> 00:24:54,885 And for the moment, I think Wikibase [inaudible]. 419 00:24:56,436 --> 00:25:01,266 And when we tried to use PostgreSQL, it was a bit difficult, 420 00:25:01,266 --> 00:25:04,394 and it caused some issues. 421 00:25:05,662 --> 00:25:08,825 We also have some concerns about performance, 422 00:25:08,825 --> 00:25:15,238 because the catalog is about 20 million entities, 423 00:25:16,366 --> 00:25:19,146 20 million bibliographic entities. 424 00:25:19,146 --> 00:25:22,851 It could be more than 20 million entities, actually. 425 00:25:23,276 --> 00:25:27,771 And we don't know how long it will take to inject them 426 00:25:27,809 --> 00:25:30,765 into Wikibase, or how to do it. 427 00:25:32,198 --> 00:25:34,267 So, [inaudible], 428 00:25:34,324 --> 00:25:39,616 but the real software development has already started. 429 00:25:43,242 --> 00:25:46,175 We started by creating an interface with Wikibase. 430 00:25:46,261 --> 00:25:47,711 We're using Java. 431 00:25:48,091 --> 00:25:50,093 Like PyWikibase. 432 00:25:51,691 --> 00:25:54,888 - (man) Pywikibot. - Pywikibot. Yeah, thank you. 433 00:25:56,027 --> 00:25:57,723 The same way, but in Java. 434 00:25:59,309 --> 00:26:02,831 We have also already injected the format into Wikibase. 435 00:26:03,540 --> 00:26:09,093 And we are building something like the INTERMARC editor, 436 00:26:09,458 --> 00:26:12,134 [inaudible], et cetera. 437 00:26:13,672 --> 00:26:14,926 Thank you. 438 00:26:15,333 --> 00:26:17,135 (applause) 439 00:26:23,527 --> 00:26:24,749 Yeah. 440 00:26:27,748 --> 00:26:29,813 (man 2) Faceted search would be a nice feature 441 00:26:29,813 --> 00:26:31,885 in the Wikidata UI itself. 442 00:26:31,924 --> 00:26:34,062 So, have you talked to any of the developers, 443 00:26:34,062 --> 00:26:35,675 or is that something that could be done?
444 00:26:35,711 --> 00:26:37,108 Sorry, I don't understand. 445 00:26:37,108 --> 00:26:39,041 (man 2) The faceted search idea. 446 00:26:39,911 --> 00:26:41,982 It would be nice to be able to search only humans, 447 00:26:41,982 --> 00:26:44,221 or search only works, or something, right? 448 00:26:44,321 --> 00:26:47,991 Yeah. I'm sorry, I don't-- I don't-- 449 00:26:48,131 --> 00:26:50,436 (man 2) Yeah, I mean, so, it would be nice if we had that 450 00:26:50,436 --> 00:26:52,265 in Wikidata itself in the UI. 451 00:26:52,822 --> 00:26:53,954 Yeah, yeah, yeah. 452 00:26:54,088 --> 00:26:56,077 [inaudible] 453 00:26:56,077 --> 00:26:57,911 Yeah, okay, thank you. 454 00:26:57,911 --> 00:27:00,026 I'm sorry. (laughs) 455 00:27:01,186 --> 00:27:03,902 Yeah, yeah. But I think we will-- 456 00:27:04,506 --> 00:27:07,266 I don't know if we want to do it inside Wikibase, 457 00:27:07,266 --> 00:27:10,746 or in our next systems. 458 00:27:10,785 --> 00:27:15,186 For the moment, we haven't really solved that. 459 00:27:15,965 --> 00:27:17,885 For the moment, I think. 460 00:27:17,885 --> 00:27:19,285 Sorry. 461 00:27:27,645 --> 00:27:30,644 (man 3) I suppose on the topic of the faceted search, 462 00:27:32,535 --> 00:27:35,068 Wikidata, SPARQL Query, Wikibase-- 463 00:27:35,068 --> 00:27:38,965 SPARQL Query is, I think, functionally equivalent 464 00:27:38,965 --> 00:27:41,405 to a facetable search. 465 00:27:42,105 --> 00:27:44,234 So, it's mostly an interface issue, right? 466 00:27:44,284 --> 00:27:47,791 I mean, you could build an interface that starts with a query, 467 00:27:47,791 --> 00:27:51,111 and then gives you possible facets to filter by. 468 00:27:51,370 --> 00:27:52,660 And when you click one of them, 469 00:27:52,660 --> 00:27:55,217 it adds a condition to the SPARQL Query, right?
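The facet idea suggested here can be sketched as a small query builder: the interface keeps a base set of SPARQL triple patterns, each clicked facet appends one more condition, and a grouped count query lists candidate values for the next facet. This is a minimal Python sketch under stated assumptions, not an existing Wikidata or Wikibase feature: the `Facet` type and the function names are hypothetical, the browsed entities are assumed to be bound to `?item`, and the `wd:`/`wdt:` prefixes are assumed to be the ones predefined on the Wikidata Query Service. The code only builds query strings; it sends nothing over the network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facet:
    """A hypothetical facet: one property/value pair of Wikidata IDs."""
    prop: str   # e.g. "P106" (occupation)
    value: str  # e.g. "Q36180" (writer)


def where_patterns(base: list[str], facets: list[Facet]) -> list[str]:
    # Each selected facet adds exactly one triple pattern on ?item.
    return base + [f"?item wdt:{f.prop} wd:{f.value} ." for f in facets]


def results_query(base: list[str], facets: list[Facet], limit: int = 50) -> str:
    """SELECT the items matching the base patterns plus every selected facet."""
    body = "\n  ".join(where_patterns(base, facets))
    return f"SELECT ?item WHERE {{\n  {body}\n}}\nLIMIT {limit}"


def next_facet_query(base: list[str], facets: list[Facet], prop: str) -> str:
    """Count the values of `prop` among the current results, i.e. the
    options an interface could offer as the next facet to click."""
    patterns = where_patterns(base, facets) + [f"?item wdt:{prop} ?value ."]
    body = "\n  ".join(patterns)
    return (
        f"SELECT ?value (COUNT(DISTINCT ?item) AS ?n) WHERE {{\n  {body}\n}}\n"
        "GROUP BY ?value\nORDER BY DESC(?n)"
    )


if __name__ == "__main__":
    base = ["?item wdt:P31 wd:Q5 ."]               # start from humans
    chosen = [Facet("P106", "Q36180")]              # clicked: occupation = writer
    print(results_query(base, chosen))
    print(next_facet_query(base, chosen, "P27"))    # offer country of citizenship next
```

Because a click only appends one pattern, the interface stays thin and the query engine does all of the filtering.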
470 00:27:55,664 --> 00:27:58,183 Yeah, but I think the SPARQL-- 471 00:27:59,157 --> 00:28:04,310 it doesn't go into as much detail as we want, as we have-- 472 00:28:05,632 --> 00:28:09,631 When we inject the format, we use a statement for-- 473 00:28:10,525 --> 00:28:13,124 the format is like XML. 474 00:28:13,223 --> 00:28:15,842 So, it's a zone, subzone, and value. 475 00:28:16,413 --> 00:28:20,292 And in the [inaudible] statement, we add the subzone, 476 00:28:20,892 --> 00:28:22,902 because the zone was already there. 477 00:28:23,002 --> 00:28:28,565 And we want to query some qualifiers on this. 478 00:28:28,659 --> 00:28:35,206 And I don't know if the SPARQL goes through that-- I'm sorry-- 479 00:28:36,145 --> 00:28:38,277 in a fast way. 480 00:28:40,025 --> 00:28:46,285 I think we need some index for us to [inaudible]. 481 00:28:46,925 --> 00:28:48,145 Yeah. 482 00:28:48,145 --> 00:28:50,250 (man 3) SPARQL doesn't do a query-- 483 00:28:52,321 --> 00:28:55,703 Doing proper string searches in SPARQL is very hard. 484 00:28:55,703 --> 00:28:57,610 You have to have filters, which are slow, 485 00:28:57,610 --> 00:28:59,815 and it really doesn't work that well. 486 00:28:59,815 --> 00:29:02,845 So, it's a different search problem, really. 487 00:29:06,871 --> 00:29:09,270 More questions? If anyone has one? 488 00:29:12,215 --> 00:29:13,999 - Great. Thank you. - Thank you. 489 00:29:14,044 --> 00:29:15,895 (applause) 490 00:29:37,766 --> 00:29:41,960 (host) Nielsen speaking about the tool Ordia. Thank you. 491 00:30:05,084 --> 00:30:06,460 So, I'm Finn Årup Nielsen, 492 00:30:06,460 --> 00:30:09,006 and a couple of years ago, I started Scholia, 493 00:30:09,006 --> 00:30:14,611 which displays data from Wikidata via a SPARQL Query 494 00:30:14,611 --> 00:30:16,359 to the Wikidata Query Service, 495 00:30:16,359 --> 00:30:18,959 so we can generate, for example, a list of publications 496 00:30:18,959 --> 00:30:20,380 for a specific author.
497 00:30:20,866 --> 00:30:26,941 Now, last year, Wikidata introduced lexicographic data. 498 00:30:29,332 --> 00:30:32,655 And I [inaudible] the idea of Scholia, 499 00:30:32,655 --> 00:30:39,279 that is, using Wikidata and the Wikidata Query Service 500 00:30:39,445 --> 00:30:42,036 to generate overviews of lexicographic data. 501 00:30:42,585 --> 00:30:46,125 So, Ordia is the example of this one here. 502 00:30:46,197 --> 00:30:51,998 So, it generates-- it's a web application run from the Toolforge service, 503 00:30:51,998 --> 00:30:57,198 and for example, it will dynamically generate a page such as-- 504 00:30:57,234 --> 00:31:01,768 This one here is statistics over what there is of lexicographic data 505 00:31:01,768 --> 00:31:03,841 in Wikidata. 506 00:31:03,992 --> 00:31:07,404 For example, the number of lexemes is currently over 200,000. 507 00:31:08,664 --> 00:31:10,483 So, there's a range of things you can do here. 508 00:31:10,483 --> 00:31:12,916 You can, for example, look in the aspects of that. 509 00:31:12,916 --> 00:31:15,560 In the menu, there are quite a lot of things here. 510 00:31:15,560 --> 00:31:18,485 And so, I will search on a specific Danish lexeme. 511 00:31:19,503 --> 00:31:22,835 "Rød"-- which is "red" in Danish. 512 00:31:23,376 --> 00:31:27,466 So, you basically get, for the specific lexeme, 513 00:31:28,286 --> 00:31:30,618 the same type of information that you could see 514 00:31:30,618 --> 00:31:33,751 in the ordinary part of Wikidata, here. 515 00:31:34,451 --> 00:31:38,256 Annotations about the lexeme, annotations about the forms, 516 00:31:39,359 --> 00:31:40,872 singular or plural forms. 517 00:31:41,548 --> 00:31:43,501 Annotations about the senses. 518 00:31:44,683 --> 00:31:47,678 But what you can't see in ordinary Wikidata 519 00:31:47,678 --> 00:31:52,150 is sort of an aggregation across lexemes. 520 00:31:52,246 --> 00:31:54,207 And this is, for example, down here-- 521 00:31:54,207 --> 00:31:55,902 down here with the compounds.
522 00:31:55,902 --> 00:31:57,764 So, in Danish, like in German, 523 00:31:57,764 --> 00:31:59,950 words can be compounded. 524 00:31:59,950 --> 00:32:03,478 For example, for "red", we have rødkælk [robin], 525 00:32:03,478 --> 00:32:05,830 which is compounded from two words. 526 00:32:06,721 --> 00:32:10,085 And we've got, on the second one here, rødvin-- red wine. 527 00:32:11,060 --> 00:32:15,691 This list here is constructed by a SPARQL Query to the Wikidata Query Service. 528 00:32:16,751 --> 00:32:20,406 And also, further down here, we've got a lot of Danish words here. 529 00:32:20,970 --> 00:32:26,122 Further down here, we should have a graph of the words 530 00:32:27,426 --> 00:32:29,164 which are compounded from rød. 531 00:32:29,658 --> 00:32:31,980 We have [rød]-- red here in the middle. 532 00:32:31,980 --> 00:32:34,372 And for example, around-- somewhere around here, 533 00:32:34,372 --> 00:32:36,895 we should have, for example, "red cabbage," 534 00:32:36,936 --> 00:32:40,343 "red cabbage salad," "red cabbage soup," and so on. 535 00:32:40,434 --> 00:32:43,055 So you can browse around in this one here and see it. 536 00:32:44,204 --> 00:32:51,188 We can go a bit back here, and then look at the main sense 537 00:32:51,388 --> 00:32:55,030 of the word rød-- red in Danish. 538 00:32:55,550 --> 00:33:01,610 So, Ordia automatically generates information about hyponyms. 539 00:33:02,570 --> 00:33:04,400 Subconcepts, for example, 540 00:33:04,400 --> 00:33:07,400 light red, dark red, pink, purple, and so on, 541 00:33:07,525 --> 00:33:14,272 are in the-- when we make a Wikidata Query service, SPARQL Query. 542 00:33:14,576 --> 00:33:20,570 Then we go around in the Wikidata graph, 543 00:33:20,626 --> 00:33:22,266 and get this information here. 544 00:33:22,266 --> 00:33:24,786 And we can also get translations automatically, 545 00:33:24,786 --> 00:33:28,316 even though they're not necessarily stated within the Wikidata lexeme items.
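The compound list is described as coming from a SPARQL query. A minimal sketch of a query with that shape, built in Python, is below; it is an assumption about what such a query could look like, not Ordia's actual code. It relies on the Wikidata property P5238 ("combines lexemes"), which a compound lexeme uses to point at its parts, and on the `wikibase:lemma` predicate of the lexeme RDF on the Wikidata Query Service; the lexeme ID in the example is a placeholder, not the real ID of "rød".

```python
import re

# A lexeme ID such as "L123" (format check only; no lookup is performed).
LID = re.compile(r"L[1-9][0-9]*")


def compounds_query(part_lid: str) -> str:
    """Lexemes whose P5238 ("combines lexemes") statements include `part_lid`."""
    if not LID.fullmatch(part_lid):
        raise ValueError(f"not a lexeme ID: {part_lid!r}")
    return (
        "SELECT ?compound ?lemma WHERE {\n"
        f"  ?compound wdt:P5238 wd:{part_lid} ;\n"
        "            wikibase:lemma ?lemma .\n"
        "}\n"
        "ORDER BY ?lemma"
    )


if __name__ == "__main__":
    print(compounds_query("L1"))  # placeholder ID
```

Following the same pattern recursively from each compound found would give the kind of compound graph shown on the page.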
546 00:33:28,316 --> 00:33:32,679 For example, here, we have translated rød to "red" in English, 547 00:33:32,679 --> 00:33:36,089 and röd in Swedish, and so on. 548 00:33:36,107 --> 00:33:38,191 There aren't that many there. 549 00:33:38,747 --> 00:33:40,262 There's a range of other things here. 550 00:33:40,262 --> 00:33:43,487 Let me show you, for example, this one here-- 551 00:33:44,387 --> 00:33:51,308 this is veninde [female friend]-- now I go over to this one here. 552 00:33:54,308 --> 00:33:57,328 -inde, which is a feminine suffix. 553 00:33:58,058 --> 00:34:00,498 So, this is auto-generated there, 554 00:34:00,498 --> 00:34:02,641 it's a combination of "instance of"-- 555 00:34:03,268 --> 00:34:07,171 lexemes that are "instance of" feminine suffixes. 556 00:34:08,142 --> 00:34:11,519 And for example, for German, we have [inaudible]. 557 00:34:11,519 --> 00:34:15,373 So, -in would be a feminine suffix in German. 558 00:34:15,704 --> 00:34:21,291 And I put in sort of the five Danish feminine suffixes 559 00:34:22,571 --> 00:34:24,206 of Danish. 560 00:34:25,480 --> 00:34:29,106 Another facility is, for example, if you have a text, 561 00:34:29,106 --> 00:34:34,021 you can copy and paste it into this "Text to lexemes" page here. 562 00:34:34,571 --> 00:34:35,911 Let me-- 563 00:34:37,482 --> 00:34:41,218 "a car crashed into... 564 00:34:41,864 --> 00:34:44,141 a green house." 565 00:34:46,485 --> 00:34:48,701 Let me change that to "English". 566 00:34:49,006 --> 00:34:50,029 Press Submit. 567 00:34:50,029 --> 00:34:53,355 Now, Ordia will then extract each of the words here, 568 00:34:53,355 --> 00:34:54,733 in this sentence here, 569 00:34:54,733 --> 00:34:58,217 and try to see whether they are entered, as a specific form 570 00:34:58,217 --> 00:35:00,778 of a lexeme, in Wikidata. 571 00:35:00,778 --> 00:35:04,228 And these simple words here are entered in Wikidata.
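The "Text to lexemes" step described here (split the text into words, then check whether each word exists as a form of some lexeme) can be sketched as follows. This is a hypothetical Python illustration, not Ordia's code: the tokenizer is a simple regular expression, the `known_forms` set stands in for real lookups against Wikidata, and `form_lookup_query` only builds a query string, assuming the `dct:` and `ontolex:` prefixes available on the Wikidata Query Service and the English item Q1860 as the default language.

```python
import re


def tokenize(text: str) -> list[str]:
    # Runs of letters only (Unicode-aware, so Danish "rød" stays one token), lower-cased.
    return [w.lower() for w in re.findall(r"[^\W\d_]+", text)]


def form_lookup_query(word: str, lang_qid: str = "Q1860", lang_tag: str = "en") -> str:
    """Build a query for lexemes in one language having a form spelled `word`."""
    literal = word.replace("\\", "\\\\").replace('"', '\\"')  # escape for a SPARQL string
    return (
        "SELECT ?lexeme ?form WHERE {\n"
        f"  ?lexeme dct:language wd:{lang_qid} ;\n"
        "          ontolex:lexicalForm ?form .\n"
        f'  ?form ontolex:representation "{literal}"@{lang_tag} .\n'
        "}"
    )


def text_to_lexemes(text: str, known_forms: set[str]) -> list[tuple[str, bool]]:
    """Return each distinct word in order of first appearance, flagged by
    whether it is already known as a lexeme form."""
    seen: set[str] = set()
    result: list[tuple[str, bool]] = []
    for word in tokenize(text):
        if word not in seen:
            seen.add(word)
            result.append((word, word in known_forms))
    return result


if __name__ == "__main__":
    # Hypothetical stand-in for forms already entered in Wikidata.
    known = {"a", "car", "crashed", "into", "green", "house"}
    for word, found in text_to_lexemes("A vancar crashed into a green house.", known):
        print(word, "found" if found else "missing: offer a link to create it")
    print(form_lookup_query("vancar"))
```

In a real tool each unknown word would be checked with a query like the one printed above, and only the words with no match would get a "create lexeme" link.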
572 00:35:04,228 --> 00:35:09,190 But if we, for example, change it to-- there's nothing called "vancar," 573 00:35:09,190 --> 00:35:13,998 but let's just do that here. 574 00:35:14,535 --> 00:35:19,532 And you get, down here-- there's a blue link 575 00:35:20,335 --> 00:35:23,295 with which you can create a new Wikidata lexeme item. 576 00:35:24,556 --> 00:35:29,097 But there's a range of other things to explore 577 00:35:29,716 --> 00:35:31,496 in this web application. 578 00:35:31,496 --> 00:35:35,596 And if there are any suggestions, or comments, or notes, or something, 579 00:35:35,596 --> 00:35:39,337 you can contact me, or put in an issue on GitHub. 580 00:35:39,337 --> 00:35:44,856 So, this particular application is developed on GitHub, 581 00:35:44,856 --> 00:35:50,526 and I'm open to new ideas and ways to represent information there. 582 00:35:51,306 --> 00:35:52,701 Okay, thank you. 583 00:35:52,701 --> 00:35:54,661 (applause) 584 00:35:59,328 --> 00:36:00,906 Questions? 585 00:36:03,262 --> 00:36:04,524 (woman 3) I love your tool. 586 00:36:04,524 --> 00:36:09,752 Can you show the languages? That is awesome for me, I think, 587 00:36:09,752 --> 00:36:11,731 to show other languages. 588 00:36:12,183 --> 00:36:14,537 So, this is a bit of statistics over the languages, 589 00:36:14,537 --> 00:36:17,046 and the Russians have been scraping Wiktionary, 590 00:36:17,046 --> 00:36:20,327 and that's why they now have 100,000 lexemes. 591 00:36:24,387 --> 00:36:28,088 There's also a lot of work on Basque here. 592 00:36:29,566 --> 00:36:32,241 I think there's an organization putting that information in here. 593 00:36:32,241 --> 00:36:34,932 And you can also see a graph of these-- 594 00:36:34,932 --> 00:36:37,662 this is Number of forms as a function of number of lexemes. 595 00:36:38,798 --> 00:36:41,279 And all the way up here-- 596 00:36:41,279 --> 00:36:45,255 here, this is Russian, down here, Basque, I think.
597 00:36:45,476 --> 00:36:47,997 And English, perhaps, down here. 598 00:36:48,953 --> 00:36:50,692 And also in the Number of senses, 599 00:36:52,473 --> 00:36:58,360 I think Basque, English, and Russian, 600 00:37:00,184 --> 00:37:02,048 Hebrew, and so on. 601 00:37:02,048 --> 00:37:03,343 Yeah. 602 00:37:11,045 --> 00:37:12,950 (man 4) That looks like an incredible tool. 603 00:37:12,950 --> 00:37:15,097 But I was just wondering, is it all fully live? 604 00:37:15,097 --> 00:37:18,344 Is it all based on SPARQL Queries and live, or are there some things-- 605 00:37:18,344 --> 00:37:20,458 - Yes. I believe, yes. - Fantastic. 606 00:37:20,511 --> 00:37:24,961 But as they get more data into Wikidata, 607 00:37:24,961 --> 00:37:26,100 there's a bit of an issue. 608 00:37:26,100 --> 00:37:27,328 For example, for Russian here. 609 00:37:27,328 --> 00:37:31,966 I started this a year ago, when there were not that many lexemes, 610 00:37:32,061 --> 00:37:35,503 and so there were no problems with time-outs. 611 00:37:35,503 --> 00:37:38,367 But representing it here-- 612 00:37:38,367 --> 00:37:42,268 but if I press Russian, I think there might be some issues. 613 00:37:42,268 --> 00:37:44,284 There's a count that works here, 614 00:37:44,284 --> 00:37:46,101 for example, longest words or phrases. 615 00:37:46,101 --> 00:37:49,252 But I think the lexemes are sort of loading in. 616 00:37:49,252 --> 00:37:52,727 I think I'll need to fix that as Wikidata grows here. 617 00:37:53,258 --> 00:37:55,927 As you see, there are a lot of Russian nouns, apparently. 618 00:37:56,699 --> 00:37:58,451 And I don't know whether the-- 619 00:37:59,351 --> 00:38:01,519 apparently, that's what they're working on. 620 00:38:01,573 --> 00:38:03,960 There also seems to be a bit of a time-out there. 621 00:38:06,705 --> 00:38:08,033 [inaudible], oh, yes. 622 00:38:08,115 --> 00:38:09,984 The first one there.
623 00:38:10,832 --> 00:38:16,110 But apparently, the longest words and phrases query is a bit too expensive. 624 00:38:17,931 --> 00:38:20,334 But apparently, it can be loaded there, and it's probably-- 625 00:38:21,318 --> 00:38:23,167 it has loaded all 100,000 there, 626 00:38:23,167 --> 00:38:27,938 so you can click through all 10,000 pages. 627 00:38:36,748 --> 00:38:38,678 (host) If there aren't any other questions-- 628 00:38:39,564 --> 00:38:40,950 The longest word has loaded now. 629 00:38:40,950 --> 00:38:43,146 So, it's, yeah. 630 00:38:44,972 --> 00:38:46,390 Probably-- 631 00:38:47,855 --> 00:38:49,975 [inaudible] 632 00:38:50,321 --> 00:38:51,540 What is that? 633 00:38:51,540 --> 00:38:53,518 - (audience) It's a chemical. - A chemical, yes. 634 00:38:56,317 --> 00:38:58,303 (host) More questions? Or shall we? 635 00:38:59,792 --> 00:39:02,332 Alright, alright. Thank you very much. 636 00:39:02,332 --> 00:39:04,392 (applause) 637 00:39:23,642 --> 00:39:25,121 (Nicolas) Is it good? 638 00:39:31,008 --> 00:39:32,346 (host) Awesome. 639 00:39:34,920 --> 00:39:38,137 Alright, now, to wrap it up, we have Nicolas Vigneron, 640 00:39:38,137 --> 00:39:40,778 talking about Wikisource and Wikidata. 641 00:39:41,469 --> 00:39:42,804 (Nicolas) This is good? 642 00:39:44,542 --> 00:39:46,126 Who knows Wikisource? 643 00:39:47,582 --> 00:39:48,959 Yay! 644 00:39:50,740 --> 00:39:53,582 More and more people raising hands every year. 645 00:39:53,582 --> 00:39:54,957 That's good. 646 00:39:55,282 --> 00:40:01,462 So, this morning, [Lydia] said that Wikivoyage was the first real user of-- 647 00:40:03,306 --> 00:40:05,987 [inaudible] 648 00:40:06,572 --> 00:40:08,347 Wikisource is not that far behind. 649 00:40:09,230 --> 00:40:13,280 There's a lot to do, and I want to give some basic numbers, 650 00:40:13,280 --> 00:40:16,964 statistics, about where we are, and where I want to go.
651 00:40:17,613 --> 00:40:23,409 So first, there will be a lot of questions about what is a book, 652 00:40:23,409 --> 00:40:25,389 what is bibliographical data. 653 00:40:25,389 --> 00:40:27,229 People from the BnF can agree with me. 654 00:40:27,229 --> 00:40:29,969 That can be a nightmare if you go into details. 655 00:40:30,164 --> 00:40:35,803 But some big numbers that-- Google Books tried to estimate 656 00:40:35,803 --> 00:40:39,676 how many "books," air-quote books, there are in the world, 657 00:40:39,676 --> 00:40:43,005 and there are 130 million books in the world. 658 00:40:43,705 --> 00:40:47,279 And, yeah, let's put them all on Wikidata. 659 00:40:47,650 --> 00:40:49,300 Or not. I don't know. 660 00:40:49,392 --> 00:40:51,049 But where are we now? 661 00:40:51,413 --> 00:40:52,468 And why is it books? 662 00:40:52,468 --> 00:40:55,668 Because for Google Books, everything is scanned, basically. 663 00:40:55,795 --> 00:40:58,670 They don't have a very clear distinction. 664 00:40:59,400 --> 00:41:04,350 There are sometimes two-page books, which [inaudible], Google Books is a book. 665 00:41:04,714 --> 00:41:10,131 But for many people, you have to have at least 50 pages to be a book. 666 00:41:10,536 --> 00:41:12,321 So, that's always hard to count. 667 00:41:12,885 --> 00:41:15,603 But here's what we know on Wikidata. 668 00:41:15,603 --> 00:41:18,704 This is the graph of what is a book for Wikidata. 669 00:41:18,704 --> 00:41:21,524 You have-- that's totally [inaudible]-- 670 00:41:21,524 --> 00:41:23,979 but that's Wikidata, literary work as well. 671 00:41:23,979 --> 00:41:27,194 And these are all the subclasses, or subclasses of subclasses-- 672 00:41:27,194 --> 00:41:30,334 or subclasses of subclasses of what is a book. 673 00:41:30,804 --> 00:41:32,705 So, that's very hard to do.
674 00:41:32,737 --> 00:41:34,253 I can do a graph like that, 675 00:41:34,253 --> 00:41:36,833 but the SPARQL Query engine doesn't work 676 00:41:36,833 --> 00:41:41,523 if I want to count everything that is an instance of these subclasses, 677 00:41:41,523 --> 00:41:45,143 and basically, SPARQL says no, time-out. 678 00:41:45,633 --> 00:41:47,020 So, what's the problem? 679 00:41:47,020 --> 00:41:50,713 But I know already that there are a lot of subclasses, 680 00:41:50,713 --> 00:41:52,153 but we need to look into it. 681 00:41:52,153 --> 00:41:57,943 And probably, if you know Wikidata, on the page Wikidata:Statistics, 682 00:41:58,643 --> 00:42:02,647 you have all the numbers by big classes, 683 00:42:02,647 --> 00:42:07,047 and you all probably know that the big chunk here 684 00:42:07,047 --> 00:42:08,642 is scholarly articles, 685 00:42:08,707 --> 00:42:12,749 which is thanks to the WikiCite project in particular, 686 00:42:14,113 --> 00:42:17,125 which can be books or not, depending on definition. 687 00:42:19,062 --> 00:42:22,508 You see that there's no book subclass, 688 00:42:23,032 --> 00:42:26,034 because there's not enough to show. 689 00:42:26,049 --> 00:42:28,472 It's probably somewhere in the others, 690 00:42:28,472 --> 00:42:30,127 the purple area is others. 691 00:42:30,163 --> 00:42:34,115 And there are a lot of things that are under one percent. 692 00:42:34,162 --> 00:42:38,821 So, basically, we can say that we have less than one percent 693 00:42:38,821 --> 00:42:42,131 of things identified as books in Wikidata. 694 00:42:42,551 --> 00:42:46,091 Maybe there are more books, but not identified as such. 695 00:42:47,842 --> 00:42:49,284 I'm talking about books, 696 00:42:49,383 --> 00:42:51,768 but when we are talking about bibliographical data, 697 00:42:51,768 --> 00:42:53,920 there's also the author, person, 698 00:42:53,920 --> 00:42:58,472 so maybe some of the humans here are also authors, surely.
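The count that times out here is, roughly, one query that follows "instance of" (P31) and then any chain of "subclass of" (P279) up to the book class. A minimal Python sketch that builds that query, plus one possible workaround (one small count per known subclass, summed on the client side), is below. Assumptions: Q571 is taken as the "book" item and Q7725634 as "literary work", the subclass list is supplied by the caller, and nothing here guarantees the smaller queries stay under the query service's time limit. Summing per-class counts can also double-count an item that is an instance of several subclasses.

```python
import re

QID = re.compile(r"Q[1-9][0-9]*")


def _check(qid: str) -> str:
    # Refuse anything that is not a plain item ID before pasting it into a query.
    if not QID.fullmatch(qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return qid


def transitive_count_query(class_qid: str = "Q571") -> str:
    """One query: instances of the class or of any (indirect) subclass.
    This is the shape of query that can hit the service's time-out."""
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?n) WHERE {\n"
        f"  ?item wdt:P31/wdt:P279* wd:{_check(class_qid)} .\n"
        "}"
    )


def direct_count_queries(subclass_qids: list[str]) -> dict[str, str]:
    """Many small queries: direct instances of each subclass, one per query.
    Summing the results may double-count items typed with several classes."""
    return {
        q: f"SELECT (COUNT(?item) AS ?n) WHERE {{ ?item wdt:P31 wd:{_check(q)} . }}"
        for q in subclass_qids
    }


if __name__ == "__main__":
    print(transitive_count_query())
    # Q571 (book) and Q7725634 (literary work), as assumed above.
    for qid, query in direct_count_queries(["Q571", "Q7725634"]).items():
        print(qid, query)
```

The split trades one expensive traversal for many cheap lookups, at the cost of exact de-duplication.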
699 00:43:00,068 --> 00:43:03,221 And we need to do another count, which is another big query to do. 700 00:43:03,602 --> 00:43:05,301 That times out, so-- 701 00:43:05,396 --> 00:43:08,015 I don't have many numbers for this, sorry. 702 00:43:10,619 --> 00:43:14,332 So, yeah, basically, this first slide is about how complicated it is 703 00:43:14,332 --> 00:43:19,122 to know how much we have of what, and how to count them. 704 00:43:19,445 --> 00:43:21,091 So, yeah, hard to count. 705 00:43:21,618 --> 00:43:23,280 What we know-- 706 00:43:24,133 --> 00:43:26,618 is that we have a lot of properties-- 707 00:43:27,185 --> 00:43:29,684 700,000, I guess, 708 00:43:30,208 --> 00:43:31,680 now on Wikidata. 709 00:43:32,593 --> 00:43:35,952 We know that we have a lot of identifiers among these properties. 710 00:43:36,721 --> 00:43:42,538 And we know that almost 4,000 are properties for identifiers 711 00:43:43,146 --> 00:43:45,623 related to bibliographic data, 712 00:43:45,737 --> 00:43:49,862 like ID at the National Library of France, 713 00:43:49,862 --> 00:43:52,251 National Library of yadda, yadda, yadda, 714 00:43:52,251 --> 00:43:56,681 because we love national library identifiers on Wikidata. 715 00:43:56,681 --> 00:44:00,271 So, we have almost all libraries, national libraries and more. 716 00:44:01,101 --> 00:44:03,796 So, we have a lot of properties. I know that. 717 00:44:05,071 --> 00:44:06,727 And they are widely used. 718 00:44:06,834 --> 00:44:10,053 I know that, for instance, BnF properties use-- 719 00:44:10,579 --> 00:44:12,772 BnF is National Library of France-- 720 00:44:12,772 --> 00:44:18,989 is used 1 million times-- OCLC, VIAF, or the big ones like that. 721 00:44:21,001 --> 00:44:24,202 A lot of uses in Wikidata. 722 00:44:25,426 --> 00:44:28,980 But just because we have a lot of uses of various properties 723 00:44:28,980 --> 00:44:30,666 in Wikidata doesn't mean it's complete.
724 00:44:31,266 --> 00:44:33,758 As Thibaud said, there are more than 20 million books, 725 00:44:33,758 --> 00:44:37,099 [inaudible], which is more as entities. 726 00:44:37,837 --> 00:44:39,569 And we have only 1 million, 727 00:44:39,569 --> 00:44:43,538 so we have 19 million still to do. 728 00:44:45,177 --> 00:44:47,276 Also, what we know from the Wikidata side, 729 00:44:47,276 --> 00:44:51,918 is that we have a good-- quite active Wikidata project, 730 00:44:51,918 --> 00:44:53,840 called WikiProject Books, 731 00:44:54,332 --> 00:44:58,127 where we have a model we kind of agree on, 732 00:44:58,181 --> 00:45:00,916 which is not always followed, which is, again, a problem. 733 00:45:00,956 --> 00:45:02,710 What is a book? You know it. 734 00:45:03,414 --> 00:45:05,385 I only have five minutes, so, I'll keep going. 735 00:45:06,090 --> 00:45:08,880 And then, I'm a Wikisourcean, so, Wikisourcer. 736 00:45:09,426 --> 00:45:11,930 So, I wanted to know, the other way around, 737 00:45:11,930 --> 00:45:13,496 what comes from Wikisource already, 738 00:45:13,496 --> 00:45:16,406 because Wikisource is already inside the Wikimedia projects. 739 00:45:16,406 --> 00:45:19,883 A lot of bibliographical records and information. 740 00:45:19,883 --> 00:45:23,161 So, of the 66 million items on Wikidata, 741 00:45:23,161 --> 00:45:28,850 already 1 million are linked to Wikisource. 742 00:45:29,330 --> 00:45:31,890 [inaudible]. 743 00:45:32,350 --> 00:45:36,080 So, that's very few, but that's quite a lot. 744 00:45:37,496 --> 00:45:40,174 There are a lot of authors. 745 00:45:40,174 --> 00:45:44,670 There are some books, texts, works, editions, whatever. 746 00:45:45,271 --> 00:45:48,425 Not always well-arranged. 747 00:45:48,869 --> 00:45:50,600 And there are a lot of internal pages, 748 00:45:50,600 --> 00:45:53,150 like categories and templates, and things like that. 749 00:45:53,194 --> 00:45:54,984 But still, 1 million in total.
750 00:45:58,329 --> 00:46:01,767 Wikisource communities are often small communities, 751 00:46:01,767 --> 00:46:05,010 like the French Wikisource community, 752 00:46:05,010 --> 00:46:07,537 which is one of the biggest; there are 50 people. 753 00:46:07,537 --> 00:46:08,787 That's the biggest we have. 754 00:46:09,047 --> 00:46:12,937 So, we love Wikidata, because, hey, they did a lot of work for us. 755 00:46:12,942 --> 00:46:15,131 So, just take it from Wikisource. 756 00:46:15,131 --> 00:46:19,885 So, in this small community, we love to reuse Wikidata data. 757 00:46:20,935 --> 00:46:24,076 Right now, we make heavy use of a tool called WEF-- 758 00:46:24,358 --> 00:46:27,978 Wikidata Edit Framework-- thank you. 759 00:46:29,318 --> 00:46:33,098 And we are eager to see how Wikidata Bridge will work. 760 00:46:33,438 --> 00:46:36,798 And we are trying to do things with a team in Wikidata, 761 00:46:37,638 --> 00:46:40,678 the Wikimedia Deutschland team, [inaudible]. 762 00:46:41,007 --> 00:46:43,934 And there's a lot of collaboration in the future 763 00:46:43,934 --> 00:46:46,586 that we want to do: better integrate, 764 00:46:47,636 --> 00:46:51,068 do everything in one click when you import a first book in Wikisource, 765 00:46:51,068 --> 00:46:52,465 things like that. 766 00:46:53,364 --> 00:46:57,664 Better-- do links between editions in Wikidata. 767 00:46:57,852 --> 00:46:59,492 That needs to be done. 768 00:47:00,041 --> 00:47:02,282 The Foundation is doing the wish list now, 769 00:47:02,282 --> 00:47:04,853 and we have a lot of requests about that. 770 00:47:05,938 --> 00:47:07,342 And yeah, that's it. 771 00:47:07,342 --> 00:47:09,116 That was just a short overview. 772 00:47:09,116 --> 00:47:15,272 So, if you have some questions, I'll take them and be available later, 773 00:47:15,712 --> 00:47:17,112 if you want to. 774 00:47:17,723 --> 00:47:19,722 (applause) 775 00:47:25,639 --> 00:47:28,281 Come on, you love Wikisource, you have questions!
776 00:47:33,989 --> 00:47:35,775 (woman 4) I already asked you this in August, 777 00:47:35,775 --> 00:47:38,411 and I wonder if this has already changed. 778 00:47:38,411 --> 00:47:42,337 What is the biggest problem you have in Wikisource right now, 779 00:47:42,337 --> 00:47:43,761 from your perspective? 780 00:47:44,167 --> 00:47:45,670 The first one, only? (chuckles) 781 00:47:48,314 --> 00:47:54,152 I think because it's a small community, we need efficient tools that work easily, 782 00:47:54,152 --> 00:47:57,148 because we have very few people, 783 00:47:57,148 --> 00:47:59,464 so we need tools that are easy to use 784 00:47:59,464 --> 00:48:04,247 and a one-click solution to [inaudible] a bit, 785 00:48:04,371 --> 00:48:05,607 that's a big dream. 786 00:48:05,607 --> 00:48:07,179 I think that's what's most important, 787 00:48:07,179 --> 00:48:10,485 because that's the threshold in Wikisource, it's a small community. 788 00:48:11,204 --> 00:48:13,241 I think this is the most important. 789 00:48:14,615 --> 00:48:15,975 [inaudible] 790 00:48:16,867 --> 00:48:19,600 (man 5) I'm curious if you can speak to your opinion, 791 00:48:19,600 --> 00:48:23,154 or the French Wikisource opinion, or maybe you spoke to other communities 792 00:48:23,154 --> 00:48:29,834 about the notion of not including metadata about all the world's books. 793 00:48:30,234 --> 00:48:31,635 That was mentioned in the morning. 794 00:48:31,635 --> 00:48:34,965 Maybe other Wikibases, and other federated databases 795 00:48:34,965 --> 00:48:38,026 will have that information, and Wikidata won't. 796 00:48:39,159 --> 00:48:41,494 How does that feel for Wikisource? 797 00:48:43,981 --> 00:48:45,502 This is my very personal opinion. 798 00:48:45,502 --> 00:48:47,386 I know that people in the Wikisource community 799 00:48:47,386 --> 00:48:48,723 disagree with that.
800 00:48:48,723 --> 00:48:50,537 But I think we need to stay-- 801 00:48:50,537 --> 00:48:53,194 an external Wikibase is not a good solution, 802 00:48:53,194 --> 00:48:55,353 because we have Shakespeare on Wikisource, 803 00:48:55,353 --> 00:48:58,323 and we have Shakespeare on Wikipedia. 804 00:48:58,564 --> 00:49:01,295 So, we need to interlink, and the interlinking is there. 805 00:49:01,295 --> 00:49:04,007 Or like, Romeo and Juliet, we have them both. 806 00:49:04,007 --> 00:49:07,229 So, we are still pretty close to Wikipedia. 807 00:49:07,433 --> 00:49:09,431 And the difference with WikiCite-- 808 00:49:09,431 --> 00:49:12,515 with WikiCite, we have a lot of items which are small. 809 00:49:14,372 --> 00:49:16,051 Wikisource is the other way around. 810 00:49:16,150 --> 00:49:18,281 We have few items, which are big. 811 00:49:18,281 --> 00:49:20,515 Which can be a scaling problem and everything, 812 00:49:20,515 --> 00:49:23,615 but it's quite a small subset of data. 813 00:49:23,683 --> 00:49:27,539 So, my personal opinion is we should stay in Wikidata. 814 00:49:28,391 --> 00:49:32,117 Again, because we are not a lot of people, 815 00:49:32,117 --> 00:49:34,287 so we need to stay with the tools we know, 816 00:49:34,287 --> 00:49:35,846 don't change the tools too much 817 00:49:35,846 --> 00:49:37,736 for the small community, please. 818 00:49:37,769 --> 00:49:39,282 So, that's it. 819 00:49:39,282 --> 00:49:40,910 But I know that other people disagree. 820 00:49:40,910 --> 00:49:44,579 You can talk to [Sadeep] if you want. He will have another point of view. 821 00:49:46,119 --> 00:49:49,319 Thank you. I think, last question, maybe.
822 00:49:51,234 --> 00:49:54,446 (man 6) Sometimes, I find it difficult to link the Wikidata item 823 00:49:54,446 --> 00:50:00,976 with a Wikisource article, because a Wikisource novel-- 824 00:50:01,079 --> 00:50:06,128 might be split over several pages, and there's an index page, 825 00:50:06,128 --> 00:50:08,853 and there's perhaps a front page, or something like that. 826 00:50:08,853 --> 00:50:12,053 Do you have that problem, or is that a general problem, or-- 827 00:50:12,092 --> 00:50:16,892 Yeah, that's one of the first ideas on the wish list 828 00:50:16,892 --> 00:50:19,092 for the Foundation, actually. 829 00:50:19,092 --> 00:50:20,790 Yeah, because Wikipedia is on the-- 830 00:50:20,790 --> 00:50:22,772 if you know the [inaudible] organization, 831 00:50:22,772 --> 00:50:26,598 Wikipedia is on the work level, and Wikisource on the edition level. 832 00:50:26,598 --> 00:50:28,572 So, already, you have a problem there. 833 00:50:28,572 --> 00:50:30,931 And then, we have several editions of the same work, 834 00:50:30,931 --> 00:50:34,014 and we have sub-chapters and things inside the edition. 835 00:50:34,014 --> 00:50:41,001 So, yeah, that's a one-to-many problem, which is hard to solve by nature. 836 00:50:41,555 --> 00:50:44,839 But maybe there's a tool that can help to solve that. 837 00:50:45,893 --> 00:50:47,469 Hopefully. 838 00:50:49,172 --> 00:50:51,395 And that's time, ladies and gentlemen. 839 00:50:51,398 --> 00:50:53,283 So, thank you very much, Nicolas. 840 00:50:53,335 --> 00:50:55,137 (applause) 841 00:50:59,010 --> 00:51:01,127 And please join me in giving one more round of applause 842 00:51:01,127 --> 00:51:03,147 to all of our wonderful speakers. 843 00:51:03,147 --> 00:51:04,901 (applause)
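The work/edition mismatch raised in the last question (a Wikipedia article attached to the work item, a Wikisource text attached to an edition item) can be bridged in a query by going from the work to its editions. A minimal Python sketch that builds such a query is below. It assumes the edition items point to their work with P629 ("edition or translation of"), as in the WikiProject Books model, and assumes French Wikisource as the default target wiki; the function name is hypothetical and the code only builds the query text.

```python
import re

QID = re.compile(r"Q[1-9][0-9]*")
HOST = re.compile(r"[a-z-]+\.wikisource\.org")


def editions_query(work_qid: str, wikisource_host: str = "fr.wikisource.org") -> str:
    """List edition items of one work (via P629) with their page on one Wikisource.
    OPTIONAL keeps editions that have no page on that Wikisource yet."""
    if not QID.fullmatch(work_qid):
        raise ValueError(f"not a Wikidata item ID: {work_qid!r}")
    if not HOST.fullmatch(wikisource_host):
        raise ValueError(f"not a Wikisource host: {wikisource_host!r}")
    return (
        "SELECT ?edition ?page WHERE {\n"
        f"  ?edition wdt:P629 wd:{work_qid} .\n"
        "  OPTIONAL {\n"
        "    ?page schema:about ?edition ;\n"
        f"          schema:isPartOf <https://{wikisource_host}/> .\n"
        "  }\n"
        "}"
    )


if __name__ == "__main__":
    print(editions_query("Q42"))  # placeholder work ID
```

Sub-chapter pages are not covered here; they would need their own items or links, which is the harder part of the one-to-many problem.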