1
00:00:06,009 --> 00:00:09,069
(host) Hello, everyone. Thank you
for coming to these lightning talks.
2
00:00:09,069 --> 00:00:11,529
Our first speaker, I'm going
to run straight into it,
3
00:00:11,529 --> 00:00:13,781
is going to be Rosie
Stephenson-Goodknight.
4
00:00:13,781 --> 00:00:15,319
Did I get that right?
5
00:00:15,319 --> 00:00:19,609
Yes. And so she's going to be talking
about the Women Writers Project.
6
00:00:19,609 --> 00:00:22,569
And we're going to--
yeah, is that right? Great.
7
00:00:22,569 --> 00:00:24,299
And so, we're going
to just launch right in,
8
00:00:24,299 --> 00:00:26,699
and I want to remind you,
if there's time for questions,
9
00:00:26,699 --> 00:00:28,802
to please not speak
until you have the microphone.
10
00:00:28,802 --> 00:00:30,329
Thank you.
11
00:00:31,589 --> 00:00:34,125
(Rosie) Hi, everyone, and thanks
for coming to this session,
12
00:00:34,125 --> 00:00:36,829
where we're going to talk
about Women Writers in Review,
13
00:00:36,829 --> 00:00:40,329
cultures of reception associated
with trans-Atlantic,
14
00:00:40,329 --> 00:00:43,977
English language women writers,
broadly construed.
15
00:00:44,523 --> 00:00:48,387
Women Writers in Review is an initiative
of the Women Writers Project
16
00:00:48,387 --> 00:00:50,535
of Northeastern University.
17
00:00:50,535 --> 00:00:55,253
It moved there from Brown University,
approximately 15 years ago.
18
00:00:55,993 --> 00:01:00,287
Women Writers in Review is a collection
of 18th- and 19th-century reviews,
19
00:01:00,287 --> 00:01:04,281
publication notices,
literary histories, and other texts
20
00:01:04,281 --> 00:01:09,511
corresponding to trans-Atlantic--
so, UK and US mostly,
21
00:01:09,511 --> 00:01:12,953
though a few Canadian--
written works by women.
22
00:01:13,255 --> 00:01:15,683
It's a project where the two universities,
23
00:01:15,683 --> 00:01:18,133
Brown University
and Northeastern University,
24
00:01:18,133 --> 00:01:22,645
started collecting the manuscripts
of women from this period.
25
00:01:23,337 --> 00:01:27,520
And then they started collecting
the reviews of these works,
26
00:01:27,520 --> 00:01:31,593
and then they started scoring
these reviews by giving them a rating.
27
00:01:32,321 --> 00:01:36,144
It's designed to investigate
the discourse of reception in connection
28
00:01:36,144 --> 00:01:39,333
with the changing trans-Atlantic
literary landscape
29
00:01:39,333 --> 00:01:42,664
for the period 1770 to 1830.
30
00:01:46,143 --> 00:01:49,103
You're going to pardon me if I speak fast,
because I've got five minutes
31
00:01:49,103 --> 00:01:50,646
to go over this.
32
00:01:50,646 --> 00:01:55,443
It includes 690 English language texts
responding to works
33
00:01:55,443 --> 00:01:59,565
written or translated
by 18th- and 19th-century women writers.
34
00:01:59,593 --> 00:02:04,813
There are 74 authors in the corpus,
using 112 different sources,
35
00:02:04,813 --> 00:02:07,782
or periodicals, or magazines.
36
00:02:07,782 --> 00:02:10,773
And there are 628 critical reviews.
37
00:02:11,867 --> 00:02:14,671
Here's a picture that shows you
what we're talking about
38
00:02:14,671 --> 00:02:16,573
in terms of a review.
39
00:02:16,573 --> 00:02:18,819
And you can also see what kind of scores
40
00:02:18,819 --> 00:02:25,403
were given by the academics
at Northeastern University.
41
00:02:25,833 --> 00:02:28,922
Most of these are women
who were giving scores
42
00:02:28,922 --> 00:02:34,031
based on the reviews that were done
mostly, probably all, by men,
43
00:02:34,031 --> 00:02:39,799
back in this time period 1770 to 1830
of works written by women.
44
00:02:39,799 --> 00:02:43,469
By works, we're talking about plays,
and novels, and poems,
45
00:02:43,469 --> 00:02:46,955
essays, and other kinds of articles.
46
00:02:48,615 --> 00:02:50,275
So, what are we talking about?
47
00:02:50,275 --> 00:02:54,676
This required creating
items for authors for their works,
48
00:02:54,676 --> 00:02:57,946
like I said, novels and plays and poems.
49
00:02:57,946 --> 00:03:04,938
It required creating new items
for this period of time
50
00:03:05,038 --> 00:03:08,391
where there are defunct periodicals.
51
00:03:08,391 --> 00:03:12,499
It required creating items
for the scholarly articles.
52
00:03:12,578 --> 00:03:16,900
And then the review scores of each,
and the review score by,
53
00:03:16,943 --> 00:03:19,998
which in this case would be
Women Writers in Review,
54
00:03:19,998 --> 00:03:23,336
and what we still need to add
is the described by source.
55
00:03:25,226 --> 00:03:28,970
This gives you a picture
of the kind of spreadsheets,
56
00:03:28,970 --> 00:03:31,397
Google Spreadsheets,
that I have been working on.
57
00:03:31,397 --> 00:03:34,296
I shouldn't just say I,
because I've had a lot of help.
58
00:03:34,296 --> 00:03:37,546
I've had a lot of people
who were working on this project with me.
59
00:03:37,546 --> 00:03:40,413
And you can see at the top,
something about the authors,
60
00:03:40,413 --> 00:03:41,736
about the works.
61
00:03:41,736 --> 00:03:45,496
The third group is going to be
the periodical,
62
00:03:45,496 --> 00:03:48,006
and then, how the scores started showing.
63
00:03:49,203 --> 00:03:52,122
And of course, this is how they look--
64
00:03:52,122 --> 00:03:57,396
the beauty of being able to present
the preliminary findings.
65
00:03:57,856 --> 00:04:01,767
Once we have uploaded all of the data,
66
00:04:02,989 --> 00:04:05,906
and I hope that that's going to be done
by the end of this year,
67
00:04:06,956 --> 00:04:08,496
this will obviously look different.
68
00:04:09,916 --> 00:04:10,931
Appendix.
69
00:04:10,931 --> 00:04:15,267
So, here's what the depiction looks like
70
00:04:15,267 --> 00:04:18,505
at the Northeastern University website.
71
00:04:19,024 --> 00:04:22,474
I don't think it's quite as clear
as what we can do with Wikidata.
72
00:04:22,531 --> 00:04:27,351
And so, this was probably the reason why,
when I started as a visiting scholar
73
00:04:27,351 --> 00:04:31,751
in 2017, they asked if this is one
of the projects that I could work on.
74
00:04:31,751 --> 00:04:36,093
They stopped their work
the year before, in 2016.
75
00:04:36,093 --> 00:04:39,073
And I think they just don't have
the resources to continue.
76
00:04:40,251 --> 00:04:43,415
Some parts of this presentation
came from another
77
00:04:43,415 --> 00:04:45,812
that was published in 2016.
78
00:04:45,812 --> 00:04:49,401
And last but not least, here are links
79
00:04:49,401 --> 00:04:53,361
to the different parts
of the work that I'm doing.
80
00:04:54,257 --> 00:04:55,561
Thank you very much.
81
00:04:55,561 --> 00:04:56,845
Questions.
82
00:04:56,845 --> 00:04:58,754
(applause)
83
00:05:10,397 --> 00:05:14,665
(woman) So, when you have a work,
and you have the review of the work,
84
00:05:14,665 --> 00:05:17,703
are you looking
at a particular edition of the work,
85
00:05:17,703 --> 00:05:20,665
or are these all reviews
of first editions?
86
00:05:21,271 --> 00:05:22,861
It's a good question. No.
87
00:05:22,861 --> 00:05:25,601
They are not just reviews
of the first edition.
88
00:05:25,601 --> 00:05:28,601
Some are reviews of the second
or third edition.
89
00:05:30,062 --> 00:05:32,262
I'm going to add something
that maybe I should have said
90
00:05:32,262 --> 00:05:34,951
before I closed
and went to question and answers--
91
00:05:34,966 --> 00:05:36,800
what's so special about this?
92
00:05:37,220 --> 00:05:40,461
What's special is nobody else
has done this on Wikidata.
93
00:05:41,454 --> 00:05:45,580
Surely, there are other universities
that have their own collections,
94
00:05:45,580 --> 00:05:51,447
where their scholars have reviewed
the reviews of someone's work
95
00:05:51,800 --> 00:05:53,394
in some language.
96
00:05:54,491 --> 00:05:57,389
So, hopefully,
once this methodology gets--
97
00:05:58,000 --> 00:06:02,390
once I write this up and the project
is over and presented again,
98
00:06:02,390 --> 00:06:05,310
that there will be other
universities, other libraries
99
00:06:05,310 --> 00:06:07,923
that will speak up and say,
"We've got data sets, too,
100
00:06:08,248 --> 00:06:13,020
and we're going to go ahead
and upload them into Wikidata ourselves,"
101
00:06:13,020 --> 00:06:15,910
and then it'd be lovely
to start doing some comparisons.
102
00:06:19,572 --> 00:06:22,060
Anyone? Jane.
103
00:06:22,093 --> 00:06:23,767
(Jane) Do you actually have books?
104
00:06:24,293 --> 00:06:26,889
Do you actually have the books--
are the books in existence,
105
00:06:26,889 --> 00:06:28,860
or are you actually
doing metadata about books
106
00:06:28,860 --> 00:06:31,400
where we don't even know
where the books are?
107
00:06:31,780 --> 00:06:34,829
Northeastern University
actually has the book,
108
00:06:34,829 --> 00:06:37,209
or the essay, or the poem.
109
00:06:39,759 --> 00:06:45,392
And they have the critical review
of the book, or the essay, or the poem.
110
00:06:45,755 --> 00:06:48,820
And they're working
on the transcription of these,
111
00:06:48,820 --> 00:06:51,452
and they're not at 100% yet.
112
00:06:52,432 --> 00:06:56,256
They're not at 100%, but, like all things,
they're working on it.
113
00:07:00,218 --> 00:07:02,043
Any other questions?
114
00:07:05,697 --> 00:07:07,399
(host) We're going to wrap it up there.
115
00:07:07,399 --> 00:07:09,063
Thanks for being such a nice audience.
116
00:07:09,063 --> 00:07:11,677
(applause)
117
00:07:14,012 --> 00:07:18,581
Lady bug for [inaudible].
118
00:08:58,271 --> 00:08:59,372
(man) Finally got that.
119
00:08:59,372 --> 00:09:02,565
What I'm going to do is I'm just going
to click on these to load.
120
00:09:02,565 --> 00:09:06,091
Just while-- is that new tab there?
121
00:09:06,946 --> 00:09:08,053
[inaudible]
122
00:09:08,053 --> 00:09:10,524
The first one? Yeah, perfect.
123
00:09:11,024 --> 00:09:13,503
Sorry, my German is not even rusty,
124
00:09:13,503 --> 00:09:15,251
it's simply non-existent.
125
00:09:15,663 --> 00:09:19,561
So, I'll just let them load,
because then these queries can run
126
00:09:19,561 --> 00:09:22,728
while I'm sort of introducing
what I was talking about and doing.
127
00:09:22,728 --> 00:09:24,795
So, hi, I'm Nav from Histropedia.
128
00:09:24,795 --> 00:09:28,169
And basically, for the last
quite a few years,
129
00:09:28,169 --> 00:09:29,710
we've been relatively quiet,
130
00:09:29,710 --> 00:09:32,423
while we've been sort of working
on technology and tools
131
00:09:32,423 --> 00:09:36,837
that we need to sort of develop,
ultimately, Histropedia version 2,
132
00:09:36,837 --> 00:09:39,433
which is going to be, you know,
this huge enhancement
133
00:09:39,433 --> 00:09:40,771
on the first version.
134
00:09:40,771 --> 00:09:43,270
Well, it's kind of in progress,
but as we do it,
135
00:09:43,270 --> 00:09:45,236
we've been experimenting
with these other tools,
136
00:09:45,236 --> 00:09:47,387
and building the technology
that we're going to need.
137
00:09:48,132 --> 00:09:51,781
One really crucial part for this
is the ability to sort of see
138
00:09:51,781 --> 00:09:55,085
the whole of history
from the billions of years time scale,
139
00:09:55,085 --> 00:09:58,602
up to the current day,
140
00:09:58,602 --> 00:10:00,638
and zooming all the way into single days.
141
00:10:00,638 --> 00:10:03,433
And ultimately, in the end,
down to hours and minutes.
142
00:10:03,433 --> 00:10:06,517
We've managed to create
a [inaudible] of update to our engine.
143
00:10:06,517 --> 00:10:08,327
Other engines can already do this,
144
00:10:08,327 --> 00:10:11,122
but unfortunately, they also can't handle
the large data sets.
145
00:10:11,122 --> 00:10:13,269
So, we finally got this update
to our engine.
146
00:10:13,269 --> 00:10:15,392
It allows us to zoom to billions of years.
147
00:10:15,392 --> 00:10:19,533
So, recently-- the recently
finished update,
148
00:10:19,533 --> 00:10:22,333
and it's basically, it's an update
to our query viewer tool,
149
00:10:22,333 --> 00:10:24,482
which is like a live version
of Histropedia
150
00:10:24,482 --> 00:10:26,832
just linked straight to Wikidata.
151
00:10:26,832 --> 00:10:29,092
So, it's literally based on a query,
152
00:10:29,092 --> 00:10:31,372
a live query, and we see
the results of it.
153
00:10:31,372 --> 00:10:33,883
So, it's sort of separate
to our main tool.
154
00:10:33,883 --> 00:10:37,502
So, I'm going to flick to the first one,
which is my first experiment.
155
00:10:37,502 --> 00:10:39,716
And you'll forgive me, the queries--
156
00:10:39,716 --> 00:10:42,181
the code was kind of finished
not so long ago,
157
00:10:42,181 --> 00:10:44,736
and the queries, I've been trying
to find out what can I find
158
00:10:44,736 --> 00:10:47,692
and what's interesting
to look at, what's missing.
159
00:10:47,692 --> 00:10:52,154
So, I started off
with a kind of, sort of, well--
160
00:10:52,154 --> 00:10:54,241
So, that's not the right--
that's not Life on Earth.
161
00:10:54,241 --> 00:10:55,699
Is this Life on Earth?
162
00:10:56,123 --> 00:10:57,467
That will do, anyway.
163
00:10:57,467 --> 00:11:01,985
So, I started off just trying to look
at what sort of things
164
00:11:01,985 --> 00:11:04,657
are actually in Wikidata.
165
00:11:04,657 --> 00:11:07,407
And this particular one--
sorry, it's in reverse.
166
00:11:07,407 --> 00:11:09,829
So, this is the first one
I wanted to show you.
167
00:11:09,829 --> 00:11:12,485
So, this is a kind of
a life on Earth query
168
00:11:12,485 --> 00:11:14,457
that I wanted to develop.
169
00:11:14,457 --> 00:11:18,410
And basically, what it is
is all the taxons in Wikidata
170
00:11:18,410 --> 00:11:20,157
that have a date.
171
00:11:20,157 --> 00:11:23,726
And as you can probably see
from the panel, there is not many of them.
172
00:11:23,726 --> 00:11:25,784
But we do have the different taxon ranks.
173
00:11:25,784 --> 00:11:27,596
So, you know, is it a species, a class--
174
00:11:27,596 --> 00:11:29,725
for a biologist,
this makes a lot of sense.
175
00:11:29,725 --> 00:11:32,446
But if I was just to close that a bit,
176
00:11:32,596 --> 00:11:35,453
we can see, we are going back
to the earliest forms of life here.
177
00:11:35,453 --> 00:11:37,236
3.5 billion years ago.
178
00:11:37,236 --> 00:11:42,707
And as we zoom in here, we start to see
the more modern forms of life,
179
00:11:42,746 --> 00:11:47,232
and we see some really
interesting things developing,
180
00:11:47,232 --> 00:11:50,829
but we're still lacking a lot of data
in terms of this kind of time range.
181
00:11:52,250 --> 00:11:55,286
So, my next thought was,
"Okay, well, why aren't--"
182
00:11:55,592 --> 00:11:57,088
"I want to see a Tyrannosaurus Rex."
183
00:11:57,088 --> 00:11:59,838
That's what I really wanted to see
on my query, and it wasn't there.
184
00:11:59,838 --> 00:12:02,138
So, had a little dig in,
and I found out why.
185
00:12:02,234 --> 00:12:05,284
It's because they're much more often
stored
186
00:12:05,284 --> 00:12:08,696
in terms of the temporal range
or time period that they relate to.
187
00:12:09,065 --> 00:12:11,412
So, on comes the next query,
188
00:12:11,412 --> 00:12:13,144
where I actually sort of--
189
00:12:13,664 --> 00:12:17,641
basically, this query
is looking for any item
190
00:12:17,641 --> 00:12:22,284
that has a temporal range start,
and/or a temporal range end.
191
00:12:22,665 --> 00:12:25,965
Which is basically in the form--
in life forms, it kind of relates
192
00:12:25,965 --> 00:12:28,644
to when they emerged
and when they became extinct.
193
00:12:28,644 --> 00:12:31,044
So, these are the periods
on the side here.
194
00:12:31,585 --> 00:12:33,190
If I just close that a bit--
195
00:12:33,190 --> 00:12:37,364
you can see that we have
quite a lot of interesting stuff.
196
00:12:37,364 --> 00:12:39,834
And there's the Tyrannosaurus
that I was looking for.
197
00:12:39,834 --> 00:12:43,394
So, I finally got that,
and I was like, "Yes! I've done it!"
198
00:12:43,394 --> 00:12:46,084
I've got that Triceratops
in there for bonus.
199
00:12:46,084 --> 00:12:48,984
But of course, still loads missing.
200
00:12:48,984 --> 00:12:50,665
And I'd love to see lots more here.
201
00:12:50,665 --> 00:12:52,590
But at least, it gives you the idea.
202
00:12:52,590 --> 00:12:55,794
The nice thing is, here as well,
if I star some of these,
203
00:12:55,794 --> 00:12:58,374
you can see that
the time range is shown.
204
00:12:58,374 --> 00:13:01,027
So, you can start to do
what I really wanted to do, is say,
205
00:13:01,027 --> 00:13:04,004
"Okay, when did this one end,
and when did the next one begin?
206
00:13:04,004 --> 00:13:06,085
When did things start going extinct?"
207
00:13:06,085 --> 00:13:09,832
So, I was pretty excited, but, still,
really hoping for a lot more.
208
00:13:09,832 --> 00:13:11,619
So, there's a lot of editing to be done
209
00:13:11,619 --> 00:13:15,098
in terms of these large geological
and cosmic time scales.
210
00:13:15,909 --> 00:13:19,273
You can see on the color code,
I can also do extinction period.
211
00:13:19,273 --> 00:13:23,489
So, I say, I want to find out stuff
that went extinct in the late Cretaceous.
212
00:13:23,489 --> 00:13:25,768
And I now know that two things did that.
213
00:13:25,768 --> 00:13:27,717
There's obviously quite a few more.
214
00:13:27,717 --> 00:13:30,483
And I put the taxon rank
in there, as well,
215
00:13:30,483 --> 00:13:31,986
just so that we can also see,
216
00:13:31,986 --> 00:13:34,588
"Okay, which, what
is its species, genus, et cetera."
217
00:13:35,479 --> 00:13:37,143
So, pretty exciting.
218
00:13:37,143 --> 00:13:41,192
I was quite happy, but it's showing
that a lot still needs to be done.
219
00:13:42,126 --> 00:13:45,447
So I went to the next one, which was--
220
00:13:45,447 --> 00:13:48,045
I was thinking, "Well, I can't find
all the data I'm looking for.
221
00:13:48,045 --> 00:13:49,347
Let's go a bit more general,
222
00:13:49,347 --> 00:13:53,833
and just look for all of a certain kind
of dates in Wikidata that I can find
223
00:13:53,833 --> 00:13:57,240
that are over 10,000 years old, basically.
224
00:13:58,219 --> 00:14:00,703
And what type of thing are they?"
225
00:14:00,762 --> 00:14:04,298
So, this color code is relatively okay,
but it might be a bit misleading,
226
00:14:04,298 --> 00:14:06,264
because some things are multiple types.
227
00:14:06,264 --> 00:14:08,318
So, therefore,
it's a bit random, at times.
228
00:14:08,318 --> 00:14:11,468
But, you get some really
fascinating stuff in here.
229
00:14:11,468 --> 00:14:14,255
I've got for a start--
I've got all of the millennia
230
00:14:14,255 --> 00:14:18,238
that we have in Wikidata,
which is, you know, there you go.
231
00:14:18,238 --> 00:14:21,558
Read about everything that happened
in all these different millennia.
232
00:14:21,558 --> 00:14:23,629
No pictures for any
of these, unfortunately.
233
00:14:23,629 --> 00:14:26,670
So, there's nothing to really say
what happened in them.
234
00:14:26,670 --> 00:14:29,203
Taxon, which we were just looking at,
which kind of led me on
235
00:14:29,203 --> 00:14:31,124
to the other queries.
236
00:14:31,124 --> 00:14:34,079
And of course, that sort of
like all of them in one group.
237
00:14:34,079 --> 00:14:36,875
Interesting stuff.
Archaeological cultures.
238
00:14:36,875 --> 00:14:40,121
And this is like, okay,
this is more like up my street.
239
00:14:40,121 --> 00:14:42,670
This is the sort of things
I want to learn about.
240
00:14:42,670 --> 00:14:45,234
Again, pictures would be nice.
241
00:14:45,493 --> 00:14:48,781
But it's really showing you
something interesting.
242
00:14:48,781 --> 00:14:50,361
And it's just worth exploring here.
243
00:14:50,361 --> 00:14:52,534
And of course, there's some
that really make me excited
244
00:14:52,534 --> 00:14:54,048
for what we could be doing.
245
00:14:54,048 --> 00:14:57,288
For example, there was
something here which was--
246
00:14:58,028 --> 00:15:00,888
I mean, system, actually,
was quite an interesting one.
247
00:15:01,794 --> 00:15:04,237
And sorry, that's not actually
the one I was thinking about.
248
00:15:04,237 --> 00:15:05,958
In fact, that means nothing to me at all.
249
00:15:05,958 --> 00:15:07,613
Someone might know what that means.
250
00:15:08,057 --> 00:15:10,813
Art movements,
archaeological sites, activities.
251
00:15:10,813 --> 00:15:12,478
There was only two of these,
252
00:15:12,478 --> 00:15:15,788
but I really like the idea, because--
and they're both the same.
253
00:15:15,788 --> 00:15:17,658
They're both hunting.
254
00:15:17,730 --> 00:15:19,390
And of course, there's two of them.
255
00:15:19,390 --> 00:15:22,360
And the reason is, is because
there's a little qualifier on there.
256
00:15:22,360 --> 00:15:25,143
If we were to just
look through, we can see--
257
00:15:25,143 --> 00:15:27,735
we can see somewhere down here,
will be the start time.
258
00:15:27,735 --> 00:15:30,690
And the qualifier is talking about
when Homo erectus did it,
259
00:15:30,690 --> 00:15:32,735
and when Homo sapiens did it.
260
00:15:32,735 --> 00:15:35,513
So that should be
in brackets on the query,
261
00:15:35,513 --> 00:15:39,002
a little extension to do to show you
what the two different versions mean.
262
00:15:39,002 --> 00:15:42,390
But I would love to see
all of human skills in here.
263
00:15:42,390 --> 00:15:44,708
When did we first do farming,
when did we first this--
264
00:15:44,708 --> 00:15:46,010
when did fire come about?
265
00:15:46,010 --> 00:15:48,270
All of these things,
when did we first extract iron?
266
00:15:48,270 --> 00:15:50,355
When did we first--
all of these wonderful things
267
00:15:50,355 --> 00:15:53,607
that developed
into the modern world that we live in.
268
00:15:53,607 --> 00:15:56,873
So, really exciting signs
of what could be there,
269
00:15:56,873 --> 00:15:58,112
if it all got populated.
270
00:15:58,112 --> 00:16:00,210
So, you know, this is what
we really need to work on,
271
00:16:00,210 --> 00:16:02,333
is some of this historical info.
272
00:16:03,243 --> 00:16:05,060
Last one, I just wanted to just show you,
273
00:16:05,060 --> 00:16:07,283
which was just an extra
bonus one I threw in,
274
00:16:07,283 --> 00:16:10,875
just to look at the time periods
that we actually have,
275
00:16:10,875 --> 00:16:13,921
the historical ages
that we have in Wikidata.
276
00:16:13,921 --> 00:16:17,524
And so, this is actually just all
sub-classes of unit of time.
277
00:16:17,524 --> 00:16:22,396
And then, this is the actual
instance that it was.
278
00:16:22,396 --> 00:16:23,775
And it's just really interesting.
279
00:16:23,775 --> 00:16:25,849
This is more the kind of thing--
280
00:16:26,979 --> 00:16:29,541
In Histropedia Mark II,
these are the kind of things
281
00:16:29,541 --> 00:16:31,944
that will actually be displayed
more under the timeline
282
00:16:31,944 --> 00:16:33,984
as a sort of a range or period.
283
00:16:33,993 --> 00:16:36,436
And so, we are particularly interested
in these periods
284
00:16:36,436 --> 00:16:37,976
being really tight and nice,
285
00:16:37,976 --> 00:16:40,718
because it helps you to, then,
say what happened when,
286
00:16:40,718 --> 00:16:43,983
and you can sound really clever
when you talk about when things happened,
287
00:16:43,983 --> 00:16:47,263
in the Neolithic or the upper
Paleolithic, or whatever.
288
00:16:47,263 --> 00:16:49,121
I'm still pretty clueless on most of it,
289
00:16:49,121 --> 00:16:51,918
because I'm just kind of just waiting
for the data to be up to scratch.
290
00:16:51,918 --> 00:16:55,163
Great. I think I can actually
round it up there.
291
00:16:55,163 --> 00:16:57,145
Loads more exciting queries to come.
292
00:16:57,145 --> 00:17:00,420
A lot more features and cool stuff,
actually, just around the corner for us,
293
00:17:00,420 --> 00:17:02,758
because we've just finished
a lot of cool things.
294
00:17:02,758 --> 00:17:05,471
But there's a little bit of time
to pull it all together.
295
00:17:05,471 --> 00:17:07,373
So, look out for more.
296
00:17:07,373 --> 00:17:09,760
If there's any questions,
I think I've got one minute.
297
00:17:09,760 --> 00:17:11,458
So, it would have to be one.
298
00:17:11,510 --> 00:17:13,253
(host) Yes, Nav.
I forgot to introduce you.
299
00:17:13,253 --> 00:17:16,933
I'm sorry. That's Nav Evans, as he said,
from Histropedia. Thank you very much.
300
00:17:16,933 --> 00:17:17,986
Thank you. Cheers. Yeah.
301
00:17:17,986 --> 00:17:19,450
(host) Very fast questions.
302
00:17:19,450 --> 00:17:21,815
Anyone with a very fast question
[inaudible].
303
00:17:24,654 --> 00:17:29,230
(woman 2) Very quickly, how can
I do my own, if I want languages,
304
00:17:29,230 --> 00:17:30,818
when do we start, for instance.
305
00:17:30,818 --> 00:17:32,031
Absolutely. Good question.
306
00:17:32,031 --> 00:17:34,320
So just click on the--
oh, I've shared this.
307
00:17:34,320 --> 00:17:36,853
It's called cosmic timelines on the URL.
308
00:17:36,853 --> 00:17:40,911
Should be cosmic and geological,
but then it's not a short URL anymore.
309
00:17:40,911 --> 00:17:43,711
So, you click on this icon
in the top corner there,
310
00:17:43,711 --> 00:17:47,431
and then, you get to the query page,
which is like the home page of this tool.
311
00:17:47,431 --> 00:17:49,311
This is where the query is pasted in.
312
00:17:49,311 --> 00:17:51,491
So, at the moment,
I've got the language there.
313
00:17:51,491 --> 00:17:53,483
If I want to change it to something else,
314
00:17:53,483 --> 00:17:56,062
Arabic, or French, or whatever--
315
00:17:56,062 --> 00:17:58,271
and here are the-- this is the area
316
00:17:58,271 --> 00:18:03,092
where you sort of enter in exactly
which variables in your query
317
00:18:03,092 --> 00:18:04,600
you would like to do each thing.
318
00:18:04,600 --> 00:18:06,781
If you put nothing in,
it will try and figure it out.
319
00:18:06,781 --> 00:18:09,971
But if you want advanced stuff--
and really important, is the precision,
320
00:18:09,971 --> 00:18:13,033
because that's not available
on the query service timeline.
321
00:18:13,033 --> 00:18:14,123
So, you get everything--
322
00:18:14,123 --> 00:18:16,303
is the first of January
10 billion years ago,
323
00:18:16,303 --> 00:18:18,363
you know, which is not
what we want to see.
324
00:18:18,363 --> 00:18:20,603
And the rank, which is quite interesting.
325
00:18:20,603 --> 00:18:24,173
My timelines are all based
on a very simple rank of site link count,
326
00:18:24,173 --> 00:18:27,058
how many different articles there are,
or something else.
327
00:18:27,058 --> 00:18:29,432
But that's how you go
and mess around with it yourself,
328
00:18:29,432 --> 00:18:32,034
and you put your color codes
and your filters in down here.
329
00:18:32,034 --> 00:18:34,098
Comma separate them,
if you would like more,
330
00:18:34,098 --> 00:18:36,007
and they come up as options
in the final tool.
331
00:18:36,007 --> 00:18:37,836
And I think that
pretty much is it, isn't it.
332
00:18:37,836 --> 00:18:39,863
So, any other questions,
do find me afterwards.
333
00:18:39,863 --> 00:18:41,655
Always happy to get cornered
for this stuff.
334
00:18:41,655 --> 00:18:42,954
I love talking about it.
335
00:18:42,954 --> 00:18:44,989
Okay. So, thank you very much. Cheers.
336
00:18:44,989 --> 00:18:46,948
(applause)
337
00:19:28,344 --> 00:19:30,220
(mumbles)
338
00:19:30,265 --> 00:19:32,115
So, where is the first one?
339
00:19:33,854 --> 00:19:35,397
This one, no.
340
00:19:45,636 --> 00:19:47,132
This? Sorry.
341
00:19:48,270 --> 00:19:50,090
Is it full screen?
342
00:19:50,217 --> 00:19:52,129
Yep. Full screen.
343
00:19:54,747 --> 00:19:56,289
Well, good work.
344
00:19:58,388 --> 00:19:59,434
[Strike.]
345
00:19:59,497 --> 00:20:02,312
Yeah, so, okay. Thank you.
346
00:20:04,752 --> 00:20:07,062
So, hi, I'm Thibaud Senalada.
347
00:20:07,062 --> 00:20:08,952
As [inaudible] introduced me.
348
00:20:09,552 --> 00:20:14,212
I'm a software engineer
at the French National Library.
349
00:20:14,992 --> 00:20:18,349
And I'm here today
to talk to you about NOEMI,
350
00:20:18,979 --> 00:20:23,682
which is a software, a proof of concept,
351
00:20:23,682 --> 00:20:26,501
and a [inaudible] software
352
00:20:26,635 --> 00:20:29,961
for cataloging at the French Library.
353
00:20:30,787 --> 00:20:32,870
Sorry. [inaudible].
354
00:20:32,870 --> 00:20:35,359
Sorry for my English. It's a bit fuzzy.
355
00:20:36,971 --> 00:20:39,321
And so, what's NOEMI?
356
00:20:39,321 --> 00:20:41,589
So, NOEMI stands for:
357
00:20:41,589 --> 00:20:44,591
Nouer les Œuvres, Expressions,
Manifestations et Items.
358
00:20:44,591 --> 00:20:46,533
Which, in English, is:
359
00:20:46,533 --> 00:20:49,891
to link works, expressions,
manifestations, and items.
360
00:20:51,086 --> 00:20:58,057
It's based on the FRBR,
361
00:20:58,057 --> 00:21:00,633
and [inaudible].
362
00:21:00,881 --> 00:21:03,105
Yeah. Anyway.
363
00:21:03,631 --> 00:21:04,839
So, yeah.
364
00:21:05,244 --> 00:21:09,540
So, this software
is used to produce metadata.
365
00:21:10,841 --> 00:21:12,201
It will be used
366
00:21:12,201 --> 00:21:17,831
by 600 people on a daily basis.
367
00:21:18,911 --> 00:21:24,271
And as I said in the title,
it will be based on Wikibase.
368
00:21:25,415 --> 00:21:31,871
So, there is also a format manager.
369
00:21:32,388 --> 00:21:39,138
So, people using this software
will use like a code editor,
370
00:21:39,254 --> 00:21:41,817
but for MARC format.
371
00:21:41,968 --> 00:21:45,178
So, it's [inaudible], things like that.
372
00:21:46,814 --> 00:21:49,868
A data processing tool, like I said.
373
00:21:49,959 --> 00:21:53,040
And also, authorization management,
374
00:21:54,327 --> 00:21:56,378
because they will need a--
375
00:21:57,337 --> 00:22:01,417
if there is some data,
where it can be modified.
376
00:22:05,877 --> 00:22:07,840
So, the PoC context.
377
00:22:08,728 --> 00:22:12,738
So, this software will be replacing
an old piece of software,
378
00:22:12,855 --> 00:22:15,688
called ADCAT02.
379
00:22:17,111 --> 00:22:20,964
It is part of the bibliographic
transition.
380
00:22:20,984 --> 00:22:24,554
So, I say the [inaudible].
381
00:22:25,359 --> 00:22:29,390
[inaudible]. [inaudible] in English?
382
00:22:30,254 --> 00:22:31,662
Format.
383
00:22:32,717 --> 00:22:35,734
And it will be the [inaudible] of the--
384
00:22:39,979 --> 00:22:41,090
Sorry.
385
00:22:42,349 --> 00:22:46,560
It will be [inaudible]
all the [inaudible]
386
00:22:46,560 --> 00:22:49,689
of the BnF with data.
387
00:22:51,731 --> 00:22:54,124
And so, doing this work,
388
00:22:54,124 --> 00:22:59,693
we assessed Wikibase to see
if it fits our needs.
389
00:23:01,244 --> 00:23:03,383
And [inaudible] pretty good.
390
00:23:04,485 --> 00:23:06,930
So, why Wikibase?
391
00:23:06,930 --> 00:23:08,821
Because of the flexibility of the format.
392
00:23:08,835 --> 00:23:11,646
We managed--
393
00:23:11,850 --> 00:23:16,388
to inject MARC, INTERMARC for BnF--
394
00:23:16,960 --> 00:23:18,350
into the database.
395
00:23:18,399 --> 00:23:22,803
And use it to-- use this link management
396
00:23:22,803 --> 00:23:25,529
between entities using Blazegraph,
397
00:23:25,529 --> 00:23:27,776
so, as Wikibase does.
398
00:23:29,155 --> 00:23:32,700
We also chose Wikibase,
because it was already--
399
00:23:35,183 --> 00:23:38,900
it handles history and user accounts.
400
00:23:39,941 --> 00:23:42,414
So, it's easier for us.
401
00:23:43,106 --> 00:23:48,270
And it also has a good--
it's pretty easy to create bots
402
00:23:48,270 --> 00:23:51,090
to watch and curate data
403
00:23:51,840 --> 00:23:53,430
and also to make statistics.
404
00:23:54,820 --> 00:23:57,170
It's free and open, and sustainable.
405
00:23:57,908 --> 00:23:59,084
Yeah, so.
406
00:23:59,610 --> 00:24:02,519
I'm sorry if you don't
understand what I say,
407
00:24:02,519 --> 00:24:04,839
because I know my English
is not that good.
408
00:24:07,720 --> 00:24:12,139
But during this PoC,
we encountered some trouble.
409
00:24:12,802 --> 00:24:13,938
Okay.
410
00:24:14,790 --> 00:24:21,117
First of all, as a search engine,
I think we have to create
411
00:24:21,117 --> 00:24:24,150
another--
412
00:24:24,185 --> 00:24:28,988
not another, a supplementary
search engine to use it with,
413
00:24:29,433 --> 00:24:31,120
to fit our needs.
414
00:24:31,688 --> 00:24:37,155
Because we need some search
415
00:24:37,155 --> 00:24:42,366
like faceted search and filters.
416
00:24:43,755 --> 00:24:47,525
Also we have the [inaudible],
417
00:24:47,525 --> 00:24:50,407
of using a PostgreSQL database.
418
00:24:50,407 --> 00:24:54,885
And for the moment,
I think Wikibase [inaudible].
419
00:24:56,436 --> 00:25:01,266
And when we tried to use PostgreSQL,
it was a bit difficult,
420
00:25:01,266 --> 00:25:04,394
and will cause some issues.
421
00:25:05,662 --> 00:25:08,825
And we have also some fear
about performance,
422
00:25:08,825 --> 00:25:15,238
because the catalog is about
20 million entities,
423
00:25:16,366 --> 00:25:19,146
20 million bibliographic entities.
424
00:25:19,146 --> 00:25:22,851
That can be more
than 20 million entities, actually.
425
00:25:23,276 --> 00:25:27,771
And we don't know the time
that we'll have to inject them
426
00:25:27,809 --> 00:25:30,765
in the Wikibase, and how to do it.
427
00:25:32,198 --> 00:25:34,267
So, [inaudible],
428
00:25:34,324 --> 00:25:39,616
but the real software development
has already started.
429
00:25:43,242 --> 00:25:46,175
We start by creating
an interface with Wikibase.
430
00:25:46,261 --> 00:25:47,711
We're using Java.
431
00:25:48,091 --> 00:25:50,093
Like PyWikibase.
432
00:25:51,691 --> 00:25:54,888
- (man) Pywikibot.
- Pywikibot. Yeah, thank you.
433
00:25:56,027 --> 00:25:57,723
The same way, but in Java.
434
00:25:59,309 --> 00:26:02,831
We also inject already the format
into the Wikibase.
435
00:26:03,540 --> 00:26:09,093
And we do something
like the INTERMARC editor,
436
00:26:09,458 --> 00:26:12,134
[inaudible], et cetera.
437
00:26:13,672 --> 00:26:14,926
Thank you.
438
00:26:15,333 --> 00:26:17,135
(applause)
439
00:26:23,527 --> 00:26:24,749
Yeah.
440
00:26:27,748 --> 00:26:29,813
(man 2) Faceted search
will be a nice feature
441
00:26:29,813 --> 00:26:31,885
in the Wikidata UI itself.
442
00:26:31,924 --> 00:26:34,062
So, have you talked
to any of the developers,
443
00:26:34,062 --> 00:26:35,675
or is that something
that could be done?
444
00:26:35,711 --> 00:26:37,108
Sorry, I don't understand.
445
00:26:37,108 --> 00:26:39,041
(man 2) The faceted search idea.
446
00:26:39,911 --> 00:26:41,982
It would be nice to be able
to search only humans,
447
00:26:41,982 --> 00:26:44,221
or search only works, or something, right?
448
00:26:44,321 --> 00:26:47,991
Yeah. I'm sorry, I don't-- I don't--
449
00:26:48,131 --> 00:26:50,436
(man 2) Yeah, I mean, so,
it would be nice if we had that
450
00:26:50,436 --> 00:26:52,265
in Wikidata itself in the UI.
451
00:26:52,822 --> 00:26:53,954
Yeah, yeah, yeah.
452
00:26:54,088 --> 00:26:56,077
[inaudible]
453
00:26:56,077 --> 00:26:57,911
Yeah, okay, thank you.
454
00:26:57,911 --> 00:27:00,026
I'm sorry. (laughs)
455
00:27:01,186 --> 00:27:03,902
Yeah, yeah. But I think we will--
456
00:27:04,506 --> 00:27:07,266
I don't know if we want
to do it inside Wikibase,
457
00:27:07,266 --> 00:27:10,746
or in our next systems.
458
00:27:10,785 --> 00:27:15,186
For the moment,
we don't really solve that.
459
00:27:15,965 --> 00:27:17,885
For the moment, I think.
460
00:27:17,885 --> 00:27:19,285
Sorry.
461
00:27:27,645 --> 00:27:30,644
(man 3) I suppose on the topic
of the faceted search,
462
00:27:32,535 --> 00:27:35,068
Wikidata, SPARQL Query, Wikibase--
463
00:27:35,068 --> 00:27:38,965
SPARQL Query is I think,
functionally equivalent
464
00:27:38,965 --> 00:27:41,405
to a facetable search.
465
00:27:42,105 --> 00:27:44,234
So, it's mostly an interface issue, right?
466
00:27:44,284 --> 00:27:47,791
I mean, you could build an interface
that starts with a query,
467
00:27:47,791 --> 00:27:51,111
and then, gives you
possible facets to filter by.
468
00:27:51,370 --> 00:27:52,660
And when you click one of them,
469
00:27:52,660 --> 00:27:55,217
it adds a condition
to the SPARQL Query, right?
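The idea described here, where each clicked facet adds one condition to a SPARQL query, can be sketched roughly as follows. This is only an illustration, not code from the talk: the function name `build_faceted_query` is hypothetical, and it assumes Wikidata-style identifiers (`wdt:P31` for "instance of", `Q5` for human, `P106`/`Q36180` for occupation/writer) with the `wd:` and `wdt:` prefixes already defined by the query service.

```python
from typing import Dict


def build_faceted_query(base_class: str, facets: Dict[str, str], limit: int = 50) -> str:
    """Build a SPARQL query for items of base_class, narrowed by the chosen facets.

    base_class and the facet values are Q-ids; the facet keys are P-ids.
    (Hypothetical helper for illustration only.)
    """
    # Start from the base query: everything that is an instance of base_class.
    lines = [f"  ?item wdt:P31 wd:{base_class} ."]
    # Each selected facet adds exactly one more condition (triple pattern).
    for prop, value in sorted(facets.items()):
        lines.append(f"  ?item wdt:{prop} wd:{value} .")
    body = "\n".join(lines)
    return f"SELECT ?item WHERE {{\n{body}\n}}\nLIMIT {limit}"


# No facet selected yet: search only humans.
print(build_faceted_query("Q5", {}))
# The user clicks the facet "occupation: writer".
print(build_faceted_query("Q5", {"P106": "Q36180"}))
```

A real interface would also need a second query to propose the available facet values, which is where performance on a large catalog may become the harder part.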
470
00:27:55,664 --> 00:27:58,183
Yeah, but I think the SPARQL--
471
00:27:59,157 --> 00:28:04,310
they don't go as detailed
as we want, as we have--
472
00:28:05,632 --> 00:28:09,631
When we inject the format,
we use a statement for--
473
00:28:10,525 --> 00:28:13,124
the format is like XML.
474
00:28:13,223 --> 00:28:15,842
So, it's a zone, subzone, and value.
475
00:28:16,413 --> 00:28:20,292
And in the [inaudible] statement,
we add the subzone,
476
00:28:20,892 --> 00:28:22,902
because the zone was already there.
477
00:28:23,002 --> 00:28:28,565
And we want to query
some qualifier on this.
478
00:28:28,659 --> 00:28:35,206
And I don't know if the SPARQL
goes through that-- I'm sorry--
479
00:28:36,145 --> 00:28:38,277
in a fast way.
480
00:28:40,025 --> 00:28:46,285
I think we need some index
for us to [inaudible].
481
00:28:46,925 --> 00:28:48,145
Yeah.
482
00:28:48,145 --> 00:28:50,250
(man 3) SPARQL doesn't do a query--
483
00:28:52,321 --> 00:28:55,703
To do proper string searches
in SPARQL is very hard.
484
00:28:55,703 --> 00:28:57,610
You have to have filters, which are slow,
485
00:28:57,610 --> 00:28:59,815
and it really doesn't work that well.
486
00:28:59,815 --> 00:29:02,845
So, it's a different
search problem, really.
487
00:29:06,871 --> 00:29:09,270
More questions? If anyone has one?
488
00:29:12,215 --> 00:29:13,999
- Great. Thank you.
- Thank you.
489
00:29:14,044 --> 00:29:15,895
(applause)
490
00:29:37,766 --> 00:29:41,960
(host) Nielsen speaking about
the tool Ordia. Thank you.
491
00:30:05,084 --> 00:30:06,460
So, I'm Finn Årup Nielsen,
492
00:30:06,460 --> 00:30:09,006
and a couple of years ago,
I started Scholia
493
00:30:09,006 --> 00:30:14,611
that displays data from Wikidata
via a SPARQL Query
494
00:30:14,611 --> 00:30:16,359
to the Wikidata Query Service
495
00:30:16,359 --> 00:30:18,959
so we can generate, for example,
a list of publications
496
00:30:18,959 --> 00:30:20,380
for a specific author.
497
00:30:20,866 --> 00:30:26,941
Now, last year, Wikidata
introduced lexicographic data.
498
00:30:29,332 --> 00:30:32,655
And I [inaudible] the idea of Scholia
499
00:30:32,655 --> 00:30:39,279
that is using Wikidata
and the Wikidata Query Service
500
00:30:39,445 --> 00:30:42,036
to generate overviews
of lexicographic data.
501
00:30:42,585 --> 00:30:46,125
So, Ordia is the example of this one here.
502
00:30:46,197 --> 00:30:51,998
So, it generates-- it's a web application
run from the Toolforge service,
503
00:30:51,998 --> 00:30:57,198
and for example, it will dynamically
generate a page such as--
504
00:30:57,234 --> 00:31:01,768
This one here is statistics over
what there is of lexicographic data
505
00:31:01,768 --> 00:31:03,841
in Wikidata.
506
00:31:03,992 --> 00:31:07,404
For example, the number of lexemes,
is currently over 200,000.
507
00:31:08,664 --> 00:31:10,483
So, there's a range of things
you can do here.
508
00:31:10,483 --> 00:31:12,916
You can, for example,
look in the aspects of that.
509
00:31:12,916 --> 00:31:15,560
The menu, there's quite a lot
of things here.
510
00:31:15,560 --> 00:31:18,485
And so, I will search
on a specific Danish lexeme.
511
00:31:19,503 --> 00:31:22,835
"Rød"-- which is "red" in Danish.
512
00:31:23,376 --> 00:31:27,466
So, you basically get,
for the specific lexeme,
513
00:31:28,286 --> 00:31:30,618
the same type of information
that you could see
514
00:31:30,618 --> 00:31:33,751
in the ordinary part of Wikidata, here.
515
00:31:34,451 --> 00:31:38,256
Annotations about the lexeme,
annotation about the forms,
516
00:31:39,359 --> 00:31:40,872
singular or plural forms.
517
00:31:41,548 --> 00:31:43,501
Annotation about the senses.
518
00:31:44,683 --> 00:31:47,678
But what you can't see
in ordinary Wikidata
519
00:31:47,678 --> 00:31:52,150
is sort of aggregating across lexemes.
520
00:31:52,246 --> 00:31:54,207
And this is, for example, down here--
521
00:31:54,207 --> 00:31:55,902
down here with the compound.
522
00:31:55,902 --> 00:31:57,764
So, in Danish, like in German,
523
00:31:57,764 --> 00:31:59,950
words can be compounded.
524
00:31:59,950 --> 00:32:03,478
For example, for "red",
we have rødkælk
525
00:32:03,478 --> 00:32:05,830
which is compounded by two words.
526
00:32:06,721 --> 00:32:10,085
And we've got, on the second one here,
rødvin-- red wine.
527
00:32:11,060 --> 00:32:15,691
This list here is constructed
by a SPARQL Query to the Wikidata Service.
528
00:32:16,751 --> 00:32:20,406
And also, further down here,
we've got a lot of Danish words here.
529
00:32:20,970 --> 00:32:26,122
Further down here, we should have
a graph of the words
530
00:32:27,426 --> 00:32:29,164
which are compounded from rød.
531
00:32:29,658 --> 00:32:31,980
We have [rød]-- red here in the middle.
532
00:32:31,980 --> 00:32:34,372
And for example, around--
somewhere around here,
533
00:32:34,372 --> 00:32:36,895
which should have,
for example, "red cabbage,"
534
00:32:36,936 --> 00:32:40,343
"red cabbage salad,"
"red cabbage soup," and so on.
535
00:32:40,434 --> 00:32:43,055
So you can browse around,
in this one here, and see it.
536
00:32:44,204 --> 00:32:51,188
We can go a bit back here,
and then look at the main sense
537
00:32:51,388 --> 00:32:55,030
of the word rød-- red in Danish.
538
00:32:55,550 --> 00:33:01,610
So, Ordia automatically generates
information about hyponyms.
539
00:33:02,570 --> 00:33:04,400
Subconcepts, for example,
540
00:33:04,400 --> 00:33:07,400
light red, dark red,
pink, purple, and so on,
541
00:33:07,525 --> 00:33:14,272
are in the-- when we make
a Wikidata Query service, SPARQL Query.
542
00:33:14,576 --> 00:33:20,570
Then we go around in the Wikidata graph,
543
00:33:20,626 --> 00:33:22,266
and get this information here.
544
00:33:22,266 --> 00:33:24,786
And we can also get translation
automatically,
545
00:33:24,786 --> 00:33:28,316
even though it's not necessarily stated
within the Wikidata lexeme items.
546
00:33:28,316 --> 00:33:32,679
For example, here, we have translated
rød to "red" in English,
547
00:33:32,679 --> 00:33:36,089
and röd in Swedish, and so on.
548
00:33:36,107 --> 00:33:38,191
There are not that many there.
549
00:33:38,747 --> 00:33:40,262
There's a range of other things here.
550
00:33:40,262 --> 00:33:43,487
Let me show you,
for example, this one here--
551
00:33:44,387 --> 00:33:51,308
this is veninde-- now I go
over to this one here.
552
00:33:54,308 --> 00:33:57,328
-inde, which is a feminine suffix.
553
00:33:58,058 --> 00:34:00,498
So, this is auto-generated there,
554
00:34:00,498 --> 00:34:02,641
it's a combination of "instance of"--
555
00:34:03,268 --> 00:34:07,171
lexemes that are "instance of"
feminine suffixes.
556
00:34:08,142 --> 00:34:11,519
And for example, for German,
we have [inaudible].
557
00:34:11,519 --> 00:34:15,373
So, -in would be
a feminine suffix in German.
558
00:34:15,704 --> 00:34:21,291
And I put in sort of the five Danish
feminine suffixes
559
00:34:22,571 --> 00:34:24,206
of Danish.
560
00:34:25,480 --> 00:34:29,106
Another facility is, for example,
if you have a text,
561
00:34:29,106 --> 00:34:34,021
you can copy and paste it
into this Text to lexemes here.
562
00:34:34,571 --> 00:34:35,911
Let me--
563
00:34:37,482 --> 00:34:41,218
"a car crashed into...
564
00:34:41,864 --> 00:34:44,141
a green house."
565
00:34:46,485 --> 00:34:48,701
Let me change that to "English".
566
00:34:49,006 --> 00:34:50,029
Press Submit.
567
00:34:50,029 --> 00:34:53,355
Now, Ordia will then extract
each of the words here,
568
00:34:53,355 --> 00:34:54,733
in this sentence here,
569
00:34:54,733 --> 00:34:58,217
and try to see whether they
are entered in the specific form,
570
00:34:58,217 --> 00:35:00,778
a lexeme, are entered in Wikidata.
571
00:35:00,778 --> 00:35:04,228
And these simple words here
are entered in Wikidata.
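As a rough sketch of this kind of lookup (not Ordia's actual implementation), one could split the text into words and ask the query service which of them exist as lexeme forms. The helper names `tokenize` and `forms_query` are hypothetical, and the query assumes the `ontolex:` prefix and the `ontolex:lexicalForm` / `ontolex:representation` properties used for Wikidata lexemes.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep only runs of letters as words."""
    return re.findall(r"[^\W\d_]+", text.lower())


def forms_query(words: List[str], language_code: str = "en") -> str:
    """Build a SPARQL query that looks up each distinct word as a lexeme form."""
    unique = sorted(set(words))
    # Language-tagged literals, e.g. "car"@en, one per distinct word.
    values = " ".join(f'"{w}"@{language_code}' for w in unique)
    return (
        "SELECT ?word ?lexeme ?form WHERE {\n"
        f"  VALUES ?word {{ {values} }}\n"
        "  ?lexeme ontolex:lexicalForm ?form .\n"
        "  ?form ontolex:representation ?word .\n"
        "}"
    )


words = tokenize("A car crashed into a green house.")
print(forms_query(words))
```

Words that come back with no matching form would be the candidates for a "create new lexeme" link.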
572
00:35:04,228 --> 00:35:09,190
But if we, for example, change it to--
there's nothing called "vancar"
573
00:35:09,190 --> 00:35:13,998
but just let us do that here.
574
00:35:14,535 --> 00:35:19,532
And you got down here--
it's as a blue link
575
00:35:20,335 --> 00:35:23,295
that you can create a new
Wikidata lexeme item.
576
00:35:24,556 --> 00:35:29,097
But there's a range of other things to explore
577
00:35:29,716 --> 00:35:31,496
in this web application.
578
00:35:31,496 --> 00:35:35,596
And if there's any suggestions,
or comments, or notes, or something,
579
00:35:35,596 --> 00:35:39,337
you can contact me, or put in
an issue on GitHub.
580
00:35:39,337 --> 00:35:44,856
So, this particular application
is developed on GitHub,
581
00:35:44,856 --> 00:35:50,526
and I'm open to new ideas
and ways to represent information there.
582
00:35:51,306 --> 00:35:52,701
Okay, thank you.
583
00:35:52,701 --> 00:35:54,661
(applause)
584
00:35:59,328 --> 00:36:00,906
Questions?
585
00:36:03,262 --> 00:36:04,524
(woman 3) I love your tool.
586
00:36:04,524 --> 00:36:09,752
Can you show the languages,
that which is awesome for me, I think,
587
00:36:09,752 --> 00:36:11,731
to show other languages.
588
00:36:12,183 --> 00:36:14,537
So, this is a bit of statistics
over the languages,
589
00:36:14,537 --> 00:36:17,046
and the Russians
have been scraping Wiktionary,
590
00:36:17,046 --> 00:36:20,327
and that's why they have now
100,000 lexemes.
591
00:36:24,387 --> 00:36:28,088
There's also a lot of work on Basque here.
592
00:36:29,566 --> 00:36:32,241
I think there's an organization
putting that information in here.
593
00:36:32,241 --> 00:36:34,932
And you can also see a graph of these--
594
00:36:34,932 --> 00:36:37,662
this is Number of forms as a function
of number of lexemes.
595
00:36:38,798 --> 00:36:41,279
And all the way up here--
596
00:36:41,279 --> 00:36:45,255
here, this is Russian,
down here, Basque, I think.
597
00:36:45,476 --> 00:36:47,997
And English, perhaps, down here.
598
00:36:48,953 --> 00:36:50,692
And also in the Number of senses,
599
00:36:52,473 --> 00:36:58,360
I think Basque, English, and Russian,
600
00:37:00,184 --> 00:37:02,048
Hebrew, and so on.
601
00:37:02,048 --> 00:37:03,343
Yeah.
602
00:37:11,045 --> 00:37:12,950
(man 4) That looks
like an incredible tool.
603
00:37:12,950 --> 00:37:15,097
But I was just wondering,
is it all fully live?
604
00:37:15,097 --> 00:37:18,344
Is it all based on SPARQL Queries
and live or are there some things--
605
00:37:18,344 --> 00:37:20,458
- Yes. I believe, yes.
- Fantastic.
606
00:37:20,511 --> 00:37:24,961
But as they get more data into Wikidata,
607
00:37:24,961 --> 00:37:26,100
there's a bit of an issue.
608
00:37:26,100 --> 00:37:27,328
For example, for Russian here.
609
00:37:27,328 --> 00:37:31,966
I started this out a year ago
when there were not that many lexemes,
610
00:37:32,061 --> 00:37:35,503
and so there were no problems
with the time-outs.
611
00:37:35,503 --> 00:37:38,367
But representing it here--
612
00:37:38,367 --> 00:37:42,268
but if I press Russian,
I think there might be some issues.
613
00:37:42,268 --> 00:37:44,284
There's a count that works here,
614
00:37:44,284 --> 00:37:46,101
for example, longest words or phrases.
615
00:37:46,101 --> 00:37:49,252
But I think the lexemes
are sort of loading in.
616
00:37:49,252 --> 00:37:52,727
I think I'll need to fix that
as Wikidata grows here.
617
00:37:53,258 --> 00:37:55,927
As you see, there's a lot
of Russian nouns, apparently.
618
00:37:56,699 --> 00:37:58,451
And I don't know whether the--
619
00:37:59,351 --> 00:38:01,519
apparently, that's what
they're working on.
620
00:38:01,573 --> 00:38:03,960
There seems also to be
a bit of time-out there.
621
00:38:06,705 --> 00:38:08,033
[inaudible], oh, yes.
622
00:38:08,115 --> 00:38:09,984
The first one there.
623
00:38:10,832 --> 00:38:16,110
But apparently, the longest words
and phrases is a bit too expensive.
624
00:38:17,931 --> 00:38:20,334
But apparently, it can be loaded there,
and it's probably--
625
00:38:21,318 --> 00:38:23,167
it's loaded all the 100,000 there,
626
00:38:23,167 --> 00:38:27,938
so you can click all 10,000 pages.
627
00:38:36,748 --> 00:38:38,678
(host) If there aren't
any other questions--
628
00:38:39,564 --> 00:38:40,950
The longest word came now.
629
00:38:40,950 --> 00:38:43,146
So, it's, yeah.
630
00:38:44,972 --> 00:38:46,390
Probably--
631
00:38:47,855 --> 00:38:49,975
[inaudible]
632
00:38:50,321 --> 00:38:51,540
What is that?
633
00:38:51,540 --> 00:38:53,518
- (audience) It's a chemical.
- A chemical, yes.
634
00:38:56,317 --> 00:38:58,303
(host) More questions? Or shall we?
635
00:38:59,792 --> 00:39:02,332
Alright, alright. Thank you very much.
636
00:39:02,332 --> 00:39:04,392
(applause)
637
00:39:23,642 --> 00:39:25,121
(Nicolas) Is it good?
638
00:39:31,008 --> 00:39:32,346
(host) Awesome.
639
00:39:34,920 --> 00:39:38,137
Alright, now, to wrap it up,
we have Nicolas Vigneron,
640
00:39:38,137 --> 00:39:40,778
talking about Wikisource and Wikidata.
641
00:39:41,469 --> 00:39:42,804
(Nicolas) This is good?
642
00:39:44,542 --> 00:39:46,126
Who knows Wikisource?
643
00:39:47,582 --> 00:39:48,959
Yay!
644
00:39:50,740 --> 00:39:53,582
More and more people
raising hands every year.
645
00:39:53,582 --> 00:39:54,957
That's good.
646
00:39:55,282 --> 00:40:01,462
So, this morning, [Lydia] said that
Wikivoyage was the first real user of--
647
00:40:03,306 --> 00:40:05,987
[inaudible]
648
00:40:06,572 --> 00:40:08,347
Wikisource is not that far behind.
649
00:40:09,230 --> 00:40:13,280
There's a lot to do,
and I want to do some basic numbers,
650
00:40:13,280 --> 00:40:16,964
statistics, about where we are,
and where I want to go.
651
00:40:17,613 --> 00:40:23,409
So first, there will be a lot of questions
of what is a book,
652
00:40:23,409 --> 00:40:25,389
what is bibliographical data.
653
00:40:25,389 --> 00:40:27,229
People from the BnF can agree with me.
654
00:40:27,229 --> 00:40:29,969
That can be a nightmare
if you go into details.
655
00:40:30,164 --> 00:40:35,803
But some big numbers that--
Google Books tried to do an estimation
656
00:40:35,803 --> 00:40:39,676
on how many "books," air quote books,
there are in the world,
657
00:40:39,676 --> 00:40:43,005
and there's 130 million books
in the world.
658
00:40:43,705 --> 00:40:47,279
And, yeah, let's put them all on Wikidata.
659
00:40:47,650 --> 00:40:49,300
Or not. I don't know.
660
00:40:49,392 --> 00:40:51,049
But where are we now?
661
00:40:51,413 --> 00:40:52,468
And why is it books?
662
00:40:52,468 --> 00:40:55,668
Because for Google Books,
everything is scanned, basically.
663
00:40:55,795 --> 00:40:58,670
They don't have exactly
a very clear distinction.
664
00:40:59,400 --> 00:41:04,350
There's sometimes, two-page books,
which [inaudible], Google Books is a book.
665
00:41:04,714 --> 00:41:10,131
But for many people, you have to have
at least 50 pages to be a book.
666
00:41:10,536 --> 00:41:12,321
So, that's always hard to count.
667
00:41:12,885 --> 00:41:15,603
But here's what we know on Wikidata.
668
00:41:15,603 --> 00:41:18,704
This is the graph of what
is a book for Wikidata.
669
00:41:18,704 --> 00:41:21,524
You have-- that's totally [inaudible]--
670
00:41:21,524 --> 00:41:23,979
but that's Wikidata,
literary work as well.
671
00:41:23,979 --> 00:41:27,194
And this is all the subclasses,
or subclasses of subclasses--
672
00:41:27,194 --> 00:41:30,334
or subclasses of subclasses
of what is a book.
673
00:41:30,804 --> 00:41:32,705
So, that's very hard to do.
674
00:41:32,737 --> 00:41:34,253
I can do a graph like that,
675
00:41:34,253 --> 00:41:36,833
but the SPARQL Query engine doesn't work
676
00:41:36,833 --> 00:41:41,523
if I want to count everything
that is instance of these subclasses,
677
00:41:41,523 --> 00:41:45,143
and basically, SPARQL says no, time-out.
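The count described here could look roughly like the query built below. It is a sketch under assumptions, not the speaker's actual query: it assumes `Q571` is the Wikidata item for "book", `wdt:P31` is "instance of", and `wdt:P279` is "subclass of"; the function names are hypothetical. Listing the subclasses first and counting them one at a time is one possible way around the time-out, not something stated in the talk.

```python
def count_instances_query(root_class: str = "Q571") -> str:
    """Count every item that is an instance of root_class or of any subclass.

    wdt:P31 is "instance of"; wdt:P279* follows zero or more "subclass of" steps.
    On the full Wikidata graph this is the kind of query that tends to time out.
    """
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        f"  ?item wdt:P31/wdt:P279* wd:{root_class} .\n"
        "}"
    )


def subclasses_query(root_class: str = "Q571") -> str:
    """List root_class and all its subclasses, so they can be counted one by one."""
    return (
        "SELECT ?cls WHERE {\n"
        f"  ?cls wdt:P279* wd:{root_class} .\n"
        "}"
    )


print(count_instances_query())
print(subclasses_query())
```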
678
00:41:45,633 --> 00:41:47,020
So, what's the problem?
679
00:41:47,020 --> 00:41:50,713
But I know already that there's
a lot of subclasses,
680
00:41:50,713 --> 00:41:52,153
but we need to look into it.
681
00:41:52,153 --> 00:41:57,943
And probably, if you know Wikidata,
on the page Wikidata:Statistics,
682
00:41:58,643 --> 00:42:02,647
you have all the numbers by big classes,
683
00:42:02,647 --> 00:42:07,047
and you all probably know
that the big chunk here
684
00:42:07,047 --> 00:42:08,642
is scholarly articles,
685
00:42:08,707 --> 00:42:12,749
which is, thanks to
the WikiCite project, in particular,
686
00:42:14,113 --> 00:42:17,125
which can be books or not,
depending on definition.
687
00:42:19,062 --> 00:42:22,508
You see that there's no subclass books,
688
00:42:23,032 --> 00:42:26,034
because there's not enough to show.
689
00:42:26,049 --> 00:42:28,472
It's probably somewhere in the others,
690
00:42:28,472 --> 00:42:30,127
the purple area is others.
691
00:42:30,163 --> 00:42:34,115
And there's a lot of things
that's under one percent.
692
00:42:34,162 --> 00:42:38,821
So, basically, we can say
that we have less than one percent
693
00:42:38,821 --> 00:42:42,131
of things identified as books in Wikidata.
694
00:42:42,551 --> 00:42:46,091
Maybe there are more books,
but not identified as such.
695
00:42:47,842 --> 00:42:49,284
I'm talking about books,
696
00:42:49,383 --> 00:42:51,768
but when we are talking
about bibliographical data,
697
00:42:51,768 --> 00:42:53,920
there's also the author, person,
698
00:42:53,920 --> 00:42:58,472
so maybe some of the humans here
are also authors, surely.
699
00:43:00,068 --> 00:43:03,221
And we need to do another count,
which is another big query to do.
700
00:43:03,602 --> 00:43:05,301
That times out, so--
701
00:43:05,396 --> 00:43:08,015
I don't have a lot of numbers
for this, sorry.
702
00:43:10,619 --> 00:43:14,332
So, yeah, basically, this first slide
is about how it's complicated
703
00:43:14,332 --> 00:43:19,122
to know how much we have of what,
and how to count them.
704
00:43:19,445 --> 00:43:21,091
So, yeah, hard to count.
705
00:43:21,618 --> 00:43:23,280
What we know--
706
00:43:24,133 --> 00:43:26,618
that is we have a lot of properties--
707
00:43:27,185 --> 00:43:29,684
7,000, I guess,
708
00:43:30,208 --> 00:43:31,680
now on Wikidata.
709
00:43:32,593 --> 00:43:35,952
We know that we have a lot of identifiers
among these properties.
710
00:43:36,721 --> 00:43:42,538
And we know that almost 4,000
are properties for identifiers
711
00:43:43,146 --> 00:43:45,623
relative to bibliographical,
712
00:43:45,737 --> 00:43:49,862
like ID at the National Library of France,
713
00:43:49,862 --> 00:43:52,251
National Library of Yaddi, Yaddi, Yada,
714
00:43:52,251 --> 00:43:56,681
because we love identifiers
of national libraries on Wikidata.
715
00:43:56,681 --> 00:44:00,271
So, we have almost all libraries,
national libraries and more.
716
00:44:01,101 --> 00:44:03,796
So, we have a lot of properties.
I know that.
717
00:44:05,071 --> 00:44:06,727
And they are widely used.
718
00:44:06,834 --> 00:44:10,053
I know that, for instance,
BnF properties use--
719
00:44:10,579 --> 00:44:12,772
BnF is National Library of France--
720
00:44:12,772 --> 00:44:18,989
is used 1 million times--
OCLC, VIAF, or the big ones like that.
721
00:44:21,001 --> 00:44:24,202
A lot of uses in Wikidata.
722
00:44:25,426 --> 00:44:28,980
But it's not because we have
a lot of uses of various properties
723
00:44:28,980 --> 00:44:30,666
in Wikidata that it's complete.
724
00:44:31,266 --> 00:44:33,758
As Thibaud said, there's more
than 20 million books,
725
00:44:33,758 --> 00:44:37,099
[inaudible], which is more as entities.
726
00:44:37,837 --> 00:44:39,569
And we have only 1 million,
727
00:44:39,569 --> 00:44:43,538
so we have 19 million still to do.
728
00:44:45,177 --> 00:44:47,276
Also, what we know from the Wikidata side,
729
00:44:47,276 --> 00:44:51,918
is that we have a good--
very quite active Wikidata project,
730
00:44:51,918 --> 00:44:53,840
called WikiProject Books,
731
00:44:54,332 --> 00:44:58,127
where we have a model we kind of agree on,
732
00:44:58,181 --> 00:45:00,916
which is not always followed,
which is, again, a problem.
733
00:45:00,956 --> 00:45:02,710
What is a book? You know it.
734
00:45:03,414 --> 00:45:05,385
I only have five minutes,
so, I'll keep going.
735
00:45:06,090 --> 00:45:08,880
And then, I'm a Wikisourcean,
so, Wikisourcer.
736
00:45:09,426 --> 00:45:11,930
So, I wanted to know
the other way around
737
00:45:11,930 --> 00:45:13,496
what is from Wikisource already,
738
00:45:13,496 --> 00:45:16,406
because Wikisource is already
inside the Wikimedia project.
739
00:45:16,406 --> 00:45:19,883
A lot of bibliographical records
and information.
740
00:45:19,883 --> 00:45:23,161
So, in the 66 million items on Wikidata,
741
00:45:23,161 --> 00:45:28,850
already 1 million are linked
to Wikisource.
742
00:45:29,330 --> 00:45:31,890
[inaudible].
743
00:45:32,350 --> 00:45:36,080
So, that's very few,
but that's quite a lot.
744
00:45:37,496 --> 00:45:40,174
There are a lot of authors.
745
00:45:40,174 --> 00:45:44,670
There's some books, texts,
work, edition, whatever.
746
00:45:45,271 --> 00:45:48,425
Not always well-arranged.
747
00:45:48,869 --> 00:45:50,600
And there's a lot of internal pages,
748
00:45:50,600 --> 00:45:53,150
like categories and templates,
and things like that.
749
00:45:53,194 --> 00:45:54,984
But still, 1 million in total.
750
00:45:58,329 --> 00:46:01,767
The Wikisource community
are often small communities,
751
00:46:01,767 --> 00:46:05,010
like on the French community Wikisource,
752
00:46:05,010 --> 00:46:07,537
which is one of the biggest,
there's 50 people.
753
00:46:07,537 --> 00:46:08,787
That's the biggest we have.
754
00:46:09,047 --> 00:46:12,937
So, we love Wikidata, because,
hey, they did a lot of work for us.
755
00:46:12,942 --> 00:46:15,131
So, just take it from Wikisource.
756
00:46:15,131 --> 00:46:19,885
So, in this small community,
we love to reuse Wikidata data.
757
00:46:20,935 --> 00:46:24,076
Right now, we use a lot of a tool
which is called WEF--
758
00:46:24,358 --> 00:46:27,978
Wikidata Edit Framework-- thank you.
759
00:46:29,318 --> 00:46:33,098
And we are eager to see
how Wikidata Bridge will work.
760
00:46:33,438 --> 00:46:36,798
And we are trying to do things
with a team in Wikidata
761
00:46:37,638 --> 00:46:40,678
in the Wikimedia Deutschland team,
[inaudible].
762
00:46:41,007 --> 00:46:43,934
And there's a lot
of collaboration in the future
763
00:46:43,934 --> 00:46:46,586
that we want to do: better integrate,
764
00:46:47,636 --> 00:46:51,068
do everything in one click when you import
a first book in Wikisource,
765
00:46:51,068 --> 00:46:52,465
things like that.
766
00:46:53,364 --> 00:46:57,664
Better-- do links between
editions in Wikidata.
767
00:46:57,852 --> 00:46:59,492
That needs to be done.
768
00:47:00,041 --> 00:47:02,282
The Foundation is doing the wish list now,
769
00:47:02,282 --> 00:47:04,853
and we have a lot of requests about that.
770
00:47:05,938 --> 00:47:07,342
And yeah, that's it.
771
00:47:07,342 --> 00:47:09,116
That was just a short overview.
772
00:47:09,116 --> 00:47:15,272
So, if you have some questions,
I'll take them and be available later,
773
00:47:15,712 --> 00:47:17,112
if you want to.
774
00:47:17,723 --> 00:47:19,722
(applause)
775
00:47:25,639 --> 00:47:28,281
Come on, you love Wikisource,
you have questions!
776
00:47:33,989 --> 00:47:35,775
(woman 4) I asked you
already this in August,
777
00:47:35,775 --> 00:47:38,411
and I wonder if this has already changed.
778
00:47:38,411 --> 00:47:42,337
What is the biggest problem you have
in Wikisource right now,
779
00:47:42,337 --> 00:47:43,761
from your perspective?
780
00:47:44,167 --> 00:47:45,670
The first one, only? (chuckles)
781
00:47:48,314 --> 00:47:54,152
I think because it's a small community,
we need efficient tools that work easily,
782
00:47:54,152 --> 00:47:57,148
because we have very few people,
783
00:47:57,148 --> 00:47:59,464
so we need tools that are easy to use
784
00:47:59,464 --> 00:48:04,247
and a one-click solution
to [inaudible] a bit,
785
00:48:04,371 --> 00:48:05,607
that's a big dream.
786
00:48:05,607 --> 00:48:07,179
I think that's what's most important,
787
00:48:07,179 --> 00:48:10,485
because that's the threshold
in Wikisource, it's a small community.
788
00:48:11,204 --> 00:48:13,241
I think this is the most important.
789
00:48:14,615 --> 00:48:15,975
[inaudible]
790
00:48:16,867 --> 00:48:19,600
(man 5) I'm curious if you can speak
to your opinion,
791
00:48:19,600 --> 00:48:23,154
or the French Wikisource opinion,
or maybe you spoke to other communities
792
00:48:23,154 --> 00:48:29,834
about the notion of not including
metadata about all the world's books.
793
00:48:30,234 --> 00:48:31,635
That was mentioned in the morning.
794
00:48:31,635 --> 00:48:34,965
Maybe other Wikibases,
and other federated databases
795
00:48:34,965 --> 00:48:38,026
will have that information,
and Wikidata won't.
796
00:48:39,159 --> 00:48:41,494
How does that feel for Wikisource?
797
00:48:43,981 --> 00:48:45,502
This is my very personal opinion.
798
00:48:45,502 --> 00:48:47,386
I know that people
in the Wikisource community
799
00:48:47,386 --> 00:48:48,723
disagree with that.
800
00:48:48,723 --> 00:48:50,537
But I think we need to stay--
801
00:48:50,537 --> 00:48:53,194
an external Wikibase
is not a good solution,
802
00:48:53,194 --> 00:48:55,353
because we have Shakespeare on Wikisource,
803
00:48:55,353 --> 00:48:58,323
and we have Shakespeare on Wikipedia.
804
00:48:58,564 --> 00:49:01,295
So, we need to interlink,
and interlink is there.
805
00:49:01,295 --> 00:49:04,007
Or like, Romeo and Juliet,
we have them both.
806
00:49:04,007 --> 00:49:07,229
So, we are still pretty close
to Wikipedia.
807
00:49:07,433 --> 00:49:09,431
And the difference with WikiCites--
808
00:49:09,431 --> 00:49:12,515
with WikiCite, we have a lot of items
which are small.
809
00:49:14,372 --> 00:49:16,051
Wikisource is the other way around.
810
00:49:16,150 --> 00:49:18,281
We have few items, which are big.
811
00:49:18,281 --> 00:49:20,515
Which can be a scaling problem
and everything,
812
00:49:20,515 --> 00:49:23,615
but it's quite a small subset of data.
813
00:49:23,683 --> 00:49:27,539
So, my personal opinion
is we should stay in the Wikidata.
814
00:49:28,391 --> 00:49:32,117
Again, because we are not
very many people,
815
00:49:32,117 --> 00:49:34,287
so we need to stay,
with the tool we know,
816
00:49:34,287 --> 00:49:35,846
don't change too much the tools
817
00:49:35,846 --> 00:49:37,736
for the small community, please.
818
00:49:37,769 --> 00:49:39,282
So, that's it.
819
00:49:39,282 --> 00:49:40,910
But I know that other people disagree.
820
00:49:40,910 --> 00:49:44,579
You can talk to [Sadeep] if you want.
He will have another point of view.
821
00:49:46,119 --> 00:49:49,319
Thank you. I think, last question, maybe.
822
00:49:51,234 --> 00:49:54,446
(man 6) Sometimes, I find it difficult
to link the Wikidata item
823
00:49:54,446 --> 00:50:00,976
with a Wikisource article,
because there's a Wikisource novel--
824
00:50:01,079 --> 00:50:06,128
might be split over several pages,
and there's an index page,
825
00:50:06,128 --> 00:50:08,853
and there's perhaps a front page,
or something like that.
826
00:50:08,853 --> 00:50:12,053
Do you have that problem,
or is that a general problem, or--
827
00:50:12,092 --> 00:50:16,892
Yeah, that's one of the first ideas
on the wish list
828
00:50:16,892 --> 00:50:19,092
for the Foundation, actually.
829
00:50:19,092 --> 00:50:20,790
Yeah, because Wikipedia is on the--
830
00:50:20,790 --> 00:50:22,772
if you know the [inaudible] organization,
831
00:50:22,772 --> 00:50:26,598
Wikipedia is on the work level,
and Wikisource on the edition level.
832
00:50:26,598 --> 00:50:28,572
So, already, you have a problem there.
833
00:50:28,572 --> 00:50:30,931
And then, we have several editions
of the same work,
834
00:50:30,931 --> 00:50:34,014
and we have sub-chapters
and things inside the edition.
835
00:50:34,014 --> 00:50:41,001
So, yeah, that's a one-to-many problem
which is hard to solve by nature.
836
00:50:41,555 --> 00:50:44,839
But there's maybe a tool
that can help to solve that.
837
00:50:45,893 --> 00:50:47,469
Hopefully.
838
00:50:49,172 --> 00:50:51,395
And that's time, ladies and gentlemen.
839
00:50:51,398 --> 00:50:53,283
So, thank you very much, Nicolas.
840
00:50:53,335 --> 00:50:55,137
(applause)
841
00:50:59,010 --> 00:51:01,127
And please join me in giving
one more round of applause
842
00:51:01,127 --> 00:51:03,147
to all of our wonderful speakers.
843
00:51:03,147 --> 00:51:04,901
(applause)