0:00:06.009,0:00:09.069
(host) Hello, everyone. Thank you[br]for coming to these lightning talks.
0:00:09.069,0:00:11.529
Our first speaker, I'm going[br]to run straight into it,
0:00:11.529,0:00:13.781
is going to be Rosie[br]Stephenson-Goodknight.
0:00:13.781,0:00:15.319
Did I get that right?
0:00:15.319,0:00:19.609
Yes. And so she's going to be talking[br]about the Women Writers Project.
0:00:19.609,0:00:22.569
And we're going to--[br]yeah, is that right? Great.
0:00:22.569,0:00:24.299
And so, we're going[br]to just launch right in,
0:00:24.299,0:00:26.699
and I want to remind you,[br]if there's time for questions,
0:00:26.699,0:00:28.802
to please not speak[br]until you have the microphone.
0:00:28.802,0:00:30.329
Thank you.
0:00:31.589,0:00:34.125
(Rosie) Hi, everyone, and thanks[br]for coming to this session,
0:00:34.125,0:00:36.829
where we're going to talk[br]about Women Writers in Review,
0:00:36.829,0:00:40.329
cultures of reception associated[br]with trans-Atlantic,
0:00:40.329,0:00:43.977
English language women writers,[br]broadly construed.
0:00:44.523,0:00:48.387
Women Writers in Review is an initiative[br]of the Women Writers Project
0:00:48.387,0:00:50.535
of Northeastern University.
0:00:50.535,0:00:55.253
It moved there from Brown University,[br]approximately 15 years ago.
0:00:55.993,0:01:00.287
Women Writers in Review is a collection[br]of 18th- and 19th-century reviews,
0:01:00.287,0:01:04.281
publication notices,[br]literary histories, and other texts
0:01:04.281,0:01:09.511
corresponding to trans-Atlantic--[br]so, UK and US mostly,
0:01:09.511,0:01:12.953
though a few Canadian--[br]written works by women.
0:01:13.255,0:01:15.683
It's a project where the two universities,
0:01:15.683,0:01:18.133
Brown University[br]and Northeastern University,
0:01:18.133,0:01:22.645
started collecting the manuscripts[br]of women from this period.
0:01:23.337,0:01:27.520
And then they started collecting[br]the reviews of these works,
0:01:27.520,0:01:31.593
and then they started scoring[br]these reviews by giving them a rating.
0:01:32.321,0:01:36.144
It's designed to investigate[br]the discourse of reception and connection
0:01:36.144,0:01:39.333
with the changing trans-Atlantic[br]literary landscape
0:01:39.333,0:01:42.664
for the period 1770 to 1830.
0:01:46.143,0:01:49.103
You're going to pardon me if I speak fast,[br]because I've got five minutes
0:01:49.103,0:01:50.646
to go over this.
0:01:50.646,0:01:55.443
It includes 690 English language texts[br]responding to works
0:01:55.443,0:01:59.565
written or translated[br]by 18th- and 19th-century women writers.
0:01:59.593,0:02:04.813
There are 74 authors in the corpus,[br]using 112 different sources,
0:02:04.813,0:02:07.782
or periodicals, or magazines.
0:02:07.782,0:02:10.773
And there are 628 critical reviews.
0:02:11.867,0:02:14.671
Here's a picture that shows you[br]what we're talking about
0:02:14.671,0:02:16.573
in terms of a review.
0:02:16.573,0:02:18.819
And you can also see what kind of scores
0:02:18.819,0:02:25.403
were given by the academics[br]at Northeastern University.
0:02:25.833,0:02:28.922
Most of these are women[br]who were giving scores
0:02:28.922,0:02:34.031
based on the reviews that were done[br]mostly, probably all men,
0:02:34.031,0:02:39.799
back in this time period 1770 to 1830[br]of works written by women.
0:02:39.799,0:02:43.469
By works, we're talking about plays,[br]and novels, and poems,
0:02:43.469,0:02:46.955
essays, and other kinds of articles.
0:02:48.615,0:02:50.275
So, what are we talking about?
0:02:50.275,0:02:54.676
This required creating[br]items for authors for their works,
0:02:54.676,0:02:57.946
like I said, novels and plays and poems.
0:02:57.946,0:03:04.938
It required creating new items[br]for this period of time
0:03:05.038,0:03:08.391
where there are defunct periodicals.
0:03:08.391,0:03:12.499
It required creating items[br]for the scholarly articles.
0:03:12.578,0:03:16.900
And then the review scores of each,[br]and the review score by,
0:03:16.943,0:03:19.998
which in this case would be[br]Women Writers in Review,
0:03:19.998,0:03:23.336
and what we still need to add[br]is the described by source.
0:03:25.226,0:03:28.970
This gives you a picture[br]of the kind of spreadsheets,
0:03:28.970,0:03:31.397
Google Spreadsheets,[br]that I have been working on.
0:03:31.397,0:03:34.296
I shouldn't just say I,[br]because I've had a lot of help.
0:03:34.296,0:03:37.546
I've had a lot of people[br]who were working on this project with me.
0:03:37.546,0:03:40.413
And you can see at the top,[br]something about the authors,
0:03:40.413,0:03:41.736
about the works.
0:03:41.736,0:03:45.496
The third group is going to be[br]the periodical,
0:03:45.496,0:03:48.006
and then, how the scores started showing.
0:03:49.203,0:03:52.122
And of course, this is how they look--
0:03:52.122,0:03:57.396
the beauty of being able to present[br]the preliminary findings.
0:03:57.856,0:04:01.767
Once we have uploaded all of the data,
0:04:02.989,0:04:05.906
and I hope that that's going to be done[br]by the end of this year,
0:04:06.956,0:04:08.496
this will obviously look different.
0:04:09.916,0:04:10.931
Appendix.
0:04:10.931,0:04:15.267
So, here's what the depiction looks like
0:04:15.267,0:04:18.505
at the Northeastern University website.
0:04:19.024,0:04:22.474
I don't think it's quite as clear[br]as what we can do with Wikidata.
0:04:22.531,0:04:27.351
And so, this was probably the reason why,[br]when I started as a visiting scholar
0:04:27.351,0:04:31.751
in 2017, they asked if this is one[br]of the projects that I could work on.
0:04:31.751,0:04:36.093
They stopped their work[br]the year before, in 2016.
0:04:36.093,0:04:39.073
And I think they just don't have[br]the resources to continue.
0:04:40.251,0:04:43.415
Some parts of this presentation[br]came from another
0:04:43.415,0:04:45.812
that was published in 2016.
0:04:45.812,0:04:49.401
And last but not least, here are links
0:04:49.401,0:04:53.361
to the different parts[br]of the work that I'm doing.
0:04:54.257,0:04:55.561
Thank you very much.
0:04:55.561,0:04:56.845
Questions.
0:04:56.845,0:04:58.754
(applause)
0:05:10.397,0:05:14.665
(woman) So, when you have a work,[br]and you have the review of the work,
0:05:14.665,0:05:17.703
are you looking[br]at a particular edition of the work,
0:05:17.703,0:05:20.665
or are these all reviews[br]of first editions?
0:05:21.271,0:05:22.861
It's a good question. No.
0:05:22.861,0:05:25.601
They are not just reviews[br]of the first edition.
0:05:25.601,0:05:28.601
Some are reviews of the second[br]or third edition.
0:05:30.062,0:05:32.262
I'm going to add something[br]that maybe I should have said
0:05:32.262,0:05:34.951
before I closed[br]and went to question and answers--
0:05:34.966,0:05:36.800
what's so special about this?
0:05:37.220,0:05:40.461
What's special is nobody else[br]has done this on Wikidata.
0:05:41.454,0:05:45.580
Surely, there are other universities[br]that have their own collections,
0:05:45.580,0:05:51.447
where their scholars have reviewed[br]the reviews of someone's work
0:05:51.800,0:05:53.394
in some language.
0:05:54.491,0:05:57.389
So, hopefully,[br]once this methodology gets--
0:05:58.000,0:06:02.390
once I write this up and the project[br]is over and presented again,
0:06:02.390,0:06:05.310
that there will be other[br]universities, other libraries
0:06:05.310,0:06:07.923
that will speak up and say,[br]"We've got data sets, too,
0:06:08.248,0:06:13.020
and we're going to go ahead[br]and upload them into Wikidata ourselves,"
0:06:13.020,0:06:15.910
and then it'd be lovely [br]to start doing some comparisons.
0:06:19.572,0:06:22.060
Anyone? Jane.
0:06:22.093,0:06:23.767
(Jane) Do you actually have books?
0:06:24.293,0:06:26.889
Do you actually have the books--[br]are the books in existence,
0:06:26.889,0:06:28.860
or are you actually[br]doing metadata about books
0:06:28.860,0:06:31.400
where we don't even know[br]where the books are?
0:06:31.780,0:06:34.829
Northeastern University[br]actually has the book,
0:06:34.829,0:06:37.209
or the essay, or the poem.
0:06:39.759,0:06:45.392
And they have the critical review[br]of the book, or the essay, or the poem.
0:06:45.755,0:06:48.820
And they're working[br]on the transcription of these,
0:06:48.820,0:06:51.452
and they're not at 100% yet.
0:06:52.432,0:06:56.256
They're not at 100%, but it's like,[br]all things working on it.
0:07:00.218,0:07:02.043
Any other questions?
0:07:05.697,0:07:07.399
(host) We're going to wrap it up there.
0:07:07.399,0:07:09.063
Thanks for being such a nice audience.
0:07:09.063,0:07:11.677
(applause)
0:07:14.012,0:07:18.581
Lady bug for [inaudible].
0:08:58.271,0:08:59.372
(man) Finally got that.
0:08:59.372,0:09:02.565
What I'm going to do is I'm just going[br]to click on these to load.
0:09:02.565,0:09:06.091
Just while-- is that new tab there?
0:09:06.946,0:09:08.053
[inaudible]
0:09:08.053,0:09:10.524
The first one? Yeah, perfect.
0:09:11.024,0:09:13.503
Sorry, my German is not even rusty,
0:09:13.503,0:09:15.251
it's simply non-existent.
0:09:15.663,0:09:19.561
So, I'll just let them load,[br]because then these queries can run
0:09:19.561,0:09:22.728
while I'm sort of introducing[br]what I was talking about and doing.
0:09:22.728,0:09:24.795
So, hi, I'm Nav from Histropedia.
0:09:24.795,0:09:28.169
And basically, for the last[br]quite a few years,
0:09:28.169,0:09:29.710
we've been relatively quiet,
0:09:29.710,0:09:32.423
while we've been sort of working[br]on technology and tools
0:09:32.423,0:09:36.837
that we need to sort of develop,[br]ultimately, Histropedia version 2,
0:09:36.837,0:09:39.433
which is going to be, you know,[br]this huge enhancement
0:09:39.433,0:09:40.771
on the first version.
0:09:40.771,0:09:43.270
Well, it's kind of in progress,[br]but as we do it,
0:09:43.270,0:09:45.236
we've been experimenting[br]with these other tools,
0:09:45.236,0:09:47.387
and building the technology[br]that we're going to need.
0:09:48.132,0:09:51.781
One really crucial part for this[br]is the ability to sort of see
0:09:51.781,0:09:55.085
the whole of history[br]from the billions of years time scale,
0:09:55.085,0:09:58.602
to up to the current day,
0:09:58.602,0:10:00.638
and zooming all the way into single days.
0:10:00.638,0:10:03.433
And ultimately, in the end,[br]down to hours and minutes.
0:10:03.433,0:10:06.517
We've managed to create[br]a [inaudible] of update to our engine.
0:10:06.517,0:10:08.327
Other engines can already do this,
0:10:08.327,0:10:11.122
but unfortunately, they also can't handle[br]the large data sets.
0:10:11.122,0:10:13.269
So, we finally got this update[br]to our engine.
0:10:13.269,0:10:15.392
It allows us to zoom to billions of years.
0:10:15.392,0:10:19.533
So, recently-- the recently[br]finished update,
0:10:19.533,0:10:22.333
and it's basically, it's an update[br]to our query viewer tool,
0:10:22.333,0:10:24.482
which is like a live version[br]of Histropedia
0:10:24.482,0:10:26.832
just linked straight to Wikidata.
0:10:26.832,0:10:29.092
So, it's literally based on a query,
0:10:29.092,0:10:31.372
a live query, and we see[br]the results of it.
0:10:31.372,0:10:33.883
So, it's sort of separate[br]to our main tool.
0:10:33.883,0:10:37.502
So, I'm going to flick to the first one,[br]which is my first experiment.
0:10:37.502,0:10:39.716
And you'll forgive me, the queries--
0:10:39.716,0:10:42.181
the code was kind of finished[br]not so long ago,
0:10:42.181,0:10:44.736
and the queries, I've been trying[br]to find out what can I find
0:10:44.736,0:10:47.692
and what's interesting[br]to look at, what's missing.
0:10:47.692,0:10:52.154
So, I started off[br]with a kind of, sort of, well--
0:10:52.154,0:10:54.241
So, that's not the right--[br]that's not Life on Earth.
0:10:54.241,0:10:55.699
Is this Life on Earth?
0:10:56.123,0:10:57.467
That will do, anyway.
0:10:57.467,0:11:01.985
So, I started off just trying to look[br]at what sort of things
0:11:01.985,0:11:04.657
are actually in Wikidata.
0:11:04.657,0:11:07.407
And this particular one--[br]sorry, it's in reverse.
0:11:07.407,0:11:09.829
So, this is the first one[br]I wanted to show you.
0:11:09.829,0:11:12.485
So, this is a kind of[br]a life on Earth query
0:11:12.485,0:11:14.457
that I wanted to develop.
0:11:14.457,0:11:18.410
And basically, what it is[br]is all the taxons in Wikidata
0:11:18.410,0:11:20.157
that have a date.
0:11:20.157,0:11:23.726
And as you can probably see[br]from the panel, there is not many of them.
0:11:23.726,0:11:25.784
But we do have the different taxon ranks.
0:11:25.784,0:11:27.596
So, you know, is it a species, a class--
0:11:27.596,0:11:29.725
for a biologist,[br]this makes a lot of sense.
0:11:29.725,0:11:32.446
But if I was just to close that a bit,
0:11:32.596,0:11:35.453
we can see, we are going back[br]to the earliest forms of life here.
0:11:35.453,0:11:37.236
3.5 billion years ago.
0:11:37.236,0:11:42.707
And as we zoom in here, we start to see[br]the more modern forms of life,
0:11:42.746,0:11:47.232
and we see some really[br]interesting things developing,
0:11:47.232,0:11:50.829
but we're still lacking a lot of data[br]in terms of this kind of time range.
0:11:52.250,0:11:55.286
So, my next thought was,[br]"Okay, well, why aren't--"
0:11:55.592,0:11:57.088
"I want to see a Tyrannosaurus Rex."
0:11:57.088,0:11:59.838
That's what I really wanted to see[br]on my query, and it wasn't there.
0:11:59.838,0:12:02.138
So, had a little dig in,[br]and I found out why.
0:12:02.234,0:12:05.284
It's because they're much more[br]being stored
0:12:05.284,0:12:08.696
in terms of the temporal range[br]or time period that they relate to.
0:12:09.065,0:12:11.412
So, on comes the next query,
0:12:11.412,0:12:13.144
where I actually sort of--
0:12:13.664,0:12:17.641
basically, this query[br]is looking for any item
0:12:17.641,0:12:22.284
that has a temporal range start,[br]and/or a temporal range end.
0:12:22.665,0:12:25.965
Which is basically in the form--[br]in life forms, it kind of relates
0:12:25.965,0:12:28.644
to when they emerged[br]and when they became extinct.
0:12:28.644,0:12:31.044
So, these are the periods[br]on the side here.
0:12:31.585,0:12:33.190
If I just close that a bit--
0:12:33.190,0:12:37.364
you can see that we have[br]quite a lot of interesting stuff.
0:12:37.364,0:12:39.834
And there's the Tyrannosaurus[br]that I was looking for.
0:12:39.834,0:12:43.394
So, I finally got that,[br]and I was like, "Yes! I've done it!"
0:12:43.394,0:12:46.084
I've got that Triceratops[br]in there for bonus.
0:12:46.084,0:12:48.984
But of course, still loads missing.
0:12:48.984,0:12:50.665
And I'd love to see lots more here.
0:12:50.665,0:12:52.590
But at least, it gives you the idea.
0:12:52.590,0:12:55.794
The nice thing is, here as well,[br]if I star some of these,
0:12:55.794,0:12:58.374
you can see that[br]the time range is shown.
0:12:58.374,0:13:01.027
So, you can start to do[br]what I really wanted to do, is say,
0:13:01.027,0:13:04.004
"Okay, when did this one end,[br]and when did the next one begin?
0:13:04.004,0:13:06.085
When did things start going extinct?"
0:13:06.085,0:13:09.832
So, I was pretty excited, but, still,[br]really hoping for a lot more.
0:13:09.832,0:13:11.619
So, there's a lot of editing to be done
0:13:11.619,0:13:15.098
in terms of these large geological[br]and cosmic time scales.
0:13:15.909,0:13:19.273
You can see on the color code,[br]I can also do extinction period.
0:13:19.273,0:13:23.489
So, I say, I want to find out stuff[br]that went extinct in the late Cretaceous.
0:13:23.489,0:13:25.768
And I now know that two things did that.
0:13:25.768,0:13:27.717
There's obviously quite a few more.
0:13:27.717,0:13:30.483
And I put the taxon rank[br]in there, as well,
0:13:30.483,0:13:31.986
just so that we can also see,
0:13:31.986,0:13:34.588
"Okay, which, what[br]is its species, genus, et cetera."
0:13:35.479,0:13:37.143
So, pretty exciting.
0:13:37.143,0:13:41.192
I was quite happy, but it's unfolding,[br]what needs to be done a lot.
0:13:42.126,0:13:45.447
So I went to the next one, which was--
0:13:45.447,0:13:48.045
I was thinking, "Well, I can't find[br]all the data I'm looking for.
0:13:48.045,0:13:49.347
Let's go a bit more general,
0:13:49.347,0:13:53.833
and just look for all of a certain kind[br]of dates in Wikidata that I can find
0:13:53.833,0:13:57.240
that are over 10,000 years old, basically.
0:13:58.219,0:14:00.703
And what type of thing are they?"
0:14:00.762,0:14:04.298
So, this color code is relatively okay,[br]but it might be a bit misleading,
0:14:04.298,0:14:06.264
because some things are multiple types.
0:14:06.264,0:14:08.318
So, therefore,[br]it's a bit random, at times.
0:14:08.318,0:14:11.468
But, you get some really[br]fascinating stuff in here.
0:14:11.468,0:14:14.255
I've got for a start--[br]I've got all of the millennia
0:14:14.255,0:14:18.238
that we have in Wikidata,[br]which is, you know, there you go.
0:14:18.238,0:14:21.558
Read about everything that happened[br]in all these different millennia.
0:14:21.558,0:14:23.629
No pictures for any[br]of these, unfortunately.
0:14:23.629,0:14:26.670
So, there's nothing to really say[br]what happened in them.
0:14:26.670,0:14:29.203
Taxon, which we were just looking at,[br]which kind of led me on
0:14:29.203,0:14:31.124
to the other queries.
0:14:31.124,0:14:34.079
And of course, that sort of[br]like all of them in one group.
0:14:34.079,0:14:36.875
Interesting stuff.[br]Archaeological cultures.
0:14:36.875,0:14:40.121
And this is like, okay,[br]this is more like up my street.
0:14:40.121,0:14:42.670
This is the sort of things[br]I want to learn about.
0:14:42.670,0:14:45.234
Again, pictures would be nice.
0:14:45.493,0:14:48.781
But it's really showing you[br]something interesting.
0:14:48.781,0:14:50.361
And it's just worth exploring here.
0:14:50.361,0:14:52.534
And of course, there's some[br]that really make me excited
0:14:52.534,0:14:54.048
for what we could be doing.
0:14:54.048,0:14:57.288
For example, there was[br]something here which was--
0:14:58.028,0:15:00.888
I mean, system, actually,[br]was quite an interesting one.
0:15:01.794,0:15:04.237
And sorry, that's not actually[br]the one I was thinking about.
0:15:04.237,0:15:05.958
In fact, that means nothing to me at all.
0:15:05.958,0:15:07.613
Someone might know what that means.
0:15:08.057,0:15:10.813
Art movements,[br]archaeological sites, activities.
0:15:10.813,0:15:12.478
There was only two of these,
0:15:12.478,0:15:15.788
but I really like the idea, because--[br]and they're both the same.
0:15:15.788,0:15:17.658
They're both hunting.
0:15:17.730,0:15:19.390
And of course, there's two of them.
0:15:19.390,0:15:22.360
And the reason is, is because[br]there's a little qualifier on there.
0:15:22.360,0:15:25.143
If we were to just[br]look through, we can see--
0:15:25.143,0:15:27.735
we can see somewhere down here,[br]will be the start time.
0:15:27.735,0:15:30.690
And the qualifier is talking about[br]when Homo erectus did it,
0:15:30.690,0:15:32.735
and when Homo sapiens did it.
0:15:32.735,0:15:35.513
So that should be[br]in brackets on the query,
0:15:35.513,0:15:39.002
a little extension to do to show you[br]what the two different versions mean.
0:15:39.002,0:15:42.390
But I would love to see[br]all of human skills in here.
0:15:42.390,0:15:44.708
When did we first do farming,[br]when did we first this--
0:15:44.708,0:15:46.010
when did fire come about?
0:15:46.010,0:15:48.270
All of these things,[br]when did we first extract iron?
0:15:48.270,0:15:50.355
When did we first--[br]all of these wonderful things
0:15:50.355,0:15:53.607
that developed[br]to modern world that we live in.
0:15:53.607,0:15:56.873
So, really exciting signs[br]of what could be there,
0:15:56.873,0:15:58.112
if it all got populated.
0:15:58.112,0:16:00.210
So, you know, this is what[br]we really need to work on,
0:16:00.210,0:16:02.333
is some of this historical info.
0:16:03.243,0:16:05.060
Last one, I just wanted to just show you,
0:16:05.060,0:16:07.283
which was just an extra[br]bonus one I threw in,
0:16:07.283,0:16:10.875
just to look at the time periods[br]that we actually have,
0:16:10.875,0:16:13.921
the historical ages[br]that we have in Wikidata.
0:16:13.921,0:16:17.524
And so, this is actually just all[br]sub-classes of unit of time.
0:16:17.524,0:16:22.396
And then, this is the actual[br]instance that it was.
0:16:22.396,0:16:23.775
And it's just really interesting.
0:16:23.775,0:16:25.849
This is more the kind of thing--
0:16:26.979,0:16:29.541
In Histropedia Mark II,[br]these are the kind of things
0:16:29.541,0:16:31.944
that will actually will be displayed[br]more under the timeline
0:16:31.944,0:16:33.984
as a sort of a range or period.
0:16:33.993,0:16:36.436
And so, we are particularly interested[br]in these periods
0:16:36.436,0:16:37.976
being really tight and nice,
0:16:37.976,0:16:40.718
because it helps you to, then,[br]say what happened when,
0:16:40.718,0:16:43.983
and you can sound really clever[br]when you talk about when things happened,
0:16:43.983,0:16:47.263
in the Neolithic or the upper[br]Paleolithic, or whatever.
0:16:47.263,0:16:49.121
I'm still pretty clueless on most of it,
0:16:49.121,0:16:51.918
because I'm just kind of just waiting[br]for the data to be up to scratch.
0:16:51.918,0:16:55.163
Great. I think I can actually[br]round it up there.
0:16:55.163,0:16:57.145
Loads more exciting queries to come.
0:16:57.145,0:17:00.420
A lot more features and cool stuff,[br]actually, just around the corner for us,
0:17:00.420,0:17:02.758
because we've just finished[br]a lot of cool things.
0:17:02.758,0:17:05.471
But there's a little bit of time[br]to pull it all together.
0:17:05.471,0:17:07.373
So, look out for more.
0:17:07.373,0:17:09.760
If there's any questions,[br]I think I've got one minute.
0:17:09.760,0:17:11.458
So, it would have to be one.
0:17:11.510,0:17:13.253
(host) Yes, Nav.[br]I forgot to introduce you.
0:17:13.253,0:17:16.933
I'm sorry. That's Nav, as he said,[br]Histropedia, Evans. Thank you very much.
0:17:16.933,0:17:17.986
Thank you. Cheers. Yeah.
0:17:17.986,0:17:19.450
(host) Very fast questions.
0:17:19.450,0:17:21.815
Anyone with a very fast question[br][inaudible].
0:17:24.654,0:17:29.230
(woman 2) Very quickly, how can[br]I do my own, if I want languages,
0:17:29.230,0:17:30.818
when do we start, for instance.
0:17:30.818,0:17:32.031
Absolutely. Good question.
0:17:32.031,0:17:34.320
So just click on the--[br]oh, I've shared this.
0:17:34.320,0:17:36.853
It's called cosmic timelines on the URL.
0:17:36.853,0:17:40.911
Should be cosmic and geological,[br]but then it's not a short URL anymore.
0:17:40.911,0:17:43.711
So, you click on this icon[br]in the top corner there,
0:17:43.711,0:17:47.431
and then, you get to the query page,[br]which is like the home page of this tool.
0:17:47.431,0:17:49.311
This is where the query is pasted in.
0:17:49.311,0:17:51.491
So, at the moment,[br]I've got the language there.
0:17:51.491,0:17:53.483
If I want to change it to something else,
0:17:53.483,0:17:56.062
Arabic, or French, or whatever--
0:17:56.062,0:17:58.271
and here are the-- this is the area
0:17:58.271,0:18:03.092
where you sort of enter in exactly[br]which variables in your query
0:18:03.092,0:18:04.600
you would like to do each thing.
0:18:04.600,0:18:06.781
If you put nothing in,[br]it will try and figure it out.
0:18:06.781,0:18:09.971
But if you want advanced stuff--[br]and really important, is the precision,
0:18:09.971,0:18:13.033
because that's not available[br]on the query service timeline.
0:18:13.033,0:18:14.123
So, you get everything--
0:18:14.123,0:18:16.303
is the first of January[br]10 billion years ago,
0:18:16.303,0:18:18.363
you know, which is not[br]what we want to see.
0:18:18.363,0:18:20.603
And the rank, which is quite interesting.
0:18:20.603,0:18:24.173
My timelines are all based[br]on a very simple rank of site link count,
0:18:24.173,0:18:27.058
how many different articles there are,[br]or something else.
0:18:27.058,0:18:29.432
But that's how you go[br]and mess around with it with yourself,
0:18:29.432,0:18:32.034
and you put your color codes[br]and your filters in down here.
0:18:32.034,0:18:34.098
Comma separate them,[br]if you would like more,
0:18:34.098,0:18:36.007
and they come up as options[br]in the final tool.
0:18:36.007,0:18:37.836
And I think that[br]pretty much is it, isn't it.
0:18:37.836,0:18:39.863
So, any other questions,[br]do find me afterwards.
0:18:39.863,0:18:41.655
Always happy to get cornered[br]for this stuff.
0:18:41.655,0:18:42.954
I love talking about it.
0:18:42.954,0:18:44.989
Okay. So, thank you very much. Cheers.
0:18:44.989,0:18:46.948
(applause)
0:19:28.344,0:19:30.220
(mumbles)
0:19:30.265,0:19:32.115
So, where is the first one?
0:19:33.854,0:19:35.397
This one, no.
0:19:45.636,0:19:47.132
This? Sorry.
0:19:48.270,0:19:50.090
Is it full screen?
0:19:50.217,0:19:52.129
Yep. Full screen.
0:19:54.747,0:19:56.289
Well, good work.
0:19:58.388,0:19:59.434
[Strike.]
0:19:59.497,0:20:02.312
Yeah, so, okay. Thank you.
0:20:04.752,0:20:07.062
So, hi, I'm Thibaud Senalada.
0:20:07.062,0:20:08.952
As [inaudible] introduced me.
0:20:09.552,0:20:14.212
I'm a software engineer[br]at the French National Library.
0:20:14.992,0:20:18.349
And I'm here today[br]to talk to you about NOEMI,
0:20:18.979,0:20:23.682
which is a software, a proof of concept,
0:20:23.682,0:20:26.501
and a [inaudible] software
0:20:26.635,0:20:29.961
to the French Library to cataloging.
0:20:30.787,0:20:32.870
Sorry. [inaudible].
0:20:32.870,0:20:35.359
Sorry for my English. It's a bit of fuzzy.
0:20:36.971,0:20:39.321
And so, what's NOEMI?
0:20:39.321,0:20:41.589
So, NOEMI stands for:
0:20:41.589,0:20:44.591
Nouer les oeuvres, expressions,[br]Manifestations et Items.
0:20:44.591,0:20:46.533
Which, in English, is:
0:20:46.533,0:20:49.891
to link work, expression,[br]manifestation, and items.
0:20:51.086,0:20:58.057
It's based on the FRBR,
0:20:58.057,0:21:00.633
and [inaudible].
0:21:00.881,0:21:03.105
Yeah. Anyway.
0:21:03.631,0:21:04.839
So, yeah.
0:21:05.244,0:21:09.540
So, this software,[br]we use to produce metadata.
0:21:10.841,0:21:12.201
It will be used
0:21:12.201,0:21:17.831
by 600 people on a daily basis.
0:21:18.911,0:21:24.271
And as I say in the title,[br]it will be based on Wikibase.
0:21:25.415,0:21:31.871
So, there is also a format manager.
0:21:32.388,0:21:39.138
So, people using this software[br]will use like a code editor,
0:21:39.254,0:21:41.817
but for MARC format.
0:21:41.968,0:21:45.178
So, it's [inaudible], things like that.
0:21:46.814,0:21:49.868
A data processing tool, like I said.
0:21:49.959,0:21:53.040
And also, authorization management,
0:21:54.327,0:21:56.378
because they will need a--
0:21:57.337,0:22:01.417
if there is some data,[br]where it can be modified.
0:22:05.877,0:22:07.840
So, the PoC context.
0:22:08.728,0:22:12.738
So, this software will be replacing[br]an old software,
0:22:12.855,0:22:15.688
called ADCAT02.
0:22:17.111,0:22:20.964
It is part of the bibliographic[br]transition.
0:22:20.984,0:22:24.554
So, I say the [inaudible].
0:22:25.359,0:22:29.390
[inaudible]. [inaudible] in English?
0:22:30.254,0:22:31.662
Format.
0:22:32.717,0:22:35.734
And it will be the [inaudible] of the--
0:22:39.979,0:22:41.090
Sorry.
0:22:42.349,0:22:46.560
It will be [inaudible][br]all the [inaudible]
0:22:46.560,0:22:49.689
of the BnF with data.
0:22:51.731,0:22:54.124
And so, doing this work,
0:22:54.124,0:22:59.693
we accessed Wikibase to see[br]if it fits our needs.
0:23:01.244,0:23:03.383
And [inaudible] pretty good.
0:23:04.485,0:23:06.930
So, why Wikibase?
0:23:06.930,0:23:08.821
Because of the flexibility of the format.
0:23:08.835,0:23:11.646
We arrive--
0:23:11.850,0:23:16.388
to inject MARC, INTERMARC for BnF--
0:23:16.960,0:23:18.350
in the database.
0:23:18.399,0:23:22.803
And use it to-- use this link management
0:23:22.803,0:23:25.529
between entities using Blazegraph,
0:23:25.529,0:23:27.776
so, as Wikibase does.
0:23:29.155,0:23:32.700
We also choose Wikibase,[br]because it was already--
0:23:35.183,0:23:38.900
it handles history and user account.
0:23:39.941,0:23:42.414
So, it's easiest for us.
0:23:43.106,0:23:48.270
And it also has a good--[br]it's pretty easy to create bots
0:23:48.270,0:23:51.090
to watch and curate data
0:23:51.840,0:23:53.430
and also to make statistics.
0:23:54.820,0:23:57.170
It's free and open, and sustainable.
0:23:57.908,0:23:59.084
Yeah, so.
0:23:59.610,0:24:02.519
I'm sorry if you don't[br]understand what I say,
0:24:02.519,0:24:04.839
because I know my English[br]is not that good.
0:24:07.720,0:24:12.139
But during this PoC,[br]we encountered some trouble.
0:24:12.802,0:24:13.938
Okay.
0:24:14.790,0:24:21.117
First of all, as a search engine,[br]I think we have to create
0:24:21.117,0:24:24.150
another--
0:24:24.185,0:24:28.988
not another, a supplementary[br]search engine to use it with,
0:24:29.433,0:24:31.120
to fit our needs.
0:24:31.688,0:24:37.155
Because we need some search
0:24:37.155,0:24:42.366
like faceted search and filters.
0:24:43.755,0:24:47.525
Also we have the [inaudible],
0:24:47.525,0:24:50.407
of using postgreSQL database.
0:24:50.407,0:24:54.885
And for the moment,[br]I think Wikibase [inaudible].
0:24:56.436,0:25:01.266
And when we try to use postgreSQL,[br]it was a bit difficult,
0:25:01.266,0:25:04.394
and will cause some issues.
0:25:05.662,0:25:08.825
And we have also some fear[br]about performance,
0:25:08.825,0:25:15.238
because the catalog is about[br]20 million entities,
0:25:16.366,0:25:19.146
20 million bibliographic entities.
0:25:19.146,0:25:22.851
That can be more[br]than 20 million entities, actually.
0:25:23.276,0:25:27.771
And we don't know the time[br]that we'll have to inject them
0:25:27.809,0:25:30.765
in the Wikibase, and how to do it.
0:25:32.198,0:25:34.267
So, [inaudible],
0:25:34.324,0:25:39.616
but the real software development[br]has already started.
0:25:43.242,0:25:46.175
We start by creating[br]an interface with Wikibase.
0:25:46.261,0:25:47.711
We're using Java.
0:25:48.091,0:25:50.093
Like PyWikibase.
0:25:51.691,0:25:54.888
- (man) Pywikibot.[br]- Pywikibot. Yeah, thank you.
0:25:56.027,0:25:57.723
The same way, but in Java.
0:25:59.309,0:26:02.831
We also inject already the format[br]into the Wikibase.
0:26:03.540,0:26:09.093
And we do something[br]like the INTERMARC editor,
0:26:09.458,0:26:12.134
[inaudible], et cetera.
0:26:13.672,0:26:14.926
Thank you.
0:26:15.333,0:26:17.135
(applause)
0:26:23.527,0:26:24.749
Yeah.
0:26:27.748,0:26:29.813
(man 2) Faceted search[br]will be a nice feature
0:26:29.813,0:26:31.885
in the Wikidata UI itself.
0:26:31.924,0:26:34.062
So, have you talked[br]to any of the developers,
0:26:34.062,0:26:35.675
or is that something[br]that could be done?
0:26:35.711,0:26:37.108
Sorry, I don't understand.
0:26:37.108,0:26:39.041
(man 2) The faceted search idea.
0:26:39.911,0:26:41.982
It would be nice to be able[br]to search only humans,
0:26:41.982,0:26:44.221
or search only works, or something, right?
0:26:44.321,0:26:47.991
Yeah. I'm sorry, I don't-- I don't--
0:26:48.131,0:26:50.436
(man 2) Yeah, I mean, so,[br]it would be nice if we had that
0:26:50.436,0:26:52.265
in Wikidata itself in the UI.
0:26:52.822,0:26:53.954
Yeah, yeah, yeah.
0:26:54.088,0:26:56.077
[inaudible]
0:26:56.077,0:26:57.911
Yeah, okay, thank you.
0:26:57.911,0:27:00.026
I'm sorry. (laughs)
0:27:01.186,0:27:03.902
Yeah, yeah. But I think we will--
0:27:04.506,0:27:07.266
I don't know if we want[br]to do it inside Wikibase,
0:27:07.266,0:27:10.746
or in our next systems.
0:27:10.785,0:27:15.186
For the moment,[br]we don't really solve that.
0:27:15.965,0:27:17.885
For the moment, I think.
0:27:17.885,0:27:19.285
Sorry.
0:27:27.645,0:27:30.644
(man 3) I suppose on the topic[br]of the faceted search,
0:27:32.535,0:27:35.068
Wikidata, SPARQL Query, Wikibase--
0:27:35.068,0:27:38.965
SPARQL Query is I think,[br]functionally equivalent
0:27:38.965,0:27:41.405
to a facetable search.
0:27:42.105,0:27:44.234
So, it's mostly an interface issue, right?
0:27:44.284,0:27:47.791
I mean, you could build an interface[br]that starts with a query,
0:27:47.791,0:27:51.111
and then, gives you[br]possible facets to filter by.
0:27:51.370,0:27:52.660
And when you click one of them,
0:27:52.660,0:27:55.217
it adds a condition[br]to the SPARQL Query, right?
0:27:55.664,0:27:58.183
Yeah, but I think the SPARQL--
0:27:59.157,0:28:04.310
they don't go as detailed[br]as we want, as we have--
0:28:05.632,0:28:09.631
When we inject the format,[br]we use a statement for--
0:28:10.525,0:28:13.124
the format is like XML.
0:28:13.223,0:28:15.842
So, it's a zone, subzone, and value.
0:28:16.413,0:28:20.292
And in the [inaudible] statement,[br]we add the subzone,
0:28:20.892,0:28:22.902
because the zone was already there.
0:28:23.002,0:28:28.565
And we want to query[br]some qualifier on this.
0:28:28.659,0:28:35.206
And I don't know if the SPARQL[br]goes through that-- I'm sorry--
0:28:36.145,0:28:38.277
in a fast way.
0:28:40.025,0:28:46.285
I think we need some index[br]for us to [inaudible].
0:28:46.925,0:28:48.145
Yeah.
0:28:48.145,0:28:50.250
(man 3) SPARQL doesn't do a query--
0:28:52.321,0:28:55.703
To do proper string searches[br]in SPARQL is very hard.
0:28:55.703,0:28:57.610
You have to have filters, which are slow,
0:28:57.610,0:28:59.815
and it really doesn't work that well.
0:28:59.815,0:29:02.845
So, it's a different[br]search problem, really.
0:29:06.871,0:29:09.270
More question? If anyone has one?
0:29:12.215,0:29:13.999
- Great. Thank you.[br]- Thank you.
0:29:14.044,0:29:15.895
(applause)
0:29:37.766,0:29:41.960
(host) Nielsen speaking about[br]the tool Ordia. Thank you.
0:30:05.084,0:30:06.460
So, I'm Finn Årup Nielsen,
0:30:06.460,0:30:09.006
and a couple of years ago,[br]I started Scholia
0:30:09.006,0:30:14.611
that displays data from Wikidata[br]via a SPARQL Query
0:30:14.611,0:30:16.359
to the Wikidata Query Service
0:30:16.359,0:30:18.959
so we can generate, for example,[br]a list of publications
0:30:18.959,0:30:20.380
for a specific author.
0:30:20.866,0:30:26.941
Now, last year, Wikidata[br]introduced lexicographic data.
0:30:29.332,0:30:32.655
And I [inaudible] the idea of Scholia
0:30:32.655,0:30:39.279
that is using Wikidata[br]and the Wikidata Query Service
0:30:39.445,0:30:42.036
to generate overviews[br]of lexicographic data.
0:30:42.585,0:30:46.125
So, Ordia is the example of this one here.
0:30:46.197,0:30:51.998
So, it generates-- it's a web application[br]run from the Toolforge service,
0:30:51.998,0:30:57.198
and for example, it will dynamically[br]generate a page such as--
0:30:57.234,0:31:01.768
This one here is statistics over[br]what there is of lexicographic data
0:31:01.768,0:31:03.841
in Wikidata.
0:31:03.992,0:31:07.404
For example, the number of lexemes,[br]is currently over 200,000.
0:31:08.664,0:31:10.483
So, there's a range of things[br]you can do here.
0:31:10.483,0:31:12.916
You can, for example,[br]look in the aspects of that.
0:31:12.916,0:31:15.560
The menu, there's quite a lot[br]of things here.
0:31:15.560,0:31:18.485
And so, I will search[br]on a specific Danish lexemes.
0:31:19.503,0:31:22.835
"Rød"-- which is "red" in Danish.
0:31:23.376,0:31:27.466
So, you basically get,[br]for the specific lexeme,
0:31:28.286,0:31:30.618
the same type of information[br]that you could see
0:31:30.618,0:31:33.751
in the ordinary part of Wikidata, here.
0:31:34.451,0:31:38.256
Annotations about the lexeme,[br]annotation about the forms,
0:31:39.359,0:31:40.872
single or plural forms.
0:31:41.548,0:31:43.501
Annotation about the sentence.
0:31:44.683,0:31:47.678
But what you can't see[br]in ordinary Wikidata
0:31:47.678,0:31:52.150
is sort of aggregating across lexemes.
0:31:52.246,0:31:54.207
And this is, for example, down here--
0:31:54.207,0:31:55.902
down here with the compound.
0:31:55.902,0:31:57.764
So, in Danish, like in German,
0:31:57.764,0:31:59.950
words can be compounded.
0:31:59.950,0:32:03.478
For example, for "red",[br]we have rødkælk
0:32:03.478,0:32:05.830
which is compounded by two words.
0:32:06.721,0:32:10.085
And we've got, on the second one here,[br]rødvin-- red wine.
0:32:11.060,0:32:15.691
This list here is constructed[br]by a SPARQL Query to the Wikidata Service.
0:32:16.751,0:32:20.406
And also, further down here,[br]we've got a lot of Danish words here.
0:32:20.970,0:32:26.122
Further down here, we should have[br]a graph of the words
0:32:27.426,0:32:29.164
which are compounded from rød.
0:32:29.658,0:32:31.980
We have [rød]-- red here in the middle.
0:32:31.980,0:32:34.372
And for example, around--[br]somewhere around here,
0:32:34.372,0:32:36.895
which should have,[br]for example, "red cabbage,"
0:32:36.936,0:32:40.343
"red cabbage salad,"[br]"red cabbage soup," and so on.
0:32:40.434,0:32:43.055
So you can browse around,[br]in this one here, and see it.
0:32:44.204,0:32:51.188
We can go a bit back here,[br]and then look on the main sense
0:32:51.388,0:32:55.030
of the word rød-- red in Danish.
0:32:55.550,0:33:01.610
So, Ordia automatically generates[br]information about hyponyms.
0:33:02.570,0:33:04.400
Subconcepts, for example,
0:33:04.400,0:33:07.400
light red, dark red,[br]pink, purple, and so on,
0:33:07.525,0:33:14.272
are in the-- when we make[br]a Wikidata Query service, SPARQL Query.
0:33:14.576,0:33:20.570
Then we go around in the Wikidata graph,
0:33:20.626,0:33:22.266
and get this information here.
0:33:22.266,0:33:24.786
And we can also get translation[br]automatically,
0:33:24.786,0:33:28.316
even though it's not necessarily stated[br]within the Wikidata lexemes items.
0:33:28.316,0:33:32.679
For example, here, we have translated[br]rød to "red" in English,
0:33:32.679,0:33:36.089
and röd in Swedish, and so on.
0:33:36.107,0:33:38.191
There's not that very many there.
0:33:38.747,0:33:40.262
There's a range of other things here.
0:33:40.262,0:33:43.487
Let me show you,[br]for example, this one here--
0:33:44.387,0:33:51.308
this is veninde- now I go[br]over to this one here.
0:33:54.308,0:33:57.328
-inde, which is a feminine suffix.
0:33:58.058,0:34:00.498
So, this is auto-generated there,
0:34:00.498,0:34:02.641
it's a combination of "instance of"--
0:34:03.268,0:34:07.171
lexemes that are "instance of"[br]feminine suffixes.
0:34:08.142,0:34:11.519
And for example, for German,[br]we have [inaudible].
0:34:11.519,0:34:15.373
So, -in would be[br]a feminine suffix in German.
0:34:15.704,0:34:21.291
And I put in sort of the five Danish[br]feminine suffixes
0:34:22.571,0:34:24.206
of Danish.
0:34:25.480,0:34:29.106
Another facility is, for example,[br]if you have a text,
0:34:29.106,0:34:34.021
you can copy and paste it[br]into this Text to lexemes here.
0:34:34.571,0:34:35.911
Let me--
0:34:37.482,0:34:41.218
"a car crashed into...
0:34:41.864,0:34:44.141
a green house."
0:34:46.485,0:34:48.701
Let me change that to "English".
0:34:49.006,0:34:50.029
Press Submit.
0:34:50.029,0:34:53.355
Now, Ordia will then extract[br]each of the word here,
0:34:53.355,0:34:54.733
in this sentence here,
0:34:54.733,0:34:58.217
and try to see whether they[br]are entered in the specific form,
0:34:58.217,0:35:00.778
a lexeme, are entered in Wikidata.
0:35:00.778,0:35:04.228
And these simple words here[br]are entered in Wikidata.
0:35:04.228,0:35:09.190
But if we, for example, change it to--[br]there's nothing called "vancar"
0:35:09.190,0:35:13.998
but just let us do that here.
0:35:14.535,0:35:19.532
And you got down here--[br]it's as a blue link
0:35:20.335,0:35:23.295
that you can create a new[br]Wikidata lexeme item.
0:35:24.556,0:35:29.097
But the range of other things to explore
0:35:29.716,0:35:31.496
in this web application.
0:35:31.496,0:35:35.596
And if there's any suggestions,[br]or comments, or notes, or something,
0:35:35.596,0:35:39.337
you can contact me, or put in[br]an issue on GitHub.
0:35:39.337,0:35:44.856
So, this particular application[br]is developed on GitHub,
0:35:44.856,0:35:50.526
and I'm open for new ideas[br]and ways to represent information there.
0:35:51.306,0:35:52.701
Okay, thank you.
0:35:52.701,0:35:54.661
(applause)
0:35:59.328,0:36:00.906
Questions?
0:36:03.262,0:36:04.524
(woman 3) I love your tool.
0:36:04.524,0:36:09.752
Can you show the languages,[br]that which is awesome for me, I think,
0:36:09.752,0:36:11.731
to show other languages.
0:36:12.183,0:36:14.537
So, this is a bit of statistics[br]over the languages,
0:36:14.537,0:36:17.046
and the Russians[br]have been scraping Wictionary,
0:36:17.046,0:36:20.327
and that's why they have now[br]100,000 lexemes.
0:36:24.387,0:36:28.088
There's also a lot of work on Basque here.
0:36:29.566,0:36:32.241
I think there's an organization[br]putting that information in here.
0:36:32.241,0:36:34.932
And you can also see a graph of these--
0:36:34.932,0:36:37.662
this is Number of forms as functions[br]of number of lexemes.
0:36:38.798,0:36:41.279
And all the way up here--
0:36:41.279,0:36:45.255
here, this is Russian,[br]down here, Basque, I think.
0:36:45.476,0:36:47.997
And English, perhaps, down here.
0:36:48.953,0:36:50.692
And also in the Number of senses,
0:36:52.473,0:36:58.360
I think Basque, English, and Russian,
0:37:00.184,0:37:02.048
Hebrew, and so on.
0:37:02.048,0:37:03.343
Yeah.
0:37:11.045,0:37:12.950
(man 4) That looks[br]like an incredible tool.
0:37:12.950,0:37:15.097
But I was just wondering,[br]is it all fully live?
0:37:15.097,0:37:18.344
Is it all based on SPARQL Queries[br]and live or are there some things--
0:37:18.344,0:37:20.458
- Yes. I believe, yes.[br]- Fantastic.
0:37:20.511,0:37:24.961
But as they get more data into Wikidata,
0:37:24.961,0:37:26.100
there's a bit of an issue.
0:37:26.100,0:37:27.328
For example, for Russian here.
0:37:27.328,0:37:31.966
I started out this a year ago[br]when there's not that very many lexemes,
0:37:32.061,0:37:35.503
and so there was no problems[br]with the time-outs.
0:37:35.503,0:37:38.367
But representing it here--
0:37:38.367,0:37:42.268
but if I press Russian,[br]I think there might be some issues.
0:37:42.268,0:37:44.284
There's a count that works here,
0:37:44.284,0:37:46.101
for example, longest words or phrases.
0:37:46.101,0:37:49.252
But I think the lexemes[br]are sort of loading in.
0:37:49.252,0:37:52.727
I think I'll need to fix that[br]as Wikidata grows here.
0:37:53.258,0:37:55.927
As you see, there's a lot[br]of Russian nouns, apparently.
0:37:56.699,0:37:58.451
And I don't know whether the--
0:37:59.351,0:38:01.519
apparently, that's what[br]they're working on.
0:38:01.573,0:38:03.960
There seems also to be[br]a bit of time-out there.
0:38:06.705,0:38:08.033
[inaudible], oh, yes.
0:38:08.115,0:38:09.984
The first one there.
0:38:10.832,0:38:16.110
But apparently, the longest words[br]and phrases is a bit too expansive.
0:38:17.931,0:38:20.334
But apparently, it can be loaded there,[br]and it's probably--
0:38:21.318,0:38:23.167
it's loaded all the 100,000 there,
0:38:23.167,0:38:27.938
so you can click all 10,000 pages.
0:38:36.748,0:38:38.678
(host) If there aren't[br]any other questions--
0:38:39.564,0:38:40.950
The longest word came now.
0:38:40.950,0:38:43.146
So, it's, yeah.
0:38:44.972,0:38:46.390
Probably--
0:38:47.855,0:38:49.975
[inaudible]
0:38:50.321,0:38:51.540
What is that?
0:38:51.540,0:38:53.518
- (audience) It's a chemical.[br]- A chemical, yes.
0:38:56.317,0:38:58.303
(host) More questions? Or shall we?
0:38:59.792,0:39:02.332
Alright, alright. Thank you very much.
0:39:02.332,0:39:04.392
(applause)
0:39:23.642,0:39:25.121
(Nicolas) Is it good?
0:39:31.008,0:39:32.346
(host) Awesome.
0:39:34.920,0:39:38.137
Alright, now, to wrap it up,[br]we have Nicolas Vigneron,
0:39:38.137,0:39:40.778
talking about Wikisource and Wikidata.
0:39:41.469,0:39:42.804
(Nicolas) This is good?
0:39:44.542,0:39:46.126
Who knows Wikisource?
0:39:47.582,0:39:48.959
Yay!
0:39:50.740,0:39:53.582
More and more people[br]raising hands every year.
0:39:53.582,0:39:54.957
That's good.
0:39:55.282,0:40:01.462
So, this morning, [Lydia] said that[br]Wikivoyage was the first real user of--
0:40:03.306,0:40:05.987
[inaudible]
0:40:06.572,0:40:08.347
Wikisource is not that far behind.
0:40:09.230,0:40:13.280
There's a lot to do,[br]and I want to do some basic numbers,
0:40:13.280,0:40:16.964
statistics, about where we are,[br]and where I want to go.
0:40:17.613,0:40:23.409
So first, there will be a lot of questions[br]of what is a book,
0:40:23.409,0:40:25.389
what is bibliographical data.
0:40:25.389,0:40:27.229
People from the BnF can agree with me.
0:40:27.229,0:40:29.969
That can be a nightmare[br]if you go into details.
0:40:30.164,0:40:35.803
But some big numbers that--[br]Google Books tried to do an estimation
0:40:35.803,0:40:39.676
on how many "books," air quote books,[br]there is in the world,
0:40:39.676,0:40:43.005
and there's 130 million books[br]in the world.
0:40:43.705,0:40:47.279
And, yeah, let's put them all on Wikidata.
0:40:47.650,0:40:49.300
Or not. I don't know.
0:40:49.392,0:40:51.049
But where are we now?
0:40:51.413,0:40:52.468
And why is it books?
0:40:52.468,0:40:55.668
Because for Google Books,[br]everything is scanned, basically.
0:40:55.795,0:40:58.670
They don't have exactly[br]a very clear distinction.
0:40:59.400,0:41:04.350
There's sometimes, two-page books,[br]which [inaudible], Google Books is a book.
0:41:04.714,0:41:10.131
But for many people, you have to have[br]at least 50 pages to be a book.
0:41:10.536,0:41:12.321
So, that's always hard to count.
0:41:12.885,0:41:15.603
But here's what we know on Wikidata.
0:41:15.603,0:41:18.704
This the graph of what[br]is a book for Wikidata.
0:41:18.704,0:41:21.524
You have-- that's totally [inaudible]--
0:41:21.524,0:41:23.979
but that's Wikidata,[br]literary work as well.
0:41:23.979,0:41:27.194
And this is all the subclasses,[br]or subclasses of subclasses--
0:41:27.194,0:41:30.334
or subclasses of subclasses[br]of what is a book.
0:41:30.804,0:41:32.705
So, that's very hard to do.
0:41:32.737,0:41:34.253
I can do a graph like that,
0:41:34.253,0:41:36.833
but SPARQL Query engine doesn't work
0:41:36.833,0:41:41.523
if I want to count everything[br]that is instance of these subclasses,
0:41:41.523,0:41:45.143
and basically, SPARQL says no, time-out.
0:41:45.633,0:41:47.020
So, what's the problem?
0:41:47.020,0:41:50.713
But I know already that there's[br]a lot of subclasses,
0:41:50.713,0:41:52.153
but we need to look into it.
0:41:52.153,0:41:57.943
And probably, if you know Wikidata,[br]on the page, Wikidata point statistics,
0:41:58.643,0:42:02.647
you have all the numbers by big classes,
0:42:02.647,0:42:07.047
and you all probably know[br]that the big chunk here
0:42:07.047,0:42:08.642
is scholarly articles,
0:42:08.707,0:42:12.749
which is, thanks to[br]the WikiCite project, in particular,
0:42:14.113,0:42:17.125
which can be books or not,[br]depending on definition.
0:42:19.062,0:42:22.508
You see that there's no subclass books,
0:42:23.032,0:42:26.034
because there's not enough to show.
0:42:26.049,0:42:28.472
It's probably somewhere in the others,
0:42:28.472,0:42:30.127
the purple area is others.
0:42:30.163,0:42:34.115
And there's a lot of things[br]that's under one percent.
0:42:34.162,0:42:38.821
So, basically, we can say[br]that we have less one percent
0:42:38.821,0:42:42.131
of things identified as books in Wikidata.
0:42:42.551,0:42:46.091
Maybe there is more books,[br]but not identified as such.
0:42:47.842,0:42:49.284
I'm talking about books,
0:42:49.383,0:42:51.768
but when we are talking[br]about bibliographical data,
0:42:51.768,0:42:53.920
there's also the author, person,
0:42:53.920,0:42:58.472
so maybe some of the human here[br]are also authors, surely.
0:43:00.068,0:43:03.221
And we need to do another count,[br]which is another big query to do.
0:43:03.602,0:43:05.301
That times out, so--
0:43:05.396,0:43:08.015
I have a lot of not number[br]to this, sorry.
0:43:10.619,0:43:14.332
So, yeah, basically, this first slide[br]is about how it's complicated
0:43:14.332,0:43:19.122
to know how much we have of what,[br]and how to count them.
0:43:19.445,0:43:21.091
So, yeah, hard to count.
0:43:21.618,0:43:23.280
What we know--
0:43:24.133,0:43:26.618
that is we have a lot of properties--
0:43:27.185,0:43:29.684
700,000, I guess,
0:43:30.208,0:43:31.680
now on Wikidata.
0:43:32.593,0:43:35.952
We know that we have a lot of identifiers[br]among these properties.
0:43:36.721,0:43:42.538
And we know that almost 4,000[br]are properties for identifiers
0:43:43.146,0:43:45.623
relative to bibliographical,
0:43:45.737,0:43:49.862
like ID at the National Library of France,
0:43:49.862,0:43:52.251
National Library of Yaddi, Yaddi, Yada,
0:43:52.251,0:43:56.681
because we love identifier[br]of National Library on Wikidata.
0:43:56.681,0:44:00.271
So, we have almost all libraries,[br]national libraries and more.
0:44:01.101,0:44:03.796
So, we have a lot of properties.[br]I know that.
0:44:05.071,0:44:06.727
And we are widely used.
0:44:06.834,0:44:10.053
I know that, for instance,[br]BnF properties use--
0:44:10.579,0:44:12.772
BnF is National Library of France--
0:44:12.772,0:44:18.989
is used 1 million times--[br]OCOC, VIAF, or the big like that.
0:44:21.001,0:44:24.202
A lot of uses in Wikidata.
0:44:25.426,0:44:28.980
But it's not because we have[br]a lot of uses of various properties
0:44:28.980,0:44:30.666
in Wikidata that it's complete.
0:44:31.266,0:44:33.758
As Thibaud said, there's more[br]than 20 million books,
0:44:33.758,0:44:37.099
[inaudible], which is more as entities.
0:44:37.837,0:44:39.569
And we have only 1 million,
0:44:39.569,0:44:43.538
so we have 19 million still to do.
0:44:45.177,0:44:47.276
Also, what we know from the Wikidata side,
0:44:47.276,0:44:51.918
is that we have a good--[br]very quite active Wikidata project,
0:44:51.918,0:44:53.840
called WikiProject Books,
0:44:54.332,0:44:58.127
where we have a model we kind of agree on,
0:44:58.181,0:45:00.916
which is not always followed,[br]which is, again, a problem.
0:45:00.956,0:45:02.710
What is a book? You know it.
0:45:03.414,0:45:05.385
I only have five minutes,[br]so, I'll keep going.
0:45:06.090,0:45:08.880
And then, I'm a Wikisourcean,[br]so, Wikisourcer.
0:45:09.426,0:45:11.930
So, I wanted to know[br]the other way around
0:45:11.930,0:45:13.496
what is from Wikisource already,
0:45:13.496,0:45:16.406
because Wikisource is already[br]inside the Wikimedia project.
0:45:16.406,0:45:19.883
A lot of bibliographical records[br]and information.
0:45:19.883,0:45:23.161
So, in the 66 million items on Wikidata,
0:45:23.161,0:45:28.850
already 1 million are linked[br]to Wikisource.
0:45:29.330,0:45:31.890
[inaudible].
0:45:32.350,0:45:36.080
So, that's very few,[br]but that's quite a lot.
0:45:37.496,0:45:40.174
There's a lot of author.
0:45:40.174,0:45:44.670
There's some books, texts,[br]work, edition, whatever.
0:45:45.271,0:45:48.425
Not always well-arranged.
0:45:48.869,0:45:50.600
And there's a lot of internal pages,
0:45:50.600,0:45:53.150
like categories and templates,[br]and things like that.
0:45:53.194,0:45:54.984
But still, 1 million in total.
0:45:58.329,0:46:01.767
The Wikisource community[br]are often small communities,
0:46:01.767,0:46:05.010
like on the French community Wikisource,
0:46:05.010,0:46:07.537
which is one of the biggest,[br]there's 50 people.
0:46:07.537,0:46:08.787
That's the biggest we have.
0:46:09.047,0:46:12.937
So, we love Wikidata, because,[br]hey, they did a lot of work for us.
0:46:12.942,0:46:15.131
So, just take it from Wikisource.
0:46:15.131,0:46:19.885
So, in this small community,[br]we love to reuse Wikidata data.
0:46:20.935,0:46:24.076
Right now, we use a lot of a tool[br]which is called WEF--
0:46:24.358,0:46:27.978
Wikidata Edit Framework-- thank you.
0:46:29.318,0:46:33.098
And we are eager to see[br]how Wikidata Bridge will work.
0:46:33.438,0:46:36.798
And we are trying to do things[br]with a team in Wikidata
0:46:37.638,0:46:40.678
in Wikipedia Deutschland team,[br][inaudible].
0:46:41.007,0:46:43.934
And there's a lot[br]of collaboration in the future
0:46:43.934,0:46:46.586
that we want to do: better integrate,
0:46:47.636,0:46:51.068
do everything in one click when you import[br]a first book in Wikisource,
0:46:51.068,0:46:52.465
things like that.
0:46:53.364,0:46:57.664
Better-- do links between[br]edition in Wikidata.
0:46:57.852,0:46:59.492
That needs to be done.
0:47:00.041,0:47:02.282
The Foundation is doing the wish list now,
0:47:02.282,0:47:04.853
and we have a lot of requests about that.
0:47:05.938,0:47:07.342
And yeah, that's it.
0:47:07.342,0:47:09.116
That was just a short overview.
0:47:09.116,0:47:15.272
So, if you have some questions,[br]I'll take them and be available later,
0:47:15.712,0:47:17.112
if you want to.
0:47:17.723,0:47:19.722
(applause)
0:47:25.639,0:47:28.281
Come on, you love Wikisource,[br]you have questions!
0:47:33.989,0:47:35.775
(woman 4) I asked you[br]already this in August,
0:47:35.775,0:47:38.411
and I wonder if this has already changed.
0:47:38.411,0:47:42.337
What is the biggest problem you have[br]in Wikisource right now,
0:47:42.337,0:47:43.761
from your perspective?
0:47:44.167,0:47:45.670
The first one, only? (chuckles)
0:47:48.314,0:47:54.152
I think because it's a small community,[br]we need efficient tools that work easily,
0:47:54.152,0:47:57.148
because we have very few people,
0:47:57.148,0:47:59.464
so we need tool that are easy to use
0:47:59.464,0:48:04.247
and a one-click solution[br]to [inaudible] a bit,
0:48:04.371,0:48:05.607
that's a big dream.
0:48:05.607,0:48:07.179
I think that's what's most important,
0:48:07.179,0:48:10.485
because that's the threshold[br]in Wikisource, it's a small community.
0:48:11.204,0:48:13.241
I think this is the most important.
0:48:14.615,0:48:15.975
[inaudible]
0:48:16.867,0:48:19.600
(man 5) I'm curious if you can speak[br]to your opinion,
0:48:19.600,0:48:23.154
or the French Wikisource opinion,[br]or maybe you spoke to other communities
0:48:23.154,0:48:29.834
about the notion of not including[br]metadata about all the world's books.
0:48:30.234,0:48:31.635
That was mentioned in the morning.
0:48:31.635,0:48:34.965
Maybe other Wikibases,[br]and other federated databases
0:48:34.965,0:48:38.026
will have that information,[br]and Wikidata won't.
0:48:39.159,0:48:41.494
How does that feel for Wikisource?
0:48:43.981,0:48:45.502
This is my very personal opinion.
0:48:45.502,0:48:47.386
I know that people[br]in the Wikisource community
0:48:47.386,0:48:48.723
disagree with that.
0:48:48.723,0:48:50.537
But I think we need to stay--
0:48:50.537,0:48:53.194
an external Wikibase[br]is not a good solution,
0:48:53.194,0:48:55.353
because we have Shakespeare on Wikisource,
0:48:55.353,0:48:58.323
and we have Shakespeare on Wikipedia.
0:48:58.564,0:49:01.295
So, we need to interlink,[br]and interlink is there.
0:49:01.295,0:49:04.007
Or like, Romeo and Juliet,[br]we have them both.
0:49:04.007,0:49:07.229
So, we are still pretty close[br]to Wikipedia.
0:49:07.433,0:49:09.431
And the difference with WikiCites--
0:49:09.431,0:49:12.515
with WikiCite, we have a lot of items[br]which are small.
0:49:14.372,0:49:16.051
Wikisource is the other way around.
0:49:16.150,0:49:18.281
We have few items, who are big.
0:49:18.281,0:49:20.515
Which can be a scaling problem[br]and everything,
0:49:20.515,0:49:23.615
but it's quite a small subset of data.
0:49:23.683,0:49:27.539
So, my personal opinion[br]is we should stay in the Wikidata.
0:49:28.391,0:49:32.117
Again, because we are not[br]very much a lot of people,
0:49:32.117,0:49:34.287
so we need to stay,[br]with the tool we know,
0:49:34.287,0:49:35.846
don't change too much the tools
0:49:35.846,0:49:37.736
for the small community, please.
0:49:37.769,0:49:39.282
So, that's it.
0:49:39.282,0:49:40.910
But I know that other people disagree.
0:49:40.910,0:49:44.579
You can talk to [Sadeep] if you want.[br]He will have another point of view.
0:49:46.119,0:49:49.319
Thank you. I think, last question, maybe.
0:49:51.234,0:49:54.446
(man 6) Sometimes, I find it difficult[br]to link the Wikidata item
0:49:54.446,0:50:00.976
with a Wikisource article,[br]because there's a Wikisource novel--
0:50:01.079,0:50:06.128
might be split over several pages,[br]and there's an index page,
0:50:06.128,0:50:08.853
and there's perhaps a front page,[br]or something like that.
0:50:08.853,0:50:12.053
Do you have that problem,[br]or is that a general problem, or--
0:50:12.092,0:50:16.892
Yeah, that's one of the first ideas[br]on the wish list
0:50:16.892,0:50:19.092
for the Foundation, actually.
0:50:19.092,0:50:20.790
Yeah, because Wikipedia is on the--
0:50:20.790,0:50:22.772
if you know the [inaudible] organization,
0:50:22.772,0:50:26.598
Wikipedia is on the work level,[br]and Wikisource on the edition level.
0:50:26.598,0:50:28.572
So, already, you have a problem there.
0:50:28.572,0:50:30.931
And then, we have several editions[br]of the same work,
0:50:30.931,0:50:34.014
and we have sub-chapters[br]and things inside the edition.
0:50:34.014,0:50:41.001
So, yeah, that's one too many problems[br]which is hard to solve by nature.
0:50:41.555,0:50:44.839
But there's maybe a tool[br]that can help to solve that.
0:50:45.893,0:50:47.469
Hopefully.
0:50:49.172,0:50:51.395
And that's time, ladies and gentlemen.
0:50:51.398,0:50:53.283
So, thank you very much, Nicolas.
0:50:53.335,0:50:55.137
(applause)
0:50:59.010,0:51:01.127
And please join me giving[br]one more round of applause
0:51:01.127,0:51:03.147
to all of our wonderful speakers.
0:51:03.147,0:51:04.901
(applause)