WEBVTT
00:00:06.009 --> 00:00:09.069
(host) Hello, everyone. Thank you
for coming to these lightning talks.
00:00:09.069 --> 00:00:11.529
Our first speaker, I'm going
to run straight into it,
00:00:11.529 --> 00:00:13.781
is going to be Rosie
Stephenson-Goodknight.
00:00:13.781 --> 00:00:15.319
Did I get that right?
00:00:15.319 --> 00:00:19.609
Yes. And so she's going to be talking
about the Women Writers Project.
00:00:19.609 --> 00:00:22.569
And we're going to--
yeah, is that right? Great.
00:00:22.569 --> 00:00:24.299
And so, we're going
to just launch right in,
00:00:24.299 --> 00:00:26.699
and I want to remind you,
if there's time for questions,
00:00:26.699 --> 00:00:28.802
to please not speak
until you have the microphone.
00:00:28.802 --> 00:00:30.329
Thank you.
00:00:31.589 --> 00:00:34.125
(Rosie) Hi, everyone, and thanks
for coming to this session,
00:00:34.125 --> 00:00:36.829
where we're going to talk
about Women Writers in Review,
00:00:36.829 --> 00:00:40.329
cultures of reception associated
with trans-Atlantic,
00:00:40.329 --> 00:00:43.977
English language women writers,
broadly construed.
00:00:44.523 --> 00:00:48.387
Women Writers in Review is an initiative
of the Women Writers Project
00:00:48.387 --> 00:00:50.535
of Northeastern University.
00:00:50.535 --> 00:00:55.253
It moved there from Brown University,
approximately 15 years ago.
00:00:55.993 --> 00:01:00.287
Women Writers in Review is a collection
of 18th- and 19th-century reviews,
00:01:00.287 --> 00:01:04.281
publication notices,
literary histories, and other texts
00:01:04.281 --> 00:01:09.511
corresponding to trans-Atlantic--
so, UK and US mostly,
00:01:09.511 --> 00:01:12.953
though a few Canadian--
written works by women.
00:01:13.255 --> 00:01:15.683
It's a project where the two universities,
00:01:15.683 --> 00:01:18.133
Brown University
and Northeastern University,
00:01:18.133 --> 00:01:22.645
started collecting the manuscripts
of women from this period.
00:01:23.337 --> 00:01:27.520
And then they started collecting
the reviews of these works,
00:01:27.520 --> 00:01:31.593
and then they started scoring
these reviews by giving them a rating.
00:01:32.321 --> 00:01:36.144
It's designed to investigate
the discourse of reception in connection
00:01:36.144 --> 00:01:39.333
with the changing trans-Atlantic
literary landscape
00:01:39.333 --> 00:01:42.664
for the period 1770 to 1830.
00:01:46.143 --> 00:01:49.103
You're going to pardon me if I speak fast,
because I've got five minutes
00:01:49.103 --> 00:01:50.646
to go over this.
00:01:50.646 --> 00:01:55.443
It includes 690 English language texts
responding to works
00:01:55.443 --> 00:01:59.565
written or translated
by 18th- and 19th-century women writers.
00:01:59.593 --> 00:02:04.813
There are 74 authors in the corpus,
using 112 different sources,
00:02:04.813 --> 00:02:07.782
or periodicals, or magazines.
00:02:07.782 --> 00:02:10.773
And there are 628 critical reviews.
00:02:11.867 --> 00:02:14.671
Here's a picture that shows you
what we're talking about
00:02:14.671 --> 00:02:16.573
in terms of a review.
00:02:16.573 --> 00:02:18.819
And you can also see what kind of scores
00:02:18.819 --> 00:02:25.403
were given by the academics
at Northeastern University.
00:02:25.833 --> 00:02:28.922
Most of these are women
who were giving scores
00:02:28.922 --> 00:02:34.031
based on the reviews that were done
mostly, probably all, by men,
00:02:34.031 --> 00:02:39.799
back in this time period 1770 to 1830
of works written by women.
00:02:39.799 --> 00:02:43.469
By works, we're talking about plays,
and novels, and poems,
00:02:43.469 --> 00:02:46.955
essays, and other kinds of articles.
00:02:48.615 --> 00:02:50.275
So, what are we talking about?
00:02:50.275 --> 00:02:54.676
This required creating
items for authors for their works,
00:02:54.676 --> 00:02:57.946
like I said, novels and plays and poems.
00:02:57.946 --> 00:03:04.938
It required creating new items
for this period of time
00:03:05.038 --> 00:03:08.391
where there are defunct periodicals.
00:03:08.391 --> 00:03:12.499
It required creating items
for the scholarly articles.
00:03:12.578 --> 00:03:16.900
And then the review scores of each,
and the review score by,
00:03:16.943 --> 00:03:19.998
which in this case would be
Women Writers in Review,
00:03:19.998 --> 00:03:23.336
and what we still need to add
is the described by source.
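NOTE A minimal sketch of the modelling just described, not the project's actual upload workflow. P444 (review score) and P447 (review score by) are real Wikidata properties; the item IDs, the score wording, and the choice of QuickStatements V1 as the upload format are assumptions made only for illustration.

```python
# Property IDs below are real Wikidata properties. The item IDs used in the
# usage note are hypothetical placeholders, not real items.
REVIEW_SCORE = "P444"     # review score (string value)
REVIEW_SCORE_BY = "P447"  # review score by (item value, used as a qualifier)


def review_score_row(subject_qid: str, score: str, scorer_qid: str) -> str:
    """Return one QuickStatements V1 line: subject, property, value, qualifier."""
    # QuickStatements V1 expects string values wrapped in double quotes.
    quoted_score = '"' + score.replace('"', "'") + '"'
    return "\t".join(
        [subject_qid, REVIEW_SCORE, quoted_score, REVIEW_SCORE_BY, scorer_qid]
    )


def rows_from_sheet(sheet: list[dict]) -> list[str]:
    """Turn spreadsheet-like rows (assumed column names) into upload lines."""
    return [review_score_row(r["item"], r["score"], r["scored_by"]) for r in sheet]
```

For example, rows_from_sheet([{"item": "Q111", "score": "mixed", "scored_by": "Q222"}]) yields one tab-separated line that attaches the score to the placeholder item Q111 and records the placeholder scorer Q222 as a qualifier.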
00:03:25.226 --> 00:03:28.970
This gives you a picture
of the kind of spreadsheets,
00:03:28.970 --> 00:03:31.397
Google Spreadsheets,
that I have been working on.
00:03:31.397 --> 00:03:34.296
I shouldn't just say I,
because I've had a lot of help.
00:03:34.296 --> 00:03:37.546
I've had a lot of people
who were working on this project with me.
00:03:37.546 --> 00:03:40.413
And you can see at the top,
something about the authors,
00:03:40.413 --> 00:03:41.736
about the works.
00:03:41.736 --> 00:03:45.496
The third group is going to be
the periodical,
00:03:45.496 --> 00:03:48.006
and then, how the scores started showing.
00:03:49.203 --> 00:03:52.122
And of course, this is how they look--
00:03:52.122 --> 00:03:57.396
the beauty of being able to present
the preliminary findings.
00:03:57.856 --> 00:04:01.767
Once we have uploaded all of the data,
00:04:02.989 --> 00:04:05.906
and I hope that that's going to be done
by the end of this year,
00:04:06.956 --> 00:04:08.496
this will obviously look different.
00:04:09.916 --> 00:04:10.931
Appendix.
00:04:10.931 --> 00:04:15.267
So, here's what the depiction looks like
00:04:15.267 --> 00:04:18.505
at the Northeastern University website.
00:04:19.024 --> 00:04:22.474
I don't think it's quite as clear
as what we can do with Wikidata.
00:04:22.531 --> 00:04:27.351
And so, this was probably the reason why,
when I started as a visiting scholar
00:04:27.351 --> 00:04:31.751
in 2017, they asked if this is one
of the projects that I could work on.
00:04:31.751 --> 00:04:36.093
They stopped their work
the year before, in 2016.
00:04:36.093 --> 00:04:39.073
And I think they just don't have
the resources to continue.
00:04:40.251 --> 00:04:43.415
Some parts of this presentation
came from another
00:04:43.415 --> 00:04:45.812
that was published in 2016.
00:04:45.812 --> 00:04:49.401
And last but not least, here are links
00:04:49.401 --> 00:04:53.361
to the different parts
of the work that I'm doing.
00:04:54.257 --> 00:04:55.561
Thank you very much.
00:04:55.561 --> 00:04:56.845
Questions.
00:04:56.845 --> 00:04:58.754
(applause)
00:05:10.397 --> 00:05:14.665
(woman) So, when you have a work,
and you have the review of the work,
00:05:14.665 --> 00:05:17.703
are you looking
at a particular edition of the work,
00:05:17.703 --> 00:05:20.665
or are these all reviews
of first editions?
00:05:21.271 --> 00:05:22.861
It's a good question. No.
00:05:22.861 --> 00:05:25.601
They are not just reviews
of the first edition.
00:05:25.601 --> 00:05:28.601
Some are reviews of the second
or third edition.
00:05:30.062 --> 00:05:32.262
I'm going to add something
that maybe I should have said
00:05:32.262 --> 00:05:34.951
before I closed
and went to question and answers--
00:05:34.966 --> 00:05:36.800
what's so special about this?
00:05:37.220 --> 00:05:40.461
What's special is nobody else
has done this on Wikidata.
00:05:41.454 --> 00:05:45.580
Surely, there are other universities
that have their own collections,
00:05:45.580 --> 00:05:51.447
where their scholars have reviewed
the reviews of someone's work
00:05:51.800 --> 00:05:53.394
in some language.
00:05:54.491 --> 00:05:57.389
So, hopefully,
once this methodology gets--
00:05:58.000 --> 00:06:02.390
once I write this up and the project
is over and presented again,
00:06:02.390 --> 00:06:05.310
that there will be other
universities, other libraries
00:06:05.310 --> 00:06:07.923
that will speak up and say,
"We've got data sets, too,
00:06:08.248 --> 00:06:13.020
and we're going to go ahead
and upload them into Wikidata ourselves,"
00:06:13.020 --> 00:06:15.910
and then it'd be lovely
to start doing some comparisons.
00:06:19.572 --> 00:06:22.060
Anyone? Jane.
00:06:22.093 --> 00:06:23.767
(Jane) Do you actually have books?
00:06:24.293 --> 00:06:26.889
Do you actually have the books--
are the books in existence,
00:06:26.889 --> 00:06:28.860
or are you actually
doing metadata about books
00:06:28.860 --> 00:06:31.400
where we don't even know
where the books are?
00:06:31.780 --> 00:06:34.829
Northeastern University
actually has the book,
00:06:34.829 --> 00:06:37.209
or the essay, or the poem.
00:06:39.759 --> 00:06:45.392
And they have the critical review
of the book, or the essay, or the poem.
00:06:45.755 --> 00:06:48.820
And they're working
on the transcription of these,
00:06:48.820 --> 00:06:51.452
and they're not at 100% yet.
00:06:52.432 --> 00:06:56.256
They're not at 100%, but it's like,
all hands working on it.
00:07:00.218 --> 00:07:02.043
Any other questions?
00:07:05.697 --> 00:07:07.399
(host) We're going to wrap it up there.
00:07:07.399 --> 00:07:09.063
Thanks for being such a nice audience.
00:07:09.063 --> 00:07:11.677
(applause)
00:07:14.012 --> 00:07:18.581
Lady bug for [inaudible].
00:08:58.271 --> 00:08:59.372
(man) Finally got that.
00:08:59.372 --> 00:09:02.565
What I'm going to do is I'm just going
to click on these to load.
00:09:02.565 --> 00:09:06.091
Just while-- is that new tab there?
00:09:06.946 --> 00:09:08.053
[inaudible]
00:09:08.053 --> 00:09:10.524
The first one? Yeah, perfect.
00:09:11.024 --> 00:09:13.503
Sorry, my German is not even rusty,
00:09:13.503 --> 00:09:15.251
it's simply non-existent.
00:09:15.663 --> 00:09:19.561
So, I'll just let them load,
because then these queries can run
00:09:19.561 --> 00:09:22.728
while I'm sort of introducing
what I was talking about and doing.
00:09:22.728 --> 00:09:24.795
So, hi, I'm Nav from Histropedia.
00:09:24.795 --> 00:09:28.169
And basically, for the last
quite a few years,
00:09:28.169 --> 00:09:29.710
we've been relatively quiet,
00:09:29.710 --> 00:09:32.423
while we've been sort of working
on technology and tools
00:09:32.423 --> 00:09:36.837
that we need to sort of develop,
ultimately, Histropedia version 2,
00:09:36.837 --> 00:09:39.433
which is going to be, you know,
this huge enhancement
00:09:39.433 --> 00:09:40.771
on the first version.
00:09:40.771 --> 00:09:43.270
Well, it's kind of in progress,
but as we do it,
00:09:43.270 --> 00:09:45.236
we've been experimenting
with these other tools,
00:09:45.236 --> 00:09:47.387
and building the technology
that we're going to need.
00:09:48.132 --> 00:09:51.781
One really crucial part for this
is the ability to sort of see
00:09:51.781 --> 00:09:55.085
the whole of history
from the billions of years time scale,
00:09:55.085 --> 00:09:58.602
to up to the current day,
00:09:58.602 --> 00:10:00.638
and zooming all the way into single days.
00:10:00.638 --> 00:10:03.433
And ultimately, in the end,
down to hours and minutes.
00:10:03.433 --> 00:10:06.517
We've managed to create
a [inaudible] of update to our engine.
00:10:06.517 --> 00:10:08.327
Other engines can already do this,
00:10:08.327 --> 00:10:11.122
but unfortunately, they also can't handle
the large data sets.
00:10:11.122 --> 00:10:13.269
So, we finally got this update
to our engine.
00:10:13.269 --> 00:10:15.392
It allows us to zoom to billions of years.
00:10:15.392 --> 00:10:19.533
So, recently-- the recently
finished update,
00:10:19.533 --> 00:10:22.333
and it's basically, it's an update
to our query viewer tool,
00:10:22.333 --> 00:10:24.482
which is like a live version
of Histropedia
00:10:24.482 --> 00:10:26.832
just linked straight to Wikidata.
00:10:26.832 --> 00:10:29.092
So, it's literally based on a query,
00:10:29.092 --> 00:10:31.372
a live query, and we see
the results of it.
00:10:31.372 --> 00:10:33.883
So, it's sort of separate
to our main tool.
00:10:33.883 --> 00:10:37.502
So, I'm going to flick to the first one,
which is my first experiment.
00:10:37.502 --> 00:10:39.716
And you'll forgive me, the queries--
00:10:39.716 --> 00:10:42.181
the code was kind of finished
not so long ago,
00:10:42.181 --> 00:10:44.736
and the queries, I've been trying
to find out what can I find
00:10:44.736 --> 00:10:47.692
and what's interesting
to look at, what's missing.
00:10:47.692 --> 00:10:52.154
So, I started off
with a kind of, sort of, well--
00:10:52.154 --> 00:10:54.241
So, that's not the right--
that's not Life on Earth.
00:10:54.241 --> 00:10:55.699
Is this Life on Earth?
00:10:56.123 --> 00:10:57.467
That will do, anyway.
00:10:57.467 --> 00:11:01.985
So, I started off just trying to look
at what sort of things
00:11:01.985 --> 00:11:04.657
are actually in Wikidata.
00:11:04.657 --> 00:11:07.407
And this particular one--
sorry, it's in reverse.
00:11:07.407 --> 00:11:09.829
So, this is the first one
I wanted to show you.
00:11:09.829 --> 00:11:12.485
So, this is a kind of
a life on Earth query
00:11:12.485 --> 00:11:14.457
that I wanted to develop.
00:11:14.457 --> 00:11:18.410
And basically, what it is
is all the taxons in Wikidata
00:11:18.410 --> 00:11:20.157
that have a date.
00:11:20.157 --> 00:11:23.726
And as you can probably see
from the panel, there is not many of them.
00:11:23.726 --> 00:11:25.784
But we do have the different taxon ranks.
00:11:25.784 --> 00:11:27.596
So, you know, is it a species, a class--
00:11:27.596 --> 00:11:29.725
for a biologist,
this makes a lot of sense.
00:11:29.725 --> 00:11:32.446
But if I was just to close that a bit,
00:11:32.596 --> 00:11:35.453
we can see, we are going back
to the earliest forms of life here.
00:11:35.453 --> 00:11:37.236
3.5 billion years ago.
00:11:37.236 --> 00:11:42.707
And as we zoom in here, we start to see
the more modern forms of life,
00:11:42.746 --> 00:11:47.232
and we see some really
interesting things developing,
00:11:47.232 --> 00:11:50.829
but we're still lacking a lot of data
in terms of this kind of time range.
00:11:52.250 --> 00:11:55.286
So, my next thought was,
"Okay, well, why aren't--"
00:11:55.592 --> 00:11:57.088
"I want to see a Tyrannosaurus Rex."
00:11:57.088 --> 00:11:59.838
That's what I really wanted to see
on my query, and it wasn't there.
00:11:59.838 --> 00:12:02.138
So, had a little dig in,
and I found out why.
00:12:02.234 --> 00:12:05.284
It's because they're much more
being stored
00:12:05.284 --> 00:12:08.696
in terms of the temporal range
or time period that they relate to.
00:12:09.065 --> 00:12:11.412
So, on comes the next query,
00:12:11.412 --> 00:12:13.144
where I actually sort of--
00:12:13.664 --> 00:12:17.641
basically, this query
is looking for any item
00:12:17.641 --> 00:12:22.284
that has a temporal range start,
and/or a temporal range end.
00:12:22.665 --> 00:12:25.965
Which is basically in the form--
in life forms, it kind of relates
00:12:25.965 --> 00:12:28.644
to when they emerged
and when they became extinct.
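NOTE A rough sketch of what a query of this kind could look like, not the speaker's actual query. P523 (temporal range start), P524 (temporal range end) and P105 (taxon rank) are real Wikidata properties; the variable names, the function name, and the row limit are assumptions. The function only builds the SPARQL text; it does not contact any server.

```python
# SPARQL text for "any item with a temporal range start and/or end".
# LANG_CODE and ROW_LIMIT are placeholders substituted below.
QUERY_TEMPLATE = """
SELECT DISTINCT ?item ?itemLabel ?startPeriodLabel ?endPeriodLabel ?rankLabel WHERE {
  { ?item wdt:P523 ?startPeriod . } UNION { ?item wdt:P524 ?endPeriod . }
  OPTIONAL { ?item wdt:P523 ?startPeriod . }
  OPTIONAL { ?item wdt:P524 ?endPeriod . }
  OPTIONAL { ?item wdt:P105 ?rank . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "LANG_CODE" . }
}
LIMIT ROW_LIMIT
"""


def temporal_range_query(language: str = "en", limit: int = 500) -> str:
    """Fill in the label language and row limit and return the query text."""
    if not language.replace("-", "").isalnum():
        raise ValueError(f"unexpected language code: {language!r}")
    query = QUERY_TEMPLATE.replace("LANG_CODE", language)
    return query.replace("ROW_LIMIT", str(int(limit))).strip()
```

The UNION keeps items that have only one of the two properties, and the OPTIONAL lines then fill in the other one where it exists. The values of P523 and P524 are period items (for example a geological period), which is why the periods can be used for grouping.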
00:12:28.644 --> 00:12:31.044
So, these are the periods
on the side here.
00:12:31.585 --> 00:12:33.190
If I just close that a bit--
00:12:33.190 --> 00:12:37.364
you can see that we have
quite a lot of interesting stuff.
00:12:37.364 --> 00:12:39.834
And there's the Tyrannosaurus
that I was looking for.
00:12:39.834 --> 00:12:43.394
So, I finally got that,
and I was like, "Yes! I've done it!"
00:12:43.394 --> 00:12:46.084
I've got that Triceratops
in there for bonus.
00:12:46.084 --> 00:12:48.984
But of course, still loads missing.
00:12:48.984 --> 00:12:50.665
And I'd love to see lots more here.
00:12:50.665 --> 00:12:52.590
But at least, it gives you the idea.
00:12:52.590 --> 00:12:55.794
The nice thing is, here as well,
if I star some of these,
00:12:55.794 --> 00:12:58.374
you can see that
the time range is shown.
00:12:58.374 --> 00:13:01.027
So, you can start to do
what I really wanted to do, is say,
00:13:01.027 --> 00:13:04.004
"Okay, when did this one end,
and when did the next one begin?
00:13:04.004 --> 00:13:06.085
When did things start going extinct?"
00:13:06.085 --> 00:13:09.832
So, I was pretty excited, but, still,
really hoping for a lot more.
00:13:09.832 --> 00:13:11.619
So, there's a lot of editing to be done
00:13:11.619 --> 00:13:15.098
in terms of these large geological
and cosmic time scales.
00:13:15.909 --> 00:13:19.273
You can see on the color code,
I can also do extinction period.
00:13:19.273 --> 00:13:23.489
So, I say, I want to find out stuff
that went extinct in the late Cretaceous.
00:13:23.489 --> 00:13:25.768
And I now know that two things did that.
00:13:25.768 --> 00:13:27.717
There's obviously quite a few more.
00:13:27.717 --> 00:13:30.483
And I put the taxon rank
in there, as well,
00:13:30.483 --> 00:13:31.986
just so that we can also see,
00:13:31.986 --> 00:13:34.588
"Okay, which, what
is its species, genus, et cetera."
00:13:35.479 --> 00:13:37.143
So, pretty exciting.
00:13:37.143 --> 00:13:41.192
I was quite happy, but it's unfolding,
what needs to be done a lot.
00:13:42.126 --> 00:13:45.447
So I went to the next one, which was--
00:13:45.447 --> 00:13:48.045
I was thinking, "Well, I can't find
all the data I'm looking for.
00:13:48.045 --> 00:13:49.347
Let's go a bit more general,
00:13:49.347 --> 00:13:53.833
and just look for all of a certain kind
of dates in Wikidata that I can find
00:13:53.833 --> 00:13:57.240
that are over 10,000 years old, basically.
00:13:58.219 --> 00:14:00.703
And what type of thing are they?"
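NOTE A small sketch of the "over 10,000 years old" test, under the assumption that dates arrive as Wikidata-style time strings such as "-3500000000-01-01T00:00:00Z". Python's datetime module cannot represent years this far back, so the year is parsed by hand. The function names and the reference year are illustrative, and the missing year zero is ignored.

```python
import re

# Sign, then a year of any length, then -MM-DDT as in Wikidata time strings.
_TIME_RE = re.compile(r"^([+-]?)(\d+)-\d{2}-\d{2}T")


def parse_year(wikidata_time: str) -> int:
    """Extract the signed year from a Wikidata-style time string."""
    match = _TIME_RE.match(wikidata_time)
    if match is None:
        raise ValueError(f"not a recognised time string: {wikidata_time!r}")
    sign, digits = match.groups()
    return -int(digits) if sign == "-" else int(digits)


def older_than(wikidata_time: str, years: int, reference_year: int) -> bool:
    """True if the date lies more than `years` years before `reference_year`."""
    return reference_year - parse_year(wikidata_time) > years
```

With a reference year of 2019, older_than("-10000-01-01T00:00:00Z", 10000, 2019) is true, while a date in 1830 is not.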
00:14:00.762 --> 00:14:04.298
So, this color code is relatively okay,
but it might be a bit misleading,
00:14:04.298 --> 00:14:06.264
because some things are multiple types.
00:14:06.264 --> 00:14:08.318
So, therefore,
it's a bit random, at times.
00:14:08.318 --> 00:14:11.468
But, you get some really
fascinating stuff in here.
00:14:11.468 --> 00:14:14.255
I've got for a start--
I've got all of the millennia
00:14:14.255 --> 00:14:18.238
that we have in Wikidata,
which is, you know, there you go.
00:14:18.238 --> 00:14:21.558
Read about everything that happened
in all these different millennia.
00:14:21.558 --> 00:14:23.629
No pictures for any
of these, unfortunately.
00:14:23.629 --> 00:14:26.670
So, there's nothing to really say
what happened in them.
00:14:26.670 --> 00:14:29.203
Taxon, which we were just looking at,
which kind of led me on
00:14:29.203 --> 00:14:31.124
to the other queries.
00:14:31.124 --> 00:14:34.079
And of course, that sort of
like all of them in one group.
00:14:34.079 --> 00:14:36.875
Interesting stuff.
Archaeological cultures.
00:14:36.875 --> 00:14:40.121
And this is like, okay,
this is more like up my street.
00:14:40.121 --> 00:14:42.670
This is the sort of things
I want to learn about.
00:14:42.670 --> 00:14:45.234
Again, pictures would be nice.
00:14:45.493 --> 00:14:48.781
But it's really showing you
something interesting.
00:14:48.781 --> 00:14:50.361
And it's just worth exploring here.
00:14:50.361 --> 00:14:52.534
And of course, there's some
that really make me excited
00:14:52.534 --> 00:14:54.048
for what we could be doing.
00:14:54.048 --> 00:14:57.288
For example, there was
something here which was--
00:14:58.028 --> 00:15:00.888
I mean, system, actually,
was quite an interesting one.
00:15:01.794 --> 00:15:04.237
And sorry, that's not actually
the one I was thinking about.
00:15:04.237 --> 00:15:05.958
In fact, that means nothing to me at all.
00:15:05.958 --> 00:15:07.613
Someone might know what that means.
00:15:08.057 --> 00:15:10.813
Art movements,
archaeological sites, activities.
00:15:10.813 --> 00:15:12.478
There was only two of these,
00:15:12.478 --> 00:15:15.788
but I really like the idea, because--
and they're both the same.
00:15:15.788 --> 00:15:17.658
They're both hunting.
00:15:17.730 --> 00:15:19.390
And of course, there's two of them.
00:15:19.390 --> 00:15:22.360
And the reason is, is because
there's a little qualifier on there.
00:15:22.360 --> 00:15:25.143
If we were to just
look through, we can see--
00:15:25.143 --> 00:15:27.735
we can see somewhere down here,
will be the start time.
00:15:27.735 --> 00:15:30.690
And the qualifier is talking about
when Homo erectus did it,
00:15:30.690 --> 00:15:32.735
and when Homo sapiens did it.
00:15:32.735 --> 00:15:35.513
So that should be
in brackets on the query,
00:15:35.513 --> 00:15:39.002
a little extension to do to show you
what the two different versions mean.
00:15:39.002 --> 00:15:42.390
But I would love to see
all of human skills in here.
00:15:42.390 --> 00:15:44.708
When did we first do farming,
when did we first this--
00:15:44.708 --> 00:15:46.010
when did fire come about?
00:15:46.010 --> 00:15:48.270
All of these things,
when did we first extract iron?
00:15:48.270 --> 00:15:50.355
When did we first--
all of these wonderful things
00:15:50.355 --> 00:15:53.607
that developed
into the modern world that we live in.
00:15:53.607 --> 00:15:56.873
So, really exciting signs
of what could be there,
00:15:56.873 --> 00:15:58.112
if it all got populated.
00:15:58.112 --> 00:16:00.210
So, you know, this is what
we really need to work on,
00:16:00.210 --> 00:16:02.333
is some of this historical info.
00:16:03.243 --> 00:16:05.060
Last one, I just wanted to just show you,
00:16:05.060 --> 00:16:07.283
which was just an extra
bonus one I threw in,
00:16:07.283 --> 00:16:10.875
just to look at the time periods
that we actually have,
00:16:10.875 --> 00:16:13.921
the historical ages
that we have in Wikidata.
00:16:13.921 --> 00:16:17.524
And so, this is actually just all
sub-classes of unit of time.
00:16:17.524 --> 00:16:22.396
And then, this is the actual
instance that it was.
00:16:22.396 --> 00:16:23.775
And it's just really interesting.
00:16:23.775 --> 00:16:25.849
This is more the kind of thing--
00:16:26.979 --> 00:16:29.541
In Histropedia Mark II,
these are the kind of things
00:16:29.541 --> 00:16:31.944
that will actually be displayed
more under the timeline
00:16:31.944 --> 00:16:33.984
as a sort of a range or period.
00:16:33.993 --> 00:16:36.436
And so, we are particularly interested
in these periods
00:16:36.436 --> 00:16:37.976
being really tight and nice,
00:16:37.976 --> 00:16:40.718
because it helps you to, then,
say what happened when,
00:16:40.718 --> 00:16:43.983
and you can sound really clever
when you talk about when things happened,
00:16:43.983 --> 00:16:47.263
in the Neolithic or the upper
Paleolithic, or whatever.
00:16:47.263 --> 00:16:49.121
I'm still pretty clueless on most of it,
00:16:49.121 --> 00:16:51.918
because I'm just kind of just waiting
for the data to be up to scratch.
00:16:51.918 --> 00:16:55.163
Great. I think I can actually
round it up there.
00:16:55.163 --> 00:16:57.145
Loads more exciting queries to come.
00:16:57.145 --> 00:17:00.420
A lot more features and cool stuff,
actually, just around the corner for us,
00:17:00.420 --> 00:17:02.758
because we've just finished
a lot of cool things.
00:17:02.758 --> 00:17:05.471
But there's a little bit of time
to pull it all together.
00:17:05.471 --> 00:17:07.373
So, look out for more.
00:17:07.373 --> 00:17:09.760
If there's any questions,
I think I've got one minute.
00:17:09.760 --> 00:17:11.458
So, it would have to be one.
00:17:11.510 --> 00:17:13.253
(host) Yes, Nav.
I forgot to introduce you.
00:17:13.253 --> 00:17:16.933
I'm sorry. That's Nav, as he said,
Histropedia, Evans. Thank you very much.
00:17:16.933 --> 00:17:17.986
Thank you. Cheers. Yeah.
00:17:17.986 --> 00:17:19.450
(host) Very fast questions.
00:17:19.450 --> 00:17:21.815
Anyone with a very fast question
[inaudible].
00:17:24.654 --> 00:17:29.230
(woman 2) Very quickly, how can
I do my own, if I want languages,
00:17:29.230 --> 00:17:30.818
when do we start, for instance.
00:17:30.818 --> 00:17:32.031
Absolutely. Good question.
00:17:32.031 --> 00:17:34.320
So just click on the--
oh, I've shared this.
00:17:34.320 --> 00:17:36.853
It's called cosmic timelines on the URL.
00:17:36.853 --> 00:17:40.911
Should be cosmic and geological,
but then it's not a short URL anymore.
00:17:40.911 --> 00:17:43.711
So, you click on this icon
in the top corner there,
00:17:43.711 --> 00:17:47.431
and then, you get to the query page,
which is like the home page of this tool.
00:17:47.431 --> 00:17:49.311
This is where the query is pasted in.
00:17:49.311 --> 00:17:51.491
So, at the moment,
I've got the language there.
00:17:51.491 --> 00:17:53.483
If I want to change it to something else,
00:17:53.483 --> 00:17:56.062
Arabic, or French, or whatever--
00:17:56.062 --> 00:17:58.271
and here are the-- this is the area
00:17:58.271 --> 00:18:03.092
where you sort of enter in exactly
which variables in your query
00:18:03.092 --> 00:18:04.600
you would like to do each thing.
00:18:04.600 --> 00:18:06.781
If you put nothing in,
it will try and figure it out.
00:18:06.781 --> 00:18:09.971
But if you want advanced stuff--
and really important, is the precision,
00:18:09.971 --> 00:18:13.033
because that's not available
on the query service timeline.
00:18:13.033 --> 00:18:14.123
So, you get everything--
00:18:14.123 --> 00:18:16.303
is the first of January
10 billion years ago,
00:18:16.303 --> 00:18:18.363
you know, which is not
what we want to see.
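NOTE A sketch of why precision matters, not the tool's own code. The precision codes follow the Wikibase data model for time values (0 is billion years, 6 is millennium, 9 is year); the function names and the output wording are assumptions. Codes 10 and above (month, day and finer) are left out because only the year is handled here.

```python
# Wikibase time precision codes for year-level and coarser precisions.
PRECISION_NAMES = {
    0: "billion years",
    1: "hundred million years",
    2: "ten million years",
    3: "million years",
    4: "hundred thousand years",
    5: "ten thousand years",
    6: "millennium",
    7: "century",
    8: "decade",
    9: "year",
}


def coarsen_year(year: int, precision: int) -> int:
    """Round a year to the unit implied by its precision code."""
    if precision not in PRECISION_NAMES:
        raise ValueError(f"unsupported precision code: {precision}")
    unit = 10 ** (9 - precision)  # code 0 is 10**9 years, code 9 is 1 year
    return round(year / unit) * unit


def describe(year: int, precision: int) -> str:
    """Label a date by its precision instead of a spurious exact day."""
    rounded = coarsen_year(year, precision)
    era = "BCE" if rounded < 0 else "CE"
    return f"{abs(rounded)} {era} (to the nearest {PRECISION_NAMES[precision]})"
```

A value stored as the first of January in year -10000000000 with precision 0 is then shown as a date known only to the nearest billion years. In SPARQL, the precision of a full time value is exposed as wikibase:timePrecision.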
00:18:18.363 --> 00:18:20.603
And the rank, which is quite interesting.
00:18:20.603 --> 00:18:24.173
My timelines are all based
on a very simple rank of site link count,
00:18:24.173 --> 00:18:27.058
how many different articles there are,
or something else.
00:18:27.058 --> 00:18:29.432
But that's how you go
and mess around with it yourself,
00:18:29.432 --> 00:18:32.034
and you put your color codes
and your filters in down here.
00:18:32.034 --> 00:18:34.098
Comma separate them,
if you would like more,
00:18:34.098 --> 00:18:36.007
and they come up as options
in the final tool.
00:18:36.007 --> 00:18:37.836
And I think that
pretty much is it, isn't it.
00:18:37.836 --> 00:18:39.863
So, any other questions,
do find me afterwards.
00:18:39.863 --> 00:18:41.655
Always happy to get cornered
for this stuff.
00:18:41.655 --> 00:18:42.954
I love talking about it.
00:18:42.954 --> 00:18:44.989
Okay. So, thank you very much. Cheers.
00:18:44.989 --> 00:18:46.948
(applause)
00:19:28.344 --> 00:19:30.220
(mumbles)
00:19:30.265 --> 00:19:32.115
So, where is the first one?
00:19:33.854 --> 00:19:35.397
This one, no.
00:19:45.636 --> 00:19:47.132
This? Sorry.
00:19:48.270 --> 00:19:50.090
Is it full screen?
00:19:50.217 --> 00:19:52.129
Yep. Full screen.
00:19:54.747 --> 00:19:56.289
Well, good work.
00:19:58.388 --> 00:19:59.434
[Strike.]
00:19:59.497 --> 00:20:02.312
Yeah, so, okay. Thank you.
00:20:04.752 --> 00:20:07.062
So, hi, I'm Thibaud Senalada.
00:20:07.062 --> 00:20:08.952
As [inaudible] introduced me.
00:20:09.552 --> 00:20:14.212
I'm a software engineer
at the French National Library.
00:20:14.992 --> 00:20:18.349
And I'm here today
to talk to you about NOEMI,
00:20:18.979 --> 00:20:23.682
which is a software, a proof of concept,
00:20:23.682 --> 00:20:26.501
and a [inaudible] software
00:20:26.635 --> 00:20:29.961
to the French Library to cataloging.
00:20:30.787 --> 00:20:32.870
Sorry. [inaudible].
00:20:32.870 --> 00:20:35.359
Sorry for my English. It's a bit of fuzzy.
00:20:36.971 --> 00:20:39.321
And so, what's NOEMI?
00:20:39.321 --> 00:20:41.589
So, NOEMI stands for:
00:20:41.589 --> 00:20:44.591
Nouer les oeuvres, expressions,
Manifestations et Items.
00:20:44.591 --> 00:20:46.533
Which, in English, is:
00:20:46.533 --> 00:20:49.891
to link work, expression,
manifestation, and items.
00:20:51.086 --> 00:20:58.057
It's based on the FRBR,
00:20:58.057 --> 00:21:00.633
and [inaudible].
00:21:00.881 --> 00:21:03.105
Yeah. Anyway.
00:21:03.631 --> 00:21:04.839
So, yeah.
00:21:05.244 --> 00:21:09.540
So, this software,
we use to produce metadata.
00:21:10.841 --> 00:21:12.201
It will be used
00:21:12.201 --> 00:21:17.831
by 600 people on a daily basis.
00:21:18.911 --> 00:21:24.271
And as I say in the title,
it will be based on Wikibase.
00:21:25.415 --> 00:21:31.871
So, there is also a format manager.
00:21:32.388 --> 00:21:39.138
So, people using this software
will use like a code editor,
00:21:39.254 --> 00:21:41.817
but for MARC format.
00:21:41.968 --> 00:21:45.178
So, it's [inaudible], things like that.
00:21:46.814 --> 00:21:49.868
A data processing tool, like I said.
00:21:49.959 --> 00:21:53.040
And also, authorization management,
00:21:54.327 --> 00:21:56.378
because they will need a--
00:21:57.337 --> 00:22:01.417
if there is some data,
where it can be modified.
00:22:05.877 --> 00:22:07.840
So, the PoC context.
00:22:08.728 --> 00:22:12.738
So, this software will be replacing
an old software,
00:22:12.855 --> 00:22:15.688
called ADCAT02.
00:22:17.111 --> 00:22:20.964
It is part of the bibliographic
transition.
00:22:20.984 --> 00:22:24.554
So, I say the [inaudible].
00:22:25.359 --> 00:22:29.390
[inaudible]. [inaudible] in English?
00:22:30.254 --> 00:22:31.662
Format.
00:22:32.717 --> 00:22:35.734
And it will be the [inaudible] of the--
00:22:39.979 --> 00:22:41.090
Sorry.
00:22:42.349 --> 00:22:46.560
It will be [inaudible]
all the [inaudible]
00:22:46.560 --> 00:22:49.689
of the BnF with data.
00:22:51.731 --> 00:22:54.124
And so, doing this work,
00:22:54.124 --> 00:22:59.693
we assessed Wikibase to see
if it fits our needs.
00:23:01.244 --> 00:23:03.383
And [inaudible] pretty good.
00:23:04.485 --> 00:23:06.930
So, why Wikibase?
00:23:06.930 --> 00:23:08.821
Because of the flexibility of the format.
00:23:08.835 --> 00:23:11.646
We arrive--
00:23:11.850 --> 00:23:16.388
to inject MARC, INTERMARC for BnF--
00:23:16.960 --> 00:23:18.350
in the database.
00:23:18.399 --> 00:23:22.803
And use it to-- use this link management
00:23:22.803 --> 00:23:25.529
between entities using Blazegraph,
00:23:25.529 --> 00:23:27.776
so, as Wikibase does.
00:23:29.155 --> 00:23:32.700
We also chose Wikibase,
because it was already--
00:23:35.183 --> 00:23:38.900
it handles history and user account.
00:23:39.941 --> 00:23:42.414
So, it's easiest for us.
00:23:43.106 --> 00:23:48.270
And it also has a good--
it's pretty easy to create bots
00:23:48.270 --> 00:23:51.090
to watch and curate data
00:23:51.840 --> 00:23:53.430
and also to make statistics.
00:23:54.820 --> 00:23:57.170
It's free and open, and sustainable.
00:23:57.908 --> 00:23:59.084
Yeah, so.
00:23:59.610 --> 00:24:02.519
I'm sorry if you don't
understand what I say,
00:24:02.519 --> 00:24:04.839
because I know my English
is not that good.
00:24:07.720 --> 00:24:12.139
But during this PoC,
we encountered some trouble.
00:24:12.802 --> 00:24:13.938
Okay.
00:24:14.790 --> 00:24:21.117
First of all, as a search engine,
I think we have to create
00:24:21.117 --> 00:24:24.150
another--
00:24:24.185 --> 00:24:28.988
not another, a supplementary
search engine to use it with,
00:24:29.433 --> 00:24:31.120
to fit our needs.
00:24:31.688 --> 00:24:37.155
Because we need some search
00:24:37.155 --> 00:24:42.366
like faceted search and filters.
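NOTE A toy illustration of what faceted search and filters mean here, assuming records are plain dictionaries held in memory. A real catalogue of this size would rely on a dedicated search engine; the sample records and field names are invented for the example.

```python
from collections import Counter

# Invented sample records; the field names are assumptions.
SAMPLE_RECORDS = [
    {"id": "rec1", "type": "work", "language": "fr"},
    {"id": "rec2", "type": "work", "language": "en"},
    {"id": "rec3", "type": "person", "language": "fr"},
]


def facet_counts(records: list[dict], field: str) -> Counter:
    """Count how many records fall under each value of one facet."""
    return Counter(r[field] for r in records if field in r)


def filter_records(records: list[dict], **facets: str) -> list[dict]:
    """Keep only the records matching every selected facet value."""
    return [
        r for r in records
        if all(r.get(name) == value for name, value in facets.items())
    ]
```

The counts drive the facet panel (for instance two works and one person), and filter_records narrows the result list once the user picks one or more facet values.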
00:24:43.755 --> 00:24:47.525
Also we have the [inaudible],
00:24:47.525 --> 00:24:50.407
of using a PostgreSQL database.
00:24:50.407 --> 00:24:54.885
And for the moment,
I think Wikibase [inaudible].
00:24:56.436 --> 00:25:01.266
And when we tried to use PostgreSQL,
it was a bit difficult,
00:25:01.266 --> 00:25:04.394
and will cause some issues.
00:25:05.662 --> 00:25:08.825
And we have also some fear
about performance,
00:25:08.825 --> 00:25:15.238
because the catalog is about
20 million entities,
00:25:16.366 --> 00:25:19.146
20 million bibliographic entities.
00:25:19.146 --> 00:25:22.851
That can be more
than 20 million entities, actually.
00:25:23.276 --> 00:25:27.771
And we don't know the time
that we'll have to inject them
00:25:27.809 --> 00:25:30.765
in the Wikibase, and how to do it.
00:25:32.198 --> 00:25:34.267
So, [inaudible],
00:25:34.324 --> 00:25:39.616
but the real software development
has already started.
00:25:43.242 --> 00:25:46.175
We start by creating
an interface with Wikibase.
00:25:46.261 --> 00:25:47.711
We're using Java.
00:25:48.091 --> 00:25:50.093
Like PyWikibase.
00:25:51.691 --> 00:25:54.888
- (man) Pywikibot.
- Pywikibot. Yeah, thank you.
00:25:56.027 --> 00:25:57.723
The same way, but in Java.
00:25:59.309 --> 00:26:02.831
We have also already injected the format
into the Wikibase.
00:26:03.540 --> 00:26:09.093
And we do something
like the INTERMARC editor,
00:26:09.458 --> 00:26:12.134
[inaudible], et cetera.
00:26:13.672 --> 00:26:14.926
Thank you.
00:26:15.333 --> 00:26:17.135
(applause)
00:26:23.527 --> 00:26:24.749
Yeah.
00:26:27.748 --> 00:26:29.813
(man 2) Faceted search
will be a nice feature
00:26:29.813 --> 00:26:31.885
in the Wikidata UI itself.
00:26:31.924 --> 00:26:34.062
So, have you talked
to any of the developers,
00:26:34.062 --> 00:26:35.675
or is that something
that could be done?
00:26:35.711 --> 00:26:37.108
Sorry, I don't understand.
00:26:37.108 --> 00:26:39.041
(man 2) The faceted search idea.
00:26:39.911 --> 00:26:41.982
It would be nice to be able
to search only humans,
00:26:41.982 --> 00:26:44.221
or search only works, or something, right?
00:26:44.321 --> 00:26:47.991
Yeah. I'm sorry, I don't-- I don't--
00:26:48.131 --> 00:26:50.436
(man 2) Yeah, I mean, so,
it would be nice if we had that
00:26:50.436 --> 00:26:52.265
in Wikidata itself in the UI.
00:26:52.822 --> 00:26:53.954
Yeah, yeah, yeah.
00:26:54.088 --> 00:26:56.077
[inaudible]
00:26:56.077 --> 00:26:57.911
Yeah, okay, thank you.
00:26:57.911 --> 00:27:00.026
I'm sorry. (laughs)
00:27:01.186 --> 00:27:03.902
Yeah, yeah. But I think we will--
00:27:04.506 --> 00:27:07.266
I don't know if we want
to do it inside Wikibase,
00:27:07.266 --> 00:27:10.746
or in our next systems.
00:27:10.785 --> 00:27:15.186
For the moment,
we haven't really solved that.
00:27:15.965 --> 00:27:17.885
For the moment, I think.
00:27:17.885 --> 00:27:19.285
Sorry.
00:27:27.645 --> 00:27:30.644
(man 3) I suppose on the topic
of the faceted search,
00:27:32.535 --> 00:27:35.068
Wikidata, SPARQL Query, Wikibase--
00:27:35.068 --> 00:27:38.965
SPARQL Query is, I think,
functionally equivalent
00:27:38.965 --> 00:27:41.405
to a facetable search.
00:27:42.105 --> 00:27:44.234
So, it's mostly an interface issue, right?
00:27:44.284 --> 00:27:47.791
I mean, you could build an interface
that starts with a query,
00:27:47.791 --> 00:27:51.111
and then, gives you
possible facets to filter by.
00:27:51.370 --> 00:27:52.660
And when you click one of them,
00:27:52.660 --> 00:27:55.217
it adds a condition
to the SPARQL Query, right?
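The idea described here, where each clicked facet adds one condition to the query, can be sketched roughly as follows. This is only an illustration, not any existing tool's code: the `FacetedQuery` class is a hypothetical name, and the sketch assumes the `wd:` and `wdt:` prefixes that the Wikidata Query Service predefines. P31 is "instance of", Q5 is "human", P106 is "occupation", and Q36180 is "writer".

```python
from dataclasses import dataclass, field


@dataclass
class FacetedQuery:
    """Hypothetical helper: a SPARQL query that grows by one triple pattern per facet."""

    conditions: list = field(default_factory=list)

    def with_facet(self, prop: str, value: str) -> "FacetedQuery":
        # Clicking a facet returns a new query with one more condition appended.
        return FacetedQuery(self.conditions + [f"?item wdt:{prop} wd:{value} ."])

    def to_sparql(self, limit: int = 50) -> str:
        # All conditions share the ?item variable, so they are combined as a conjunction.
        body = "\n  ".join(self.conditions)
        return f"SELECT ?item WHERE {{\n  {body}\n}}\nLIMIT {limit}"


# Start with "search only humans" ...
query = FacetedQuery().with_facet("P31", "Q5")
# ... then a click on an "occupation: writer" facet adds a second condition.
query = query.with_facet("P106", "Q36180")
print(query.to_sparql())
```

The interface would still need a second query to discover which facets are worth offering for the current result set; the sketch only covers the narrowing step.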
00:27:55.664 --> 00:27:58.183
Yeah, but I think the SPARQL--
00:27:59.157 --> 00:28:04.310
they don't go as detailed
as we want, as we have--
00:28:05.632 --> 00:28:09.631
When we inject the format,
we use a statement for--
00:28:10.525 --> 00:28:13.124
the format is like XML.
00:28:13.223 --> 00:28:15.842
So, it's a zone, subzone, and value.
00:28:16.413 --> 00:28:20.292
And in the [inaudible] statement,
we add the subzone,
00:28:20.892 --> 00:28:22.902
because the zone was already there.
00:28:23.002 --> 00:28:28.565
And we want to query
some qualifier on this.
00:28:28.659 --> 00:28:35.206
And I don't know if the SPARQL
goes through that-- I'm sorry--
00:28:36.145 --> 00:28:38.277
in a fast way.
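A query over qualifiers like the one described here might look roughly like this sketch. It assumes the standard Wikibase RDF prefixes `p:`, `ps:` and `pq:` for statements, statement values and qualifiers; the function name and the property IDs `P1` (zone) and `P2` (subzone) are made up for illustration, since the talk does not give the real ones. Whether such a query is fast enough on 20 million entities is exactly the open question raised above.

```python
def qualifier_query(zone_prop: str, subzone_prop: str, subzone_value: str) -> str:
    """Build a SPARQL query for items whose zone statement has a given subzone qualifier."""
    # Minimal escaping so the value can sit inside a double-quoted SPARQL string literal.
    escaped = subzone_value.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?zoneValue WHERE {\n"
        # p: leads from the item to the statement node ...
        f"  ?item p:{zone_prop} ?statement .\n"
        # ... ps: reads the statement's main value, pq: reads a qualifier on it.
        f"  ?statement ps:{zone_prop} ?zoneValue ;\n"
        f'             pq:{subzone_prop} "{escaped}" .\n'
        "}"
    )


# P1 and P2 are hypothetical property IDs in a local Wikibase.
print(qualifier_query("P1", "P2", "a"))
```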
00:28:40.025 --> 00:28:46.285
I think we need some index
for us to [inaudible].
00:28:46.925 --> 00:28:48.145
Yeah.
00:28:48.145 --> 00:28:50.250
(man 3) SPARQL doesn't do a query--
00:28:52.321 --> 00:28:55.703
To do proper string searches
in SPARQL is very hard.
00:28:55.703 --> 00:28:57.610
You have to have filters, which are slow,
00:28:57.610 --> 00:28:59.815
and it really doesn't work that well.
00:28:59.815 --> 00:29:02.845
So, it's a different
search problem, really.
00:29:06.871 --> 00:29:09.270
More questions? If anyone has one?
00:29:12.215 --> 00:29:13.999
- Great. Thank you.
- Thank you.
00:29:14.044 --> 00:29:15.895
(applause)
00:29:37.766 --> 00:29:41.960
(host) Nielsen speaking about
the tool Ordia. Thank you.
00:30:05.084 --> 00:30:06.460
So, I'm Finn Årup Nielsen,
00:30:06.460 --> 00:30:09.006
and a couple of years ago,
I started Scholia
00:30:09.006 --> 00:30:14.611
that displays data from Wikidata
via a SPARQL Query
00:30:14.611 --> 00:30:16.359
to the Wikidata Query Service
00:30:16.359 --> 00:30:18.959
so we can generate, for example,
a list of publications
00:30:18.959 --> 00:30:20.380
for a specific author.
00:30:20.866 --> 00:30:26.941
Now, last year, Wikidata
introduced lexicographic data.
00:30:29.332 --> 00:30:32.655
And I [inaudible] the idea of Scholia
00:30:32.655 --> 00:30:39.279
that is using Wikidata
and the Wikidata Query Service
00:30:39.445 --> 00:30:42.036
to generate overviews
of lexicographic data.
00:30:42.585 --> 00:30:46.125
So, Ordia is the example of this one here.
00:30:46.197 --> 00:30:51.998
So, it generates-- it's a web application
run from the Toolforge service,
00:30:51.998 --> 00:30:57.198
and for example, it will dynamically
generate a page such as--
00:30:57.234 --> 00:31:01.768
This one here is statistics over
what there is of lexicographic data
00:31:01.768 --> 00:31:03.841
in Wikidata.
00:31:03.992 --> 00:31:07.404
For example, the number of lexemes,
is currently over 200,000.
00:31:08.664 --> 00:31:10.483
So, there's a range of things
you can do here.
00:31:10.483 --> 00:31:12.916
You can, for example,
look in the aspects of that.
00:31:12.916 --> 00:31:15.560
The menu, there's quite a lot
of things here.
00:31:15.560 --> 00:31:18.485
And so, I will search
for a specific Danish lexeme.
00:31:19.503 --> 00:31:22.835
"Rød"-- which is "red" in Danish.
00:31:23.376 --> 00:31:27.466
So, you basically get,
for the specific lexeme,
00:31:28.286 --> 00:31:30.618
the same type of information
that you could see
00:31:30.618 --> 00:31:33.751
in the ordinary part of Wikidata, here.
00:31:34.451 --> 00:31:38.256
Annotations about the lexeme,
annotation about the forms,
00:31:39.359 --> 00:31:40.872
singular or plural forms.
00:31:41.548 --> 00:31:43.501
Annotation about the senses.
00:31:44.683 --> 00:31:47.678
But what you can't see
in ordinary Wikidata
00:31:47.678 --> 00:31:52.150
is sort of aggregating across lexemes.
00:31:52.246 --> 00:31:54.207
And this is, for example, down here--
00:31:54.207 --> 00:31:55.902
down here with the compound.
00:31:55.902 --> 00:31:57.764
So, in Danish, like in German,
00:31:57.764 --> 00:31:59.950
words can be compounded.
00:31:59.950 --> 00:32:03.478
For example, for "red",
we have rødkælk
00:32:03.478 --> 00:32:05.830
which is compounded by two words.
00:32:06.721 --> 00:32:10.085
And we've got, on the second one here,
rødvin-- red wine.
00:32:11.060 --> 00:32:15.691
This list here is constructed
by a SPARQL Query to the Wikidata Query Service.
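A query for such a compound list could be sketched as below. This is not Ordia's actual query: it assumes that P5238 is the "combines lexemes" property and that `wikibase:lemma` gives a lexeme's lemma, and the lexeme ID `L1234` is a placeholder, not the real ID for rød.

```python
def compounds_query(lexeme_id: str) -> str:
    """Build a SPARQL query for lexemes that are compounded from the given lexeme."""
    return (
        "SELECT ?compound ?lemma WHERE {\n"
        # Assumption: P5238 ("combines lexemes") links a compound to its parts.
        f"  ?compound wdt:P5238 wd:{lexeme_id} ;\n"
        "            wikibase:lemma ?lemma .\n"
        "}"
    )


# L1234 is a placeholder lexeme ID used only for illustration.
print(compounds_query("L1234"))
```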
00:32:16.751 --> 00:32:20.406
And also, further down here,
we've got a lot of Danish words here.
00:32:20.970 --> 00:32:26.122
Further down here, we should have
a graph of the words
00:32:27.426 --> 00:32:29.164
which are compounded from rød.
00:32:29.658 --> 00:32:31.980
We have [rød]-- red here in the middle.
00:32:31.980 --> 00:32:34.372
And for example, around--
somewhere around here,
00:32:34.372 --> 00:32:36.895
we should have,
for example, "red cabbage,"
00:32:36.936 --> 00:32:40.343
"red cabbage salad,"
"red cabbage soup," and so on.
00:32:40.434 --> 00:32:43.055
So you can browse around,
in this one here, and see it.
00:32:44.204 --> 00:32:51.188
We can go a bit back here,
and then look on the main sense
00:32:51.388 --> 00:32:55.030
of the word rød-- red in Danish.
00:32:55.550 --> 00:33:01.610
So, Ordia automatically generates
information about hyponyms.
00:33:02.570 --> 00:33:04.400
Subconcepts, for example,
00:33:04.400 --> 00:33:07.400
light red, dark red,
pink, purple, and so on,
00:33:07.525 --> 00:33:14.272
are in the-- when we make
a SPARQL Query to the Wikidata Query Service.
00:33:14.576 --> 00:33:20.570
Then we go around in the Wikidata graph,
00:33:20.626 --> 00:33:22.266
and get this information here.
00:33:22.266 --> 00:33:24.786
And we can also get translation
automatically,
00:33:24.786 --> 00:33:28.316
even though it's not necessarily stated
within the Wikidata lexeme items.
00:33:28.316 --> 00:33:32.679
For example, here, we have translated
rød to "red" in English,
00:33:32.679 --> 00:33:36.089
and röd in Swedish, and so on.
00:33:36.107 --> 00:33:38.191
There are not that many there.
00:33:38.747 --> 00:33:40.262
There's a range of other things here.
00:33:40.262 --> 00:33:43.487
Let me show you,
for example, this one here--
00:33:44.387 --> 00:33:51.308
this is veninde- now I go
over to this one here.
00:33:54.308 --> 00:33:57.328
-inde, which is a feminine suffix.
00:33:58.058 --> 00:34:00.498
So, this is auto-generated there,
00:34:00.498 --> 00:34:02.641
it's a combination of "instance of"--
00:34:03.268 --> 00:34:07.171
lexemes that are "instance of"
feminine suffixes.
00:34:08.142 --> 00:34:11.519
And for example, for German,
we have [inaudible].
00:34:11.519 --> 00:34:15.373
So, -in would be
a feminine suffix in German.
00:34:15.704 --> 00:34:21.291
And I put in sort of the five
feminine suffixes
00:34:22.571 --> 00:34:24.206
of Danish.
00:34:25.480 --> 00:34:29.106
Another facility is, for example,
if you have a text,
00:34:29.106 --> 00:34:34.021
you can copy and paste it
into this Text to lexemes here.
00:34:34.571 --> 00:34:35.911
Let me--
00:34:37.482 --> 00:34:41.218
"a car crashed into...
00:34:41.864 --> 00:34:44.141
a green house."
00:34:46.485 --> 00:34:48.701
Let me change that to "English".
00:34:49.006 --> 00:34:50.029
Press Submit.
00:34:50.029 --> 00:34:53.355
Now, Ordia will then extract
each of the words here,
00:34:53.355 --> 00:34:54.733
in this sentence here,
00:34:54.733 --> 00:34:58.217
and try to see whether they
are entered in the specific form,
00:34:58.217 --> 00:35:00.778
a lexeme, are entered in Wikidata.
00:35:00.778 --> 00:35:04.228
And these simple words here
are entered in Wikidata.
00:35:04.228 --> 00:35:09.190
But if we, for example, change it to--
there's nothing called "vancar"
00:35:09.190 --> 00:35:13.998
but just let us do that here.
00:35:14.535 --> 00:35:19.532
And you got down here--
it's as a blue link
00:35:20.335 --> 00:35:23.295
that you can create a new
Wikidata lexeme item.
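The text-to-lexemes step shown in this demo can be approximated by the sketch below. It is a simplification under stated assumptions: the real tool looks forms up in Wikidata, whereas here `known_forms` is a small hand-made set standing in for that lookup, and `split_known_unknown` is a hypothetical name.

```python
import re


def split_known_unknown(text: str, known_forms: set) -> tuple:
    """Split the words of a text into forms that are already known and forms that are not."""
    # Lowercase the text and keep runs of letters only (no digits, underscores or punctuation).
    words = re.findall(r"[^\W\d_]+", text.lower())
    known = [w for w in words if w in known_forms]
    unknown = [w for w in words if w not in known_forms]
    return known, unknown


# Stand-in for the forms already entered in Wikidata (made up for this example).
known_forms = {"a", "car", "crashed", "into", "green", "house"}
known, unknown = split_known_unknown("A vancar crashed into a green house.", known_forms)
print(unknown)  # words that would get a "create new lexeme" link
```

With this input, "vancar" is the only word reported as unknown, which matches the blue create link shown in the demo.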
00:35:24.556 --> 00:35:29.097
But there's a range of other things to explore
00:35:29.716 --> 00:35:31.496
in this web application.
00:35:31.496 --> 00:35:35.596
And if there's any suggestions,
or comments, or notes, or something,
00:35:35.596 --> 00:35:39.337
you can contact me, or put in
an issue on GitHub.
00:35:39.337 --> 00:35:44.856
So, this particular application
is developed on GitHub,
00:35:44.856 --> 00:35:50.526
and I'm open for new ideas
and ways to represent information there.
00:35:51.306 --> 00:35:52.701
Okay, thank you.
00:35:52.701 --> 00:35:54.661
(applause)
00:35:59.328 --> 00:36:00.906
Questions?
00:36:03.262 --> 00:36:04.524
(woman 3) I love your tool.
00:36:04.524 --> 00:36:09.752
Can you show the languages,
that's what is awesome for me, I think,
00:36:09.752 --> 00:36:11.731
to show other languages.
00:36:12.183 --> 00:36:14.537
So, this is a bit of statistics
over the languages,
00:36:14.537 --> 00:36:17.046
and the Russians
have been scraping Wiktionary,
00:36:17.046 --> 00:36:20.327
and that's why they have now
100,000 lexemes.
00:36:24.387 --> 00:36:28.088
There's also a lot of work on Basque here.
00:36:29.566 --> 00:36:32.241
I think there's an organization
putting that information in here.
00:36:32.241 --> 00:36:34.932
And you can also see a graph of these--
00:36:34.932 --> 00:36:37.662
this is Number of forms as a function
of number of lexemes.
00:36:38.798 --> 00:36:41.279
And all the way up here--
00:36:41.279 --> 00:36:45.255
here, this is Russian,
down here, Basque, I think.
00:36:45.476 --> 00:36:47.997
And English, perhaps, down here.
00:36:48.953 --> 00:36:50.692
And also in the Number of senses,
00:36:52.473 --> 00:36:58.360
I think Basque, English, and Russian,
00:37:00.184 --> 00:37:02.048
Hebrew, and so on.
00:37:02.048 --> 00:37:03.343
Yeah.
00:37:11.045 --> 00:37:12.950
(man 4) That looks
like an incredible tool.
00:37:12.950 --> 00:37:15.097
But I was just wondering,
is it all fully live?
00:37:15.097 --> 00:37:18.344
Is it all based on SPARQL Queries
and live or are there some things--
00:37:18.344 --> 00:37:20.458
- Yes. I believe, yes.
- Fantastic.
00:37:20.511 --> 00:37:24.961
But as they get more data into Wikidata,
00:37:24.961 --> 00:37:26.100
there's a bit of an issue.
00:37:26.100 --> 00:37:27.328
For example, for Russian here.
00:37:27.328 --> 00:37:31.966
I started this out a year ago
when there were not that many lexemes,
00:37:32.061 --> 00:37:35.503
and so there were no problems
with the time-outs.
00:37:35.503 --> 00:37:38.367
But representing it here--
00:37:38.367 --> 00:37:42.268
but if I press Russian,
I think there might be some issues.
00:37:42.268 --> 00:37:44.284
There's a count that works here,
00:37:44.284 --> 00:37:46.101
for example, longest words or phrases.
00:37:46.101 --> 00:37:49.252
But I think the lexemes
are sort of loading in.
00:37:49.252 --> 00:37:52.727
I think I'll need to fix that
as Wikidata grows here.
00:37:53.258 --> 00:37:55.927
As you see, there's a lot
of Russian nouns, apparently.
00:37:56.699 --> 00:37:58.451
And I don't know whether the--
00:37:59.351 --> 00:38:01.519
apparently, that's what
they're working on.
00:38:01.573 --> 00:38:03.960
There seems also to be
a bit of time-out there.
00:38:06.705 --> 00:38:08.033
[inaudible], oh, yes.
00:38:08.115 --> 00:38:09.984
The first one there.
00:38:10.832 --> 00:38:16.110
But apparently, the longest words
and phrases is a bit too expensive.
00:38:17.931 --> 00:38:20.334
But apparently, it can be loaded there,
and it's probably--
00:38:21.318 --> 00:38:23.167
it's loaded all the 100,000 there,
00:38:23.167 --> 00:38:27.938
so you can click all 10,000 pages.
00:38:36.748 --> 00:38:38.678
(host) If there aren't
any other questions--
00:38:39.564 --> 00:38:40.950
The longest word came now.
00:38:40.950 --> 00:38:43.146
So, it's, yeah.
00:38:44.972 --> 00:38:46.390
Probably--
00:38:47.855 --> 00:38:49.975
[inaudible]
00:38:50.321 --> 00:38:51.540
What is that?
00:38:51.540 --> 00:38:53.518
- (audience) It's a chemical.
- A chemical, yes.
00:38:56.317 --> 00:38:58.303
(host) More questions? Or shall we?
00:38:59.792 --> 00:39:02.332
Alright, alright. Thank you very much.
00:39:02.332 --> 00:39:04.392
(applause)
00:39:23.642 --> 00:39:25.121
(Nicolas) Is it good?
00:39:31.008 --> 00:39:32.346
(host) Awesome.
00:39:34.920 --> 00:39:38.137
Alright, now, to wrap it up,
we have Nicolas Vigneron,
00:39:38.137 --> 00:39:40.778
talking about Wikisource and Wikidata.
00:39:41.469 --> 00:39:42.804
(Nicolas) This is good?
00:39:44.542 --> 00:39:46.126
Who knows Wikisource?
00:39:47.582 --> 00:39:48.959
Yay!
00:39:50.740 --> 00:39:53.582
More and more people
raising hands every year.
00:39:53.582 --> 00:39:54.957
That's good.
00:39:55.282 --> 00:40:01.462
So, this morning, [Lydia] said that
Wikivoyage was the first real user of--
00:40:03.306 --> 00:40:05.987
[inaudible]
00:40:06.572 --> 00:40:08.347
Wikisource is not that far behind.
00:40:09.230 --> 00:40:13.280
There's a lot to do,
and I want to do some basic numbers,
00:40:13.280 --> 00:40:16.964
statistics, about where we are,
and where I want to go.
00:40:17.613 --> 00:40:23.409
So first, there will be a lot of questions
of what is a book,
00:40:23.409 --> 00:40:25.389
what is bibliographical data.
00:40:25.389 --> 00:40:27.229
People from the BnF can agree with me.
00:40:27.229 --> 00:40:29.969
That can be a nightmare
if you go into details.
00:40:30.164 --> 00:40:35.803
But some big numbers that--
Google Books tried to do an estimation
00:40:35.803 --> 00:40:39.676
on how many "books," air quote books,
there are in the world,
00:40:39.676 --> 00:40:43.005
and there's 130 million books
in the world.
00:40:43.705 --> 00:40:47.279
And, yeah, let's put them all on Wikidata.
00:40:47.650 --> 00:40:49.300
Or not. I don't know.
00:40:49.392 --> 00:40:51.049
But where are we now?
00:40:51.413 --> 00:40:52.468
And why is it books?
00:40:52.468 --> 00:40:55.668
Because for Google Books,
everything is scanned, basically.
00:40:55.795 --> 00:40:58.670
They don't have exactly
a very clear distinction.
00:40:59.400 --> 00:41:04.350
There's sometimes, two-page books,
which [inaudible], Google Books is a book.
00:41:04.714 --> 00:41:10.131
But for many people, you have to have
at least 50 pages to be a book.
00:41:10.536 --> 00:41:12.321
So, that's always hard to count.
00:41:12.885 --> 00:41:15.603
But here's what we know on Wikidata.
00:41:15.603 --> 00:41:18.704
This is the graph of what
is a book for Wikidata.
00:41:18.704 --> 00:41:21.524
You have-- that's totally [inaudible]--
00:41:21.524 --> 00:41:23.979
but that's Wikidata,
literary work as well.
00:41:23.979 --> 00:41:27.194
And this is all the subclasses,
or subclasses of subclasses--
00:41:27.194 --> 00:41:30.334
or subclasses of subclasses
of what is a book.
00:41:30.804 --> 00:41:32.705
So, that's very hard to do.
00:41:32.737 --> 00:41:34.253
I can do a graph like that,
00:41:34.253 --> 00:41:36.833
but the SPARQL Query engine doesn't work
00:41:36.833 --> 00:41:41.523
if I want to count everything
that is instance of these subclasses,
00:41:41.523 --> 00:41:45.143
and basically, SPARQL says no, time-out.
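The kind of count that times out here can be sketched as a SPARQL property path. This is an assumed shape of the query, not the speaker's exact one: P31 is "instance of", P279 is "subclass of", Q571 is the Wikidata item for "book", and `count_instances_query` is a hypothetical helper name.

```python
def count_instances_query(class_id: str, follow_subclasses: bool = True) -> str:
    """Build a SPARQL query counting items that are instances of a class."""
    # "wdt:P31/wdt:P279*" means: instance of the class itself or of any subclass,
    # at any depth. Walking that whole subclass tree is what makes the query heavy.
    path = "wdt:P31/wdt:P279*" if follow_subclasses else "wdt:P31"
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) "
        f"WHERE {{ ?item {path} wd:{class_id} . }}"
    )


# Counting everything under "book" (Q571), including subclasses of subclasses.
print(count_instances_query("Q571"))
# Counting direct instances only is far cheaper, but misses items typed with a subclass.
print(count_instances_query("Q571", follow_subclasses=False))
```

One possible workaround, not mentioned in the talk, is to run the cheap direct-instance query once per subclass and add the results up on the client side.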
00:41:45.633 --> 00:41:47.020
So, what's the problem?
00:41:47.020 --> 00:41:50.713
But I know already that there's
a lot of subclasses,
00:41:50.713 --> 00:41:52.153
but we need to look into it.
00:41:52.153 --> 00:41:57.943
And probably, if you know Wikidata,
on the page Wikidata:Statistics,
00:41:58.643 --> 00:42:02.647
you have all the numbers by big classes,
00:42:02.647 --> 00:42:07.047
and you all probably know
that the big chunk here
00:42:07.047 --> 00:42:08.642
is scholarly articles,
00:42:08.707 --> 00:42:12.749
which is, thanks to
the WikiCite project, in particular,
00:42:14.113 --> 00:42:17.125
which can be books or not,
depending on definition.
00:42:19.062 --> 00:42:22.508
You see that there's no subclass books,
00:42:23.032 --> 00:42:26.034
because there's not enough to show.
00:42:26.049 --> 00:42:28.472
It's probably somewhere in the others,
00:42:28.472 --> 00:42:30.127
the purple area is others.
00:42:30.163 --> 00:42:34.115
And there's a lot of things
that's under one percent.
00:42:34.162 --> 00:42:38.821
So, basically, we can say
that we have less than one percent
00:42:38.821 --> 00:42:42.131
of things identified as books in Wikidata.
00:42:42.551 --> 00:42:46.091
Maybe there is more books,
but not identified as such.
00:42:47.842 --> 00:42:49.284
I'm talking about books,
00:42:49.383 --> 00:42:51.768
but when we are talking
about bibliographical data,
00:42:51.768 --> 00:42:53.920
there's also the author, person,
00:42:53.920 --> 00:42:58.472
so maybe some of the humans here
are also authors, surely.
00:43:00.068 --> 00:43:03.221
And we need to do another count,
which is another big query to do.
00:43:03.602 --> 00:43:05.301
That times out, so--
00:43:05.396 --> 00:43:08.015
I don't have a lot of numbers
for this, sorry.
00:43:10.619 --> 00:43:14.332
So, yeah, basically, this first slide
is about how it's complicated
00:43:14.332 --> 00:43:19.122
to know how much we have of what,
and how to count them.
00:43:19.445 --> 00:43:21.091
So, yeah, hard to count.
00:43:21.618 --> 00:43:23.280
What we know--
00:43:24.133 --> 00:43:26.618
that is we have a lot of properties--
00:43:27.185 --> 00:43:29.684
7,000, I guess,
00:43:30.208 --> 00:43:31.680
now on Wikidata.
00:43:32.593 --> 00:43:35.952
We know that we have a lot of identifiers
among these properties.
00:43:36.721 --> 00:43:42.538
And we know that almost 4,000
are properties for identifiers
00:43:43.146 --> 00:43:45.623
relative to bibliographical data,
00:43:45.737 --> 00:43:49.862
like ID at the National Library of France,
00:43:49.862 --> 00:43:52.251
National Library of yada, yada, yada,
00:43:52.251 --> 00:43:56.681
because we love identifiers
of national libraries on Wikidata.
00:43:56.681 --> 00:44:00.271
So, we have almost all libraries,
national libraries and more.
00:44:01.101 --> 00:44:03.796
So, we have a lot of properties.
I know that.
00:44:05.071 --> 00:44:06.727
And they are widely used.
00:44:06.834 --> 00:44:10.053
I know that, for instance,
BnF properties use--
00:44:10.579 --> 00:44:12.772
BnF is National Library of France--
00:44:12.772 --> 00:44:18.989
is used 1 million times--
OCLC, VIAF, or the big ones like that.
00:44:21.001 --> 00:44:24.202
A lot of uses in Wikidata.
00:44:25.426 --> 00:44:28.980
But it's not because we have
a lot of uses of various properties
00:44:28.980 --> 00:44:30.666
in Wikidata that it's complete.
00:44:31.266 --> 00:44:33.758
As Thibaud said, there's more
than 20 million books,
00:44:33.758 --> 00:44:37.099
[inaudible], which is more as entities.
00:44:37.837 --> 00:44:39.569
And we have only 1 million,
00:44:39.569 --> 00:44:43.538
so we have 19 million still to do.
00:44:45.177 --> 00:44:47.276
Also, what we know from the Wikidata side,
00:44:47.276 --> 00:44:51.918
is that we have a good--
quite active Wikidata project,
00:44:51.918 --> 00:44:53.840
called WikiProject Books,
00:44:54.332 --> 00:44:58.127
where we have a model we kind of agree on,
00:44:58.181 --> 00:45:00.916
which is not always followed,
which is, again, a problem.
00:45:00.956 --> 00:45:02.710
What is a book? You know it.
00:45:03.414 --> 00:45:05.385
I only have five minutes,
so, I'll keep going.
00:45:06.090 --> 00:45:08.880
And then, I'm a Wikisourcean,
so, Wikisourcer.
00:45:09.426 --> 00:45:11.930
So, I wanted to know
the other way around
00:45:11.930 --> 00:45:13.496
what is from Wikisource already,
00:45:13.496 --> 00:45:16.406
because Wikisource is already
inside the Wikimedia project.
00:45:16.406 --> 00:45:19.883
A lot of bibliographical records
and information.
00:45:19.883 --> 00:45:23.161
So, in the 66 million items on Wikidata,
00:45:23.161 --> 00:45:28.850
already 1 million are linked
to Wikisource.
00:45:29.330 --> 00:45:31.890
[inaudible].
00:45:32.350 --> 00:45:36.080
So, that's very few,
but that's quite a lot.
00:45:37.496 --> 00:45:40.174
There are a lot of authors.
00:45:40.174 --> 00:45:44.670
There's some books, texts,
work, edition, whatever.
00:45:45.271 --> 00:45:48.425
Not always well-arranged.
00:45:48.869 --> 00:45:50.600
And there's a lot of internal pages,
00:45:50.600 --> 00:45:53.150
like categories and templates,
and things like that.
00:45:53.194 --> 00:45:54.984
But still, 1 million in total.
00:45:58.329 --> 00:46:01.767
The Wikisource community
are often small communities,
00:46:01.767 --> 00:46:05.010
like on the French Wikisource community,
00:46:05.010 --> 00:46:07.537
which is one of the biggest,
there's 50 people.
00:46:07.537 --> 00:46:08.787
That's the biggest we have.
00:46:09.047 --> 00:46:12.937
So, we love Wikidata, because,
hey, they did a lot of work for us.
00:46:12.942 --> 00:46:15.131
So, just take it from Wikisource.
00:46:15.131 --> 00:46:19.885
So, in this small community,
we love to reuse Wikidata data.
00:46:20.935 --> 00:46:24.076
Right now, we make a lot of use of a tool
which is called WEF--
00:46:24.358 --> 00:46:27.978
Wikidata Edit Framework-- thank you.
00:46:29.318 --> 00:46:33.098
And we are eager to see
how Wikidata Bridge will work.
00:46:33.438 --> 00:46:36.798
And we are trying to do things
with a team in Wikidata
00:46:37.638 --> 00:46:40.678
in the Wikimedia Deutschland team,
[inaudible].
00:46:41.007 --> 00:46:43.934
And there's a lot
of collaboration in the future
00:46:43.934 --> 00:46:46.586
that we want to do: better integrate,
00:46:47.636 --> 00:46:51.068
do everything in one click when you import
a first book in Wikisource,
00:46:51.068 --> 00:46:52.465
things like that.
00:46:53.364 --> 00:46:57.664
Better-- do links between
editions in Wikidata.
00:46:57.852 --> 00:46:59.492
That needs to be done.
00:47:00.041 --> 00:47:02.282
The Foundation is doing the wish list now,
00:47:02.282 --> 00:47:04.853
and we have a lot of requests about that.
00:47:05.938 --> 00:47:07.342
And yeah, that's it.
00:47:07.342 --> 00:47:09.116
That was just a short overview.
00:47:09.116 --> 00:47:15.272
So, if you have some questions,
I'll take them and be available later,
00:47:15.712 --> 00:47:17.112
if you want to.
00:47:17.723 --> 00:47:19.722
(applause)
00:47:25.639 --> 00:47:28.281
Come on, you love Wikisource,
you have questions!
00:47:33.989 --> 00:47:35.775
(woman 4) I asked you
already this in August,
00:47:35.775 --> 00:47:38.411
and I wonder if this has already changed.
00:47:38.411 --> 00:47:42.337
What is the biggest problem you have
in Wikisource right now,
00:47:42.337 --> 00:47:43.761
from your perspective?
00:47:44.167 --> 00:47:45.670
The first one, only? (chuckles)
00:47:48.314 --> 00:47:54.152
I think because it's a small community,
we need efficient tools that work easily,
00:47:54.152 --> 00:47:57.148
because we have very few people,
00:47:57.148 --> 00:47:59.464
so we need tools that are easy to use
00:47:59.464 --> 00:48:04.247
and a one-click solution
to [inaudible] a bit,
00:48:04.371 --> 00:48:05.607
that's a big dream.
00:48:05.607 --> 00:48:07.179
I think that's what's most important,
00:48:07.179 --> 00:48:10.485
because that's the threshold
in Wikisource, it's a small community.
00:48:11.204 --> 00:48:13.241
I think this is the most important.
00:48:14.615 --> 00:48:15.975
[inaudible]
00:48:16.867 --> 00:48:19.600
(man 5) I'm curious if you can speak
to your opinion,
00:48:19.600 --> 00:48:23.154
or the French Wikisource opinion,
or maybe you spoke to other communities
00:48:23.154 --> 00:48:29.834
about the notion of not including
metadata about all the world's books.
00:48:30.234 --> 00:48:31.635
That was mentioned in the morning.
00:48:31.635 --> 00:48:34.965
Maybe other Wikibases,
and other federated databases
00:48:34.965 --> 00:48:38.026
will have that information,
and Wikidata won't.
00:48:39.159 --> 00:48:41.494
How does that feel for Wikisource?
00:48:43.981 --> 00:48:45.502
This is my very personal opinion.
00:48:45.502 --> 00:48:47.386
I know that people
in the Wikisource community
00:48:47.386 --> 00:48:48.723
disagree with that.
00:48:48.723 --> 00:48:50.537
But I think we need to stay--
00:48:50.537 --> 00:48:53.194
an external Wikibase
is not a good solution,
00:48:53.194 --> 00:48:55.353
because we have Shakespeare on Wikisource,
00:48:55.353 --> 00:48:58.323
and we have Shakespeare on Wikipedia.
00:48:58.564 --> 00:49:01.295
So, we need to interlink,
and interlink is there.
00:49:01.295 --> 00:49:04.007
Or like, Romeo and Juliet,
we have them both.
00:49:04.007 --> 00:49:07.229
So, we are still pretty close
to Wikipedia.
00:49:07.433 --> 00:49:09.431
And the difference with WikiCites--
00:49:09.431 --> 00:49:12.515
with WikiCite, we have a lot of items
which are small.
00:49:14.372 --> 00:49:16.051
Wikisource is the other way around.
00:49:16.150 --> 00:49:18.281
We have few items, which are big.
00:49:18.281 --> 00:49:20.515
Which can be a scaling problem
and everything,
00:49:20.515 --> 00:49:23.615
but it's quite a small subset of data.
00:49:23.683 --> 00:49:27.539
So, my personal opinion
is we should stay in Wikidata.
00:49:28.391 --> 00:49:32.117
Again, because we are not
very many people,
00:49:32.117 --> 00:49:34.287
so we need to stay,
with the tool we know,
00:49:34.287 --> 00:49:35.846
don't change the tools too much
00:49:35.846 --> 00:49:37.736
for the small community, please.
00:49:37.769 --> 00:49:39.282
So, that's it.
00:49:39.282 --> 00:49:40.910
But I know that other people disagree.
00:49:40.910 --> 00:49:44.579
You can talk to [Sadeep] if you want.
He will have another point of view.
00:49:46.119 --> 00:49:49.319
Thank you. I think, last question, maybe.
00:49:51.234 --> 00:49:54.446
(man 6) Sometimes, I find it difficult
to link the Wikidata item
00:49:54.446 --> 00:50:00.976
with a Wikisource article,
because there's a Wikisource novel--
00:50:01.079 --> 00:50:06.128
might be split over several pages,
and there's an index page,
00:50:06.128 --> 00:50:08.853
and there's perhaps a front page,
or something like that.
00:50:08.853 --> 00:50:12.053
Do you have that problem,
or is that a general problem, or--
00:50:12.092 --> 00:50:16.892
Yeah, that's one of the first ideas
on the wish list
00:50:16.892 --> 00:50:19.092
for the Foundation, actually.
00:50:19.092 --> 00:50:20.790
Yeah, because Wikipedia is on the--
00:50:20.790 --> 00:50:22.772
if you know the [inaudible] organization,
00:50:22.772 --> 00:50:26.598
Wikipedia is on the work level,
and Wikisource on the edition level.
00:50:26.598 --> 00:50:28.572
So, already, you have a problem there.
00:50:28.572 --> 00:50:30.931
And then, we have several editions
of the same work,
00:50:30.931 --> 00:50:34.014
and we have sub-chapters
and things inside the edition.
00:50:34.014 --> 00:50:41.001
So, yeah, that's a one-to-many problem,
which is hard to solve by nature.
00:50:41.555 --> 00:50:44.839
But there's maybe a tool
that can help to solve that.
00:50:45.893 --> 00:50:47.469
Hopefully.
00:50:49.172 --> 00:50:51.395
And that's time, ladies and gentlemen.
00:50:51.398 --> 00:50:53.283
So, thank you very much, Nicolas.
00:50:53.335 --> 00:50:55.137
(applause)
00:50:59.010 --> 00:51:01.127
And please join me giving
one more round of applause
00:51:01.127 --> 00:51:03.147
to all of our wonderful speakers.
00:51:03.147 --> 00:51:04.901
(applause)