-
(host) Hello, everyone. Thank you
for coming to these lightning talks.
-
Our first speaker, I'm going
to run straight into it,
-
is going to be Rosie
Stephenson-Goodknight.
-
Did I get that right?
-
Yes. And so she's going to be talking
about the Women Writers Project.
-
And we're going to--
yeah, is that right? Great.
-
And so, we're going
to just launch right in,
-
and I want to remind you,
if there's time for questions,
-
to please not speak
until you have the microphone.
-
Thank you.
-
(Rosie) Hi, everyone, and thanks
for coming to this session,
-
where we're going to talk
about Women Writers in Review,
-
cultures of reception associated
with trans-Atlantic,
-
English language women writers,
broadly construed.
-
Women Writers in Review is an initiative
of the Women Writers Project
-
of Northeastern University.
-
It moved there from Brown University,
approximately 15 years ago.
-
Women Writers in Review is a collection
of 18th- and 19th-century reviews,
-
publication notices,
literary histories, and other texts
-
corresponding to trans-Atlantic--
so, UK and US mostly,
-
though a few Canadian--
written works by women.
-
It's a project where the two universities,
-
Brown University
and Northeastern University,
-
started collecting the manuscripts
of women from this period.
-
And then they started collecting
the reviews of these works,
-
and then they started scoring
these reviews by giving them a rating.
-
It's designed to investigate
the discourse of reception in connection
-
with the changing trans-Atlantic
literary landscape
-
for the period 1770 to 1830.
-
You're going to pardon me if I speak fast,
because I've got five minutes
-
to go over this.
-
It includes 690 English language texts
responding to works
-
written or translated
by 18th- and 19th-century women writers.
-
There are 74 authors in the corpus,
using 112 different sources,
-
or periodicals, or magazines.
-
And there are 628 critical reviews.
-
Here's a picture that shows you
what we're talking about
-
in terms of a review.
-
And you can also see what kind of scores
-
were given by the academics
at Northeastern University.
-
Most of these are women
who were giving scores
-
based on the reviews that were done
by mostly, probably all, men,
-
back in this time period 1770 to 1830
of works written by women.
-
By works, we're talking about plays,
and novels, and poems,
-
essays, and other kinds of articles.
-
So, what are we talking about?
-
This required creating
items for authors for their works,
-
like I said, novels and plays and poems.
-
It required creating new items
for this period of time
-
where there are defunct periodicals.
-
It required creating items
for the scholarly articles.
-
And then the review scores of each,
and the review score by,
-
which in this case would be
Women Writers in Review,
-
and what we still need to add
is the described by source.
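-
A minimal Python sketch of the statement shape described here: a review score qualified by who assigned it. The property IDs are the ones I believe Wikidata uses for "review score", "review score by", and "described by source", and should be checked before use. The score label and the item ID are hypothetical placeholders, not project data.

```python
from dataclasses import dataclass, field

# Assumed Wikidata property IDs; verify them before relying on this.
REVIEW_SCORE = "P444"          # "review score"
REVIEW_SCORE_BY = "P447"       # "review score by"
DESCRIBED_BY_SOURCE = "P1343"  # "described by source" (still to be added, per the talk)


@dataclass
class Statement:
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


def review_score_statement(score: str, scored_by: str) -> Statement:
    """Build a review-score statement qualified with the body that gave the score."""
    return Statement(REVIEW_SCORE, score, {REVIEW_SCORE_BY: scored_by})


# Hypothetical values, for illustration only.
stmt = review_score_statement("somewhat positive", "Q_WOMEN_WRITERS_IN_REVIEW")
print(stmt)
```

Keeping the scorer in a qualifier means several bodies could each score the same review without overwriting one another.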
-
This gives you a picture
of the kind of spreadsheets,
-
Google Spreadsheets,
that I have been working on.
-
I shouldn't just say I,
because I've had a lot of help.
-
I've had a lot of people
who were working on this project with me.
-
And you can see at the top,
something about the authors,
-
about the works.
-
The third group is going to be
the periodical,
-
and then, how the scores started showing.
-
And of course, this is how they look--
-
the beauty of being able to present
the preliminary findings.
-
Once we have uploaded all of the data,
-
and I hope that that's going to be done
by the end of this year,
-
this will obviously look different.
-
Appendix.
-
So, here's what the depiction looks like
-
at the Northeastern University website.
-
I don't think it's quite as clear
as what we can do with Wikidata.
-
And so, this was probably the reason why,
when I started as a visiting scholar
-
in 2017, they asked if this is one
of the projects that I could work on.
-
They stopped their work
the year before, in 2016.
-
And I think they just don't have
the resources to continue.
-
Some parts of this presentation
came from another
-
that was published in 2016.
-
And last but not least, here are links
-
to the different parts
of the work that I'm doing.
-
Thank you very much.
-
Questions.
-
(applause)
-
(woman) So, when you have a work,
and you have the review of the work,
-
are you looking
at a particular edition of the work,
-
or are these all reviews
of first editions?
-
It's a good question. No.
-
They are not just reviews
of the first edition.
-
Some are reviews of the second
or third edition.
-
I'm going to add something
that maybe I should have said
-
before I closed
and went to question and answers--
-
what's so special about this?
-
What's special is nobody else
has done this on Wikidata.
-
Surely, there are other universities
that have their own collections,
-
where their scholars have reviewed
the reviews of someone's work
-
in some language.
-
So, hopefully,
once this methodology gets--
-
once I write this up and the project
is over and presented again,
-
that there will be other
universities, other libraries
-
that will speak up and say,
"We've got data sets, too,
-
and we're going to go ahead
and upload them into Wikidata ourselves,"
-
and then it'd be lovely
to start doing some comparisons.
-
Anyone? Jane.
-
(Jane) Do you actually have books?
-
Do you actually have the books--
are the books in existence,
-
or are you actually
doing metadata about books
-
where we don't even know
where the books are?
-
Northeastern University
actually has the book,
-
or the essay, or the poem.
-
And they have the critical review
of the book, or the essay, or the poem.
-
And they're working
on the transcription of these,
-
and they're not at 100% yet.
-
They're not at 100%, but, like
all things, they're working on it.
-
Any other questions?
-
(host) We're going to wrap it up there.
-
Thanks for being such a nice audience.
-
(applause)
-
Lady bug for [inaudible].
-
(man) Finally got that.
-
What I'm going to do is I'm just going
to click on these to load.
-
Just while-- is that new tab there?
-
[inaudible]
-
The first one? Yeah, perfect.
-
Sorry, my German is not even rusty,
-
it's simply non-existent.
-
So, I'll just let them load,
because then these queries can run
-
while I'm sort of introducing
what I was talking about and doing.
-
So, hi, I'm Nav from Histropedia.
-
And basically, for the last
quite a few years,
-
we've been relatively quiet,
-
while we've been sort of working
on technology and tools
-
that we need to sort of develop,
ultimately, Histropedia version 2,
-
which is going to be, you know,
this huge enhancement
-
on the first version.
-
Well, it's kind of in progress,
but as we do it,
-
we've been experimenting
with these other tools,
-
and building the technology
that we're going to need.
-
One really crucial part for this
is the ability to sort of see
-
the whole of history
from the billions of years time scale,
-
to up to the current day,
-
and zooming all the way into single days.
-
And ultimately, in the end,
down to hours and minutes.
-
We've managed to create
a [inaudible] of update to our engine.
-
Other engines can already do this,
-
but unfortunately, they can't handle
the large data sets.
-
So, we finally got this update
to our engine.
-
It allows us to zoom to billions of years.
-
So, recently-- the recently
finished update,
-
and it's basically, it's an update
to our query viewer tool,
-
which is like a live version
of Histropedia
-
just linked straight to Wikidata.
-
So, it's literally based on a query,
-
a live query, and we see
the results of it.
-
So, it's sort of separate
to our main tool.
-
So, I'm going to flick to the first one,
which is my first experiment.
-
And you'll forgive me, the queries--
-
the code was kind of finished
not so long ago,
-
and the queries, I've been trying
to find out what can I find
-
and what's interesting
to look at, what's missing.
-
So, I started off
with a kind of, sort of, well--
-
So, that's not the right--
that's not Life on Earth.
-
Is this Life on Earth?
-
That will do, anyway.
-
So, I started off just trying to look
at what sort of things
-
are actually in Wikidata.
-
And this particular one--
sorry, it's in reverse.
-
So, this is the first one
I wanted to show you.
-
So, this is a kind of
a life on Earth query
-
that I wanted to develop.
-
And basically, what it is
is all the taxons in Wikidata
-
that have a date.
-
And as you can probably see
from the panel, there are not many of them.
-
But we do have the different taxon ranks.
-
So, you know, is it a species, a class--
-
for a biologist,
this makes a lot of sense.
-
But if I was just to close that a bit,
-
we can see, we are going back
to the earliest forms of life here.
-
3.5 billion years ago.
-
And as we zoom in here, we start to see
the more modern forms of life,
-
and we see some really
interesting things developing,
-
but we're still lacking a lot of data
in terms of this kind of time range.
-
So, my next thought was,
"Okay, well, why aren't--"
-
"I want to see a Tyrannosaurus Rex."
-
That's what I really wanted to see
on my query, and it wasn't there.
-
So, had a little dig in,
and I found out why.
-
It's because they're much more
often stored
-
in terms of the temporal range
or time period that they relate to.
-
So, on comes the next query,
-
where I actually sort of--
-
basically, this query
is looking for any item
-
that has a temporal range start,
and/or a temporal range end.
-
Which is basically in the form--
in life forms, it kind of relates
-
to when they emerged
and when they became extinct.
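-
The query described here could look roughly like the sketch below, written as a Python helper that only builds the SPARQL text. It assumes P523 and P524 are the Wikidata properties for "temporal range start" and "temporal range end", and that their values are period items rather than dates. The function name and defaults are hypothetical and not taken from the Histropedia tool.

```python
def temporal_range_query(language: str = "en", limit: int = 500) -> str:
    """Return SPARQL for items with a temporal range start and/or end."""
    return f"""
SELECT DISTINCT ?item ?itemLabel ?startPeriod ?endPeriod WHERE {{
  # Match items that have at least one of the two properties.
  {{ ?item wdt:P523 ?startPeriod . }} UNION {{ ?item wdt:P524 ?endPeriod . }}
  # Then fill in whichever side the matching branch did not bind.
  OPTIONAL {{ ?item wdt:P523 ?startPeriod . }}
  OPTIONAL {{ ?item wdt:P524 ?endPeriod . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}" . }}
}}
LIMIT {limit}
""".strip()


print(temporal_range_query())
```

The UNION covers the "and/or" in the description: an item qualifies with either property, and the OPTIONAL lines pick up the other one when it exists.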
-
So, these are the periods
on the side here.
-
If I just close that a bit--
-
you can see that we have
quite a lot of interesting stuff.
-
And there's the Tyrannosaurus
that I was looking for.
-
So, I finally got that,
and I was like, "Yes! I've done it!"
-
I've got that Triceratops
in there for bonus.
-
But of course, still loads missing.
-
And I'd love to see lots more here.
-
But at least, it gives you the idea.
-
The nice thing is, here as well,
if I star some of these,
-
you can see that
the time range is shown.
-
So, you can start to do
what I really wanted to do, is say,
-
"Okay, when did this one end,
and when did the next one begin?
-
When did things start going extinct?"
-
So, I was pretty excited, but, still,
really hoping for a lot more.
-
So, there's a lot of editing to be done
-
in terms of these large geological
and cosmic time scales.
-
You can see on the color code,
I can also do extinction period.
-
So, I say, I want to find out stuff
that went extinct in the late Cretaceous.
-
And I now know that two things did that.
-
There's obviously quite a few more.
-
And I put the taxon rank
in there, as well,
-
just so that we can also see,
-
"Okay, which, what
is its species, genus, et cetera."
-
So, pretty exciting.
-
I was quite happy, but it's unfolding,
what needs to be done a lot.
-
So I went to the next one, which was--
-
I was thinking, "Well, I can't find
all the data I'm looking for.
-
Let's go a bit more general,
-
and just look for all of a certain kind
of dates in Wikidata that I can find
-
that are over 10,000 years old, basically.
-
And what type of thing are they?"
-
So, this color code is relatively okay,
but it might be a bit misleading,
-
because some things are multiple types.
-
So, therefore,
it's a bit random, at times.
-
But, you get some really
fascinating stuff in here.
-
I've got for a start--
I've got all of the millennia
-
that we have in Wikidata,
which is, you know, there you go.
-
Read about everything that happened
in all these different millennia.
-
No pictures for any
of these, unfortunately.
-
So, there's nothing to really say
what happened in them.
-
Taxon, which we were just looking at,
which kind of led me on
-
to the other queries.
-
And of course, that sort of
like all of them in one group.
-
Interesting stuff.
Archaeological cultures.
-
And this is like, okay,
this is more like up my street.
-
This is the sort of things
I want to learn about.
-
Again, pictures would be nice.
-
But it's really showing you
something interesting.
-
And it's just worth exploring here.
-
And of course, there's some
that really make me excited
-
for what we could be doing.
-
For example, there was
something here which was--
-
I mean, system, actually,
was quite an interesting one.
-
And sorry, that's not actually
the one I was thinking about.
-
In fact, that means nothing to me at all.
-
Someone might know what that means.
-
Art movements,
archaeological sites, activities.
-
There was only two of these,
-
but I really like the idea, because--
and they're both the same.
-
They're both hunting.
-
And of course, there's two of them.
-
And the reason is, is because
there's a little qualifier on there.
-
If we were to just
look through, we can see--
-
we can see somewhere down here,
will be the start time.
-
And the qualifier is talking about
when Homo erectus did it,
-
and when Homo sapiens did it.
-
So that should be
in brackets on the query,
-
a little extension to do to show you
what the two different versions mean.
-
But I would love to see
all of human skills in here.
-
When did we first do farming,
when did we first this--
-
when did fire come about?
-
All of these things,
when did we first extract iron?
-
When did we first--
all of these wonderful things
-
that developed into
the modern world that we live in.
-
So, really exciting signs
of what could be there,
-
if it all got populated.
-
So, you know, this is what
we really need to work on,
-
is some of this historical info.
-
Last one, I just wanted to just show you,
-
which was just an extra
bonus one I threw in,
-
just to look at the time periods
that we actually have,
-
the historical ages
that we have in Wikidata.
-
And so, this is actually just all
sub-classes of unit of time.
-
And then, this is the actual
instance that it was.
-
And it's just really interesting.
-
This is more the kind of thing--
-
In Histropedia Mark II,
these are the kind of things
-
that will actually be displayed
more under the timeline
-
as a sort of a range or period.
-
And so, we are particularly interested
in these periods
-
being really tight and nice,
-
because it helps you to, then,
say what happened when,
-
and you can sound really clever
when you talk about when things happened,
-
in the Neolithic or the upper
Paleolithic, or whatever.
-
I'm still pretty clueless on most of it,
-
because I'm just kind of just waiting
for the data to be up to scratch.
-
Great. I think I can actually
round it up there.
-
Loads more exciting queries to come.
-
A lot more features and cool stuff,
actually, just around the corner for us,
-
because we've just finished
a lot of cool things.
-
But there's a little bit of time
to pull it all together.
-
So, look out for more.
-
If there's any questions,
I think I've got one minute.
-
So, it would have to be one.
-
(host) Yes, Nav.
I forgot to introduce you.
-
I'm sorry. That's Nav, as he said,
Histropedia, Evans. Thank you very much.
-
Thank you. Cheers. Yeah.
-
(host) Very fast questions.
-
Anyone with a very fast question
[inaudible].
-
(woman 2) Very quickly, how can
I do my own, if I want languages,
-
when do we start, for instance.
-
Absolutely. Good question.
-
So just click on the--
oh, I've shared this.
-
It's called cosmic timelines on the URL.
-
Should be cosmic and geological,
but then it's not a short URL anymore.
-
So, you click on this icon
in the top corner there,
-
and then, you get to the query page,
which is like the home page of this tool.
-
This is where the query is pasted in.
-
So, at the moment,
I've got the language there.
-
If I want to change it to something else,
-
Arabic, or French, or whatever--
-
and here are the-- this is the area
-
where you sort of enter in exactly
which variables in your query
-
you would like to do each thing.
-
If you put nothing in,
it will try and figure it out.
-
But if you want advanced stuff--
and really important, is the precision,
-
because that's not available
on the query service timeline.
-
So, you get everything--
-
is the first of January
10 billion years ago,
-
you know, which is not
what we want to see.
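-
The precision point can be illustrated with a small Python sketch. It assumes the precision codes of the Wikidata time datatype run from 0 (billion years) up to 9 (year), with higher codes for month, day, and finer units. `describe_year` is a hypothetical helper, not part of the tool being demonstrated.

```python
# Assumed mapping from precision code to the size of one unit, in years.
PRECISION_UNIT_YEARS = {
    0: 1_000_000_000, 1: 100_000_000, 2: 10_000_000, 3: 1_000_000,
    4: 100_000, 5: 10_000, 6: 1_000, 7: 100, 8: 10,
}


def describe_year(year: int, precision: int) -> str:
    """Render a year at its stated precision instead of as an exact date."""
    era = "BCE" if year < 0 else "CE"
    unit = PRECISION_UNIT_YEARS.get(precision)
    if unit is None:
        # Year precision or finer: show the year as given.
        return f"{abs(year)} {era}"
    rounded = round(abs(year) / unit) * unit
    return f"c. {rounded:,} {era}"


print(describe_year(-10_000_000_000, 0))  # c. 10,000,000,000 BCE
print(describe_year(1815, 9))             # 1815 CE
```

Without the precision, the first value would be drawn as a single exact day ten billion years ago, which is the misleading display mentioned here.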
-
And the rank, which is quite interesting.
-
My timelines are all based
on a very simple rank of site link count,
-
how many different articles there are,
or something else.
-
But that's how you go
and mess around with it yourself,
-
and you put your color codes
and your filters in down here.
-
Comma separate them,
if you would like more,
-
and they come up as options
in the final tool.
-
And I think that
pretty much is it, isn't it.
-
So, any other questions,
do find me afterwards.
-
Always happy to get cornered
for this stuff.
-
I love talking about it.
-
Okay. So, thank you very much. Cheers.
-
(applause)
-
(mumbles)
-
So, where is the first one?
-
This one, no.
-
This? Sorry.
-
Is it full screen?
-
Yep. Full screen.
-
Well, good work.
-
[Strike.]
-
Yeah, so, okay. Thank you.
-
So, hi, I'm Thibaud Senalada.
-
As [inaudible] introduced me.
-
I'm a software engineer
at the French National Library.
-
And I'm here today
to talk to you about NOEMI,
-
which is a software, a proof of concept,
-
and a [inaudible] software
-
to the French Library to cataloging.
-
Sorry. [inaudible].
-
Sorry for my English. It's a bit fuzzy.
-
And so, what's NOEMI?
-
So, NOEMI stands for:
-
Nouer les oeuvres, expressions,
Manifestations et Items.
-
Which, in English, is:
-
to link works, expressions,
manifestations, and items.
-
It's based on the FRBR,
-
and [inaudible].
-
Yeah. Anyway.
-
So, yeah.
-
So, this software,
we use to produce metadata.
-
It will be used
-
by 600 people on a daily basis.
-
And as I say in the title,
it will be based on Wikibase.
-
So, there is also a format manager.
-
So, people using this software
will use like a code editor,
-
but for MARC format.
-
So, it's [inaudible], things like that.
-
A data processing tool, like I said.
-
And also, authorization management,
-
because they will need a--
-
if there is some data,
where it can be modified.
-
So, the PoC context.
-
So, this software will be replacing
an old software,
-
called ADCAT02.
-
It is part of the bibliographic
transition.
-
So, I say the [inaudible].
-
[inaudible]. [inaudible] in English?
-
Format.
-
And it will be the [inaudible] of the--
-
Sorry.
-
It will be [inaudible]
all the [inaudible]
-
of the BnF with data.
-
And so, doing this work,
-
we assessed Wikibase to see
if it fits our needs.
-
And [inaudible] pretty good.
-
So, why Wikibase?
-
Because of the flexibility of the format.
-
We managed--
-
to inject MARC, INTERMARC for BnF,
-
into the database.
-
And use it to-- use this link management
-
between entities using Blazegraph,
-
so, as Wikibase does.
-
We also chose Wikibase,
because it was already--
-
it handles history and user accounts.
-
So, it's easier for us.
-
And it also has a good--
it's pretty easy to create bots
-
to watch and curate data
-
and also to make statistics.
-
It's free and open, and sustainable.
-
Yeah, so.
-
I'm sorry if you don't
understand what I say,
-
because I know my English
is not that good.
-
But during this PoC,
we encountered some trouble.
-
Okay.
-
First of all, as a search engine,
I think we have to create
-
another--
-
not another, a supplementary
search engine to use it with,
-
to fit our needs.
-
Because we need some search
-
like faceted search and filters.
-
Also we have the [inaudible],
-
of using a PostgreSQL database.
-
And for the moment,
I think Wikibase [inaudible].
-
And when we tried to use PostgreSQL,
it was a bit difficult,
-
and will cause some issues.
-
And we have also some fear
about performance,
-
because the catalog is about
20 million entities,
-
20 million bibliographic entities.
-
That can be more
than 20 million entities, actually.
-
And we don't know how much time
we'll need to inject them
-
in the Wikibase, and how to do it.
-
So, [inaudible],
-
but the real software development
has already started.
-
We start by creating
an interface with Wikibase.
-
We're using Java.
-
Like PyWikibase.
-
- (man) Pywikibot.
- Pywikibot. Yeah, thank you.
-
The same way, but in Java.
-
We have also already injected the format
into the Wikibase.
-
And we do something
like the INTERMARC editor,
-
[inaudible], et cetera.
-
Thank you.
-
(applause)
-
Yeah.
-
(man 2) Faceted search
will be a nice feature
-
in the Wikidata UI itself.
-
So, have you talked
to any of the developers,
-
or is that something
that could be done?
-
Sorry, I don't understand.
-
(man 2) The faceted search idea.
-
It would be nice to be able
to search only humans,
-
or search only works, or something, right?
-
Yeah. I'm sorry, I don't-- I don't--
-
(man 2) Yeah, I mean, so,
it would be nice if we had that
-
in Wikidata itself in the UI.
-
Yeah, yeah, yeah.
-
[inaudible]
-
Yeah, okay, thank you.
-
I'm sorry. (laughs)
-
Yeah, yeah. But I think we will--
-
I don't know if we want
to do it inside Wikibase,
-
or in our next systems.
-
For the moment,
we don't really solve that.
-
For the moment, I think.
-
Sorry.
-
(man 3) I suppose on the topic
of the faceted search,
-
Wikidata, SPARQL Query, Wikibase--
-
SPARQL Query is I think,
functionally equivalent
-
to a facetable search.
-
So, it's mostly an interface issue, right?
-
I mean, you could build an interface
that starts with a query,
-
and then, gives you
possible facets to filter by.
-
And when you click one of them,
-
it adds a condition
to the SPARQL Query, right?
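-
The interface idea suggested here, where each clicked facet adds a condition to the query, can be sketched in a few lines of Python. The helper names are hypothetical. The example assumes P106/Q36180 mean "occupation: writer" and P31/Q5 mean "instance of: human" on Wikidata.

```python
def build_query(patterns: list[str], limit: int = 100) -> str:
    """Assemble a SELECT query from a list of triple patterns."""
    body = "\n".join(f"  {p}" for p in patterns)
    return f"SELECT ?item WHERE {{\n{body}\n}}\nLIMIT {limit}"


def add_facet(patterns: list[str], prop: str, value: str) -> list[str]:
    """Return a new pattern list with one more facet condition appended."""
    return patterns + [f"?item wdt:{prop} wd:{value} ."]


# Start from a base search, then narrow it as the user clicks a facet.
base = ["?item wdt:P106 wd:Q36180 ."]   # assumed: occupation = writer
narrowed = add_facet(base, "P31", "Q5")  # assumed: instance of = human
print(build_query(narrowed))
```

`add_facet` returns a new list rather than changing the old one, so removing a facet in the interface only requires going back to the previous list.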
-
Yeah, but I think the SPARQL--
-
they don't go as detailed
as we want, as we have--
-
When we inject the format,
we use a statement for--
-
the format is like XML.
-
So, it's a zone, subzone, and value.
-
And in the [inaudible] statement,
we add the subzone,
-
because the zone was already there.
-
And we want to query
some qualifier on this.
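-
One way to picture the mapping described here is a zone stored as a statement, with its subzones carried as qualifiers. The Python sketch below is an interpretation of that description and not NOEMI's actual schema. The tag, subzone codes, property names, and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subzone:
    code: str
    value: str


@dataclass(frozen=True)
class Zone:
    tag: str
    subzones: tuple[Subzone, ...]


def zone_to_statement(zone: Zone) -> dict:
    """Map one zone to a statement whose qualifiers carry the subzones."""
    return {
        "property": f"zone:{zone.tag}",
        "qualifiers": [
            {"property": f"subzone:{zone.tag}${s.code}", "value": s.value}
            for s in zone.subzones
        ],
    }


def qualifier_values(statement: dict, subzone_property: str) -> list[str]:
    """Look up the values of one subzone among a statement's qualifiers."""
    return [q["value"] for q in statement["qualifiers"]
            if q["property"] == subzone_property]


# Hypothetical record fragment, for illustration only.
example = Zone("245", (Subzone("a", "Example title"), Subzone("d", "Example subtitle")))
statement = zone_to_statement(example)
print(qualifier_values(statement, "subzone:245$a"))
```

The lookup scans every qualifier, which is fine for one record. Over millions of entities, this is the kind of access that would need a dedicated index.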
-
And I don't know if the SPARQL
goes through that-- I'm sorry--
-
in a fast way.
-
I think we need some index
for us to [inaudible].
-
Yeah.
-
(man 3) SPARQL doesn't do a query--
-
To do proper string searches
in SPARQL is very hard.
-
You have to have filters, which are slow,
-
and it really doesn't work that well.
-
So, it's a different
search problem, really.
-
More questions? If anyone has one?
-
- Great. Thank you.
- Thank you.
-
(applause)
-
(host) Nielsen speaking about
the tool Ordia. Thank you.
-
So, I'm Finn Årup Nielsen,
-
and a couple of years ago,
I started Scholia
-
that displays data from Wikidata
via a SPARQL Query
-
to the Wikidata Query Service
-
so we can generate, for example,
a list of publications
-
for a specific author.
-
Now, last year, Wikidata
introduced lexicographic data.
-
And I [inaudible] the idea of Scholia
-
that is using Wikidata
and the Wikidata Query Service
-
to generate overviews
of lexicographic data.
-
So, Ordia is the example of this one here.
-
So, it generates-- it's a web application
run from the Toolforge service,
-
and for example, it will dynamically
generate a page such as--
-
This one here is statistics over
what there is of lexicographic data
-
in Wikidata.
-
For example, the number of lexemes
is currently over 200,000.
-
So, there's a range of things
you can do here.
-
You can, for example,
look in the aspects of that.
-
The menu, there's quite a lot
of things here.
-
And so, I will search
for a specific Danish lexeme.
-
"Rød"-- which is "red" in Danish.
-
So, you basically get,
for the specific lexeme,
-
the same type of information
that you could see
-
in the ordinary part of Wikidata, here.
-
Annotations about the lexeme,
annotation about the forms,
-
single or plural forms.
-
Annotations about the senses.
-
But what you can't see
in ordinary Wikidata
-
is sort of aggregating across lexemes.
-
And this is, for example, down here--
-
down here with the compound.
-
So, in Danish, like in German,
-
words can be compounded.
-
For example, for "red",
we have rødkælk
-
which is compounded from two words.
-
And we've got, on the second one here,
rødvin-- red wine.
-
This list here is constructed
by a SPARQL Query to the Wikidata Service.
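-
A query of the kind mentioned here might look like the sketch below, again as a Python helper that only builds the SPARQL text. It assumes P5238 is the Wikidata property linking a compound to the lexemes it combines. The lexeme ID in the example is a placeholder, and this is not Ordia's actual query.

```python
def compounds_query(lexeme_id: str) -> str:
    """Return SPARQL listing lexemes that are compounded from the given lexeme."""
    return f"""
SELECT ?compound ?lemma WHERE {{
  ?compound wdt:P5238 wd:{lexeme_id} ;   # assumed: "combines lexemes"
            wikibase:lemma ?lemma .
}}
ORDER BY ?lemma
""".strip()


# "L0000" is a placeholder, not the real ID of the Danish lexeme "rød".
print(compounds_query("L0000"))
```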
-
And also, further down here,
we've got a lot of Danish words here.
-
Further down here, we should have
a graph of the words
-
which are compounded from rød.
-
We have [rød]-- red here in the middle.
-
And for example, around--
somewhere around here,
-
we should have,
for example, "red cabbage,"
-
"red cabbage salad,"
"red cabbage soup," and so on.
-
So you can browse around,
in this one here, and see it.
-
We can go a bit back here,
and then look on the main sense
-
of the word rød-- red in Danish.
-
So, Ordia automatically generates
information about hyponyms.
-
Subconcepts, for example,
-
light red, dark red,
pink, purple, and so on,
-
are in the-- when we make
a Wikidata Query service, SPARQL Query.
-
Then we go around in the Wikidata graph,
-
and get this information here.
-
And we can also get translation
automatically,
-
even though it's not necessarily stated
within the Wikidata lexeme items.
-
For example, here, we have translated
rød to "red" in English,
-
and röd in Swedish, and so on.
-
There are not that many there.
-
There's a range of other things here.
-
Let me show you,
for example, this one here--
-
this is veninde- now I go
over to this one here.
-
-inde, which is a feminine suffix.
-
So, this is auto-generated there,
-
it's a combination of "instance of"--
-
lexemes that are "instance of"
feminine suffixes.
-
And for example, for German,
we have [inaudible].
-
So, -in would be
a feminine suffix in German.
-
And I put in sort of the five
Danish feminine suffixes.
-
Another facility is, for example,
if you have a text,
-
you can copy and paste it
into this Text to lexemes here.
-
Let me--
-
"a car crashed into...
-
a green house."
-
Let me change that to "English".
-
Press Submit.
-
Now, Ordia will then extract
each of the words here,
-
in this sentence here,
-
and try to see whether they,
in the specific form,
-
as a lexeme, are entered in Wikidata.
-
And these simple words here
are entered in Wikidata.
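-
The lookup just demonstrated can be approximated with a short Python sketch. It stands in a local set of known forms for the live Wikidata lookup, and `find_missing_forms` is a hypothetical name. Ordia's real tokenization and matching may well differ.

```python
import re


def find_missing_forms(text: str, known_forms: set[str]) -> list[str]:
    """Return the distinct words in the text that are not among the known forms."""
    # Split into runs of letters, lowercased; digits and punctuation are dropped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # dict.fromkeys keeps first-seen order while removing repeats.
    return [w for w in dict.fromkeys(words) if w not in known_forms]


# A stand-in for the forms already entered as lexemes.
known = {"a", "car", "crashed", "into", "green", "house"}

print(find_missing_forms("A car crashed into a green house.", known))     # []
print(find_missing_forms("A vancar crashed into a green house.", known))  # ['vancar']
```

Words returned by the function correspond to the ones the tool offers to create as new lexemes.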
-
But if we, for example, change it to--
there's nothing called "vancar"
-
but just let us do that here.
-
And you got down here--
it's as a blue link
-
that you can create a new
Wikidata lexeme item.
-
But there's a range of other things to explore
-
in this web application.
-
And if there's any suggestions,
or comments, or notes, or something,
-
you can contact me, or put in
an issue on GitHub.
-
So, this particular application
is developed on GitHub,
-
and I'm open for new ideas
and ways to represent information there.
-
Okay, thank you.
-
(applause)
-
Questions?
-
(woman 3) I love your tool.
-
Can you show the languages,
that which is awesome for me, I think,
-
to show other languages.
-
So, this is a bit of statistics
over the languages,
-
and the Russians
have been scraping Wiktionary,
-
and that's why they have now
100,000 lexemes.
-
There's also a lot of work on Basque here.
-
I think there's an organization
putting that information in here.
-
And you can also see a graph of these--
-
this is Number of forms as a function
of number of lexemes.
-
And all the way up here--
-
here, this is Russian,
down here, Basque, I think.
-
And English, perhaps, down here.
-
And also in the Number of senses,
-
I think Basque, English, and Russian,
-
Hebrew, and so on.
-
Yeah.
-
(man 4) That looks
like an incredible tool.
-
But I was just wondering,
is it all fully live?
-
Is it all based on SPARQL Queries
and live or are there some things--
-
- Yes. I believe, yes.
- Fantastic.
-
But as they get more data into Wikidata,
-
there's a bit of an issue.
-
For example, for Russian here.
-
I started out this a year ago
when there were not that many lexemes,
-
and so there was no problems
with the time-outs.
-
But representing it here--
-
but if I press Russian,
I think there might be some issues.
-
There's a count that works here,
-
for example, longest words or phrases.
-
But I think the lexemes
are sort of loading in.
-
I think I'll need to fix that
as Wikidata grows here.
-
As you see, there's a lot
of Russian nouns, apparently.
-
And I don't know whether the--
-
apparently, that's what
they're working on.
-
There seems also to be
a bit of time-out there.
-
[inaudible], oh, yes.
-
The first one there.
-
But apparently, the longest words
and phrases is a bit too expensive.
-
But apparently, it can be loaded there,
and it's probably--
-
it's loaded all the 100,000 there,
-
so you can click all 10,000 pages.
-
(host) If there aren't
any other questions--
-
The longest word came now.
-
So, it's, yeah.
-
Probably--
-
[inaudible]
-
What is that?
-
- (audience) It's a chemical.
- A chemical, yes.
-
(host) More questions? Or shall we?
-
Alright, alright. Thank you very much.
-
(applause)
-
(Nicolas) Is it good?
-
(host) Awesome.
-
Alright, now, to wrap it up,
we have Nicolas Vigneron,
-
talking about Wikisource and Wikidata.
-
(Nicolas) This is good?
-
Who knows Wikisource?
-
Yay!
-
More and more people
raising hands every year.
-
That's good.
-
So, this morning, [Lydia] said that
Wikivoyage was the first real user of--
-
[inaudible]
-
Wikisource is not that far behind.
-
There's a lot to do,
and I want to do some basic numbers,
-
statistics, about where we are,
and where I want to go.
-
So first, there will be a lot of questions
of what is a book,
-
what is bibliographical data.
-
People from the BnF can agree with me.
-
That can be a nightmare
if you go into details.
-
But some big numbers that--
Google Books tried to do an estimation
-
on how many "books," air quote books,
there are in the world,
-
and there's 130 million books
in the world.
-
And, yeah, let's put them all on Wikidata.
-
Or not. I don't know.
-
But where are we now?
-
And why is it books?
-
Because for Google Books,
everything is scanned, basically.
-
They don't have exactly
a very clear distinction.
-
There's sometimes, two-page books,
which [inaudible], Google Books is a book.
-
But for many people, you have to have
at least 50 pages to be a book.
-
So, that's always hard to count.
-
But here's what we know on Wikidata.
-
This is the graph of what
is a book for Wikidata.
-
You have-- that's totally [inaudible]--
-
but that's Wikidata,
literary work as well.
-
And this is all the subclasses,
or subclasses of subclasses--
-
or subclasses of subclasses
of what is a book.
-
So, that's very hard to do.
-
I can do a graph like that,
-
but the SPARQL query engine doesn't work
-
if I want to count everything
that is instance of these subclasses,
-
and basically, SPARQL says no, time-out.
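-
The count that times out would be a query along the lines of the sketch below, built here as text by a Python helper. It assumes Q571 is the Wikidata item for "book", with P31 as "instance of" and P279 as "subclass of". The helper name is hypothetical.

```python
def count_instances_query(class_qid: str = "Q571") -> str:
    """Return SPARQL counting items that are instances of a class or any subclass."""
    return f"""
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  # P279* follows the subclass tree to any depth, which is what makes this costly.
  ?item wdt:P31/wdt:P279* wd:{class_qid} .
}}
""".strip()


print(count_instances_query())
```

Counting one subclass at a time and summing the results is a common workaround when the single query hits the time limit, though items that belong to several subclasses would then be counted more than once.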
-
So, what's the problem?
-
But I know already that there's
a lot of subclasses,
-
but we need to look into it.
-
And probably, if you know Wikidata,
on the page Wikidata:Statistics,
-
you have all the numbers by big classes,
-
and you all probably know
that the big chunk here
-
is scholarly articles,
-
which is, thanks to
the WikiCite project, in particular,
-
which can be books or not,
depending on definition.
-
You see that there's no subclass books,
-
because there's not enough to show.
-
It's probably somewhere in the others,
-
the purple area is others.
-
And there's a lot of things
that are under one percent.
-
So, basically, we can say
that we have less than one percent
-
of things identified as books in Wikidata.
-
Maybe there are more books,
but not identified as such.
-
I'm talking about books,
-
but when we are talking
about bibliographical data,
-
there's also the author, person,
-
so maybe some of the humans here
are also authors, surely.
-
And we need to do another count,
which is another big query to do.
-
That times out, so--
-
I have a lot of not number
to this, sorry.
-
So, yeah, basically, this first slide
is about how it's complicated
-
to know how much we have of what,
and how to count them.
-
So, yeah, hard to count.
-
What we know--
-
that is we have a lot of properties--
-
7,000, I guess,
-
now on Wikidata.
-
We know that we have a lot of identifiers
among these properties.
-
And we know that almost 4,000
are properties for identifiers
-
relative to bibliographical,
-
like ID at the National Library of France,
-
National Library of yada, yada, yada,
-
because we love national library
identifiers on Wikidata.
-
So, we have almost all libraries,
national libraries and more.
-
So, we have a lot of properties.
I know that.
-
And they are widely used.
-
I know that, for instance,
the BnF property--
-
BnF is the National Library of France--
-
is used 1 million times--
OCLC, VIAF, or the big ones like that.
-
A lot of uses in Wikidata.
-
But it's not because we have
a lot of uses of various properties
-
in Wikidata that it's complete.
-
As Thibaud said, there's more
than 20 million books,
-
[inaudible], which is more as entities.
-
And we have only 1 million,
-
so we have 19 million still to do.
-
Also, what we know from the Wikidata side,
-
is that we have a good--
quite active Wikidata project,
-
called WikiProject Books,
-
where we have a model we kind of agree on,
-
which is not always followed,
which is, again, a problem.
-
What is a book? You know it.
-
I only have five minutes,
so, I'll keep going.
-
And then, I'm a Wikisourcean,
so, Wikisourcer.
-
So, I wanted to know
the other way around
-
what is from Wikisource already,
-
because Wikisource is already
inside the Wikimedia project.
-
A lot of bibliographical records
and information.
-
So, in the 66 million items on Wikidata,
-
already 1 million are linked
to Wikisource.
-
[inaudible].
-
So, that's very few,
but that's quite a lot.
-
There are a lot of authors.
-
There's some books, texts,
work, edition, whatever.
-
Not always well-arranged.
-
And there's a lot of internal pages,
-
like categories and templates,
and things like that.
-
But still, 1 million in total.
-
The Wikisource communities
are often small communities,
-
like on the French Wikisource community,
-
which is one of the biggest,
there's 50 people.
-
That's the biggest we have.
-
So, we love Wikidata, because,
hey, they did a lot of work for us.
-
So, just take it from Wikisource.
-
So, in this small community,
we love to reuse Wikidata data.
-
Right now, we use a lot of a tool
which is called WEF--
-
Wikidata Edit Framework-- thank you.
-
And we are eager to see
how Wikidata Bridge will work.
-
And we are trying to do things
with the Wikidata team
-
at Wikimedia Deutschland,
[inaudible].
-
And there's a lot
of collaboration in the future
-
that we want to do: better integrate,
-
do everything in one click when you import
a first book in Wikisource,
-
things like that.
-
Better-- do links between
editions in Wikidata.
-
That needs to be done.
-
The Foundation is doing the wish list now,
-
and we have a lot of requests about that.
-
And yeah, that's it.
-
That was just a short overview.
-
So, if you have some questions,
I'll take them and be available later,
-
if you want to.
-
(applause)
-
Come on, you love Wikisource,
you have questions!
-
(woman 4) I already asked you
this in August,
-
and I wonder if this has already changed.
-
What is the biggest problem you have
in Wikisource right now,
-
from your perspective?
-
The first one, only? (chuckles)
-
I think because it's a small community,
we need efficient tools that work easily,
-
because we have very few people,
-
so we need tools that are easy to use
-
and a one-click solution
to [inaudible] a bit,
-
that's a big dream.
-
I think that's what's most important,
-
because that's the threshold
in Wikisource, it's a small community.
-
I think this is the most important.
-
[inaudible]
-
(man 5) I'm curious if you can speak
to your opinion,
-
or the French Wikisource opinion,
or maybe you spoke to other communities
-
about the notion of not including
metadata about all the world's books.
-
That was mentioned in the morning.
-
Maybe other Wikibases,
and other federated databases
-
will have that information,
and Wikidata won't.
-
How does that feel for Wikisource?
-
This is my very personal opinion.
-
I know that people
in the Wikisource community
-
disagree with that.
-
But I think we need to stay--
-
an external Wikibase
is not a good solution,
-
because we have Shakespeare on Wikisource,
-
and we have Shakespeare on Wikipedia.
-
So, we need to interlink,
and interlink is there.
-
Or like, Romeo and Juliet,
we have them both.
-
So, we are still pretty close
to Wikipedia.
-
And the difference with WikiCite--
-
with WikiCite, we have a lot of items
which are small.
-
Wikisource is the other way around.
-
We have few items, which are big.
-
Which can be a scaling problem
and everything,
-
but it's quite a small subset of data.
-
So, my personal opinion
is we should stay in Wikidata.
-
Again, because we are not
very many people,
-
so we need to stay
with the tools we know;
-
don't change the tools too much
-
for the small community, please.
-
So, that's it.
-
But I know that other people disagree.
-
You can talk to [Sadeep] if you want.
He will have another point of view.
-
Thank you. I think, last question, maybe.
-
(man 6) Sometimes, I find it difficult
to link the Wikidata item
-
with a Wikisource article,
because there's a Wikisource novel--
-
might be split over several pages,
and there's an index page,
-
and there's perhaps a front page,
or something like that.
-
Do you have that problem,
or is that a general problem, or--
-
Yeah, that's one of the first ideas
on the wish list
-
for the Foundation, actually.
-
Yeah, because Wikipedia is on the--
-
if you know the [inaudible] organization,
-
Wikipedia is on the work level,
and Wikisource on the edition level.
-
So, already, you have a problem there.
-
And then, we have several editions
of the same work,
-
and we have sub-chapters
and things inside the edition.
-
So, yeah, that's a one-to-many problem,
which is hard to solve by nature.
-
But there's maybe a tool
that can help to solve that.
-
Hopefully.
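The work/edition mismatch described in this answer can be sketched as a small data model. This is only an illustration, not a tool the speaker mentions: the class names, the placeholder item IDs, and the page titles are all hypothetical. On Wikidata itself, the link from an edition to its work is usually expressed with P629 ("edition or translation of"), and the reverse with P747 ("has edition or translation").

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Edition:
    """One edition item; Wikisource pages attach at this level."""
    qid: str  # hypothetical placeholder ID, not a real Wikidata item
    wikisource_pages: List[str] = field(default_factory=list)


@dataclass
class Work:
    """One work item; the Wikipedia article attaches at this level."""
    qid: str  # hypothetical placeholder ID, not a real Wikidata item
    wikipedia_title: str
    editions: List[Edition] = field(default_factory=list)


def pages_for_work(work: Work) -> List[str]:
    """Flatten the one-to-many links: every Wikisource page of every edition."""
    return [page for ed in work.editions for page in ed.wikisource_pages]


# Hypothetical example: one work, two editions, several pages each.
work = Work(
    qid="Q_WORK",
    wikipedia_title="Romeo and Juliet",
    editions=[
        Edition("Q_EDITION_A", ["Index page", "Front page", "Act 1"]),
        Edition("Q_EDITION_B", ["Index page", "Act 1", "Act 2"]),
    ],
)

if __name__ == "__main__":
    print(len(work.editions), "editions,", len(pages_for_work(work)), "pages")
```

One Wikipedia article ends up facing several editions and many pages, which is why a single sitelink per item does not fit well.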
-
And that's time, ladies and gentlemen.
-
So, thank you very much, Nicolas.
-
(applause)
-
And please join me in giving
one more round of applause
-
to all of our wonderful speakers.
-
(applause)