(host) Hello, everyone. Thank you
for coming to these lightning talks.
Our first speaker, I'm going
to run straight into it,
is going to be Rosie
Stephenson-Goodknight.
Did I get that right?
Yes. And so she's going to be talking
about the Women Writers Project.
And we're going to--
yeah, is that right? Great.
And so, we're going
to just launch right in,
and I want to remind you,
if there's time for questions,
to please not speak
until you have the microphone.
Thank you.
(Rosie) Hi, everyone, and thanks
for coming to this session,
where we're going to talk
about Women Writers in Review,
cultures of reception associated
with trans-Atlantic,
English language women writers,
broadly construed.
Women Writers in Review is an initiative
of the Women Writers Project
of Northeastern University.
It moved there from Brown University,
approximately 15 years ago.
Women Writers in Review is a collection
of 18th- and 19th-century reviews,
publication notices,
literary histories, and other texts
corresponding to trans-Atlantic--
so, UK and US mostly,
though a few Canadian--
written works by women.
It's a project where the two universities,
Brown University
and Northeastern University,
started collecting the manuscripts
of women from this period.
And then they started collecting
the reviews of these works,
and then they started scoring
these reviews by giving them a rating.
It's designed to investigate
the discourse of reception and connection
with the changing trans-Atlantic
literary landscape
for the period 1770 to 1830.
You're going to pardon me if I speak fast,
because I've got five minutes
to go over this.
It includes 690 English language texts
responding to works
written or translated
by 18th- and 19th-century women writers.
There are 74 authors in the corpus,
using 112 different sources,
or periodicals, or magazines.
And there are 628 critical reviews.
Here's a picture that shows you
what we're talking about
in terms of a review.
And you can also see what kind of scores
were given by the academics
at Northeastern University.
Most of these are women
who were giving scores
based on the reviews that were written--
mostly, probably all, by men--
back in this time period, 1770 to 1830,
of works written by women.
By works, we're talking about plays,
and novels, and poems,
essays, and other kinds of articles.
So, what are we talking about?
This required creating
items for authors for their works,
like I said, novels and plays and poems.
It required creating new items
for this period of time
where there are defunct periodicals.
It required creating items
for the scholarly articles.
And then the review score of each,
and the review score by,
which in this case would be
Women Writers in Review,
and what we still need to add
is the described by source.
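As a rough sketch, once the data is in Wikidata, the model described above could be queried along these lines. This assumes the standard properties "review score" (P444) with a "review score by" (P447) qualifier; the project's actual modeling may differ, and the Q-id for Women Writers in Review is deliberately left out.

```sparql
# Works carrying a review score, with who gave the score (sketch)
SELECT ?work ?workLabel ?score ?scorerLabel WHERE {
  ?work p:P444 ?scoreStatement .          # review score
  ?scoreStatement ps:P444 ?score ;
                  pq:P447 ?scorer .       # review score by (qualifier)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```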
This gives you a picture
of the kind of spreadsheets,
Google Spreadsheets,
that I have been working on.
I shouldn't just say I,
because I've had a lot of help.
I've had a lot of people
who were working on this project with me.
And you can see at the top,
something about the authors,
about the works.
The third group is going to be
the periodical,
and then, how the scores started showing.
And of course, this is how they look--
the beauty of being able to present
the preliminary findings.
Once we have uploaded all of the data,
and I hope that that's going to be done
by the end of this year,
this will obviously look different.
Appendix.
So, here's what the depiction looks like
at the Northeastern University website.
I don't think it's quite as clear
as what we can do with Wikidata.
And so, this was probably the reason why,
when I started as a visiting scholar
in 2017, they asked if this is one
of the projects that I could work on.
They stopped their work
the year before, in 2016.
And I think they just don't have
the resources to continue.
Some parts of this presentation
came from another
that was published in 2016.
And last but not least, here are links
to the different parts
of the work that I'm doing.
Thank you very much.
Questions.
(applause)
(woman) So, when you have a work,
and you have the review of the work,
are you looking
at a particular edition of the work,
or are these all reviews
of first editions?
It's a good question. No.
They are not just reviews
of the first edition.
Some are reviews of the second
or third edition.
I'm going to add something
that maybe I should have said
before I closed
and went to question and answers--
what's so special about this?
What's special is nobody else
has done this on Wikidata.
Surely, there are other universities
that have their own collections,
where their scholars have reviewed
the reviews of someone's work
in some language.
So, hopefully,
once this methodology gets--
once I write this up and the project
is over and presented again,
that there will be other
universities, other libraries
that will speak up and say,
"We've got data sets, too,
and we're going to go ahead
and upload them into Wikidata ourselves,"
and then it'd be lovely
to start doing some comparisons.
Anyone? Jane.
(Jane) Do you actually have books?
Do you actually have the books--
are the books in existence,
or are you actually
doing metadata about books
where we don't even know
where the books are?
Northeastern University
actually has the book,
or the essay, or the poem.
And they have the critical review
of the book, or the essay, or the poem.
And they're working
on the transcription of these,
and they're not at 100% yet.
They're not at 100%,
but they're working on it.
Any other questions?
(host) We're going to wrap it up there.
Thanks for being such a nice audience.
(applause)
(man) Finally got that.
What I'm going to do is I'm just going
to click on these to load.
Just while-- is that new tab there?
[inaudible]
The first one? Yeah, perfect.
Sorry, my German is not even rusty,
it's simply non-existent.
So, I'll just let them load,
because then these queries can run
while I'm sort of introducing
what I was talking about and doing.
So, hi, I'm Nav from Histropedia.
And basically, for the last
quite a few years,
we've been relatively quiet,
while we've been sort of working
on technology and tools
that we need to sort of develop,
ultimately, Histropedia version 2,
which is going to be, you know,
this huge enhancement
on the first version.
Well, it's kind of in progress,
but as we do it,
we've been experimenting
with these other tools,
and building the technology
that we're going to need.
One really crucial part for this
is the ability to sort of see
the whole of history
from the billions of years time scale,
to up to the current day,
and zooming all the way into single days.
And ultimately, in the end,
down to hours and minutes.
We've managed to create
a [inaudible] update to our engine.
Other engines can already do this,
but unfortunately, they can't handle
the large data sets.
So, we finally got this update
to our engine.
It allows us to zoom to billions of years.
So, the recently finished update--
it's basically an update
to our query viewer tool,
which is like a live version
of Histropedia,
linked straight to Wikidata.
So, it's literally based on a query,
a live query, and we see
the results of it.
So, it's sort of separate
to our main tool.
So, I'm going to flick to the first one,
which is my first experiment.
And you'll forgive me, the queries--
the code was kind of finished
not so long ago,
and the queries, I've been trying
to find out what can I find
and what's interesting
to look at, what's missing.
So, I started off
with a kind of, sort of, well--
So, that's not the right--
that's not Life on Earth.
Is this Life on Earth?
That will do, anyway.
So, I started off just trying to look
at what sort of things
are actually in Wikidata.
And this particular one--
sorry, it's in reverse.
So, this is the first one
I wanted to show you.
So, this is a kind of
a life on Earth query
that I wanted to develop.
And basically, what it is
is all the taxons in Wikidata
that have a date.
And as you can probably see
from the panel, there are not many of them.
But we do have the different taxon ranks.
So, you know, is it a species, a class--
for a biologist,
this makes a lot of sense.
But if I was just to close that a bit,
we can see, we are going back
to the earliest forms of life here.
3.5 billion years ago.
And as we zoom in here, we start to see
the more modern forms of life,
and we see some really
interesting things developing,
but we're still lacking a lot of data
in terms of this kind of time range.
So, my next thought was,
"Okay, well, why aren't--"
"I want to see a Tyrannosaurus Rex."
That's what I really wanted to see
on my query, and it wasn't there.
So, had a little dig in,
and I found out why.
It's because they're much more often
being stored
in terms of the temporal range
or time period that they relate to.
So, on comes the next query,
where I actually sort of--
basically, this query
is looking for any item
that has a temporal range start,
and/or a temporal range end.
Which, in the case of life forms,
kind of relates
to when they emerged
and when they became extinct.
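The query being described can be sketched like this, using the real Wikidata properties "temporal range start" (P523) and "temporal range end" (P524); the exact query used in the demo isn't shown and may differ.

```sparql
# Items with a temporal range start and/or end -- for life forms,
# roughly when they emerged and when they became extinct
SELECT ?item ?itemLabel ?start ?end WHERE {
  { ?item wdt:P523 ?start . OPTIONAL { ?item wdt:P524 ?end . } }
  UNION
  { ?item wdt:P524 ?end . FILTER NOT EXISTS { ?item wdt:P523 ?any . } }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```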
So, these are the periods
on the side here.
If I just close that a bit--
you can see that we have
quite a lot of interesting stuff.
And there's the Tyrannosaurus
that I was looking for.
So, I finally got that,
and I was like, "Yes! I've done it!"
I've got that Triceratops
in there for bonus.
But of course, still loads missing.
And I'd love to see lots more here.
But at least, it gives you the idea.
The nice thing is, here as well,
if I star some of these,
you can see that
the time range is shown.
So, you can start to do
what I really wanted to do, is say,
"Okay, when did this one end,
and when did the next one begin?
When did things start going extinct?"
So, I was pretty excited, but, still,
really hoping for a lot more.
So, there's a lot of editing to be done
in terms of these large geological
and cosmic time scales.
You can see on the color code,
I can also do extinction period.
So, I say, I want to find out stuff
that went extinct in the late Cretaceous.
And I now know that two things did that.
There's obviously quite a few more.
And I put the taxon rank
in there, as well,
just so that we can also see,
"Okay, what
is its species, genus, et cetera."
So, pretty exciting.
I was quite happy, but it's revealing
a lot of what still needs to be done.
So I went to the next one, which was--
I was thinking, "Well, I can't find
all the data I'm looking for.
Let's go a bit more general,
and just look for all of a certain kind
of dates in Wikidata that I can find
that are over 10,000 years old, basically.
And what type of thing are they?"
So, this color code is relatively okay,
but it might be a bit misleading,
because some things are multiple types.
So, therefore,
it's a bit random, at times.
But, you get some really
fascinating stuff in here.
I've got for a start--
I've got all of the millennia
that we have in Wikidata,
which is, you know, there you go.
Read about everything that happened
in all these different millennia.
No pictures for any
of these, unfortunately.
So, there's nothing to really say
what happened in them.
Taxon, which we were just looking at,
which kind of led me on
to the other queries.
And of course, that sort of
like all of them in one group.
Interesting stuff.
Archaeological cultures.
And this is like, okay,
this is more like up my street.
This is the sort of things
I want to learn about.
Again, pictures would be nice.
But it's really showing you
something interesting.
And it's just worth exploring here.
And of course, there's some
that really make me excited
for what we could be doing.
For example, there was
something here which was--
I mean, system, actually,
was quite an interesting one.
And sorry, that's not actually
the one I was thinking about.
In fact, that means nothing to me at all.
Someone might know what that means.
Art movements,
archaeological sites, activities.
There were only two of these,
but I really like the idea, because--
and they're both the same.
They're both hunting.
And of course, there's two of them.
And the reason is because
there's a little qualifier on there.
If we were to just
look through, we can see--
we can see somewhere down here,
will be the start time.
And the qualifier is talking about
when Homo erectus did it,
and when Homo sapiens did it.
So those should be shown
in brackets on the query--
a little extension to do, to show you
what the two different versions mean.
But I would love to see
all of human skills in here.
When did we first do farming,
when did we first this--
when did fire come about?
All of these things,
when did we first extract iron?
When did we first--
all of these wonderful things
that developed into
the modern world that we live in.
So, really exciting signs
of what could be there,
if it all got populated.
So, you know, this is what
we really need to work on,
is some of this historical info.
Last one, I just wanted to just show you,
which was just an extra
bonus one I threw in,
just to look at the time periods
that we actually have,
the historical ages
that we have in Wikidata.
And so, this is actually just all
sub-classes of unit of time.
And then, this is the actual
instance that it was.
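That "time periods" query can be sketched as follows, assuming Q1790144 is the "unit of time" item (verify the Q-id before running):

```sparql
# Historical ages: subclasses of "unit of time", plus what each
# one is an instance of (used for the color code)
SELECT ?period ?periodLabel ?instanceLabel WHERE {
  ?period wdt:P279* wd:Q1790144 .         # subclass of unit of time
  OPTIONAL { ?period wdt:P31 ?instance . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```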
And it's just really interesting.
This is more the kind of thing--
In Histropedia Mark II,
these are the kinds of things
that will actually be displayed
more under the timeline,
as a sort of a range or period.
And so, we are particularly interested
in these periods
being really tight and nice,
because it helps you to, then,
say what happened when,
and you can sound really clever
when you talk about when things happened,
in the Neolithic or the upper
Paleolithic, or whatever.
I'm still pretty clueless on most of it,
because I'm just kind of just waiting
for the data to be up to scratch.
Great. I think I can actually
round it up there.
Loads more exciting queries to come.
A lot more features and cool stuff,
actually, just around the corner for us,
because we've just finished
a lot of cool things.
But there's a little bit of time
to pull it all together.
So, look out for more.
If there's any questions,
I think I've got one minute.
So, it would have to be one.
(host) Yes, Nav.
I forgot to introduce you.
I'm sorry. That's Nav, as he said,
Histropedia, Evans. Thank you very much.
Thank you. Cheers. Yeah.
(host) Very fast questions.
Anyone with a very fast question
[inaudible].
(woman 2) Very quickly, how can
I do my own? If I want other languages,
how do I start, for instance?
Absolutely. Good question.
So just click on the--
oh, I've shared this.
It's called cosmic timelines on the URL.
Should be cosmic and geological,
but then it's not a short URL anymore.
So, you click on this icon
in the top corner there,
and then, you get to the query page,
which is like the home page of this tool.
This is where the query is pasted in.
So, at the moment,
I've got the language there.
If I want to change it to something else,
Arabic, or French, or whatever--
and here are the-- this is the area
where you sort of enter in exactly
which variables in your query
you would like to do each thing.
If you put nothing in,
it will try and figure it out.
But if you want advanced stuff--
and really important, is the precision,
because that's not available
on the query service timeline.
So, you get everything
as the first of January,
10 billion years ago,
you know, which is not
what we want to see.
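The precision he mentions is available in the query service's RDF via the full value node of a statement; a sketch, using "start time" (P580) as an example property:

```sparql
# Fetching a date together with its stated precision
# (0 = billion years, 9 = year, 11 = day, ...)
SELECT ?item ?time ?precision WHERE {
  ?item p:P580/psv:P580 ?timeNode .       # full value node of "start time"
  ?timeNode wikibase:timeValue ?time ;
            wikibase:timePrecision ?precision .
}
```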
And the rank, which is quite interesting.
My timelines are all based
on a very simple rank of site link count,
how many different articles there are,
or something else.
But that's how you go
and mess around with it yourself,
and you put your color codes
and your filters in down here.
Comma separate them,
if you would like more,
and they come up as options
in the final tool.
And I think that
pretty much is it, isn't it.
So, any other questions,
do find me afterwards.
Always happy to get cornered
for this stuff.
I love talking about it.
Okay. So, thank you very much. Cheers.
(applause)
(mumbles)
So, where is the first one?
This one, no.
This? Sorry.
Is it full screen?
Yep. Full screen.
Well, good work.
[Strike.]
Yeah, so, okay. Thank you.
So, hi, I'm Thibaud Senalada.
As [inaudible] introduced me.
I'm a software engineer
at the French National Library.
And I'm here today
to talk to you about NOEMI,
which is a software, a proof of concept,
for cataloging
at the French National Library.
Sorry. [inaudible].
Sorry for my English. It's a bit fuzzy.
And so, what's NOEMI?
So, NOEMI stands for:
Nouer les oeuvres, expressions,
Manifestations et Items.
Which, in English, is:
to link work, expression,
manifestation, and items.
It's based on the FRBR,
and [inaudible].
Yeah. Anyway.
So, yeah.
So, we use this software
to produce metadata.
It will be used
by 600 people on a daily basis.
And as I say in the title,
it will be based on Wikibase.
So, there is also a format manager.
So, people using this software
will use like a code editor,
but for MARC format.
So, it's [inaudible], things like that.
A data processing tool, like I said.
And also, authorization management,
because they will need to control
whether some data
can be modified.
So, the PoC context.
So, this software will be replacing
an old software,
called ADCAT02.
It is part of the bibliographic
transition.
So, I say the [inaudible].
[inaudible]. [inaudible] in English?
Format.
And it will be the [inaudible] of the--
Sorry.
It will be [inaudible]
all the [inaudible]
of the BnF with data.
And so, doing this work,
we accessed Wikibase to see
if it fits our needs.
And [inaudible] pretty good.
So, why Wikibase?
Because of the flexibility of the format,
we managed
to inject MARC-- INTERMARC, for the BnF--
into the database,
and to use the link management
between entities using Blazegraph,
as Wikibase does.
We also choose Wikibase,
because it was already--
it handles history and user account.
So, it's easiest for us.
And it also has a good--
it's pretty easy to create bots
to watch and curate data
and also to make statistics.
It's free and open, and sustainable.
Yeah, so.
I'm sorry if you don't
understand what I say,
because I know my English
is not that good.
But during this PoC,
we encountered some trouble.
Okay.
First of all, as a search engine,
I think we have to create
another--
not another, a supplementary
search engine to use it with,
to fit our needs.
Because we need some search
like faceted search and filters.
Also, we have the [inaudible]
of using a PostgreSQL database.
And for the moment,
I think Wikibase [inaudible].
And when we tried to use PostgreSQL,
it was a bit difficult,
and caused some issues.
And we have also some fear
about performance,
because the catalog is about
20 million entities,
20 million bibliographic entities.
That can be more
than 20 million entities, actually.
And we don't know the time
that we'll have to inject them
in the Wikibase, and how to do it.
So, [inaudible],
but the real software development
has already started.
We start by creating
an interface with Wikibase.
We're using Java.
Like PyWikibase.
- (man) Pywikibot.
- Pywikibot. Yeah, thank you.
The same way, but in Java.
We have also already injected the format
into the Wikibase.
And we do something
like the INTERMARC editor,
[inaudible], et cetera.
Thank you.
(applause)
Yeah.
(man 2) Faceted search
will be a nice feature
in the Wikidata UI itself.
So, have you talked
to any of the developers,
or is that something
that could be done?
Sorry, I don't understand.
(man 2) The faceted search idea.
It would be nice to be able
to search only humans,
or search only works, or something, right?
Yeah. I'm sorry, I don't-- I don't--
(man 2) Yeah, I mean, so,
it would be nice if we had that
in Wikidata itself in the UI.
Yeah, yeah, yeah.
[inaudible]
Yeah, okay, thank you.
I'm sorry. (laughs)
Yeah, yeah. But I think we will--
I don't know if we want
to do it inside Wikibase,
or in our next systems.
For the moment,
we don't really solve that.
For the moment, I think.
Sorry.
(man 3) I suppose on the topic
of the faceted search,
Wikidata, SPARQL Query, Wikibase--
SPARQL Query is I think,
functionally equivalent
to a facetable search.
So, it's mostly an interface issue, right?
I mean, you could build an interface
that starts with a query,
and then, gives you
possible facets to filter by.
And when you click one of them,
it adds a condition
to the SPARQL Query, right?
Yeah, but I think the SPARQL--
they don't go as detailed
as we want, as we have--
When we inject the format,
we use a statement for--
the format is like XML.
So, it's a zone, subzone, and value.
And in the [inaudible] statement,
we add the subzone,
because the zone was already there.
And we want to query
some qualifier on this.
And I don't know if the SPARQL
goes through that-- I'm sorry--
in a fast way.
I think we need some index
for us to [inaudible].
Yeah.
(man 3) SPARQL doesn't do a query--
To do proper string searches
in SPARQL is very hard.
You have to have filters, which are slow,
and it really doesn't work that well.
So, it's a different
search problem, really.
More question? If anyone has one?
- Great. Thank you.
- Thank you.
(applause)
(host) Nielsen speaking about
the tool Ordia. Thank you.
So, I'm Finn Årup Nielsen,
and a couple of years ago,
I started Scholia
that displays data from Wikidata
via a SPARQL Query
to the Wikidata Query Service
so we can generate, for example,
a list of publications
for a specific author.
Now, last year, Wikidata
introduced lexicographic data.
And I [inaudible] the idea of Scholia
that is using Wikidata
and the Wikidata Query Service
to generate overviews
of lexicographic data.
So, Ordia is the example of this one here.
So, it generates-- it's a web application
run from the Toolforge service,
and for example, it will dynamically
generate a page such as--
This one here is statistics over
what there is of lexicographic data
in Wikidata.
For example, the number of lexemes
is currently over 200,000.
So, there's a range of things
you can do here.
You can, for example,
look in the aspects of that.
The menu, there's quite a lot
of things here.
And so, I will search
on a specific Danish lexemes.
"Rød"-- which is "red" in Danish.
So, you basically get,
for the specific lexeme,
the same type of information
that you could see
in the ordinary part of Wikidata, here.
Annotations about the lexeme,
annotations about the forms--
singular or plural forms--
annotations about the senses.
But what you can't see
in ordinary Wikidata
is sort of aggregating across lexemes.
And this is, for example, down here--
down here with the compound.
So, in Danish, like in German,
words can be compounded.
For example, for "red",
we have rødkælk,
which is compounded from two words.
And we've got, on the second one here,
rødvin-- red wine.
This list here is constructed
by a SPARQL Query to the Wikidata Service.
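A sketch of that compound query (Danish is Q9035 and "combines lexemes" is P5238); Ordia's actual query may differ in detail:

```sparql
# Danish lexemes that are compounded from the lexeme "rød"
SELECT ?lexeme ?lemma WHERE {
  ?lexeme dct:language wd:Q9035 ;         # Danish
          wikibase:lemma ?lemma ;
          wdt:P5238 ?part .               # combines lexemes
  ?part wikibase:lemma "rød"@da .
}
```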
And also, further down here,
we've got a lot of Danish words here.
Further down here, we should have
a graph of the words
which are compounded from rød.
We have [rød]-- red here in the middle.
And for example, around--
somewhere around here,
which should have,
for example, "red cabbage,"
"red cabbage salad,"
"red cabbage soup," and so on.
So you can browse around,
in this one here, and see it.
We can go a bit back here,
and then look on the main sense
of the word rød-- red in Danish.
So, Ordia automatically generates
information about hyponyms.
Subconcepts, for example,
light red, dark red,
pink, purple, and so on,
are found when we make
a SPARQL Query to the Wikidata Query Service.
Then we go around in the Wikidata graph,
and get this information here.
And we can also get translation
automatically,
even though it's not necessarily stated
within the Wikidata lexemes items.
For example, here, we have translated
rød to "red" in English,
and röd in Swedish, and so on.
There's not that very many there.
There's a range of other things here.
Let me show you,
for example, this one here--
this is veninde- now I go
over to this one here.
-inde, which is a feminine suffix.
So, this is auto-generated there,
it's a combination of "instance of"--
lexemes that are "instance of"
feminine suffixes.
And for example, for German,
we have [inaudible].
So, -in would be
a feminine suffix in German.
And I put in sort of the five
feminine suffixes of Danish.
Another facility is, for example,
if you have a text,
you can copy and paste it
into this Text to lexemes here.
Let me--
"a car crashed into...
a green house."
Let me change that to "English".
Press Submit.
Now, Ordia will then extract
each of the words here,
in this sentence here,
and try to see whether they,
in the specific form,
are entered as a lexeme in Wikidata.
And these simple words here
are entered in Wikidata.
But if we, for example, change it to--
there's nothing called "vancar"
but just let us do that here.
And you got down here--
it's as a blue link
that you can create a new
Wikidata lexeme item.
But there's a range of other things to explore
in this web application.
And if there's any suggestions,
or comments, or notes, or something,
you can contact me, or put in
an issue on GitHub.
So, this particular application
is developed on GitHub,
and I'm open for new ideas
and ways to represent information there.
Okay, thank you.
(applause)
Questions?
(woman 3) I love your tool.
Can you show the languages?
That would be awesome for me, I think,
to show the other languages.
So, this is a bit of statistics
over the languages,
and the Russians
have been scraping Wiktionary,
and that's why they have now
100,000 lexemes.
There's also a lot of work on Basque here.
I think there's an organization
putting that information in here.
And you can also see a graph of these--
this is Number of forms as a function
of number of lexemes.
And all the way up here--
here, this is Russian,
down here, Basque, I think.
And English, perhaps, down here.
And also in the Number of senses,
I think Basque, English, and Russian,
Hebrew, and so on.
Yeah.
(man 4) That looks
like an incredible tool.
But I was just wondering,
is it all fully live?
Is it all based on SPARQL Queries
and live or are there some things--
- Yes. I believe, yes.
- Fantastic.
But as they get more data into Wikidata,
there's a bit of an issue.
For example, for Russian here.
I started out on this a year ago,
when there were not that many lexemes,
and so there was no problems
with the time-outs.
But representing it here--
but if I press Russian,
I think there might be some issues.
There's a count that works here,
for example, longest words or phrases.
But I think the lexemes
are sort of loading in.
I think I'll need to fix that
as Wikidata grows here.
As you see, there's a lot
of Russian nouns, apparently.
And I don't know whether the--
apparently, that's what
they're working on.
There seems also to be
a bit of time-out there.
[inaudible], oh, yes.
The first one there.
But apparently, the longest words
and phrases query is a bit too expensive.
But apparently, it can be loaded there,
and it's probably--
it's loaded all the 100,000 there,
so you can click all 10,000 pages.
(host) If there aren't
any other questions--
The longest word came now.
So, it's, yeah.
Probably--
[inaudible]
What is that?
- (audience) It's a chemical.
- A chemical, yes.
(host) More questions? Or shall we?
Alright, alright. Thank you very much.
(applause)
(Nicolas) Is it good?
(host) Awesome.
Alright, now, to wrap it up,
we have Nicolas Vigneron,
talking about Wikisource and Wikidata.
(Nicolas) This is good?
Who knows Wikisource?
Yay!
More and more people
raising hands every year.
That's good.
So, this morning, [Lydia] said that
Wikivoyage was the first real user of--
[inaudible]
Wikisource is not that far behind.
There's a lot to do,
and I want to do some basic numbers,
statistics, about where we are,
and where I want to go.
So first, there will be a lot of questions
of what is a book,
what is bibliographical data.
People from the BnF can agree with me.
That can be a nightmare
if you go into details.
But some big numbers that--
Google Books tried to do an estimation
on how many "books," air quote books,
there is in the world,
and there's 130 million books
in the world.
And, yeah, let's put them all on Wikidata.
Or not. I don't know.
But where are we now?
And why is it books?
Because for Google Books,
everything is scanned, basically.
They don't have exactly
a very clear distinction.
There are, sometimes, two-page books,
which [inaudible], for Google Books, count as a book.
But for many people, you have to have
at least 50 pages to be a book.
So, that's always hard to count.
But here's what we know on Wikidata.
This the graph of what
is a book for Wikidata.
You have-- that's totally [inaudible]--
but that's Wikidata,
literary work as well.
And this is all the subclasses,
or subclasses of subclasses--
or subclasses of subclasses
of what is a book.
So, that's very hard to do.
I can do a graph like that,
but SPARQL Query engine doesn't work
if I want to count everything
that is instance of these subclasses,
and basically, SPARQL says no, time-out.
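The count he describes looks roughly like this ("book" is Q571); the property path over the whole subclass tree is what tends to make the query service time out:

```sparql
# Count everything that is an instance of "book" or of any
# subclass of it, at any depth -- times out on the full graph
SELECT (COUNT(?item) AS ?count) WHERE {
  ?item wdt:P31/wdt:P279* wd:Q571 .
}
```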
So, what's the problem?
But I know already that there's
a lot of subclasses,
but we need to look into it.
And probably, if you know Wikidata,
on the page, Wikidata point statistics,
you have all the numbers by big classes,
and you all probably know
that the big chunk here
is scholarly articles,
which is, thanks to
the WikiCite project, in particular,
which can be books or not,
depending on definition.
You see that there's no subclass books,
because there's not enough to show.
It's probably somewhere in the others,
the purple area is others.
And there's a lot of things
that's under one percent.
So, basically, we can say
that we have less than one percent
of things identified as books in Wikidata.
Maybe there is more books,
but not identified as such.
I'm talking about books,
but when we are talking
about bibliographical data,
there's also the author, person,
so some of the humans here
are surely also authors.
And we need to do another count,
which is another big query to do.
That times out, so--
So I'm missing a lot of numbers
for this, sorry.
So, yeah, basically, this first slide
is about how it's complicated
to know how much we have of what,
and how to count them.
So, yeah, hard to count.
What we know--
that is, we have a lot of properties--
7,000, I guess,
now on Wikidata.
We know that we have a lot of identifiers
among these properties.
And we know that almost 4,000
are properties for identifiers
relative to bibliographical,
like ID at the National Library of France,
National Library of yada, yada, yada,
because we love identifier
of National Library on Wikidata.
So, we have almost all libraries,
national libraries and more.
So, we have a lot of properties.
I know that.
And we are widely used.
I know that, for instance,
BnF properties use--
BnF is National Library of France--
is used 1 million times--
OCLC, VIAF, or the big ones like that.
A lot of uses in Wikidata.
But it's not because we have
a lot of uses of various properties
in Wikidata that it's complete.
As Thibaud said, there are more
than 20 million books,
[inaudible], which is even more as entities.
And we have only 1 million,
so we have 19 million still to do.
Also, what we know from the Wikidata side
is that we have a good,
quite active WikiProject,
called WikiProject Books,
where we have a model we kind of agree on,
which is not always followed,
which is, again, a problem.
What is a book? You know it.
I only have five minutes,
so, I'll keep going.
And then, I'm a Wikisourcean,
so, Wikisourcer.
So, I wanted to know
the other way around
what is from Wikisource already,
because Wikisource is already
inside the Wikimedia project.
A lot of bibliographical records
and information.
So, in the 66 million items on Wikidata,
already 1 million are linked
to Wikisource.
[inaudible].
So, that's very few,
but that's quite a lot.
There are a lot of authors.
There's some books, texts,
work, edition, whatever.
Not always well-arranged.
And there's a lot of internal pages,
like categories and templates,
and things like that.
But still, 1 million in total.
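That 1-million figure can be checked the same way: sitelinks are exposed in the query service through the schema:about / schema:isPartOf pattern. A hedged sketch follows; the pattern and the French Wikisource URL are real, the function name is invented, and the exact count drifts over time.

```python
# Sketch: SPARQL counting Wikidata items that have a sitelink to a given
# wiki. On the Wikidata Query Service, a sitelink is modeled as
#   ?page schema:about ?item ; schema:isPartOf <site URL> .

def sitelink_count_query(site_url: str) -> str:
    """Count items linked from the wiki hosted at `site_url`."""
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE { "
        "?page schema:about ?item ; "
        f"schema:isPartOf <{site_url}> . }}"
    )

# French Wikisource, one of the biggest Wikisource communities:
print(sitelink_count_query("https://fr.wikisource.org/"))
```

Swapping in another Wikisource URL gives the per-language breakdown the speaker says is hard to count by hand.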
The Wikisource communities
are often small,
like the French Wikisource,
which is one of the biggest:
there are 50 people.
That's the biggest we have.
So, we love Wikidata, because,
hey, they did a lot of work for us,
and we just reuse it from Wikisource.
So, in this small community,
we love to reuse Wikidata data.
Right now, we rely a lot on a tool
which is called WEF--
the Wikidata Edit Framework-- thank you.
And we are eager to see
how Wikidata Bridge will work.
And we are trying to do things
with the Wikidata team
at Wikimedia Deutschland,
[inaudible].
And there's a lot
of collaboration in the future
that we want to do: better integration,
doing everything in one click
when you import a new book
into Wikisource, things like that.
Better links between
editions in Wikidata.
That needs to be done.
The Foundation is doing the wish list now,
and we have a lot of requests about that.
And yeah, that's it.
That was just a short overview.
So, if you have some questions,
I'll take them and be available later,
if you want to.
(applause)
Come on, you love Wikisource,
you have questions!
(woman 4) I already asked you
this in August,
and I wonder if it has changed since.
What is the biggest problem you have
in Wikisource right now,
from your perspective?
The first one, only? (chuckles)
I think because it's a small community,
we need efficient tools that work easily.
Because we have very few people,
we need tools that are easy to use,
and a one-click solution
to [inaudible] a bit--
that's a big dream.
I think that's what's most important,
because that's the bottleneck
in Wikisource: it's a small community.
I think this is the most important.
[inaudible]
(man 5) I'm curious if you can speak
to your opinion,
or the French Wikisource's opinion,
or maybe you've spoken with other communities,
about the notion of not including
metadata about all the world's books.
That was mentioned in the morning.
Maybe other Wikibases,
and other federated databases
will have that information,
and Wikidata won't.
How does that feel for Wikisource?
This is my very personal opinion.
I know that people
in the Wikisource community
disagree with that.
But I think we need to stay--
an external Wikibase
is not a good solution,
because we have Shakespeare on Wikisource,
and we have Shakespeare on Wikipedia.
So, we need to interlink,
and the interlinking is already there.
Or, like Romeo and Juliet:
we have it on both.
So, we are still pretty close
to Wikipedia.
And the difference with WikiCite--
with WikiCite, there are a lot of items
which are small.
Wikisource is the other way around:
we have few items, which are big.
Which can be a scaling problem
and everything,
but it's quite a small subset of data.
So, my personal opinion
is we should stay in the Wikidata.
Again, because we are not
very many people,
we need to stay
with the tools we know.
Don't change the tools too much
for the small communities, please.
So, that's it.
But I know that other people disagree.
You can talk to [Sadeep] if you want.
He will have another point of view.
Thank you. I think, last question, maybe.
(man 6) Sometimes, I find it difficult
to link the Wikidata item
with a Wikisource article,
because a Wikisource novel
might be split over several pages,
and there's an index page,
and there's perhaps a front page,
or something like that.
Do you have that problem,
or is that a general problem, or--
Yeah, that's one of the first ideas
on the wish list
for the Foundation, actually.
Yeah, because Wikipedia is on the--
if you know the [inaudible] organization,
Wikipedia is on the work level,
and Wikisource on the edition level.
So, already, you have a problem there.
And then, we have several editions
of the same work,
and we have sub-chapters
and things inside the edition.
So, yeah, that's a one-to-many problem,
which is hard to solve by nature.
But there's maybe a tool
that can help to solve that.
Hopefully.
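For the record, the WikiProject Books model handles the work/edition split with two inverse Wikidata properties: P747 ("has edition or translation") on the work item and P629 ("edition or translation of") on the edition item. The sketch below, with placeholder item IDs rather than real ones, shows the kind of consistency check that "better links between editions" implies; it is an illustration, not an existing tool.

```python
# Sketch: the WikiProject Books model splits a text into a "work" item and
# one or more "edition" items, linked by two inverse Wikidata properties:
#   P747 = has edition or translation   (work    -> edition)
#   P629 = edition or translation of    (edition -> work)
# Item IDs below are placeholders, not real Wikidata IDs.

works = {
    "Q_work": {"P747": ["Q_ed1", "Q_ed2"]},  # the work lists its editions
}
editions = {
    "Q_ed1": {"P629": "Q_work"},  # each edition points back to the work
    "Q_ed2": {"P629": "Q_work"},
}

def missing_backlinks(works, editions):
    """Find editions a work claims via P747 that do not point back via P629."""
    problems = []
    for work_id, claims in works.items():
        for ed_id in claims.get("P747", []):
            if editions.get(ed_id, {}).get("P629") != work_id:
                problems.append((work_id, ed_id))
    return problems

print(missing_backlinks(works, editions))  # an empty list means the links are consistent
```

A Wikisource index page would sit at the edition level in this model, with chapter subpages hanging off it, which is why the linking is a one-to-many problem.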
And that's time, ladies and gentlemen.
So, thank you very much, Nicolas.
(applause)
And please join me giving
one more round of applause
to all of our wonderful speakers.
(applause)