-
(host) ...this session
basically on time as well.
-
So yeah, these are the Inventaire guys.
-
And yeah, enjoy.
-
(laughter)
-
Thank you for being here.
-
We'll be presenting Inventaire
very quickly.
-
We would like to go more in depth
on how we work day to day
-
with Wikidata's shifting data.
-
A quick demo of...
-
This is Inventaire,
I hope a lot of people know it already.
-
It's in order to share books,
physical books
-
and everyone can just scan the ISBN
-
and share, lend, sell.
-
We're just making a relationship platform
in order to exchange books.
-
And of course,
it's reusing Wikidata's data.
-
That's the first part of the project.
-
The second part is
-
the Wikidata-federated
open bibliographic database
-
that we've been building
for a long time now,
-
five years, something like that.
-
There you go.
-
So we basically reuse Wikidata's items
that are in a local [inaudible]
-
that we regulate the updates,
-
and we also add extra items
to our local database
-
in order to cover books
that are not already on Wikidata.
-
And we are using a very
similar data model
-
in order to stay compatible with Wikidata's data.
-
This is a brand new entities map
-
which basically describes
how we model the data in Inventaire
-
and we will come back later
-
to the typing system
that we are strictly enforcing
-
compared to what Wikidata is doing.
-
Because, yeah, we do like ontologies,
we do like to talk about semantics,
-
but we are kind of dealing
with reality here
-
and we need to strictly type
what we are dealing with
-
especially with ontologies
and inheritance that have some troubles...
-
We have some trouble complying with them.
-
If you want to add something on that.
-
We'll get into the details after.
-
We have a preview of what that meant--
-
Which is really the part
where people start to scream
-
because the way we do typing...
-
We have different types of entities
we deal with in Inventaire
-
which are mainly works,
editions, series, and humans.
-
And the thing is that,
to answer the question
-
is this item in Wikidata a series,
a work, an edition?
-
You have different ways to do that
but none of them are both perfect
-
and fast, efficient and so the way
we do that at the moment
-
is that we have those lists of aliases
of what our owners...
-
That's the list of properties,
all those properties...
-
Well, I'm sorry that's the wrong type
-
That's the aliases,
the things we consider being humans
-
are P31Q5 obviously
but also things that are duo, sibling duo,
-
writer, house name, pseudonym
-
we consider that when we encounter
those things in an entity that's a human.
-
And humans are actually the simplest ones
-
because P31Q5 is one
of the most consistent ways
-
to type an entity in Wikidata
and that's great
-
and we want more of that.
(laughs)
-
But look at works, this is all
the things we consider to be works,
-
so book, Q5 and bla, bla, bla
and all those comic book,
-
comic strip, novella, graphic novel.
-
So every time you see those P31s,
we consider them to be works.
-
And yeah, we could go more on those
but that's our hack to make typing work.
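-
(note) The alias-list approach described here can be sketched roughly as follows. This is a minimal illustration, not Inventaire's actual code: the function name, the type labels, and the short lists of item ids are assumptions chosen for the example, and the real lists are much longer.

```python
# Hypothetical alias lists: each Inventaire-style type maps to the P31
# (instance of) values that are treated as that type. Illustrative only;
# the ids are believed to be human, book, literary work,
# "version, edition or translation", and book series.
TYPE_ALIASES = {
    "human": {"Q5"},
    "work": {"Q571", "Q7725634"},
    "edition": {"Q3331189"},
    "serie": {"Q277759"},
}

# Reverse index so that typing an entity is a plain dictionary lookup.
P31_TO_TYPE = {
    qid: type_name
    for type_name, qids in TYPE_ALIASES.items()
    for qid in qids
}


def get_entity_type(p31_values):
    """Return the first known type for a list of P31 values, else None."""
    for qid in p31_values:
        type_name = P31_TO_TYPE.get(qid)
        if type_name is not None:
            return type_name
    # A P31 value that is missing from the alias lists leaves the entity untyped.
    return None
```

The lookup is cheap, but any P31 value absent from the lists yields no type at all, which is the weakness the speakers come back to later.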
-
So now the problem in a way...
-
Wikidata is awesome, we all agree on that.
-
The only thing is that
it might change over time
-
and so how do we keep track of those
changes in our own local data model?
-
So those are the things that could
be redefined on the fly, overnight.
-
There are discussions, I know,
yet sometimes,
-
it's possible that we do have trouble
-
to redefine...
-
with properties that could be redefined.
-
For example that happened
a couple of--maybe six months ago
-
or something like that.
-
I think it's more than a year ago
but we have been considering it.
-
Between the time we detected this problem
-
and started to think about taking
measures about that six months ago.
-
So all of a sudden,
this property became...
-
Not all of a sudden
but we found it all of a sudden.
-
...became from languages,
-
what's exactly that [inaudible]
-
So this property was language
of edition...
-
- What was it?
- That was the property we were using
-
for edition's language and at some point
-
it was decided that there were too many
properties to talk about language,
-
so this one was transformed
into original language of film or TV show.
-
So we started having our data
be about TV shows all of a sudden
-
and that was wrong.
-
So the conclusion of this example
is don't recycle properties, please.
-
(laughter)
-
Other examples of shifting data--
maybe we don't have that much time
-
to go through all of it.
-
There are some examples, like when works
-
became editions on entities,
-
on items--that's also quite tricky
-
because then,
how do we categorize it again?
-
So we are strictly typing
-
and that helps us...
-
There are advantages to that,
lots of advantages.
-
It is a simplified world that we live in.
-
It's not as complex
as Wikidata reality is showing.
-
So every edition has
at least one associated work
-
that is something that we can rely on.
-
So if a work becomes an edition,
then that is sometimes a problem.
-
Editions data cannot be added
on a work then.
-
We are strict about that
even if Wikidata is not.
-
So we are enforcing a policy that
we would like to have on Wikidata
-
but we are only a small part
of a bigger system.
-
We have done autocomplete.
-
Well, that's something
we have demonstrated other times
-
but the idea for example when
we have genre properties,
-
we will just suggest--like the user
will start to type a genre
-
and only genres will be suggested
so it makes... it's like a simplified...
-
because we are strictly typing
-
we can have less weird input for the user
-
which is very interesting for us
-
because our users are not aware
of Wikidata
-
and all the things that are in Wikidata,
-
so we sort of simplify everything
as much as we can
-
but that's at the cost of flexibility.
-
The flexibility of Wikidata
is lost in this process.
-
So for instance...
Oh that's soon.
-
(laughs)
So we have to go faster maybe.
-
One of the cases that was,
how do we...
-
Wait.
-
The simplified typing system
is at the cost of how do we get--
-
we have the list of aliases of types
we saw earlier,
-
but sometimes we don't have all the types
-
so for example when we encounter
a P31 science fiction trilogy,
-
if we didn't have it in our alias list
that's breaking the system
-
and so we are back at this problem.
-
There are different ways to work on that
and so that's trying to make...
-
In suggestion, we talked about
that problem in our earlier presentation
-
and we were told,
"Yeah, you should do that with SPARQL."
-
Yes, that could mean asking for
-
is this entity in some way an instance
or subclass of written work,
-
this kind of thing.
-
And that's a very expensive query
and we can't do that for everything.
-
So that's why we have these aliases.
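-
(note) The kind of SPARQL check mentioned here might look like the sketch below. It only builds the query text and sends nothing over the network. The function name is hypothetical, and the item id used for "written work" is an assumption made for the example.

```python
import re

# Assumed here to be Wikidata's "written work" class.
WRITTEN_WORK = "Q47461344"


def build_is_written_work_query(qid, root=WRITTEN_WORK):
    """Build an ASK query: is `qid` an instance of `root` or of a subclass?"""
    for value in (qid, root):
        if not re.fullmatch(r"Q[1-9][0-9]*", value):
            raise ValueError(f"not a Wikidata item id: {value!r}")
    # P31 = instance of, P279 = subclass of.
    # "P279*" follows zero or more subclass links, which is what makes
    # this kind of query costly when it is run for every entity.
    return f"ASK {{ wd:{qid} wdt:P31/wdt:P279* wd:{root} }}"
```

Because the path walks the whole subclass tree, the result also changes whenever someone edits a subclass statement upstream.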
-
And also we have...
-
This lost flexibility,
for the sake of simplicity
-
we lose flexibility and that's how
we have this for examples maybe.
-
Yeah it's quite obvious
we cannot do much about it.
-
There are more than humans
that author books.
-
For example, there are collectives
-
and this is not yet taken
into account in Inventaire.
-
And editions can be a whole series,
-
lots of different possibilities
that do not fit into this reality.
-
That is the world.
-
Maybe going fast.
-
On the list of issues,
we have also these querying issues,
-
different strategies to try
to be both efficient
-
and complete in the way we find all
the works of a given author.
-
And we can't go over all those subclasses
because that's too expensive
-
but at the same time, we can't just ask
for all the items that have a P50,
-
an author, because editions
also have this P50.
-
And so, yeah,
that's the kind of problems we have.
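-
(note) One possible middle ground between the two strategies just described is to ask for P50 items whose P31 value is in a fixed list of work types, without walking subclasses. The sketch below is an assumption about how that could be written, not the query Inventaire actually runs, and the function name is hypothetical.

```python
import re


def build_author_works_query(author_qid, work_type_qids):
    """Build a SELECT query for items by an author, limited to listed types."""
    ids = [author_qid, *work_type_qids]
    if not work_type_qids or any(
        not re.fullmatch(r"Q[1-9][0-9]*", item_id) for item_id in ids
    ):
        raise ValueError("expected a non-empty list of Wikidata item ids")
    # Sorted so that the generated text is stable, e.g. for use as a cache key.
    values = " ".join(f"wd:{qid}" for qid in sorted(work_type_qids))
    # P50 = author, P31 = instance of. Editions also carry P50, so the
    # VALUES clause keeps only items typed with one of the listed classes.
    return (
        "SELECT DISTINCT ?work WHERE { "
        f"VALUES ?type {{ {values} }} "
        f"?work wdt:P50 wd:{author_qid} ; wdt:P31 ?type . "
        "}"
    )
```

This avoids the expensive subclass traversal, at the price of missing works typed with a class that is not in the list.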
-
On the ideas we are playing with
we have what was...
-
yeah, the concept of extending entities
locally could be a solution
-
to some of the problems we presented.
-
It was mentioned earlier as shadow items
but maybe it's not such a good name,
-
so we will call it
"extend entities locally"
-
which means adding statements locally
on an item that is on Wikidata
-
but without overwriting
because that would be crazy.
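-
(note) As a rough sketch of what "extending without overwriting" could mean in code: local statements are added next to the Wikidata ones, and nothing coming from Wikidata is removed or replaced. The function name and the claims layout (property id mapped to a list of values) are assumptions made for the example.

```python
def extend_entity(wikidata_claims, local_claims):
    """Merge local claims into a copy of the Wikidata claims, additively."""
    # Copy first so that the Wikidata data itself is never mutated.
    merged = {prop: list(values) for prop, values in wikidata_claims.items()}
    for prop, values in local_claims.items():
        target = merged.setdefault(prop, [])
        for value in values:
            # Only add what is missing; existing Wikidata values stay as-is.
            if value not in target:
                target.append(value)
    return merged
```

For example, a local language statement can be attached to an item that has only an author on Wikidata, and the author statement comes through unchanged.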
-
That would have some problems though
because for copyrighted work,
-
then we could actually work with that
which is not possible through Wikidata.
-
We also do not have to agree
with Wikidata's community
-
in order to enforce our schemas.
-
We can also add links on
non-Wikidata items from a Wikidata item.
-
But the [inaudible] is quite huge.
-
We have to follow Wikidata's algorithm
in order to make it compatible.
-
And it's problematic
for pushing data to Wikidata
-
if we lack some information.
-
We also reference,
we would like to push it in order to...
-
go through it, sorry.
-
Actually it's quite the end.
-
- This one?
- No, no, please, please.
-
(laughs) So to keep updated
we have to sometimes
-
make mass updates of Inventaire data,
-
and that's also the occasion
of great scripting,
-
and that's not always elegant
but at least it's happening.
-
So we need to make this great
to transition
-
and not to make
this language of TV show for instance.
-
Yes, and maybe for a final note
-
on an argument for a small Wikidata
-
because we have problems
with the query service update time
-
which has been mentioned
a few times, and this is...
-
It seems to be due to the big ambition
of Wikidata to cover all sorts of items
-
including scientific articles
and so many scientific articles.
-
Maybe they are not the only ones to blame
-
but we end up having this large delay
-
between an update and the propagation
on the Wikidata query service,
-
and that's a problem for us
-
because for example you will have
a user modify--adding,
-
connecting your work
with an author on Wikidata
-
and then going to the author page
and expecting that their contribution
-
be visible on the author page
and they won't see it happening
-
and so, we need to find ways to tell them
that it's going to be propagated
-
but we don't know when.
-
And then we have the problems of
-
we cache the request
to get the author data,
-
and we don't know when we shall update
this cached version of the query.
-
So that's the kind of problems we have
with the query delays, the query service.
-
And so having a smaller Wikidata
-
could maybe help us to not have
to deal with this problem
-
because then we could just consider
that the update will be quick
-
and that we can just maybe, in a few,
-
less than ten minutes update our version
-
and at least be close
to what people contributed.
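-
(note) The caching problem described here can be illustrated with a small time-based cache. This is a sketch under assumptions: the class and parameter names are hypothetical, and the ten-minute lifetime in the usage note only echoes the figure the speaker hopes for, it is not a measured propagation delay.

```python
import time


class TTLCache:
    """Cache query results and refetch them once they are older than a TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so that expiry can be tested
        self._entries = {}  # key -> (time stored, value)

    def get(self, key, fetch):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: reuse the cached result
        value = fetch()  # expired or missing: run the query again
        self._entries[key] = (now, value)
        return value
```

With `TTLCache(600)`, an author page would be at most ten minutes behind the query service. That only helps if the query service itself has caught up with the edit, which is the part the speakers say they cannot predict.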
-
That's it.
-
Thanks for listening.
-
If you have any questions or comments
we'll be happy to talk now
-
or after also during the event,
-
and we'll be here also on Sunday
to talk about Wikibase.
-
And if you have questions, yeah?
-
(host) Why don't you give these guys
a round of applause.
-
(applause)
-
Meanwhile we can look at the map.
-
(host) We have quite
a generous question time
-
because these guys have finished
with plenty of time to go
-
so lots of questions.
-
Yeah, the idea was to put on the table
the pain we encounter
-
in daily life working with Wikidata
-
and to have your ideas and comments
-
and how much you shared
of those pain points,
-
and what solution also you might
have found to tackle them.
-
More general questions are also possible.
-
(host) I'm going to go
to the chap in the front
-
and then we'll go backwards as we go.
-
(man) I guess first off, it was very
therapeutic to hear that all this pain
-
I've encountered personally,
it's like "Oh yes, it's not just me."
-
(laughter)
-
But one thing I've encountered
with the schema issues is that,
-
yes, my go-to approach
is always just like
-
Oh, let's just find all subclasses
of a specific instance
-
to solve this in it.
-
I've encountered
a lot of the issues you have
-
though it seems like going
in the reverse
-
has helped solve that issue
-
at least for my use cases,
for instance I noticed that
-
all humans were instances
of a manufactured component
-
and I just said, "Okay, let me find
all classes that a human is an instance of,"
-
and this helped me go through
and like find these errors in schema
-
of subclass relationships,
-
and I was wondering if you
had gone through any of these processes?
-
Were there any other approaches,
more to this, to find errors?
-
Yeah, we went through
some trial and error there
-
and we encountered things like...
-
We have this very important distinction
between editions and works,
-
but at some point,
editions were a subclass of something
-
that was a subclass of works
and so the separation was falling apart
-
and so that's the example of one
of the things that were modified
-
because someone was thinking the world
was different than what it was.
-
And so that's how we came
to this more blacklist, whitelist system,
-
like a hard-coded types list.
-
Yes, we are also coming from the editions
and from the ISBNs of people's books
-
then, we have to go upward in the classes
in order to find out that somehow
-
this edition inherits from the work
and then how do you do that?
-
Like that's very problematic.
-
(man) I guess I have some SPARQL queries
that might be useful for that.
-
It just generates a nice graph
of the instance of subclass.
-
I didn't write it.
-
Someone wrote it for me
when I described this problem
-
so I can't take credit
but it might be useful for that.
-
But without cyclic problems,
-
like how do you deal with it
when there are inconsistencies
-
or things that are like editions,
instance of work and... ?
-
(man) I think just visualizing it
and it's very easy...
-
No, it's not very easy, it's possible
to then find these inconsistencies.
-
But I also think there
are loops in Wikidata,
-
for instance a concept
is itself an instance of concept.
-
It's not a useful subclass or instance of
but it's a valid one.
-
But do you generate this map
and see if there are errors
-
- but use the results other than--
- (man) I guess this was a...
-
Oh, I noticed that Douglas Adams
is an instance of this
-
and there are these errors
and then like just pitching it
-
to the communities, like fix this problem.
-
But you don't use
this subclasses query live?
-
(man) No, not live, no.
-
It was more of a debugging
-
and then realizing
it was small enough to fix.
-
(man 2) I suppose I'll just comment about
the issues around books,
-
so I would say that we should
never use a book as an instance,
-
and we should try to move books'
instances either to works or editions,
-
and perhaps you can agree to that.
-
And furthermore,
when I come to this, a book instance,
-
I would say that perhaps sometimes
rather than converting to a work,
-
a literary work, I would convert it
to an edition instead.
-
For example if it has ISBN numbers,
-
I would say that it's more
like an edition
-
and perhaps meant like an edition.
-
Also, for example if it's cited,
I suppose typically in citations
-
you are citing an edition
rather than a work.
-
But I imagine according to you that
this way could generate problems for you,
-
so once you'd rather sort of convert
the book to a work
-
and then create an instance of...
-
...an instance of an edition
instead of that
-
and move, for example, the ISBN numbers
-
and perhaps other identifiers
to that item.
-
We have seen the different criteria
-
depending on who was wanting
to make the separation.
-
So people who are interested
in the citation
-
or coming from Wikisource
-
want to convert pretty much
everything to editions
-
and people who are more interested
in the works as abstract categories
-
for the editions try
to convert everything to works.
-
And because in the case you described,
-
rather than considering that something
with an ISBN is rather an edition,
-
I will delete the ISBN
and consider it a work,
-
and in the case what we see often is that
-
there are Wikipedia articles
on those items
-
and those Wikipedia articles
don't talk about a specific edition
-
but about the concept of the work more.
-
And so these are the kind of problems
that are discussed in WikiProject books
-
and we are not seeing the end of it
and that's why for the moment...
-
(man 2) I want to say that if it's...
-
If there's a Wikipedia article
about the book,
-
then it should be a work, the item.
-
Lots of them have ISBNs
also on the Wikipedia page.
-
(man 2) I suppose that should
then be removed
-
perhaps over to the edition item.
-
It would be nice to have
a consensus on that.
-
It's an ongoing discussion
on WikiProject books, I guess.
-
(host) We have time for maybe
just one more quick question.
-
Excellent!
-
Well, if you'd like to show
your appreciation again for these guys,
-
that would be great.
-
(applause)