(host) ...this session
basically on time as well.
So yeah, this is the Inventaire guys.
And yeah, enjoy.
Thank you for being here.
We'll be presenting Inventaire
very quickly.
We would like to go more in depth
on how we work day to day
with Wikidata's shifting data.
A quick demo of...
This is Inventaire,
I hope a lot of people know it already.
It's in order to share books,
physical books
and everyone can just scan the ISBN
and share, lend, sell.
We're just making a relationship platform
in order to exchange books.
And of course,
it's reusing Wikidata's data.
That's the first part of the project.
The second part is
the Wikidata-federated
open bibliographic database
that we've been building
for a long time now,
five years, something like that.
There you go.
So we basically reuse Wikidata's items
that are in a local [inaudible]
that we regulate the updates,
and we also add extra items
to our local database
in order to comply with books
that are not already on Wikidata.
And we are using a very
similar data model
in order to comply with Wikidata's data.
This is a brand new entities map
which basically describes
how we model the data in Inventaire
and we'll come back later
to the typing system
that we are strictly enforcing
compared to what Wikidata is doing.
Because, yeah, we do like ontologies,
we do like to talk about semantics,
but we are kind of dealing
with reality here
and we need to strictly type
what we are dealing with
especially with ontologies
and inheritance that have some troubles...
We have some trouble complying with them.
If you want to add something on that.
We'll get into the details after.
We have a preview of what that meant--
Which is really the part
where people start to scream
because the way we do typing...
We have different types of entities
we deal with in Inventaire
which are mainly works,
editions, series, and humans.
And the thing is that,
to answer the question
is this item in Wikidata a series,
a work, an edition?
You have different ways to do that
but none of them are both perfect
and fast, efficient and so the way
we do that at the moment
is that we have those lists of aliases
of what our owners...
That's the list of properties,
all those properties...
Well, I'm sorry that's the wrong type
That's the aliases,
the things we consider being humans
are P31Q5 obviously
but also things that are duo, sibling duo,
writer, house name, pseudonym
we consider that when we encounter
those things in an entity that's a human.
And humans are actually the simplest ones
because P31 Q5 is one
of the most consistent ways
to type an entity in Wikidata
and that's great
and we want more of that.
But look at works, this is all
the things we consider to be works,
so book, Q5 and bla, bla, bla
and all those comic book,
comic strip, novella, graphic novel.
So every time you see those P31s,
we consider them to be works.
And yeah, we could go more on those
but that's our hack to make typing work.
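A minimal sketch of the alias lists described above,
assuming each P31 value maps to exactly one local type.
Only Q5 comes from the talk; the other ids and the names
TYPE_ALIASES and get_local_type are illustrative
assumptions, not Inventaire's actual code.

```python
# Alias lists: P31 ("instance of") values grouped under one strict local type.
# Q5 (human) is from the talk; the other ids are illustrative and should be
# checked against Wikidata before being relied on.
TYPE_ALIASES = {
    "human": {"Q5"},
    "work": {"Q571", "Q7725634"},
    "edition": {"Q3331189"},
    "series": {"Q277759"},
}

# Invert the lists once so that each lookup is a plain dictionary access.
P31_TO_TYPE = {
    qid: local_type
    for local_type, qids in TYPE_ALIASES.items()
    for qid in qids
}


def get_local_type(p31_values):
    """Return the local type of the first known P31 value, or None."""
    for qid in p31_values:
        local_type = P31_TO_TYPE.get(qid)
        if local_type is not None:
            return local_type
    return None


print(get_local_type(["Q5"]))         # human
print(get_local_type(["Q_UNKNOWN"]))  # None: the value is missing from the lists
```

An unknown P31 value yields None, which is the failure
mode the talk comes back to later: the lists have to be
extended by hand.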
So now the problem in a way...
Wikidata is awesome, we all agree on that.
The only thing is that
it might change over time
and so how do we keep track of those
changes in our own local data model?
So those are the things that could
be redefined on the fly, overnight.
There are discussions, I know,
yet sometimes
we do have trouble
with redefinitions...
with properties that get redefined.
For example that happened
a couple of--maybe six months ago
or something like that.
I think it's more than a year ago
but we have been considering it.
Between the time we detected this problem
and started to think about taking
measures about that six months ago.
So all of a sudden,
this property became...
Not all of a sudden
but we found it all of a sudden.
...became from languages,
what's exactly that [inaudible]
So this property was language
of edition...
- What was it?
- That was the property we were using
for edition's language and at some point
it was decided that there were too many
properties to talk about language,
so this one was transformed
into original language of film or TV show.
So we started having our data
be about TV show all of a sudden
and that was wrong.
So the conclusion of this example
is don't recycle properties, please.
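One way to notice such a redefinition early, sketched
under assumptions: keep a snapshot of the labels of the
properties you depend on and compare it with freshly
fetched labels. The id P_EDITION_LANG is a placeholder
(the talk does not name the property), and the fetch
itself is left out.

```python
# Snapshot of the labels we expect for the properties we rely on.
# P_EDITION_LANG is a hypothetical placeholder id, not a real property.
EXPECTED_LABELS = {
    "P31": "instance of",
    "P50": "author",
    "P_EDITION_LANG": "language of edition",
}


def find_redefined_properties(expected, current):
    """Return the property ids whose current label differs from the snapshot."""
    return sorted(
        pid for pid, label in expected.items()
        if current.get(pid) != label
    )


# In practice `current` would come from a fresh fetch; here it is hard-coded.
current_labels = {
    "P31": "instance of",
    "P50": "author",
    "P_EDITION_LANG": "original language of film or TV show",
}
print(find_redefined_properties(EXPECTED_LABELS, current_labels))
```

A label change is only a hint, since labels also get
reworded without any change of meaning, so a hit would
still need a human look.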
Other examples of shifting data--
maybe we don't have that much time
to go through all of it.
There are some examples, like when works
became editions on entities,
on items--that's also quite tricky
because then,
how do we categorize it again?
So we are strictly typing
and that helps us...
There are advantages to that,
lots of advantages.
It is a simplified world that we live in.
It's not as complex
as Wikidata reality is showing.
So every edition has
at least one associated work
that is something that we can rely on.
So if a work becomes an edition,
then that sometimes causes a problem.
Editions data cannot be added
on a work then.
We are strict about that
even if Wikidata is not.
So we are enforcing a policy that
we would like to have on Wikidata
but we are only a small part
of a bigger system.
We have done autocomplete.
Well, that's something
we have demonstrated other times
but the idea for example when
we have genre properties,
we will just suggest--like the user
will start to type a genre
and only genres will be suggested
so it makes... it's like a simplified...
because we are strictly typing
we can have less weird input for the user
which is very interesting for us
because our users are not aware
of Wikidata
and all the things that are in Wikidata,
so we sort of simplify everything
as much as we can
but that's at the cost of flexibility.
The flexibility of Wikidata
is lost in this process.
So for instance...
Oh that's soon.
So we have to go faster maybe.
One of the cases that was,
how do we...
The simplified typing system
is at the cost of how do we get--
we have the list of aliases of types
we saw earlier,
but sometimes we don't have all the types
so for example when we encounter
a P31 science fiction trilogy,
if we didn't have it in our alias list
that's breaking the system
and so we are back at this problem.
There are different ways to work on that
and so that's trying to make...
In suggestion, we talked about
that problem in our earlier presentation
and we were told,
"Yeah, you should do that with SPARQL."
Yes, that could mean asking for
is this entity an instance, in some way,
or a subclass of written work,
these kind of things.
And that's a very expensive query
and we can't do that for everything.
So that's why we have these aliases.
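For reference, the kind of query being suggested could
look like the sketch below: a small helper that builds
an ASK query with a P31/P279* property path. The helper
name is an assumption, and Q47461344 is assumed here to
be the id of written work.

```python
def build_type_check_query(entity_id, class_id):
    """Build an ASK query: is the entity an instance of the class,
    or of any class reachable from it through "subclass of" (P279)?"""
    return (
        "ASK { "
        f"wd:{entity_id} wdt:P31/wdt:P279* wd:{class_id} "
        "}"
    )


# Q42 is Douglas Adams; Q47461344 is assumed to be "written work".
print(build_type_check_query("Q42", "Q47461344"))
```

The `P279*` part walks the whole subclass tree for every
entity checked, which is why running it for everything
is described as too expensive.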
And also we have...
This lost flexibility,
for the sake of simplicity
we lose flexibility and that's how
we have this for examples maybe.
Yeah it's quite obvious
we cannot do much about it.
There are more than humans
that author books.
For example, there are collectives
and this is not yet taken
into account in Inventaire.
And editions can be a whole series,
lots of different possibilities
that do not fit into this reality.
That is the world.
Maybe going fast.
On the list of issues,
we have also these querying issues,
different strategies to try
to be both efficient
and complete in the way we find all
the works of a given author.
And we can't go over all those subclasses
because that's too expensive
but at the same time, we can't just ask
for all the items that have a P50,
an author, because editions
also have this P50.
And so, yeah,
that's the kind of problems we have.
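A sketch of one possible compromise, assuming the cheap
query (everything with a P50 pointing at the author) has
already run and returned each item with its P31 values:
the results are then filtered against the work alias
list. Item ids and alias ids are illustrative.

```python
# Illustrative work aliases; real lists would be much longer.
WORK_P31_ALIASES = {"Q571", "Q7725634"}


def keep_works(items_with_author):
    """From items sharing a P50 author, keep the ids of those typed as works."""
    return [
        item["id"]
        for item in items_with_author
        if WORK_P31_ALIASES & set(item["P31"])
    ]


# Hypothetical result of "all items with P50 = some author".
results = [
    {"id": "Q_WORK_A", "P31": ["Q7725634"]},
    {"id": "Q_EDITION_A", "P31": ["Q3331189"]},
    {"id": "Q_UNTYPED", "P31": []},
]
print(keep_works(results))
```

The filter is fast but only as complete as the alias
list, so a work typed with an unlisted class is silently
dropped.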
On the ideas we are playing with
we have what was...
yeah, the concept of extending entities
locally could be a solution
to some of the problems we presented.
It was mentioned earlier as shadow items
but maybe it's not such a good name,
so we will call it
"extend entities locally"
which means adding statements locally
on an item that is on Wikidata
but without overwriting
because that would be crazy.
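A minimal sketch of what extending without overwriting
could mean, assuming claims are stored as a mapping from
property id to a list of values. The function name and
data shapes are assumptions for illustration.

```python
def extend_entity(wikidata_claims, local_claims):
    """Merge local statements into a copy of the Wikidata claims.

    Wikidata values are never removed or replaced; local values are
    appended only when they are not already present.
    """
    merged = {prop: list(values) for prop, values in wikidata_claims.items()}
    for prop, values in local_claims.items():
        existing = merged.setdefault(prop, [])
        for value in values:
            if value not in existing:
                existing.append(value)
    return merged


# Hypothetical claims: placeholder ids, not real data.
wikidata_claims = {"P50": ["Q_AUTHOR_A"]}
local_claims = {"P50": ["Q_AUTHOR_A", "local:author-b"], "P_LOCAL": ["x"]}
print(extend_entity(wikidata_claims, local_claims))
```

Because the inputs are copied, the Wikidata side stays
untouched and the local layer can be recomputed after
each update.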
That would have some problems though
because for copyrighted work,
then we could actually work with that
which is not possible through Wikidata.
We also do not have to agree
with Wikidata's community
in order to enforce our schemas.
We can also add links on
non-Wikidata items from a Wikidata item.
But the [inaudible] is quite huge.
We have to follow Wikidata's algorithm
in order to make it compatible.
And it's problematic
for pushing data to Wikidata
if we lack some information.
We also reference,
we would like to push it in order to...
go through it, sorry.
Actually it's quite the end.
- This one?
- No, no, please, please.
(laughs) So to keep updated
we have to sometimes
make mass updates of Inventaire data,
and that's also the occasion
of great scripting,
and that's not always elegant
but at least it's happening.
So we need to make this great
to transition
and not to make
this language of TV show for instance.
Yes, and maybe for a final note
on an argument for a small Wikidata
because we have problems
with the query service update time
which has been mentioned
a few times, and this is...
It seems to be due to the big ambition
of Wikidata to cover all sort of items
including scientific articles
and so many scientific articles.
Maybe they are not the only ones to blame
but we end up having this large delay
between an update and the propagation
on the Wikidata query service,
and that's a problem for us
because for example you will have
a user modify--adding,
connecting your work
with an author on Wikidata
and then going to the author page
and expecting that their contribution
be visible on the author page
and they won't see it happening
and so, we need to find ways to tell them
that it's going to be propagated
but we don't know when.
And then we have the problems of
we cache the request
to get the author data,
and we don't know when we shall update
this cached version of the query.
So that's the kind of problems we have
with the query delays, the query service.
And so having a smaller Wikidata
could maybe help us to not have
to deal with this problem
because then we could just consider
that the update will be quick
and that we can just maybe, in a few,
less than ten minutes update our version
and at least be close
to what people contributed.
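The caching side of this could be sketched as a cache
with a fixed time to live: a cached query result is
reused until it is older than the assumed propagation
delay, then fetched again. The class name and the
600-second value (the "less than ten minutes" above)
are assumptions.

```python
import time


class TtlCache:
    """Cache query results and refetch them once they are too old."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (time stored, value)

    def get(self, key, fetch):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh enough
        value = fetch(key)   # missing or stale: run the query again
        self.entries[key] = (now, value)
        return value


# 600 seconds is an assumed upper bound on the propagation delay.
cache = TtlCache(ttl_seconds=600)
works = cache.get("Q_AUTHOR_A", lambda author_id: ["Q_WORK_A"])
print(works)
```

This only works when the delay has a known bound; with
an unpredictable delay no fixed time to live is safe.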
That's it.
Thanks for listening.
If you have any questions or comments
we'll be happy to talk now
or after also during the event,
and we'll be here also on Sunday
to talk about Wikibase.
And if you have questions, yeah?
(host) Why don't you give these guys
a round of applause.
Meanwhile we can look at the map.
(host) We have quite
a generous question time
because these guys have finished
with plenty of time to go
so lots of questions.
Yeah, the idea was to put on the table
the pain we encounter
in daily life working with Wikidata
and to have your ideas and comments
and how much you shared
of those pain points,
and what solution also you might
have found to tackle them.
Also, more general questions are possible.
(host) I'm going to go
to the chap in the front
and then we'll go backwards as we go.
(man) I guess first off, it was very
therapeutic to hear that all this pain
I've encountered personally,
it's like "Oh yes, it's not just me."
But one thing I've encountered
with the schema issues is that,
yes, my go to approach
is always just like
"Oh, let's just find all subclasses
of a specific instance
to solve this."
I've encountered
a lot of the issues you have
though it seems like going
in the reverse
has helped solve that issue
at least for my use cases,
for instance I noticed that
all humans were instances
of a manufactured component
and I just said, "Okay, let me find
all classes that a human is an instance of,"
and this helped me go through
and like find these errors in schema
of subclass relationships,
and I was wondering if you
had gone through any of these processes?
Were there any other approaches,
more to this, to find errors?
Yeah, we went through
some trial and error there
and we encountered things like...
We have this very important distinction
between editions and works,
but at some point,
editions were a subclass of something
that was a subclass of works
and so the separation was falling apart
and so that's the example of one
of the things that were modified
because someone was thinking the world
was different than what it was.
And so that's how we came
to this more blacklist, whitelist system,
like our lists of types.
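The edition and work mix-up just described can be
checked offline, as in this sketch: walk the subclass
links from one class and see whether another is
reachable, remembering visited classes so that loops
in the graph do not hang the walk. The graph and class
names are hypothetical.

```python
def is_subclass_of(subclass_of, start, target):
    """True if `target` is reachable from `start` through subclass links."""
    seen = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue  # already explored: protects against cycles
        seen.add(current)
        stack.extend(subclass_of.get(current, ()))
    return False


# Hypothetical subclass graph, including a class that points to itself.
graph = {
    "edition": ["version"],
    "version": ["work"],
    "work": ["concept"],
    "concept": ["concept"],
}
print(is_subclass_of(graph, "edition", "work"))  # True: the separation is broken
print(is_subclass_of(graph, "work", "edition"))  # False
```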
Yes, we are also coming from the editions
and from the ISBNs of people's books
then, we have to go upward in the classes
in order to find out that somehow
this edition inherits from the work
and then how do you do that?
Like that's very problematic.
(man) I guess I have some SPARQL queries
that might be useful for that.
It just generates a nice graph
of the instance of subclass.
I didn't write it.
Someone wrote it for me
when I described this problem
so I can't take credit
but it might be useful for that.
But without cyclic problems,
like how do you deal with it
when there are inconsistencies
or things that are like editions,
instance of work and... ?
(man) I think just visualizing it
and it's very easy...
No, it's not very easy, it's possible
to then find these inconsistencies.
But I also think there
are loops in Wikidata,
for instance a concept
is itself an instance of concept.
It's not a useful subclass or instance of
but it's a valid one.
But do you generate this map
and see if there are errors
- but use the results other than--
- (man) I guess this was a...
Oh, I noticed that Douglas Adams
is an instance of this
and there are these errors
and then like just pitching it
to the communities, like fix this problem.
But you don't use
this subclasses query live?
(man) No, not live, no.
It was more of a debugging
and then realizing
it was small enough to fix.
(man 2) I suppose I'll just comment about
the issues around books,
so I would say that we should
never use a book as an instance,
and we should try to move books'
instances either to works or editions,
and perhaps you can agree to that.
And furthermore,
when I come to this, a book instance,
I would say that perhaps sometimes
rather than converting to a work,
a literary work, I would convert it
to an edition instead.
For example if it has ISBN numbers,
I would say that it's more
like an edition
and perhaps meant like an edition.
Also, for example if it's cited,
I suppose typically in citations
you are citing an edition
rather than a work.
But I imagine according to you that
this way could generate problems for you,
so one would rather sort of convert
the book to a work
and then create an instance of...
...an instance of an edition
instead of that
and move, for example, the ISBN numbers
and perhaps other identifiers
to that item.
We have seen the different criteria
depending on who was wanting
to make the separation.
So people who are interested
in the citation
or coming from Wikisource
want to convert pretty much
everything to editions
and people who are more interested
in the works as abstract categories
for the editions try
to convert everything to works.
And because in the case you described,
rather than considering that something
with an ISBN is rather an edition,
I will delete the ISBN
and consider it a work,
and in the case what we see often is that
there are Wikipedia articles
on those items
and those Wikipedia articles
don't talk about a specific edition
but about the concept of the work more.
And so these are the kind of problems
that are discussed in WikiProject Books
and we are not seeing the end of it
and that's why for the moment...
(man 2) I want to say that if it's...
If there's a Wikipedia article
about the book,
then it should be a work, the item.
Lots of them have ISBNs
also on the Wikipedia page.
(man 2) I suppose that should
then be removed
perhaps over to the edition item.
It would be nice to have
a consensus on that.
It's an ongoing discussion
on WikiProject Books, I guess.
(host) We have time for maybe
just one more quick question.
Well, if you'd like to show
your appreciation again for these guys,
that would be great.