Hi, so before we start, quickly,
so I'm Jean-Fred,
I'm a Wikidata volunteer.
Hi, I am Envel,
and I'm also a Wikidata volunteer.
And I'm Tracy, and I get paid (chuckles)
to volunteer for Wikidata,
but I'm also enthusiastic
to be here today,
and I work for a research project.
Alright, thanks for coming
to our presentation:
Sum of all video games:
our road to make Wikidata
the hub of all video game metadata.
So, first off, why should
we even care about video games,
like aren't they just
like kids playing Fortnite
or something at night?
So video games
have been here for a long time,
since the '70s or '60s or '40s.
It depends who you ask.
You can check Wikipedia's
extensive coverage
of what is even a game.
It's a major cultural industry.
More than 2.5 billion people
play in the world,
and we estimate that, at the very least,
100,000-200,000 video games
have been published since that time
and that's not counting games
published on the Play Store--
then you get into the millions,
which is not that much
when you're on Wikidata.
So a little overview of the current state
of video games on Wikidata.
These numbers are also
on our poster on the ground floor,
so you can also see them there.
So we have video games, that's Q7889,
and we have 38,000 of them,
which is not that much
considering that there are
at least 200,000, as I mentioned.
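The item count above comes from the Wikidata Query Service. As a minimal sketch (Python, standard library only), the following builds a SPARQL query counting items with instance of (P31) video game (Q7889) and parses the JSON result. The endpoint is the public WDQS one; the User-Agent string is a placeholder, and the live number will differ from the figure quoted in the talk.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def count_query(class_qid: str) -> str:
    """Build a SPARQL query counting direct instances (P31) of a class."""
    return (
        "SELECT (COUNT(?item) AS ?count) WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        "}"
    )


def parse_count(results: dict) -> int:
    """Extract the single ?count binding from SPARQL JSON results."""
    return int(results["results"]["bindings"][0]["count"]["value"])


def run_query(query: str) -> dict:
    """Send a query to WDQS and return the decoded JSON results (needs network)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Placeholder User-Agent; WDQS asks clients to identify themselves.
    request = urllib.request.Request(url, headers={"User-Agent": "video-game-stats-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)

# Usage (needs network): print(parse_count(run_query(count_query("Q7889"))))
```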
We also have expansion packs,
DLCs, and compilations
but we also have, for example,
game controllers.
We have a lot of game consoles,
about 700-- that's a lot.
We have an extensive ontology
of video game genres,
that's pretty cool, 200 of them,
and [inaudible] a bit on magazines also.
Maybe video games
could be a satellite even for WikiCite
I don't know. (chuckles)
But what about outside of Wikidata?
There are a lot of databases
out there about video games.
You may have heard about
some very big ones,
like Mobygames or IGDB.
There are also a lot
of very special-interest databases--
databases that only cover certain types.
The Visual Novel Database
only covers the niche genre
of visual novels.
You have databases that are only about
games published on the Commodore 64,
and so on.
But you also have government agencies
and commercial players,
government agencies [inaudible],
called the rating agencies,
the ones that put a little label saying
it's not suitable for your kids under 16.
The problem is that
there is no common identifier
around all of these databases
that binds them together.
There is no cross-linking,
or very little.
Some databases might link
to a neighbor's or friend's database;
the Amiga databases, for example,
talk to each other a little bit.
But you won't have
one easy way of seeing it all.
So the databases differ
in coverage and specialization,
and that often comes
with conceptual differences too.
A database might consider
a game as a work,
if you're into the FRBR model,
or as an edition,
or as a particular console version.
So there is a lot of variation in granularity.
And that's important in terms of coverage
because some databases--
for example, Mobygames has a lot
of information about a lot of things,
but it doesn't have much information
about the games that were published
on the early French computers,
like the Oric
or the Thomson TO and MO series.
You will find those
in French databases instead.
And if you go to Eastern video games,
from China or Japan,
they're not very well covered
in Western databases.
Enter WikiProject video games.
(cheers and applause)
(woman) Whoo-hoo!
We didn't make that one, actually.
So it lives at that address
and there are a lot of subpages,
and we're going to go through a little bit
of what this project is made of.
As usual,
we'll separate that
into what's old and what's new,
what's borrowed and what's blue.
So, for the old:
like a lot of WikiProjects, we have
an ontology description
with all the properties.
There are currently 64 properties,
mostly for games,
but also about series or hardware.
And we have a fairly extensive, I think--
how to put it-- set of categories.
We have properties about the staff,
but also about the narrative universe
or about the gameplay,
like how many players there are.
So you can explore this;
it's kind of very exciting.
We also have example queries.
If we have time at the end,
we might show off some,
but you can just explore them yourself.
We also have something new,
things that don't exist
in other WikiProjects on Wikidata.
For example, we have an Activity Log.
You can see it here.
On this Activity Log, we track
the activity of the project.
So when we publish a blog post
or an article somewhere,
we add it here.
When we create a new identifier property
or any property related to video games,
we also add it here.
We also have achievements,
like in January, we added a condition
of an external identifier.
Another thing that we have
is a Tasks List.
Newcomers to the project
can use the Tasks List
to find things to do.
It can be [inaudible],
so we give them an insight to [inaudible]
and how to do that.
It's also where we like [inaudible]
[inaudible]
We also have something borrowed.
We have a lot of pages
of statistics reports.
We also have external identifiers
that [inaudible]--
you can see it here--
where we track--
I don't know if you can see it--
but we have more than
100 external identifiers
for video games,
so this is big, huge.
And here we can see, for each item--
just a little peek--
and also the completion of each identifier.
So, some of these things we borrowed
from Sum of All Paintings,
and other things, which brings us to blue.
The InteGraality tool was made
initially for Sum of All Paintings;
I extended it for video games,
and then I might as well
do it for everybody.
So, yeah, one day we'll get all of these.
These are the core properties,
genre, developer, and publisher,
broken down by video game platform,
so Windows,
PlayStation consoles, and so on.
So, as you can see,
we have a lot of work to do
even for
the very basic core properties.
So, yeah, one day,
all of that will be blue.
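The InteGraality grid is, at heart, a table of percentages: for each platform, what share of video game items carry each core property. The sketch below reproduces only that arithmetic on a small, made-up sample; it is not InteGraality's code, and the item data and platform names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample: each item lists its platforms and the core properties it has.
ITEMS = [
    {"platforms": ["Windows"], "props": {"genre", "developer", "publisher"}},
    {"platforms": ["Windows", "PlayStation"], "props": {"genre"}},
    {"platforms": ["PlayStation"], "props": {"developer"}},
    {"platforms": ["PlayStation"], "props": set()},
]
CORE_PROPERTIES = ["genre", "developer", "publisher"]


def coverage_grid(items, properties):
    """Return {platform: {property: % of that platform's items having it}}."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for item in items:
        # An item on several platforms counts once in each platform's row.
        for platform in item["platforms"]:
            totals[platform] += 1
            for prop in properties:
                if prop in item["props"]:
                    hits[platform][prop] += 1
    return {
        platform: {prop: round(100 * hits[platform][prop] / total, 1) for prop in properties}
        for platform, total in totals.items()
    }
```

Printing `coverage_grid(ITEMS, CORE_PROPERTIES)` gives one row per platform, the same shape as the blue-and-red table on the slide.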
What have we been doing?
A lot of what we've been doing
is creating identifier properties
for all these external databases
and aligning them.
As Envel mentioned, we have created
over 100 external identifier properties--
that covers very big databases
and very tiny ones.
We've been using the Mix'n'match tool
extensively for matching.
And sometimes we've been using things
a bit more advanced
that Envel will detail in a moment.
Yeah, so 100 external
identifier properties created
in roughly a year to two years
and over 16 Mix'n'match catalogs.
And I started tracking
how many Q7889 items
didn't have any identifiers,
and five months ago it was 15,000
and today we're down to 9,600,
which is very much thanks
to Tracy's student assistants.
So there's still 9,000 to go,
but we're getting there.
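Tracking items "without any identifiers" can be phrased as a SPARQL query: instances of Q7889 with no statement for any property whose datatype is external identifier. A sketch of such a query, wrapped in Python so it can be reused for other classes, follows. We have not checked it against the exact query behind the figures above, and on a class this size it may be slow on the public endpoint.

```python
def items_without_identifiers_query(class_qid: str = "Q7889") -> str:
    """Build a SPARQL query counting instances of a class that carry
    no statement of any external-identifier property."""
    return f"""SELECT (COUNT(?item) AS ?count) WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  FILTER NOT EXISTS {{
    ?item ?claim ?value .
    # ?claim is the direct (wdt:) predicate of a property typed ExternalId.
    ?property wikibase:directClaim ?claim ;
              wikibase:propertyType wikibase:ExternalId .
  }}
}}"""
```

The resulting string can be pasted into query.wikidata.org or sent with any SPARQL client.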
So we needed to import a lot of data
to complete those identifiers.
The first tool to do that
is the Wikidata website.
I think it's important to say it
because it's where we can fix
the small problems, and so on.
But we also have dedicated tools
to do that on Wikidata.
There is Mix'n'match, and its gadget.
The Mix'n'match Wiki gadget
is a gadget that you can add
to your account in Wikidata,
and it adds all identifiers
from [inaudible] Mix'n'match to an item.
You can easily add several IDs [inaudible].
Other tools...
There is QuickStatements, of course.
But you also can use
more general tools, like OpenRefine,
Dataiku Data Science Studio, et cetera.
The point is it's very important
for this project,
and I think for all projects in Wikidata,
to have a healthy ecosystem
of tools that work.
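Tools like QuickStatements take plain-text batches. Assuming the tab-separated QuickStatements V1 format (item, property, value, with string values in double quotes and an optional S854 reference URL as the source), here is a small sketch that turns matched pairs into a batch. The function name is ours, and the property ID in the usage line is an assumption: double-check it on Wikidata before running anything.

```python
def quickstatements_v1(matches, property_id, source_url=None):
    """Turn (item QID, external ID) pairs into QuickStatements V1 lines.

    Each line is tab-separated: item, property, quoted string value, and
    optionally an S854 (reference URL) source pointing at the catalog.
    """
    lines = []
    for qid, external_id in matches:
        fields = [qid, property_id, f'"{external_id}"']
        if source_url:
            fields += ["S854", f'"{source_url}"']
        lines.append("\t".join(fields))
    return "\n".join(lines)

# Usage (assumed property, e.g. P1933 for a MobyGames ID):
# batch = quickstatements_v1(matched_pairs, "P1933")
```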
Here are two examples of imports.
The first one is connecting
PCGamingWiki and Wikidata.
It was made by a volunteer.
He made his own program in Ruby,
so that's an example.
The second one
is linking the OLAC video game
vocabulary with Wikidata.
It was made using OpenRefine
and Mix'n'match,
and I think Tracy
can talk more about this one.
And I have a third example,
which is one I made.
I matched the catalog of the BnF,
the Bibliothèque nationale de France,
the French National Library,
with Wikidata.
So they have about 4,000 entries
about video games in their catalog,
and I matched half of them to Wikidata.
So, for that, I made a project
in Dataiku Data Science Studio.
You can see the work [inaudible].
I will not detail it,
but if you have questions,
feel free to ask.
I also developed
a Dataiku plugin
to make SPARQL querying easier,
because it's not built into the tool.
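Envel's BnF alignment was built in Dataiku, and Mix'n'match has its own matching logic; neither is reproduced here. As a generic illustration of the underlying idea, this sketch normalizes titles and matches catalog records to Wikidata items on (title, year), keeping only unambiguous hits for a human to confirm. All record IDs and titles in the test data are hypothetical.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(cleaned.split())


def match_catalog(catalog, wikidata_items):
    """Match catalog records to Wikidata items on (normalized title, year).

    Both inputs are iterables of (id, title, year). Returns {catalog_id: qid}
    only for keys that map to exactly one item; ambiguous titles are skipped
    so a human can review them.
    """
    index = {}
    for qid, title, year in wikidata_items:
        index.setdefault((normalize_title(title), year), []).append(qid)
    matches = {}
    for record_id, title, year in catalog:
        candidates = index.get((normalize_title(title), year), [])
        if len(candidates) == 1:
            matches[record_id] = candidates[0]
    return matches
```

Requiring exactly one candidate trades recall for precision, which suits a first pass before manual review.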
One cool thing that happened
after this
is that the BnF contacted me
about this project.
So it was very cool to get feedback,
and a contact was established.
So, another topic: linking.
So we want Wikidata to be
the linking hub for video games.
As you can see here,
and as Jean-Fred said,
a video game is about a lot of things.
We have Reviews and Scores, Speedruns,
News, Library IDs,
Soundtracks, etc.
We don't want all this data
to be in Wikidata,
we want this data
to be linked to Wikidata.
So we want Wikidata to be,
like [Lidia] said yesterday, a place--
a place you go to,
and then you go on to another place.
So I think that's it.
And as you can see from the links,
video games have a lot
of aspects to research,
and video games are really
complex cultural artifacts.
There are [inaudible],
there are [ed ones],
remasters, re-releases, mods, updates,
downloadable content,
and so on and so forth.
Many remakes or remastered editions
are currently separate items
in Wikidata,
but not always.
Additionally, remakes are often not linked
to the original work
using the "based on" property.
And perhaps we should create
an entity schema for video games,
but we are still in the process
of getting a discussion started
about the data model of video games.
Mostly, we have one item
for what we typically recognize as "the game,"
when we say we played the same game,
so something like Mario Kart 8,
even if we played it
on different platforms,
for example, on Switch,
on Wii U, or something else.
So Wikidata items
for a game aggregate characteristics
which are shared among
different versions or editions.
This makes linking difficult
because many databases
describe games at different levels,
as Jean-Frédéric mentioned.
For instance, some have
one database entry for each edition,
and this results
in more than one identifier
for a single video game item,
so specific qualifiers
are needed.
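To make the "one item, several edition-level identifiers" situation concrete, here is a hypothetical data-model sketch in which each identifier statement carries a platform qualifier (P400 on Wikidata) and can be grouped by it. The property `P_EDITION`, the platform labels, and the identifier values are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IdentifierStatement:
    """One external-ID statement on a game item, with optional qualifiers."""
    property_id: str  # hypothetical edition-level identifier property
    value: str
    qualifiers: dict = field(default_factory=dict)  # qualifier property -> value


def ids_by_platform(statements, property_id, platform_qualifier="P400"):
    """Group the values of one identifier property by their platform qualifier."""
    grouped = defaultdict(list)
    for st in statements:
        if st.property_id == property_id:
            platform = st.qualifiers.get(platform_qualifier, "unqualified")
            grouped[platform].append(st.value)
    return dict(grouped)
```

Statements without a platform qualifier land under `"unqualified"`, which is a handy way to spot the ones still needing cleanup.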
We have had some discussions about
creating separate items
for editions or releases,
as this is good practice for literature,
but the FRBR model, which is used for books,
doesn't seem to suit everyone.
This is also an ongoing discussion
with the video game research community
about the best data model for video games.
And speaking about video game research
and the research community,
there is an active video game
research community
with a growing interest
in data about games.
Sadly, there is no national library
for video games
with a comprehensive dataset
of authority data about video games.
Yes, the BnF has 4,000 video games,
but there are many more out there.
That means researchers rely on data
from video game fan databases,
but as we know, there are so many,
and they are so different [inaudible].
And what makes it even harder:
the data is not open.
So could Wikidata be a source
for video game research?
Yes.
I work for the research project diggr,
and we have decided to work with Wikidata
for our video game research,
and we not only use the data
which is already there,
we create data about video games
and companies by hand
or automatically, in Wikidata.
Additionally, we have created
about 20,000 links to Mobygames,
GameFAQs and the Japanese
Media Arts Database.
And we also initiated an alignment
with the OLAC video game genre vocabulary.
So video game
research colleagues in Japan
are also experimenting with Wikidata
to use it as a work authority
for video games.
So, our research involves
a lot of spatial data
about video game companies
and where video games
have been released all over the world.
So we use data from video game databases,
like Mobygames, and from Wikidata
to create analyses like this.
We call the tool Lemongrab,
and the researcher can select
one or more platforms
and one or more release countries
and get an overview
of which companies are the big players:
in this case, the number of published
or developed video games
for this combination.
Additionally, they can see which country
is strongly represented
by these companies.
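Lemongrab's actual implementation isn't described in the talk. A minimal sketch of the aggregation it performs, assuming hypothetical release records of the form (game, company, platform, country), could look like this: filter on the selected platforms and countries, then count distinct games per company.

```python
from collections import Counter


def big_players(releases, platforms, countries, top_n=5):
    """Count distinct games per company for the selected platforms and countries.

    `releases` is an iterable of (game, company, platform, country) tuples;
    a game released twice in the same combination is only counted once.
    """
    games_per_company = {}
    for game, company, platform, country in releases:
        if platform in platforms and country in countries:
            games_per_company.setdefault(company, set()).add(game)
    counts = Counter({company: len(games) for company, games in games_per_company.items()})
    return counts.most_common(top_n)
```

Counting distinct games rather than rows keeps duplicate release records (common when several source databases are merged) from inflating a company's score.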
Or we use the Wikidata Query Service directly
to create maps of companies
within the video game industry.
So, at this stage, I think
there are 5,000 video game companies
already in Wikidata,
and we have created
half of them, I think. (chuckles)
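A company map of this kind can be drawn directly in the Query Service with the `#defaultView:Map` option. The sketch below assumes companies are reached through the developer property (P178) of video game items and located via headquarters location (P159) and its coordinate location (P625). It is one possible query, not the one diggr used.

```python
def company_map_query(limit: int = 1000) -> str:
    """Build a WDQS query mapping companies credited as video game developers."""
    return f"""#defaultView:Map
SELECT DISTINCT ?company ?companyLabel ?coord WHERE {{
  ?game wdt:P31 wd:Q7889 ;
        wdt:P178 ?company .
  ?company wdt:P159 ?hq .
  ?hq wdt:P625 ?coord .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""
```

Swapping P178 for publisher (P123) would give the publishers' map instead.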
So, in conclusion, after two years
of working with Wikidata for our research,
we are very pleased,
especially with the cooperation
with the volunteers
of the video games WikiProject.
Thank you for that.
And we think Wikidata can be
the one-stop shop for video game research
because it already aggregates
so many links to very specialized sites
and it is not realistic
to put all the data into Wikidata.
Thank you.
At the same time, we want
to be useful for the researchers.
We also want to stay
or to be or to become,
however you want to put it,
useful to the Wikipedias.
Right now, some Wikipedias
are using the data
from Wikidata for their infoboxes.
So if tomorrow we just revamp
the entire data model
in a way they can't use anymore,
that doesn't sound like a great idea.
So we'll try not to do that.
I think we want to be
enhancing all the other databases,
and that's something
that has already started.
So if you go to the Visual Novel Database,
at vndb.org:
following a research
workshop that we did
with the nice diggr folks,
where we could meet people from the database,
they were interested enough
in all the linkage that we made
that they harvested more links
about the entities they cover.
Like, "Well, okay, thanks to Wikidata,
we also retrieved reviews or speedruns
or a store where you can buy these games."
So we're already being useful.
So that was a fine example.
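Harvesting links from Wikidata, as VNDB did, mostly means reading external-ID statements and expanding each one with the property's formatter URL (P1630), where `$1` stands for the identifier. A minimal sketch over the Wikibase JSON layout follows. How VNDB actually did it is not described in the talk, and the property IDs and URL template in the test data are hypothetical.

```python
def external_links(entity: dict, formatter_urls: dict) -> dict:
    """Build URLs from an entity's external-ID claims.

    `entity` follows the Wikibase JSON layout (claims -> property -> list of
    statements with mainsnak.datavalue.value). `formatter_urls` maps property
    IDs to formatter URL templates containing "$1", as stored in P1630.
    """
    links = {}
    for prop, statements in entity.get("claims", {}).items():
        template = formatter_urls.get(prop)
        if template is None:
            continue  # no known formatter URL for this property
        for statement in statements:
            snak = statement.get("mainsnak", {})
            if snak.get("snaktype") != "value":
                continue  # skip "no value" / "unknown value" snaks
            value = snak["datavalue"]["value"]
            links.setdefault(prop, []).append(template.replace("$1", str(value)))
    return links
```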
But also, this German research group
just started the Internationale
Computerspielesammlung,
(chuckles)
which is online and has all the data
about German video games,
what they have in their collections,
and they've been using Wikidata
to enrich their data with labels,
so they have alternate titles.
So that was also pretty cool.
I think Wikidata can be the backend
for powering applications.
So, an example
that already exists is vglist.co,
and in some ways, a little bit like
what inventaire.io does for books,
vglist.co does for video games.
It's an app where you can record
the games you've played,
how long you spent on them, and your favorites.
And I just really like the fact
that it's built on top of Wikidata.
It's pretty cool.
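As a rough illustration of Wikidata as an application backend (not how vglist.co is actually implemented), this sketch fetches an item's JSON from `Special:EntityData` and picks a display label using a preferred-language order. The User-Agent and language list are placeholders.

```python
import json
import urllib.request

ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def fetch_entity(qid: str) -> dict:
    """Download one entity's JSON from Special:EntityData (needs network)."""
    request = urllib.request.Request(
        ENTITY_DATA_URL.format(qid=qid),
        headers={"User-Agent": "game-tracker-sketch/0.1"},  # placeholder UA
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        entities = json.load(response)["entities"]
    # A redirected (merged) item is keyed by its target ID, so take the only entry.
    return next(iter(entities.values()))


def label(entity: dict, languages=("en", "fr", "de")) -> str:
    """Return the first available label in the preferred languages, else the ID."""
    labels = entity.get("labels", {})
    for lang in languages:
        if lang in labels:
            return labels[lang]["value"]
    return entity.get("id", "")

# Usage (needs network): print(label(fetch_entity("Q7889")))
```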
So maybe one day we can just connect
all these things together
and harness SPARQL to query the data,
and it really doesn't matter where it is,
and say, "Yeah, data is not a database,"
and that will be fine.
Thank you very much,
and we'll take questions.
(moderator) We just have
five minutes for questions.
(applause)
(man) Hello, I really love your project,
and if I want to contribute,
where should I go?
So there was a short URL in there,
and as Envel mentioned,
there are tabs at the top with the links
to the SPARQL queries and so on.
And there is a Tasks page,
with a couple of suggestions
on where to get started.
But it's not mandatory, you can work
on whatever you want, obviously.
But, yeah, that's a nice place.
And if you have a project,
you can also bring it to the Talk page.
It's not a very lively Talk page,
like a lot of WikiProject
Talk pages on Wikidata, in many ways,
but I will read and answer,
so that's a start.
Do you already have something in mind?
We can talk after this
if you have something in mind.
- Let's go.
- (woman) Hi there.
So I work with a group
from University of Copenhagen
and University of Washington
who are working on an initiative
called Atari Women,
recognizing all the women
who've been involved through the years
with the Atari game system.
And so I'm wondering if--
I believe that your WikiProject
covers the developers,
the designers and such,
but obviously, it crosses
into the biography part of our world.
And so how does that work?
Is there someone
who's more specialized in that area
who these folks at these two universities
could connect with, or...
Thoughts?
I don't think there will be
somebody in particular.
My impression of the [inaudible] project
is that they are fairly eclectic.
Sometimes people specialize
on very specific niche topics.
In that case, I don't think so.
So I'll be happy to take the call.
So, to answer your question,
yes, that will definitely be
in the scope of our project.
And for that period in particular,
I don't think we'd turn it down.
These days, video games
are made by like 1,000 people,
and do we want to create an item
about every single person,
like the credit rolls of a movie?
So in modern times, I don't know
if we want to be that database,
the ultimate database of game credits.
But for the Atari early days--
oh, definitely,
I would actually love to see the dataset
because it's a lot of dudes
in common knowledge of...
- (woman) I'll connect you to that.
- Yes, please.
(laughter)
(moderator) Any other questions?
Sir, just in front of you.
(man 2) Do you collaborate
with the Internet Archive?
Because not a month goes by
without Jason Scott posting something.
He's rescued 170,000
DOS games or stuff like that.
There are Internet Archive identifiers
on some game items,
which is a bit weird
because usually on the Internet Archive
there's going to be
a particular release of the game,
again on the difference...
Last time I checked there were four
or five Prince of Persia
on the Internet Archive
because they have the Apple II version
and the DOS version and so on.
So not explicitly.
In general, I think we probably want
to make more general connections
with the video game preservation scene.
There are quite lively organizations
that work hard on video game preservation.
And I think Wikidata
can be a useful resource for them
because then they don't have
to manage the metadata,
and they can focus
on managing other things.
Do you have something to add to that?
No.
[inaudible], perhaps?
(man 3) I had the same question.
(laughter)
Perfect.
(moderator) There was
one more question back here.
No, probably I hallucinated. Sorry.
For one minute, we can show a query.
Or not.
(moderator) You have 30 seconds.
Will the Query Service [inaudible]?
We have links in the PDF, [inaudible]?
(man 4) If there's still time,
I have a question.
Yes, please.
During your presentation, did you notice
that some of the identifiers
have more than 100% [inaudible]?
Yeah, it's because the examples--
so that reason, one of the users,
for example, itself,
because they use [inaudible] as examples.
And also sometimes
because there are broad matches.
So if it says something that's a bit--
So, yeah, that's one
of my favorites-- if I can scroll to it--
it's the characters of the Mario franchise
linked to their games.
(chuckles)
So you can find like Wario
and Princess Peach, and so on.
And my favorite is--
if you look somewhere, yes,
because there is Mario somewhere here,
and there is Dr. Mario.
And if you look at the item,
it's said to be the same as--
because Mario plumber and Mario physician
might be two different people,
we don't really know.
(laughter)
(moderator) Thank you very much
for this presentation.
(applause)