-
Hi, so before we start, quickly,
-
so I'm Jean-Fred,
I'm a Wikidata volunteer.
-
Hi, I am Envel,
-
and I'm also a Wikidata volunteer.
-
And I'm Tracy, and I get paid (chuckles)
to volunteer for Wikidata,
-
but I'm also enthusiastic
to be here today,
-
and I work for a research board.
-
Alright, thanks for coming
to our presentation:
-
Sum of all video games:
-
our road to make Wikidata
the hub of all video game metadata.
-
So, first off, why should
we even care about video games,
-
like aren't they just
like kids playing Fortnite
-
or something at night?
-
So video games
have been here for a long time,
-
since the '70s or '60s or '40s.
-
It depends on who you ask.
-
You can check Wikipedia's
extensive coverage
-
of what is even a game.
-
It's a major cultural industry.
-
More than 2.5 billion people
play in the world,
-
and we estimate that, at the very least,
-
100,000-200,000 video games
have been published since that time
-
and that's not counting games
published on the Play Store--
-
then you go through the millions,
-
which is not that much
when you're on Wikidata.
-
So a little overview of the current state
of video games on Wikidata.
-
These numbers are also
on our poster on the ground floor,
-
so we can also have it there.
-
So we have video games or the Q7889,
-
and we have 38,000 of them,
-
which is not that much
-
considering that there are
at least 200,000, as I mentioned.
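A count like the 38,000 above could be reproduced against the Wikidata Query Service; the following is a minimal Python sketch, not something shown in the talk. It assumes the public endpoint at query.wikidata.org and counts only direct instances (P31) of Q7889. The function names and the User-Agent string are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def count_query(class_qid: str = "Q7889") -> str:
    """Build a SPARQL query counting direct instances (P31) of a class."""
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE { "
        f"?item wdt:P31 wd:{class_qid} . "
        "}"
    )


def request_url(query: str) -> str:
    """Encode the query as a GET URL asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return WDQS_ENDPOINT + "?" + params


def run_count(query: str) -> int:
    """Send the query (needs network access) and return the count."""
    req = urllib.request.Request(
        request_url(query),
        # Illustrative User-Agent; replace with one identifying your tool.
        headers={"User-Agent": "video-game-count-example/0.1"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    return int(data["results"]["bindings"][0]["count"]["value"])
```

Calling `run_count(count_query())` would return the current number of items, which will differ from the figure quoted in the talk.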
-
We also have expansion packs,
DLCs, and compilations
-
but we also have, for example,
game controllers.
-
We have a lot of game consoles,
about 700-- that's a lot.
-
We have an extensive ontology
of video game genres,
-
that's pretty cool, 200 of them,
-
and [inaudible] a bit on magazines also.
-
Maybe video games
could be a satellite even for WikiCite
-
I don't know. (chuckles)
-
But what about outside of Wikidata?
-
There are a lot of databases
out there about video games.
-
You may have heard about
some very big ones,
-
like MobyGames or IGDB.
-
There are also a lot
of very special-interest databases--
-
databases that only cover certain types.
-
Visual Novel Database
only covers this niche genre
-
that is the visual novel.
-
You have databases that are only about
games published on the Commodore 64,
-
and so on.
-
But you also have government agencies
and commercial players,
-
government agencies [inaudible],
called the rating agencies,
-
the ones that put a little label:
it's not good for your kids under 16.
-
The problem is that
there is no common identifier
-
around all of these databases
-
that binds them together.
-
There is no cross-linking,
or it is very little.
-
Some databases might be linked
to their neighbor/friend's database,
-
like the Amiga databases
talk to each other a little bit.
-
But you won't have
one easy way of saying all that.
-
So there are different
data coverage and specialization,
-
and that often comes
also with conceptual differences.
-
A database might consider
that a game is a work,
-
if you're into the FRBR model,
-
or that might be an edition
-
or that might be
a particular console version.
-
So there is a lot of granularity in there.
-
And that's important in terms of coverage
-
because some databases--
-
for example, MobyGames has a lot
of information about a lot of things,
-
but it doesn't have a lot of information
-
about the games that were published
on the early French computers,
-
like the Oric
or the Thomson TO/MO series.
-
You will find that
in more French databases.
-
And if you go into Eastern video games,
like from China or Japan,
-
they're not very well covered
in Western databases.
-
Enter WikiProject video games.
-
(cheers and applause)
-
(woman) Whoo-hoo!
-
We didn't make that one, actually.
-
So it lives at that address
-
and there are a lot of subpages,
-
and we're going to go through a little bit
of what this project is made of.
-
As often, there is--
-
we'll separate that
in what's old and what's new
-
and what's borrowed and what's blue.
-
So, as old we have--
-
Like a lot of WikiProjects, we have
-
an ontology description
with all the properties.
-
There are currently 64 properties,
mostly for games,
-
but also about series or hardware.
-
And we have a fairly extensive, I think--
-
how to put it-- separations.
-
We have things about the staff,
-
but also about the narrative universe
-
or about the gameplay,
like how many players there are.
-
So you can explore this;
it's kind of very exciting.
-
We also have example queries.
-
If we have time at the end,
we might show off some,
-
but you can just explore them yourself.
-
We also have something new.
-
Because those things don't exist
in other WikiProjects on Wikidata.
-
For example, we have an Activity Log.
-
You can see it here.
-
On this Activity Log, we track
the activity of the project.
-
So when we publish a blog post
or an article somewhere,
-
we add it here.
-
When we create a new identifier property
-
or any property related to video games,
-
we also add it here.
-
We also have achievements,
-
like in January, we added a condition
-
of an external identifier.
-
Another thing that we do
is we have a Tasks List.
-
The Tasks List can be used
by newcomers to the project
-
to do things in the project.
-
It can be [inaudible],
-
so we give them an insight to [inaudible]
-
and how to do that.
-
It's also where we like [inaudible]
-
[inaudible]
-
We also have something borrowed.
-
We have a lot of pages
of statistics reports.
-
We also have external identifiers
that [inaudible]--
-
you can see it here--
-
where we track--
-
I don't know if you can see it--
-
but we have more than
100 external identifiers
-
for video games,
-
so this is big, huge.
-
And here we can see for each item here--
-
just a little peek.
-
And also the completion of the identifier.
-
So, some of these things we borrowed
from the Sum of all Paintings
-
and other things. That brings us to blue.
-
So the InteGraality tool that was made
initially for Sum of all Paintings
-
I extended it for video games,
-
and then I might as well
have done it for everybody.
-
So, yeah, one day we'll get all of these.
-
So this is the core properties,
the genre/developer/publisher
-
along video game systems,
-
so Windows,
PlayStation console and so on.
-
So, as you can see,
we have a lot of work to do
-
for even like
the very basic core properties.
-
So, yeah, one day,
all of that will be blue.
-
What have we been doing?
-
One thing that we've been doing a lot
-
is creating identifiers
for all these external databases
-
and aligning them.
-
So Envel mentioned we have created
over 100 external identifier properties--
-
that covers very big databases
and very tiny ones.
-
We've been using the Mix'n'match tool
extensively for matching.
-
And sometimes we've been using things
a bit more advanced
-
that Envel will detail in a moment.
-
Yeah, so 100 external
identifier properties created
-
in roughly a year to two years
-
and over 16 Mix'n'match catalogs.
-
And I started tracking
-
how many Q7889 items
didn't have any identifiers,
-
and five months ago it was 15,000
-
and today we're down to 9,600,
-
which is very much thanks
to the assistance of Tracy.
-
So there's still 9,000 to go,
but we're getting there.
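One way such a number might be tracked is a query for video game items that carry no external-identifier statement at all. The sketch below only builds the SPARQL text in Python; it is an assumption about how the count could be obtained, not necessarily the query the speaker used. The function name is illustrative, and on the live Query Service a query of this shape may be slow or time out.

```python
def no_identifier_query(class_qid: str = "Q7889") -> str:
    """Build a SPARQL query counting instances of a class that have
    no statement whose property is of the external-identifier type."""
    return f"""
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  FILTER NOT EXISTS {{
    # Any property declared as an external identifier...
    ?prop wikibase:propertyType wikibase:ExternalId ;
          wikibase:directClaim ?direct .
    # ...that is used on this item.
    ?item ?direct ?value .
  }}
}}
""".strip()


if __name__ == "__main__":
    # Print the query so it can be pasted into the Query Service.
    print(no_identifier_query())
```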
-
So we needed to import a lot of data
-
to complete those identifiers.
-
The first tool to do that
is the Wikidata website.
-
I think it's important to say it
-
because it's where we can fix
the small problems, and so on.
-
But we also have dedicated tools
to do that on Wikidata.
-
There is Mix'n'match, and its gadget.
-
The Mix'n'match Wiki gadget
is a gadget that you can add
-
to your account in Wikidata,
-
and it adds all identifiers
-
from [inaudible] Mix'n'match to an item.
-
You can easily add several IDs [inaudible].
-
Other tools...
There is QuickStatements, of course.
-
But you also can use
more general tools, like OpenRefine,
-
Dataiku Data Science Studio, et cetera.
-
The point is it's very important
for this project,
-
and I think for all projects in Wikidata,
-
to have a healthy ecosystem
of tools that work.
-
Here are two examples of imports.
-
The first one is connecting
PCGamingWiki and Wikidata.
-
It was made by a volunteer.
-
He made his own program in Ruby,
-
so that's an example.
-
The second one
-
is linking the OLAC video game
vocabulary with Wikidata.
-
It was made using OpenRefine
and Mix'n'match,
-
and I think Tracy
can talk more about this one.
-
And I have a third example,
which is one I made.
-
I matched the catalog of BnF,
-
so it's Bibliothèque...
the French National Library
-
with Wikidata.
-
So they have about 4,000 entries
-
about video games in their catalog,
-
and I matched half of them to Wikidata.
-
So, for that, I made a project
-
in Dataiku Data Science Studio.
-
You can see the work [inaudible].
-
I will not detail it,
-
but if you have questions,
feel free to ask.
-
I also developed
a Dataiku plugin to do it,
-
to facilitate SPARQL querying
-
because it's not included in the tool.
-
One cool thing that happened
after this one
-
is that BnF contacted me
about this project.
-
So it was very cool to have feedback,
-
and that contact was established.
-
So, another topic, the link--
-
So we want Wikidata to be
the linking hub for video games.
-
As you can see here,
-
a video game is, as Jean-Fred said,
-
a video game is about a lot of things.
-
We have Reviews and Scores, Speedruns,
-
News, Library ID,
-
Soundtrack, etc.
-
We don't want all this data
to be in Wikidata,
-
we want this data
to be linked to Wikidata.
-
So we want Wikidata to be,
-
like Lydia said yesterday, a place--
-
We want to see Wikidata as a place you go,
-
and then you go to another place.
-
So I think that's it.
-
And as you can see by the links,
-
video games really have a lot
of aspects to research,
-
and video games are really
complex cultural artifacts.
-
There are [inaudible],
there are [ed ones],
-
remasters, re-releases, mods, updates,
-
downloadable content,
and so on and so forth.
-
Plenty of remakes or remastered editions
-
are separate items
at this stage in Wikidata,
-
but not necessarily.
-
Additionally, remakes are often not linked
to the original work
-
using the property "based on."
-
And perhaps we should create
an entity schema for the video games,
-
but we are still in the process
-
to get a discussion started
for the data model of video games.
-
Mostly, we have one item,
-
what we typically recognize as "the game,"
-
when we say we played the same game,
-
so it's like Mario Kart 8.
-
Even if we played it
on different platforms,
-
so, for example, on Switch,
on Wii U, or something else.
-
So Wikidata items
for a game aggregate characteristics
-
which are shared among
different versions or editions.
-
This makes linking not easy
-
because many databases
describe games on different levels,
-
as Jean-Frédéric mentioned.
-
For instance, some have
one database entry for each edition,
-
and this results
in more than one identifier
-
for each video game item.
-
And so the use
of specific qualifiers is needed.
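To illustrate the idea of several identifiers on one game item, each distinguished by a qualifier, here is a small Python sketch that emits tab-separated QuickStatements (version 1) rows. The "platform" qualifier P400 is a real Wikidata property; the item, identifier property, platform items and ID values in the usage example are placeholders chosen for illustration only, and the function name is hypothetical.

```python
def quickstatements_rows(item_qid, id_property, ids_by_platform,
                         platform_property="P400"):
    """Return one QuickStatements V1 row per external ID, each
    qualified with the platform the database entry describes."""
    rows = []
    for platform_qid, external_id in sorted(ids_by_platform.items()):
        rows.append("\t".join([
            item_qid,             # the game item
            id_property,          # the external-identifier property
            f'"{external_id}"',   # string values are double-quoted
            platform_property,    # qualifier property
            platform_qid,         # qualifier value
        ]))
    return rows


if __name__ == "__main__":
    # All IDs below are placeholders, not real property or platform items.
    for row in quickstatements_rows(
        "Q4115189", "P9999999", {"Q111": "game-a", "Q222": "game-b"}
    ):
        print(row)
```

Keeping one row per database entry means the game item stays a single "work" while still recording which platform each external entry refers to.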
-
We have had some discussions about
creating separate items
-
for editions or releases,
-
as this is good practice for literature,
-
but the FRBR model which is used for books
does not seem useful to everyone.
-
This is also an ongoing discussion
with the video game research community
-
about the best data model for video games.
-
And speaking about video game research
and the research community,
-
there is an active video game
research community
-
with a growing interest
in data about games.
-
Sadly, there are no national libraries
for video games
-
which have a comprehensive dataset
-
with authority data about video games--
-
yes, the BnF with 4,000 video games,
-
but there's still more outside.
-
That means researchers rely on data
-
from video game fan databases,
-
but as we know, there are so many,
-
and they're so different [inaudible].
-
And what makes it even harder,
-
the data is not open.
-
So could Wikidata be a source
for video game research?
-
Yes.
-
I work for the research project diggr,
-
and we have decided to work with Wikidata
for our video game research,
-
and we not only use the data
which is already there,
-
we create data about video games
and companies by hand
-
or automatically, in Wikidata.
-
Additionally, we have created
about 20,000 links to MobyGames,
-
GameFAQs and the Japanese
Media Arts Database.
-
And we also initiated an alignment
with the OLAC video game genre vocabulary.
-
So video game
research colleagues in Japan
-
are also experimenting with Wikidata
-
to use it as a work authority
for video games.
-
So, our research relies on
a lot of spatial data
-
about video game companies
-
and where video games
have been released all over the world.
-
So we use data from video game databases,
like MobyGames, and Wikidata,
-
to create some analyses like this.
-
We call it Lemongrab, the tool,
-
and the researcher can select
one or more platforms
-
and one or more release countries
-
and they will get an overview
of which companies are big players.
-
In this case, the number of published
or developed video games
-
for this combination.
-
Additionally, they can see which country
-
is strongly represented
by these companies.
-
Or we use Wikidata Query Service directly
-
to create maps of companies
within the video game industry.
-
So, at this stage, I think
there are 5,000 video game companies
-
already in Wikidata
-
half of which
we have created, I think. (chuckles)
-
So, in conclusion, after two years
of working with Wikidata for our research,
-
we are very pleased,
-
especially with the cooperation
-
with the volunteers
of the video game task force.
-
Thank you for that.
-
And we think Wikidata can be
the one-stop shop for video game research
-
because it already aggregates
so many links to very specialized sites
-
and it is not realistic
that we put all the data into Wikidata.
-
Thank you.
-
At the same time, we want
to be useful for the researchers.
-
We also want to stay
or to be or to become,
-
however you want it,
useful to the Wikipedias.
-
Right now, some Wikipedias
are using the data
-
from Wikidata for their infoboxes.
-
So if tomorrow we just revamp
the entire data model
-
in a way they can't use it anymore,
-
it doesn't sound like a great idea.
-
So we'll try not to do that.
-
I think we want to be
enhancing all the databases,
-
and that's something
that's already started.
-
So if you go to Visual Novel Database
right now at vndb.org,
-
following the research
workshop that we did
-
with the nice diggr folks,
-
we could meet with the database people,
-
and they were interested enough
in all the linkage that we made
-
that they could harvest more links
about the entities that they talk about.
-
Like, "Well, okay, thanks to Wikidata,
we also retrieved reviews or speedruns
-
or a store where you can buy these games."
-
So we're already being useful.
-
So that was a fine example.
-
But also this German researcher
-
just started the Internationale
Computerspielesammlung,
-
(chuckles)
-
which is online, which has all the data
about the German video games,
-
what they have in their collections,
-
and they've been using Wikidata
to enrich the data IDs for labels,
-
so they have alternate titles.
-
So that was also pretty cool.
-
I think Wikidata can be the backend
for powering applications.
-
So, an example
that already exists is vglist.co,
-
and in some ways a little bit similar
-
to what inventaire.io does for books,
-
vglist.co does it for video games.
-
It's an app where you can record
the games you've played,
-
how long you spend, and your favorites.
-
And I just really like the fact
that it's built on top of Wikidata.
-
It's pretty cool.
-
So maybe one day we can just connect
all these things together
-
and harness SPARQL to query data,
-
and it really doesn't matter where it is,
-
and say, "Yeah, data is not a database,"
-
and that will be fine.
-
Thank you very much,
and we'll take questions.
-
(moderator) We just have
five minutes for questions.
-
(applause)
-
(man) Hello, I really love your project,
-
and if I want to contribute,
where should I go?
-
So there was a short URL in there,
-
and as Envel mentioned,
-
there are tabs at the top with the links
to the SPARQL queries and so on.
-
And there is a Tasks tab,
-
which is like a couple of suggestions
on where to get started.
-
But it's not mandatory, you can work
on whatever you want, obviously.
-
But, yeah, that's a nice place.
-
And if you have a project,
you can also bring it to the Talk page.
-
It's not a very lively Talk page,
-
like a lot of WikiProject
Talk pages, in many ways,
-
but I will read and answer,
so that's a start.
-
Do you already have something in mind?
-
We can talk after this
if you have something in mind.
-
- Allons-y.
- (woman) Hi there.
-
So I work with a group
from University of Copenhagen
-
and University of Washington
-
who are working on an initiative
called Atari Women,
-
recognizing all the women
-
who've been involved through the years
with the Atari game system.
-
And so I'm wondering if--
-
I believe that your WikiProject
-
covers the developers,
the designers and such,
-
but obviously, it crosses
into the biography part of our world.
-
And so how does that work?
-
Is there someone
who's more specialized in that area
-
who these folks at these two universities
could connect with, or...
-
Thoughts?
-
I don't think there will be
somebody in particular.
-
My impression of the [inaudible] project
is that they are fairly eclectic.
-
Sometimes people specialize
on very specific niche topics.
-
In that case, I don't think so.
-
So I'll be happy to take the call.
-
So, to answer your question,
-
yes, that will definitely be
in the scope of our project.
-
And in that period, particularly,
I don't think we want to turn back
-
because these days video games
are made by like 1,000 people
-
and do we want to create an item
about every single person,
-
like the credit rolls of a movie, right?
-
So in modern times, I don't know
if we want to be that database,
-
the ultimate database of game credits.
-
But for the Atari early days--
oh, definitely,
-
I would actually love to see the dataset
-
because it's a lot of dudes
in common knowledge of...
-
- (woman) I'll connect you to that.
- Yes, please.
-
(laughter)
-
(moderator) Any other questions?
-
Sir, just in front of you.
-
(man 2) Do you collaborate
with the Internet Archive?
-
Because there's not a month going by
that Jason Scott doesn't post
-
that he's rescued 170,000
DOS games or stuff like that.
-
There are Internet Archive identifiers
on some game items,
-
which is a bit weird
because usually on the Internet Archive
-
there's going to be
a particular release of the game,
-
again on the difference...
-
Last time I checked there were four
or five Prince of Persia
-
on the Internet Archive
-
because they have the Apple II version
and the DOS version and so on.
-
So not explicitly.
-
In general, I think we probably want
to make some connections more generally
-
with the video game preservation scene.
-
There are quite lively organizations
that work hard on video game preservation.
-
And I think Wikidata
can be a useful resource for them
-
because they don't have
to manage the metadata,
-
and they can focus
on managing other things.
-
Do you have something to add to that?
-
No.
-
[inaudible], perhaps?
-
(man 3) I had the same question.
-
(laughter)
-
Perfect.
-
(moderator) There was
one more question back here.
-
No, probably I hallucinated. Sorry.
-
For one minute, we can show a query.
-
Or not.
-
(moderator) You have 30 seconds.
-
Will the Query Service [inaudible]?
-
We have links in the PDF, [inaudible]?
-
(man 4) If there's still time,
I have a question.
-
Yes, please.
-
During your presentation, did you notice
-
that some of the identifiers
have more than 100% [inaudible]?
-
Yeah, it's because the examples--
-
so that reason, one of the users,
for example, itself,
-
because they use [inaudible] as examples.
-
And also sometimes
because there are broad matches.
-
So if it says something that's a bit--
-
So, yeah, that's one
of my favorite-- if I can scroll it--
-
it's the characters of the Mario franchise
linked to their games.
-
(chuckles)
-
So you can find like Wario
and Princess Peach, and so on.
-
And my favorite is--
-
if you look somewhere, yes,
because there is Mario somewhere here,
-
and there is Dr. Mario.
-
And if you look at the item,
it's said to be the same as--
-
because Mario plumber and Mario physician
might be two different people,
-
we don't really know.
-
(laughter)
-
(moderator) Thank you very much
for this presentation.
-
(applause)