Hi. So before we start, quickly: I'm Jean-Fred, and I'm a Wikidata volunteer. Hi, I'm Envel, and I'm also a Wikidata volunteer. And I'm Tracy, and I get paid (chuckles) to volunteer for Wikidata, but I'm also enthusiastic to be here today. I work for a research project. Alright, thanks for coming to our presentation, "Sum of all video games: our road to make Wikidata the hub of all video game metadata." So, first off, why should we even care about video games? Aren't they just kids playing Fortnite at night? Video games have been around for a long time, since the '70s, or the '60s, or the '40s, depending on who you ask. You can check Wikipedia's extensive coverage of what even counts as a video game. It's a major cultural industry. More than 2.5 billion people play worldwide, and we estimate that, at the very least, 100,000 to 200,000 video games have been published since then. And that's not counting games published on the Play Store; with those, you go into the millions, which is not that much when you're on Wikidata. So, a little overview of the current state of video games on Wikidata. These numbers are also on our poster on the ground floor, so you can see them there too. We have video games, that is, instances of Q7889, and we have 38,000 of them, which is not that much considering there are at least 200,000, as I mentioned. We also have expansion packs, DLCs, and compilations, but also, for example, game controllers. We have a lot of game consoles, about 700; that's a lot. We have an extensive ontology of video game genres, about 200 of them, which is pretty cool, and [inaudible] a bit on magazines too. Maybe video games could even be a satellite of WikiCite, I don't know. (chuckles) But what about outside of Wikidata? There are a lot of databases out there about video games. You may have heard of some very big ones, like MobyGames or IGDB. There are also a lot of very special-interest databases, databases that only cover certain kinds of games.
The Visual Novel Database only covers the niche genre of visual novels. There are databases only about games published on the Commodore 64, and so on. But there are also government agencies and commercial players: government agencies [inaudible], namely the rating agencies, the ones that put a little label on a game saying it's not suitable for kids under 16. The problem is that there is no common identifier across all of these databases that binds them together. There is no cross-linking, or very little. Some databases might link to a neighboring, friendly database; the Amiga databases, for example, talk to each other a little bit. But there's no one easy way of seeing all of that. So the databases differ in coverage and specialization, and that often comes with conceptual differences too. One database might consider a game to be a work, if you're into the FRBR model; another might treat it as an edition, or as a particular console version. So there is a lot of variation in granularity. And that matters for coverage: MobyGames, for example, has a lot of information about a lot of things, but not much about games published on early French computers, like the Oric or the Thomson TO and MO series. You'll find those in French databases. And games from the East, like China or Japan, are not very well covered in Western databases. Enter WikiProject Video games. (cheers and applause) (woman) Whoo-hoo! We didn't stage that one, actually. So it lives at that address, and there are a lot of subpages; we're going to go through a bit of what this project is made of. As is tradition, we'll separate it into something old, something new, something borrowed, and something blue. So, for the old: like a lot of WikiProjects, we have an ontology description with all the properties. There are currently 64 properties, mostly for games, but also for series or hardware.
And we have a fairly extensive, how to put it, breakdown into sections. We have properties about the staff, but also about the narrative universe and the gameplay, like how many players there are. So you can explore this; it's very exciting. We also have example queries. If we have time at the end, we might show some off, but you can explore them yourself. We also have something new, because these things don't exist in other WikiProjects on Wikidata. For example, we have an Activity Log. You can see it here. In this Activity Log, we track the activity of the project: when we publish a blog post or an article somewhere, we add it here. When we create a new identifier property, or any property related to video games, we also add it here. We also record achievements, like in January, when we added [inaudible] external identifier. Another thing we have is a Tasks list. Newcomers to the project can use the Tasks list to find things to do. It can be [inaudible], so we give them insight into [inaudible] and how to do it. It's also where we [inaudible]. We also have something borrowed: a lot of statistics report pages. We also have a page of external identifiers [inaudible]. You can see it here, though I don't know if it's readable, where we track them: we have more than 100 external identifiers for video games, which is huge. Here you get a little peek at each one, and also at the identifier's completion. Some of these things we borrowed from Sum of all Paintings, and other things are more "blue." The InteGraality tool was made initially for Sum of all Paintings; I extended it to video games, and then figured I might as well make it work for everybody. So, yeah, one day we'll get all of these. These are the core properties: genre, developer, publisher, along with platforms, so Windows, PlayStation consoles, and so on.
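The completion figures on these report pages come from tools like InteGraality, whose actual implementation is not shown in the talk. As a minimal Python sketch, assuming "completion" means the share of items that carry at least one statement for a given property, the computation looks roughly like this. The item IDs in the example are hypothetical; P136 (genre), P178 (developer), P123 (publisher) and P400 (platform) are the Wikidata properties for the four core fields mentioned above.

```python
from collections.abc import Iterable, Mapping


def property_completeness(
    items: Mapping[str, Iterable[str]], properties: Iterable[str]
) -> dict[str, float]:
    """Percentage of items having at least one statement for each property.

    `items` maps an item ID (e.g. "Q123") to the IDs of the properties it
    has statements for. Illustrative sketch only, not InteGraality's code.
    """
    # Materialize each item's properties once so iterators are not exhausted.
    property_sets = [set(props) for props in items.values()]
    total = len(property_sets)
    report: dict[str, float] = {}
    for prop in properties:
        if total == 0:
            report[prop] = 0.0
            continue
        with_prop = sum(1 for props in property_sets if prop in props)
        report[prop] = 100.0 * with_prop / total
    return report


if __name__ == "__main__":
    # Hypothetical items and the properties they have statements for.
    sample = {
        "Q1": ["P136", "P178"],
        "Q2": ["P136"],
        "Q3": [],
        "Q4": ["P136", "P400"],
    }
    core = ["P136", "P178", "P123", "P400"]
    for prop, pct in property_completeness(sample, core).items():
        print(f"{prop}: {pct:.0f}%")
```

Under this definition a percentage can never exceed 100%, since each item is counted at most once per property.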
So, as you can see, we have a lot of work to do, even on the very basic core properties. But, yeah, one day all of that will be blue. What have we been doing? A lot of what we've been doing is creating identifier properties for all these external databases and aligning them. As Envel mentioned, we have created over 100 external identifier properties, covering very big databases and very tiny ones. We've been using the Mix'n'match tool extensively for matching, and sometimes more advanced approaches that Envel will detail in a moment. Yeah, so 100 external identifier properties created in roughly one to two years, and over 16 Mix'n'match catalogs. I started tracking how many Q7889 items didn't have any identifiers: five months ago it was 15,000, and today we're down to 9,600, which is very much thanks to Tracy's help. So there are still 9,000 to go, but we're getting there. We needed to import a lot of data to complete those identifiers. The first tool for that is the Wikidata website itself. I think it's important to say, because that's where we can fix the small problems, and so on. But we also have dedicated tools for this on Wikidata. There is Mix'n'match and its gadget: the Mix'n'match gadget is something you can enable on your Wikidata account, and it adds to an item all the identifiers from [inaudible] Mix'n'match. You can easily add several IDs [inaudible]. Other tools: there's QuickStatements, of course. But you can also use more general tools, like OpenRefine, Dataiku Data Science Studio, et cetera. The point is that it's very important for this project, and I think for all projects on Wikidata, to have a healthy ecosystem of tools that work. Here are two examples of imports. The first one connects PCGamingWiki and Wikidata. It was made by a volunteer who wrote his own program in Ruby, so that's one example. The second one links the OLAC video game vocabulary with Wikidata.
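The talk doesn't say how the count of Q7889 items without identifiers is computed. One plausible way, sketched in Python below, is a Wikidata Query Service query that counts direct instances of Q7889 (video game) having no statement for any property of the external-identifier datatype. Assumptions: only direct `wdt:P31` instances are counted (subclasses are not), and "identifier" means any external-identifier property. The helpers only build the query and the request URL, so they run offline; actually sending the request is left out.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def items_without_identifiers_query(class_qid: str = "Q7889") -> str:
    """SPARQL counting direct instances of `class_qid` with no external ID.

    A statement counts as an identifier when its property has the
    ExternalId datatype.
    """
    return f"""SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  FILTER NOT EXISTS {{
    ?item ?directClaim ?value .
    ?property wikibase:directClaim ?directClaim ;
              wikibase:propertyType wikibase:ExternalId .
  }}
}}"""


def wdqs_request_url(sparql: str) -> str:
    """GET URL asking WDQS for JSON results of `sparql`."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    print(wdqs_request_url(items_without_identifiers_query()))
```

Running the same query periodically and recording the result is enough to track the trend over time.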
It was made using OpenRefine and Mix'n'match, and I think Tracy can say more about this one. And I have a third example, one I made myself: I matched the catalog of the BnF, the Bibliothèque nationale de France, the French National Library, with Wikidata. They have about 4,000 entries about video games in their catalog, and I matched half of them to Wikidata. For that, I built a project in Dataiku Data Science Studio. You can see the work [inaudible]. I won't go into detail, but if you have questions, feel free to ask. I also developed a Dataiku plugin to make SPARQL querying easier, because it isn't built into the tool. One cool thing that happened afterwards is that the BnF contacted me about this project, so it was very nice to get feedback and to establish that contact. So, another topic: linking. We want Wikidata to be the linking hub for video games. As you can see here, and as Jean-Fred said, a video game is about a lot of things: reviews and scores, speedruns, news, library IDs, soundtracks, etc. We don't want all this data to be in Wikidata; we want this data to be linked from Wikidata. As [Lidia] said yesterday, we want Wikidata to be a place you go to, and from there you go on to another place. So I think that's it. And as you can see from the links, video games have a lot of aspects to research; video games are really complex cultural artifacts. There are [inaudible], there are [ed ones], remasters, re-releases, mods, updates, downloadable content, and so on and so forth. Many remakes and remastered editions are separate items in Wikidata at this stage, but not necessarily all of them. Additionally, remakes are often not linked to the original work with the property "based on." Perhaps we should create an entity schema for video games, but we are still in the process of getting a discussion started about the data model for video games.
Mostly, we have one item for what we typically recognize as "the game," as when we say we played the same game, like Mario Kart 8, even if we played it on different platforms, for example on the Switch, on the Wii U, or something else. So a Wikidata item for a game aggregates the characteristics shared among its different versions or editions. This makes linking difficult, because many databases describe games at different levels, as Jean-Frédéric mentioned. For instance, some have one database entry per edition, which results in more than one identifier on a single video game item, so specific qualifiers are needed. We have had some discussions about creating separate items for editions or releases, as this is good practice for literature, but the FRBR model used for books doesn't seem useful to everyone. This is also an ongoing discussion with the video game research community about the best data model for video games. And speaking of video game research, there is an active video game research community with a growing interest in data about games. Sadly, there is no national library for video games with a comprehensive dataset of authority data. Yes, the BnF has 4,000 video games, but there are many more out there. That means researchers rely on video game fan databases, but as we know, there are so many of them, and they're so different [inaudible]. And what makes it even harder is that the data is not open. So could Wikidata be a source for video game research? Yes. I work for the research project diggr, and we decided to work with Wikidata for our video game research. We don't only use the data that's already there; we also create data about video games and companies in Wikidata, by hand or automatically. Additionally, we have created about 20,000 links to MobyGames, GameFAQs, and the Japanese Media Arts Database.
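When an external database has one entry per edition, a single work-level item ends up with several values for the same identifier property, each needing a qualifier to tell them apart. Below is a hypothetical Python sketch that turns such matched records into tab-separated QuickStatements (V1) lines, assuming the platform (P400) qualifier is the one used to distinguish them. The record structure, the identifier property passed in, and any IDs are illustrative, not the project's documented convention.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EditionRecord:
    """One edition-level entry from an external database (hypothetical shape)."""

    external_id: str    # the external database's ID for this edition or port
    work_item: str      # QID of the work-level Wikidata item it was matched to
    platform_item: str  # QID of the platform this edition runs on


def to_quickstatements(
    records: Iterable[EditionRecord],
    id_property: str,
    qualifier_property: str = "P400",
) -> str:
    """One QuickStatements V1 line per record: item, ID property, quoted ID,
    then a qualifier pair (by default P400, platform)."""
    lines = []
    for rec in sorted(records, key=lambda r: (r.work_item, r.external_id)):
        # String values are wrapped in double quotes, so reject IDs that would
        # break the quoting or the tab-separated layout.
        if '"' in rec.external_id or "\t" in rec.external_id:
            raise ValueError(f"cannot quote identifier {rec.external_id!r}")
        lines.append(
            "\t".join(
                [rec.work_item, id_property, f'"{rec.external_id}"',
                 qualifier_property, rec.platform_item]
            )
        )
    return "\n".join(lines)
```

The same work item appears on several output lines, one per edition, which is the "more than one identifier per video game item" situation described above; the qualifier is what keeps those values distinguishable.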
And we also initiated an alignment with the OLAC video game genre vocabulary. Video game research colleagues in Japan are also experimenting with Wikidata, using it as a work authority for video games. Our research uses a lot of spatial data about video game companies and about where video games have been released around the world. So we use data from video game databases like MobyGames, linked through Wikidata, to create analyses like this one. The tool is called Lemongrab: the researcher can select one or more platforms and one or more release countries and get an overview of which companies are the big players, in this case by the number of video games they published or developed for that combination. They can also see which countries are strongly represented by these companies. Or we use the Wikidata Query Service directly to create maps of companies in the video game industry. At this stage, I think there are 5,000 video game companies in Wikidata, and we created about half of them, I think. (chuckles) So, in conclusion, after two years of working with Wikidata for our research, we are very pleased, especially with the cooperation with the volunteers of WikiProject Video games. Thank you for that. We think Wikidata can be the one-stop shop for video game research, because it already aggregates so many links to very specialized sites, and it is not realistic to put all the data into Wikidata. Thank you. At the same time as being useful to researchers, we also want to stay, or be, or become, however you want to put it, useful to the Wikipedias. Right now, some Wikipedias use data from Wikidata for their infoboxes. So if tomorrow we revamped the entire data model in a way they couldn't use anymore, that wouldn't be a great idea. So we'll try not to do that. We also want to enhance the other databases, and that has already started.
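Lemongrab's code isn't part of the talk. As a hypothetical Python sketch of the kind of aggregation described for it, the function below filters release records to the chosen platforms and release countries and ranks companies by the number of distinct games they are associated with. The `Release` record, with a single company field standing in for "published or developed", is an assumption made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Release:
    """One release of a game (hypothetical record shape)."""

    game: str      # e.g. the Wikidata QID of the game
    company: str   # publisher or developer credited on this release
    platform: str
    country: str


def top_companies(
    releases: Iterable[Release],
    platforms: Iterable[str],
    countries: Iterable[str],
    n: int = 10,
) -> list[tuple[str, int]]:
    """Rank companies by distinct games released on the selected platforms
    in the selected countries."""
    platform_set, country_set = set(platforms), set(countries)
    games_by_company: dict[str, set[str]] = {}
    for r in releases:
        if r.platform in platform_set and r.country in country_set:
            # A game released in several selected countries is counted once.
            games_by_company.setdefault(r.company, set()).add(r.game)
    counts = Counter({c: len(g) for c, g in games_by_company.items()})
    return counts.most_common(n)
```

Counting distinct games rather than release rows keeps a game released in, say, both Japan and the United States from being counted twice for the same company.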
For example, go to the Visual Novel Database right now at vndb.org. Following a research workshop we did with the nice diggr folks, who met with the database's maintainers, they were interested enough in all the linkage we made that they harvested more links about the entities they cover, like, "Well, okay, thanks to Wikidata, we also retrieved reviews, speedruns, and stores where you can buy these games." So we're already being useful; that was a fine example. Also, German researchers just started the Internationale Computerspielesammlung, (chuckles) which is online and has data about German video games and what they have in their collections, and they've been using Wikidata to enrich their data, for example with labels, so they have alternate titles. That was also pretty cool. I think Wikidata can also be the backend powering applications. An example that already exists is vglist.co: in some ways, vglist.co does for video games a bit of what inventaire.io does for books. It's an app where you can record the games you've played, how long you spent on them, and your favorites. And I really like the fact that it's built on top of Wikidata. It's pretty cool. So maybe one day we can connect all these things together and use SPARQL to query the data, and it really won't matter where it lives, and we'll say, "Yeah, data is not a database," and that will be fine. Thank you very much, and we'll take questions. (moderator) We just have five minutes for questions. (applause) (man) Hello, I really love your project. If I want to contribute, where should I go? There was a short URL in there, and as Envel mentioned, there are tabs at the top with links to the SPARQL queries and so on. And there is a Tasks list with a couple of suggestions on where to get started. It's not mandatory; you can work on whatever you want, obviously. But, yeah, that's a nice place.
And if you have a project, you can also bring it to the Talk page. Like a lot of Wikidata WikiProject Talk pages, it's not very lively, but I will read and answer, so that's a start. Do you already have something in mind? We can talk after this if you do. - Allons-y. - (woman) Hi there. I work with a group from the University of Copenhagen and the University of Washington on an initiative called Atari Women, recognizing all the women who've been involved over the years with the Atari game system. And so I'm wondering: I believe your WikiProject covers the developers, the designers, and such, but obviously it crosses into the biography part of our world. How does that work? Is there someone more specialized in that area whom the folks at these two universities could connect with, or... Thoughts? I don't think there is anybody in particular. My impression of the [inaudible] project is that people are fairly eclectic. Sometimes people specialize in very specific niche topics, but in this case I don't think so. So I'll be happy to take the call. To answer your question: yes, that is definitely in the scope of our project. And for that period in particular, I don't think we would turn it down. These days video games are made by something like 1,000 people, and do we want to create an item about every single person, like the credit roll of a movie? So for modern games, I don't know if we want to be that database, the ultimate database of game credits. But for the early Atari days, oh, definitely. I would actually love to see the dataset, because in common knowledge it's a lot of dudes... - (woman) I'll connect you to that. - Yes, please. (laughter) (moderator) Any other questions? Sir, just in front of you. (man 2) Do you collaborate with the Internet Archive? Because not a month goes by without Jason Scott posting that he's rescued 170,000 DOS games or something like that.
There are Internet Archive identifiers on some game items, which is a bit weird, because on the Internet Archive an entry is usually a particular release of the game; again, the difference in granularity. Last time I checked, there were four or five Prince of Persia entries on the Internet Archive, because they have the Apple II version, the DOS version, and so on. So, not explicitly. In general, I think we probably want to make more general connections with the video game preservation scene. There's a quite lively community working hard on video game preservation, and I think Wikidata can be a useful resource for them, because then they don't have to manage the metadata and can focus on other things. Do you have something to add to that? No. [inaudible], perhaps? (man 3) I had the same question. (laughter) Perfect. (moderator) There was one more question back here. No, I probably hallucinated it. Sorry. For one minute, we can show a query. Or not. (moderator) You have 30 seconds. Will the Query Service [inaudible]? We have links in the PDF, [inaudible]? (man 4) If there's still time, I have a question. Yes, please. During your presentation, did you notice that some of the identifiers have more than 100% [inaudible]? Yeah, that's partly because of the examples; some use [inaudible] as examples. And also sometimes because there are broad matches. So if it says something that's a bit... So, yeah, this is one of my favorites, if I can scroll to it: the characters of the Mario franchise linked to their games. (chuckles) You can find Wario, Princess Peach, and so on. And my favorite is, if you look somewhere here, yes: there is Mario, and there is Dr. Mario. If you look at the item, it's said to be the same as... because Mario the plumber and Mario the physician might be two different people; we don't really know. (laughter) (moderator) Thank you very much for this presentation. (applause)