So welcome, everybody. This is "Structuring GLAM-Wiki Initiatives with Wikidata" with the presenter João Alexandre Peschanski from Wiki Movimento Brasil. And let's start. (João) So, thanks everyone for joining us. Thank you, Erica. I was also actually one of the presenters, to some extent. I have this job of presenting on behalf of four people, they are all here, including myself. (person in audience) Please shut up. Please stop talking! Thanks. (João) Me? (laughs) (person in audience) Please go on talking. (laughter) (João) Okay. So this is collective work, and I am only here as a means to order some process to some extent; this is work by Wiki Movimento Brasil, the user group, and The Research, Innovation and Dissemination Center for Neuromathematics, which is the lab where I hold a position. This is funded by the São Paulo Research Foundation. And it's basically our work to solve what nobody in Brazil really cares to solve, which is to provide knowledge. And this knowledge needs to be provided urgently, otherwise museums burn, they get destroyed, they are unfunded. Museums don't have-- this is a Global South country, they don't have resources. Only 1% of the public museums in Brazil have any sort of digital media available. So if we don't do the digitizing, if we don't do the upload, if we don't do the dissemination of this work, it just won't happen. And this knowledge will be lost, it will be destroyed, it will just be unavailable forever. So there is a sense of urgency, and what I am going to present today is the aspect of GLAM-Wiki initiative, so an initiative around the collection of galleries, libraries-- whatever galleries means-- archives and museums or other cultural institutions to provide this knowledge. 
So, it's a focus on process, which to some extent is interesting because it connects to the vision that was laid out on Wikidata for the Wikimedia platforms which is, Wikidata is a resource to improve efficiency and effectiveness of the other Wikimedia platforms. This is the focus, and it's particularly looking at the Brazilian experience as a model, I hope, for the Global South. It's easy when you have staff, resources, funding, whatever. It's a little bit trickier and it's more community dependent when you are from an impoverished country, when you are from a region on which the Wikimedian community needs to get directly involved in the process. And this is, to some extent, an idea that starts-- from, the way I am presenting-- from the broader aspect of providing this knowledge. A process of convergence of this knowledge and availability through how we go onto this process to the actual single item that we work on. So I am going to present, to some extent, what I've just called the "Sum of All GLAM-Wikis Brazil", the several institutions, and how we keep track of the work we've been doing, I think we are up to now-- we have uploaded around 70,000 files to Commons and hundreds of thousands of stuff to Wikidata. It's about also the item modeling of a GLAM-Wiki initiative, so we can keep track of it, and how we can involve the community because that's the agent of this knowledge development. And this is one of our main projects with this "Museu Paulista," it has over 23,000 images uploaded-- this is a museum that has been shut down for several years. So if you don't go to the Wikimedia platforms, you just don't have access to this knowledge. It's only available for the general public here. Which is different when you're just, to some extent, mirroring digital platforms that already exist, for instance, on the museum website. So, what we do, we have items for each one of the GLAMs that we work on, and they are Listeria-generated. 
They-- each one has a page so the community can go there. We have a template for GLAM-Wiki initiatives. It's called a TGLAM--this is not Wikidata, but it's pretty cool-- it was developed by other Portuguese here in the room. And so we keep track of them, so they are all then items on Wikidata. They are not fancy items, it's just important that we are able to keep track of what we're doing. This is important for the community to actually reach out, and convert and this is Wikidata again that there is-- you have pages on Wikipedia, Commons category, so people can find elements easily. This is TGLAM which is the template that we use. This is a small GLAM that we've worked on. It's again a national museum, a public museum, that is currently shut down by the government, by lack of funding. If you want to have access to content, you need to go to the Wikimedia platforms. So it's a small activity and you see there are several members-- some members of the community small GLAM that are working-- this is again Wikidata-generated, all the list of GLAMs, so people can go back and forth over them, but most importantly-- and this is where Wikidata is coming-- you have tasks, and there are a lot of tasks. These are, again, poor cultural institutions, so the metadata that we get is generally really bad. We have batches of images that no one knows what they are. So the only way that we can solve this issue if we want to have this input onto the project is to actually mobilize the community which is something that we've done for airplanes. The community with airplanes are just fantastic. They just identified in one day like 500 images of airplanes-- we are doing this for political protest in Brazil from the National Archives. And we use several tools-- this one was presented by Andrew Lih which is TABernacle, which is a tool that he introduced to me, so I am acknowledging this help. 
We've actually learned a lot from the Metropolitan work which is, I think, part of what we all do here is to share processes and understanding, so we are thankful. Another one that we learned from Wikimedia Deutschland is the BRA Table-- that I don't think you've mentioned in your presentation, Andrew-- which is actually pretty cool. It was very important for us in the context, you might remember, of the fire at the National Museum on which this gigantic historical museum, science museum in Brazil, just burned down. There was no digital collection, so we organized the campaign, so people, random people, would submit or upload to Commons their images. And they are the only images we have of these museums, and I am thankful again for the community to have shared the word, and we've used this tool, BRA Table, to have the community understand the language on which the items, the entries had to be created or were created, the number of statements. It's a community tool, it's administrative stuff, and Mix'n'match, of course, which is useful to be able to find more easily where the information lies when you have external databases. And again on the administrative aspect, which is again not already the main space, we also rely on Mbabel which was presented earlier on through Listeria, so people can actually improve content not having a blank page in front of them, but being able to have some structured narrative of an entry before they can create content. So this is all in the process of improving efficiency and effectiveness for the community. And we also do that on the main space, so we have an infrastructure, most Wikipedia-- I would say, many Wikipedias have this infrastructure on, so automated infoboxes and, of course, Commons has the Commons infoboxes. These are really, really useful elements because they fetch what we are able to include onto Wikidata, and they easily give a sense of effectiveness and social impact relevance of what we are doing. 
These are, again, cultural institutions that are not recognized as GLAMs that we work on. And we've used this for Wiki Loves Monuments as well, so people would just upload through Wikidata their monuments and the use of the monument idea that is now true. Mike Peel, who is here, has a connection to structured data on Commons. This is again something that improved the effectiveness of the process. So I think I'm going to speed up. In Portuguese Wikipedia, we can rely on Listeria bot-generated lists on the main space, which is actually pretty cool when we are dealing with small cultural institutions, spread around, that have to some extent similar artists in their collections. Which means that every time you upload one museum, it actually generates a sort of avalanche of bot editing on, I don't know, dozens of lists, for instance, of these ones-- list of paintings of Pedro Américo, who is one of the most important historical painters in Brazil. So if you look at the history, most of the content that was included and sometimes small information, but sometimes a batch upload comes from Wikidata. Again, the sense of effectiveness. And now moving to the way that we deal with things. So the major difference on what you're seeing from Andrew and the work with the Met, and, I think, the way we're doing-- 5 minutes from 20 or 25? (person answering) [inaudible] (João) We don't do Python, we do Google Sheet formulas. (laughs) Which is, I think, probably harder, but we should at some point-- (person) [inaudible] - It's kind of scary. - Yeah. It's kind of scary. It is a large concatenation, but once you've done one, you can do them all. But this is how we're doing this. We use Pattypan, we rely on Commons templates, but I'll show them. So it's basically a process of search, organize, clean and QuickStatements-ify, whatever-- and we do reconciliation mostly through Google Sheets. 
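The spreadsheet workflow described here (reconcile names to QIDs, then concatenate cells into QuickStatements commands) can be sketched in Python. This is only an illustrative sketch, not the team's actual formulas: the column names (`title`, `creator`, `collection`, `inventory`), the `RECONCILED` lookup table, and the QIDs in it are hypothetical placeholders. The property IDs (P31 instance of, P170 creator, P195 collection, P217 inventory number), the class Q3305213 (painting), and the tab-separated `CREATE`/`LAST` command format are standard Wikidata and QuickStatements conventions.

```python
from typing import Dict, List, Optional

# Hypothetical reconciliation table, playing the role of the shared sheet
# where volunteers record the QID they found for each name.
# The QIDs below are placeholders, NOT the real Wikidata identifiers.
RECONCILED: Dict[str, str] = {
    "Pedro Américo": "Q1111111",
    "Museu Paulista": "Q2222222",
}


def reconcile(name: str) -> Optional[str]:
    """Return the recorded QID for a name, or None if nobody reconciled it yet."""
    return RECONCILED.get(name.strip())


def row_to_quickstatements(row: Dict[str, str]) -> List[str]:
    """Turn one metadata row into QuickStatements (V1, tab-separated) commands.

    Rows whose creator or collection has not been reconciled produce no
    commands, so they can be sent back to the community for identification.
    """
    creator = reconcile(row["creator"])
    collection = reconcile(row["collection"])
    if creator is None or collection is None:
        return []
    title = row["title"].replace('"', "'")  # keep the quoted string well-formed
    inventory = row["inventory"]
    return [
        "CREATE",
        f'LAST\tLpt\t"{title}"',        # Portuguese label
        "LAST\tP31\tQ3305213",          # instance of: painting
        f"LAST\tP170\t{creator}",       # creator
        f"LAST\tP195\t{collection}",    # collection
        f'LAST\tP217\t"{inventory}"',   # inventory number
    ]


if __name__ == "__main__":
    example = {
        "title": "Independência ou Morte",
        "creator": "Pedro Américo",
        "collection": "Museu Paulista",
        "inventory": "X-001",  # made-up inventory number
    }
    print("\n".join(row_to_quickstatements(example)))
```

Keeping the reconciliation table separate from the command builder mirrors the concern raised in the talk: decisions about which name maps to which QID stay recorded in one place and can be reused when the next partial batch of a collection arrives.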
We have issues with OpenRefine, mostly because we receive the collections not as full collections but parts of them, normally, and we are afraid that if we use OpenRefine, the decisions that we make won't be recorded. So you won't be able to have them used in different processes. And we have this gigantic Google Sheet that to some extent, people spend time finding the right QID or finding the ID that they need to find, and then they just reconcile through Google Sheet, again. The upload is based on Pattypan. We've tried GLAMpipe-- it's a little bit complicated, but Pattypan is the one we've been using. And again, in the process of effectiveness, the Commons templates basically bring from Wikidata the information you've uploaded, so this is one of-- this is an example of an image that we have uploaded from this very famous photographer in Brazil, and it brings with the Art Photo Commons template, a structure that we feel is useful. Each one of the processes that I've shown you was identified as a topic of a Wikidata Lab. You probably heard of them as of now. These are trainings that we provide for the community, so they are able to work on each one of the steps. So, here you have Magnus and Andrew, in Brazil, helping us with-- working in this process that has been the process that we've relied on for these cultural institutions. And the trainings are available online. The last one we have available online is on disagreeing data with Denny Vrandečić. That's it. So, thank you all for being here. This was a fast-track presentation, but I think we have time for questions. Thank you. (applause) (Erica) Thank you, João. So, now we have 5 minutes for questions and please, wait for the microphone before asking. So, who's got questions here? (person 1) First of all, thank you for all the work that you are doing. And I want to ask you about the inspiration. 
We just came from an education panel where you said that the work that you do with your students is difficult for you because, you know, you have to find assignments and things to do that are interesting. So I am wondering how you keep yourself inspired and what do you do to kind of try new things and find a new-- next ideas to work on. (João) I don't know what you mean. Thank you. (laughs) Maybe, I am a maniac. I don't know. But it's obsessive, I don't know. It's just that, again, there is a sense that if we don't do it, no one will do it. (person) Obsessive [inaudible] (laughs) (João) So, this is a motto for, I think, these processes. And again this is a country on which the museums are being shut down or destroyed and again if we don't do it, this content will just disappear. So we are just, right now, facing a situation in which the Brazilian government has decided to shut down the databases on the killed and disappeared people in the military dictatorship in Brazil because they disagree that people were killed or disappeared during the military dictatorship. So content disappears. So I think we all live, and it's just not myself, Érica, Giovana and Heather, with the sense of emergency. Which I think it's a little bit different from other circumstances, other countries, but I would say that in Brazil, and we can imagine the Global South in general, this is something that is really relevant. Content will just not be there if you wait. (person 2) I was wondering if there was any positive aspects to your relationship with the Brazilian government. Has there been attention to your efforts with the museum or otherwise, I know, it got a lot of press, or did you get any positive attention, did you get any collaboration, is there any avenues in which you are getting some progress with the government? (João) As some of you might know, if not all of them, we currently have a very bad government in Brazil. So if they knew we existed, they would shut down all the projects. 
So I am glad-- what we have right now is better than any communication. So we have a very, very large initiative that is, as of today, half clandestine with the Brazilian National Archive, which is under the administration of the Department of Justice which is extremely far-right. And they just don't care about what we are doing and if they knew we were-- they don't really care-- but if they knew, they wouldn't like it. Just like the Department of Education-- the Head of the Department of Education sent a letter to Wikipedians twice this year, saying that he doesn't like his entry. And we don't know what it actually means, but it came to us as an official document. And then, you can imagine how in this process this would be understood. I don't think there is any connection right now, but I think it's just an expression of what they do or understand the role of culture or, I don't know, social communication, of culture in Brazil right now. (person 2) What about local governments, states, cities? (João) So, the question now is about local governments, cities, so that's a very actually interesting aspect. One of the GLAMs that is listed, is actually not necessarily a GLAM. Like a couple of weeks ago, we decided with a local government-- Where is it? This one. It's called "Wiki Takes Santana de Parnaíba" "Wiki Occupies Santana de Parnaíba." We had this agreement, and it was generally funded by the Wikimedia Foundation, that we would take over a city for several days. So we arrived with 15 Wikimedians, with the support of the local government, which opened its cultural institutions-- they are very, very small, non-digitized, and we basically Wikified everything. So we took pictures of-- it's a historical city so there were 500 monuments in the city, so we took pictures of each one of them, we mapped them on OpenStreetMap, we went to the archive, we uploaded what we could-- there were licensing issues. 
We interviewed the elderly in the city, and this was done with local government. But these kinds of local negotiations are harder than when you have a broader federal agent because then you can just trickle down to negotiation. But it was fun mostly, which is, of course, always important. It was very, very impactful. I think we've uploaded like 10,000 images in this process. (Erica) We have time for one more question. (person 3) Hello, thanks for your presentation. Just a very practical question. There was a link to training materials in your presentation. (João) To what? - (person 3) To training materials. - (João) Yes. (person 3) I just tried following the link, but it points to a Wikimedia Commons image file. I would be interested in having a look at the training materials. Are they in English or in Portuguese? - (João) So, which one-- - (person 3) The previous one. Yeah. That one. Available here that points to a JPEG file on Wikimedia Commons. - (person) [inaudible]. - (João) Ah. Okay. This one? (person 3) Oh, yeah. This is-- no, sorry. I was looking for the training materials. (João) Ok. So, anyway. Somewhere--I will provide the link, so-- - (person 3) Or if you can put on the Etherpad. - (João) Yes. (person 3) 'Cause I'd be interested to see how that relates to what we tell GLAM institutions in Belgium. - (João) Sure. So-- - (person 3) Thank you. (João) Of course, and thanks for the question. And they are not all on YouTube because at some point, we didn't have the technology to stream, but now we do. And it was implemented, so I would say the last 8 trainings out of 20 are online, and some of them are in English. Some are in Portuguese, as we are targeting the local community, it's important for us that it's in Portuguese. And then some work needs to be done for subtitles, and there were 20 of these trainings-- all the material PDFs, links are on Wiki. 
So they are traceable, and the idea is that we meet in the morning, like 10 a.m., we have two hours of lectures, sometimes from someone in the local community, sometimes from a guest, even remotely. Then we learn something very specific, like how to do modeling when you have disagreeing data, like the last one, or how to implement an automated infobox, how to run a Listeria, so stuff like that. And then we learn this, and during the afternoon up to 6 p.m., we implement this on our workflow. This is why I was saying there is this aspect of training and doing. So the content is available, so you can check, and I am sorry the link was broken. But, of course, it's provided, and it's on Commons and YouTube. (Erica) So that's it. We are out of time. Thank you very much for attending this session on GLAM-Wiki initiatives with Wikidata. And the Brazilian crew is still here, available for your questions, for discussion, all those things that we've been doing and thank you very much, João. So another round of applause, please. (applause)