So welcome, everybody.
This is "Structuring GLAM-Wiki
Initiatives with Wikidata"
with the presenter
João Alexandre Peschanski
from Wiki Movimento Brasil.
And let's start.
(João) So, thanks everyone for joining us.
Thank you, Erica.
I was also actually one of the presenters,
to some extent.
I have this job of presenting
on behalf of four people,
they are all here, including myself.
(person in audience) Please shut up.
Please stop talking! Thanks.
(João) Me?
(laughs)
(person in audience) Please go on talking.
(laughter)
(João) Okay. So this is collective work,
and I am only here as a means
to order some process
to some extent; this is work
by Wiki Movimento Brasil, the user group,
and The Research, Innovation
and Dissemination Center
for Neuromathematics,
which is the lab where I hold a position.
This is funded by
the São Paulo Research Foundation.
And it's basically our work to solve
what nobody in Brazil
really cares to solve,
which is to provide knowledge.
And this knowledge needs
to be provided urgently,
otherwise museums burn,
they get destroyed,
they are unfunded.
Museums don't have--
this is a Global South country,
they don't have resources.
Only 1% of the public museums in Brazil
have any sort of digital media available.
So if we don't do the digitizing,
if we don't do the upload, if we don't do
the dissemination of this work,
it just won't happen.
And this knowledge will be lost,
it will be destroyed,
it will just be unavailable forever.
So there is a sense of urgency,
and what I am going to present today
is the aspect of GLAM-Wiki initiatives,
so an initiative around the collection
of galleries, libraries--
whatever galleries means--
archives and museums
or other cultural institutions
to provide this knowledge.
So, it's a focus on process,
which to some extent is interesting
because it connects to the vision
that was laid out on Wikidata
for the Wikimedia platforms
which is, Wikidata is a resource
to improve efficiency and effectiveness
of the other Wikimedia platforms.
This is the focus,
and it's particularly looking
at the Brazilian experience as a model,
I hope, for the Global South.
It's easy when you have staff, resources,
funding, whatever.
It's a little bit trickier
and it's more community dependent
when you are from an impoverished country,
when you are from a region
in which the Wikimedian community
needs to get directly involved
in the process.
And this is, to some extent,
an idea that starts--
from, the way I am presenting--
from the broader aspect
of providing this knowledge.
A process of convergence
of this knowledge and availability
through how we go onto this process
to the actual single item that we work on.
So I am going to present, to some extent,
what I've just called
the "Sum of All GLAM-Wikis Brazil",
the several institutions,
and how we keep track
of the work we've been doing,
I think we are up to now--
we have uploaded
around 70,000 files to Commons
and hundreds of thousands
of stuff to Wikidata.
It's about also the item modeling
of a GLAM-Wiki initiative,
so we can keep track of it,
and how we can involve the community
because that's the agent
of this knowledge development.
And this is one of our main projects
with this "Museu Paulista,"
it has over 23,000 images uploaded--
this is a museum that has been shut down
for several years.
So if you don't go
to the Wikimedia platforms,
you just don't have access
to this knowledge.
It's only available
for the general public here.
Which is different
when you're just, to some extent,
mirroring digital platforms
that already exist,
for instance, on the museum website.
So, what we do, we have items
for each one of the GLAMs
that we work on,
and they are Listeria-generated.
They-- each one has a page
so the community can go there.
We have a template
for GLAM-Wiki initiatives.
It's called a TGLAM--this is not Wikidata,
but it's pretty cool--
it was developed by other Portuguese
here in the room.
And so we keep track of them,
so they are all then items on Wikidata.
They are not fancy items,
it's just important that we are able
to keep track of what we're doing.
This is important for the community
to actually reach out,
and convert, and this is Wikidata again,
in that there is--
you have pages on Wikipedia,
a Commons category,
so people can find elements easily.
This is TGLAM which is
the template that we use.
This is a small GLAM that we've worked on.
It's again a national museum,
a public museum,
that is currently shut down
by the government,
by lack of funding.
If you want to have access to content,
you need to go to the Wikimedia platforms.
So it's a small activity and you see
there are several members--
some members of the community small GLAM
that are working--
this is again Wikidata-generated,
all the list of GLAMs,
so people can go back and forth over them,
but most importantly--
and this is where Wikidata is coming in--
you have tasks,
and there are a lot of tasks.
These are, again,
poor cultural institutions,
so the metadata that we get
is generally really bad.
We have batches of images
that no one knows what they are.
So the only way
that we can solve this issue
if we want to have
this input onto the project
is to actually mobilize the community
which is something
that we've done for airplanes.
The community with airplanes
is just fantastic.
They just identified in one day
like 500 images of airplanes--
we are doing this for political protest
in Brazil from the National Archives.
And we use several tools--
this one was presented by Andrew Lih
which is TABernacle, which is a tool
that he introduced to me,
so I am acknowledging this help.
We've actually learned a lot
from the Metropolitan work
which is, I think,
part of what we all do here is to share
processes and understanding,
so we are thankful.
Another one that we learned
from Wikimedia Deutschland
is the BRA Table--
that I don't think you've mentioned
in your presentation, Andrew--
which is actually pretty cool.
It was very important for us
in the context, you might remember,
of the fire at the National Museum
in which this gigantic historical museum,
science museum in Brazil,
just burned down.
There was no digital collection,
so we organized the campaign,
so people, random people,
would submit or upload
to Commons their images.
And they are the only images
we have of these museums,
and I am thankful again for the community
to have shared the word,
and we've used this tool, BRA Table,
to have the community understand
the language in which the items,
the entries had to be created
or were created,
the number of statements.
It's a community tool,
it's administrative stuff,
and Mix'n'match, of course,
which is useful
to be able to find more easily
where the information lies
when you have external databases.
And again on the administrative aspect,
which is again not already the main space,
we also rely on Mbabel
which was presented earlier on
through Listeria, so people
can actually improve content
not having a blank page in front of them,
but being able to have
some structured narrative
of an entry before they can
create content.
So this is all in the process
of improving efficiency
and effectiveness for the community.
And we also do that on the main space,
so we have an infrastructure,
most Wikipedia--
I would say, many Wikipedias
have this infrastructure on,
so automated infoboxes and, of course,
Commons has the Commons infoboxes.
These are really, really useful elements
because they fetch what we are able
to include onto Wikidata,
and they easily give
a sense of effectiveness
and social impact relevance
of what we are doing.
These are, again, cultural institutions
that are not recognized as GLAMs
that we work on.
And we've used this
for Wiki Loves Monuments as well,
so people would just upload
through Wikidata their monuments
and the use of the monument ID
that is now true.
Mike Peel, who is here,
has a connection
to structured data on Commons.
This is again something that improved
the effectiveness of the process.
So I think I'm going to speed up.
In Portuguese Wikipedia, we can rely
on Listeria bot-generated lists
on the main space,
which is actually pretty cool
when we are dealing
with small cultural institutions,
spread around,
that have to some extent
similar artists in their collections.
Which means that every time
you upload one museum,
it actually generates
a sort of avalanche of bot editing
on, I don't know, dozens of lists,
for instance, of these ones--
list of paintings of Pedro Américo
which is one of the most important
historical painters in Brazil.
So if you look at the history,
most of the content that was included
and sometimes small information,
but sometimes a batch upload
comes from Wikidata.
Again, the sense of effectiveness.
And now moving to the way
that we deal with things.
So the major difference
on what you're seeing
from Andrew and the work with the Met,
and, I think, the way we're doing--
5 minutes from 20 or 25?
(person answering) [inaudible]
We don't do Python,
we do Google Sheets formulas.
(laughs)
Which is, I think, probably harder,
but we should at some point--
(person) [inaudible]
- It's kind of scary.
- Yeah, it's kind of scary.
It is a large concatenation,
but once you've done one,
you can do them all.
But this is how we're doing this.
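
A minimal sketch of that kind of concatenation, written in Python rather than as a Google Sheets formula so that it can be run and checked. The column names (`title_pt`, `instance_of_qid`, `creator_qid`, `collection_qid`) and the placeholder QID Q4115189 (the Wikidata Sandbox item) are assumptions for illustration, not the team's actual sheet. The output follows the tab-separated QuickStatements V1 command syntax.

```python
def row_to_quickstatements(row):
    """Concatenate one spreadsheet row into QuickStatements V1 commands.

    Each command is a tab-separated line; CREATE makes a new item and
    LAST refers to the item that was just created.
    """
    lines = ["CREATE"]
    # Portuguese label: L + language code, value in double quotes.
    lines.append("\t".join(["LAST", "Lpt", '"' + row["title_pt"] + '"']))
    # P31 = instance of, P170 = creator, P195 = collection.
    lines.append("\t".join(["LAST", "P31", row["instance_of_qid"]]))
    lines.append("\t".join(["LAST", "P170", row["creator_qid"]]))
    lines.append("\t".join(["LAST", "P195", row["collection_qid"]]))
    return "\n".join(lines)


# Hypothetical row; Q4115189 (Wikidata Sandbox) stands in for real QIDs.
example_row = {
    "title_pt": "Retrato de exemplo",
    "instance_of_qid": "Q3305213",  # painting
    "creator_qid": "Q4115189",      # placeholder
    "collection_qid": "Q4115189",   # placeholder
}

batch = row_to_quickstatements(example_row)
print(batch)
```

Once one row is expressed this way, the same function (or the same formula, filled down a column) covers every other row. Titles that themselves contain double quotes would need escaping, which this sketch does not handle.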
We use Pattypan, we rely
on Commons templates,
but I'll show them.
So it's basically a process of search,
organize, clean and QuickStatements-ify,
whatever--
and we do reconciliation
mostly through Google Sheets.
We have issues with OpenRefine,
mostly because we receive the collections
not as full collections
but parts of them, normally,
and we are afraid
that if we use OpenRefine,
the decisions that we make
won't be recorded.
So you won't be able to have them
used in different processes.
And we have this gigantic Google Sheet
that to some extent,
people spend time finding the right QID
or finding the ID that they need to find,
and then they just reconcile
through Google Sheets, again.
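
A rough sketch of what a shared lookup sheet buys you when collections arrive in parts: every name-to-QID decision is kept in one table, so a later batch reuses earlier decisions and only the unseen names need manual work. This is written in Python for testability; the `decisions` table, the normalization rule, and the placeholder QID Q4115189 (the Wikidata Sandbox item) are all assumptions, not the team's actual setup.

```python
def normalize(name):
    """Collapse whitespace and ignore case so spelling variants match."""
    return " ".join(name.split()).casefold()


def reconcile(names, decisions):
    """Split names into those already decided and those still to look up."""
    matched = {}
    unmatched = []
    for name in names:
        qid = decisions.get(normalize(name))
        if qid is None:
            unmatched.append(name)  # someone still has to find this QID
        else:
            matched[name] = qid
    return matched, unmatched


# Decisions recorded while processing an earlier part of a collection.
# The QID is a placeholder, not the painter's real item.
decisions = {normalize("Pedro Américo"): "Q4115189"}

# A later partial delivery with a variant spelling and an unseen name.
new_batch = ["pedro  américo", "Unknown Photographer"]
matched, unmatched = reconcile(new_batch, decisions)
print(matched)
print(unmatched)
```

After someone finds the right QID for an unmatched name, adding it to `decisions` makes that choice available to every later batch.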
The upload is based on Pattypan.
We've tried GLAMpipe--
it's a little bit complicated,
but Pattypan is the one we've been using.
And again, in the process
of effectiveness,
the Commons templates
basically bring from Wikidata
the information you've uploaded,
so this is one of--
this is an example of an image
that we have uploaded
from this very famous
photographer in Brazil,
and it brings, with the Art Photo
Commons template,
a structure that we feel is useful.
Each one of the processes
that I've shown you
were identified
as a topic of a Wikidata Lab.
You probably heard of them as of now.
These are trainings that we provide
for the community,
so they are able to work
on each one of the steps.
So, here you have Magnus and Andrew,
in Brazil, helping us with--
working in this process
that has been the process
that we've relied on
for these cultural institutions.
And the trainings are available online.
The last one we have available online
is on disagreeing data
with Denny Vrandečić.
That's it.
So, thank you all for being here.
This was a fast-track presentation,
but I think we have time for questions.
Thank you.
(applause)
(Erica) Thank you, João.
So, now we have 5 minutes for questions
and please, wait for the microphone
before asking.
So, who's got questions here?
(person 1) First of all, thank you
for all the work that you are doing.
And I want to ask you
about the inspiration.
We just came from an education panel
where you said that the work that you do
with your students is difficult for you
because, you know, you have to find
assignments and things to do
that are interesting.
So I am wondering
how you keep yourself inspired
and what do you do to kind
of try new things
and find a new--
next ideas to work on.
(João) I don't know what you mean.
Thank you.
(laughs)
Maybe, I am a maniac. I don't know.
But it's obsessive, I don't know.
It's just that, again, there is a sense
that if we don't do it,
no one will do it,
(person) Obsessive [inaudible]
(laughs)
So, this is a motto for,
I think, these processes.
And again this is a country
in which the museums are being shut down
or destroyed and again if we don't do it,
this content will just disappear.
So we are just, right now,
facing a situation
in which the Brazilian government
has decided to shut down
the databases on the killed
and disappeared people
in the military dictatorship in Brazil
because they disagree that people
were killed or disappeared
during the military dictatorship.
So content disappears.
So I think we all live,
and it's just not myself,
Érica, Giovana and Heather,
with the sense of emergency.
Which I think is a little bit different
from other circumstances,
other countries,
but I would say that in Brazil,
and we can imagine
the Global South in general,
this is something that is really relevant.
Content will just not be there
if you wait.
(person 2) I was wondering
if there were any positive aspects
to your relationship
with the Brazilian government.
Has there been attention
to your efforts with the museum
or otherwise, I know,
it got a lot of press,
or did you get any positive attention,
did you get any collaboration,
are there any avenues in which you are
getting some progress with the government?
(João) As some of you might know,
if not all of them,
we currently have
a very bad government in Brazil.
So if they knew we existed,
they would shut down all the projects.
So I am glad--
what we have right now
is better than any communication.
So we have a very, very large initiative
that is, as of today, half clandestine
with the Brazilian National Archive,
which is under the administration
of the Department of Justice
which is extremely far-right.
And they just don't care
about what we are doing
and if they knew we were--
they don't really care--
but if they knew, they wouldn't like it.
Just like the Department of Education--
the Head of the Department of Education
sent a letter to Wikipedians
twice this year,
saying that he doesn't like his entry.
And we don't know what it actually means,
but it came to us as an official document.
And then, you can imagine
how in this process
this would be understood.
I don't think there is
any connection right now,
but I think it's just an expression
of what they do
or understand the role of culture
or, I don't know,
social communication,
of culture in Brazil right now.
(person 2) What about local governments,
states, cities?
(João) So, the question now is
about local governments,
cities, so that's a very actually
interesting aspect.
One of the GLAMs that is listed,
is actually not necessarily a GLAM.
Like a couple of weeks ago,
we decided with a local government--
Where is it?
This one. It's called
"Wiki Takes Santana de Parnaíba," or
"Wiki Occupies Santana de Parnaíba."
We had this agreement,
and it was generally funded
by the Wikimedia Foundation,
that we would take over a city
for several days.
So we arrived with 15 Wikimedians,
with the support of the local government,
which opened its cultural institutions--
they are very, very small,
non-digitized, and we basically
Wikified everything.
So we took pictures of--
it's a historical city
so there were 500 monuments in the city,
so we took pictures of each one of them,
we mapped them on OpenStreetMap,
we went to the archive,
we uploaded what we could--
there were licensing issues.
We interviewed the elderly in the city,
and this was done with local government.
But these kinds
of local negotiations are harder
than when you have a broader federal agent
because then you can
just trickle down to negotiation.
But it was fun mostly, which is,
of course, always important.
It was very, very impactful.
I think we've uploaded
like 10,000 images in this process.
(Erica) We have time
for one more question.
(person 3) Hello, thanks
for your presentation.
Just a very practical question.
There was a link to training materials
in your presentation.
(João) To what?
- (person 3) To training materials.
- Yes.
(person 3) I just tried following the link,
but it points
to a Wikimedia Commons image file.
I would be interested in having
a look at the training materials.
Are they in English or in Portuguese?
- So, which one--
- (person 3) The previous one. Yeah. That one.
Available here that points
to a JPEG file on Wikimedia Commons.
- (person) [inaudible].
- (João) Ah. Okay. This one?
(person 3) Oh, yeah. This is-- no, sorry.
I was looking for the training materials.
(João) Ok. So, anyway.
Somewhere--I will provide the link, so--
- (person 3) Or if you can put on the Etherpad.
- (João) Yes.
(person 3) 'Cause I'd be interested
to see how that relates
to what we tell GLAM institutions
in Belgium.
- (João) Sure. So--
- (person 3) Thank you.
(João) Of course,
and thanks for the question.
And they are not all on YouTube
because at some point, we didn't have
the technology to stream,
but now we do.
And it was implemented, so I would say
the last 8 trainings out of 20 are online,
and some of them are in English.
Some are in Portuguese,
as we are targeting the local community,
it's important for us
that it's in Portuguese.
And then some work needs
to be done for subtitles,
and there were 20 of these trainings--
all the material, PDFs,
links are on wiki.
So they are traceable,
and the idea is that we meet
in the morning, like 10 a.m.,
we have two hours of lectures,
sometimes from someone
in the local community
sometimes from a guest even remotely.
Then we learn something very specific,
like how to do modeling
when you have disagreeing data,
like the last one,
or how to implement an automated infobox,
how to run a Listeria, so stuff like that.
And then we learn this,
and during the afternoon
up to 6 p.m., we implement this
on our workflow.
This is why I was saying there is
this aspect of training and doing.
So the content is available,
so you can check,
and I am sorry the link was broken.
But, of course, it's provided,
and it's on Commons and YouTube.
(Erica) So that's it.
We are out of time.
Thank you very much
for attending this session
on GLAM-Wiki initiatives with Wikidata.
And the Brazilian crew is still here,
available for your questions,
for discussion, all those things
that we've been doing
and thank you very much, João.
So another round of applause, please.
(applause)