Thanks folks.
As I mentioned before,
you can load up the slides here
by either the QR code or the short URL,
which is bit.ly/wikidatacon19glamstrategies.
And the slides are also
on the program page
on the WikidataCon site.
And then, there's also an Etherpad here
that you can click on.
So, I'll be talking about a lot of things
that you might have heard about
at Wikimania, if you were there,
but we are going to go
into a lot more implementation details.
Because we're at WikidataCon,
we can dive deeper
into the Wikidata and technical aspects.
But Richard and I are working
with the Met Museum right now
on their Open Access program.
If you didn't know,
about two years ago--
we're now entering the third year--
the Met launched an Open Access strategy,
where they're releasing their images
and their metadata under a CC0 license.
And one of the things
they brought us on to do
is imagine what we could do
with this Open Access content.
So, we're going to talk
a little bit about that
in terms of the experiments
that we've been running,
and we'd love to hear your feedback.
So, I hope to talk for about 20 minutes,
and then hope to get some conversation
with you folks, since we have
a lot of knowledge in this room.
This is the announcement--
actually the one-year anniversary,
where Katherine Maher was there
at the Met to talk about that anniversary.
So, one of the things that's challenging
I think for a lot of folks
is how do you explain Wikidata,
and this GLAM
contribution strategy to Wikidata
to C-level folks at an organization.
We can talk about it with data scientists,
Wikimedians, librarians, maybe curators,
but when it comes to talking about this
with a director of a museum,
or a director of a library,
what does it actually--
how does it resonate with them?
So, one way that we talk about it
that I think makes sense
is everyone knows about Wikipedia,
and for the English language edition,
at least, we're talking
about 6 million articles.
And it sounds like a lot,
but if you think about it,
Wikipedia is not really the sum
of all human knowledge,
it's the sum of all reliably sourced,
mostly western knowledge.
And there's a lot of stuff out there.
We have a lot of stuff
in Commons already--
56 million media files, with more
going up every single day--
but there's a very
different type of standard
for what goes into Wikimedia Commons.
And the way that we have described
Wikidata to GLAM professionals,
and especially the C levels,
is that what if we could have a repository
that has a notability bar
that is not as high as Wikipedia's.
So, we want all these paintings,
but not every painting
necessarily needs an article.
Wikipedia is held back by the fact
that you need to have
language editions of Wikipedia.
So, can we store the famous thing--
things, not strings.
Can we be object oriented
and not really lexical oriented?
And can we store this in a database
that stores facts, figures,
and relationships?
And that's pretty much
what Wikidata does.
And Wikidata is also a universal
kind of crosswalk database, with links
to other collections out there.
So, we think this really resonates
with folks when you're talking about
what is the value of Wikidata compared
to what they're normally familiar with,
which is just Wikipedia.
Alright, so what are the benefits?
You're interlinking
your collections with others.
So, unfortunately, I apologize
to librarians here,
I'll be talking mostly about museums,
but a lot of this is also valid
for libraries.
But you're basically connecting
your collection with the global collection
of linked open data collections.
You can also receive enriched
and improved metadata back
after contributing and linking
your collections to the world.
And there are some pretty neat
interactive multimedia applications
that you get-- I don't want
to say for free,
but your collection in Wikidata
allows you to visualize things
that you've never seen before.
We'll show you some examples.
And so, how do you convey this
to GLAM professionals effectively?
Well, I usually like to start
with storytelling,
and not technical explanations.
Okay, so if everyone here
has a cell phone,
especially if you have an iPhone,
I want you to scan this QR code
and bring up the URL
that it comes up with.
Or if you don't have a QR scanner,
just type in w.wiki/Aij in a web browser.
So go ahead and scan that.
And what comes up?
Does anyone see a knowledge graph
pop up on your screen?
So, for folks here in WikidataCon,
this is probably not
revolutionary for you.
But what it does is run a SPARQL query
with these objects,
and it shows the linkages between them.
And you can actually drag them
around the screen.
You can actually click on nodes.
If you're [inaudible] in a mobile,
it will expand that--
you can actually start to surf
through Wikidata this way.
So, for Wikidata veterans
this is pretty cool.
One shot, you get this.
For a lot of folks who have never seen
Wikidata before,
this is a revolutionary moment for them.
To actually hand-manipulate
a knowledge graph,
and to start surfing through Wikidata
without having to know SPARQL,
without having to know what a Q item is,
without having to know
what a property proposal is,
they can suddenly start seeing
connections in a way that is magical.
Hey, I see [Jacob's] here.
Jacob's been using
some of this code, as well.
So, this is some code
that we'll talk about later on
that allows you to create
these visualizations in Wikidata.
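For anyone who wants to reproduce that hand-off moment, here is a minimal sketch--not the exact code from the talk--of building a Wikidata Query Service embed link that renders a draggable knowledge graph for a handful of items; the QIDs you pass in are up to you:

```python
from urllib.parse import quote

def graph_embed_url(qids):
    """Build a Wikidata Query Service embed URL whose result renders as a
    draggable knowledge graph (#defaultView:Graph) linking the given items."""
    values = " ".join(f"wd:{q}" for q in qids)
    sparql = (
        "#defaultView:Graph\n"
        "SELECT ?item ?itemLabel ?linkTo WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        f"  OPTIONAL {{ ?item ?prop ?linkTo . VALUES ?linkTo {{ {values} }} }}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )
    # The query service's embed view takes the URL-encoded query after '#'
    return "https://query.wikidata.org/embed.html#" + quote(sparql)
```

Short links like the w.wiki/Aij one above are just shortened versions of URLs like this.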
And we've really seen this
turn a lot of heads
who have really
never gotten Wikidata before.
But after seeing these interactive
knowledge graphs, they get it.
They understand the power of this.
And especially this example here,
this was a really big eye-opener
for the folks at the Met,
because this is the artifact
that is the center of this graph,
right there, the Portrait of Madame X,
a very famous portrait.
And they did not even know
that this was the inspiration
for the black dress that Rita Hayworth
wore in the movie Gilda.
So, just by seeing this graph, they said,
"Wait a minute. This is one
of our most visited portraits.
I didn't know that this was true."
And there are actually two other books
published about that painting.
You can see all these things,
not just within the realm of GLAM,
but it extends to fashion,
it extends to literature.
You're starting to see
the global connections
that your artworks have,
or your collections have via Wikidata.
So, how do we do this?
If you can remember nothing else
from this presentation,
this one page is your one-stop shopping.
Now, fortunately, you don't have
to memorize all this.
It's actually right here at
Wikidata:Linked_open_data_workflow.
So, we'll be talking about some
of these different phases
of how you first prepare,
reconcile, and examine
what the GLAM organization might have
and what does Wikidata have.
And then, what are the tools
to actually ingest
and correct or enrich that
once it's in Wikidata.
And then, what are some of ways
to reuse that content,
or to report and create
new things out of it.
So, this is the simpler version of a chart
that Sandra and the GLAM folks
at the foundation have created.
But this is trying
to sum up, in one shot--
because we know how hard things
are to find in Wikidata--
all the different
tools you should pay attention to
as a GLAM organization.
So, just using the Met as an example,
we started with what is the ideal object
that we have in Wikidata
that comes from the Met?
This is a typical shot of a Wikidata item,
in the mobile mode there.
And this is one
of the more famous paintings
we used as a model, here.
We have the label,
description, and aliases.
And then, we figured out,
"What are the core statements
that we wanted?"
We wanted instance of, image,
inception, collection.
And what are some other properties
we would like if we had them?
Depiction information,
material used, things like that.
We actually do have an identifier.
The Met object ID is P3634.
So, for some organizations,
you might want to propose
a property just to track your items
using an object ID.
And then, for the Met,
just trying to circumscribe
what objects do we want to upload
and keep in Wikidata--
the thing that we first identified
were collection highlights.
These are a hand-selected set
of 1,000 to 1,500 items
that were going to be given priority
to upload to Wikidata.
So, Richard and the crew
out of Wikimedia in New York
did a lot of this early work.
And then, now, we're systematically
going through to make sure
they're all complete.
And there's a secondary set
called the Heilbrunn Timeline
of Art History-- about 8,000 items
that are seminal pieces of work,
artists' works throughout history.
And there are about 8,000
that the Met has identified,
and we're also putting that
on Wikidata, as well,
using a different designation:
here, described by source--
Heilbrunn Timeline of Art History.
So, the collection highlight
is denoted here as collection--
Metropolitan Museum of Art,
subject has role collection highlight.
And then, these 8,000
or so are like that in Wikidata.
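As an illustration, that "ideal object" could be expressed as QuickStatements (the tab-separated v1 format). P31, P571, P195, and P3634 are as named in the talk; the "collection highlight" item's QID is left as a caller-supplied placeholder, since this only sketches the shape of the statements:

```python
def met_item_statements(qid, object_id, inception_year, highlight_qid=None):
    """Sketch of QuickStatements (v1 format) for one Met painting:
    instance of (P31) painting, inception (P571), Met object ID (P3634),
    and collection (P195) = Q160236 (Metropolitan Museum of Art), optionally
    qualified with subject has role (P2868) = the 'collection highlight' item."""
    rows = [
        f"{qid}\tP31\tQ3305213",                              # instance of: painting
        f"{qid}\tP571\t+{inception_year}-00-00T00:00:00Z/9",  # inception, year precision
        f'{qid}\tP3634\t"{object_id}"',                       # Met object ID (external ID)
        f"{qid}\tP195\tQ160236",                              # collection: the Met
    ]
    if highlight_qid:
        rows[-1] += f"\tP2868\t{highlight_qid}"               # subject has role qualifier
    return "\n".join(rows)
```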
I couldn't show this chart at Wikimania,
because it's too complicated.
But at WikidataCon, we can.
So, this is something that is really hard
to answer sometimes.
What makes something
in Wikidata from the Met,
or from the New York Public Library,
or from your organization?
And the answer is not easy.
It depends.
It's complicated, it can be multi-factor.
So, you could say, "Well, if an item has
a Met object ID in Wikidata,
that is a Met object."
But maybe someone didn't enter that.
Maybe they only put in
Collection: Met which is P195,
or they put in the accession number,
and they put collection as the qualifier
to that accession number.
So, there's actually, one, two, three
different ways to try to find Met objects.
And probably the best way to do it
is through a union like this.
So, you combine all three,
and you come back,
and you make a list out of it.
So unfortunately, there is
no one clean query
that'll guarantee you all the Met objects.
This is probably
the best approach for this.
And for some institutions,
they're probably doing
something similar to that right now.
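That "super union" can be written out roughly like this--a sketch consistent with the three routes just described, with Q160236 as the Met's item:

```python
# Three routes to a Met object, combined with UNION, per the description above.
MET_UNION_QUERY = """
SELECT DISTINCT ?item WHERE {
  { ?item wdt:P3634 ?metObjectId . }                # route 1: has a Met object ID
  UNION
  { ?item wdt:P195 wd:Q160236 . }                   # route 2: collection = the Met
  UNION
  { ?item p:P217 ?st .                              # route 3: accession number...
    ?st pq:P195 wd:Q160236 . }                      # ...with collection as a qualifier
}
"""
```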
Alright, so in this example,
what you see here
manifests itself
as this in a query,
which can get pretty complex.
So, if we're looking
for all the collection highlights,
we'd break this out into the statement
and then the qualifier as this:
subject has role collection highlight.
So, that's one way that we sort out
some of these special
designations in Wikidata.
So, the summary is,
representing "The Met" is multifaceted,
and needs to balance simplicity
and findability.
How many people here have heard
of Sum of All Paintings as a project?
Ooh, God, good, a lot of you!
So, it's probably one
of the most active ones
that deals with these issues.
So, we always debate whether we should
model things super-accurately,
or should you model things
so that they're findable.
These are kind of at odds with each other.
So, we usually prefer findability.
It's no good if it's perfectly modeled,
but no one can ever find it,
because it's so strict
in terms of how it's defined at Wikidata.
And then, we have some challenges.
Multiple artifacts might be tied
to one object ID,
which might be different in Wikidata.
And then, mapping the Met classification
to instances has some complex cases.
So, the way that the Met classifies things
doesn't always fit
with how Wikidata classifies things.
So, we show you some examples here
of how this works.
So, this is a great example
of using a Python library
to actually ingest
what we know from the Met,
and then try to sort out what they have.
So, this is just for textiles.
You can see that they got
a lot of detail here
in terms of woven textiles, laces,
printed, trimmings, velvets.
We first looked into this in Wikidata,
and we did not have
this level of detail there.
We still don't have all this resolved.
You can see that this
is really complex here.
Anonymous is just not anonymous
for a lot of databases.
There's a lot of qualifications--
whether the nationality, or the century.
So, trying to map all this to Wikidata
can be complex, as well.
And then, this shows you
that of all the works in the Met,
about 46% are open access right now.
So, we still have about just over 50%
that are not CC0 yet.
(man) All the objects in the Met,
or all objects on display?
(Andrew) It's weird. It's not just
what's on display,
but it's not all objects either.
It's about 400,000 to 500,000 objects
in their database at this point.
So, somewhere in between.
So, starting points.
This is always a hard one.
We just had this discussion
on the Facebook group recently
about where people go
to find out what the modeling
should look like for a certain thing.
It's not easy.
So, normally, what we have to do
is just point people to,
I don't know, some project
that does it well now?
So, it's not a satisfying answer,
but we usually tell folks
to start at things like visual arts,
or Sum of All Paintings
does it pretty well,
or just go to the project chat to find out
where some of these things are.
We need better solutions for this.
This is just a basic flow
of what we're doing with the Met here.
We're basically taking
their CSV, and their API,
and we're consuming it
into a Python data frame.
We're taking the SPARQL code--
the one that you saw
before, this super union--
bring that in, and we're doing
a bi-directional diff,
and then seeing what new things
have been added here,
what things have been subtracted there,
and we're actually making those changes
either through QuickStatements,
or we're doing it through Pywikibot.
So, directly editing Wikidata.
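The bi-directional diff at the heart of that flow can be sketched with plain sets--the actual pipeline works on pandas DataFrames, but the idea is the same:

```python
def bidirectional_diff(met_ids, wikidata_ids):
    """Compare object IDs from the Met's CSV/API against those already in
    Wikidata (e.g. from the union query). Returns (missing_from_wikidata,
    missing_from_met): the first drives new QuickStatements/Pywikibot edits,
    the second flags items to review or subtract."""
    met, wd = set(met_ids), set(wikidata_ids)
    return sorted(met - wd), sorted(wd - met)
```

For example, `bidirectional_diff(["10", "11", "12"], ["11", "12", "13"])` yields `(["10"], ["13"])`.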
So, this is the big slide
I also couldn't show at Wikimania,
because it would have flummoxed everyone.
So, this is a great example
of how we start with the Met database,
we have this crosswalk database,
and then we generate
the changes in Wikidata.
The way this works is this is an example
of one record from the Met.
This is an evening dress-- we've been working
with the Costume Institute recently,
the one that puts on the Met Gala.
So, we have one evening dress
here, by Valentina.
Here's a date, accession number.
So, these things can be put
into Wikidata directly.
A field equals the date, accession number.
But what do we do with things like this?
This is an object name, which is basically
like a classification of what it is,
like an instance of for the Met.
And the designer's Valentina.
So, what we do is we take these
and we run all the unique object names
and all the unique designers
through OpenRefine.
So, we get maybe 60% matches
if we're lucky.
We put that into a spreadsheet.
Then we ask volunteers
or the curators at the Met
to help fill in this crosswalk database.
This is just simply Google Sheets.
So, we say, here are all the object names,
the unique object names
that match lexically exactly
with what's in the Met database,
and then you say this maps to this Q ID.
So, when we first started this,
a lot of the matches had failed,
and some of these were blank.
So, we tapped folks in specific groups.
So there's like a Wiki Loves Fashion
little chat group that we have.
And folks like user PKM
were super useful in this area.
So she spent a lot of time
looking through this, and saying,
"Okay, Evening suit is this,
Ewer is that."
So, we looked through
and made all these mappings here.
And then, what happens is now,
when we see this in the Met database,
we look it up in the crosswalk database,
and we say, "Oh, yeah.
These are the two Q numbers
we need to put into Wikidata."
And then, it generates
the QuickStatement right there.
Same thing here with Designer: Valentina.
If Valentina matches here,
then it gets generated
with that QuickStatement right there.
If Valentina does not exist,
then we'll create it.
You can see here, Weeks--
look at that high Q ID right there.
We just created that recently,
because there was no entry before.
Does that make sense to everyone?
- (man 2) What's the extra statement?
- (Andrew) I'm sorry?
- (man 2) What's the extra statement?
- (Andrew) Oh, the extra statement.
So, believe it or not, we have
an Evening blouse, Evening dress,
Evening pants,
Evening ensemble, Evening hat--
do we want to make a new Wikidata item
for Evening pants, Evening everything?
So, we said, "No."
We probably don't want to.
We'll just say, "It's a dress,
but it's also evening wear",
which is what that is.
So, we're saying an instance
of both things.
I'm not sure it's the perfect solution,
but it's a solution at this point.
So, does everyone get that?
So, this is kind of a crosswalk database
that we maintain here.
And the nice thing about it,
it's just Google Sheets.
So, we can get people to help
that don't need to know
anything about this database,
don't need to know about QuickStatements,
don't need to know about queries.
They just go in and fill in the Q number.
Yeah.
(woman) So, when you copy
object name and you find the Q ID,
the initial 60%
that you mentioned as an example,
is that by exact match?
(Andrew) Well, it's through OpenRefine.
So, it does its best guess,
and then we verify to make sure
that the OpenRefine match makes sense.
Yeah.
Does that make sense to everyone?
So, some folks might be doing
some variation on this,
but I think the nice thing about this
is that, by using Google Sheets,
we remove a lot of the complexities
of these two areas from this.
And we'll show you some code
that does this later on.
- (man 3) How do you generate [inaudible]?
- (Andrew) How do you generate this?
- (man 3) Yes.
- (Andrew) Python code.
I'll show you a line that does this.
But you can also go up here.
This is the whole Python program
that does this, this, and that,
if you want to take a look at that.
Yes.
(man 4) Did you really use
your own vocabulary,
or is there something [inaudible].
- (Andrew) This right here?
- (man 4) Yeah.
(Andrew) Yeah. So, this
is the Met's own vocabulary.
So, most museums use
a system called TMS.
It's like their own management system.
So, they'll usually--
this is the museum world--
they'll usually roll
their own vocabulary for their own needs.
Museums are very late
to interoperable metadata.
Librarians and archivists have this
kind of as baked into them.
Museums are like, "Meh..."
Our primary goal
is to put objects on display,
and if it plays well with other people,
that's a side benefit.
But it's not a primary thing that they do.
So, that's why it's complicated
to work with museums.
You need to map their vocabulary,
which might be a mish-mash
of famous vocabularies,
like Getty AAT, and other things.
But usually, it's to serve
their exact needs at their museum.
And that's what's challenging.
And I see a lot of heads nodding,
so you've probably seen this a lot
at these museums.
So, I'll move on to show you
how this actually is done.
Oh, go ahead.
(man 5) How do you
bring people, to collaborate,
and put some Q codes into your database?
(Andrew) How do you-- I'm sorry?
(man 5) How do you bring...
collaborate people?
(Andrew) Ah, so for this,
these are projects we just go to,
for better or for worse,
like Facebook chat groups that we know,
are active in these areas.
Like Sum of All Paintings,
Wiki Loves Fashion--
which is a group
of maybe five or seven folks.
But we need a better way
to get this out to folks
so we get more collaborators on this.
This doesn't scale well, right now.
But for small groups,
it works pretty well.
I'm open to ideas.
(man 5) [inaudible]
(Andrew) Oh yeah. Please come on up.
If folks want to come up here,
there's a little more room
in the aisle right here.
So, we are utilizing Python
for this mostly.
If you don't know, there is
a Python notebook system
that WMFLabs has.
So, you can actually go on
and start playing with this.
So, it's pretty easy
to generate a lot of stuff
if you know some of the code that's there.
[inaudible], yeah.
(woman 2) Why do you put everything
into Wikidata,
and not into your own Wikibase?
(Andrew) If you're using
your own Wikibase?
(woman 2) Yeah. Why don't you
use your own Wikibase?
and then go to [inaudible]
(Andrew) That's its own ball of--
I don't want to maintain
my own Wikibase at this point. (laughs)
If I can avoid doing
the Wikibase maintenance,
I would not do it.
(man 6) Would you like a Wikibase?
(Andrew) We could. It's possible.
(man 7) But again,
what they use [inaudible]
about 2,000, 8,000, 10,000,
of 400,000 digital [inaudible].
So that's only 2.5%,
[inaudible]
(Andrew) So, I'd say, solve it for 1,500,
then scale up to 150,000.
So, we're trying to solve it
for the best
well-known objects, and then--
(man 7) When do you think
that will happen?
I understand that those are pieces
that shouldn't go onto Wikidata.
So you go to Commons
or your own Wikibase solution,
not to be a [inaudible]--
(Andrew) Right. That's why we're going
with the 2,000 and 8,000.
We're pretty confident
these are highly notable objects
that deserve to be in Wikidata.
Beyond that, it's debatable.
So, that's why we're not
vacuuming 400,000 things in one shot.
We're starting with notable 2,000,
notable 8,000, then we'll talk after that.
So, these are the two lines of code
that do the most stuff here.
So, even if you don't know Python,
it's actually not that bad
if you look at this.
There's a read_csv function.
You're taking the crosswalk URL,
basically, the URL
of that Google Spreadsheet.
You're grabbing the sheet
that's called "Object Name",
and you're basically creating
a data structure
that has the Object Name and the QID.
That's it. That's all you're doing.
Just pulling that in to the Python code.
Then, you're actually matching
whatever the entity's name is,
and then looking up the QID.
Okay, so, this is just to tell you
that's not super hard.
The code is available right there,
if you want to look at it.
But these two lines of code--
which take a little while
to write from scratch--
once you have an example,
it's pretty darn easy to plug in
your own data set, your own crosswalk,
to generate the QuickStatements.
So, I've done a lot of the work already,
and I invite you
to steal the code and try it.
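Here's a self-contained sketch of that crosswalk pattern using only the standard library; the sheet layout and QIDs below are placeholders, not the Met's real crosswalk, and in the real pipeline the CSV text comes from the Google Sheet's export URL via the read_csv call described above:

```python
import csv
import io

def load_crosswalk(csv_text):
    """Parse the crosswalk sheet (as exported CSV) into a dict mapping each
    Object Name to its list of QIDs. Multiple QIDs -- the 'extra statement'
    case like dress + evening wear -- are pipe-separated in this sketch."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Object Name"]: row["QID"].split("|")
            for row in reader if row["QID"]}

def instance_of_statements(item_qid, object_name, crosswalk):
    """One P31 (instance of) QuickStatement per mapped QID; an unmapped
    name yields nothing, leaving a blank for a human to fill in the sheet."""
    return [f"{item_qid}\tP31\t{q}" for q in crosswalk.get(object_name, [])]
```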
So, when it comes to images,
it's a little more challenging.
So, at this point, Pattypan
is probably your best bet.
Pattypan is a tool that is
a spreadsheet-oriented tool.
You fill in the metadata, you point
to the local file on your computer,
and it uploads it to Commons
with all that information,
or another alternative
is if you set P4765 to a URL--
because this is the Commons-compatible
image available at URL,
Maarten Dammers has a bot,
at least for paintings,
that will just swoop through and say,
"Oh, we don't have this image.
Here's a Commons compatible one.
Why don't I pull it from that site
and put it into Commons?"
And that's what his bot does.
So, you can actually take
a look at his bot
and modify it for your own purposes,
but that is also another alternative
that doesn't require you
to do some spreadsheet work there.
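The P4765 route amounts to emitting one statement per item. A minimal sketch, with a hypothetical item and URL; a real import would usually also carry qualifiers like title and license, which are omitted here:

```python
def commons_image_hint(qid, image_url):
    """QuickStatement setting P4765 (Commons compatible image available at
    URL), which an image-importing bot can later pick up and upload."""
    return f'{qid}\tP4765\t"{image_url}"'
```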
You might have heard
of the GLAM-Wiki Toolset;
it's effectively end
of life at this point.
It hasn't been updated, and even the folks
who have been working with it in the past
have said Pattypan
is probably your best bet.
Has anyone used GWT these days?
A few of you, a little bit.
It's just not being further developed,
and it's not compatible with a lot
of our authentication protocols
that we have now.
Okay. So, right now, we have basic
metadata added to Wikidata,
with pretty good results from the Met,
and we have a Python script here
to also analyze that.
You're welcome to steal
some of that code, as well.
So, this is what we are showing
to the Met folks, now.
We actually have Listeria lists
that are running
to show all the inventory
and all the information
that we have in Wikidata.
And I'll show you very quickly
about a project that we ran to show folks.
So, what are the benefits of adding
your collections to Wikidata?
One is to use AI image classification
to actually help train
a machine learning model
with all the Met's images and keywords,
and let that be an engine for other folks
to recognize content.
So, this is a hack-a-thon that we had
with MIT and Microsoft last year.
The way this works, is we have
the paintings from the Met,
and we have the keywords
that they actually paid a crew
for six months to work on
to add hand keyword tags
to all the artworks.
We ingested that
into an AI system right here,
and then, what we did was say,
"Let's feed in new images that
this AI ML system had never seen before,
and see what comes out."
And the problem is that it comes out
with pretty good results,
but it's maybe only 60% accurate.
And for most folks,
60% accurate is garbage.
How do I get the 60% good
out of this pile of stuff?
The good news is that our community
knows how to do that.
We can actually feed this
into a Wikidata game
and get the good stuff out of that.
That's basically what we did.
So, this is the Wikidata game--
you'll notice this is
Magnus' interface right there--
being played at the Met Museum,
in the lobby.
We actually had folks at a cocktail party
drinking champagne
and hitting buttons on the screen.
Hopefully, accurately. (chuckles)
(applause)
We had journalists, curators,
we had some board members
from the Met there as well.
And this was great.
No log in, whatever.
(lowers voice) We created
an account just for this.
So, they just hit yes-no-yes-no.
This is great.
You saw this, it said,
"Is there a tree in this picture?"
You don't have to train anyone on this.
You just hit yes--
depicts a tree, not depicted.
I even had my eight-year-old boys
play this game with a finger tap.
And we also created a little tool
that showed all the depictions going by
so people could see them.
It basically is like--
how do you sift good from bad?
This is where the Wikimedia
community comes in,
doing what no other entity could ever do.
So, in that first few months
that we had this,
over 7,000 judgments,
resulting in about 5,000 edits.
We did really well on tree,
boat, flower, horse,
things that are in landscape paintings.
But when you go to things
like determining gender,
and cats and dogs, not so good, I know.
Because there's so many different
types of cats and dogs
in different positions.
But horses, a lot easier
than cats and dogs.
But also, I should note
that Wikimedia Foundation
is now looking into doing
image recognition on Commons uploads
to do these suggestions as well,
which is an awesome development.
Okay, so, dashboards.
Let's just show you
some of these dashboards.
Folks you work with love dashboards.
They just want to see stats.
So, we have them, like BaGLAMa.
We have InteGraality.
Is JeanFred here?
I think this is a very new thing
relative to last WikidataCon.
We actually have a tool
which will create
this property completeness
chart right here.
So, it's called InteGraality,
with two A's.
It's on that big chart
that I showed you before.
And it can just autogenerate
how complete your items are
in any set, which is really cool.
So, we can see that paintings
are by far the highest,
we have sculptures, drawings, photographs.
And then, they also like to see
what are the most popular artworks
in the Wikisphere?
So, just looking at the site links
in Wikidata--
you can see and rank
all these different artworks there.
Also, another thing they like to see
is what are the most frequent creators
of Met artworks--
what are the most commonly
depicted things.
So, these are very easy
to generate in SPARQL,
you could look at it right there,
using bubble graphs.
Then place of birth
of the most prominent artists,
we have a chart there, as well.
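A dashboard query in that spirit--the most frequently depicted things across the Met's collection, rendered as a bubble chart--might look like this. It's a sketch consistent with the description above, not the exact query from the slides:

```python
# Most-depicted subjects in Met artworks (Q160236 = the Met, P180 = depicts).
MOST_DEPICTED_QUERY = """
#defaultView:BubbleChart
SELECT ?depicted ?depictedLabel (COUNT(?work) AS ?count) WHERE {
  ?work wdt:P195 wd:Q160236 ;      # work held in the Met's collection
        wdt:P180 ?depicted .       # ...that depicts something
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?depicted ?depictedLabel
ORDER BY DESC(?count)
LIMIT 50
"""
```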
So, structured data on Commons.
I just want to show you very briefly
in case you can't get to Sandra's session,
but you definitely should go
to Sandra's session.
You actually can search in Commons
for a specific Wikibase statement.
I don't always remember the syntax,
but you have to burn it in your brain:
it's haswbstatement:P1343=
whatever-- basically, the last
two parts of the triple.
I always get haswb and wbhas mixed up.
I always get the colon
and the equals mixed up.
So just do it once, remember it,
and you'll get the hang of it.
But simple searches are much faster
than SPARQL queries.
So, if you can just look
for one statement,
boom, you'll get the results.
So, with things like this, you can search
symbolically or semantically--
for things that depict
the Met museum, for example.
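The search keyword is easy to build once you remember the order. For example, assuming Q160236 is the Met's Wikidata item, files whose depicts (P180) statement points at the Met would be found like this:

```python
def haswbstatement(prop, value_qid):
    """Commons structured-data search keyword: the property and value,
    i.e. the last two parts of the triple, joined as prop=value."""
    return f"haswbstatement:{prop}={value_qid}"
```

So `haswbstatement("P180", "Q160236")` gives `haswbstatement:P180=Q160236`, which you paste straight into the Commons search box.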
So, finally, community campaigns.
Richard has been a pioneer in this area.
So, once you have the Wikidata items,
they can actually assist
in creating Wikipedia articles.
So, Richard, why don't you tell us
a little bit about the Mbabel tool
that you created for this.
(Richard) Hi, can I get this on?
(Andrew) Oh, use [Joisey's].
(Richard) It's on, now. I'm good.
So, we had all this information
on Wikidata.
[inaudible] browsing data
on our evenings and weekends
to learn about art-- not everyone does.
We have quite a bit more people
[inaudible] Wikipedia,
so how do we get this information
from Wikidata to Wikipedia?
One of the ways of doing this
is this so-called Mbabel tool,
which was developed with the help
of a lot of people in [inaudible].
People like Martin and others.
So, basically to take
some basic art information,
and use it to populate
a Wikipedia article.
So: who created this work,
who the artist was,
when it was created, et cetera.
The nice thing about this
is it can generate articles for these works.
We started with English Wikipedia,
but it's been developed
in other languages.
So, Portuguese Wikipedia--
our Brazilian friends
have done a lot of work, taking it
to realms beyond art,
to stuff like elections
and political work as well.
And the nice thing about this
is we can query on Wikidata--
so different artists-- so for example,
we've done projects with Women in Red,
looking at women artists.
Projects related to Wiki Loves Pride,
looking at LGBT-identified artists,
African Diaspora Artists,
and a lot of different groups,
time periods,
different collections,
and also looking at articles
that have and haven't been
translated to different languages.
So all of the articles that haven't
been translated to Arabic yet.
You need to find some interesting articles
maybe that are relevant to a culture
that haven't been translated
into that language yet.
We actually have a number of works
in the Met collection
that are in Wikipedias
that aren't in English yet,
because it's a global collection.
So, there are a lot of ways--
and hopefully, we can spread them around--
of creating Wikipedia content
that is driven by these Wikidata items,
and that also maybe
can help drive improvements
to Wikidata items in the future.
(Andrew) And there's a number of folks
here using Mbabel already, right?
Who's using Mbabel
in the room? Brazilians?
And also, if [Armin] is here,
we have our winner
of the Wikipedia Asia Month,
and Wiki Loves Pride contest.
So, thank you for joining,
and congratulations.
We'll have another Wiki Asia Month
campaign in November.
The way I like to describe it
[inaudible]
It doesn't give you a blank page.
It gives you the skeleton,
which is really a much better
user experience
for edit-a-thons and beginners.
So, it's a lot of great work
that Richard has done,
and people are building on it,
which is awesome.
(woman 3) [inaudible] for some of them,
which is really nice.
Yeah, exactly.
(woman 3) [inaudible]
Right. We should have put a URL here.
(man 8) [inaudible]
Oh, that's right.
We have the link right here.
So if you click-- this is a Listeria list,
it's autogenerating all that for you.
And then, you click on the red link,
it'll create the skeleton,
which is pretty cool.
Alright, we're on the final stretch here.
The tool that we're going
to be announcing--
well, we announced it a few weeks ago,
but only to a small set of folks,
and we're making a big splash here--
is the depiction tool
that we just created.
Wikipedia has shown that volunteer
contributors can add a lot of these things
that museums can't.
So, what if we created a tool
that could let you enrich
the metadata about artworks
in terms of the depiction information?
And what we did was we applied
for a grant from the Knight Foundation,
and we created this tool--
and is Edward here?
Edward is our wonderful developer
who, after we gave him a specification,
said in like a month,
"Okay, here's a prototype."
And it's pretty cool.
- So what we can do--
- (applause)
Thanks, Edward.
We're working within collections of items.
So, what we do, is we can
bring up a page like this.
It's no longer looking
at a Wikidata item with a tiny picture.
If we're working with what's depicted
in the image, we want the picture big.
And we don't really have tools
that work with big images.
We have tools that deal
with lexical data and typing.
So one of the big things that Edward did
was made a big version of the picture,
scrape whatever you can
from the object page
from a GLAM organization,
give you context.
I can see dogs, children, wigwam.
These are things that direct the user
to add meaningful information.
You have some metadata
that's scraped from the site, too.
Teepee, Comanche--
oh, it's Comanche, not Navajo,
because I know the object page said that.
And you can actually start typing
in the field, there.
And the cool thing is that
it gives you context.
It doesn't just match anything
to Wikidata,
it first matches things that have already
been used in other depiction statements.
Very simple thing,
but what a godsend it is
for folks who have tried this in the past.
Don't give me everything
that matches teepee.
Show me what other paintings
have used teepee in the past.
So, it's interactive, context-driven,
statistics-driven,
by showing you what is matched before.
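That "show me what's been used before" ranking can be sketched in a few lines. This is not the tool's actual code, just an illustration: the QIDs and usage counts below are placeholders, and in practice the counts would come from querying existing depicts (P180) statements on Wikidata.

```python
# Sketch: rank candidate Wikidata items for a label by how often each
# has already appeared in depicts (P180) statements, so previously used
# items surface first. Placeholder QIDs and counts; real counts would
# come from the Wikidata Query Service.

def rank_candidates(label, candidates, depicts_usage):
    """Return candidate QIDs for `label`, most-used-in-P180 first.

    candidates: {qid: item_label} for items whose label matches `label`
    depicts_usage: {qid: number of existing P180 statements using it}
    """
    return sorted(
        candidates,
        key=lambda qid: depicts_usage.get(qid, 0),
        reverse=True,
    )

# Hypothetical matches for "tipi": the item already used in other
# paintings' depicts statements ranks ahead of the unused one.
candidates = {"Q11111": "tipi", "Q22222": "Tipi (film)"}
usage = {"Q11111": 42}
print(rank_candidates("tipi", candidates, usage))  # ['Q11111', 'Q22222']
```

The point is the sort key: context from prior community edits, not raw string matching against all of Wikidata.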
And the cool thing is once you're done
with that painting,
you can start to work in other areas.
You want to work within the same artist,
the collection, location,
other criteria here.
And you can even browse
through the collections
of different organizations,
just work on their paintings.
So, we wanted people
to not live in Wikidata--
kind of onesy-twosies with items,
but live in a space
where you're looking at artworks
in collections that make sense.
And then, you can actually
look through it visually.
It kind of looks like Crotos
or these other tools,
but you can actually live edit
on Wikidata at the same time.
So, go ahead and try it out.
We've only had 14 users,
but we've had 2,100 paintings worked on,
with 5,000 plus depict statements.
That's pretty good for 14.
So, multiply that by 10--
imagine how many more things
we could do with that.
So, you can go ahead and go
to art.wikidata.link and try out the tool.
It uses OAuth authentication,
and you're off to the races.
And it should be very natural
without any kind of training
to add depiction statements to artworks.
But you can put any object.
We don't restrict the object right now.
So, you could put any Q number
to edit this content if you want.
But we primarily stick with paintings
and 2D artworks, right now.
Okay. You can actually look
at the recent changes
and see who's made edits recently to that.
Okay? Okay, so we're going
to wind it down.
Ooh, one minute, then we'll do some Q&A.
So, the final thing that I think
is useful for museum types especially,
is there's a very famous author
named Nina Simon in the museum world,
who likes to talk about
how do we go from users,
or I guess your audience,
contributing stuff to your collections
to collaborating around content,
to actually being co-creative
and creating new things.
And that's always been tough.
And I'd like to argue that Wikidata
is this co-creative level.
So, it's not just uploading
a file to Commons,
which is contributing something.
It's not just editing an article
with someone else, which is collaborative.
But we are now seeing these tools
that let you make timelines,
and graphs, and bubble charts.
And this is actually the co-creative part
that's really interesting.
And that's what Wikidata provides you.
Because suddenly,
it's not language dependent--
we've got this database
that's got this rich information in it.
So, it's not just pictures, not just text,
but it's all this rich multimedia
that we have the opportunity to work on.
So, this is just another example
of this connected graph
that you can take a look at later on
to show another example
of The Death of Socrates,
and the different themes
around that painting.
And it's really easy
to make this graph yourself.
So again, another scary graphic
that only makes sense
for Wikidata folks, like you.
You just give it a list of Wikidata items,
and it'll do the rest, that's it.
You just give it the list.
Keep all this code the same.
So, fortunately, Martin and Lucas
helped do all this code here.
Just give it a list of items
and the magic will happen.
Hopefully, it won't blow up your computer,
because you're putting in
a reasonable number of items there.
But as long as you have the screen space,
it'll draw the graph,
which is pretty darn cool.
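The "just give it a list of items" behavior can be sketched as filtering each item's statements down to the edges whose targets are also in your list. This is a hypothetical sketch, not the actual code behind the graph tool; the item IDs and statements are made up, and the real tool fetches statements live from Wikidata before drawing.

```python
# Sketch: given a list of items and each item's statements, keep only
# the edges that stay inside the list, yielding the connected graph
# among just those items. All IDs below are placeholders.

def edges_within(items, statements):
    """Return (source, property, target) edges between listed items."""
    item_set = set(items)
    edges = []
    for qid in items:
        for prop, target in statements.get(qid, []):
            if target in item_set:
                edges.append((qid, prop, target))
    return edges

# Hypothetical data shaped like The Death of Socrates example:
# a painting linking to its creator (P170) and what it depicts (P180).
statements = {
    "Qpainting": [("P170", "Qartist"), ("P180", "Qsocrates")],
    "Qartist": [],
    "Qsocrates": [],
}
print(edges_within(["Qpainting", "Qartist", "Qsocrates"], statements))
```

Everything else, layout and drawing, is the part the tool keeps the same for you.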
And then, finally, two tools--
I realized at 2 a.m. last night
a few people said,
"I didn't know about these tools."
And you should know about these tools.
So, one is Recoin, which shows you
the relative completeness of an item
compared to other items
of the same type.
And then, Cradle, which gives you
a forms-based way
to create content.
So, these are very useful for edit-a-thons
where if you know that
you're working with just artworks,
don't just let people create items
with a blank screen.
Give them a form to fill out
to start entering in information
that's structured.
And then, finally, we've gone
through some of this, already.
This is my big chart that I love
to get people's feedback on.
How do we get people
across the chasm to be in this space?
We have a lot of folks who, now,
can do template coding,
spreadsheets, QuickStatements,
SPARQL queries, and then we got--
how do we get people to this side
where we have Python
and the things that can do more
sophisticated editing.
It's really hard
to get people across this.
But I would say that
while it's hard to get people across,
the content and the technology
are not that hard.
We actually need more people
to learn about regular expressions.
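As a taste of what regular expressions buy you here, consider a small, hypothetical example: parsing free-text museum date fields, the kind of cleanup that comes up constantly when preparing collection data for Wikidata.

```python
import re

# Museum date fields are free text ("ca. 1850", "1850-1900"); a short
# pattern pulls out the years for structured statements. The pattern
# and field values are illustrative, not from any particular dataset.

DATE_RE = re.compile(r"(\d{4})(?:\s*[-]\s*(\d{4}))?")

def extract_years(date_text):
    """Return (start_year, end_year) parsed from a free-text date field."""
    m = DATE_RE.search(date_text)
    if not m:
        return None
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else start
    return start, end

print(extract_years("ca. 1850-1900"))  # (1850, 1900)
print(extract_years("painted 1889"))   # (1889, 1889)
```

Ten minutes with a pattern like this replaces hours of hand-editing a spreadsheet.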
And once you get some kind
of experience here,
you'll find that this is a wonderful world
that you can learn a lot in,
but it does take some time
to get across this chasm.
Yes, James.
(James) [inaudible]
No, what it means is that the graph
is not necessarily accurate
in terms of its data points.
But what it means-- I guess
it's more like this is a valley.
It's like we need to get people
across this valley here.
(woman 4) [inaudible]
I would say this is the key.
If we can get people who know this stuff,
but can grok this stuff,
it gets them to this stuff.
Does that make sense? Yeah.
So, my vision for the next few years,
we can get better training
in our community to get people
from batch processing,
which is pretty much what this is,
to kind of intelligent--
I wouldn't say intelligent,
but more sophisticated programming,
that would be a great thing,
because we're seeing this is a bottleneck
to a lot of the stuff
that I just showed you up there.
Yes.
(man 9) [inaudible]
Okay, wait, you want to show me something,
show me after the session, does that work?
Okay. Yes, Megan.
- (Megan) Can I have a microphone?
- Microphone, yes.
- (Megan) [inaudible]
- Yeah.
And we have lunch after this,
so if you want to stay
a little bit later, that's fine, too.
- [inaudible]
- We're already at lunch break? Okay.
(Megan) So, thank you so much
to both you and Richard
for all the work you're doing at the Met.
And I know that you're
very well supported in that.
(mic feedback)
I don't know what happened there.
For the average volunteer community,
how do you balance doing the work
for the cultural heritage organization
versus training the professionals
that are there to do that work?
Where do you find the balance
in terms of labor?
It's a good question.
(Megan) One that really comes up,
I think, with this as well.
- With this?
- (Megan) Yeah, and with building out...
where we put efforts in terms
of building out competencies.
Yeah. I don't have a great answer for you,
but it's a great question.
(Megan) Cool.
(Richard) There are a lot
of tech people at [inaudible]
who understand this side of the graph,
and don't understand it--
the people in [inaudible]
who understand this part of the graph,
and don't understand
this part of the graph.
So, the more we can get Wikimedians
who understand some of this,
with some tech professionals at museums
who understand this,
then that makes it a little bit easier--
and hopefully, as well as
training up Wikimedians,
we can also provide some guidance
and let the museums [inaudible]
to take care of themselves
in the [inaudible].
Yeah, that's a good point.
How many people here know
what regular expressions are?
Raise your hand.
Okay, so how many people are comfortable
specifying a regular expression?
So, yeah, we need more work here.
(laughter)
(man 10) I want to suggest that--
maybe not getting
every Wikidata practitioner,
or institution practitioner
to embrace Python programming is the way.
But as Richard just said, finding more
bridging people-- people like you--
who speak both--
who speak Python,
but also speak GLAM institution--
to help the GLAM's own
technical department, which may not--
they know Python,
they don't know this stuff.
That's, I think, what's needed.
People like you, people like me,
people who speak both of these jargons
to help make the connections,
to document the connections.
You're already doing this, of course.
You share your code, et cetera,
you're doing tutorials.
But we need more of this.
I'm not sure we need
to make everyone programmers.
We already have programmers.
We need to make them understand
the non-programming
material they need to--
I think that's a great point.
We don't need to make everyone
highly proficient in this,
but we do need people
knowledgeable to say that,
"Yeah, we can ingest 400 thousand rows
and do something with it."
Whereas, if you're stuck
on this side, you're like,
"400 thousand rows
sounds really big and scary."
But if you know that it's possible,
you're like, "No problem."
400 thousand is not a problem.
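That "400 thousand rows is not a problem" point can be made concrete: in Python you stream a CSV row by row instead of loading the whole file, so size stops being scary. The column names and license values here are hypothetical stand-ins for a collection export.

```python
import csv
import io

# Stream a collection export row by row; memory use stays flat no
# matter how many rows the file has. Column names are hypothetical.

def count_public_domain(csv_file):
    """Count rows whose (hypothetical) license column is CC0."""
    reader = csv.DictReader(csv_file)
    return sum(1 for row in reader if row.get("license") == "CC0")

sample = io.StringIO(
    "object_id,title,license\n"
    "1,Teapot,CC0\n"
    "2,Portrait,In Copyright\n"
    "3,Vase,CC0\n"
)
print(count_public_domain(sample))  # 2
```

The same loop works unchanged on a 400,000-row file opened from disk.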
(woman 5) I would just like to chime in
a little bit, in that
there may be countries and areas
where you will not find a GLAM
with any skilled technologists.
So, you will have to invent
something there in the middle.
That's a good point.
Any questions? Sandra.
(Sandra) Yeah, I just wanted
to add to this discussion.
Actually, I've seen some very good cases
where it indeed has been successful
to train GLAM professionals to work
with this entire environment,
and where they've done fantastic jobs,
also at small institutions.
It also requires that you have chapters
or volunteers that can train the staff.
So, it's really like a bigger environment.
But I think that's a model
that if we can manage to make that grow,
it can scale very well, I think.
Good point.
(woman 5) [inaudible]
Sorry, just noting that we don't have
any structured trainings
right now for that.
We might want to develop those,
and that would be helpful.
We have been doing that for education
in terms of teaching people
Wikipedia and Wikidata.
It's just a matter of taking it
one step further.
Right. Stacy.
(Stacy) Well, I'd just like to say
that a lot of professionals
who work in this area of metadata
have all these skills already.
So, I think part of it is just proving
the value to these organizations,
but then it's also tapping
into professional associations who can--
or ways of collaborating within
those professional communities
to build this work, and the documentation
on how to do things
is really, really important,
because I'm not sure about the role
of depending on volunteers,
when some of this work is actually work
GLAM organizations do anyway.
We manage our collections
in a variety of ways through metadata,
and this is actually one more way.
So, should we also not be thinking
about ways to integrate this work
into a GLAM professional's regular job?
And then that way you're generating--
and when you think
about sustainability and scalability,
that's the real trick to making this
both sustainable and scalable,
is that once this is the regular
work of GLAM folks,
we're not worried as much about this part,
because it's just turning
that little switch to get this
to be a part of that work.
Right. Good point. [Shani]?
(Shani) You're absolutely right.
But I want to echo what you said before.
And yes, Susana-- this might work
for more privileged countries
where they have money,
they have people doing it.
It doesn't work for places
that are still developing,
that don't have resources--
they don't have all of that.
And they can barely do
what they need to do.
So, it's difficult for them, and then,
the community is really helpful.
These are the cases where the community
can have a huge impact actually,
working with the GLAMs,
because they can't do it all
as part of their jobs.
So, we need to think about that as well.
And having these examples,
actually, is hugely important,
because it's helping
to still convince them,
that it's critical to invest in it
and to work with volunteers,
so, with non-professionals
of sorts, to get there.
I can imagine a future where
you don't have to know all this code.
These would just be
kind of like Lego bricks
you can slap together,
saying, "Here's my database.
Here's the crosswalk. Here's Wikidata,"
and just put it together,
and you don't have to even code,
you just have to make sure
the databases are in the right place.
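A minimal sketch of one of those Lego bricks, assuming a hypothetical local record layout: the crosswalk is often just a plain mapping from museum field names to Wikidata property IDs. The field names and record are made up; P170 (creator) and P571 (inception) are real properties.

```python
# A crosswalk as data, not code: map a museum database's column names
# to Wikidata property IDs, then translate one record. Field names and
# the record are hypothetical.

CROSSWALK = {
    "artist": "P170",        # creator
    "date_created": "P571",  # inception
}

def to_statements(record, crosswalk):
    """Translate a local record into (property, value) pairs."""
    return [
        (prop, record[field])
        for field, prop in crosswalk.items()
        if record.get(field)
    ]

record = {"artist": "Jacques-Louis David", "date_created": "1787"}
print(to_statements(record, CROSSWALK))
```

Snapping bricks together then means swapping the crosswalk dict, not rewriting the pipeline.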
Yep. Okay.
(man 11) Sorry. [inaudible]
I think if I would have done this project,
I'd probably have done it the same way.
So, I think that's maybe a good sign.
I was wondering how did
the whole financing work of this project?
How did the-- I'm sorry?
The financing of this project work.
- The financing?
- Yeah, the money.
That's a good question.
Well, so, there are different parts of it.
So, the Knight grant funded
the Wiki Art Depiction Explorer.
But I, for the last, maybe what--
nine months--
I've been their Wikimedia strategist.
So, I've been on
since February of this year.
So, pretty much, they're paying
for my time to help with their--
not only the upload of their collections,
but developing these tools, as well.
- (Richard) So the Met's paying you?
- Yeah, that's right.
(Richard) The grant, at least part
of it has come from--
There was a grant for Open Access.
And this is under that campaign
and with the digital department.
So, working as contractors throughout
the Open Access campaign for the Met.
(man 12) I'm sorry.
I guess before you were hired,
and before there was a grant,
there was probably a lot
of volunteer work done to make sure--
Richard did a lot of work before that.
And then, Wikimedia New York
did a lot of work,
but it was kind of in bursts.
It wasn't as comprehensive
as we're talking about now
in terms of having-- making sure
those two layers are complete
in Wikidata.
Alright, yeah. I think that's it.
So, I'm happy to talk after lunch,
or after the break, if you want.
Okay. Thank you.
(applause)