-
Thanks folks.
-
As I mentioned before,
you can load up the slides here
-
by either the QR code or the short URL,
which is bit.ly/wikidatacon19glamstrategies.
-
And the slides are also
on the program page
-
on the WikidataCon site.
-
And then, there's also an Etherpad here
that you can click on.
-
So, I'll be talking about a lot of things
-
that you might have heard about
at Wikimania, if you were there,
-
but we are going to go
into a lot more implementation details.
-
Because we're at WikidataCon,
we can dive deeper
-
into the Wikidata and technical aspects.
-
But Richard and I are working
at the Met Museum right now
-
on their Open Access program.
-
If you didn't know,
about two plus years ago,
-
entering its third year,
-
there's been an Open Access
strategy at the Met,
-
where they're releasing their images
and metadata under a CC0 license.
-
And one of the things
they brought us on to do
-
is to imagine what we could do
with this Open Access content.
-
So, we're going to talk
a little bit about that
-
in terms of the experiments
that we've been running,
-
and we'd love to hear your feedback.
-
So, I hope to talk for about 20 minutes,
and then get some conversation going
-
with you folks, since we have
a lot of knowledge in this room.
-
This is the announcement,
and actually the one-year anniversary,
-
where Katherine Maher was there,
at the Met, to talk about that anniversary.
-
So, one of the things that's challenging
I think for a lot of folks
-
is how do you explain Wikidata,
-
and this GLAM
contribution strategy to Wikidata
-
to C-level folks at an organization.
-
We can talk about it with data scientists,
Wikimedians, librarians, maybe curators,
-
but when it comes to talking about this
with a director of a museum,
-
or a director of a library,
what does it actually--
-
how does it resonate with them?
-
So, one way we've talked about it
that I think makes sense
-
is everyone knows about Wikipedia,
-
and for the English language edition,
-
at least, we're talking
about 6 million articles.
-
And it sounds like a lot,
but if you think about it,
-
Wikipedia is not really the sum
of all human knowledge,
-
it's the sum of all reliably sourced,
mostly western knowledge.
-
And there's a lot of stuff out there.
-
We have a lot of stuff
in Commons already--
-
56 million media files, and going up
every single day--
-
but there's a very
different type of standard
-
for what goes into Wikimedia Commons.
-
And the way that we have described
Wikidata to GLAM professionals,
-
and especially the C levels,
-
is: what if we could have a repository
that has a notability bar
-
that is not as high as Wikipedia's?
-
So, we want all these paintings,
-
but not every painting
necessarily needs an article.
-
Wikipedia is held back by the fact
-
that you need to have
language editions of Wikipedia.
-
So, can we store the famous thing--
things, not strings.
-
Can we be object-oriented
and not lexically oriented?
-
And can we store this in a database
-
that stores facts, figures,
and relationships?
-
And that's pretty much
what Wikidata does.
-
And Wikidata is also a universal
kind of crosswalk database that links
-
to other collections out there.
-
So, we think this really resonates
with folks when you're talking about
-
what is the value of Wikidata compared
to what they're normally familiar with,
-
which is just Wikipedia.
-
Alright, so what are the benefits?
-
You're interlinking
your collections with others.
-
So, unfortunately-- I apologize
to the librarians here--
-
I'll be talking mostly about museums,
-
but a lot of this is also valid
for libraries.
-
But you're basically connecting
your collection with the global collection
-
of linked open data collections.
-
You can also receive enriched
and improved metadata back
-
after contributing and linking
your collections to the world.
-
And there are some pretty neat
interactive multimedia applications
-
that you get-- I don't want
to say for free,
-
but your collection in Wikidata
allows you to visualize things
-
that you've never seen before.
-
We'll show you some examples.
-
And so, how do you convey this
to GLAM professionals effectively?
-
Well, I usually like to start
with storytelling,
-
and not technical explanations.
-
Okay, so if everyone here
has a cell phone,
-
especially if you have an iPhone,
I want you to scan this QR code
-
and bring up the URL
that it comes up with.
-
Or if you don't have a QR scanner,
-
just type in w.wiki/Aij in a web browser.
-
So go ahead and scan that.
-
And what comes up?
-
Does anyone see a knowledge graph
pop up on your screen?
-
So, for folks here in WikidataCon,
-
this is probably not
revolutionary for you.
-
But what it does is run a SPARQL query
with these objects,
-
and it shows the linkages between them.
-
And you can actually drag them
around the screen.
-
You can actually click on nodes.
-
If you're [inaudible] in a mobile,
it will expand that--
-
you can actually start to surf
through Wikidata this way.
-
So, for Wikidata veterans
this is pretty cool.
-
One shot, you get this.
-
For a lot of folks who have never seen
Wikidata before,
-
this is a revolutionary moment for them.
-
To actually hand-manipulate
a knowledge graph,
-
and to start surfing through Wikidata
without having to know SPARQL,
-
without having to know what a Q item is,
-
without having to know
what a property proposal is,
-
they can suddenly start seeing
connections in a way that is magical.
-
Hey, I see [Jacob's] here.
-
Jacob's been using
some of this code, as well.
-
So, this is some code
that we'll talk about later on
-
that allows you to create
these visualizations in Wikidata.
-
And we've really seen this
turn the heads of a lot of folks
-
who have never really
gotten Wikidata before.
-
But after seeing these interactive
knowledge graphs, they get it.
-
They understand the power of this.
-
And especially this example here,
-
this was a really big eye-opener
for the folks at the Met,
-
because this is the artifact
that is the center of this graph,
-
right there, the Portrait of Madame X,
a very famous portrait.
-
And they did not even know
that this was the inspiration
-
for the black dress that Rita Hayworth
wore in the movie Gilda.
-
So, just by seeing this graph, they said,
-
"Wait a minute. This is one
of our most visited portraits.
-
I didn't know that this was true."
-
And there are actually two other books
published about that painting.
-
You can see all these things,
not just within the realm of GLAM,
-
but it extends to fashion,
it extends to literature.
-
You're starting to see
the global connections
-
that your artworks have,
or your collections have via Wikidata.
-
So, how do we do this?
-
If you can remember nothing else
from this presentation,
-
this one page is your one-stop shopping.
-
Now, fortunately, you don't have
to memorize all this.
-
It's actually right here at
Wikidata:Linked_open_data_workflow.
-
So, we'll be talking about some
of these different phases
-
of how you first prepare,
reconcile, and examine
-
what the GLAM organization might have
and what Wikidata has.
-
And then, what are the tools
-
to actually ingest
and correct or enrich that
-
once it's in Wikidata.
-
And then, what are some of ways
to reuse that content,
-
or to report and create
new things out of it.
-
So, this is the simpler version of a chart
that Sandra and the GLAM folks
-
at the foundation have created.
-
But this is trying
to sum up, in one shot--
-
because we know how hard things
are to find in Wikidata--
-
all the different
tools you should pay attention to
-
as a GLAM organization.
-
So, just using the Met as an example,
we started with what is the ideal object
-
that we have in Wikidata
that comes from the Met?
-
This is a typical shot of a Wikidata item,
-
in the mobile mode there.
-
And this is one
of the more famous paintings
-
we used as a model, here.
-
We have the label,
description, and aliases.
-
And then, we found out,
-
"What are the core statements
that we wanted?"
-
We wanted instance of, image,
inception, collection.
-
And what are some other properties
we would like to have, if available?
-
Depiction information,
material used, things like that.
-
We actually do have an identifier.
-
The Met object ID is P3634.
-
So, for some organizations,
you might want to propose
-
a property just to track your items
using an object ID.
-
And then, for the Met,
just trying to circumscribe
-
what objects do we want to upload
and keep in Wikidata--
-
the first things we identified
were the collection highlights.
-
These are a hand-selected set
of 1,500 to 2,000 items
-
that were going to be given priority
to upload to Wikidata.
-
So, Richard and the crew
from Wikimedia New York
-
did a lot of this early work.
-
And then, now, we're systematically
going through to make sure
-
they're all complete.
-
And there's a secondary set
-
called the Heilbrunn Timeline
of Art History-- about 8,000 items
-
that are seminal pieces of work,
artists' works throughout history.
-
And there are about 8,000
that the Met has identified,
-
and we're putting those
on Wikidata, as well,
-
using a different designation.
-
Here, described by source--
Heilbrunn Timeline of Art History.
-
So, the collection highlight
is denoted here as collection--
-
Metropolitan Museum of Art,
-
subject has role collection highlight.
-
And then, these 8,000
or so are like that in Wikidata.
-
I couldn't show this chart at Wikimania,
because it's too complicated.
-
But at WikidataCon, we can.
-
So, this is something that is really hard
to answer sometimes.
-
What makes something
in Wikidata from the Met,
-
or from the New York Public Library,
or from your organization?
-
And the answer is not easy.
It's: it depends.
-
It's complicated, it can be multi-factor.
-
So, you could say, "Well, if I had
an object ID in Wikidata,
-
that is an embed object."
-
But maybe someone didn't enter that.
-
Maybe they only put in
collection: Met, which is P195,
-
or they put in the accession number,
-
and they put collection as the qualifier
to that accession number.
-
So, there's actually, one, two, three
different ways to try to find Met objects.
-
And probably the best way to do it
is through a union like this.
-
So, you combine all three,
and you come back,
-
and you make a list out of it.
-
So unfortunately, there is
no one clean query
-
that'll guarantee you all the Met objects.
-
This is probably
the best approach for this.
-
And for some institutions,
-
they're probably doing
something similar to that right now.
-
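For reference, here is a minimal Python sketch of that union, run against the Wikidata Query Service. The Met item (Q160236) and the properties (P3634 Met object ID, P195 collection, P217 inventory number for the accession-number case) reflect the modeling described above, but treat the exact query shape as illustrative rather than the one on the slide.

import requests

MET_UNION_QUERY = """
SELECT DISTINCT ?item WHERE {
  { ?item wdt:P3634 ?metObjectId . }              # has a Met object ID (P3634)
  UNION
  { ?item wdt:P195 wd:Q160236 . }                 # collection (P195): Metropolitan Museum of Art
  UNION
  { ?item p:P217 ?stmt .                          # accession / inventory number (P217)...
    ?stmt pq:P195 wd:Q160236 . }                  # ...qualified with collection: Met
}
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": MET_UNION_QUERY, "format": "json"},
    headers={"User-Agent": "met-union-example/0.1"},
)
met_items = {row["item"]["value"].rsplit("/", 1)[-1]
             for row in response.json()["results"]["bindings"]}
print(len(met_items), "candidate Met items")
-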
Alright, so the example here
is that what you see here
-
manifests itself differently--
not differently, but as this in a query,
-
which can get pretty complex.
-
So, if we're looking
for all the collection highlights,
-
we'd break this out into the statement
and then the qualifier as this:
-
subject has role collection highlight.
-
So, that's one way that we sort out
-
some of these special
designations in Wikidata.
-
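And, roughly, the qualifier pattern just described, written out as a query string. The "collection highlight" item ID is left as a placeholder, since it isn't given here; P2868 is "subject has role".

# Sketch of the collection-highlight pattern: the statement
# collection (P195) = Met (Q160236) carries the qualifier
# subject has role (P2868) = collection highlight.
HIGHLIGHTS_QUERY = """
SELECT ?item WHERE {
  ?item p:P195 ?collectionStmt .
  ?collectionStmt ps:P195 wd:Q160236 ;    # collection: Metropolitan Museum of Art
                  pq:P2868 wd:QXXXXXX .   # subject has role: collection highlight (placeholder Q ID)
}
"""
# Send it to https://query.wikidata.org/sparql the same way as the union query above.
-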
So, the summary is,
representing "The Met" is multifaceted,
-
and needs to balance simplicity
and findability.
-
How many people here have heard
of Sum of All Paintings as a project?
-
Ooh, God, good, a lot of you!
-
So, it's probably one
of the most active ones
-
that deals with these issues.
-
So, we always debate whether we should
model things super-accurately,
-
or should you model things
so that they're findable.
-
These are kind of at odds with each other.
-
So, we usually prefer findability.
-
It's no good if it's perfectly modeled,
but no one can ever find it,
-
because it's so strict
in terms of how it's defined in Wikidata.
-
And then, we have some challenges.
-
Multiple artifacts might be tied
to one object ID,
-
which might be different in Wikidata.
-
And then, mapping the Met classification
to instances has some complex cases.
-
So, the way that the Met classifies things
-
doesn't always fit
with how Wikidata classifies things.
-
So, we'll show you some examples here
of how this works.
-
So, this is a great example
of using a Python library
-
to actually ingest
what we know from the Met,
-
and then try to sort out what they have.
-
So, this is just for textiles.
-
You can see that they got
a lot of detail here
-
in terms of woven textiles, laces,
printed, trimmings, velvets.
-
We first looked into this in Wikidata.
-
We did not have
this level of detail in Wikidata.
-
We still don't have all this resolved.
-
You can see that this
is really complex here.
-
Anonymous is just not anonymous
for a lot of databases.
-
There's a lot of qualifications--
-
whether the nationality, or the century.
-
So, trying to map all this to Wikidata
can be complex, as well.
-
And then, this shows you
that of all the works in the Met,
-
about 46% are open access right now.
-
So, we still have just over 50%
that are not CC0 yet.
-
(man) All the objects in the Met,
or all objects on display?
-
(Andrew) It's weird. It's not on display.
-
But it's not all objects either.
-
It's about 400 to 500 thousand objects
in their database at this point.
-
So, somewhere in between.
-
So, starting points.
This is always a hard one.
-
We just had this discussion
on the Facebook group recently
-
about where do people go
-
to find out what the modeling
should look like for a certain thing.
-
It's not easy.
-
So, normally, what we have to do
is just point people to,
-
I don't know, some project
that does it well now?
-
So, it's not a satisfying answer,
-
but we usually tell folks
to start at things like visual arts,
-
or Sum of All Paintings
does it pretty well,
-
or just go to the project chat to find out
where some of these things are.
-
We need better solutions for this.
-
This is just a basic flow
of what we're doing with the Met here.
-
We're basically taking
their CSV, and their API,
-
and we're consuming it
into a Python data frame.
-
We're taking the SPARQL code--
-
the one that you saw
before, this super union--
-
bringing that in, and we're doing
a bi-directional diff,
-
and then seeing what new things
have been added here,
-
what things have been subtracted there,
-
and we're actually making those changes
either through QuickStatements,
-
or we're doing it through Pywikibot.
-
So, directly editing Wikidata.
-
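A rough sketch of that bi-directional diff, assuming the Met's open-access CSV (here "MetObjects.csv" with an "Object ID" column -- names illustrative) and the set of Met object IDs already returned by the SPARQL side:

import pandas as pd

met_df = pd.read_csv("MetObjects.csv", dtype=str)   # the museum's open-access CSV
met_ids = set(met_df["Object ID"])                  # object IDs the museum knows about

wikidata_ids = set()   # fill with P3634 values pulled back from the SPARQL query results

missing_on_wikidata = met_ids - wikidata_ids   # candidates to create or enrich via QuickStatements
unknown_to_the_met = wikidata_ids - met_ids    # items to review (retired IDs, typos, and so on)
-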
So, this is the big slide
I also couldn't show at Wikimania,
-
because it would have flummoxed everyone.
-
So, this is a great example
of how we start with the Met database,
-
we have this crosswalk database,
-
and then we generate
the changes in Wikidata.
-
The way this works is this is an example
of one record from the Met.
-
This is an evening dress-- we've been working
with the Costume Institute recently,
-
the one that puts on the Met Gala.
-
So, we have one evening dress
here, by Valentina.
-
Here's a date, accession number.
-
So, these things can be put
into Wikidata directly.
-
A field equals the date, accession number.
-
But what do we do with things like this?
-
This is an object name, which is basically
like a classification of what it is,
-
like an instance of for the Met.
-
And the designer's Valentina.
-
So, what we do is we take these
and we run all the unique object names
-
and all the unique designers
through OpenRefine.
-
So, we get maybe 60% matches
if we're lucky.
-
We put that into a spreadsheet.
-
Then we ask volunteers
or the curators at the Met
-
to help fill in this crosswalk database.
-
This is just simply Google Sheets.
-
So, we say, here are all the object names,
the unique object names
-
that match lexically exactly
with what's in the Met database,
-
and then you say this maps to this Q ID.
-
So, when we first started this,
maybe like only about--
-
well, 60% were failed matches,
some of these were blank.
-
So, we tapped folks in specific groups.
-
So there's like a Wiki Loves Fashion
little chat group that we have.
-
And folks like user PKM
were super useful in this area.
-
So she spent a lot of time
looking through this, and saying,
-
"Okay, Evening suit is this,
Ewer is that."
-
So, we looked through
and made all these mappings here.
-
And then, what happens is now,
when we see this in the Met database,
-
we look it up in the crosswalk database,
and we say, "Oh, yeah.
-
These are the two Q numbers
we need to put into Wikidata."
-
And then, it generates
the QuickStatement right there.
-
Same thing here with Designer: Valentina.
-
If Valentina matches here,
then it gets generated
-
with that QuickStatement right there.
-
If Valentina does not exist,
then we'll create it.
-
You can see here, Weeks--
look at that high Q ID right there.
-
We just created that recently,
because there was no entry before.
-
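As a concrete illustration of the step just described, here is a minimal sketch (with placeholder Q numbers) of turning a crosswalk hit into QuickStatements v1 lines -- one tab-separated item / property / value triple per statement:

def quickstatements_for(item_qid, object_name, crosswalk):
    """Yield instance-of (P31) lines for every Q ID the crosswalk maps this object name to."""
    for class_qid in crosswalk.get(object_name, []):
        yield f"{item_qid}\tP31\t{class_qid}"

# Placeholder Q IDs: "Evening dress" maps to both a dress class and an evening-wear class,
# mirroring the double instance-of statement described below.
crosswalk = {"Evening dress": ["Q_DRESS", "Q_EVENING_WEAR"]}
for line in quickstatements_for("Q_NEW_ITEM", "Evening dress", crosswalk):
    print(line)
-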
Does that make sense to everyone?
-
- (man 2) What's the extra statement?
- (Andrew) I'm sorry?
-
- (man 2) What's the extra statement?
- (Andrew) Oh, the extra statement.
-
So, believe it or not, we have
an Evening blouse, Evening dress,
-
Evening pants,
Evening ensemble, Evening hat--
-
do we want to make a new Wikidata item
for Evening pants, Evening everything?
-
So, we said, "No."
We probably don't want to.
-
We'll just say, "It's a dress,
but it's also evening wear",
-
which is what that is.
-
So, we're saying an instance
of both things.
-
I'm not sure it's the perfect solution,
but it's a solution at this point.
-
So, does everyone get that?
-
So, this is kind of a crosswalk database
that we maintain here.
-
And the nice thing about it,
it's just Google Sheets.
-
So, we can get people to help
-
that don't need to know
anything about this database,
-
don't need to know about QuickStatements,
don't need to know about queries.
-
They just go in and fill in the Q number.
-
Yeah.
-
(woman) So, when you copy
object name and you find the Q ID,
-
the initial 60%
that you mentioned as an example,
-
is that by exact match?
-
(Andrew) Well, it's through OpenRefine.
-
So, it does its best guess,
and then we verify to make sure
-
that the OpenRefine match makes sense.
-
Yeah.
-
Does that make sense to everyone?
-
So, some folks might be doing
some variation on this,
-
but I think the nice thing about this
is that, by using Google Sheets,
-
we remove a lot of the complexities
of these two areas from this.
-
And we'll show you some code
that does this later on.
-
- (man 3) How do you generate [inaudible]?
- (Andrew) How do you generate this?
-
- (man 3) Yes.
- (Andrew) Python code.
-
I'll show you a line that does this.
-
But you can also go up here.
-
This is the whole Python program
that does this, this, and that,
-
if you want to take a look at that.
-
Yes.
-
(man 4) Did you really use
your own vocabulary,
-
or is there something [inaudible].
-
- (Andrew) This right here?
- (man 4) Yeah.
-
(Andrew) Yeah. So, this
is the Met's own vocabulary.
-
So, most museums use
a system called TMS.
-
It's their collections management system.
-
So, they'll usually--
this is the museum world--
-
they'll usually roll
their own vocabulary for their own needs.
-
Museums are very late
to interoperable metadata.
-
Librarians and archivists have this
kind of as baked into them.
-
Museums are like, "Meh..."
-
Our primary goal
is to put objects on display,
-
and if it plays well with other people,
that's a side benefit.
-
But it's not a primary thing that they do.
-
So, that's why it's complicated
to work with museums.
-
You need to map their vocabulary,
which might be a mish-mash
-
of famous vocabularies,
like Getty AAT, and other things.
-
But usually, it's to serve
their exact needs at their museum.
-
And that's what's challenging.
-
And I see a lot of heads nodding,
-
so you've probably seen this a lot
at these museums.
-
So, I'll move on to show you
how this actually is done.
-
Oh, go ahead.
-
(man 5) How do you
bring people, to collaborate,
-
and put some Q codes into your database?
-
(Andrew) How do you-- I'm sorry?
-
(man 5) How do you bring...
collaborate people?
-
(Andrew) Ah, so for this,
these are projects we just go to,
-
for better or for worse,
like Facebook chat groups that we know,
-
are active in these areas.
-
Like Sum of All Paintings,
Wiki Loves Fashion--
-
which is a group
of maybe five or seven folks.
-
But we need a better way
to get this out to folks
-
so we get more collaborators on this.
-
This doesn't scale well, right now.
-
But for small groups,
it works pretty well.
-
I'm open to ideas.
-
(man 5) [inaudible]
-
(Andrew) Oh yeah. Please come on up.
-
If folks want to come up here,
-
there's a little more room
in the aisle right here.
-
So, we are utilizing Python
for this mostly.
-
If you don't know, there is
a Python notebook system (PAWS)
-
that WMFLabs hosts.
-
So, you can actually go on
and start playing with this.
-
So, it's pretty easy
to generate a lot of stuff
-
if you know some of the code that's there.
-
[inaudible], yeah.
-
(woman 2) Why do you put everything
-
into Wikidata,
and not into your own Wikibase?
-
(Andrew) If you're using
your own Wikibase?
-
(woman 2) Yeah. Why don't you
use your own Wikibase?
-
and then go to [inaudible]
-
(Andrew) That's its own ball of--
-
I don't want to maintain
my own Wikibase at this point. (laughs)
-
If I can avoid doing
the Wikibase maintenance,
-
I would not do it.
-
(man 6) Would you like a Wikibase?
-
(Andrew) We could. It's possible.
-
(man 7) But again,
what they use [inaudible]
-
about 2,000, 8,000, 10,000,
of 400,000 digital [inaudible].
-
So that's only 2.5%,
-
[inaudible]
-
(Andrew) So, I'd say, solve it for 1,500,
then scale up to 150 thousand.
-
So, we're trying to solve it
-
for the best
well-known objects, and then--
-
(man 7) When do you think
that will happen?
-
I understand that those are people
that shouldn't go onto Wikidata.
-
So you go to Commons
or your own Wikibase solution,
-
not to be a [inaudible]--
-
(Andrew) Right. That's why we're going
with the 2,000 and 8,000.
-
We're pretty confident
these are highly notable objects
-
that deserve to be in Wikidata.
-
Beyond that, it's debatable.
-
So, that's why we're not vacuuming up
400 thousand things in one shot.
-
We're starting with notable 2,000,
notable 8,000, then we'll talk after that.
-
So, these are the two lines of code
that do the most stuff here.
-
So, even if you don't know Python,
-
it's actually not that bad
if you look at this.
-
There's a read_csv function.
-
You're taking the crosswalk URL,
-
basically, the URL
of that Google Spreadsheet.
-
You're grabbing the spreadsheet
that's called "Object Name",
-
and you're basically creating
a data structure
-
that has the Object Name and the QID.
-
That's it. That's all you're doing.
-
Just pulling that in to the Python code.
-
Then, you're actually matching
whatever the entity's name is,
-
and then looking up the QID.
-
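A minimal sketch of those two lines, assuming the Google Sheet's "Object Name" tab is published as CSV at the crosswalk URL (the sheet ID and column names below are placeholders -- the real script linked above may differ):

import pandas as pd

crosswalk_url = (
    "https://docs.google.com/spreadsheets/d/<SHEET_ID>/export?format=csv&gid=0"
)
crosswalk_df = pd.read_csv(crosswalk_url)

# Build the lookup: Met object name -> Wikidata Q ID
object_name_to_qid = dict(zip(crosswalk_df["Object Name"], crosswalk_df["QID"]))

# Later, for each Met record, look up the mapped Q ID (None if not yet mapped)
qid = object_name_to_qid.get("Evening dress")
-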
Okay, so, this is just to tell you
that's not super hard.
-
The code is available right there,
if you want to look at it.
-
These two lines of code
take a little while to write
-
when you're starting from scratch,
-
but once you have an example,
-
it's pretty darn easy to plug in
your own data set, your own crosswalk,
-
to generate the QuickStatements.
-
So, I've done a lot of the work already,
-
and I invite you
to steal the code and try it.
-
So, when it comes to images,
it's a little more challenging.
-
So, at this point, Pattypan
is probably your best bet.
-
Pattypan is
a spreadsheet-oriented tool.
-
You fill in the metadata, you point
to the local file on your computer,
-
and it uploads it to Commons
with all that information,
-
or another alternative
is if you set P4765 to a URL--
-
because this is the Commons-compatible
image available at URL,
-
Maarten Dammers has a bot,
at least for paintings,
-
that will just swoop through and say,
"Oh, we don't have this image.
-
Here's a Commons compatible one.
-
Why don't I pull it from that site
and put it into Commons?"
-
And that's what his bot does.
-
So, you can actually take
a look at his bot
-
and modify it for your own purposes,
but that is also another alternative
-
that doesn't require you
to do some spreadsheet work there.
-
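For reference, a hypothetical QuickStatements line setting P4765 ("Commons compatible image available at URL") on an item. In practice the property is normally used with extra qualifiers (file format, license, and so on), so check its documentation before running a batch:

item_qid = "Q_EXAMPLE_PAINTING"                         # placeholder item
image_url = "https://images.example.org/painting.jpg"   # placeholder CC0 image URL
print(f'{item_qid}\tP4765\t"{image_url}"')              # QuickStatements v1: string values are quoted
-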
You might have heard
of the GLAM Wiki Toolset--
-
it's effectively end
of life at this point.
-
It hasn't been updated, and even the folks
who have been working with it in the past
-
have said Pattypan
is probably your best bet.
-
Has anyone used GWT these days?
-
A few of you, a little bit.
-
It's just not being further developed,
-
and it's not compatible with a lot
of our authentication protocols
-
that we have now.
-
Okay. So, right now, we have basic
metadata added to Wikidata,
-
with pretty good results from the Met,
-
and we have a Python script here
to also analyze that.
-
You're welcome to steal
some of that code, as well.
-
So, this is what we are showing
to the Met folks, now.
-
We actually have Listeria lists
that are running
-
to show all the inventory
-
and all the information
that we have in Wikidata.
-
And I'll very quickly show you
a project that we ran to show folks.
-
So, what are the benefits of adding
your collections to Wikidata?
-
One is to use AI for image classification--
-
to actually help train
a machine learning model
-
with all the Met's images and keywords,
and let that be an engine for other folks
-
to recognize content.
-
So, this is a hack-a-thon that we had
with MIT and Microsoft last year.
-
The way this works, is we have
the paintings from the Met,
-
and we have the keywords--
-
they actually paid a crew
for six months
-
to add hand keyword tags
to all the artworks.
-
We ingested that
into an AI system right here,
-
and then, what we did was say,
-
"Let's feed in new images that
this AI ML system had never seen before,
-
and see what comes out."
-
And the problem is that it comes out
with pretty good results,
-
but it's maybe only 60% accurate.
-
And for most folks,
60% accurate is garbage.
-
How do I get the 60% good
out of this pile of stuff?
-
The good news is that our community
knows how to do that.
-
We can actually feed this
into a Wikidata game
-
and get the good stuff out of that.
-
That's basically what we did.
-
So, this is the Wikidata game--
-
you'll notice this is
Magnus' interface right there--
-
being played at the Met Museum,
-
in the lobby.
-
We actually had folks at a cocktail party
drinking champagne
-
and hitting buttons on the screen.
-
Hopefully, accurately. (chuckles)
-
(applause)
-
We had journalists, curators,
-
we had some board members
from the Met there as well.
-
And this was great.
-
No log in, whatever.
-
(lowers voice) We created
an account just for this.
-
So, they just hit yes-no-yes-no.
-
This is great.
-
You saw this, it said,
"Is there a tree in this picture?"
-
You don't have to train anyone on this.
-
You just hit yes--
depicts a tree, not depicted.
-
I even had my eight-year-old boys
play this game with a finger tap.
-
And we also created a little tool
that showed all the depictions going by
-
so people could see them.
-
It basically is like--
how do you sift good from bad?
-
This is where the Wikimedia
community comes in,
-
that no other entity could ever do.
-
So, in the first few months
that we had this,
-
there were over 7,000 judgments,
resulting in about 5,000 edits.
-
We did really well on tree,
boat, flower, horse,
-
things that are in landscape paintings.
-
But when you go to things
like gender discrimination,
-
and cats and dogs, not so good, I know.
-
Because there's so many different
types of cats and dogs
-
in different positions.
-
But horses, a lot easier
than cats and dogs.
-
But also, I should note
that Wikimedia Foundation
-
is now looking into doing
image recognition on Commons uploads
-
to do these suggestions as well,
which is an awesome development.
-
Okay, so, dashboards.
-
Let's just show you
some of these dashboards.
-
Folks you work with love dashboards.
-
They just want to see stats.
-
So, we have them, like BaGLAMa.
-
We have InteGraality.
-
Is JeanFred here?
-
I think this is a very new thing
relative to last WikidataCon.
-
We actually have a tool
which will create
-
this property completeness
chart right here.
-
So, it's called InteGraality,
with two A's.
-
It's on that big chart
that I showed you before.
-
And it can just autogenerate
how complete your items are
-
in any set, which is really cool.
-
So, we can see that paintings
are by far the highest,
-
we have sculptures, drawings, photographs.
-
And then, they also like to see
what are the most popular artworks
-
in the Wikisphere?
-
So, just looking at the site links
in Wikidata--
-
you can see and rank
all these different artworks there.
-
Also, another thing they'd like to see
-
is who are the most frequent creators
of the Met artworks--
-
what are the most commonly
depicted things.
-
So, these are very easy
to generate in SPARQL,
-
you could look at it right there,
using bubble graphs.
-
Then place of birth
of the most prominent artists,
-
we have a chart there, as well.
-
So, structured data on Commons.
-
I just want to show you very briefly
in case you can't get to Sandra's session,
-
but you definitely should go
to Sandra's session.
-
You actually can search in Commons
for a specific Wikibase statement.
-
I don't always remember the syntax,
but you have to burn it into your brain
-
and say, it's haswbstatement:P1343=
-
whatever-- basically, your last
two parts of the triple.
-
I always get haswb and wbhas mixed up.
-
I always get the colon
and the equals mixed up.
-
So just do it once, remember it,
and you'll get the hang of it.
-
But simple searches are much faster
than SPARQL queries.
-
So, if you can just look
for one statement,
-
boom, you'll get the results.
-
So, things like this, you can look
for symbolically or semantically,
-
things that depict
the Met museum, for example.
-
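As a sketch of that, here is the same kind of haswbstatement search run against the Commons search API from Python -- P180 ("depicts") with the Met's Q ID (Q160236) standing in for the example of files depicting the Met:

import requests

response = requests.get(
    "https://commons.wikimedia.org/w/api.php",
    params={
        "action": "query",
        "list": "search",
        "srsearch": "haswbstatement:P180=Q160236",  # files whose structured data says depicts: Met
        "srnamespace": 6,                           # File: namespace
        "format": "json",
    },
)
for hit in response.json()["query"]["search"]:
    print(hit["title"])
-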
So, finally, community campaigns.
-
Richard has been a pioneer in this area.
-
So, once you have the Wikidata items,
-
they can actually assist
in creating Wikipedia articles.
-
So, Richard, why don't you tell us
a little bit about the Mbabel tool
-
that you created for this.
-
(Richard) Hi, can I get this on?
-
(Andrew) Oh, use [Joisey's].
-
(Richard) It's on, now. I'm good.
-
So, we had all this information
on Wikidata.
-
[inaudible] browsing data
on our evenings and weekends
-
to learn about art-- not everyone does.
-
We have quite a bit more people
[inaudible] Wikipedia,
-
so how do we get this information
from Wikidata to Wikipedia?
-
One of the ways of doing this
is this so-called Mbabel,
-
which was developed with the help
of a lot of people in [inaudible].
-
People like Martin and others.
-
So, basically, it takes
some basic art information
-
and uses it to populate
a Wikipedia article--
-
who created this work,
who was the artist,
-
when it was created, et cetera.
-
The nice thing about this
is it can generate works.
-
We started with English Wikipedia,
-
but it's been developed
in other languages.
-
So, Portuguese Wikipedia--
our Brazilian friends
-
have done a lot of work taking it
to realms beyond art,
-
to stuff like elections
and political work as well.
-
And the nice thing about this
is we can query on Wikidata--
-
so different artists-- so for example,
we've done projects with Women in Red,
-
looking at women artists.
-
Projects related to Wiki Loves Pride,
looking at LGBT-identified artists,
-
African Diaspora Artists,
-
and a lot of different groups
and time periods,
-
different collections,
and also looking at articles
-
that have been and haven't been
translated to different languages.
-
So all of the articles that haven't
been translated to Arabic yet.
-
You need to find some interesting articles
maybe that are relevant to a culture
-
that haven't been translated
into that language yet.
-
We actually have a number of works
in the Met collection
-
that are in Wikipedias
that aren't in English yet,
-
because it's a global collection.
-
So, there are a lot of ways--
and hopefully, we can spread them around--
-
of creating Wikipedia content
that is driven by these Wikidata items,
-
and that also maybe
can help drive improvements
-
to Wikidata items, as well, in the future.
-
(Andrew) And there's a number of folks
here using Mbabel already, right?
-
Who's using Mbabel
in the room? Brazilians?
-
And also, if [Armin] is here,
we have our winner
-
of the Wikipedia Asian Month
and Wiki Loves Pride contests.
-
So, thank you for joining,
and congratulations.
-
We'll have another Wikipedia Asian Month
campaign in November.
-
The way I like to describe it
[inaudible]
-
It doesn't give you a blank page.
-
It gives you the skeleton,
-
which is really a much better
user experience
-
for edit-a-thons and beginners.
-
So, it's a lot of great work
that Richard has done,
-
and people are building on it,
which is awesome.
-
(woman 3) [inaudible] for some of them,
which is really nice.
-
Yeah, exactly.
-
(woman 3) [inaudible]
-
Right. We should have put a URL here.
-
(man 8) [inaudible]
-
Oh, that's right.
We have the link right here.
-
So if you click-- this is a Listeria list,
it's autogenerating all that for you.
-
And then, you click on the red link,
it'll create the skeleton,
-
which is pretty cool.
-
Alright, we're on the final stretch here.
-
The tool that we're going
to be announcing--
-
well, we announced a few weeks ago,
but only to a small set of folks,
-
but we're making a big splash here,
-
is the depiction tool
that we just created.
-
Wikipedia has shown that volunteer
contributors can add a lot of these things
-
that museums can't.
-
So, what if we created a tool
that could let you enrich
-
the metadata about artworks
in terms of the depiction information?
-
And what we did was we applied
for a grant from the Knight Foundation,
-
and we created this tool--
and is Edward here?
-
Edward is our wonderful developer
who in like a month, said,
-
"Okay, here's a prototype."
-
After we gave him a specification,
and it's pretty cool.
-
- So what we can do--
- (applause)
-
Thanks, Edward.
-
We're working within collections of items.
-
So, what we do, is we can
bring up a page like this.
-
It's no longer looking
at a Wikidata item with a tiny picture.
-
If we're working with what's depicted
in the image, we want the picture big.
-
And we don't really have tools
that work with big images.
-
We have tools that deal
with lexical and typing.
-
So one of the big things that Edward did
was make a big version of the picture,
-
scrape whatever he could
from the object page
-
of the GLAM organization,
and give you context.
-
I can see dogs, children, wigwam.
-
These are things that direct the user
to add meaningful information.
-
You have some metadata
that's scraped from the site, too.
-
Teepee, Comanche--
oh, it's Comanche, not Navajo,
-
because I know the object page said that.
-
And you can actually start typing
in the field, there.
-
And the cool thing is that
it gives you context.
-
It doesn't just match anything
to Wikidata,
-
it first matches things that have already
been used in other depiction statements.
-
Very simple thing,
but what a godsend it is
-
for folks who have tried this in the past.
-
Don't give me everything
that matches teepee.
-
Show me what other paintings
have used teepee in the past.
-
So, it's interactive, context-driven,
statistics-driven,
-
by showing you what is matched before.
-
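One way to get that kind of "has this been used as a depicts value before?" statistic -- not necessarily how the tool itself does it -- is a simple count over existing depicts (P180) statements on paintings:

import requests

DEPICTS_COUNT_QUERY = """
SELECT (COUNT(?work) AS ?uses) WHERE {
  ?work wdt:P31 wd:Q3305213 ;    # instance of: painting
        wdt:P180 wd:%s .         # depicts: the candidate item
}
"""

def depicts_usage(candidate_qid):
    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": DEPICTS_COUNT_QUERY % candidate_qid, "format": "json"},
        headers={"User-Agent": "depicts-count-example/0.1"},
    )
    return int(response.json()["results"]["bindings"][0]["uses"]["value"])

print(depicts_usage("Q10884"))   # e.g., how many paintings already depict a tree (Q10884)
-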
And the cool thing is once you're done
with that painting,
-
you can start to work in other areas.
-
You want to work within the same artist,
the collection, location,
-
other criteria here.
-
And you can even browse
through the collections
-
of different organizations,
just work on their paintings.
-
So, we wanted people
to not live in Wikidata--
-
kind of onesy-twosies with items,
but live in a space
-
where you're looking at artworks
in collections that make sense.
-
And then, you can actually
look through it visually.
-
It kind of looks like Crotos
or these other tools,
-
but you can actually live edit
on Wikidata at the same time.
-
So, go ahead and try it out.
-
We've only had 14 users,
-
but we've had 2,100 paintings worked on,
with 5,000 plus depict statements.
-
That's pretty good for 14.
-
So, multiply that by 10--
-
imagine how many more things
we could do with that.
-
So, you can go ahead and go
to art.wikidata.link and try out the tool.
-
It uses OAuth authentication,
and you're off to the races.
-
And it should be very natural
without any kind of training
-
to add depiction statements to artworks.
-
But you can put any object.
We don't restrict the object right now.
-
So, you could put any Q number
-
to edit this content if you want.
-
But we primarily stick with paintings
and 2D artworks, right now.
-
Okay. You can actually look
at the recent changes
-
and see who's made edits recently to that.
-
Okay? Okay, so we're going
to wind it down.
-
Ooh, one minute, then we'll do some Q&A.
-
So, the final thing that I think
is useful for museum types especially,
-
is there's a very famous author
named Nina Simon in the museum world,
-
who likes to talk about
how do we go from users,
-
or I guess your audience,
contributing stuff to your collections
-
to collaborating around content,
to actually being co-creative
-
and creating new things.
-
And that's always been tough.
-
And I'd like to argue that Wikidata
is this co-creative level.
-
So, it's not just uploading
a file to Commons,
-
which is contributing something.
-
It's not just editing an article
with someone else, which is collaborative.
-
But we are now seeing these tools
that let you make timelines,
-
and graphs, and bubble charts.
-
And this is actually the co-creative part
that's really interesting.
-
And that's what Wikidata provides you.
-
Because suddenly,
it's not language dependent--
-
we've got this database
that's got this rich information in it.
-
So, it's not just pictures, not just text,
-
but it's all this rich multimedia
-
that we have the opportunity to work on.
-
So, this is just another example
of this connected graph
-
that you can take a look at later on
-
to show another example
of The Death of Socrates,
-
and the different themes
around that painting.
-
And it's really easy
to make this graph yourself.
-
So again, another scary graphic
that only makes sense
-
for Wikidata folks, like you.
-
You just give it a list of Wikidata items,
and it'll do the rest, that's it.
-
You'll give the list.
-
Keep all this code the same.
-
So, fortunately, Martin and Lucas
helped do all this code here.
-
Just give it a list of items
and the magic will happen.
-
Hopefully, it won't blow up your computer,
-
because you're putting in
a reasonable number of items there.
-
But as long as you have the screen space,
it'll draw the graph,
-
which is pretty darn cool.
-
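The core idea can be sketched in a few lines of Python (an illustration of the approach, not the exact code from the slide): take a list of Q IDs, build a query that draws an edge wherever one listed item links directly to another, and render it with the query service's built-in graph view (#defaultView:Graph).

from urllib.parse import quote

def graph_url(qids):
    values = " ".join(f"wd:{q}" for q in qids)
    query = f"""#defaultView:Graph
SELECT ?item ?itemLabel ?linkTo WHERE {{
  VALUES ?item {{ {values} }}
  OPTIONAL {{ ?item ?prop ?linkTo . VALUES ?linkTo {{ {values} }} }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""
    return "https://query.wikidata.org/embed.html#" + quote(query)

# Usage: pass the Q IDs you care about and open the URL in a browser, e.g.
# print(graph_url(["Q...", "Q...", "Q..."]))
-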
And then, finally, two tools--
I realized at 2 a.m. last night
-
a few people said,
"I didn't know about these tools."
-
And you should know about these tools.
-
So, one is Recoin, which shows you
the relative completeness of an item
-
compared to other items
of the same instance.
-
And then, Cradle, which is a way
to have a forms-based way
-
to create content.
-
So, these are very useful for edit-a-thons
-
where if you know that
you're working with just artworks,
-
don't just let people create items
with a blank screen.
-
Give them a form to fill out
to start entering in information
-
that's structured.
-
And then, finally, we've gone
through some of this, already.
-
This is my big chart that I love
to get people's feedback on.
-
How do we get people
across the chasm to be in this space?
-
We have a lot of folks who, now,
can do template coding,
-
spreadsheets, QuickStatements,
SPARQL queries, and then we got--
-
how do we get people to this side
where we have Python
-
and the things that can do more
sophisticated editing.
-
It's really hard
to get people across this.
-
But I would like to say
it's hard to get people across,
-
but the content and the technology
is not that hard.
-
We actually need more people
to learn about regular expressions.
-
And once you get some kind
of experience here,
-
you'll find that this is a wonderful world
that you can learn a lot in,
-
but it does take some time
to get across this chasm.
-
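As a taste of the kind of regular-expression work that comes up when cleaning museum metadata (hypothetical data; the pattern is only a starting point, not a real date parser):

import re

dates = ["ca. 1885", "1890-1895", "19th century", "dated 1662"]
for d in dates:
    match = re.search(r"\b(1[0-9]{3})\b", d)   # first plausible four-digit year starting with 1
    print(d, "->", match.group(1) if match else "no year found")
-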
Yes, James.
-
(James) [inaudible]
-
No, what it means is that the graph
is not necessarily accurate
-
in terms of its data points.
-
But what it means-- I guess
it's more like this is a valley.
-
It's like we need to get people
across this valley here.
-
(woman 4) [inaudible]
-
I would say this is the key.
-
If we can get people who know this stuff,
but can grok this stuff,
-
it gets them to this stuff.
-
Does that make sense? Yeah.
-
So, my vision for the next few years is
that we can get better training
-
in our community to get people
from batch processing,
-
which is pretty much what this is,
to kind of intelligent--
-
I wouldn't say intelligent,
but more sophisticated programming,
-
that would be a great thing,
because we're seeing this is a bottleneck
-
to a lot of the stuff
that I just showed you up there.
-
Yes.
-
(man 9) [inaudible]
-
Okay, wait, you want to show me something,
show me after the session, does that work?
-
Okay. Yes, Megan.
-
- (Megan) Can I have a microphone?
- Microphone, yes.
-
- (Megan) [inaudible]
- Yeah.
-
And we have lunch after this,
-
so if you want to stay
a little bit later, that's fine, too.
-
- [inaudible]
- We're already at lunch break? Okay.
-
(Megan) So, thank you so much
to both you and Richard
-
for all the work you're doing at the Met.
-
And I know that you're
very well supported in that.
-
(mic feedback)
I don't know what happened there.
-
For the average volunteer community,
how do you balance doing the work
-
for the cultural heritage organization
versus training the professionals
-
that are there to do that work?
-
Where do you find the balance
in terms of labor?
-
It's a good question.
-
(Megan) One that really comes up,
I think, with this as well.
-
- With this?
- (Megan) Yeah, and with building out...
-
where we put efforts in terms
of building out competencies.
-
Yeah. I don't have a great answer for you,
but it's a great question.
-
(Megan) Cool.
-
(Richard) There are a lot
of tech people at [inaudible]
-
who understand this side of the graph,
and don't understand it--
-
the people in [inaudible]
who understand this part of the graph,
-
and don't understand
this part of the graph.
-
So, the more we can get Wikimedians
who understand some of this,
-
with some tech professionals at museums
who understand this,
-
then that makes it a little bit easier--
-
and hopefully, as well as
training up Wikimedians,
-
we can also provide some guidance
and let the museums [inaudible]
-
to take care of themselves
in the [inaudible].
-
Yeah, that's a good point.
-
How many people here know
what regular expressions are?
-
Raise your hand.
-
Okay, so how many people are comfortable
specifying a regular expression?
-
So, yeah, we need more work here.
-
(laughter)
-
(man 10) I want to suggest that--
-
maybe not getting
every Wikidata practitioner,
-
or institution practitioner
to embrace Python programming is the way.
-
But as Richard just said, finding more
bridging people-- people like you--
-
who speak both--
-
who speak Python,
but also speak GLAM institution--
-
to help the GLAM's own
technical department, which may not--
-
they know Python,
they don't know this stuff.
-
That's, I think, what's needed.
-
People like you, people like me,
people who speak both of these jargons
-
to help make the connections,
to document the connections.
-
You're already doing this, of course.
-
You share your code, et cetera,
you're doing tutorials.
-
But we need more of this.
-
I'm not sure we need
to make everyone programmers.
-
We already have programmers.
-
We need to make them understand
-
the non-programming
material they need to--
-
I think that's a great point.
-
We don't need to make everyone
highly proficient in this,
-
but we do need people
knowledgeable to say that,
-
"Yeah, we can ingest 400 thousand rows
and do something with it."
-
Whereas, if you're stuck
on this side, you're like,
-
"400 thousand rows
sounds really big and scary."
-
But if you know that it's possible,
you're like, "No problem."
-
400 thousand is not a problem.
-
(woman 5) I would just like to chime in
a little bit in that
-
that there may be countries and areas
where you will not find a GLAM
-
with any skilled technologists.
-
So, you will have to invent
something there in the middle.
-
That's a good point.
-
Any questions? Sandra.
-
(Sandra) Yeah, I just wanted
to add to this discussion.
-
Actually, I've seen some very good cases
where it indeed has been successful
-
to train GLAM professionals to work
with this entire environment,
-
and where they've done fantastic jobs,
also at small institutions.
-
It also requires that you have chapters
or volunteers that can train the staff.
-
So, it's really like a bigger environment.
-
But I think that's a model
that if we can manage to make that grow,
-
it can scale very well, I think.
-
Good point.
-
(woman 5) [inaudible]
-
Sorry, just noting that we don't have
-
any structured trainings
right now for that.
-
We might want to develop those,
and that would be helpful.
-
We have been doing that for education
-
in terms of teaching people
Wikipedia and Wikidata.
-
It's just a matter of taking it
one step further.
-
Right. Stacy.
-
(Stacy) Well, I'd just like to say
that a lot of professionals
-
who work in this area of metadata
have all these skills already.
-
So, I think part of it is just proving
the value to these organizations,
-
but then it's also tapping
into professional associations who can--
-
or ways of collaborating within
those professional communities
-
to build this work, and the documentation
on how to do things
-
is really, really important,
-
because I'm not sure about the role
of depending on volunteers,
-
when some of this work is actually work
GLAM organizations do anyway.
-
We manage our collections
in a variety of ways through metadata,
-
and this is actually one more way.
-
So, should we also not be thinking
about ways to integrate this work
-
into a GLAM professional's regular job?
-
And then that way you're generating--
-
and when you think
about sustainability and scalability,
-
that's the real trick to making this
both sustainable and scalable,
-
is that once this is the regular
work of GLAM folks,
-
we're not worried as much about this part,
-
because it's just turning
that little switch to get this
-
to be a part of that work.
-
Right. Good point. [Shani]?.
-
(Shani) You're absolutely right.
-
But I want to echo what you said before.
-
And yes, Susana-- this might work
for more privileged countries
-
where they have money,
they have people doing it.
-
It doesn't work for places
that are still developing,
-
that don't have resources--
they don't have all of that.
-
And they can barely do
what they need to do.
-
So, it's difficult for them, and then,
the community is really helpful.
-
These are the cases where the community
can have a huge impact actually,
-
working with the GLAMS,
because they can't do it all
-
as part of their jobs.
-
So, we need to think about that as well.
-
And having these examples,
actually, is hugely important,
-
because it's helping
to still convince them,
-
that it's critical to invest in it
and to work with volunteers,
-
so, with non-professionals
of sorts, to get there.
-
I can imagine a future where
you don't have to know all this code.
-
These would just be
kind of like Lego bricks
-
you can slap together,
-
saying, "Here's my database.
Here's the crosswalk. Here's Wikidata,"
-
and just put it together,
and you don't have to even code,
-
you just have to make sure
the databases are in the right place.
-
Yep. Okay.
-
(man 11) Sorry. [inaudible]
-
I think if I would have done this project,
I'd probably have done it the same way.
-
So, I think that's maybe a good sign.
-
I was wondering how did
the whole financing work of this project?
-
How did the-- I'm sorry?
-
The financing of this project work.
-
- The financing?
- Yeah, the money.
-
That's a good question.
-
Well, so, there are different parts of it.
-
So, the Knight grant funded
the Wiki Art Depiction Explorer.
-
But I, for the last, maybe what--
nine months--
-
I've been their Wikimedia strategist.
-
So, I've been on
since February of this year.
-
So, pretty much, they're paying
for my time to help with their--
-
not only the upload of their collections,
but developing these tools, as well.
-
- (Richard) So the Met's paying you?
- Yeah, that's right.
-
(Richard) The grant, at least part
of it has come from--
-
There was a grant for Open Access.
-
And this is under that campaign
and with the digital department.
-
So, working as contractors throughout
the Open Access campaign for the Met.
-
(man 12) I'm sorry.
I guess before you were hired,
-
and before there was a grant,
-
there was probably a lot
of volunteer work done to make sure--
-
Richard did a lot of work before that.
-
And then, Wikimedia New York
did a lot of work,
-
but it was kind of in bursts.
-
It wasn't as comprehensive
as we're talking about now
-
in terms of having-- making sure
those two layers are complete
-
in Wikidata.
-
Alright, yeah. I think that's it.
-
So, I'm happy to talk after lunch,
or after the break, if you want.
-
Okay. Thank you.
-
(applause)