
cdn.media.ccc.de/.../wikidatacon2019-1077-eng-Wikidata_Commons_contribution_strategies_for_GLAM_organizations_hd.mp4

  • 0:07 - 0:08
    Thanks folks.
  • 0:10 - 0:12
    As I mentioned before,
    you can load up the slides here
  • 0:12 - 0:17
    by either the QR code or the short URL,
    which is a bit.ly short link:
  • 0:17 - 0:20
    wikidatacon19glamstrategies.
  • 0:20 - 0:22
    And the slides are also
    on the program page
  • 0:22 - 0:25
    on the WikidataCon site.
  • 0:25 - 0:27
    And then, there's also an Etherpad here
    that you can click on.
  • 0:27 - 0:29
    So, I'll be talking about a lot of things
  • 0:29 - 0:32
    that you might have heard about
    at Wikimania, if you were there,
  • 0:32 - 0:34
    but we are going to go
    into a lot more implementation details.
  • 0:34 - 0:36
    Because we're at WikidataCon,
    we can dive deeper
  • 0:36 - 0:38
    into the Wikidata and technical aspects.
  • 0:38 - 0:42
    But Richard and myself, we are working
    at the Met Museum right now
  • 0:42 - 0:43
    on their Open Access program.
  • 0:43 - 0:45
    If you didn't know,
    about two plus years ago,
  • 0:45 - 0:47
    now entering its third year,
  • 0:47 - 0:49
    there's been an Open Access
    strategy at the Met,
  • 0:49 - 0:53
    where they're releasing their images
    and metadata under a CC0 license.
  • 0:53 - 0:55
    And one of the things
    they brought us on to do
  • 0:55 - 0:58
    is to imagine what we could do
    with this Open Access content.
  • 0:58 - 1:00
    So, we're going to talk
    a little bit about that
  • 1:00 - 1:03
    in terms of the experiments
    that we've been running,
  • 1:03 - 1:04
    and we'd love to hear your feedback.
  • 1:04 - 1:07
    So, I hope to talk for about 20 minutes,
    and then hope to get some conversation
  • 1:07 - 1:10
    with you folks, since we have
    a lot of knowledge in this room.
  • 1:10 - 1:12
    This is the announcement,
    and actually the one-year anniversary,
  • 1:12 - 1:16
    when Katherine Maher was actually there
    at the Met to talk about that anniversary.
  • 1:16 - 1:19
    So, one of the things that's challenging
    I think for a lot of folks
  • 1:19 - 1:21
    is how do you explain Wikidata,
  • 1:21 - 1:24
    and this GLAM
    contribution strategy to Wikidata
  • 1:24 - 1:27
    to C-level folks at an organization.
  • 1:27 - 1:31
    We can talk about it with data scientists,
    Wikimedians, librarians, maybe curators,
  • 1:31 - 1:34
    but when it comes to talking about this
    with a director of a museum,
  • 1:34 - 1:37
    or a director of a library,
    what does it actually--
  • 1:37 - 1:38
    how does it resonate with them?
  • 1:38 - 1:41
    So, one way that we've actually talked
    about it that I think makes sense,
  • 1:41 - 1:44
    is everyone knows about Wikipedia,
  • 1:44 - 1:48
    and for the English language edition,
  • 1:48 - 1:50
    at least, we're talking
    about 6 million articles.
  • 1:50 - 1:52
    And it sounds like a lot,
    but if you think about it,
  • 1:52 - 1:54
    Wikipedia is not really the sum
    of all human knowledge,
  • 1:54 - 2:00
    it's the sum of all reliably sourced,
    mostly western knowledge.
  • 2:00 - 2:02
    And there's a lot of stuff out there.
  • 2:02 - 2:04
    We have a lot of stuff
    in Commons already--
  • 2:04 - 2:07
    56 million media files, and going up
    every single day--
  • 2:07 - 2:11
    but there's a very
    different type of standard
  • 2:11 - 2:13
    for what goes into Wikimedia Commons.
  • 2:13 - 2:16
    And the way that we have described
    Wikidata to GLAM professionals,
  • 2:16 - 2:18
    and especially the C levels,
  • 2:18 - 2:22
    is that what if we could have a repository
    that has a notability bar
  • 2:22 - 2:24
    that is not as high as Wikipedia's.
  • 2:24 - 2:26
    So, we want all these paintings,
  • 2:26 - 2:28
    but not every painting
    necessarily needs an article.
  • 2:29 - 2:30
    Wikipedia is held back by the fact
  • 2:30 - 2:33
    that you need to have
    language editions of Wikipedia.
  • 2:33 - 2:37
    So, can we store the famous thing--
    things, not strings.
  • 2:37 - 2:41
    Can we be object-oriented
    and not lexically oriented?
  • 2:41 - 2:42
    And can we store this in a database
  • 2:42 - 2:45
    that stores facts, figures,
    and relationships?
  • 2:45 - 2:46
    And that's pretty much
    what Wikidata does.
  • 2:47 - 2:51
    And Wikidata is also a universal
    kind of crosswalk database that links
  • 2:51 - 2:52
    to other collections out there.
  • 2:52 - 2:55
    So, we think this really resonates
    with folks when you're talking about
  • 2:55 - 2:59
    what is the value of Wikidata compared
    to what they're normally familiar with,
  • 2:59 - 3:00
    which is just Wikipedia.
  • 3:01 - 3:03
    Alright, so what are the benefits?
  • 3:03 - 3:05
    You're interlinking
    your collections with others.
  • 3:05 - 3:08
    So, unfortunately, I apologize
    to librarians here,
  • 3:08 - 3:09
    I'll be talking mostly about museums,
  • 3:09 - 3:12
    but a lot of this is valid
    for libraries, too.
  • 3:12 - 3:16
    But you're basically connecting
    your collection with the global collection
  • 3:16 - 3:18
    of linked open data collections.
  • 3:19 - 3:22
    You can also receive enriched
    and improved metadata back
  • 3:22 - 3:26
    after contributing and linking
    your collections to the world.
  • 3:26 - 3:28
    And there are some pretty neat
    interactive multimedia applications
  • 3:28 - 3:31
    that you get-- I don't want
    to say for free,
  • 3:31 - 3:34
    but your collection in Wikidata
    allows you to visualize things
  • 3:34 - 3:35
    that you've never seen before.
  • 3:35 - 3:37
    We'll show you some examples.
  • 3:37 - 3:40
    And so, how do you convey this
    to GLAM professionals effectively?
  • 3:40 - 3:42
    Well, I usually like to start
    with storytelling,
  • 3:42 - 3:44
    and not technical explanations.
  • 3:44 - 3:46
    Okay, so if everyone here
    has a cell phone,
  • 3:46 - 3:50
    especially if you have an iPhone,
    I want you to scan this QR code
  • 3:50 - 3:52
    and bring up the URL
    that it comes up with.
  • 3:52 - 3:53
    Or if you don't have a QR scanner,
  • 3:53 - 3:59
    just type in w.wiki/Aij in a web browser.
  • 4:00 - 4:02
    So go ahead and scan that.
  • 4:03 - 4:05
    And what comes up?
  • 4:07 - 4:09
    Does anyone see a knowledge graph
    pop up on your screen?
  • 4:10 - 4:11
    So, for folks here in WikidataCon,
  • 4:11 - 4:13
    this is probably not
    revolutionary for you.
  • 4:13 - 4:16
    But what it does, it does a SPARQL query
    with these objects,
  • 4:16 - 4:19
    and it shows the linkages between them.
  • 4:19 - 4:21
    And you can actually drag them
    around the screen.
  • 4:21 - 4:22
    You can actually click on nodes.
  • 4:22 - 4:24
    If you're [inaudible] in a mobile,
    it will expand that--
  • 4:24 - 4:28
    you can actually start to surf
    through Wikidata this way.
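
For reference, a short link like that just wraps a Wikidata Query Service query whose first line is #defaultView:Graph. A minimal sketch of building such a shareable graph view, assuming the Mona Lisa (Q12418) as an illustrative seed item (the actual query behind w.wiki/Aij is not reproduced here):

    # A minimal sketch, not the actual query behind w.wiki/Aij: build a
    # query-service "embed" URL whose result renders as a draggable graph.
    from urllib.parse import quote

    query = """#defaultView:Graph
    SELECT ?item ?itemLabel ?linkTo ?linkToLabel WHERE {
      VALUES ?item { wd:Q12418 }             # illustrative seed item
      ?item ?claim ?linkTo .
      ?prop wikibase:directClaim ?claim .    # direct (wdt:) statements only
      FILTER(ISIRI(?linkTo))                 # keep item-to-item edges only
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }"""

    print("https://query.wikidata.org/embed.html#" + quote(query))
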
  • 4:28 - 4:30
    So, for Wikidata veterans
    this is pretty cool.
  • 4:30 - 4:31
    One shot, you get this.
  • 4:31 - 4:33
    For a lot of folks who have never seen
    Wikidata before,
  • 4:33 - 4:36
    this is a revolutionary moment for them.
  • 4:36 - 4:39
    To actually hand-manipulate
    a knowledge graph,
  • 4:39 - 4:42
    and to start surfing through Wikidata
    without having to know SPARQL,
  • 4:42 - 4:44
    without having to know what a Q item is,
  • 4:44 - 4:46
    without having to know
    what a property proposal is,
  • 4:46 - 4:49
    they can suddenly start seeing
    connections in a way that is magical.
  • 4:49 - 4:50
    Hey, I see [Jacob's] here.
  • 4:50 - 4:52
    Jacob's been using
    some of this code, as well.
  • 4:52 - 4:54
    So, this is some code
    that we'll talk about later on
  • 4:54 - 4:57
    that allows you to create
    these visualizations in Wikidata.
  • 4:57 - 4:59
    And we've really seen this
    turn a lot of heads
  • 4:59 - 5:01
    who have really
    never gotten Wikidata before.
  • 5:01 - 5:05
    But after seeing these interactive
    knowledge graphs, they get it.
  • 5:05 - 5:06
    They understand the power of this.
  • 5:06 - 5:08
    And especially this example here,
  • 5:08 - 5:11
    this was a really big eye-opener
    for the folks at the Met,
  • 5:11 - 5:15
    because this is the artifact
    that is the center of this graph,
  • 5:15 - 5:18
    right there, the Portrait of Madame X,
    a very famous portrait.
  • 5:18 - 5:21
    And they did not even know
    that this was the inspiration
  • 5:21 - 5:25
    for the black dress that Rita Hayworth
    wore in the movie Gilda.
  • 5:25 - 5:27
    So, just by seeing this graph, they said,
  • 5:27 - 5:29
    "Wait a minute. This is one
    of our most visited portraits.
  • 5:29 - 5:32
    I didn't know that this was true."
  • 5:32 - 5:35
    And there are actually two other books
    published about that painting.
  • 5:35 - 5:39
    You can see all these things,
    not just within the realm of GLAM,
  • 5:39 - 5:41
    but it extends to fashion,
    it extends to literature.
  • 5:41 - 5:43
    You're starting to see
    the global connections
  • 5:43 - 5:47
    that your artworks have,
    or your collections have via Wikidata.
  • 5:49 - 5:50
    So, how do we do this?
  • 5:51 - 5:53
    If you can remember nothing else
    from this presentation,
  • 5:53 - 5:56
    this one page is your one-stop shopping.
  • 5:56 - 5:59
    Now, fortunately, you don't have
    to memorize all this.
  • 5:59 - 6:03
    It's actually right here at
    Wikidata:Linked_open_data_workflow.
  • 6:04 - 6:06
    So, we'll be talking about some
    of these different phases
  • 6:06 - 6:11
    of how you first prepare,
    reconcile, and examine
  • 6:11 - 6:14
    what the GLAM organization might have
    and what Wikidata has.
  • 6:14 - 6:15
    And then, what are the tools
  • 6:15 - 6:19
    to actually ingest
    and correct or enrich that
  • 6:19 - 6:20
    once it's in Wikidata.
  • 6:20 - 6:23
    And then, what are some of ways
    to reuse that content,
  • 6:23 - 6:25
    or to report and create
    new things out of it.
  • 6:25 - 6:31
    So, this is the simpler version of a chart
    that Sandra and the GLAM folks
  • 6:31 - 6:33
    at the foundation have created.
  • 6:33 - 6:36
    But this is trying
    to sum up, in one shot--
  • 6:36 - 6:38
    because we know how hard things
    are to find in Wikidata--
  • 6:38 - 6:42
    to find in one shot all the different
    tools you should pay attention to
  • 6:42 - 6:43
    as a GLAM organization.
  • 6:45 - 6:51
    So, just using the Met as an example,
    we started with what is the ideal object
  • 6:51 - 6:53
    that we have in Wikidata
    that comes from the Met?
  • 6:53 - 6:56
    This is a typical shot of a Wikidata item,
  • 6:56 - 6:57
    in the mobile mode there.
  • 6:57 - 6:59
    And this is one
    of the more famous paintings
  • 6:59 - 7:01
    we used as a model, here.
  • 7:01 - 7:03
    We have the label,
    description, and aliases.
  • 7:04 - 7:05
    And then, we found out,
  • 7:05 - 7:07
    "What are the core statements
    that we wanted?"
  • 7:07 - 7:10
    We wanted instance of, image,
    inception, collection.
  • 7:10 - 7:13
    And what are some other properties
    we would like if we had it?
  • 7:13 - 7:16
    Depiction information,
    material used, things like that.
  • 7:17 - 7:19
    We actually do have an identifier.
  • 7:19 - 7:22
    The Met object ID is P3634.
  • 7:22 - 7:25
    So, for some organizations,
    you might want to propose
  • 7:25 - 7:29
    a property just to track your items
    using an object ID.
  • 7:29 - 7:32
    And then, for the Met,
    just trying to circumscribe
  • 7:32 - 7:36
    what objects do we want to upload
    and keep in Wikidata--
  • 7:36 - 7:39
    the thing that we first identified
    were collection highlights.
  • 7:39 - 7:44
    These are like a hand-selected set
    of 1,500 to 2,000 items
  • 7:44 - 7:49
    that were going to be given priority
    to upload to Wikidata.
  • 7:49 - 7:52
    So, Richard and the crew
    out of Wikimedia New York
  • 7:52 - 7:53
    did a lot of this early work.
  • 7:53 - 7:56
    And then, now, we're systematically
    going through to make sure
  • 7:56 - 7:57
    they're all complete.
  • 7:57 - 7:58
    And there's a secondary set
  • 7:58 - 8:01
    called the Heilbrunn Timeline
    of Art History-- about 8,000 items
  • 8:01 - 8:07
    that are seminal works
    by artists throughout history.
  • 8:07 - 8:09
    And there are about 8,000
    that the Met has identified,
  • 8:09 - 8:12
    and we're also putting that
    on Wikidata, as well,
  • 8:12 - 8:13
    using a different designation.
  • 8:13 - 8:16
    Here, described by source--
    Heilbrunn Timeline of Art History.
  • 8:16 - 8:20
    So, the collection highlight
    is denoted here as collection--
  • 8:20 - 8:21
    Metropolitan Museum of Art,
  • 8:21 - 8:23
    subject has role collection highlight.
  • 8:23 - 8:27
    And then, these 8,000
    or so are like that in Wikidata.
  • 8:30 - 8:34
    I couldn't show this chart at Wikimania,
    because it's too complicated.
  • 8:34 - 8:35
    But WikidataCon, we can.
  • 8:35 - 8:39
    So, this is something that is really hard
    to answer sometimes.
  • 8:39 - 8:42
    What makes something
    in Wikidata from the Met,
  • 8:42 - 8:45
    or from the New York Public Library,
    or from your organization?
  • 8:45 - 8:48
    And the answer is not easy.
    It depends.
  • 8:48 - 8:50
    It's complicated, it can be multi-factor.
  • 8:50 - 8:53
    So, you could say, "Well, if I had
    an object ID in Wikidata,
  • 8:53 - 8:55
    that is a Met object."
  • 8:55 - 8:57
    But maybe someone didn't enter that.
  • 8:57 - 9:00
    Maybe they only put in
    Collection: Met, which is P195,
  • 9:00 - 9:03
    or they put in the accession number,
  • 9:03 - 9:07
    and they put collection as the qualifier
    to that accession number.
  • 9:07 - 9:11
    So, there's actually, one, two, three
    different ways to try to find Met objects.
  • 9:11 - 9:14
    And probably the best way to do it
    is through a union like this.
  • 9:14 - 9:16
    So, you combine all three,
    and you come back,
  • 9:16 - 9:18
    and you make a list out of it.
  • 9:18 - 9:21
    So unfortunately, there is
    no one clean query
  • 9:21 - 9:24
    that'll guarantee you all the Met objects.
  • 9:24 - 9:28
    This is probably
    the best approach for this.
  • 9:28 - 9:29
    And for some institutions,
  • 9:29 - 9:33
    they're probably doing
    something similar to that right now.
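
For the record, a minimal sketch of that three-way union in Python (P3634 and P195 are the properties named in the talk; the Met's own item Q160236 and P217, inventory number, for the accession-number case are assumptions here):

    # Run the three-way UNION against the Wikidata Query Service.
    import requests

    QUERY = """
    SELECT DISTINCT ?item WHERE {
      { ?item wdt:P3634 [] }                # way 1: has a Met object ID
      UNION
      { ?item wdt:P195 wd:Q160236 }         # way 2: collection = the Met
      UNION
      { ?item p:P217/pq:P195 wd:Q160236 }   # way 3: accession number with the Met as qualifier
    }
    """

    r = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "met-union-sketch/0.1"},
        timeout=300,
    )
    met_items = {b["item"]["value"] for b in r.json()["results"]["bindings"]}
    print(len(met_items), "candidate Met items")
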
  • 9:33 - 9:36
    Alright, so example here,
    is that what you see here
  • 9:36 - 9:40
    manifests itself differently--
    not differently, but as this in a query,
  • 9:40 - 9:41
    which can get pretty complex.
  • 9:41 - 9:43
    So, if we're looking
    for all the collection highlights,
  • 9:43 - 9:48
    we'd break this out into the statement
    and then the qualifier as this:
  • 9:48 - 9:50
    subject has role collection highlight.
  • 9:50 - 9:51
    So, that's one way that we sort out
  • 9:51 - 9:54
    some of these special
    designations in Wikidata.
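
That statement-plus-qualifier pattern looks roughly like this as a query (a sketch only: pq:P2868, "subject has role", is an assumption, and QNNNNN stands in for the "collection highlight" item, whose Q number isn't given here):

    # Sketch of the collection-highlight query; QNNNNN is a placeholder.
    HIGHLIGHTS_QUERY = """
    SELECT ?item WHERE {
      ?item p:P195 ?statement .
      ?statement ps:P195 wd:Q160236 ;   # collection: Metropolitan Museum of Art
                 pq:P2868 wd:QNNNNN .   # qualifier: subject has role = collection highlight
    }
    """
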
  • 9:55 - 9:59
    So, the summary is,
    representing "The Met" is multifaceted,
  • 9:59 - 10:02
    and needs to balance simplicity
    and findability.
  • 10:02 - 10:05
    How many people here have heard
    of Sum of All Paintings as a project?
  • 10:05 - 10:07
    Ooh, God, good, a lot of you!
  • 10:07 - 10:09
    So, it's probably one
    of the most active ones
  • 10:09 - 10:11
    that deals with these issues.
  • 10:11 - 10:17
    So, we always debate whether we should
    model things super-accurately,
  • 10:17 - 10:20
    or should you model things
    so that they're findable.
  • 10:20 - 10:22
    These are kind of at odds with each other.
  • 10:22 - 10:24
    So, we usually prefer findability.
  • 10:24 - 10:27
    It's no good if it's perfectly modeled,
    but no one can ever find it,
  • 10:27 - 10:30
    because it's so strict
    in terms of how it's defined at Wikidata.
  • 10:30 - 10:32
    And then, we have some challenges.
  • 10:32 - 10:35
    Multiple artifacts might be tied
    to one object ID,
  • 10:35 - 10:37
    which might be different in Wikidata.
  • 10:37 - 10:42
    And then, mapping the Met classification
    to instances has some complex cases.
  • 10:42 - 10:44
    So, the way that the Met classifies things
  • 10:44 - 10:47
    doesn't always fit
    with how Wikidata classifies things.
  • 10:47 - 10:50
    So, we show you some examples here
    of how this works.
  • 10:50 - 10:54
    So, this is a great example
    of using a Python library
  • 10:54 - 10:56
    to actually ingest
    what we know from the Met,
  • 10:56 - 10:58
    and then try to sort out what they have.
  • 10:58 - 11:00
    So, this is just for textiles.
  • 11:00 - 11:02
    You can see that they got
    a lot of detail here
  • 11:02 - 11:05
    in terms of woven textiles, laces,
    printed, trimmings, velvets.
  • 11:05 - 11:08
    We first looked into this in Wikidata.
  • 11:08 - 11:10
    We did not have
    this level of detail in Wikidata.
  • 11:10 - 11:12
    We still don't have all this resolved.
  • 11:12 - 11:15
    You can see that this
    is really complex here.
  • 11:15 - 11:18
    Anonymous is just not anonymous
    for a lot of databases.
  • 11:18 - 11:20
    There's a lot of qualifications--
  • 11:20 - 11:23
    whether the nationality, or the century.
  • 11:23 - 11:26
    So, trying to map all this to Wikidata
    can be complex, as well.
  • 11:26 - 11:30
    And then, this shows you
    that of all the works in the Met,
  • 11:30 - 11:34
    about 46% are open access right now.
  • 11:34 - 11:39
    So, we still have just over 50%
    that are not CC0 yet.
  • 11:40 - 11:43
    (man) All the objects in the Met,
    or all objects on display?
  • 11:43 - 11:46
    (Andrew) It's weird. It's not on display.
  • 11:46 - 11:48
    But it's not all objects either.
  • 11:48 - 11:52
    It's about 400 to 500 thousand objects
    in their database at this point.
  • 11:52 - 11:54
    So, somewhere in between.
  • 11:55 - 11:58
    So, starting points.
    This is always a hard one.
  • 11:58 - 12:04
    We just had this discussion
    on the Facebook group recently
  • 12:04 - 12:05
    about where do people go
  • 12:05 - 12:08
    to find out what the modeling
    should look like for a certain thing.
  • 12:08 - 12:09
    It's not easy.
  • 12:09 - 12:12
    So, normally, what we have to do
    is just point people to,
  • 12:12 - 12:15
    I don't know, some project
    that does it well now?
  • 12:15 - 12:17
    So, it's not a satisfying answer,
  • 12:17 - 12:20
    but we usually tell folks
    to start at things like visual arts,
  • 12:20 - 12:22
    or Sum of All Paintings
    does it pretty well,
  • 12:22 - 12:26
    or just go to the project chat to find out
    where some of these things are.
  • 12:26 - 12:27
    We need better solutions for this.
  • 12:27 - 12:31
    This is just a basic flow
    of what we're doing with the Met here.
  • 12:31 - 12:33
    We're basically taking
    their CSV, and their API,
  • 12:33 - 12:36
    and we're consuming it
    into a Python data frame.
  • 12:36 - 12:38
    We're taking the SPARQL code--
  • 12:38 - 12:40
    the one that you saw
    before, this super union--
  • 12:40 - 12:44
    bring that in, and we're doing
    a bi-directional diff,
  • 12:44 - 12:46
    and then seeing what new things
    have been added here,
  • 12:46 - 12:48
    what things have been subtracted there,
  • 12:48 - 12:52
    and we're actually making those changes
    either through QuickStatements,
  • 12:52 - 12:53
    or we're doing it through Pywikibot.
  • 12:53 - 12:56
    So, directly editing Wikidata.
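
A minimal sketch of that pipeline, assuming pandas, the Met's open-access CSV, and a CSV export of the union query above (file and column names here are illustrative, not the actual schema):

    # Diff the Met's dump against Wikidata, then emit QuickStatements (v1)
    # for whatever is missing on the Wikidata side.
    import pandas as pd

    met = pd.read_csv("MetObjects.csv", dtype=str)         # from the Met CSV/API
    wd = pd.read_csv("wikidata_met_items.csv", dtype=str)  # from the SPARQL union

    met_ids = set(met["Object ID"].dropna())
    wd_ids = set(wd["object_id"].dropna())

    added = met_ids - wd_ids      # in the Met database, not yet in Wikidata
    removed = wd_ids - met_ids    # in Wikidata, gone from the dump: review by hand

    for object_id in sorted(added):
        print("CREATE")
        print(f'LAST\tP3634\t"{object_id}"')  # attach the Met object ID
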
  • 12:56 - 12:59
    So, this is the big slide
    I also couldn't show at Wikimania,
  • 12:59 - 13:01
    because it would have flummoxed everyone.
  • 13:01 - 13:05
    So, this is a great example
    of how we start with the Met database,
  • 13:05 - 13:07
    we have this crosswalk database,
  • 13:07 - 13:09
    and then we generate
    the changes in Wikidata.
  • 13:09 - 13:13
    The way this works is this is an example
    of one record from the Met.
  • 13:13 - 13:16
    This is an evening dress-- we're working
    with the Costume Institute recently,
  • 13:16 - 13:18
    the one that puts on the Met Gala.
  • 13:18 - 13:20
    So, we have one evening dress
    here, by Valentina.
  • 13:20 - 13:22
    Here's a date, accession number.
  • 13:22 - 13:25
    So, these things can be put
    into Wikidata directly.
  • 13:25 - 13:28
    A field equals the date, accession number.
  • 13:28 - 13:29
    But what do we do with things like this?
  • 13:29 - 13:34
    This is an object name, which is basically
    like a classification of what it is,
  • 13:34 - 13:36
    like an instance of for the Met.
  • 13:36 - 13:37
    And the designer's Valentina.
  • 13:37 - 13:42
    So, what we do is we take these
    and we run all the unique object names
  • 13:42 - 13:44
    and all the unique designers
    through OpenRefine.
  • 13:44 - 13:47
    So, we get maybe 60% matches
    if we're lucky.
  • 13:47 - 13:48
    We put that into a spreadsheet.
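
OpenRefine does that matching interactively; a rough programmatic equivalent, assuming the public Wikidata reconciliation endpoint that OpenRefine itself uses, would be:

    # Ask the reconciliation service for best-guess Q IDs for two object names.
    import json
    import requests

    queries = {"q0": {"query": "Evening dress"}, "q1": {"query": "Ewer"}}
    r = requests.post(
        "https://wikidata.reconci.link/en/api",
        data={"queries": json.dumps(queries)},
        timeout=60,
    )
    for key, res in r.json().items():
        if res["result"]:
            top = res["result"][0]  # highest-scoring candidate
            print(queries[key]["query"], "->", top["id"], top["name"], top["score"])
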
  • 13:48 - 13:53
    Then we ask volunteers
    or the curators at the Met
  • 13:53 - 13:55
    to help fill in this crosswalk database.
  • 13:55 - 13:57
    This is just simply Google Sheets.
  • 13:57 - 14:00
    So, we say, here are all the object names,
    the unique object names
  • 14:00 - 14:03
    that match lexically exactly
    with what's in the Met database,
  • 14:03 - 14:06
    and then you say this maps to this Q ID.
  • 14:06 - 14:09
    So, we first started
    this maybe like only about--
  • 14:09 - 14:11
    well, 60% were failed,
    some of these were blank.
  • 14:11 - 14:14
    So, we tap folks in specific groups.
  • 14:14 - 14:17
    So there's like a Wiki Loves Fashion
    little chat group that we have.
  • 14:17 - 14:20
    And folks like user PKM
    were super useful in this area.
  • 14:20 - 14:23
    So she spent a lot of time
    looking through this, and saying,
  • 14:23 - 14:25
    "Okay, Evening suit is this,
    Ewer is that."
  • 14:25 - 14:28
    So, we looked through
    and made all these mappings here.
  • 14:28 - 14:31
    And then, what happens is now,
    when we see this in the Met database,
  • 14:31 - 14:33
    we look it up in the crosswalk database,
    and we say, "Oh, yeah.
  • 14:33 - 14:36
    These are the two Q numbers
    we need to put into Wikidata."
  • 14:36 - 14:39
    And then, it generates
    the QuickStatement right there.
  • 14:39 - 14:41
    Same thing here with Designer: Valentina.
  • 14:41 - 14:44
    If Valentina matches here,
    then it gets generated
  • 14:44 - 14:46
    with that QuickStatement right there.
  • 14:46 - 14:48
    If Valentina does not exist,
    then we'll create it.
  • 14:48 - 14:51
    You can see here, Weeks--
    look at that high Q ID right there.
  • 14:51 - 14:54
    We just created that recently,
    because there was no entry before.
  • 14:54 - 14:55
    Does that make sense to everyone?
  • 14:55 - 14:58
    - (man 2) What's the extra statement?
    - (Andrew) I'm sorry?
  • 14:58 - 15:01
    - (man 2) What's the extra statement?
    - (Andrew) Oh, the extra statement.
  • 15:01 - 15:03
    So, believe it or not, we have
    an Evening blouse, Evening dress,
  • 15:03 - 15:05
    Evening pants,
    Evening ensemble, Evening hat--
  • 15:05 - 15:09
    do we want to make a new Wikidata item
    for Evening pants, Evening everything?
  • 15:09 - 15:10
    So, we said, "No."
    We probably don't want to.
  • 15:10 - 15:14
    We'll just say, "It's a dress,
    but it's also evening wear",
  • 15:14 - 15:15
    which is what that is.
  • 15:15 - 15:17
    So, we're saying an instance
    of both things.
  • 15:18 - 15:21
    I'm not sure it's the perfect solution,
    but it's a solution at this point.
  • 15:22 - 15:23
    So, does everyone get that?
  • 15:23 - 15:26
    So, this is kind of a crosswalk database
    that we maintain here.
  • 15:26 - 15:28
    And the nice thing about it,
    it's just Google Sheets.
  • 15:28 - 15:29
    So, we can get people to help
  • 15:29 - 15:31
    who don't need to know
    anything about this database,
  • 15:31 - 15:34
    don't need to know about QuickStatements,
    don't need to know about queries.
  • 15:34 - 15:36
    They just go in and fill in the Q number.
  • 15:36 - 15:37
    Yeah.
  • 15:37 - 15:41
    (woman) So, when you copy
    object name and you find the Q ID,
  • 15:41 - 15:43
    the initial 60%
    that you mentioned as an example,
  • 15:43 - 15:45
    is that by exact match?
  • 15:46 - 15:48
    (Andrew) Well, it's through OpenRefine.
  • 15:48 - 15:52
    So, it does its best guess,
    and then we verify to make sure
  • 15:52 - 15:54
    that the OpenRefine match makes sense.
  • 15:54 - 15:56
    Yeah.
  • 15:56 - 15:58
    Does that make sense to everyone?
  • 15:58 - 16:00
    So, some folks might be doing
    some variation on this,
  • 16:00 - 16:03
    but I think the nice thing about this
    is that, by using Google Sheets,
  • 16:03 - 16:08
    we remove a lot of the complexities
    of these two areas from this.
  • 16:08 - 16:11
    And we'll show you some code
    that does this later on.
  • 16:12 - 16:15
    - (man 3) How do you generate [inaudible]?
    - (Andrew) How do you generate this?
  • 16:15 - 16:17
    - (man 3) Yes.
    - (Andrew) Python code.
  • 16:17 - 16:19
    I'll show you a line that does this.
  • 16:19 - 16:21
    But you can also go up here.
  • 16:21 - 16:25
    This is the whole Python program
    that does this, this, and that,
  • 16:25 - 16:27
    if you want to take a look at that.
  • 16:28 - 16:29
    Yes.
  • 16:29 - 16:31
    (man 4) Did you really use
    your own vocabulary,
  • 16:31 - 16:35
    or is there something [inaudible].
  • 16:35 - 16:37
    - (Andrew) This right here?
    - (man 4) Yeah.
  • 16:37 - 16:40
    (Andrew) Yeah. So, this
    is the Met's own vocabulary.
  • 16:40 - 16:43
    So, most museums use
    a system called TMS.
  • 16:43 - 16:45
    It's like their own management system.
  • 16:45 - 16:48
    So, they'll usually--
    this is the museum world--
  • 16:48 - 16:51
    they'll usually roll
    their own vocabulary for their own needs.
  • 16:51 - 16:54
    Museums are very late
    to interoperable metadata.
  • 16:54 - 16:57
    Librarians and archivists have this
    kind of baked into them.
  • 16:57 - 16:59
    Museums are like, "Meh..."
  • 16:59 - 17:01
    Our primary goal
    is to put objects on display,
  • 17:01 - 17:04
    and if it plays well with other people,
    that's a side benefit.
  • 17:04 - 17:06
    But it's not a primary thing that they do.
  • 17:06 - 17:08
    So, that's why it's complicated
    to work with museums.
  • 17:08 - 17:11
    You need to map their vocabulary,
    which might be a mish-mash
  • 17:11 - 17:15
    of famous vocabularies,
    like Getty AAT, and other things.
  • 17:15 - 17:18
    But usually, it's to serve
    their exact needs at their museum.
  • 17:18 - 17:20
    And that's what's challenging.
  • 17:20 - 17:21
    And I see a lot of heads nodding,
  • 17:21 - 17:23
    so you've probably seen this a lot
    at these museums.
  • 17:23 - 17:25
    So, I'll move on to show you
    how this actually is done.
  • 17:25 - 17:27
    Oh, go ahead.
  • 17:27 - 17:29
    (man 5) How do you
    bring people, to collaborate,
  • 17:29 - 17:32
    and put some Q codes into your database?
  • 17:32 - 17:33
    (Andrew) How do you-- I'm sorry?
  • 17:33 - 17:35
    (man 5) How do you bring...
    collaborate people?
  • 17:35 - 17:38
    (Andrew) Ah, so for this,
    these are projects we just go to,
  • 17:39 - 17:42
    for better or for worse,
    like Facebook chat groups that we know,
  • 17:42 - 17:43
    are active in these areas.
  • 17:43 - 17:46
    Like Sum of All Paintings,
    Wiki Loves Fashion--
  • 17:46 - 17:48
    which is a group
    of maybe five or seven folks.
  • 17:49 - 17:51
    But we need a better way
    to get this out to folks
  • 17:51 - 17:52
    so we get more collaborators on this.
  • 17:52 - 17:54
    This doesn't scale well, right now.
  • 17:54 - 17:56
    But for small groups,
    it works pretty well.
  • 17:56 - 17:58
    I'm open to ideas.
  • 17:58 - 18:00
    (man 5) [inaudible]
  • 18:00 - 18:02
    (Andrew) Oh yeah. Please come on up.
  • 18:02 - 18:03
    If folks want to come up here,
  • 18:03 - 18:05
    there's a little more room
    in the aisle right here.
  • 18:06 - 18:10
    So, we are utilizing Python
    for this mostly.
  • 18:10 - 18:13
    If you don't know, there is
    a Python notebook system
  • 18:13 - 18:15
    that WMF Labs has (PAWS).
  • 18:15 - 18:17
    So, you can actually go on
    and start playing with this.
  • 18:17 - 18:20
    So, it's pretty easy
    to generate a lot of stuff
  • 18:20 - 18:21
    if you know some of the code that's there.
  • 18:21 - 18:22
    [inaudible], yeah.
  • 18:22 - 18:24
    (woman 2) Why do you put everything
  • 18:24 - 18:28
    into Wikidata,
    and not into your own Wikibase?
  • 18:29 - 18:31
    (Andrew) If you're using
    your own Wikibase?
  • 18:31 - 18:34
    (woman 2) Yeah. Why don't you
    use your own Wikibase?
  • 18:34 - 18:36
    and then go to [inaudible]
  • 18:36 - 18:38
    (Andrew) That's its own ball of--
  • 18:38 - 18:42
    I don't want to maintain
    my own Wikibase at this point. (laughs)
  • 18:42 - 18:44
    If I can avoid doing
    the Wikibase maintenance,
  • 18:44 - 18:46
    I would not do it.
  • 18:47 - 18:48
    (man 6) Would you like a Wikibase?
  • 18:48 - 18:50
    (Andrew) We could. It's possible.
  • 18:50 - 18:54
    (man 7) But again,
    what they use [inaudible]
  • 18:54 - 19:00
    about 2,000, 8,000, 10,000,
    of 400,000 digital [inaudible].
  • 19:00 - 19:04
    So that's only 2.5%,
  • 19:04 - 19:09
    [inaudible]
  • 19:09 - 19:13
    (Andrew) So, I'd say, solve it for 1,500,
    then scale up to 150 thousand.
  • 19:13 - 19:14
    So, we're trying to solve it
  • 19:14 - 19:17
    for the best
    well-known objects, and then--
  • 19:17 - 19:20
    (man 7) When do you think
    that will happen?
  • 19:21 - 19:26
    I understand that those are pieces
    that shouldn't go onto Wikidata.
  • 19:26 - 19:30
    So you go to Commons
    or your own Wikibase solution,
  • 19:30 - 19:32
    not to be a [inaudible]--
  • 19:32 - 19:35
    (Andrew) Right. That's why we're going
    with the 2,000 and 8,000.
  • 19:35 - 19:37
    We're pretty confident
    these are highly notable objects
  • 19:37 - 19:39
    that deserve to be in Wikidata.
  • 19:39 - 19:40
    Beyond that, it's debatable.
  • 19:40 - 19:44
    So, that's why we're not
    vacuuming 400 thousand things in one shot.
  • 19:44 - 19:49
    We're starting with notable 2,000,
    notable 8,000, then we'll talk after that.
  • 19:50 - 19:53
    So, these are the two lines of code
    that do the most stuff here.
  • 19:53 - 19:54
    So, even if you don't know Python,
  • 19:54 - 19:56
    it's actually not that bad
    if you look at this.
  • 19:56 - 19:58
    There's a read_csv function.
  • 19:58 - 20:00
    You're taking the crosswalk URL,
  • 20:00 - 20:02
    basically, the URL
    of that Google Spreadsheet.
  • 20:02 - 20:05
    You're grabbing the spreadsheet
    that's called "Object Name",
  • 20:05 - 20:07
    and you're basically creating
    a data structure
  • 20:07 - 20:08
    that has the Object Name and the QID.
  • 20:08 - 20:10
    That's it. That's all you're doing.
  • 20:10 - 20:12
    Just pulling that into the Python code.
  • 20:12 - 20:16
    Then, you're actually matching
    whatever the entity's name is,
  • 20:16 - 20:18
    and then looking up the QID.
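
A hedged reconstruction of those two lines (the sheet name "Object Name" is from the talk; the URL, column names, and example value are assumptions):

    # pandas can read a Google Sheet straight from its CSV-export URL.
    import pandas as pd

    crosswalk_url = (
        "https://docs.google.com/spreadsheets/d/<sheet-id>"   # <sheet-id> is a placeholder
        "/gviz/tq?tqx=out:csv&sheet=Object+Name"
    )

    # Line 1: read the "Object Name" sheet into an {object name -> Q ID} dict.
    crosswalk = pd.read_csv(crosswalk_url).set_index("Object Name")["QID"].to_dict()

    # Line 2: match the entity's name and look up its Q ID.
    qid = crosswalk.get("Evening dress")
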
  • 20:18 - 20:22
    Okay, so, this is just to tell you
    that's not super hard.
  • 20:22 - 20:24
    The code is available right there,
    if you want to look at it.
  • 20:24 - 20:26
    But these two lines of code,
    which take a little while
  • 20:26 - 20:30
    to write when you're starting
    from scratch--
  • 20:30 - 20:31
    but once you have an example,
  • 20:31 - 20:34
    it's pretty darn easy to plug in
    your own data set, your own crosswalk,
  • 20:34 - 20:37
    to generate the QuickStatements.
  • 20:37 - 20:39
    So, I've done a lot of the work already,
  • 20:39 - 20:41
    and I invite you
    to steal the code and try it.
  • 20:42 - 20:45
    So, when it comes to images,
    it's a little more challenging.
  • 20:45 - 20:48
    So, at this point, Pattypan
    is probably your best bet.
  • 20:48 - 20:51
    Pattypan is
    a spreadsheet-oriented tool.
  • 20:51 - 20:55
    You fill in the metadata, you point
    to the local file on your computer,
  • 20:55 - 20:57
    and it uploads it to Commons
    with all that information,
  • 20:57 - 21:02
    or another alternative
    is if you set P4765 to a URL--
  • 21:03 - 21:06
    because this is the Commons-compatible
    image available at URL,
  • 21:06 - 21:09
    Maarten Dammers has a bot,
    at least for paintings,
  • 21:09 - 21:12
    that will just swoop through and say,
    "Oh, we don't have this image.
  • 21:12 - 21:15
    Here's a Commons compatible one.
  • 21:15 - 21:18
    Why don't I pull it from that site
    and put it into Commons?"
  • 21:18 - 21:19
    And that's what his bot does.
  • 21:19 - 21:21
    So, you can actually take
    a look at his bot
  • 21:21 - 21:24
    and modify it for your own purposes,
    but that is also another alternative
  • 21:24 - 21:28
    that doesn't require you
    to do some spreadsheet work there.
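
In QuickStatements terms, that alternative is a single URL-valued claim per artwork; the item and image URL below are hypothetical:

    # One QuickStatements (v1) line setting P4765, which the bot then acts on.
    item = "Q123456"                                       # hypothetical item
    url = "https://images.example.org/painting-123.jpg"    # hypothetical CC0 image
    print(f'{item}\tP4765\t"{url}"')
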
  • 21:28 - 21:30
    You might have heard
    of GLAM Wiki Toolset--
  • 21:30 - 21:33
    it's effectively end
    of life at this point.
  • 21:33 - 21:37
    It hasn't been updated, and even the folks
    who have been working with it in the past
  • 21:37 - 21:39
    have said Pattypan
    is probably your best bet.
  • 21:39 - 21:42
    Has anyone used GWT these days?
  • 21:42 - 21:44
    A few of you, a little bit.
  • 21:44 - 21:45
    It's just not being further developed,
  • 21:45 - 21:48
    and it's not compatible with a lot
    of our authentication protocols
  • 21:48 - 21:49
    that we have now.
  • 21:49 - 21:53
    Okay. So, right now, we have basic
    metadata added to Wikidata,
  • 21:53 - 21:55
    with pretty good results from the Met,
  • 21:55 - 21:58
    and we have a Python script here
    to also analyze that.
  • 21:58 - 22:00
    You're welcome to steal
    some of that code, as well.
  • 22:00 - 22:03
    So, this is what we are showing
    to the Met folks, now.
  • 22:03 - 22:06
    We actually have Listeria lists
    that are running
  • 22:06 - 22:08
    to show all the inventory
  • 22:08 - 22:11
    and all the information
    that we have in Wikidata.
  • 22:11 - 22:16
    And I'll show you very quickly
    about a project that we ran to show folks.
  • 22:16 - 22:19
    So, what are the benefits of adding
    your collections to Wikidata?
  • 22:19 - 22:22
    One is to use AI image classification--
  • 22:22 - 22:25
    to actually help train
    a machine learning model
  • 22:25 - 22:29
    with all the Met's images and keywords,
    and let that be an engine for other folks
  • 22:29 - 22:32
    to recognize content.
  • 22:32 - 22:36
    So, this is a hackathon that we had
    with MIT and Microsoft last year.
  • 22:36 - 22:39
    The way this works, is we have
    the paintings from the Met,
  • 22:39 - 22:40
    and we have the keywords
  • 22:40 - 22:43
    that they actually paid a crew
    for six months to work on
  • 22:43 - 22:47
    to add hand keyword tags
    to all the artworks.
  • 22:48 - 22:50
    We ingested that
    into an AI system right here,
  • 22:50 - 22:51
    and then, what we did was say,
  • 22:51 - 22:55
    "Let's feed in new images that
    this AI ML system had never seen before,
  • 22:55 - 22:57
    and see what comes out."
  • 22:57 - 23:00
    And the problem is that it comes out
    with pretty good results,
  • 23:00 - 23:02
    but it's maybe only 60% accurate.
  • 23:02 - 23:05
    And for most folks,
    60% accurate is garbage.
  • 23:05 - 23:09
    How do I get the 60% good
    out of this pile of stuff?
  • 23:09 - 23:11
    The good news is that our community
    knows how to do that.
  • 23:11 - 23:13
    We can actually feed this
    into a Wikidata game
  • 23:13 - 23:15
    and get the good stuff out of that.
  • 23:15 - 23:16
    That's basically what we did.
  • 23:16 - 23:18
    So, this is the Wikidata game--
  • 23:18 - 23:20
    you'll notice this is
    Magnus' interface right there--
  • 23:20 - 23:21
    being played at the Met Museum,
  • 23:21 - 23:22
    in the lobby.
  • 23:22 - 23:25
    We actually had folks at a cocktail party
    drinking champagne
  • 23:25 - 23:27
    and hitting buttons on the screen.
  • 23:27 - 23:31
    Hopefully, accurately. (chuckles)
  • 23:31 - 23:33
    (applause)
  • 23:33 - 23:35
    We had journalists, curators,
  • 23:35 - 23:38
    we had some board members
    from the Met there as well.
  • 23:38 - 23:39
    And this was great.
  • 23:39 - 23:40
    No log in, whatever.
  • 23:40 - 23:42
    (lowers voice) We created
    an account just for this.
  • 23:42 - 23:44
    So, they just hit yes-no-yes-no.
  • 23:44 - 23:45
    This is great.
  • 23:45 - 23:48
    You saw this, it said,
    "Is there a tree in this picture?"
  • 23:48 - 23:49
    You don't have to train anyone on this.
  • 23:49 - 23:52
    You just hit yes--
    depicts a tree, not depicted.
  • 23:52 - 23:56
    I even had my eight-year-old boys
    play this game with a finger tap.
  • 23:57 - 24:00
    And we also created a little tool
    that showed all the depictions going by
  • 24:00 - 24:02
    so people could see them.
  • 24:03 - 24:06
    It basically is like--
    how do you sift good from bad?
  • 24:06 - 24:08
    This is where the Wikimedia
    community comes in,
  • 24:08 - 24:11
    that no other entity could ever do.
  • 24:12 - 24:15
    So, in that first few months
    that we had this,
  • 24:15 - 24:19
    over 7,000 judgments,
    resulting in about 5,000 edits.
  • 24:20 - 24:22
    We did really well on tree,
    boat, flower, horse,
  • 24:22 - 24:25
    things that are in landscape paintings.
  • 24:25 - 24:27
    But when you go to things
    like gender discrimination,
  • 24:27 - 24:30
    and cats and dogs, not so good, I know.
  • 24:30 - 24:32
    Because there's so many different
    types of cats and dogs
  • 24:32 - 24:33
    in different positions.
  • 24:33 - 24:36
    But horses, a lot easier
    than cats and dogs.
  • 24:37 - 24:39
    But also, I should note
    that Wikimedia Foundation
  • 24:39 - 24:43
    is now looking into doing
    image recognition on Commons uploads
  • 24:43 - 24:46
    to do these suggestions as well,
    which is an awesome development.
  • 24:47 - 24:50
    Okay, so, dashboards.
  • 24:51 - 24:53
    Let's just show you
    some of these dashboards.
  • 24:53 - 24:55
    Folks you work with love dashboards.
  • 24:55 - 24:57
    They just want to see stats.
  • 24:57 - 24:59
    So, we have them, like BaGLAMa.
  • 24:59 - 25:01
    We have InteGraality.
  • 25:01 - 25:03
    Is Jean-Fred here?
  • 25:03 - 25:06
    I think this is a very new thing
    relative to last WikidataCon.
  • 25:06 - 25:08
    We actually have a tool
    which will create
  • 25:08 - 25:11
    this property completeness
    chart right here.
  • 25:11 - 25:13
    So, it's called InteGraality,
    with two A's.
  • 25:13 - 25:16
    It's on that big chart
    that I showed you before.
  • 25:16 - 25:19
    And it can just autogenerate
    how complete your items are
  • 25:19 - 25:21
    in any set, which is really cool.
  • 25:22 - 25:24
    So, we can see that paintings
    are by far the highest,
  • 25:24 - 25:26
    we have sculptures, drawings, photographs.
  • 25:26 - 25:29
    And then, they also like to see
    what are the most popular artworks
  • 25:29 - 25:31
    in the Wikisphere?
  • 25:31 - 25:33
    So, just looking at the site links
    in Wikidata--
  • 25:33 - 25:38
    you can see and rank
    all these different artworks there.
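
A sketch of that ranking as a query (Q160236 for the Met is an assumption; wikibase:sitelinks is the per-item count of linked wiki pages):

    # Rank Met items by how many Wikimedia pages link to them.
    RANKING_QUERY = """
    SELECT ?item ?itemLabel ?links WHERE {
      ?item wdt:P195 wd:Q160236 ;       # collection: the Met
            wikibase:sitelinks ?links .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    ORDER BY DESC(?links)
    LIMIT 50
    """
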
  • 25:40 - 25:42
    Also another thing they'd like to see
  • 25:42 - 25:47
    is what are the most frequent creators
    of content or Met artworks--
  • 25:47 - 25:49
    what are the most commonly
    depicted things.
  • 25:49 - 25:52
    So, these are very easy
    to generate in SPARQL,
  • 25:52 - 25:55
    you could look at it right there,
    using bubble graphs.
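
And the bubble chart is just an aggregation with a view header, something like this (again assuming Q160236 for the Met):

    # Count depicts (P180) values across Met works; #defaultView:BubbleChart
    # sizes each bubble by the count.
    DEPICTS_QUERY = """#defaultView:BubbleChart
    SELECT ?thing ?thingLabel (COUNT(?work) AS ?count) WHERE {
      ?work wdt:P195 wd:Q160236 ;   # collection: the Met
            wdt:P180 ?thing .       # depicts
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    GROUP BY ?thing ?thingLabel
    ORDER BY DESC(?count)
    """
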
  • 25:55 - 25:57
    Then place of birth
    of the most prominent artists,
  • 25:57 - 25:59
    we have a chart there, as well.
  • 25:59 - 26:01
    So, structured data on Commons.
  • 26:01 - 26:04
    I just want to show you very briefly
    in case you can't get to Sandra's session,
  • 26:04 - 26:06
    but you definitely should go
    to Sandra's session.
  • 26:06 - 26:11
    You actually can search in Commons
    for a specific Wikibase statement.
  • 26:11 - 26:15
    I don't always remember the syntax,
    but you have to burn it into your brain
  • 26:15 - 26:20
    and say, it's haswbstatement:P1343=
  • 26:20 - 26:23
    whatever-- basically, your last
    two parts of the triple.
  • 26:23 - 26:26
    I always get haswb and wbhas mixed up.
  • 26:26 - 26:28
    I always get the colon
    and the equals mixed up.
  • 26:28 - 26:32
    So just do it once, remember it,
    and you'll get the hang of it.
  • 26:32 - 26:35
    But simple searches are much faster
    than SPARQL queries.
  • 26:35 - 26:36
    So, if you can just look
    for one statement,
  • 26:36 - 26:38
    boom, you'll get the results.
  • 26:39 - 26:44
    So, things like this, you can look
    for symbolically or semantically,
  • 26:44 - 26:48
    things that depict
    the Met museum, for example.
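
A sketch of that kind of search through the Commons API (P180, "depicts", and the Met's Q160236 are assumptions for the example; the srsearch string is exactly what you'd type into the Commons search box):

    # Find files whose structured data says depicts (P180) = the Met.
    import requests

    r = requests.get(
        "https://commons.wikimedia.org/w/api.php",
        params={
            "action": "query",
            "list": "search",
            "srsearch": "haswbstatement:P180=Q160236",
            "srnamespace": 6,     # File: namespace
            "format": "json",
        },
        timeout=60,
    )
    for hit in r.json()["query"]["search"]:
        print(hit["title"])
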
  • 26:48 - 26:50
    So, finally, community campaigns.
  • 26:50 - 26:52
    Richard has been a pioneer in this area.
  • 26:52 - 26:54
    So, once you have the Wikidata items,
  • 26:54 - 26:57
    they can actually assist
    in creating Wikipedia articles.
  • 26:57 - 27:00
    So, Richard, why don't you tell us
    a little bit about the Mbabel tool
  • 27:00 - 27:01
    that you created for this.
  • 27:01 - 27:03
    (Richard) Hi, can I get this on?
  • 27:05 - 27:06
    (Andrew) Oh, use [Joisey's].
  • 27:06 - 27:08
    (Richard) It's on, now. I'm good.
  • 27:09 - 27:11
    So, we had all this information
    on Wikidata.
  • 27:11 - 27:14
    [inaudible] browsing data
    on our evenings and weekends
  • 27:14 - 27:16
    to learn about art-- not everyone does.
  • 27:16 - 27:19
    We have quite a bit more people
    [inaudible] Wikipedia,
  • 27:19 - 27:22
    so how do we get this information
    from Wikidata to Wikipedia?
  • 27:22 - 27:25
    One of the ways of doing this
    is this so-called Mbabel,
  • 27:25 - 27:28
    which was developed with the help
    of a lot of people in [inaudible].
  • 27:28 - 27:31
    People like Martin and others.
  • 27:32 - 27:35
    So, basically, it takes
    some basic art information,
  • 27:35 - 27:38
    and uses it to populate
    a Wikipedia article.
  • 27:38 - 27:40
    So: who created this work,
    who was the artist,
  • 27:40 - 27:42
    when it was created, et cetera.
  • 27:42 - 27:45
    The nice thing about this
    is it can generate articles.
  • 27:45 - 27:46
    We started with English Wikipedia,
  • 27:46 - 27:49
    but it's been developed
    in other languages.
  • 27:49 - 27:51
    So, Portuguese Wikipedia,
    our Brazilian friends
  • 27:51 - 27:54
    who've done a lot of work in taking it
    to realms beyond art,
  • 27:54 - 27:57
    to stuff like elections
    and political work as well.
  • 27:57 - 28:01
    And the nice thing about this
    is we can query on Wikidata--
  • 28:02 - 28:07
    so different artists-- so for example,
    we've done projects with Women in Red,
  • 28:07 - 28:08
    looking at women artists.
  • 28:08 - 28:13
    Projects related to Wiki Loves Pride,
    looking at LGBT-identified artists,
  • 28:13 - 28:14
    African Diaspora Artists,
  • 28:14 - 28:16
    and a lot of different groups
    and things of time periods,
  • 28:16 - 28:19
    different collections,
    and also looking at articles
  • 28:19 - 28:22
    that have been and haven't been
    translated to different languages.
  • 28:22 - 28:25
    So all of the articles that haven't
    been translated to Arabic yet.
  • 28:25 - 28:28
    You need to find some interesting articles
    maybe that are relevant to a culture
  • 28:28 - 28:30
    that haven't been translated
    into that language yet.
  • 28:30 - 28:33
    We actually have a number of works
    in the Met collection
  • 28:33 - 28:35
    that are in Wikipedias
    that aren't in English yet,
  • 28:35 - 28:37
    because it's a global collection.
  • 28:38 - 28:40
    So, there are a lot of ways,
    and hopefully, we can spread it around
  • 28:40 - 28:45
    of creating Wikipedia content, as well,
    that is driven by these Wikidata items,
  • 28:45 - 28:48
    and that also maybe
    can help spread the improvement
  • 28:48 - 28:50
    to Wikidata items, as well, in the future.
  • 28:50 - 28:52
    (Andrew) And there's a number of folks
    here using Mbabel already, right?
  • 28:52 - 28:54
    Who's using Mbabel
    in the room? Brazilians?
  • 28:54 - 28:59
    And also, if [Armin] is here,
    we have our winner
  • 28:59 - 29:03
    of the Wikipedia Asian Month,
    and Wiki Loves Pride contest.
  • 29:03 - 29:06
    So, thank you for joining,
    and congratulations.
  • 29:06 - 29:10
    We'll have another Wikipedia Asian Month
    campaign in November.
  • 29:10 - 29:13
    The way I like to describe it
    [inaudible]
  • 29:13 - 29:15
    It doesn't give you a blank page.
  • 29:15 - 29:17
    It gives you the skeleton,
  • 29:17 - 29:19
    which is really a much better
    user experience
  • 29:19 - 29:21
    for edit-a-thons and beginners.
  • 29:21 - 29:24
    So, it's a lot of great work
    that Richard has done,
  • 29:24 - 29:26
    and people are building on it,
    which is awesome.
  • 29:26 - 29:29
    (woman 3) [inaudible] for some of them,
    which is really nice.
  • 29:29 - 29:30
    Yeah, exactly.
  • 29:30 - 29:33
    (woman 3) [inaudible]
  • 29:33 - 29:36
    Right. We should have put a URL here.
  • 29:36 - 29:38
    (man 8) [inaudible]
  • 29:38 - 29:40
    Oh, that's right.
    We have the link right here.
  • 29:40 - 29:44
    So if you click-- this is a Listeria list,
    it's autogenerating all that for you.
  • 29:44 - 29:46
    And then, you click on the red link,
    it'll create the skeleton,
  • 29:46 - 29:47
    which is pretty cool.
  • 29:47 - 29:49
    Alright, we're on the final stretch here.
  • 29:49 - 29:52
    The tool that we're going
    to be announcing--
  • 29:52 - 29:55
    well, we announced a few weeks ago,
    but only to a small set of folks,
  • 29:55 - 29:57
    but we're making a big splash here,
  • 29:57 - 29:59
    is the depiction tool
    that we just created.
  • 29:59 - 30:05
    Wikipedia has shown that volunteer
    contributors can add a lot of these things
  • 30:05 - 30:07
    that museums can't.
  • 30:07 - 30:10
    So, what if we created a tool
    that could let you enrich
  • 30:10 - 30:16
    the metadata about artworks
    in terms of the depiction information?
  • 30:16 - 30:19
    And what we did was we applied
    for a grant from the Knight Foundation,
  • 30:19 - 30:23
    and we created this tool--
    and is Edward here?
  • 30:23 - 30:27
    Edward is our wonderful developer
    who in like a month, said,
  • 30:27 - 30:28
    "Okay, here's a prototype."
  • 30:28 - 30:33
    After we gave him a specification,
    and it's pretty cool.
  • 30:34 - 30:36
    - So what we can do--
    - (applause)
  • 30:36 - 30:37
    Thanks, Edward.
  • 30:38 - 30:39
    We're working within collections of items.
  • 30:39 - 30:42
    So, what we do, is we can
    bring up a page like this.
  • 30:42 - 30:45
    It's no longer looking
    at a Wikidata item with a tiny picture.
  • 30:45 - 30:48
    If we're working with what's depicted
    in the image, we want the picture big.
  • 30:48 - 30:51
    And we don't really have tools
    that work with big images.
  • 30:51 - 30:53
    We have tools that deal
    with lexical and typing.
  • 30:53 - 30:57
    So one of the big things that Edward did
    was made a big version of the picture,
  • 30:57 - 30:59
    scrape whatever you can
    from the object page
  • 30:59 - 31:01
    from a GLAM organization,
    give you context.
  • 31:01 - 31:03
    I can see dogs, children, wigwam.
  • 31:03 - 31:06
    These are things that direct the user
    to add meaningful information.
  • 31:06 - 31:09
    You have some metadata
    that's scraped from the site, too.
  • 31:09 - 31:12
    Teepee, Comanche--
    oh, it's Comanche, not Navajo,
  • 31:12 - 31:14
    because I know the object page said that.
  • 31:14 - 31:16
    And you can actually start typing
    in the field, there.
  • 31:16 - 31:18
    And the cool thing is that
    it gives you context.
  • 31:18 - 31:20
    It doesn't just match anything
    to Wikidata,
  • 31:20 - 31:23
    it first matches things that have already
    been used in other depiction statements.
  • 31:23 - 31:25
    Very simple thing,
    but what a godsend it is
  • 31:25 - 31:27
    for folks who have tried this in the past.
  • 31:27 - 31:29
    Don't give me everything
    that matches teepee.
  • 31:29 - 31:33
    Show me what other paintings
    have used teepee in the past.
  • 31:33 - 31:36
    So, it's interactive, context-driven,
    statistics-driven,
  • 31:36 - 31:38
    by showing you what was matched before.
  • 31:38 - 31:40
    And the cool thing is once you're done
    with that painting,
  • 31:40 - 31:42
    you can start to work in other areas.
  • 31:42 - 31:45
    You want to work within the same artist,
    the collection, location,
  • 31:46 - 31:47
    other criteria here.
  • 31:47 - 31:49
    And you can even browse
    through the collections
  • 31:49 - 31:52
    of different organizations,
    just work on their paintings.
  • 31:52 - 31:54
    So, we wanted people
    to not live in Wikidata--
  • 31:54 - 31:56
    kind of onesy-twosies with items,
    but live in a space
  • 31:56 - 31:59
    where you're looking at artworks
    in collections that make sense.
  • 32:00 - 32:02
    And then, you can actually
    look through it visually.
  • 32:02 - 32:04
    It kind of looks like Crotos
    or these other tools,
  • 32:04 - 32:08
    but you can actually live edit
    on Wikidata at the same time.
  • 32:08 - 32:09
    So, go ahead and try it out.
  • 32:09 - 32:11
    We've only got 14 users,
  • 32:11 - 32:15
    but we've had 2,100 paintings worked on,
    with 5,000 plus depict statements.
  • 32:15 - 32:16
    That's pretty good for 14.
  • 32:16 - 32:18
    So, multiply that by 10--
  • 32:18 - 32:21
    imagine how many more things
    we could do with that.
  • 32:21 - 32:24
    So, you can go ahead and go
    to art.wikidata.link and try out the tool.
  • 32:24 - 32:27
    It uses OAuth authentication,
    and you're off to the races.
  • 32:27 - 32:29
    And it should be very natural
    without any kind of training
  • 32:29 - 32:32
    to add depiction statements to artworks.
  • 32:32 - 32:35
    But you can put any object.
    We don't restrict the object right now.
  • 32:35 - 32:37
    So, you could put any Q number
  • 32:38 - 32:41
    to edit this content if you want.
  • 32:41 - 32:45
    But we primarily stick with paintings
    and 2D artworks, right now.
  • 32:46 - 32:49
    Okay. You can actually look
    at the recent changes
  • 32:49 - 32:52
    and see who's made edits recently to that.
  • 32:53 - 32:55
    Okay? Okay, so we're going
    to wind it down.
  • 32:55 - 32:58
    Ooh, one minute, then we'll do some Q&A.
  • 32:59 - 33:03
    So, the final thing that I think
    is useful for museum types especially,
  • 33:03 - 33:07
    is there's a very famous author
    named Nina Simon in the museum world,
  • 33:07 - 33:11
    where she likes to talk about
    how do we go from users,
  • 33:11 - 33:15
    or I guess your audience,
    contributing stuff to your collections
  • 33:15 - 33:18
    to collaborating around content,
    to actually being co-creative
  • 33:18 - 33:20
    and creating new things.
  • 33:20 - 33:21
    And that's always been tough.
  • 33:21 - 33:24
    And I'd like to argue that Wikidata
    is this co-creative level.
  • 33:24 - 33:27
    So, it's not just uploading
    a file to Commons,
  • 33:27 - 33:28
    which is contributing something.
  • 33:28 - 33:31
    It's not just editing an article
    with someone else, which is collaborative.
  • 33:31 - 33:35
    But we are now seeing these tools
    that let you make timelines,
  • 33:35 - 33:36
    and graphs, and bubble charts.
  • 33:36 - 33:39
    And this is actually the co-creative part
    that's really interesting.
  • 33:39 - 33:40
    And that's what Wikidata provides you.
  • 33:40 - 33:42
    Because suddenly,
    it's not language dependent--
  • 33:42 - 33:45
    we've got this database
    that's got this rich information in it.
  • 33:46 - 33:49
    So, it's not just pictures, not just text,
  • 33:49 - 33:51
    but it's all this rich multimedia
  • 33:51 - 33:53
    that we have the opportunity to work on.
  • 33:53 - 33:56
    So, this is just another example
    of this connected graph
  • 33:56 - 33:57
    that you can take a look at later on
  • 33:57 - 34:00
    to show another example
    of The Death of Socrates,
  • 34:00 - 34:02
    and the different themes
    around that painting.
  • 34:03 - 34:06
    And it's really easy
    to make this graph yourself.
  • 34:06 - 34:08
    So again, another scary graphic
    that only makes sense
  • 34:08 - 34:10
    for Wikidata folks, like you.
  • 34:10 - 34:14
    You just give it a list of Wikidata items,
    and it'll do the rest, that's it.
  • 34:14 - 34:16
    You'll give the list.
  • 34:16 - 34:18
    Keep all this code the same.
  • 34:18 - 34:21
    So, fortunately, Martin and Lucas
    helped do all this code here.
  • 34:21 - 34:24
    Just give it a list of items
    and the magic will happen.
  • 34:24 - 34:26
    Hopefully, it won't blow up your computer,
  • 34:26 - 34:29
    because you're putting in
    a reasonable number of items there.
  • 34:29 - 34:32
    But as long as you have the screen space,
    it'll draw the graph,
  • 34:32 - 34:33
    which is pretty darn cool.
  • 34:33 - 34:37
    And then, finally, two tools--
    I realized at 2 a.m. last night
  • 34:37 - 34:40
    a few people said,
    "I didn't know about these tools."
  • 34:40 - 34:41
    And you should know about these tools.
  • 34:41 - 34:45
    So, one is Recoin, which shows you
    the relative completeness of an item
  • 34:45 - 34:47
    compared to other items
    of the same instance.
  • 34:47 - 34:49
    And then, Cradle, which is a way
    to have a forms-based way
  • 34:49 - 34:51
    to create content.
  • 34:51 - 34:52
    So, these are very useful for edit-a-thons
  • 34:52 - 34:55
    where if you know that
    you're working with just artworks,
  • 34:55 - 34:58
    don't just let people create items
    with a blank screen.
  • 34:58 - 35:00
    Give them a form to fill out
    to start entering in information
  • 35:00 - 35:02
    that's structured.
  • 35:02 - 35:05
    And then, finally, we've gone
    through some of this, already.
  • 35:06 - 35:10
    This is my big chart that I love
    to get people's feedback on.
  • 35:10 - 35:14
    How do we get people
    across the chasm to be in this space?
  • 35:14 - 35:17
    We have a lot of folks who, now,
    can do template coding,
  • 35:17 - 35:20
    spreadsheets, QuickStatements,
    SPARQL queries, and then we got--
  • 35:21 - 35:24
    how do we get people to this side
    where we have Python
  • 35:24 - 35:27
and the tools that can do more
sophisticated editing.
  • 35:27 - 35:29
    It's really hard
    to get people across this.
  • 35:29 - 35:31
But I would say that while
it's hard to get people across,
  • 35:31 - 35:33
the content and the technology
are not that hard.
  • 35:33 - 35:35
    We actually need more people
    to learn about regular expressions.
  • 35:35 - 35:38
    And once you get some kind
    of experience here,
  • 35:38 - 35:42
    you'll find that this is a wonderful world
    that you can learn a lot in,
  • 35:42 - 35:45
    but it does take some time
    to get across this chasm.
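Since regular expressions are named here as the gateway skill, a tiny example of the kind of cleanup they enable in collection data. The date strings below are invented for illustration:

```python
import re

# Free-text date fields like these are common in collection exports.
dates = ["ca. 1787", "1801-1805", "late 19th century", "1920 (?)"]

# Match a plausible four-digit year between 1000 and 2099.
year_pattern = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

for d in dates:
    m = year_pattern.search(d)
    print(f"{d!r} -> {m.group(1) if m else 'no year found'}")
```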
  • 35:45 - 35:46
    Yes, James.
  • 35:46 - 35:52
    (James) [inaudible]
  • 35:53 - 35:57
    No, what it means is that the graph
    is not necessarily accurate
  • 35:57 - 35:59
    in terms of its data points.
  • 35:59 - 36:03
    But what it means-- I guess
    it's more like this is a valley.
  • 36:04 - 36:07
    It's like we need to get people
    across this valley here.
  • 36:07 - 36:10
    (woman 4) [inaudible]
  • 36:10 - 36:12
    I would say this is the key.
  • 36:12 - 36:16
If we can get people who know this stuff
to also grok this stuff,
  • 36:16 - 36:18
    it gets them to this stuff.
  • 36:18 - 36:20
    Does that make sense? Yeah.
  • 36:20 - 36:24
So, my vision for the next few years
is that we can get better training
  • 36:24 - 36:28
    in our community to get people
    from batch processing,
  • 36:28 - 36:30
    which is pretty much what this is,
    to kind of intelligent--
  • 36:30 - 36:33
    I wouldn't say intelligent,
    but more sophisticated programming,
  • 36:33 - 36:35
    that would be a great thing,
    because we're seeing this is a bottleneck
  • 36:35 - 36:38
    to a lot of the stuff
    that I just showed you up there.
  • 36:38 - 36:39
    Yes.
  • 36:39 - 36:42
    (man 9) [inaudible]
  • 36:42 - 36:46
    Okay, wait, you want to show me something,
    show me after the session, does that work?
  • 36:46 - 36:48
    Okay. Yes, Megan.
  • 36:48 - 36:51
    - (Megan) Can I have a microphone?
    - Microphone, yes.
  • 36:51 - 36:55
    - (Megan) [inaudible]
    - Yeah.
  • 36:55 - 36:57
    And we have lunch after this,
  • 36:57 - 36:59
    so if you want to stay
    a little bit later, that's fine, too.
  • 36:59 - 37:01
    - [inaudible]
    - We're already at lunch break? Okay.
  • 37:01 - 37:03
    (Megan) So, thank you so much
    to both you and Richard
  • 37:03 - 37:05
    for all the work you're doing at the Met.
  • 37:05 - 37:07
    And I know that you're
    very well supported in that.
  • 37:07 - 37:09
    (mic feedback)
    I don't know what happened there.
  • 37:09 - 37:15
    For the average volunteer community,
    how do you balance doing the work
  • 37:15 - 37:19
    for the cultural heritage organization
    versus training the professionals
  • 37:19 - 37:22
    that are there to do that work?
  • 37:22 - 37:24
    Where do you find the balance
    in terms of labor?
  • 37:26 - 37:27
    It's a good question.
  • 37:27 - 37:30
    (Megan) One that really comes up,
    I think, with this as well.
  • 37:30 - 37:33
    - With this?
    - (Megan) Yeah, and with building out...
  • 37:33 - 37:36
    where we put efforts in terms
    of building out competencies.
  • 37:36 - 37:39
    Yeah. I don't have a great answer for you,
    but it's a great question.
  • 37:39 - 37:41
    (Megan) Cool.
  • 37:41 - 37:44
    (Richard) There are a lot
    of tech people at [inaudible]
  • 37:44 - 37:46
    who understand this side of the graph,
    and don't understand it--
  • 37:46 - 37:49
    the people in [inaudible]
    who understand this part of the graph,
  • 37:49 - 37:51
    and don't understand
    this part of the graph.
  • 37:51 - 37:54
    So, the more we can get Wikimedians
    who understand some of this,
  • 37:54 - 37:58
    with some tech professionals at museums
    who understand this,
  • 37:58 - 37:59
    then that makes it a little bit easier--
  • 37:59 - 38:02
    and hopefully, as well as
    training up Wikimedians,
  • 38:02 - 38:06
    we can also provide some guidance
    and let the museums [inaudible]
  • 38:06 - 38:07
    to take care of themselves
    in the [inaudible].
  • 38:07 - 38:09
    Yeah, that's a good point.
  • 38:09 - 38:12
    How many people here know
    what regular expressions are?
  • 38:12 - 38:13
    Raise your hand.
  • 38:13 - 38:17
    Okay, so how many people are comfortable
    specifying a regular expression?
  • 38:17 - 38:19
    So, yeah, we need more work here.
  • 38:19 - 38:21
    (laughter)
  • 38:21 - 38:23
    (man 10) I want to suggest that--
  • 38:25 - 38:29
maybe getting
every Wikidata practitioner,
  • 38:29 - 38:34
or institution practitioner,
to embrace Python programming is not the way.
  • 38:34 - 38:40
    But as Richard just said, finding more
    bridging people-- people like you--
  • 38:40 - 38:41
    who speak both--
  • 38:41 - 38:44
    who speak Python,
    but also speak GLAM institution--
  • 38:45 - 38:48
    to help the GLAM's own
    technical department, which may not--
  • 38:49 - 38:52
    they know Python,
    they don't know this stuff.
  • 38:53 - 38:54
    That's, I think, what's needed.
  • 38:54 - 38:59
    People like you, people like me,
    people who speak both of these jargons
  • 38:59 - 39:02
    to help make the connections,
    to document the connections.
  • 39:02 - 39:03
    You're already doing this, of course.
  • 39:03 - 39:06
    You share your code, et cetera,
    you're doing tutorials.
  • 39:06 - 39:07
    But we need more of this.
  • 39:07 - 39:09
    I'm not sure we need
    to make everyone programmers.
  • 39:09 - 39:11
    We already have programmers.
  • 39:11 - 39:12
    We need to make them understand
  • 39:12 - 39:15
    the non-programming
    material they need to--
  • 39:15 - 39:16
    I think that's a great point.
  • 39:16 - 39:18
    We don't need to make everyone
    highly proficient in this,
  • 39:18 - 39:20
    but we do need people
    knowledgeable to say that,
  • 39:20 - 39:23
    "Yeah, we can ingest 400 thousand rows
    and do something with it."
  • 39:23 - 39:25
    Whereas, if you're stuck
    on this side, you're like,
  • 39:25 - 39:27
    "400 thousand rows
    sounds really big and scary."
  • 39:27 - 39:30
    But if you know that it's possible,
    you're like, "No problem."
  • 39:30 - 39:32
    400 thousand is not a problem.
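A hedged sketch of why 400 thousand rows stops being scary once you stream them: read the export one row at a time and emit QuickStatements V1 lines, so memory use never depends on file size. The file name and column names are hypothetical; adapt them to your own export.

```python
import csv

# Stream the CSV row by row; 400,000 rows works the same as 400.
with open("collection_export.csv", newline="", encoding="utf-8") as f, \
     open("statements.tsv", "w", encoding="utf-8") as out:
    for row in csv.DictReader(f):
        qid = row["wikidata_qid"]            # hypothetical column name
        accession = row["accession_number"]  # hypothetical column name
        # P217 is "inventory number"; qualifiers and references omitted.
        out.write(f'{qid}\tP217\t"{accession}"\n')
```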
  • 39:32 - 39:35
    (woman 5) I would just like to chime in
    a little bit in that
  • 39:35 - 39:40
there may be countries and areas
    where you will not find a GLAM
  • 39:40 - 39:44
    with any skilled technologists.
  • 39:44 - 39:48
    So, you will have to invent
    something there in the middle.
  • 39:49 - 39:50
    That's a good point.
  • 39:50 - 39:51
    Any questions? Sandra.
  • 39:56 - 39:58
    (Sandra) Yeah, I just wanted
    to add to this discussion.
  • 39:58 - 40:02
    Actually, I've seen some very good cases
    where it indeed has been successful
  • 40:02 - 40:05
    to train GLAM professionals to work
    with this entire environment,
  • 40:05 - 40:09
    and where they've done fantastic jobs,
    also at small institutions.
  • 40:10 - 40:15
    It also requires that you have chapters
    or volunteers that can train the staff.
  • 40:15 - 40:18
    So, it's really like a bigger environment.
  • 40:18 - 40:22
    But I think that's a model
that, if we can manage to make it grow,
  • 40:22 - 40:24
can scale very well.
  • 40:25 - 40:26
    Good point.
  • 40:26 - 40:31
    (woman 5) [inaudible]
  • 40:32 - 40:34
    Sorry, just noting that we don't have
  • 40:34 - 40:38
    any structured trainings
    right now for that.
  • 40:38 - 40:42
    We might want to develop those,
    and that would be helpful.
  • 40:43 - 40:44
    We have been doing that for education
  • 40:44 - 40:47
    in terms of teaching people
    Wikipedia and Wikidata.
  • 40:47 - 40:50
    It's just a matter of taking it
    one step further.
  • 40:51 - 40:52
    Right. Stacy.
  • 40:55 - 40:57
    (Stacy) Well, I'd just like to say
    that a lot of professionals
  • 40:57 - 41:02
    who work in this area of metadata
    have all these skills already.
  • 41:02 - 41:09
    So, I think part of it is just proving
    the value to these organizations,
  • 41:09 - 41:13
    but then it's also tapping
    into professional associations who can--
  • 41:13 - 41:17
    or ways of collaborating within
    those professional communities
  • 41:17 - 41:21
    to build this work, and the documentation
    on how to do things
  • 41:21 - 41:23
    is really, really important,
  • 41:23 - 41:27
    because I'm not sure about the role
    of depending on volunteers,
  • 41:27 - 41:32
    when some of this work is actually work
    GLAM organizations do anyway.
  • 41:32 - 41:35
    We manage our collections
    in a variety of ways through metadata,
  • 41:35 - 41:37
    and this is actually one more way.
  • 41:37 - 41:40
    So, should we also not be thinking
    about ways to integrate this work
  • 41:40 - 41:44
into a GLAM professional's regular job?
  • 41:44 - 41:46
    And then that way you're generating--
  • 41:46 - 41:49
    and when you think
    about sustainability and scalability,
  • 41:49 - 41:53
    that's the real trick to making this
both sustainable and scalable,
  • 41:54 - 41:59
    is that once this is the regular
    work of GLAM folks,
  • 41:59 - 42:01
    we're not worried as much about this part,
  • 42:01 - 42:04
    because it's just turning
    that little switch to get this
  • 42:04 - 42:06
    to be a part of that work.
  • 42:06 - 42:08
Right. Good point. [Shani]?
  • 42:12 - 42:13
    (Shani) You're absolutely right.
  • 42:13 - 42:16
    But I want to echo what you said before.
  • 42:16 - 42:22
    And yes, Susana-- this might work
    for more privileged countries
  • 42:22 - 42:25
    where they have money,
    they have people doing it.
  • 42:26 - 42:29
    It doesn't work for places
    that are still developing,
  • 42:29 - 42:32
    that don't have resources--
    they don't have all of that.
  • 42:33 - 42:37
    And they can barely do
    what they need to do.
  • 42:37 - 42:41
So, it's difficult for them, and that's where
the community is really helpful.
  • 42:42 - 42:45
    These are the cases where the community
    can have a huge impact actually,
  • 42:46 - 42:50
    working with the GLAMS,
    because they can't do it all
  • 42:51 - 42:52
    as part of their jobs.
  • 42:53 - 42:55
    So, we need to think about that as well.
  • 42:55 - 42:58
    And having these examples,
    actually, is hugely important,
  • 42:58 - 43:01
because it helps
to convince them
  • 43:01 - 43:06
    that it's critical to invest in it
    and to work with volunteers,
  • 43:06 - 43:09
    so, with non-professionals
    of sorts, to get there.
  • 43:10 - 43:13
    I can imagine a future where
    you don't have to know all this code.
  • 43:13 - 43:14
    These would just be
    kind of like Lego bricks
  • 43:14 - 43:16
    you can slap together,
  • 43:16 - 43:19
    saying, "Here's my database.
    Here's the crosswalk. Here's Wikidata,"
  • 43:19 - 43:21
    and just put it together,
and you don't even have to code,
  • 43:21 - 43:24
    you just have to make sure
    the databases are in the right place.
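In miniature, one of those Lego bricks could look like this: a crosswalk is ultimately just a mapping from local field names to Wikidata properties, applied record by record. The field names, QIDs, and sample record here are invented for illustration.

```python
# A crosswalk: local database fields mapped to Wikidata properties.
CROSSWALK = {
    "artist":           "P170",  # creator
    "medium":           "P186",  # made from material
    "accession_number": "P217",  # inventory number
}

# Invented sample record from a hypothetical collection database.
record = {"artist": "Q83155", "medium": "Q296955", "accession_number": "31.45"}

for field, value in record.items():
    prop = CROSSWALK.get(field)
    if prop:
        print(f"{prop} -> {value}")
```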
  • 43:24 - 43:25
    Yep. Okay.
  • 43:27 - 43:29
    (man 11) Sorry. [inaudible]
  • 43:29 - 43:34
I think if I had done this project,
    I'd probably have done it the same way.
  • 43:34 - 43:36
    So, I think that's maybe a good sign.
  • 43:36 - 43:40
I was wondering, how did
the whole financing of this project work?
  • 43:40 - 43:41
    How did the-- I'm sorry?
  • 43:41 - 43:43
How the financing of this project worked.
  • 43:44 - 43:46
    - The financing?
    - Yeah, the money.
  • 43:46 - 43:48
    That's a good question.
  • 43:48 - 43:49
    Well, so, there are different parts of it.
  • 43:49 - 43:53
    So, the Knight grant funded
    the Wiki Art Depiction Explorer.
  • 43:53 - 43:57
    But I, for the last, maybe what--
    nine months--
  • 43:57 - 43:59
    I've been their Wikimedia strategist.
  • 43:59 - 44:02
    So, I've been on
    since February of this year.
  • 44:02 - 44:05
So, pretty much, they're paying
for my time to help with their--
  • 44:05 - 44:08
    not only the upload of their collections,
    but developing these tools, as well.
  • 44:08 - 44:12
    - (Richard) So the Met's paying you?
    - Yeah, that's right.
  • 44:12 - 44:15
    (Richard) The grant, at least part
    of it has come from--
  • 44:15 - 44:17
    There was a grant for Open Access.
  • 44:17 - 44:20
    And this is under that campaign
    and with the digital department.
  • 44:20 - 44:24
So, we're working as contractors throughout
    the Open Access campaign for the Met.
  • 44:28 - 44:30
    (man 12) I'm sorry.
    I guess before you were hired,
  • 44:30 - 44:31
    and before there was a grant,
  • 44:31 - 44:34
    there was probably a lot
    of volunteer work done to make sure--
  • 44:34 - 44:35
    Richard did a lot of work before that.
  • 44:35 - 44:37
    And then, Wikimedia New York
    did a lot of work,
  • 44:37 - 44:39
    but it was kind of in bursts.
  • 44:39 - 44:41
    It wasn't as comprehensive
    as we're talking about now
  • 44:41 - 44:46
    in terms of having-- making sure
    those two layers are complete
  • 44:46 - 44:47
    in Wikidata.
  • 44:49 - 44:51
    Alright, yeah. I think that's it.
  • 44:51 - 44:54
    So, I'm happy to talk after lunch,
    or after the break, if you want.
  • 44:55 - 44:56
    Okay. Thank you.
  • 44:56 - 44:59
    (applause)