WEBVTT
00:00:07.138 --> 00:00:08.288
Thanks folks.
00:00:09.627 --> 00:00:11.991
As I mentioned before,
you can load up the slides here
00:00:11.991 --> 00:00:16.661
by either the QR code or the short URL,
which is bit.ly/
00:00:16.661 --> 00:00:19.920
wikidatacon19glamstrategies.
00:00:19.980 --> 00:00:22.040
And the slides are also
on the program page
00:00:22.040 --> 00:00:24.520
on the WikidataCon site.
00:00:24.549 --> 00:00:27.269
And then, there's also an Etherpad here
that you can click on.
00:00:27.269 --> 00:00:28.959
So, I'll be talking about a lot of things
00:00:28.959 --> 00:00:31.629
that you might have heard about
at Wikimania, if you were there,
00:00:31.629 --> 00:00:34.089
but we are going to go
into a lot more implementation details.
00:00:34.089 --> 00:00:36.209
Because we're at WikidataCon,
we can dive deeper
00:00:36.209 --> 00:00:38.430
into the Wikidata and technical aspects.
00:00:38.430 --> 00:00:41.821
Richard and I are working
at the Met Museum right now
00:00:41.821 --> 00:00:43.200
on their Open Access program.
00:00:43.200 --> 00:00:45.320
If you didn't know,
for a little over two years now,
00:00:45.320 --> 00:00:46.920
going into the third year,
00:00:46.920 --> 00:00:49.320
there's been an Open Access
strategy at the Met,
00:00:49.320 --> 00:00:52.763
where they're releasing their images
and their metadata under a CC0 license.
00:00:52.763 --> 00:00:54.639
And one of the things
they brought us on to do
00:00:54.639 --> 00:00:58.409
is to imagine what we could do
with this Open Access content.
00:00:58.409 --> 00:01:00.469
So, we're going to talk
a little bit about that
00:01:00.469 --> 00:01:02.598
in terms of the experiments
that we've been running,
00:01:02.598 --> 00:01:04.044
and we'd love to hear your feedback.
00:01:04.044 --> 00:01:07.028
So, I hope to talk for about 20 minutes,
and then have some conversation
00:01:07.028 --> 00:01:09.853
with you folks, since we have
a lot of knowledge in this room.
00:01:09.923 --> 00:01:12.472
This is the announcement,
and the one-year anniversary,
00:01:12.472 --> 00:01:16.452
where Katherine Maher was there
at the Met to talk about that anniversary.
00:01:16.452 --> 00:01:19.172
So, one of the things that's challenging,
I think, for a lot of folks
00:01:19.172 --> 00:01:21.097
is how to explain Wikidata,
00:01:21.097 --> 00:01:23.911
and this GLAM
contribution strategy for Wikidata,
00:01:23.911 --> 00:01:27.102
to C-level folks at an organization.
00:01:27.102 --> 00:01:31.392
We can talk about it with data scientists,
Wikimedians, librarians, maybe curators,
00:01:31.392 --> 00:01:34.452
but when it comes to talking about this
with a director of a museum,
00:01:34.452 --> 00:01:36.862
or a director of a library,
what does it actually--
00:01:36.862 --> 00:01:38.482
how does it resonate with them?
00:01:38.482 --> 00:01:41.352
So, one way of talking
about it that I think makes sense
00:01:41.352 --> 00:01:43.978
is this: everyone knows about Wikipedia,
00:01:43.978 --> 00:01:47.799
and for the English language edition,
00:01:47.799 --> 00:01:49.733
at least, we're talking
about 6 million articles.
00:01:49.733 --> 00:01:51.792
And it sounds like a lot,
but if you think about it,
00:01:51.792 --> 00:01:54.361
Wikipedia is not really the sum
of all human knowledge,
00:01:54.361 --> 00:01:59.512
it's the sum of all reliably sourced,
mostly western knowledge.
00:02:00.281 --> 00:02:02.211
And there's a lot of stuff out there.
00:02:02.211 --> 00:02:04.141
We have a lot of stuff
in Commons already--
00:02:04.141 --> 00:02:07.382
56 million media files, growing
every single day--
00:02:07.382 --> 00:02:11.484
but there's a very...
a different type of standard
00:02:11.484 --> 00:02:13.011
for what goes into Wikimedia Commons.
00:02:13.011 --> 00:02:16.431
And the way that we have described
Wikidata to GLAM professionals,
00:02:16.431 --> 00:02:18.231
and especially the C levels,
00:02:18.231 --> 00:02:22.061
is: what if we could have a repository
with a notability bar
00:02:22.061 --> 00:02:24.381
that is not as high as Wikipedia's?
00:02:24.381 --> 00:02:26.001
So, we want all these paintings,
00:02:26.001 --> 00:02:28.161
but not every painting
necessarily needs an article.
00:02:28.581 --> 00:02:30.241
Wikipedia is held back by the fact
00:02:30.241 --> 00:02:33.082
that it's split into
separate language editions.
00:02:33.171 --> 00:02:36.681
So, can we store, as the famous phrase goes,
things, not strings?
00:02:36.681 --> 00:02:40.570
Can we be object-oriented
rather than lexically oriented?
00:02:40.570 --> 00:02:42.181
And can we store this in a database
00:02:42.181 --> 00:02:44.540
that stores facts, figures,
and relationships?
00:02:44.540 --> 00:02:46.291
And that's pretty much
what Wikidata does.
00:02:46.711 --> 00:02:50.736
And Wikidata is also a kind of universal
crosswalk database that links
00:02:50.736 --> 00:02:52.321
to other collections out there.
00:02:52.321 --> 00:02:55.119
So, we think this really resonates
with folks when you're talking about
00:02:55.119 --> 00:02:58.596
what is the value of Wikidata compared
to what they're normally familiar with,
00:02:58.596 --> 00:03:00.326
which is just Wikipedia.
00:03:01.346 --> 00:03:02.876
Alright, so what are the benefits?
00:03:02.876 --> 00:03:05.086
You're interlinking
your collections with others.
00:03:05.086 --> 00:03:07.676
So, unfortunately, I apologize
to librarians here,
00:03:07.676 --> 00:03:09.337
I'll be talking mostly about museums,
00:03:09.337 --> 00:03:11.816
but a lot of this is also valid
for libraries.
00:03:11.816 --> 00:03:15.867
But you're basically connecting
your collection with the global collection
00:03:15.867 --> 00:03:18.166
of linked open data collections.
00:03:18.846 --> 00:03:22.276
You can also receive enriched
and improved metadata back
00:03:22.276 --> 00:03:25.656
after contributing and linking
your collections to the world.
00:03:25.656 --> 00:03:28.436
And there are some pretty neat
interactive multimedia applications
00:03:28.436 --> 00:03:30.596
that you get-- I don't want
to say for free,
00:03:30.596 --> 00:03:33.596
but your collection in Wikidata
allows you to visualize things
00:03:33.596 --> 00:03:35.276
that you've never seen before.
00:03:35.276 --> 00:03:36.776
We'll show you some examples.
00:03:36.776 --> 00:03:39.737
And so, how do you convey this
to GLAM professionals effectively?
00:03:39.737 --> 00:03:41.746
Well, I usually like to start
with storytelling,
00:03:41.746 --> 00:03:43.536
and not technical explanations.
00:03:43.536 --> 00:03:46.368
Okay, so if everyone here
has a cell phone,
00:03:46.368 --> 00:03:49.574
especially if you have an iPhone,
I want you to scan this QR code
00:03:49.574 --> 00:03:51.645
and open the URL
that it points to.
00:03:51.645 --> 00:03:53.393
Or if you don't have a QR scanner,
00:03:53.393 --> 00:03:58.963
just type in w.wiki/Aij in a web browser.
00:04:00.036 --> 00:04:01.942
So go ahead and scan that.
00:04:03.280 --> 00:04:04.864
And what comes up?
00:04:06.778 --> 00:04:09.458
Does anyone see a knowledge graph
pop up on your screen?
00:04:09.516 --> 00:04:11.156
So, for folks here in WikidataCon,
00:04:11.156 --> 00:04:13.266
this is probably not
revolutionary for you.
00:04:13.266 --> 00:04:16.386
What it does is run a SPARQL query
on these objects,
00:04:16.386 --> 00:04:18.836
and show the links between them.
00:04:18.836 --> 00:04:20.897
And you can actually drag them
around the screen.
00:04:20.897 --> 00:04:22.204
You can actually click on nodes.
00:04:22.204 --> 00:04:24.458
If you're [inaudible] on mobile,
it will expand that--
00:04:24.458 --> 00:04:27.554
you can actually start to surf
through Wikidata this way.
00:04:27.554 --> 00:04:29.741
So, for Wikidata veterans
this is pretty cool.
00:04:29.741 --> 00:04:31.206
One shot, you get this.
00:04:31.206 --> 00:04:33.313
For a lot of folks who have never seen
Wikidata before,
00:04:33.313 --> 00:04:35.574
this is a revolutionary moment for them.
00:04:36.176 --> 00:04:39.236
To actually hand-manipulate
a knowledge graph,
00:04:39.236 --> 00:04:42.186
and to start surfing through Wikidata
without having to know SPARQL,
00:04:42.186 --> 00:04:43.823
without having to know what a Q item is,
00:04:43.823 --> 00:04:45.860
without having to know
what a property proposal is,
00:04:45.860 --> 00:04:48.623
they can suddenly start seeing
connections in a way that is magical.
00:04:48.623 --> 00:04:50.264
Hey, I see [Jacob's] here.
00:04:50.264 --> 00:04:52.143
Jacob's been using
some of this code, as well.
00:04:52.143 --> 00:04:54.443
So, this is some code
that we'll talk about later on
00:04:54.443 --> 00:04:57.254
that allows you to create
these visualizations in Wikidata.
00:04:57.254 --> 00:04:59.283
And we've really seen this
turn a lot of heads
00:04:59.283 --> 00:05:01.408
among people who had
never really gotten Wikidata before.
00:05:01.408 --> 00:05:04.653
But after seeing these interactive
knowledge graphs, they get it.
00:05:04.653 --> 00:05:06.233
They understand the power of this.
00:05:06.233 --> 00:05:08.293
And especially this example here,
00:05:08.293 --> 00:05:11.304
this was a really big eye-opener
for the folks at the Met,
00:05:11.304 --> 00:05:14.545
because this is the artifact
that is the center of this graph,
00:05:14.545 --> 00:05:17.823
right there, the Portrait of Madame X,
a very famous portrait.
00:05:17.823 --> 00:05:20.982
And they did not even know
that this was the inspiration
00:05:20.982 --> 00:05:24.693
for the black dress that Rita Hayworth
wore in the movie Gilda.
00:05:24.693 --> 00:05:26.783
So, just by seeing this graph, they said,
00:05:26.783 --> 00:05:29.353
"Wait a minute. This is one
of our most visited portraits.
00:05:29.353 --> 00:05:31.683
I didn't know that this was true."
00:05:31.683 --> 00:05:35.214
And there are actually two other books
published about that painting.
00:05:35.214 --> 00:05:38.983
You can see all these things,
not just within the realm of GLAM,
00:05:38.983 --> 00:05:41.441
but it extends to fashion,
it extends to literature.
00:05:41.441 --> 00:05:43.381
You're starting to see
the global connections
00:05:43.381 --> 00:05:47.481
that your artworks,
or your collections, have via Wikidata.
00:05:48.722 --> 00:05:50.342
So, how do we do this?
00:05:50.842 --> 00:05:53.098
If you can remember nothing else
from this presentation,
00:05:53.098 --> 00:05:56.432
this one page is your one-stop shop.
00:05:56.432 --> 00:05:58.592
Now, fortunately, you don't have
to memorize all this.
00:05:58.592 --> 00:06:03.292
It's actually right here at
Wikidata:Linked_open_data_workflow.
00:06:03.560 --> 00:06:06.170
So, we'll be talking about some
of these different phases
00:06:06.170 --> 00:06:10.670
of how you first prepare,
reconcile, and examine
00:06:11.160 --> 00:06:14.190
what the GLAM organization might have
and what does Wikidata have.
00:06:14.190 --> 00:06:15.374
And then, what are the tools
00:06:15.374 --> 00:06:18.664
to actually ingest
and correct or enrich that
00:06:18.664 --> 00:06:20.241
once it's in Wikidata.
00:06:20.241 --> 00:06:22.691
And then, what are some of the ways
to reuse that content,
00:06:22.691 --> 00:06:25.161
or to report and create
new things out of it.
00:06:25.161 --> 00:06:31.191
So, this is the simpler version of a chart
that Sandra and the GLAM folks
00:06:31.191 --> 00:06:33.111
at the foundation have created.
00:06:33.111 --> 00:06:35.534
But this is trying
to sum up, in one shot--
00:06:35.534 --> 00:06:38.133
because we know how hard things
are to find in Wikidata--
00:06:38.133 --> 00:06:41.733
all the different
tools you should pay attention to
00:06:41.733 --> 00:06:43.475
as a GLAM organization.
00:06:44.969 --> 00:06:50.606
So, just using the Met as an example,
we started with what is the ideal object
00:06:50.606 --> 00:06:53.398
that we have in Wikidata
that comes from the Met?
00:06:53.398 --> 00:06:55.882
This is a typical shot of a Wikidata item,
00:06:55.882 --> 00:06:57.385
in the mobile mode there.
00:06:57.385 --> 00:06:59.244
And this is one
of the more famous paintings
00:06:59.244 --> 00:07:00.729
we used as a model, here.
00:07:00.729 --> 00:07:03.315
We have the label,
description, and aliases.
00:07:03.915 --> 00:07:05.225
And then, we asked ourselves,
00:07:05.225 --> 00:07:07.035
"What are the core statements
that we wanted?"
00:07:07.035 --> 00:07:10.035
We wanted instance of, image,
inception, collection.
00:07:10.035 --> 00:07:13.239
And what are some other properties
we would like to have, if possible?
00:07:13.239 --> 00:07:15.960
Depiction information,
material used, things like that.
00:07:16.879 --> 00:07:19.369
We actually do have an identifier.
00:07:19.369 --> 00:07:22.199
The Met object ID is P3634.
00:07:22.199 --> 00:07:24.629
So, for some organizations,
you might want to propose
00:07:24.629 --> 00:07:28.529
a property just to track your items
using an object ID.
00:07:29.369 --> 00:07:31.899
And then, for the Met,
just trying to circumscribe
00:07:31.899 --> 00:07:35.519
which objects we want to upload
and keep in Wikidata--
00:07:35.519 --> 00:07:38.927
the thing that we first identified
were collection highlights.
00:07:38.927 --> 00:07:43.649
These are like a hand-selected set
of 1,500 to 2,000 items
00:07:43.678 --> 00:07:48.878
that were going to be given priority
to upload to Wikidata.
00:07:48.939 --> 00:07:51.709
So, Richard and the crew
out of Wikimedia in New York
00:07:51.709 --> 00:07:53.105
did a lot of this early work.
00:07:53.105 --> 00:07:55.571
And then, now, we're systematically
going through to make sure
00:07:55.571 --> 00:07:56.689
they're all complete.
00:07:56.689 --> 00:07:58.221
And there's a secondary set
00:07:58.221 --> 00:08:01.390
called the Heilbrunn Timeline
of Art History-- about 8,000 items
00:08:01.390 --> 00:08:07.149
that are seminal works
by artists throughout history.
00:08:07.149 --> 00:08:09.499
These are the roughly 8,000
that the Met has identified,
00:08:09.499 --> 00:08:11.812
and we're putting those
on Wikidata, as well,
00:08:11.812 --> 00:08:13.143
using a different designation.
00:08:13.143 --> 00:08:16.271
Here, described by source--
Heilbrunn Timeline of Art History.
00:08:16.271 --> 00:08:19.841
So, the collection highlight
is denoted here as collection--
00:08:19.841 --> 00:08:21.265
Metropolitan Museum of Art,
00:08:21.265 --> 00:08:22.976
subject has role collection highlight.
00:08:22.976 --> 00:08:26.872
And then, these 8,000
or so are marked like that in Wikidata.
00:08:29.741 --> 00:08:33.816
I couldn't show this chart at Wikimania,
because it's too complicated.
00:08:33.816 --> 00:08:35.389
But at WikidataCon, we can.
00:08:35.389 --> 00:08:38.845
So, this is something that is really hard
to answer sometimes.
00:08:39.490 --> 00:08:42.169
What makes something
in Wikidata from the Met,
00:08:42.169 --> 00:08:44.658
or from the New York Public Library,
or from your organization?
00:08:44.658 --> 00:08:47.609
And the answer is not easy.
It depends.
00:08:47.644 --> 00:08:49.684
It's complicated, it can be multi-factor.
00:08:49.684 --> 00:08:53.254
So, you could say, "Well, if it has
a Met object ID in Wikidata,
00:08:53.254 --> 00:08:54.804
it's a Met object."
00:08:54.804 --> 00:08:56.674
But maybe someone didn't enter that.
00:08:56.674 --> 00:08:59.924
Maybe they only put in
Collection: Met which is P195,
00:08:59.924 --> 00:09:02.684
or they put in the accession number,
00:09:02.684 --> 00:09:06.984
and they put collection as the qualifier
to that accession number.
00:09:06.984 --> 00:09:11.454
So, there's actually, one, two, three
different ways to try to find Met objects.
00:09:11.454 --> 00:09:14.214
And probably the best way to do it
is through a union like this.
00:09:14.214 --> 00:09:16.173
So, you combine all three,
and you come back,
00:09:16.173 --> 00:09:18.064
and you make a list out of it.
00:09:18.064 --> 00:09:20.813
So unfortunately, there is
no one clean query
00:09:20.813 --> 00:09:23.684
that'll guarantee you all the Met objects.
00:09:23.684 --> 00:09:27.873
This is probably
the best approach for this.
00:09:27.873 --> 00:09:29.384
And for some institutions,
00:09:29.384 --> 00:09:32.505
they're probably doing
something similar to that right now.
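A minimal sketch of that three-way union, assembled in Python as SPARQL text for the Wikidata Query Service (where the `wd:`, `wdt:`, `p:`, and `pq:` prefixes are predefined). The talk names only P3634 (Met object ID) and P195 (collection); treating Q160236 as the Met's item and P217 (inventory number) as the accession-number property are assumptions here.

```python
# Sketch: UNION the three ways a Met object might be recorded in Wikidata.
# Assumptions (not stated in the talk): Q160236 is the item for the
# Metropolitan Museum of Art; P217 (inventory number) holds the accession number.

MET_QID = "Q160236"  # assumed QID for the Met


def met_objects_query(museum_qid: str = MET_QID) -> str:
    """Return a SPARQL query listing items that look like Met objects."""
    return f"""
SELECT DISTINCT ?item WHERE {{
  {{ ?item wdt:P3634 ?metObjectId . }}           # 1. has a Met object ID
  UNION
  {{ ?item wdt:P195 wd:{museum_qid} . }}         # 2. collection is the Met
  UNION
  {{ ?item p:P217 ?accession .                   # 3. accession number...
     ?accession pq:P195 wd:{museum_qid} . }}     #    ...qualified by collection
}}
""".strip()


if __name__ == "__main__":
    print(met_objects_query())
```

`DISTINCT` matters because an item recorded all three ways would otherwise come back three times.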
00:09:32.505 --> 00:09:35.824
Alright, so in this example,
what you see here
00:09:35.824 --> 00:09:39.684
manifests itself
as this in a query,
00:09:39.684 --> 00:09:40.904
which can get pretty complex.
00:09:40.904 --> 00:09:43.063
So, if we're looking
for all the collection highlights,
00:09:43.063 --> 00:09:47.713
we'd break this out into the statement
and then the qualifier as this:
00:09:47.782 --> 00:09:49.712
subject has role collection highlight.
00:09:49.712 --> 00:09:51.450
So, that's one way that we sort out
00:09:51.450 --> 00:09:54.124
some of these special
designations in Wikidata.
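The statement-plus-qualifier pattern can be sketched the same way: match the full collection statement with `p:`, its value with `ps:`, and the qualifier with `pq:`. The talk does not give the QID for "collection highlight", so it is a required parameter; P2868 as "subject has role" and Q160236 as the Met are assumptions.

```python
# Sketch: find collection highlights via the collection statement (P195)
# and its "subject has role" qualifier (P2868, assumed).


def highlights_query(highlight_qid: str, museum_qid: str = "Q160236") -> str:
    """Return SPARQL selecting items whose collection statement is qualified as a highlight."""
    if not (highlight_qid.startswith("Q") and highlight_qid[1:].isdigit()):
        raise ValueError(f"not a QID: {highlight_qid!r}")
    return (
        "SELECT ?item WHERE {\n"
        "  ?item p:P195 ?stmt .\n"                  # the full collection statement
        f"  ?stmt ps:P195 wd:{museum_qid} ;\n"      # its main value: the museum
        f"        pq:P2868 wd:{highlight_qid} .\n"  # qualifier: subject has role
        "}"
    )


if __name__ == "__main__":
    # "Q123" is a placeholder, not the real "collection highlight" item.
    print(highlights_query("Q123"))
```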
00:09:55.166 --> 00:09:58.716
So, the summary is,
representing "The Met" is multifaceted,
00:09:58.716 --> 00:10:01.536
and needs to balance simplicity
and findability.
00:10:01.536 --> 00:10:04.896
How many people here have heard
of Sum of All Paintings as a project?
00:10:04.995 --> 00:10:07.088
Ooh, good, a lot of you!
00:10:07.088 --> 00:10:09.105
So, it's probably one
of the most active ones
00:10:09.105 --> 00:10:10.525
that deal with these issues.
00:10:10.525 --> 00:10:17.057
So, we always debate whether we should
model things super-accurately,
00:10:17.057 --> 00:10:19.815
or model things
so that they're findable.
00:10:19.815 --> 00:10:21.997
These are kind of at odds with each other.
00:10:21.997 --> 00:10:24.232
So, we usually prefer findability.
00:10:24.232 --> 00:10:27.001
It's no good if it's perfectly modeled,
but no one can ever find it,
00:10:27.001 --> 00:10:30.013
because it's so strict
in terms of how it's defined in Wikidata.
00:10:30.013 --> 00:10:31.882
And then, we have some challenges.
00:10:31.882 --> 00:10:35.367
Multiple artifacts might be tied
to one object ID,
00:10:35.367 --> 00:10:37.396
which might be modeled differently in Wikidata.
00:10:37.396 --> 00:10:42.097
And then, mapping the Met classification
to instances has some complex cases.
00:10:42.097 --> 00:10:44.282
So, the way that the Met classifies things
00:10:44.282 --> 00:10:46.775
doesn't always fit
with how Wikidata classifies things.
00:10:46.775 --> 00:10:49.982
So, here are some examples
of how this works.
00:10:49.982 --> 00:10:53.602
So, this is a great example
of using a Python library
00:10:53.602 --> 00:10:56.487
to actually ingest
what we know from the Met,
00:10:56.487 --> 00:10:58.313
and then try to sort out what they have.
00:10:58.313 --> 00:10:59.887
So, this is just for textiles.
00:10:59.887 --> 00:11:02.076
You can see that they have
a lot of detail here
00:11:02.076 --> 00:11:05.399
in terms of woven textiles, laces,
printed, trimmings, velvets.
00:11:05.399 --> 00:11:07.907
We first looked into this in Wikidata.
00:11:07.907 --> 00:11:10.175
We did not have
this level of detail in Wikidata.
00:11:10.175 --> 00:11:12.207
We still don't have all this resolved.
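The talk doesn't name the Python library behind this breakdown. As a stand-in, here is a standard-library sketch that tallies textile classifications from a CSV export; the column name `Classification`, the `Textiles-...` values, and the sample rows are all made up for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample standing in for the Met's open-access CSV export.
# Column name and classification values are assumptions.
SAMPLE = """Object ID,Classification
1,Textiles-Woven
2,Textiles-Laces
3,Textiles-Woven
4,Paintings
5,Textiles-Velvets
"""


def textile_breakdown(csv_text: str) -> Counter:
    """Count rows per classification, keeping only textile subtypes."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return Counter(
        row["Classification"]
        for row in rows
        if row["Classification"].startswith("Textiles")
    )


if __name__ == "__main__":
    for name, count in textile_breakdown(SAMPLE).most_common():
        print(name, count)
```

Each distinct value in a tally like this is a term that would need its own Wikidata mapping, which is why the level of detail matters.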
00:11:12.207 --> 00:11:14.764
You can see that this
is really complex here.
00:11:14.764 --> 00:11:18.012
"Anonymous" is not just anonymous
in a lot of databases.
00:11:18.012 --> 00:11:20.126
There are a lot of qualifications--
00:11:20.126 --> 00:11:23.045
like the nationality, or the century.
00:11:23.045 --> 00:11:26.282
So, trying to map all this to Wikidata
can be complex, as well.
00:11:26.282 --> 00:11:30.450
And then, this shows you
that of all the works in the Met,
00:11:30.450 --> 00:11:33.976
about 46% are open access right now.
00:11:33.976 --> 00:11:38.694
So, just over 50%
are still not CC0.
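The open-access share is a simple ratio over the export. This sketch assumes a boolean-like `Is Public Domain` column; the five sample rows are invented and say nothing about the real 46% figure.

```python
import csv
import io

# Invented sample; the "Is Public Domain" column name is an assumption.
SAMPLE = """Object ID,Is Public Domain
1,True
2,False
3,True
4,False
5,False
"""


def open_access_share(csv_text: str) -> float:
    """Fraction of rows flagged as public domain (0.0 for an empty export)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    public = sum(1 for r in rows if r["Is Public Domain"].strip().lower() == "true")
    return public / len(rows)


if __name__ == "__main__":
    print(f"{open_access_share(SAMPLE):.0%} open access")
```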
00:11:40.134 --> 00:11:43.444
(man) All the objects in the Met,
or all objects on display?
00:11:43.444 --> 00:11:45.957
(Andrew) It's weird. It's not on display.
00:11:45.957 --> 00:11:47.866
But it's not all objects either.
00:11:47.866 --> 00:11:52.176
It's about 400 to 500 thousand objects
in their database at this point.
00:11:52.176 --> 00:11:53.840
So, somewhere in between.
00:11:55.380 --> 00:11:57.609
So, starting points.
This is always a hard one.
00:11:57.609 --> 00:12:03.514
We just had this discussion
on the Facebook group recently
00:12:03.514 --> 00:12:04.923
about where people go
00:12:04.923 --> 00:12:07.887
to find out what the modeling
should look like for a certain thing.
00:12:07.887 --> 00:12:09.271
It's not easy.
00:12:09.271 --> 00:12:12.115
So, normally, what we have to do
is just point people to,
00:12:12.115 --> 00:12:15.281
I don't know, some project
that does it well now?
00:12:15.281 --> 00:12:17.230
So, it's not a satisfying answer,
00:12:17.230 --> 00:12:19.910
but we usually tell folks
to start at things like visual arts,
00:12:19.910 --> 00:12:22.308
or Sum of All Paintings
does it pretty well,
00:12:22.308 --> 00:12:25.569
or just go to the project chat to find out
where some of these things are.
00:12:25.569 --> 00:12:27.444
We need better solutions for this.
00:12:27.444 --> 00:12:30.939
This is just a basic flow
of what we're doing with the Met here.
00:12:30.939 --> 00:12:33.119
We're basically taking
their CSV, and their API,
00:12:33.119 --> 00:12:35.979
and we're consuming it
into a Python data frame.
00:12:35.979 --> 00:12:38.159
We're taking the SPARQL code--
00:12:38.159 --> 00:12:40.499
the one that you saw
before, this super union--
00:12:40.499 --> 00:12:43.779
bringing that in, and we're doing
a bidirectional diff,
00:12:43.779 --> 00:12:45.999
and then seeing what new things
have been added here,
00:12:45.999 --> 00:12:47.729
what things have been subtracted there,
00:12:47.729 --> 00:12:51.529
and we're actually making those changes
either through QuickStatements,
00:12:51.529 --> 00:12:53.439
or we're doing it through Pywikibot.
00:12:53.439 --> 00:12:55.512
So, directly editing Wikidata.
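The bidirectional diff can be sketched without touching Wikidata at all. Here plain dicts keyed by Met object ID stand in for the two data frames described in the talk, and the IDs and inception values are made up; in practice the Wikidata side would come from the union query.

```python
# Sketch of a bidirectional diff between the Met's export and Wikidata.
# Keys are Met object IDs; values are the fields being compared.


def bidirectional_diff(met: dict[str, dict], wikidata: dict[str, dict]):
    """Return (ids only in the Met export, ids only in Wikidata,
    {id: {field: (met_value, wikidata_value)}} for shared ids that differ)."""
    only_met = sorted(met.keys() - wikidata.keys())
    only_wikidata = sorted(wikidata.keys() - met.keys())
    changed = {}
    for obj_id in sorted(met.keys() & wikidata.keys()):
        fields = met[obj_id].keys() | wikidata[obj_id].keys()
        diffs = {
            f: (met[obj_id].get(f), wikidata[obj_id].get(f))
            for f in sorted(fields)
            if met[obj_id].get(f) != wikidata[obj_id].get(f)
        }
        if diffs:
            changed[obj_id] = diffs
    return only_met, only_wikidata, changed


if __name__ == "__main__":
    met = {"1001": {"inception": "1884"}, "1002": {"inception": "1900"}}
    wd = {"1001": {"inception": "1883"}, "1003": {"inception": "1910"}}
    print(bidirectional_diff(met, wd))
```

The three outputs line up with the three kinds of action in the talk: items to add, items to review on the Wikidata side, and statements to correct through QuickStatements or Pywikibot.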
00:12:56.204 --> 00:12:59.405
So, this is the big slide
I also couldn't show at Wikimania,
00:12:59.405 --> 00:13:01.485
because it would have flummoxed everyone.
00:13:01.485 --> 00:13:04.924
So, this is a great example
of how we start with the Met database,
00:13:04.924 --> 00:13:06.824
we have this crosswalk database,
00:13:06.824 --> 00:13:09.209
and then we generate
the changes in Wikidata.
00:13:09.209 --> 00:13:12.644
The way this works is this is an example
of one record from the Met.
00:13:12.644 --> 00:13:15.744
This is an evening dress-- we've been working
with the Costume Institute recently,
00:13:15.744 --> 00:13:17.518
the one that puts on the Met Gala.
00:13:17.518 --> 00:13:20.442
So, we have one evening dress
here, by Valentina.
00:13:20.442 --> 00:13:22.100
Here's a date, accession number.
00:13:22.100 --> 00:13:25.105
So, these things can be put
into Wikidata directly.
00:13:25.105 --> 00:13:27.744
Each of those fields maps straight to a property: the date, the accession number.
00:13:27.744 --> 00:13:29.404
But what do we do with things like this?
00:13:29.404 --> 00:13:33.868
This is an object name, which is basically
like a classification of what it is,
00:13:33.868 --> 00:13:35.648
like an "instance of" for the Met.
00:13:35.648 --> 00:13:37.396
And the designer's Valentina.
00:13:37.396 --> 00:13:41.571
So, what we do is we take these
and we run all the unique object names
00:13:41.571 --> 00:13:43.801
and all the unique designers
through OpenRefine.
00:13:43.801 --> 00:13:46.720
So, we get maybe 60% matches
if we're lucky.
00:13:46.720 --> 00:13:48.418
We put that into a spreadsheet.
00:13:48.418 --> 00:13:53.178
Then we ask volunteers
or the curators at the Met
00:13:53.178 --> 00:13:55.333
to help fill in this crosswalk database.
00:13:55.333 --> 00:13:57.312
This is just simply Google Sheets.
00:13:57.312 --> 00:13:59.911
So, we say, here are all the object names,
the unique object names
00:13:59.911 --> 00:14:02.731
that match lexically exactly
with what's in the Met database,
00:14:02.731 --> 00:14:05.912
and then you say this maps to this Q ID.
00:14:05.912 --> 00:14:08.556
So, when we first started this,
maybe only about--
00:14:08.556 --> 00:14:11.233
well, 60% failed,
and some of these were blank.
00:14:11.233 --> 00:14:13.751
So, we tap folks in specific groups.
00:14:13.751 --> 00:14:17.316
So there's like a Wiki Loves Fashion
little chat group that we have.
00:14:17.316 --> 00:14:20.304
And folks like user PKM
were super useful in this area.
00:14:20.304 --> 00:14:22.794
So she spent a lot of time
looking through this, and saying,
00:14:22.794 --> 00:14:24.764
"Okay, Evening suit is this,
Ewer is that."
00:14:24.764 --> 00:14:27.759
So, we looked through
and made all these mappings here.
00:14:27.759 --> 00:14:30.719
And then, what happens is now,
when we see this in the Met database,
00:14:30.719 --> 00:14:33.201
we look it up in the crosswalk database,
and we say, "Oh, yeah.
00:14:33.201 --> 00:14:36.169
These are the two Q numbers
we need to put into Wikidata."
00:14:36.169 --> 00:14:39.089
And then, it generates
the QuickStatement right there.
00:14:39.089 --> 00:14:41.328
Same thing here with Designer: Valentina.
00:14:41.328 --> 00:14:44.138
If Valentina matches here,
then it gets generated
00:14:44.138 --> 00:14:45.838
with that QuickStatement right there.
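A sketch of the crosswalk step, under several assumptions: the Google Sheet is exported as CSV with hypothetical `met_term` and `qids` columns, a term that maps to two Q numbers lists them separated by `;`, the QIDs are placeholders rather than the project's real mappings, and P287 (designed by) is assumed as the designer property. Lookups are by exact string, matching the "lexically exactly" rule described above, and the output is tab-separated QuickStatements V1 lines.

```python
import csv
import io

# Hypothetical crosswalk exported from the sheet. QIDs are placeholders.
CROSSWALK_CSV = """met_term,qids
Evening dress,Q1001;Q1002
Valentina,Q2001
"""

# Assumed field-to-property mapping: P31 = instance of, P287 = designed by.
FIELD_TO_PROPERTY = {"object_name": "P31", "designer": "P287"}


def load_crosswalk(csv_text: str) -> dict[str, list[str]]:
    """Map each exact Met term to its QIDs, skipping rows nobody has filled in."""
    crosswalk = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        qids = [q.strip() for q in row["qids"].split(";") if q.strip()]
        if qids:
            crosswalk[row["met_term"]] = qids
    return crosswalk


def quickstatements_for(item_qid: str, record: dict[str, str],
                        crosswalk: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Return (QuickStatements V1 lines, terms that still need a mapping)."""
    lines, unmapped = [], []
    for field, prop in FIELD_TO_PROPERTY.items():
        term = record.get(field)
        if not term:
            continue
        qids = crosswalk.get(term)  # exact, case-sensitive lookup
        if qids is None:
            unmapped.append(term)
            continue
        for qid in qids:
            lines.append(f"{item_qid}\t{prop}\t{qid}")
    return lines, unmapped


if __name__ == "__main__":
    cw = load_crosswalk(CROSSWALK_CSV)
    record = {"object_name": "Evening dress", "designer": "Valentina"}
    statements, todo = quickstatements_for("Q9999", record, cw)
    print("\n".join(statements))
```

Unmapped terms come back as a list, which is roughly what would go back into the sheet for volunteers or curators to fill in.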
00:14:45.838 --> 00:14:48.069
If Valentina does not exist,
then we'll create it.
00:14:48.069 --> 00:14:51.288
You can see here, Weeks--
look at that high Q ID right there.
00:14:51.288 --> 00:14:53.918
We just created that recently,
because there was no entry before.
00:14:53.918 --> 00:14:55.358
Does that make sense to everyone?
00:14:55.358 --> 00:14:57.727
- (man 2) What's the extra statement?
- (Andrew) I'm sorry?
00:14:57.727 --> 00:15:00.610
- (man 2) What's the extra statement?
- (Andrew) Oh, the extra statement.
00:15:00.610 --> 00:15:03.131
So, believe it or not, we have
an Evening blouse, Evening dress,
00:15:03.131 --> 00:15:05.010
Evening pants,
Evening ensemble, Evening hat--
00:15:05.010 --> 00:15:08.650
do we want to make a new Wikidata item
for Evening pants, Evening everything?
00:15:08.650 --> 00:15:10.444
So, we said, "No."
We probably don't want to.
00:15:10.444 --> 00:15:13.859
We'll just say, "It's a dress,
but it's also evening wear",
00:15:13.859 --> 00:15:15.117
which is what that is.
00:15:15.117 --> 00:15:17.301
So, we're saying it's an instance
of both things.
00:15:17.931 --> 00:15:21.398
I'm not sure it's the perfect solution,
but it's a solution at this point.
00:15:21.744 --> 00:15:22.944
So, does everyone get that?
00:15:22.944 --> 00:15:25.564
So, this is kind of a crosswalk database
that we maintain here.
00:15:25.564 --> 00:15:28.025
And the nice thing about it,
it's just Google Sheets.
00:15:28.025 --> 00:15:29.264
So, we can get people to help
00:15:29.264 --> 00:15:31.375
who don't need to know
anything about this database,
00:15:31.375 --> 00:15:34.384
don't need to know about QuickStatements,
don't need to know about queries.
00:15:34.384 --> 00:15:36.226
They just go in and fill in the Q number.
00:15:36.226 --> 00:15:37.244
Yeah.
00:15:37.244 --> 00:15:40.902
(woman) So, when you copy
the object name and you find the Q ID,
00:15:40.902 --> 00:15:43.145
the initial 60%
that you mentioned as an example,
00:15:43.145 --> 00:15:45.223
is that by exact match?
00:15:46.483 --> 00:15:48.103
(Andrew) Well, it's through OpenRefine.
00:15:48.103 --> 00:15:52.014
So, it does its best guess,
and then we verify to make sure
00:15:52.014 --> 00:15:54.444
that the OpenRefine match makes sense.
00:15:54.444 --> 00:15:56.114
Yeah.
00:15:56.203 --> 00:15:57.794
Does that make sense to everyone?
00:15:57.794 --> 00:16:00.304
So, some folks might be doing
some variation on this,
00:16:00.304 --> 00:16:03.403
but I think the nice thing about this
is that, by using Google Sheets,
00:16:03.403 --> 00:16:08.234
we remove a lot of the complexities
of these two areas from this.
00:16:08.234 --> 00:16:11.193
And we'll show you some code
that does this later on.
00:16:11.813 --> 00:16:15.273
- (man 3) How do you generate [inaudible]?
- (Andrew) How do you generate this?
00:16:15.273 --> 00:16:17.272
- (man 3) Yes.
- (Andrew) Python code.
00:16:17.272 --> 00:16:19.134
I'll show you a line that does this.
00:16:19.134 --> 00:16:21.136
But you can also go up here.
00:16:21.136 --> 00:16:25.096
This is the whole Python program
that does this, this, and that,
00:16:25.096 --> 00:16:27.296
if you want to take a look at that.
00:16:28.026 --> 00:16:29.026
Yes.
00:16:29.026 --> 00:16:31.207
(man 4) Did you really use
your own vocabulary,
00:16:31.207 --> 00:16:35.426
or is there something [inaudible].
00:16:35.426 --> 00:16:37.246
- (Andrew) This right here?
- (man 4) Yeah.
00:16:37.246 --> 00:16:39.721
(Andrew) Yeah. So, this
is the Met's own vocabulary.
00:16:39.721 --> 00:16:43.031
So, most museums use
a system called TMS.
00:16:43.031 --> 00:16:44.891
It's a collections management system.
00:16:44.891 --> 00:16:47.654
So, they'll usually--
this is the museum world--
00:16:47.654 --> 00:16:50.771
they'll usually roll
their own vocabulary for their own needs.
00:16:50.771 --> 00:16:54.022
Museums are very late
to interoperable metadata.
00:16:54.022 --> 00:16:57.282
Librarians and archivists have this
kind of baked into them.
00:16:57.282 --> 00:16:58.664
Museums are like, "Meh..."
00:16:58.664 --> 00:17:01.471
Our primary goal
is to put objects on display,
00:17:01.471 --> 00:17:04.141
and if it plays well with other people,
that's a side benefit.
00:17:04.141 --> 00:17:05.931
But it's not a primary thing that they do.
00:17:05.931 --> 00:17:08.031
So, that's why it's complicated
to work with museums.
00:17:08.031 --> 00:17:11.161
You need to map their vocabulary,
which might be a mish-mash
00:17:11.161 --> 00:17:14.576
of famous vocabularies,
like Getty AAT, and other things.
00:17:14.576 --> 00:17:17.911
But usually, it's to serve
their exact needs at their museum.
00:17:17.911 --> 00:17:19.591
And that's what's challenging.
00:17:19.591 --> 00:17:21.091
And I see a lot of heads nodding,
00:17:21.091 --> 00:17:23.161
so you've probably seen this a lot
at these museums.
00:17:23.161 --> 00:17:25.429
So, I'll move on to show you
how this actually is done.
00:17:25.429 --> 00:17:26.749
Oh, go ahead.
00:17:26.749 --> 00:17:28.711
(man 5) How do you
bring people in to collaborate,
00:17:28.711 --> 00:17:31.595
and put some Q codes into your database?
00:17:31.595 --> 00:17:32.971
(Andrew) How do you-- I'm sorry?
00:17:32.971 --> 00:17:35.038
(man 5) How do you bring...
collaborate people?
00:17:35.038 --> 00:17:38.290
(Andrew) Ah, so for this,
these are projects we just go to,
00:17:38.780 --> 00:17:41.750
for better or for worse,
like Facebook chat groups that we know
00:17:41.750 --> 00:17:43.007
are active in these areas.
00:17:43.007 --> 00:17:45.685
Like Sum of All Paintings,
Wiki Loves Fashion--
00:17:45.685 --> 00:17:47.918
which is a group
of maybe five or seven folks.
00:17:48.548 --> 00:17:50.759
But we need a better way
to get this out to folks
00:17:50.759 --> 00:17:52.339
so we get more collaborators on this.
00:17:52.339 --> 00:17:53.879
This doesn't scale well, right now.
00:17:53.879 --> 00:17:56.089
But for small groups,
it works pretty well.
00:17:56.108 --> 00:17:57.568
I'm open to ideas.
00:17:57.568 --> 00:17:59.619
(man 5) [inaudible]
00:17:59.619 --> 00:18:01.669
(Andrew) Oh yeah. Please come on up.
00:18:01.669 --> 00:18:02.948
If folks want to come up here,
00:18:02.948 --> 00:18:05.357
there's a little more room
in the aisle right here.
00:18:06.057 --> 00:18:09.629
So, we are utilizing Python
for this mostly.
00:18:09.774 --> 00:18:13.354
If you don't know, there is
a Python notebook system
00:18:13.354 --> 00:18:14.884
that WMFLabs has.
00:18:14.884 --> 00:18:17.345
So, you can actually go on
and start playing with this.
00:18:17.345 --> 00:18:19.624
So, it's pretty easy
to generate a lot of stuff
00:18:19.624 --> 00:18:21.401
if you know some of the code that's there.
00:18:21.401 --> 00:18:22.455
[inaudible], yeah.
00:18:22.485 --> 00:18:23.922
(woman 2) Why do you put everything
00:18:23.922 --> 00:18:27.821
into Wikidata,
and not into your own Wikibase?
00:18:29.401 --> 00:18:31.127
(Andrew) If you're using
your own Wikibase?
00:18:31.127 --> 00:18:33.741
(woman 2) Yeah. Why don't you
use your own Wikibase
00:18:33.741 --> 00:18:35.990
and then go to [inaudible]?
00:18:35.990 --> 00:18:38.390
(Andrew) That's its own ball of--
00:18:38.390 --> 00:18:41.630
I don't want to maintain
my own Wikibase at this point. (laughs)
00:18:42.190 --> 00:18:44.400
If I can avoid doing
the Wikibase maintenance,
00:18:44.400 --> 00:18:45.760
I would not do it.
00:18:46.530 --> 00:18:48.080
(man 6) Would you like a Wikibase?
00:18:48.080 --> 00:18:50.050
(Andrew) We could. It's possible.
00:18:50.050 --> 00:18:54.154
(man 7) But again,
what they use [inaudible]
00:18:54.154 --> 00:18:59.868
about 2,000, 8,000, 10,000,
of 400,000 digital [inaudible].
00:18:59.868 --> 00:19:04.300
So that's only 2.5%,
00:19:04.300 --> 00:19:08.782
[inaudible]
00:19:08.782 --> 00:19:12.601
(Andrew) So, I'd say, solve it for 1,500,
then scale up to 150,000.
00:19:12.601 --> 00:19:14.428
So, we're trying to solve it
00:19:14.428 --> 00:19:16.876
for the best
well-known objects, and then--
00:19:16.876 --> 00:19:19.875
(man 7) When do you think
that will happen?
00:19:20.855 --> 00:19:25.788
I understand that those are people
that shouldn't go onto Wikidata.
00:19:25.788 --> 00:19:29.856
So you go to Commons
or your own Wikibase solution,
00:19:29.856 --> 00:19:31.695
not to be a [inaudible]--
00:19:31.695 --> 00:19:34.588
(Andrew) Right. That's why we're going
with the 2,000 and 8,000.
00:19:34.588 --> 00:19:37.460
We're pretty confident
these are highly notable objects
00:19:37.460 --> 00:19:39.085
that deserve to be in Wikidata.
00:19:39.085 --> 00:19:40.465
Beyond that, it's debatable.
00:19:40.465 --> 00:19:44.265
So, that's why we're not
vacuuming up 400,000 things in one shot.
00:19:44.265 --> 00:19:48.936
We're starting with the notable 2,000,
then the notable 8,000, then we'll talk after that.
00:19:49.515 --> 00:19:52.775
So, these are the two lines of code
that do most of the work here.
00:19:52.775 --> 00:19:54.217
So, even if you don't know Python,
00:19:54.217 --> 00:19:56.146
it's actually not that bad
if you look at this.
00:19:56.146 --> 00:19:58.105
There's a read_csv function.
00:19:58.105 --> 00:20:00.015
You're taking the crosswalk URL,
00:20:00.015 --> 00:20:02.336
basically, the URL
of that Google Spreadsheet.
00:20:02.336 --> 00:20:04.875
You're grabbing the sheet
that's called "Object Name",
00:20:04.875 --> 00:20:06.685
and you're basically creating
a data structure
00:20:06.685 --> 00:20:08.165
that has the Object Name and the QID.
00:20:08.165 --> 00:20:09.645
That's it. That's all you're doing.
00:20:09.645 --> 00:20:11.655
Just pulling that into the Python code.
00:20:11.655 --> 00:20:15.914
Then, you're actually matching
whatever the entity's name is,
00:20:15.914 --> 00:20:17.754
and then looking up the QID.
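A minimal sketch of the crosswalk step just described, assuming the Google Sheet has been exported as CSV with two columns, `Object Name` and `QID`. The speaker mentions a `read_csv` call (most likely pandas, reading the sheet URL); this version uses the standard-library `csv` module on an inline sample so it runs offline. The sample rows, the QIDs in them, and the helper names are illustrative assumptions; check any QID on Wikidata before reusing it.

```python
import csv
import io

# Illustrative crosswalk exported as CSV (assumed column names and values).
CROSSWALK_CSV = """Object Name,QID
Painting,Q3305213
Sculpture,Q860861
"""

def load_crosswalk(text):
    """Build a dict mapping a museum's Object Name value to a Wikidata QID."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["Object Name"].strip(): row["QID"].strip() for row in reader}

def lookup_qid(crosswalk, object_name):
    """Return the QID for an object name, or None when the crosswalk has no match."""
    return crosswalk.get(object_name.strip())

crosswalk = load_crosswalk(CROSSWALK_CSV)
print(lookup_qid(crosswalk, "Painting"))  # Q3305213
print(lookup_qid(crosswalk, "Textile"))   # None: unmatched, needs a new crosswalk row
```

Returning `None` instead of raising keeps unmatched object names visible, so they can be added to the crosswalk rather than silently dropped.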
00:20:17.754 --> 00:20:21.689
Okay, so, this is just to tell you
that it's not super hard.
00:20:21.689 --> 00:20:24.234
The code is available right there,
if you want to look at it.
00:20:24.234 --> 00:20:26.474
These two lines of code
take a little while to write
00:20:26.474 --> 00:20:29.524
when you're starting from scratch,
00:20:29.524 --> 00:20:30.904
but once you have an example,
00:20:30.904 --> 00:20:34.484
it's pretty darn easy to plug in
your own data set, your own crosswalk,
00:20:34.484 --> 00:20:36.844
to generate the QuickStatements.
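Continuing the same idea, the sketch below turns matched records into QuickStatements V1 commands (tab-separated, one command per line, `CREATE` followed by `LAST` lines). The record fields, the choice of P31 (instance of) and P195 (collection), and Q160236 as the Met's item are assumptions for illustration; a real batch would also carry inventory numbers, creators, and a duplicate check.

```python
# Illustrative crosswalk: museum Object Name -> Wikidata class QID (assumed values).
CROSSWALK = {"Painting": "Q3305213", "Sculpture": "Q860861"}
MET_QID = "Q160236"  # assumed QID for The Metropolitan Museum of Art

def quickstatements_for(record, crosswalk=CROSSWALK, collection_qid=MET_QID):
    """Return QuickStatements V1 lines that create one item, or [] if unmatched."""
    class_qid = crosswalk.get(record["object_name"])
    if class_qid is None:
        return []  # no crosswalk match: leave this record for manual review
    # Crude safeguard: swap double quotes so they can't close the quoted label early.
    title = record["title"].replace('"', "'")
    return [
        "CREATE",
        f'LAST\tLen\t"{title}"',          # English label
        f"LAST\tP31\t{class_qid}",        # instance of
        f"LAST\tP195\t{collection_qid}",  # collection
    ]

records = [
    {"title": "Wheat Field with Cypresses", "object_name": "Painting"},
    {"title": "Unmatched example", "object_name": "Textile"},
]
batch = [line for r in records for line in quickstatements_for(r)]
print("\n".join(batch))
```

Paste the printed lines into QuickStatements as a V1 batch; unmatched records produce nothing, which keeps guesses out of Wikidata.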
00:20:36.844 --> 00:20:38.525
So, I've done a lot of the work already,
00:20:38.525 --> 00:20:41.385
and I invite you
to steal the code and try it.
00:20:42.365 --> 00:20:44.936
So, when it comes to images,
it's a little more challenging.
00:20:44.936 --> 00:20:48.215
So, at this point, Pattypan
is probably your best bet.
00:20:48.215 --> 00:20:51.385
Pattypan is a
spreadsheet-oriented tool.
00:20:51.385 --> 00:20:54.855
You fill in the metadata, you point
to the local file on your computer,
00:20:54.855 --> 00:20:57.435
and it uploads it to Commons
with all that information,
00:20:57.435 --> 00:21:02.125
or another alternative
is if you set P4765 to a URL--
00:21:03.105 --> 00:21:06.195
that's the "Commons compatible image
available at URL" property--
00:21:06.195 --> 00:21:08.544
Maarten Dammers has a bot,
at least for paintings,
00:21:08.544 --> 00:21:12.020
that will just swoop through and say,
"Oh, we don't have this image.
00:21:12.020 --> 00:21:15.113
Here's a Commons compatible one.
00:21:15.113 --> 00:21:17.709
Why don't I pull it from that site
and put it into Commons?"
00:21:17.709 --> 00:21:18.995
And that's what his bot does.
00:21:18.995 --> 00:21:20.733
So, you can actually take
a look at his bot
00:21:20.733 --> 00:21:24.102
and modify it for your own purposes.
It's another alternative
00:21:24.102 --> 00:21:28.061
that doesn't require you
to do some spreadsheet work there.
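With the P4765 route, each statement is a single QuickStatements V1 line with the URL as a quoted string value. A minimal sketch; the item and URL are placeholders (Q4115189 is the Wikidata sandbox item) and the function name is hypothetical. Whether a bot then copies the image to Commons depends on that bot's own criteria.

```python
def p4765_statement(item_qid, image_url):
    """QuickStatements V1 line setting P4765 (Commons compatible image available at URL)."""
    if not item_qid.startswith("Q") or not item_qid[1:].isdigit():
        raise ValueError(f"not a QID: {item_qid!r}")
    if not image_url.startswith(("https://", "http://")):
        raise ValueError(f"not an http(s) URL: {image_url!r}")
    # URL values are written as quoted strings in QuickStatements V1.
    return f'{item_qid}\tP4765\t"{image_url}"'

print(p4765_statement("Q4115189", "https://example.org/open-access/image.jpg"))
```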
00:21:28.061 --> 00:21:30.452
If you've heard
of the GLAMwiki Toolset,
00:21:30.452 --> 00:21:32.552
it's effectively at the end
of its life at this point.
00:21:33.322 --> 00:21:37.362
It hasn't been updated, and even the folks
who have been working with it in the past
00:21:37.362 --> 00:21:39.332
have said Pattypan
is probably your best bet.
00:21:39.332 --> 00:21:41.722
Has anyone used GWT these days?
00:21:41.741 --> 00:21:43.591
A few of you, a little bit.
00:21:43.591 --> 00:21:45.161
It's just not being further developed,
00:21:45.161 --> 00:21:47.852
and it's not compatible with a lot
of the authentication protocols
00:21:47.852 --> 00:21:49.280
we use now.
00:21:49.280 --> 00:21:52.928
Okay. So, right now, we have basic
metadata added to Wikidata,
00:21:52.928 --> 00:21:54.997
with pretty good results from the Met,
00:21:54.997 --> 00:21:58.117
and we have a Python script here
to also analyze that.
00:21:58.117 --> 00:22:00.307
You're welcome to steal
some of that code, as well.
00:22:00.307 --> 00:22:02.817
So, this is what we are showing
to the Met folks, now.
00:22:02.817 --> 00:22:06.087
We actually have Listeria lists
that are running
00:22:06.087 --> 00:22:07.627
to show all the inventory
00:22:07.627 --> 00:22:10.967
and all the information
that we have in Wikidata.
00:22:10.967 --> 00:22:15.612
And I'll show you very quickly
a project that we ran to show folks.
00:22:15.612 --> 00:22:18.547
So, what are the benefits of adding
your collections to Wikidata?
00:22:18.547 --> 00:22:21.917
One is to use AI for image classification,
00:22:21.917 --> 00:22:24.787
to actually help train
a machine learning model
00:22:24.787 --> 00:22:29.447
with all the Met's images and keywords,
and let that be an engine for other folks
00:22:29.447 --> 00:22:32.047
to recognize content.
00:22:32.047 --> 00:22:36.408
So, this is a hackathon that we had
with MIT and Microsoft last year.
00:22:36.408 --> 00:22:39.238
The way this works, is we have
the paintings from the Met,
00:22:39.238 --> 00:22:40.277
and we have the keywords
00:22:40.277 --> 00:22:43.157
that they actually paid a crew
to spend six months
00:22:43.157 --> 00:22:46.937
adding by hand
to all the artworks.
00:22:47.567 --> 00:22:50.077
We ingested that
into an AI system right here,
00:22:50.077 --> 00:22:51.367
and then, what we did was say,
00:22:51.367 --> 00:22:55.428
"Let's feed in new images that
this AI/ML system had never seen before,
00:22:55.428 --> 00:22:56.747
and see what comes out."
00:22:56.747 --> 00:23:00.037
And the problem is that the results
are pretty good,
00:23:00.037 --> 00:23:02.267
but maybe only 60% accurate.
00:23:02.267 --> 00:23:04.797
And for most folks,
60% accurate is garbage.
00:23:04.797 --> 00:23:08.627
How do I get the good 60%
out of this pile of stuff?
00:23:08.627 --> 00:23:11.127
The good news is that our community
knows how to do that.
00:23:11.127 --> 00:23:13.157
We can actually feed this
into a Wikidata game
00:23:13.157 --> 00:23:14.997
and get the good stuff out of that.
00:23:14.997 --> 00:23:16.228
That's basically what we did.
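The sifting step can be pictured like this: the classifier proposes (artwork, depicted thing) pairs, players answer yes or no, and only pairs with a "yes" become P180 (depicts) statements. This is a hypothetical sketch of that filter, not the code behind the Wikidata game used at the Met; all QIDs are placeholders.

```python
from typing import NamedTuple

class Suggestion(NamedTuple):
    artwork: str    # QID of the artwork
    depicted: str   # QID the classifier thinks is depicted
    score: float    # classifier confidence, 0..1

def confirmed_depicts(suggestions, judgments):
    """Keep only suggestions a player explicitly confirmed; emit QuickStatements V1 lines.

    judgments maps (artwork, depicted) -> True (yes) or False (no).
    Pairs nobody has judged yet are skipped, never assumed correct.
    """
    lines = []
    for s in sorted(suggestions, key=lambda x: x.score, reverse=True):
        if judgments.get((s.artwork, s.depicted)) is True:
            lines.append(f"{s.artwork}\tP180\t{s.depicted}")
    return lines

suggestions = [
    Suggestion("Q1001", "Q2001", 0.91),  # placeholder QIDs
    Suggestion("Q1001", "Q2002", 0.64),
    Suggestion("Q1002", "Q2001", 0.58),
]
judgments = {("Q1001", "Q2001"): True, ("Q1001", "Q2002"): False}
print(confirmed_depicts(suggestions, judgments))
```

The design choice is that the classifier only decides what gets asked; a human "yes" decides what gets written, which is how a roughly 60%-accurate model can still yield usable statements.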
00:23:16.228 --> 00:23:17.647
So, this is the Wikidata game--
00:23:17.647 --> 00:23:19.757
you'll notice this is
Magnus' interface right there--
00:23:19.757 --> 00:23:21.182
being played at the Met Museum,
00:23:21.182 --> 00:23:22.207
in the lobby.
00:23:22.207 --> 00:23:25.437
We actually had folks at a cocktail party
drinking champagne
00:23:25.437 --> 00:23:27.427
and hitting buttons on the screen.
00:23:27.427 --> 00:23:31.048
Hopefully, accurately. (chuckles)
00:23:31.048 --> 00:23:33.444
(applause)
00:23:33.444 --> 00:23:35.116
We had journalists, curators,
00:23:35.116 --> 00:23:37.506
we had some board members
from the Met there as well.
00:23:37.506 --> 00:23:38.810
And this was great.
00:23:38.810 --> 00:23:40.061
No login or anything.
00:23:40.061 --> 00:23:42.106
(lowers voice) We created
an account just for this.
00:23:42.106 --> 00:23:44.117
So, they just hit yes-no-yes-no.
00:23:44.117 --> 00:23:45.256
This is great.
00:23:45.256 --> 00:23:47.526
You saw this, it said,
"Is there a tree in this picture?"
00:23:47.526 --> 00:23:49.148
You don't have to train anyone on this.
00:23:49.148 --> 00:23:52.213
You just hit yes--
depicts a tree, not depicted.
00:23:52.213 --> 00:23:55.910
I even had my eight-year-old boys
play this game with a finger tap.
00:23:56.540 --> 00:24:00.047
And we also created a little tool
that showed all the depictions going by
00:24:00.047 --> 00:24:01.505
so people could see them.
00:24:03.189 --> 00:24:06.453
It's basically about
how you sift good from bad.
00:24:06.453 --> 00:24:08.350
This is where the Wikimedia
community comes in,
00:24:08.350 --> 00:24:11.034
doing something no other entity could ever do.
00:24:12.084 --> 00:24:15.052
So, in the first few months
that we had this,
00:24:15.052 --> 00:24:19.017
we got over 7,000 judgments,
resulting in about 5,000 edits.
00:24:19.912 --> 00:24:22.227
We did really well on tree,
boat, flower, horse,
00:24:22.227 --> 00:24:24.907
things that are in landscape paintings.
00:24:25.146 --> 00:24:27.466
But when you go to things
like distinguishing gender,
00:24:27.466 --> 00:24:29.901
or cats and dogs, not so good, I know.
00:24:29.901 --> 00:24:32.159
Because there's so many different
types of cats and dogs
00:24:32.159 --> 00:24:33.456
in different positions.
00:24:33.456 --> 00:24:36.105
But horses, a lot easier
than cats and dogs.
00:24:36.735 --> 00:24:38.742
But also, I should note
that the Wikimedia Foundation
00:24:38.742 --> 00:24:42.697
is now looking into doing
image recognition on Commons uploads
00:24:42.697 --> 00:24:46.368
to do these suggestions as well,
which is an awesome development.
00:24:46.667 --> 00:24:49.627
Okay, so, dashboards.
00:24:50.750 --> 00:24:53.358
Let's just show you
some of these dashboards.
00:24:53.418 --> 00:24:55.097
Folks you work with love dashboards.
00:24:55.097 --> 00:24:56.817
They just want to see stats.
00:24:56.817 --> 00:24:58.797
So, we have them, like BaGLAMa.
00:24:58.797 --> 00:25:00.787
We have InteGraality.
00:25:00.787 --> 00:25:02.767
Is JeanFred here?
00:25:03.447 --> 00:25:06.247
I think this is a very new thing
relative to last WikidataCon.
00:25:06.247 --> 00:25:08.327
We actually have a tool
which will create
00:25:08.327 --> 00:25:10.967
this property completeness
chart right here.
00:25:10.967 --> 00:25:12.987
So, it's called InteGraality,
with two A's.
00:25:13.206 --> 00:25:15.526
It's on that big chart
that I showed you before.
00:25:15.526 --> 00:25:19.086
And it can just autogenerate
how complete your items are
00:25:19.086 --> 00:25:21.036
in any set, which is really cool.
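Roughly, each cell of a property completeness table is the share of items in a set that have a given property. A minimal sketch of that percentage, assuming each item's property IDs have already been fetched; it is not InteGraality's implementation, and the item data is made up (P170 is creator, P18 image, P571 inception).

```python
def property_completeness(items, properties):
    """Percentage of items in the set that carry each property."""
    total = len(items)
    result = {}
    for prop in properties:
        have = sum(1 for props in items.values() if prop in props)
        result[prop] = round(100.0 * have / total, 1) if total else 0.0
    return result

# Made-up set of four artworks and the properties each one has.
items = {
    "Q1": {"P170", "P571", "P18"},
    "Q2": {"P170", "P571"},
    "Q3": {"P170", "P18"},
    "Q4": {"P31"},
}
print(property_completeness(items, ["P170", "P18", "P571"]))
```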
00:25:21.566 --> 00:25:23.771
So, we can see that paintings
are by far the highest,
00:25:23.771 --> 00:25:26.057
we have sculptures, drawings, photographs.
00:25:26.121 --> 00:25:29.322
And then, they also like to see
what are the most popular artworks
00:25:29.322 --> 00:25:31.148
in the Wikisphere?
00:25:31.148 --> 00:25:33.417
So, just by looking at the sitelinks
in Wikidata,
00:25:33.417 --> 00:25:37.781
you can rank
all these different artworks there.
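Ranking by sitelinks is one short SPARQL query against the Wikidata Query Service. The sketch below only builds the query and a request URL for the public endpoint; it assumes P195 (collection) with Q160236 as the Met's item, and the function names are hypothetical. Send the request with any HTTP client, ideally with a descriptive User-Agent.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def most_linked_query(collection_qid="Q160236", limit=20):
    """SPARQL: artworks in a collection, ranked by their number of sitelinks."""
    return f"""SELECT ?item ?itemLabel ?sitelinks WHERE {{
  ?item wdt:P195 wd:{collection_qid} ;
        wikibase:sitelinks ?sitelinks .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?sitelinks)
LIMIT {int(limit)}"""

def wdqs_request_url(query):
    """GET URL asking the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

q = most_linked_query()
print(q)
print(wdqs_request_url(q))
```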
00:25:39.568 --> 00:25:41.926
Another thing they'd like to see
00:25:41.926 --> 00:25:46.879
is who the most frequent creators
of Met artworks are,
00:25:46.879 --> 00:25:49.193
and what the most commonly
depicted things are.
00:25:49.193 --> 00:25:51.982
So, these are very easy
to generate in SPARQL,
00:25:51.982 --> 00:25:54.622
you could look at it right there,
using bubble graphs.
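The "most frequent creators" and "most commonly depicted" charts can both come from one grouped query using the Query Service's `#defaultView:BubbleChart` hint, swapping P170 (creator) for P180 (depicts). A sketch under the same assumptions as before (the Met as Q160236, hypothetical helper names); opening the returned link loads the query in the WDQS editor.

```python
import re
from urllib.parse import quote

def top_values_query(prop, collection_qid="Q160236", limit=50):
    """Count the values of one property across a collection, for a bubble chart."""
    if not re.fullmatch(r"P\d+", prop):
        raise ValueError(f"not a property ID: {prop!r}")
    return f"""#defaultView:BubbleChart
SELECT ?value ?valueLabel (COUNT(?item) AS ?count) WHERE {{
  ?item wdt:P195 wd:{collection_qid} ;
        wdt:{prop} ?value .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?value ?valueLabel
ORDER BY DESC(?count)
LIMIT {int(limit)}"""

def wdqs_editor_link(query):
    """Link that opens the query in the Query Service web UI."""
    return "https://query.wikidata.org/#" + quote(query)

creators = top_values_query("P170")  # creator
depicted = top_values_query("P180")  # depicts
print(wdqs_editor_link(depicted))
```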
00:25:54.673 --> 00:25:56.991
Then, for the place of birth
of the most prominent artists,
00:25:56.991 --> 00:25:58.814
we have a chart there, as well.
00:25:58.814 --> 00:26:01.142
So, structured data on Commons.
00:26:01.142 --> 00:26:04.301
I just want to show you very briefly
in case you can't get to Sandra's session,
00:26:04.301 --> 00:26:06.226
but you definitely should go
to Sandra's session.
00:26:06.226 --> 00:26:10.693
You actually can search in Commons
for a specific Wikibase statement.
00:26:11.353 --> 00:26:15.333
I don't always remember the syntax,
but you have to burn it into your brain
00:26:15.333 --> 00:26:19.893
and say, it's haswbstatement:P1343=
00:26:19.893 --> 00:26:22.695
whatever-- basically, your last
two parts of the triple.
00:26:22.695 --> 00:26:26.162
I always get haswb and wbhas mixed up.
00:26:26.162 --> 00:26:28.183
I always get the colon
and the equals mixed up.
00:26:28.183 --> 00:26:32.022
So just do it once, remember it,
and you'll get the hang of it.
00:26:32.022 --> 00:26:34.772
But simple searches are much faster
than SPARQL queries.
00:26:34.772 --> 00:26:36.478
So, if you can just look
for one statement,
00:26:36.478 --> 00:26:38.392
boom, you'll get the results.
00:26:39.181 --> 00:26:43.711
So, with searches like this, you can look
symbolically, or semantically, for
00:26:43.711 --> 00:26:47.511
things that depict
the Met museum, for example.
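The same keyword also works through the MediaWiki search API on Commons, which is handy from a script. A minimal sketch that only builds the request URL; it assumes P180 (depicts) with Q160236 as the Met's item (the spoken example used P1343) and restricts results to the File namespace (6). The function name is hypothetical.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"

def haswbstatement_search_url(prop, value_qid, limit=50):
    """API URL for files whose structured data has the statement prop=value_qid."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"haswbstatement:{prop}={value_qid}",  # colon first, then equals
        "srnamespace": 6,  # File: namespace
        "srlimit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)

print(haswbstatement_search_url("P180", "Q160236"))
```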
00:26:48.051 --> 00:26:50.051
So, finally, community campaigns.
00:26:50.051 --> 00:26:51.681
Richard has been a pioneer in this area.
00:26:51.681 --> 00:26:54.071
So, once you have the Wikidata items,
00:26:54.071 --> 00:26:57.050
they can actually assist
in creating Wikipedia articles.
00:26:57.050 --> 00:26:59.785
So, Richard, why don't you tell us
a little bit about the Mbabel tool
00:26:59.785 --> 00:27:01.009
that you created for this.
00:27:01.009 --> 00:27:03.192
(Richard) Hi, can I get this on?
00:27:04.649 --> 00:27:06.109
(Andrew) Oh, use [Joisey's].
00:27:06.109 --> 00:27:08.319
(Richard) It's on, now. I'm good.
00:27:08.949 --> 00:27:10.769
So, we had all this information
on Wikidata.
00:27:10.769 --> 00:27:13.729
[inaudible] browsing data
on our evenings and weekends
00:27:13.729 --> 00:27:15.649
to learn about art-- not everyone does.
00:27:15.649 --> 00:27:19.319
We have quite a bit more people
[inaudible] Wikipedia,
00:27:19.319 --> 00:27:22.260
so how do we get this information
from Wikidata to Wikipedia?
00:27:22.260 --> 00:27:25.289
One of the ways of doing this
is this so-called Mbabel,
00:27:25.289 --> 00:27:28.069
which was developed with the help
of a lot of people in [inaudible].
00:27:28.069 --> 00:27:30.639
People like Martin and others.
00:27:31.689 --> 00:27:34.659
So, basically, it takes
some basic art information,
00:27:34.659 --> 00:27:37.688
and uses it to populate
a Wikipedia article.
00:27:37.688 --> 00:27:40.241
So, who created this work,
who the artist was,
00:27:40.241 --> 00:27:42.313
when it was created, et cetera.
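To picture what "populating" an article means, here is a hypothetical sketch that turns a few Wikidata-derived fields into a wikitext stub. It is not Mbabel's actual template (which lives on-wiki); the `Infobox artwork` parameter names and the exact stub layout are assumptions for illustration.

```python
def article_skeleton(work):
    """Build a wikitext stub from basic artwork fields (hypothetical layout)."""
    lines = [
        "{{Infobox artwork",
        f"| title  = {work['title']}",
        f"| artist = {work['artist']}",
        f"| year   = {work['year']}",
        f"| museum = {work['museum']}",
        "}}",
        f"'''{work['title']}''' is a work by {work['artist']}, created in {work['year']}. "
        f"It is in the collection of the {work['museum']}.",
        "",
        "== Description ==",
        "<!-- Add a description of the work here. -->",
    ]
    return "\n".join(lines)

stub = article_skeleton({
    "title": "The Death of Socrates",
    "artist": "Jacques-Louis David",
    "year": "1787",
    "museum": "Metropolitan Museum of Art",
})
print(stub)
```

The point of the skeleton is the one made in the talk: an editor starts from a filled-in structure rather than a blank page.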
00:27:42.313 --> 00:27:44.626
The nice thing about this
is it can generate works.
00:27:44.626 --> 00:27:46.210
We started with English Wikipedia,
00:27:46.210 --> 00:27:48.608
but it's been developed
in other languages.
00:27:48.608 --> 00:27:50.938
So, on Portuguese Wikipedia,
our Brazilian friends
00:27:50.938 --> 00:27:53.508
have done a lot of work, taking it
to realms beyond art,
00:27:53.508 --> 00:27:57.283
to stuff like elections
and political work as well.
00:27:57.283 --> 00:28:01.128
And the nice thing about this
is we can query on Wikidata--
00:28:01.758 --> 00:28:06.928
so different artists-- so for example,
we've done projects with Women in Red,
00:28:06.928 --> 00:28:08.472
looking at women artists.
00:28:08.472 --> 00:28:12.753
Projects related to Wiki Loves Pride,
looking at LGBT-identified artists,
00:28:12.753 --> 00:28:14.073
African Diaspora Artists,
00:28:14.073 --> 00:28:16.493
and a lot of different groups
and time periods,
00:28:16.493 --> 00:28:19.293
different collections,
and also looking at articles
00:28:19.293 --> 00:28:22.213
that have been and haven't been
translated to different languages.
00:28:22.213 --> 00:28:24.923
So, for example, all of the articles that haven't
been translated into Arabic yet.
00:28:24.923 --> 00:28:28.329
You can find interesting articles,
maybe ones relevant to a particular culture,
00:28:28.329 --> 00:28:30.459
that haven't been translated
into that language yet.
We actually have a number of works
in the Met collection
00:28:32.659 --> 00:28:35.199
that are in Wikipedias
that aren't in English yet,
00:28:35.199 --> 00:28:37.259
because it's a global collection.
00:28:37.769 --> 00:28:40.449
So, there are a lot of ways
of creating Wikipedia content
00:28:40.449 --> 00:28:44.709
that is driven by these Wikidata items,
and hopefully, we can spread it around.
00:28:44.709 --> 00:28:47.549
And maybe that can also
help improve the Wikidata items
00:28:47.549 --> 00:28:49.529
in the future, as well.
00:28:49.529 --> 00:28:52.403
(Andrew) And there are a number of folks
here using Mbabel already, right?
00:28:52.403 --> 00:28:54.124
Who's using Mbabel
in the room? Brazilians?
00:28:54.124 --> 00:28:58.690
And also, if [Armin] is here,
we have our winner
00:28:59.165 --> 00:29:03.146
of the Wikipedia Asian Month
and Wiki Loves Pride contests.
00:29:03.146 --> 00:29:05.720
So, thank you for joining,
and congratulations.
00:29:06.493 --> 00:29:09.993
We'll have another Wikipedia Asian Month
campaign in November.
00:29:10.173 --> 00:29:13.383
The way I like to describe it
[inaudible]
00:29:13.383 --> 00:29:15.443
It doesn't give you a blank page.
00:29:15.443 --> 00:29:16.863
It gives you the skeleton,
00:29:16.863 --> 00:29:18.962
which is really a much better
user experience
00:29:18.962 --> 00:29:21.472
for edit-a-thons and beginners.
00:29:21.472 --> 00:29:23.526
So, it's a lot of great work
that Richard has done,
00:29:23.526 --> 00:29:25.841
and people are building on it,
which is awesome.
00:29:25.906 --> 00:29:29.066
(woman 3) [inaudible] for some of them,
which is really nice.
00:29:29.066 --> 00:29:30.376
Yeah, exactly.
00:29:30.376 --> 00:29:32.956
(woman 3) [inaudible]
00:29:32.956 --> 00:29:35.815
Right. We should have put a URL here.
00:29:35.815 --> 00:29:38.196
(man 8) [inaudible]
00:29:38.196 --> 00:29:40.055
Oh, that's right.
We have the link right here.
00:29:40.055 --> 00:29:43.725
So if you click-- this is a Listeria list,
it's autogenerating all that for you.
00:29:43.725 --> 00:29:46.205
And then, you click on the red link,
it'll create the skeleton,
00:29:46.205 --> 00:29:47.491
which is pretty cool.
00:29:47.491 --> 00:29:49.172
Alright, we're on the final stretch here.
00:29:49.172 --> 00:29:51.990
The tool that we're going
to be announcing--
00:29:51.990 --> 00:29:55.047
well, we announced it a few weeks ago,
but only to a small set of folks,
00:29:55.047 --> 00:29:57.038
but we're making a big splash here,
00:29:57.038 --> 00:29:59.345
is the depiction tool
that we just created.
00:29:59.345 --> 00:30:05.298
Wikipedia has shown that volunteer
contributors can add a lot of these things
00:30:05.298 --> 00:30:06.681
that museums can't.
00:30:06.681 --> 00:30:10.263
So, what if we created a tool
that could let you enrich
00:30:10.263 --> 00:30:15.907
the metadata about artworks
in terms of the depiction information?
00:30:15.907 --> 00:30:19.477
And what we did was we applied
for a grant from the Knight Foundation,
00:30:19.477 --> 00:30:22.684
and we created this tool--
and is Edward here?
00:30:22.760 --> 00:30:26.590
Edward is our wonderful developer
who in like a month, said,
00:30:26.590 --> 00:30:28.050
"Okay, here's a prototype,"
00:30:28.050 --> 00:30:33.103
after we gave him a specification.
And it's pretty cool.
00:30:33.900 --> 00:30:35.849
- So what we can do--
- (applause)
00:30:35.849 --> 00:30:37.169
Thanks, Edward.
00:30:37.569 --> 00:30:39.269
We're working within collections of items.
00:30:39.269 --> 00:30:41.629
So, what we do, is we can
bring up a page like this.
00:30:41.629 --> 00:30:44.789
It's no longer looking
at a Wikidata item with a tiny picture.
00:30:44.789 --> 00:30:48.484
If we're working with what's depicted
in the image, we want the picture big.
00:30:48.484 --> 00:30:51.201
And we don't really have tools
that work with big images.
00:30:51.201 --> 00:30:53.348
We have tools that deal
with text and typing.
00:30:53.348 --> 00:30:56.715
So one of the big things that Edward did
was make a big version of the picture,
00:30:56.715 --> 00:30:58.739
scrape whatever he could
from the GLAM organization's
00:30:58.739 --> 00:31:00.633
object page,
and give you context.
00:31:00.633 --> 00:31:02.773
I can see dogs, children, wigwam.
00:31:02.773 --> 00:31:05.782
These are things that direct the user
to add meaningful information.
00:31:05.782 --> 00:31:09.024
You have some metadata
that's scraped from the site, too.
00:31:09.024 --> 00:31:11.868
Teepee, Comanche--
oh, it's Comanche, not Navajo,
00:31:11.868 --> 00:31:13.556
because I know the object page said that.
00:31:13.556 --> 00:31:15.702
And you can actually start typing
in the field, there.
00:31:15.702 --> 00:31:17.628
And the cool thing is that
it gives you context.
00:31:17.628 --> 00:31:19.566
It doesn't just match anything
to Wikidata,
00:31:19.566 --> 00:31:23.107
it first matches things that have already
been used in other depiction statements.
00:31:23.107 --> 00:31:25.456
Very simple thing,
but what a godsend it is
00:31:25.456 --> 00:31:27.166
for folks who have tried this in the past.
00:31:27.166 --> 00:31:29.116
Don't give me everything
that matches teepee.
00:31:29.116 --> 00:31:33.321
Show me what other paintings
have used teepee in the past.
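That "show me what other paintings have used" behaviour can be approximated by re-ranking text-search hits by how often each candidate is already used as a depicts (P180) value. A hypothetical sketch, not the tool's actual code; the usage counts would come from something like a grouped P180 query, and the QIDs and labels are placeholders.

```python
def rank_depicts_suggestions(hits, usage_counts):
    """Put candidates already used in depicts statements first, most-used first.

    hits: (qid, label) pairs in the order a plain text search returned them.
    usage_counts: qid -> number of artworks already using it as a P180 value.
    Python's sort is stable, so candidates with equal counts keep their search order.
    """
    return sorted(hits, key=lambda hit: -usage_counts.get(hit[0], 0))

hits = [("Q9001", "teepee (band)"), ("Q9002", "tipi"), ("Q9003", "Teepee Creek")]
usage_counts = {"Q9002": 42, "Q9003": 1}
print(rank_depicts_suggestions(hits, usage_counts))
```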
00:31:33.355 --> 00:31:36.175
So, it's interactive, context-driven,
statistics-driven,
00:31:36.175 --> 00:31:37.936
by showing you what's been matched before.
00:31:37.936 --> 00:31:40.336
And the cool thing is once you're done
with that painting,
00:31:40.336 --> 00:31:42.196
you can start to work in other areas.
00:31:42.196 --> 00:31:44.936
You want to work within the same artist,
the collection, location,
00:31:45.876 --> 00:31:47.295
other criteria here.
00:31:47.295 --> 00:31:49.146
And you can even browse
through the collections
00:31:49.146 --> 00:31:51.582
of different organizations,
just work on their paintings.
00:31:51.582 --> 00:31:53.670
So, we wanted people
to not live in Wikidata--
00:31:53.670 --> 00:31:56.307
kind of onesy-twosies with items,
but live in a space
00:31:56.307 --> 00:31:59.232
where you're looking at artworks
in collections that make sense.
00:31:59.683 --> 00:32:01.792
And then, you can actually
look through it visually.
00:32:01.792 --> 00:32:04.237
It kind of looks like Crotos
or these other tools,
00:32:04.237 --> 00:32:07.726
but you can actually live edit
on Wikidata at the same time.
00:32:07.726 --> 00:32:09.104
So, go ahead and try it out.
00:32:09.104 --> 00:32:10.609
We've only had 14 users,
00:32:10.609 --> 00:32:14.667
but we've had 2,100 paintings worked on,
with 5,000 plus depict statements.
00:32:14.667 --> 00:32:16.126
That's pretty good for 14.
00:32:16.126 --> 00:32:18.119
So, multiply that by 10--
00:32:18.119 --> 00:32:20.515
imagine how many more things
we could do with that.
00:32:20.515 --> 00:32:23.797
So, you can go ahead and go
to art.wikidata.link and try out the tool.
00:32:23.797 --> 00:32:26.594
It uses OAuth authentication,
and you're off to the races.
00:32:26.594 --> 00:32:29.187
And it should be very natural
without any kind of training
00:32:29.187 --> 00:32:31.782
to add depiction statements to artworks.
00:32:31.837 --> 00:32:35.170
But you can put in any object.
We don't restrict the object right now.
00:32:35.170 --> 00:32:37.278
So, you could put in any Q number
00:32:38.468 --> 00:32:41.208
to edit this content if you want.
00:32:41.275 --> 00:32:44.645
But we primarily stick with paintings
and 2D artworks, right now.
00:32:46.184 --> 00:32:49.405
Okay. You can actually look
at the recent changes
00:32:49.405 --> 00:32:52.175
and see who's made edits recently to that.
00:32:52.815 --> 00:32:54.855
Okay? Okay, so we're going
to wind it down.
00:32:54.855 --> 00:32:58.386
Ooh, one minute, then we'll do some Q&A.
00:32:58.915 --> 00:33:03.081
So, the final thing that I think
is useful for museum types especially,
00:33:03.081 --> 00:33:07.307
is that there's a very famous author
in the museum world named Nina Simon,
00:33:07.307 --> 00:33:11.204
who likes to talk about
how we go from users,
00:33:11.204 --> 00:33:14.968
or I guess your audience,
contributing stuff to your collections
00:33:14.968 --> 00:33:18.004
to collaborating around content,
to actually being co-creative
00:33:18.004 --> 00:33:19.714
and creating new things.
00:33:19.714 --> 00:33:20.984
And that's always been tough.
00:33:20.984 --> 00:33:24.154
And I'd like to argue that Wikidata
is this co-creative level.
00:33:24.154 --> 00:33:26.914
So, it's not just uploading
a file to Commons,
00:33:26.914 --> 00:33:28.234
which is contributing something.
00:33:28.234 --> 00:33:31.194
It's not just editing an article
with someone else, which is collaborative.
00:33:31.194 --> 00:33:34.833
But we are now seeing these tools
that let you make timelines,
00:33:34.833 --> 00:33:36.133
and graphs, and bubble charts.
00:33:36.133 --> 00:33:38.833
And this is actually the co-creative part
that's really interesting.
00:33:38.833 --> 00:33:40.353
And that's what Wikidata provides you.
00:33:40.353 --> 00:33:42.235
Because suddenly,
it's not language dependent--
00:33:42.235 --> 00:33:45.146
we've got this database
that's got this rich information in it.
00:33:45.946 --> 00:33:48.606
So, it's not just pictures, not just text,
00:33:48.606 --> 00:33:50.522
but it's all this rich multimedia
00:33:50.522 --> 00:33:52.607
that we have the opportunity to work on.
00:33:52.607 --> 00:33:55.851
So, this is just another example
of a connected graph
00:33:55.851 --> 00:33:57.389
that you can take a look at later on,
00:33:57.389 --> 00:33:59.860
this time showing
The Death of Socrates,
00:33:59.860 --> 00:34:02.312
and the different themes
around that painting.
00:34:03.252 --> 00:34:05.653
And it's really easy
to make this graph yourself.
00:34:05.653 --> 00:34:08.172
So again, another scary graphic
that only makes sense
00:34:08.172 --> 00:34:09.822
for Wikidata folks, like you.
00:34:09.822 --> 00:34:13.682
You just give it a list of Wikidata items,
and it'll do the rest, that's it.
00:34:14.102 --> 00:34:15.662
You give it the list.
00:34:15.705 --> 00:34:17.664
Keep all this code the same.
00:34:17.664 --> 00:34:21.364
So, fortunately, Martin and Lucas
helped write all this code here.
00:34:21.364 --> 00:34:23.864
Just give it a list of items
and the magic will happen.
00:34:23.864 --> 00:34:25.624
Hopefully, it won't blow up your computer,
00:34:25.624 --> 00:34:28.755
as long as you're putting in
a reasonable number of items there.
00:34:28.755 --> 00:34:31.593
But as long as you have the screen space,
it'll draw the graph,
00:34:31.593 --> 00:34:33.283
which is pretty darn cool.
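The "give it a list of items" pattern amounts to dropping a `VALUES` clause into a query that uses the Query Service's `#defaultView:Graph` view, which draws edges from `?item` to `?linkTo`. This is a hypothetical sketch, not the query shown in the talk; linking through P180 (depicts) is an assumption, the QIDs are placeholders, and depicted nodes may appear without labels.

```python
import re

def graph_query(qids, link_prop="P180"):
    """Build a WDQS graph-view query from a list of item IDs."""
    for qid in qids:
        if not re.fullmatch(r"Q\d+", qid):
            raise ValueError(f"not a QID: {qid!r}")
    if not re.fullmatch(r"P\d+", link_prop):
        raise ValueError(f"not a property ID: {link_prop!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return f"""#defaultView:Graph
SELECT ?item ?itemLabel ?linkTo WHERE {{
  VALUES ?item {{ {values} }}
  OPTIONAL {{ ?item wdt:{link_prop} ?linkTo . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""

print(graph_query(["Q1001", "Q1002"]))  # placeholder QIDs
```

Validating the IDs up front keeps a typo from silently producing an empty or malformed query; keeping the list short keeps the rendered graph readable.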
00:34:33.283 --> 00:34:37.223
And then, finally, two tools--
I realized at 2 a.m. last night that
00:34:37.223 --> 00:34:39.744
a few people had said,
"I didn't know about these tools."
00:34:39.744 --> 00:34:41.343
And you should know about these tools.
00:34:41.343 --> 00:34:44.613
So, one is Recoin, which shows you
the relative completeness of an item
00:34:44.613 --> 00:34:46.773
compared to other items
of the same instance.
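Recoin's idea of relative completeness can be approximated by asking which properties most comparable items have that this item lacks. A simplified sketch of that comparison, not Recoin's actual algorithm or thresholds; the 50% cut-off and the sample data are assumptions.

```python
from collections import Counter

def relative_completeness(item_props, peer_props, threshold=0.5):
    """Compare an item with peers of the same class.

    Returns (share of frequent peer properties the item has, sorted missing ones).
    A property counts as frequent if at least `threshold` of peers use it.
    """
    n = len(peer_props)
    counts = Counter(p for props in peer_props for p in set(props))
    frequent = {p for p, c in counts.items() if n and c / n >= threshold}
    if not frequent:
        return 1.0, []
    present = frequent & set(item_props)
    return len(present) / len(frequent), sorted(frequent - present)

peers = [{"P170", "P571", "P18"}, {"P170", "P571"}, {"P170", "P186"}]
print(relative_completeness({"P170"}, peers))
```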
00:34:46.773 --> 00:34:49.473
And then, Cradle, which gives you
a forms-based way
00:34:49.473 --> 00:34:50.693
to create content.
00:34:50.693 --> 00:34:52.453
So, these are very useful for edit-a-thons
00:34:52.453 --> 00:34:54.753
where if you know that
you're working with just artworks,
00:34:54.753 --> 00:34:57.553
don't just let people create items
with a blank screen.
00:34:57.553 --> 00:35:00.275
Give them a form to fill out
to start entering in information
00:35:00.275 --> 00:35:01.818
that's structured.
00:35:01.818 --> 00:35:04.588
And then, finally, we've gone
through some of this, already.
00:35:06.268 --> 00:35:09.539
This is my big chart that I love
to get people's feedback on.
00:35:09.539 --> 00:35:14.296
How do we get people
across the chasm to be in this space?
00:35:14.328 --> 00:35:16.839
We have a lot of folks who, now,
can do template coding,
00:35:16.839 --> 00:35:20.040
spreadsheets, QuickStatements,
SPARQL queries, and then we got--
00:35:20.935 --> 00:35:24.259
how do we get people to this side
where we have Python
00:35:24.259 --> 00:35:26.694
and the things that can do more
sophisticated editing.
00:35:26.694 --> 00:35:28.625
It's really hard
to get people across this.
00:35:28.625 --> 00:35:30.785
But I would like to say that while
it's hard to get people across,
00:35:30.785 --> 00:35:32.847
the content and the technology
are not that hard.
00:35:32.847 --> 00:35:35.380
We actually need more people
to learn about regular expressions.
00:35:35.380 --> 00:35:38.307
And once you get some kind
of experience here,
00:35:38.307 --> 00:35:41.830
you'll find that this is a wonderful world
that you can learn a lot in,
00:35:41.830 --> 00:35:44.700
but it does take some time
to get across this chasm.
00:35:44.829 --> 00:35:46.289
Yes, James.
00:35:46.289 --> 00:35:52.148
(James) [inaudible]
00:35:53.127 --> 00:35:57.192
No, what it means is that the graph
is not necessarily accurate
00:35:57.192 --> 00:35:59.178
in terms of its data points.
00:35:59.308 --> 00:36:03.427
But what it means-- I guess
it's more like this is a valley.
00:36:03.786 --> 00:36:06.716
It's like we need to get people
across this valley here.
00:36:06.716 --> 00:36:10.146
(woman 4) [inaudible]
00:36:10.146 --> 00:36:11.546
I would say this is the key.
00:36:11.546 --> 00:36:16.296
If we can get people who know this stuff,
but can grok this stuff,
00:36:16.296 --> 00:36:17.918
it gets them to this stuff.
00:36:17.918 --> 00:36:19.668
Does that make sense? Yeah.
00:36:19.668 --> 00:36:24.155
So, my vision for the next few years:
if we can get better training
00:36:24.155 --> 00:36:27.516
in our community to get people
from batch processing,
00:36:27.516 --> 00:36:29.847
which is pretty much what this is,
to kind of intelligent--
00:36:29.847 --> 00:36:32.726
I wouldn't say intelligent,
but more sophisticated programming,
00:36:32.726 --> 00:36:35.486
that would be a great thing,
because we're seeing this is a bottleneck
00:36:35.486 --> 00:36:37.846
to a lot of the stuff
that I just showed you up there.
00:36:37.846 --> 00:36:39.086
Yes.
00:36:39.135 --> 00:36:42.105
(man 9) [inaudible]
00:36:42.105 --> 00:36:45.984
Okay, wait, you want to show me something,
show me after the session, does that work?
00:36:45.984 --> 00:36:47.584
Okay. Yes, Megan.
00:36:47.584 --> 00:36:50.804
- (Megan) Can I have a microphone?
- Microphone, yes.
00:36:50.834 --> 00:36:54.528
- (Megan) [inaudible]
- Yeah.
00:36:55.316 --> 00:36:56.636
And we have lunch after this,
00:36:56.636 --> 00:36:59.006
so if you want to stay
a little bit later, that's fine, too.
00:36:59.006 --> 00:37:01.009
- [inaudible]
- We're already at lunch break? Okay.
00:37:01.009 --> 00:37:03.094
(Megan) So, thank you so much
to both you and Richard
00:37:03.094 --> 00:37:04.799
for all the work you're doing at the Met.
00:37:04.799 --> 00:37:07.027
And I know that you're
very well supported in that.
00:37:07.027 --> 00:37:09.100
(mic feedback)
I don't know what happened there.
00:37:09.100 --> 00:37:15.071
For the average volunteer community,
how do you balance doing the work
00:37:15.071 --> 00:37:19.124
for the cultural heritage organization
versus training the professionals
00:37:19.124 --> 00:37:21.792
that are there to do that work?
00:37:21.792 --> 00:37:24.412
Where do you find the balance
in terms of labor?
00:37:25.672 --> 00:37:26.962
It's a good question.
00:37:27.397 --> 00:37:30.467
(Megan) One that really comes up,
I think, with this as well.
00:37:30.467 --> 00:37:33.158
- With this?
- (Megan) Yeah, and with building out...
00:37:33.187 --> 00:37:36.277
where we put efforts in terms
of building out competencies.
00:37:36.333 --> 00:37:39.398
Yeah. I don't have a great answer for you,
but it's a great question.
00:37:39.398 --> 00:37:40.658
(Megan) Cool.
00:37:40.658 --> 00:37:43.580
(Richard) There are a lot
of tech people at [inaudible]
00:37:43.580 --> 00:37:46.158
who understand this side of the graph,
and don't understand it--
00:37:46.158 --> 00:37:48.878
the people in [inaudible]
who understand this part of the graph,
00:37:48.878 --> 00:37:50.658
and don't understand
this part of the graph.
00:37:50.658 --> 00:37:53.928
So, the more we can bring together Wikimedians
who understand some of this
00:37:53.928 --> 00:37:57.748
and tech professionals at museums
who understand this,
00:37:57.748 --> 00:37:59.408
then that makes it a little bit easier--
00:37:59.408 --> 00:38:01.968
and hopefully, as well as
training up Wikimedians,
00:38:01.968 --> 00:38:05.587
we can also provide some guidance
and let the museums [inaudible]
00:38:05.587 --> 00:38:07.438
to take care of themselves
in the [inaudible].
00:38:07.496 --> 00:38:09.285
Yeah, that's a good point.
00:38:09.285 --> 00:38:11.961
How many people here know
what regular expressions are?
00:38:11.961 --> 00:38:13.216
Raise your hand.
00:38:13.216 --> 00:38:17.397
Okay, so how many people are comfortable
specifying a regular expression?
00:38:17.397 --> 00:38:19.267
So, yeah, we need more work here.
00:38:19.267 --> 00:38:20.771
(laughter)
00:38:20.771 --> 00:38:23.199
(man 10) I want to suggest that--
00:38:24.648 --> 00:38:28.575
maybe not getting
every Wikidata practitioner,
00:38:28.575 --> 00:38:33.607
or institution practitioner
to embrace Python programming is the way.
00:38:33.717 --> 00:38:39.657
But as Richard just said, finding more
bridging people-- people like you--
00:38:39.657 --> 00:38:41.137
who speak both--
00:38:41.137 --> 00:38:44.042
who speak Python,
but also speak GLAM institution--
00:38:44.812 --> 00:38:48.392
to help the GLAM's own
technical department, which may not--
00:38:49.233 --> 00:38:51.951
they know Python,
they don't know this stuff.
00:38:52.640 --> 00:38:54.186
That's, I think, what's needed.
00:38:54.235 --> 00:38:59.034
People like you, people like me,
people who speak both of these jargons
00:38:59.034 --> 00:39:01.835
to help make the connections,
to document the connections.
00:39:01.835 --> 00:39:03.344
You're already doing this, of course.
00:39:03.344 --> 00:39:05.534
You share your code, et cetera,
you're doing tutorials.
00:39:05.534 --> 00:39:07.044
But we need more of this.
00:39:07.044 --> 00:39:09.223
I'm not sure we need
to make everyone programmers.
00:39:09.223 --> 00:39:10.612
We already have programmers.
00:39:10.612 --> 00:39:12.332
We need to make them understand
00:39:12.332 --> 00:39:14.612
the non-programming
material they need to--
00:39:14.612 --> 00:39:15.782
I think that's a great point.
00:39:15.782 --> 00:39:18.062
We don't need to make everyone
highly proficient in this,
00:39:18.062 --> 00:39:20.312
but we do need people
knowledgeable to say that,
00:39:20.312 --> 00:39:23.004
"Yeah, we can ingest 400 thousand rows
and do something with it."
00:39:23.004 --> 00:39:25.284
Whereas, if you're stuck
on this side, you're like,
00:39:25.284 --> 00:39:27.444
"400 thousand rows
sounds really big and scary."
00:39:27.444 --> 00:39:30.364
But if you know that it's possible,
you're like, "No problem."
00:39:30.364 --> 00:39:32.284
400 thousand is not a problem.
00:39:32.284 --> 00:39:35.414
(woman 5) I would just like to chime in
a little bit to say
00:39:35.414 --> 00:39:39.674
that there may be countries and areas
where you will not find a GLAM
00:39:39.674 --> 00:39:44.404
with any skilled technologists.
00:39:44.434 --> 00:39:47.834
So, you will have to invent
something there in the middle.
00:39:48.502 --> 00:39:49.634
That's a good point.
00:39:49.778 --> 00:39:51.378
Any questions? Sandra.
00:39:55.648 --> 00:39:57.807
(Sandra) Yeah, I just wanted
to add to this discussion.
00:39:57.807 --> 00:40:01.656
Actually, I've seen some very good cases
where it has indeed been possible
00:40:01.656 --> 00:40:05.476
to train GLAM professionals to work
with this entire environment,
00:40:05.476 --> 00:40:09.276
and where they've done fantastic work,
even at small institutions.
00:40:10.046 --> 00:40:14.986
It also requires that you have chapters
or volunteers that can train the staff.
00:40:15.163 --> 00:40:17.513
So, it's really like a bigger environment.
00:40:18.192 --> 00:40:22.044
But I think that's a model
that, if we can manage to make it grow,
00:40:22.044 --> 00:40:24.263
can scale very well.
00:40:24.673 --> 00:40:25.693
Good point.
00:40:25.693 --> 00:40:30.896
(woman 5) [inaudible]
00:40:32.029 --> 00:40:34.217
Sorry, just noting that we don't have
00:40:34.217 --> 00:40:37.820
any structured training programs
right now for that.
00:40:38.209 --> 00:40:42.498
We might want to develop those,
and that would be helpful.
00:40:42.608 --> 00:40:44.408
We have been doing that for education
00:40:44.408 --> 00:40:47.488
in terms of teaching people
Wikipedia and Wikidata.
00:40:47.488 --> 00:40:50.008
It's just a matter of taking it
one step further.
00:40:50.528 --> 00:40:52.168
Right. Stacy.
00:40:54.518 --> 00:40:56.988
(Stacy) Well, I'd just like to say
that a lot of professionals
00:40:56.988 --> 00:41:02.006
who work in this area of metadata
have all these skills already.
00:41:02.006 --> 00:41:08.966
So, I think part of it is just proving
the value to these organizations,
00:41:08.966 --> 00:41:13.126
but then it's also tapping
into professional associations who can--
00:41:13.195 --> 00:41:16.745
or ways of collaborating within
those professional communities
00:41:16.745 --> 00:41:21.374
to build this work, and the documentation
on how to do things
00:41:21.374 --> 00:41:23.234
is really, really important,
00:41:23.234 --> 00:41:27.454
because I'm not sure about the role
of depending on volunteers,
00:41:27.454 --> 00:41:32.294
when some of this work is actually work
that GLAM organizations do anyway.
00:41:32.395 --> 00:41:35.355
We manage our collections
in a variety of ways through metadata,
00:41:35.355 --> 00:41:37.126
and this is actually one more way.
00:41:37.126 --> 00:41:40.495
So, should we also not be thinking
about ways to integrate this work
00:41:40.495 --> 00:41:43.946
into a GLAM professional's regular job?
00:41:43.985 --> 00:41:46.125
And then that way you're generating--
00:41:46.125 --> 00:41:48.885
and when you think
about sustainability and scalability,
00:41:48.885 --> 00:41:53.426
that's the real trick to making this
both sustainable and scalable,
00:41:53.745 --> 00:41:58.695
is that once this is the regular
work of GLAM folks,
00:41:58.695 --> 00:42:00.885
we're not worried as much about this part,
00:42:00.885 --> 00:42:03.503
because it's just turning
that little switch to get this
00:42:03.503 --> 00:42:05.763
to be a part of that work.
00:42:05.863 --> 00:42:08.063
Right. Good point. [Shani]?
00:42:11.603 --> 00:42:13.229
(Shani) You're absolutely right.
00:42:13.229 --> 00:42:16.122
But I want to echo what you said before.
00:42:16.152 --> 00:42:21.566
And yes, Susana-- this might work
for more privileged countries
00:42:22.082 --> 00:42:25.042
where they have money,
they have people doing it.
00:42:25.682 --> 00:42:29.042
It doesn't work for places
that are still developing,
00:42:29.042 --> 00:42:32.282
that don't have resources--
they don't have all of that.
00:42:32.592 --> 00:42:36.832
And they can barely do
what they need to do.
00:42:36.886 --> 00:42:41.066
So, it's difficult for them, and then,
the community is really helpful.
00:42:41.906 --> 00:42:45.495
These are the cases where the community
can have a huge impact actually,
00:42:45.985 --> 00:42:50.349
working with the GLAMs,
because they can't do it all
00:42:50.979 --> 00:42:52.296
as part of their jobs.
00:42:52.834 --> 00:42:55.034
So, we need to think about that as well.
00:42:55.053 --> 00:42:58.223
And having these examples,
actually, is hugely important,
00:42:58.223 --> 00:43:00.763
because it's still helping
to convince them
00:43:00.763 --> 00:43:05.842
that it's critical to invest in it
and to work with volunteers,
00:43:05.842 --> 00:43:09.082
that is, with non-professionals
of sorts, to get there.
00:43:10.003 --> 00:43:12.650
I can imagine a future where
you don't have to know all this code.
00:43:12.650 --> 00:43:14.379
These would just be
kind of like Lego bricks
00:43:14.379 --> 00:43:15.801
you can slap together,
00:43:15.801 --> 00:43:18.761
saying, "Here's my database.
Here's the crosswalk. Here's Wikidata,"
00:43:18.761 --> 00:43:21.311
and just put it together,
and you don't even have to code,
00:43:21.311 --> 00:43:23.835
you just have to make sure
the databases are in the right place.
00:43:23.835 --> 00:43:25.375
Yep. Okay.
00:43:26.747 --> 00:43:28.705
(man 11) Sorry. [inaudible]
00:43:28.705 --> 00:43:34.025
I think if I had done this project,
I'd probably have done it the same way.
00:43:34.025 --> 00:43:36.146
So, I think that's maybe a good sign.
00:43:36.146 --> 00:43:39.725
I was wondering how
the financing of this whole project worked.
00:43:39.725 --> 00:43:40.840
How did the-- I'm sorry?
00:43:40.840 --> 00:43:43.255
How did the financing of this project work?
00:43:43.795 --> 00:43:45.755
- The financing?
- Yeah, the money.
00:43:46.425 --> 00:43:47.505
That's a good question.
00:43:47.505 --> 00:43:49.185
Well, so, there are different parts of it.
00:43:49.185 --> 00:43:53.073
So, the Knight grant funded
the Wiki Art Depiction Explorer.
00:43:53.198 --> 00:43:56.928
But I, for the last, maybe what--
nine months--
00:43:56.928 --> 00:43:58.768
I've been their Wikimedia strategist.
00:43:58.768 --> 00:44:01.618
So, I've been on
since February of this year.
00:44:01.618 --> 00:44:04.818
So, pretty much, they're paying
for my time to help with their--
00:44:04.818 --> 00:44:07.968
not only the upload of their collections,
but developing these tools, as well.
00:44:07.968 --> 00:44:11.659
- (Richard) So the Met's paying you?
- Yeah, that's right.
00:44:11.762 --> 00:44:14.894
(Richard) The grant, at least part
of it has come from--
00:44:14.894 --> 00:44:16.959
There was a grant for Open Access.
00:44:16.959 --> 00:44:20.176
And this is under that campaign
and with the digital department.
00:44:20.176 --> 00:44:24.297
So, we're working as contractors throughout
the Open Access campaign for the Met.
00:44:27.948 --> 00:44:30.116
(man 12) I'm sorry.
I guess before you were hired,
00:44:30.116 --> 00:44:31.313
and before there was a grant,
00:44:31.313 --> 00:44:33.780
there was probably a lot
of volunteer work done to make sure--
00:44:33.780 --> 00:44:35.303
Richard did a lot of work before that.
00:44:35.303 --> 00:44:37.219
And then, Wikimedia New York
did a lot of work,
00:44:37.219 --> 00:44:38.927
but it was kind of in bursts.
00:44:38.927 --> 00:44:41.045
It wasn't as comprehensive
as we're talking about now
00:44:41.045 --> 00:44:45.915
in terms of having-- making sure
those two layers are complete
00:44:45.915 --> 00:44:47.310
in Wikidata.
00:44:48.640 --> 00:44:50.543
Alright, yeah. I think that's it.
00:44:50.543 --> 00:44:53.843
So, I'm happy to talk after lunch,
or after the break, if you want.
00:44:54.683 --> 00:44:56.223
Okay. Thank you.
00:44:56.223 --> 00:44:59.197
(applause)