1
00:00:07,138 --> 00:00:08,288
Thanks folks.
2
00:00:09,627 --> 00:00:11,991
As I mentioned before,
you can load up the slides here
3
00:00:11,991 --> 00:00:16,661
by either the QR code or the short URL,
which is-- this is bit.ly/
4
00:00:16,661 --> 00:00:19,920
wikidatacon19glamstrategies.
5
00:00:19,980 --> 00:00:22,040
And the slides are also
on the program page
6
00:00:22,040 --> 00:00:24,520
on the WikidataCon site.
7
00:00:24,549 --> 00:00:27,269
And then, there's also an Etherpad here
that you can click on.
8
00:00:27,269 --> 00:00:28,959
So, I'll be talking about a lot of things
9
00:00:28,959 --> 00:00:31,629
that you might have heard about
at Wikimania, if you were there,
10
00:00:31,629 --> 00:00:34,089
but we are going to go
into a lot more implementation details.
11
00:00:34,089 --> 00:00:36,209
Because we're at WikidataCon,
we can dive deeper
12
00:00:36,209 --> 00:00:38,430
into the Wikidata and technical aspects.
13
00:00:38,430 --> 00:00:41,821
But Richard and I are working
at the Met Museum right now
14
00:00:41,821 --> 00:00:43,200
on their Open Access program.
15
00:00:43,200 --> 00:00:45,320
If you didn't know,
for a little over two years now,
16
00:00:45,320 --> 00:00:46,920
entering its third year,
17
00:00:46,920 --> 00:00:49,320
there's been an Open Access
strategy at the Met,
18
00:00:49,320 --> 00:00:52,763
where they're releasing their images
and their metadata under a CC0 license.
19
00:00:52,763 --> 00:00:54,639
And one of the things
they brought us on to do
20
00:00:54,639 --> 00:00:58,409
is to explore what we could imagine doing
with this Open Access content.
21
00:00:58,409 --> 00:01:00,469
So, we're going to talk
a little bit about that
22
00:01:00,469 --> 00:01:02,598
in terms of the experiments
that we've been running,
23
00:01:02,598 --> 00:01:04,044
and we'd love to hear your feedback.
24
00:01:04,044 --> 00:01:07,028
So, I hope to talk for about 20 minutes,
and then have some conversation
25
00:01:07,028 --> 00:01:09,853
with you folks, since we have
a lot of knowledge in this room.
26
00:01:09,923 --> 00:01:12,472
This is the announcement,
and actually the one-year anniversary,
27
00:01:12,472 --> 00:01:16,452
where Katherine Maher was actually there,
at the Met to talk about that anniversary.
28
00:01:16,452 --> 00:01:19,172
So, one of the things that's challenging
I think for a lot of folks
29
00:01:19,172 --> 00:01:21,097
is how to explain Wikidata,
30
00:01:21,097 --> 00:01:23,911
and this GLAM
contribution strategy to Wikidata
31
00:01:23,911 --> 00:01:27,102
to C-level folks at an organization.
32
00:01:27,102 --> 00:01:31,392
We can talk about it with data scientists,
Wikimedians, librarians, maybe curators,
33
00:01:31,392 --> 00:01:34,452
but when it comes to talking about this
with a director of a museum,
34
00:01:34,452 --> 00:01:36,862
or a director of a library,
what does it actually--
35
00:01:36,862 --> 00:01:38,482
how does it resonate with them?
36
00:01:38,482 --> 00:01:41,352
So, one way of talking about it
that I think makes sense
37
00:01:41,352 --> 00:01:43,978
is this: everyone knows about Wikipedia,
38
00:01:43,978 --> 00:01:47,799
and for the English language edition,
39
00:01:47,799 --> 00:01:49,733
at least, we're talking
about 6 million articles.
40
00:01:49,733 --> 00:01:51,792
And it sounds like a lot,
but if you think about it,
41
00:01:51,792 --> 00:01:54,361
Wikipedia is not really the sum
of all human knowledge,
42
00:01:54,361 --> 00:01:59,512
it's the sum of all reliably sourced,
mostly western knowledge.
43
00:02:00,281 --> 00:02:02,211
And there's a lot of stuff out there.
44
00:02:02,211 --> 00:02:04,141
We have a lot of stuff
in Commons already--
45
00:02:04,141 --> 00:02:07,382
56 million media files, with more going up
every single day--
46
00:02:07,382 --> 00:02:11,484
but these are very...
a different type of standard
47
00:02:11,484 --> 00:02:13,011
to what goes into Wikimedia Commons.
48
00:02:13,011 --> 00:02:16,431
And the way that we have described
Wikidata to GLAM professionals,
49
00:02:16,431 --> 00:02:18,231
and especially the C levels,
50
00:02:18,231 --> 00:02:22,061
is: what if we could have a repository
that has a notability bar
51
00:02:22,061 --> 00:02:24,381
that is not as high as Wikipedia's?
52
00:02:24,381 --> 00:02:26,001
So, we want all these paintings,
53
00:02:26,001 --> 00:02:28,161
but not every painting
necessarily needs an article.
54
00:02:28,581 --> 00:02:30,241
Wikipedia is held back by the fact
55
00:02:30,241 --> 00:02:33,082
that you need to have
language editions of Wikipedia.
56
00:02:33,171 --> 00:02:36,681
So, can we store, in the famous phrase,
"things, not strings"?
57
00:02:36,681 --> 00:02:40,570
Can we be object oriented
and not really lexical oriented?
58
00:02:40,570 --> 00:02:42,181
And can we store this in a database
59
00:02:42,181 --> 00:02:44,540
that stores facts, figures,
and relationships?
60
00:02:44,540 --> 00:02:46,291
And that's pretty much
what Wikidata does.
61
00:02:46,711 --> 00:02:50,736
And Wikidata is also a kind of universal
crosswalk database, with links
62
00:02:50,736 --> 00:02:52,321
to other collections out there.
63
00:02:52,321 --> 00:02:55,119
So, we think this really resonates
with folks when you're talking about
64
00:02:55,119 --> 00:02:58,596
what is the value of Wikidata compared
to what they're normally familiar with,
65
00:02:58,596 --> 00:03:00,326
which is just Wikipedia.
66
00:03:01,346 --> 00:03:02,876
Alright, so what are the benefits?
67
00:03:02,876 --> 00:03:05,086
You're interlinking
your collections with others.
68
00:03:05,086 --> 00:03:07,676
So, unfortunately, I apologize
to librarians here,
69
00:03:07,676 --> 00:03:09,337
I'll be talking mostly about museums,
70
00:03:09,337 --> 00:03:11,816
but a lot of this is also valid
for libraries.
71
00:03:11,816 --> 00:03:15,867
But you're basically connecting
your collection with the global collection
72
00:03:15,867 --> 00:03:18,166
of linked open data collections.
73
00:03:18,846 --> 00:03:22,276
You can also receive enriched
and improved metadata back
74
00:03:22,276 --> 00:03:25,656
after contributing and linking
your collections to the world.
75
00:03:25,656 --> 00:03:28,436
And there are some pretty neat
interactive multimedia applications
76
00:03:28,436 --> 00:03:30,596
that you get-- I don't want
to say for free,
77
00:03:30,596 --> 00:03:33,596
but your collection in Wikidata
allows you to visualize things
78
00:03:33,596 --> 00:03:35,276
that you've never seen before.
79
00:03:35,276 --> 00:03:36,776
We'll show you some examples.
80
00:03:36,776 --> 00:03:39,737
And so, how do you convey this
to GLAM professionals effectively?
81
00:03:39,737 --> 00:03:41,746
Well, I usually like to start
with storytelling,
82
00:03:41,746 --> 00:03:43,536
and not technical explanations.
83
00:03:43,536 --> 00:03:46,368
Okay, so if everyone here
has a cell phone,
84
00:03:46,368 --> 00:03:49,574
especially if you have an iPhone,
I want you to scan this QR code
85
00:03:49,574 --> 00:03:51,645
and bring up the URL
that it comes up with.
86
00:03:51,645 --> 00:03:53,393
Or if you don't have a QR scanner,
87
00:03:53,393 --> 00:03:58,963
just type in w.wiki/Aij in a web browser.
88
00:04:00,036 --> 00:04:01,942
So go ahead and scan that.
89
00:04:03,280 --> 00:04:04,864
And what comes up?
90
00:04:06,778 --> 00:04:09,458
Does anyone see a knowledge graph
pop up on your screen?
91
00:04:09,516 --> 00:04:11,156
So, for folks here in WikidataCon,
92
00:04:11,156 --> 00:04:13,266
this is probably not
revolutionary for you.
93
00:04:13,266 --> 00:04:16,386
But what it does is run a SPARQL query
on these objects,
94
00:04:16,386 --> 00:04:18,836
and it shows the linkages between them.
95
00:04:18,836 --> 00:04:20,897
And you can actually drag them
around the screen.
96
00:04:20,897 --> 00:04:22,204
You can actually click on nodes.
97
00:04:22,204 --> 00:04:24,458
If you're [inaudible] on mobile,
it will expand that--
98
00:04:24,458 --> 00:04:27,554
you can actually start to surf
through Wikidata this way.
99
00:04:27,554 --> 00:04:29,741
So, for Wikidata veterans
this is pretty cool.
100
00:04:29,741 --> 00:04:31,206
One shot, you get this.
101
00:04:31,206 --> 00:04:33,313
For a lot of folks who have never seen
Wikidata before,
102
00:04:33,313 --> 00:04:35,574
this is a revolutionary moment for them.
103
00:04:36,176 --> 00:04:39,236
To actually hand-manipulate
a knowledge graph,
104
00:04:39,236 --> 00:04:42,186
and to start surfing through Wikidata
without having to know SPARQL,
105
00:04:42,186 --> 00:04:43,823
without having to know what a Q item is,
106
00:04:43,823 --> 00:04:45,860
without having to know
what a property proposal is,
107
00:04:45,860 --> 00:04:48,623
they can suddenly start seeing
connections in a way that is magical.
108
00:04:48,623 --> 00:04:50,264
Hey, I see [Jacob's] here.
109
00:04:50,264 --> 00:04:52,143
Jacob's been using
some of this code, as well.
110
00:04:52,143 --> 00:04:54,443
So, this is some code
that we'll talk about later on
111
00:04:54,443 --> 00:04:57,254
that allows you to create
these visualizations in Wikidata.
112
00:04:57,254 --> 00:04:59,283
And we've really seen this
turn a lot of heads
113
00:04:59,283 --> 00:05:01,408
among people who had
never really gotten Wikidata before.
114
00:05:01,408 --> 00:05:04,653
But after seeing these interactive
knowledge graphs, they get it.
115
00:05:04,653 --> 00:05:06,233
They understand the power of this.
116
00:05:06,233 --> 00:05:08,293
And especially this example here,
117
00:05:08,293 --> 00:05:11,304
this was a really big eye-opener
for the folks at the Met,
118
00:05:11,304 --> 00:05:14,545
because this is the artifact
that is the center of this graph,
119
00:05:14,545 --> 00:05:17,823
right there, the Portrait of Madame X,
a very famous portrait.
120
00:05:17,823 --> 00:05:20,982
And they did not even know
that this was the inspiration
121
00:05:20,982 --> 00:05:24,693
for the black dress that Rita Hayworth
wore in the movie Gilda.
122
00:05:24,693 --> 00:05:26,783
So, just by seeing this graph, they said,
123
00:05:26,783 --> 00:05:29,353
"Wait a minute. This is one
of our most visited portraits.
124
00:05:29,353 --> 00:05:31,683
I didn't know that this was true."
125
00:05:31,683 --> 00:05:35,214
And there are actually two other books
published about that painting.
126
00:05:35,214 --> 00:05:38,983
You can see all these things,
not just within the realm of GLAM,
127
00:05:38,983 --> 00:05:41,441
but it extends to fashion,
it extends to literature.
128
00:05:41,441 --> 00:05:43,381
You're starting to see
the global connections
129
00:05:43,381 --> 00:05:47,481
that your artworks have,
or your collections have via Wikidata.
130
00:05:48,722 --> 00:05:50,342
So, how do we do this?
131
00:05:50,842 --> 00:05:53,098
If you remember nothing else
from this presentation,
132
00:05:53,098 --> 00:05:56,432
this one page is your one-stop shop.
133
00:05:56,432 --> 00:05:58,592
Now, fortunately, you don't have
to memorize all this.
134
00:05:58,592 --> 00:06:03,292
It's actually right here at
Wikidata:Linked_open_data_workflow.
135
00:06:03,560 --> 00:06:06,170
So, we'll be talking about some
of these different phases
136
00:06:06,170 --> 00:06:10,670
of how you first prepare,
reconcile, and examine
137
00:06:11,160 --> 00:06:14,190
what the GLAM organization might have
and what Wikidata has.
138
00:06:14,190 --> 00:06:15,374
And then, what are the tools
139
00:06:15,374 --> 00:06:18,664
to actually ingest
and correct or enrich that
140
00:06:18,664 --> 00:06:20,241
once it's in Wikidata.
141
00:06:20,241 --> 00:06:22,691
And then, what are some of the ways
to reuse that content,
142
00:06:22,691 --> 00:06:25,161
or to report and create
new things out of it.
143
00:06:25,161 --> 00:06:31,191
So, this is the simpler version of a chart
that Sandra and the GLAM folks
144
00:06:31,191 --> 00:06:33,111
at the foundation have created.
145
00:06:33,111 --> 00:06:35,534
But this is trying
to sum up, in one shot--
146
00:06:35,534 --> 00:06:38,133
because we know how hard things
are to find in Wikidata--
147
00:06:38,133 --> 00:06:41,733
to find in one shot all the different
tools you should pay attention to
148
00:06:41,733 --> 00:06:43,475
as a GLAM organization.
149
00:06:44,969 --> 00:06:50,606
So, just using the Met as an example,
we started with what is the ideal object
150
00:06:50,606 --> 00:06:53,398
that we have in Wikidata
that comes from the Met?
151
00:06:53,398 --> 00:06:55,882
This is a typical shot of a Wikidata item,
152
00:06:55,882 --> 00:06:57,385
in the mobile mode there.
153
00:06:57,385 --> 00:06:59,244
And this is one
of the more famous paintings
154
00:06:59,244 --> 00:07:00,729
we used as a model, here.
155
00:07:00,729 --> 00:07:03,315
We have the label,
description, and aliases.
156
00:07:03,915 --> 00:07:05,225
And then, we figured out,
157
00:07:05,225 --> 00:07:07,035
"What are the core statements
that we wanted?"
158
00:07:07,035 --> 00:07:10,035
We wanted instance of, image,
inception, collection.
159
00:07:10,035 --> 00:07:13,239
And what are some other properties
we would like if we had it?
160
00:07:13,239 --> 00:07:15,960
Depiction information,
material used, things like that.
161
00:07:16,879 --> 00:07:19,369
We actually do have an identifier.
162
00:07:19,369 --> 00:07:22,199
The Met object ID is P3634.
163
00:07:22,199 --> 00:07:24,629
So, for some organizations,
you might want to propose
164
00:07:24,629 --> 00:07:28,529
a property just to track your items
using an object ID.
165
00:07:29,369 --> 00:07:31,899
And then, for the Met,
just trying to circumscribe
166
00:07:31,899 --> 00:07:35,519
what objects do we want to upload
and keep in Wikidata--
167
00:07:35,519 --> 00:07:38,927
the thing that we first identified
were collection highlights.
168
00:07:38,927 --> 00:07:43,649
These are like a hand-selected set
of 1,000 to 1,500 items
169
00:07:43,678 --> 00:07:48,878
that were going to be given priority
to upload to Wikidata.
170
00:07:48,939 --> 00:07:51,709
So, Richard and the crew
out of Wikimedia in New York
171
00:07:51,709 --> 00:07:53,105
did a lot of this early work.
172
00:07:53,105 --> 00:07:55,571
And then, now, we're systematically
going through to make sure
173
00:07:55,571 --> 00:07:56,689
they're all complete.
174
00:07:56,689 --> 00:07:58,221
And there's a secondary set
175
00:07:58,221 --> 00:08:01,390
called the Heilbrunn Timeline
of Art History-- about 8,000 items
176
00:08:01,390 --> 00:08:07,149
that are seminal works
by artists throughout history.
177
00:08:07,149 --> 00:08:09,499
And there are about 8,000
that the Met has identified,
178
00:08:09,499 --> 00:08:11,812
and we're also putting those
on Wikidata as well,
179
00:08:11,812 --> 00:08:13,143
using a different designation.
180
00:08:13,143 --> 00:08:16,271
Here, described by source--
Heilbrunn Timeline of Art History.
181
00:08:16,271 --> 00:08:19,841
So, the collection highlight
is denoted here as collection--
182
00:08:19,841 --> 00:08:21,265
Metropolitan Museum of Art,
183
00:08:21,265 --> 00:08:22,976
subject has role collection highlight.
184
00:08:22,976 --> 00:08:26,872
And then, these 8,000 or so
are marked that way in Wikidata.
185
00:08:29,741 --> 00:08:33,816
I couldn't show this chart at Wikimania,
because it's too complicated.
186
00:08:33,816 --> 00:08:35,389
But at WikidataCon, we can.
187
00:08:35,389 --> 00:08:38,845
So, this is something that is really hard
to answer sometimes.
188
00:08:39,490 --> 00:08:42,169
What makes something
in Wikidata from the Met,
189
00:08:42,169 --> 00:08:44,658
or from the New York Public Library,
or from your organization?
190
00:08:44,658 --> 00:08:47,609
And the answer is not easy.
It depends.
191
00:08:47,644 --> 00:08:49,684
It's complicated, it can be multi-factor.
192
00:08:49,684 --> 00:08:53,254
So, you could say, "Well, if it has
a Met object ID in Wikidata,
193
00:08:53,254 --> 00:08:54,804
then it's a Met object."
194
00:08:54,804 --> 00:08:56,674
But maybe someone didn't enter that.
195
00:08:56,674 --> 00:08:59,924
Maybe they only put in
Collection: Met, which is P195,
196
00:08:59,924 --> 00:09:02,684
or they put in the accession number,
197
00:09:02,684 --> 00:09:06,984
and they put collection as the qualifier
to that accession number.
198
00:09:06,984 --> 00:09:11,454
So, there's actually, one, two, three
different ways to try to find Met objects.
199
00:09:11,454 --> 00:09:14,214
And probably the best way to do it
is through a union like this.
200
00:09:14,214 --> 00:09:16,173
So, you combine all three,
and you come back,
201
00:09:16,173 --> 00:09:18,064
and you make a list out of it.
202
00:09:18,064 --> 00:09:20,813
So unfortunately, there is
no one clean query
203
00:09:20,813 --> 00:09:23,684
that'll guarantee you all the Met objects.
204
00:09:23,684 --> 00:09:27,873
This is probably
the best approach for this.
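A minimal sketch of that union, written as a small Python helper that returns the SPARQL text for the Wikidata Query Service. P3634 (Met object ID) and P195 (collection) come from the talk; treating the accession number as inventory number (P217) and using Q160236 as the Met's item are assumptions to verify, and the function name is hypothetical.

```python
# Sketch: build the "super union" query that finds Met objects in Wikidata
# through any of the three routes described in the talk.
# Assumption: Q160236 is the Wikidata item for the Metropolitan Museum of Art.
MET_QID = "Q160236"


def build_met_union_query(met_qid: str = MET_QID) -> str:
    """Return SPARQL that unions the three ways an item can point to the Met."""
    return f"""
SELECT DISTINCT ?item WHERE {{
  {{ ?item wdt:P3634 ?metObjectId . }}    # 1. has a Met object ID
  UNION
  {{ ?item wdt:P195 wd:{met_qid} . }}       # 2. collection (P195) = the Met
  UNION
  {{ ?item p:P217 ?accession .              # 3. accession number (P217)...
     ?accession pq:P195 wd:{met_qid} . }}   #    ...qualified with collection = the Met
}}
""".strip()


if __name__ == "__main__":
    # Paste the printed query into https://query.wikidata.org to run it.
    print(build_met_union_query())
```

`DISTINCT` collapses items reachable by more than one route, which is what turns the three branches into a single list.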
205
00:09:27,873 --> 00:09:29,384
And for some institutions,
206
00:09:29,384 --> 00:09:32,505
they're probably doing
something similar to that right now.
207
00:09:32,505 --> 00:09:35,824
Alright, so the example here
is that what you see here
208
00:09:35,824 --> 00:09:39,684
manifests itself differently--
well, not differently, but as this, in a query,
209
00:09:39,684 --> 00:09:40,904
which can get pretty complex.
210
00:09:40,904 --> 00:09:43,063
So, if we're looking
for all the collection highlights,
211
00:09:43,063 --> 00:09:47,713
we'd break this out into the statement
and then the qualifier as this:
212
00:09:47,782 --> 00:09:49,712
subject has role collection highlight.
213
00:09:49,712 --> 00:09:51,450
So, that's one way that we sort out
214
00:09:51,450 --> 00:09:54,124
some of these special
designations in Wikidata.
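Reaching a qualifier means going through the full statement node (the `p:`, `ps:`, `pq:` prefixes) rather than the `wdt:` shortcut, which returns only the main value. A sketch of the collection-highlight lookup, assuming Q160236 is the Met's item and P2868 is subject has role; the Q ID for "collection highlight" is not given here, so it is left as a parameter, and the function name is hypothetical.

```python
# Sketch: collection highlights = a collection (P195) statement whose value is
# the Met and which carries the qualifier subject has role (P2868).
# highlight_qid is a parameter: look up the "collection highlight" item first.
def build_highlight_query(highlight_qid: str, met_qid: str = "Q160236") -> str:
    """Return SPARQL listing Met items whose collection statement marks a highlight."""
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item p:P195 ?collection .                  # the full collection statement
  ?collection ps:P195 wd:{met_qid} ;          # statement value: the Met
              pq:P2868 wd:{highlight_qid} .   # qualifier: subject has role
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()
```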
215
00:09:55,166 --> 00:09:58,716
So, the summary is,
representing "The Met" is multifaceted,
216
00:09:58,716 --> 00:10:01,536
and needs to balance simplicity
and findability.
217
00:10:01,536 --> 00:10:04,896
How many people here have heard
of Sum of All Paintings as a project?
218
00:10:04,995 --> 00:10:07,088
Ooh, God, good, a lot of you!
219
00:10:07,088 --> 00:10:09,105
So, it's probably one
of the most active projects
220
00:10:09,105 --> 00:10:10,525
that deal with these issues.
221
00:10:10,525 --> 00:10:17,057
So, we always debate whether we should
model things super-accurately,
222
00:10:17,057 --> 00:10:19,815
or model things
so that they're findable.
223
00:10:19,815 --> 00:10:21,997
These are kind of at odds with each other.
224
00:10:21,997 --> 00:10:24,232
So, we usually prefer findability.
225
00:10:24,232 --> 00:10:27,001
It's no good if it's perfectly modeled,
but no one can ever find it,
226
00:10:27,001 --> 00:10:30,013
because it's so strict
in terms of how it's defined in Wikidata.
227
00:10:30,013 --> 00:10:31,882
And then, we have some challenges.
228
00:10:31,882 --> 00:10:35,367
Multiple artifacts might be tied
to one object ID,
229
00:10:35,367 --> 00:10:37,396
which might be different in Wikidata.
230
00:10:37,396 --> 00:10:42,097
And then, mapping the Met classification
to instances has some complex cases.
231
00:10:42,097 --> 00:10:44,282
So, the way that the Met classifies things
232
00:10:44,282 --> 00:10:46,775
doesn't always fit
with how Wikidata classifies things.
233
00:10:46,775 --> 00:10:49,982
So, we'll show you some examples here
of how this works.
234
00:10:49,982 --> 00:10:53,602
So, this is a great example
of using a Python library
235
00:10:53,602 --> 00:10:56,487
to actually ingest
what we know from the Met,
236
00:10:56,487 --> 00:10:58,313
and then try to sort out what they have.
237
00:10:58,313 --> 00:10:59,887
So, this is just for textiles.
238
00:10:59,887 --> 00:11:02,076
You can see that they got
a lot of detail here
239
00:11:02,076 --> 00:11:05,399
in terms of woven textiles, laces,
printed, trimmings, velvets.
240
00:11:05,399 --> 00:11:07,907
When we first looked into this in Wikidata,
241
00:11:07,907 --> 00:11:10,175
we did not have
this level of detail there.
242
00:11:10,175 --> 00:11:12,207
We still don't have all this resolved.
243
00:11:12,207 --> 00:11:14,764
You can see that this
is really complex here.
244
00:11:14,764 --> 00:11:18,012
"Anonymous" is not just "anonymous"
in a lot of databases.
245
00:11:18,012 --> 00:11:20,126
There are a lot of qualifications--
246
00:11:20,126 --> 00:11:23,045
whether by nationality or by century.
247
00:11:23,045 --> 00:11:26,282
So, trying to map all this to Wikidata
can be complex, as well.
248
00:11:26,282 --> 00:11:30,450
And then, this shows you
that of all the works in the Met,
249
00:11:30,450 --> 00:11:33,976
about 46% are open access right now.
250
00:11:33,976 --> 00:11:38,694
So, we still have just over 50%
that are not CC0 yet.
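A sketch of the kind of tally described above, using the standard-library `csv` module rather than the Python library mentioned in the talk. It assumes an export laid out like the Met's open-access MetObjects.csv, with `Classification` values such as "Textiles-Woven" and an `Is Public Domain` column holding True/False; those column names are assumptions to check, and the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Made-up rows in an assumed MetObjects.csv-like layout.
SAMPLE_CSV = """Object ID,Classification,Is Public Domain
1,Textiles-Woven,True
2,Textiles-Laces,False
3,Textiles-Woven,True
4,Paintings,False
"""


def summarize(csv_text: str, prefix: str = "Textiles"):
    """Count classifications starting with prefix, plus the public-domain share."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    classes = Counter(
        r["Classification"] for r in rows if r["Classification"].startswith(prefix)
    )
    # Share of all rows flagged public domain (guard against an empty export).
    open_share = sum(r["Is Public Domain"] == "True" for r in rows) / max(len(rows), 1)
    return classes, open_share


if __name__ == "__main__":
    classes, share = summarize(SAMPLE_CSV)
    print(classes.most_common())  # [('Textiles-Woven', 2), ('Textiles-Laces', 1)]
    print(f"{share:.0%} public domain")  # 50% public domain
```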
251
00:11:40,134 --> 00:11:43,444
(man) All the objects in the Met,
or all objects on display?
252
00:11:43,444 --> 00:11:45,957
(Andrew) It's weird. It's not just what's on display,
253
00:11:45,957 --> 00:11:47,866
but it's not all objects either.
254
00:11:47,866 --> 00:11:52,176
It's about 400 to 500 thousand objects
in their database at this point.
255
00:11:52,176 --> 00:11:53,840
So, somewhere in between.
256
00:11:55,380 --> 00:11:57,609
So, starting points.
This is always a hard one.
257
00:11:57,609 --> 00:12:03,514
We just had this discussion
on the Facebook group recently
258
00:12:03,514 --> 00:12:04,923
about where people go
259
00:12:04,923 --> 00:12:07,887
to find out what the modeling
should look like for a certain thing.
260
00:12:07,887 --> 00:12:09,271
It's not easy.
261
00:12:09,271 --> 00:12:12,115
So, normally, what we have to do
is just point people to,
262
00:12:12,115 --> 00:12:15,281
I don't know, some project
that does it well now?
263
00:12:15,281 --> 00:12:17,230
So, it's not a satisfying answer,
264
00:12:17,230 --> 00:12:19,910
but we usually tell folks
to start at things like visual arts,
265
00:12:19,910 --> 00:12:22,308
or Sum of All Paintings
does it pretty well,
266
00:12:22,308 --> 00:12:25,569
or just go to the project chat to find out
where some of these things are.
267
00:12:25,569 --> 00:12:27,444
We need better solutions for this.
268
00:12:27,444 --> 00:12:30,939
This is just a basic flow
of what we're doing with the Met here.
269
00:12:30,939 --> 00:12:33,119
We're basically taking
their CSV, and their API,
270
00:12:33,119 --> 00:12:35,979
and we're consuming it
into a Python data frame.
271
00:12:35,979 --> 00:12:38,159
We're taking the SPARQL code--
272
00:12:38,159 --> 00:12:40,499
the one that you saw
before, this super union--
273
00:12:40,499 --> 00:12:43,779
bringing that in, and we're doing
a bi-directional diff,
274
00:12:43,779 --> 00:12:45,999
and then seeing what new things
have been added here,
275
00:12:45,999 --> 00:12:47,729
what things have been subtracted there,
276
00:12:47,729 --> 00:12:51,529
and we're actually making those changes
either through QuickStatements,
277
00:12:51,529 --> 00:12:53,439
or we're doing it through Pywikibot.
278
00:12:53,439 --> 00:12:55,512
So, directly editing Wikidata.
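The bi-directional diff can be sketched with plain Python sets. This is an illustration under assumptions, not the project's code: both sides are reduced to dictionaries keyed by Met object ID with one compared value (say, an inception year), standing in for the data frames mentioned above, and the function name and sample data are made up.

```python
from __future__ import annotations


# Sketch of a bi-directional diff between Met records and what Wikidata holds.
# Each side maps Met object ID -> one compared value (e.g., inception year).
def bidirectional_diff(met: dict[str, str], wikidata: dict[str, str]):
    """Return (missing from Wikidata, missing from the Met, values that differ)."""
    met_ids, wd_ids = set(met), set(wikidata)
    to_add = sorted(met_ids - wd_ids)       # at the Met only: candidates to add
    to_review = sorted(wd_ids - met_ids)    # in Wikidata only: check by hand
    changed = sorted(i for i in met_ids & wd_ids if met[i] != wikidata[i])
    return to_add, to_review, changed


if __name__ == "__main__":
    met = {"1": "1900", "2": "1890", "3": "1750"}        # made-up sample
    wikidata = {"2": "1890", "3": "1751", "4": "1600"}   # made-up sample
    print(bidirectional_diff(met, wikidata))  # (['1'], ['4'], ['3'])
```

The first list is what would feed QuickStatements or Pywikibot; the second is left for a person to review, on the reasoning that an item missing from one export does not by itself justify removing anything from Wikidata.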
279
00:12:56,204 --> 00:12:59,405
So, this is the big slide
I also couldn't show at Wikimania,
280
00:12:59,405 --> 00:13:01,485
because it would have flummoxed everyone.
281
00:13:01,485 --> 00:13:04,924
So, this is a great example
of how we start with the Met database,
282
00:13:04,924 --> 00:13:06,824
we have this crosswalk database,
283
00:13:06,824 --> 00:13:09,209
and then we generate
the changes in Wikidata.
284
00:13:09,209 --> 00:13:12,644
The way this works is this is an example
of one record from the Met.
285
00:13:12,644 --> 00:13:15,744
This is an evening dress-- we've been working
with the Costume Institute recently,
286
00:13:15,744 --> 00:13:17,518
the one that puts on the Met Gala.
287
00:13:17,518 --> 00:13:20,442
So, we have one evening dress
here, by Valentina.
288
00:13:20,442 --> 00:13:22,100
Here's a date and an accession number.
289
00:13:22,100 --> 00:13:25,105
So, these things can be put
into Wikidata directly.
290
00:13:25,105 --> 00:13:27,744
Fields like the date and accession number map straight across.
291
00:13:27,744 --> 00:13:29,404
But what do we do with things like this?
292
00:13:29,404 --> 00:13:33,868
This is an object name, which is basically
like a classification of what it is,
293
00:13:33,868 --> 00:13:35,648
like an instance of for the Met.
294
00:13:35,648 --> 00:13:37,396
And the designer's Valentina.
295
00:13:37,396 --> 00:13:41,571
So, what we do is we take these
and we run all the unique object names
296
00:13:41,571 --> 00:13:43,801
and all the unique designers
through OpenRefine.
297
00:13:43,801 --> 00:13:46,720
So, we get maybe 60% matches
if we're lucky.
298
00:13:46,720 --> 00:13:48,418
We put that into a spreadsheet.
299
00:13:48,418 --> 00:13:53,178
Then we ask volunteers
or the curators at the Met
300
00:13:53,178 --> 00:13:55,333
to help fill in this crosswalk database.
301
00:13:55,333 --> 00:13:57,312
This is simply Google Sheets.
302
00:13:57,312 --> 00:13:59,911
So, we say, here are all the object names,
the unique object names
303
00:13:59,911 --> 00:14:02,731
that match lexically exactly
with what's in the Met database,
304
00:14:02,731 --> 00:14:05,912
and then you say this maps to this Q ID.
305
00:14:05,912 --> 00:14:08,556
So, we first started
this maybe like only about--
306
00:14:08,556 --> 00:14:11,233
well, 60% were filled,
and the rest were blank.
307
00:14:11,233 --> 00:14:13,751
So, we tap folks in specific groups.
308
00:14:13,751 --> 00:14:17,316
So there's like a Wiki Loves Fashion
little chat group that we have.
309
00:14:17,316 --> 00:14:20,304
And folks like user PKM
were super useful in this area.
310
00:14:20,304 --> 00:14:22,794
So she spent a lot of time
looking through this, and saying,
311
00:14:22,794 --> 00:14:24,764
"Okay, Evening suit is this,
Ewer is that."
312
00:14:24,764 --> 00:14:27,759
So, we looked through
and made all these mappings here.
313
00:14:27,759 --> 00:14:30,719
And then, what happens is now,
when we see this in the Met database,
314
00:14:30,719 --> 00:14:33,201
we look it up in the crosswalk database,
and we say, "Oh, yeah.
315
00:14:33,201 --> 00:14:36,169
These are the two Q numbers
we need to put into Wikidata."
316
00:14:36,169 --> 00:14:39,089
And then, it generates
the QuickStatement right there.
317
00:14:39,089 --> 00:14:41,328
Same thing here with Designer: Valentina.
318
00:14:41,328 --> 00:14:44,138
If Valentina matches here,
then it gets generated
319
00:14:44,138 --> 00:14:45,838
with that QuickStatement right there.
320
00:14:45,838 --> 00:14:48,069
If Valentina does not exist,
then we'll create it.
321
00:14:48,069 --> 00:14:51,288
You can see here, Weeks--
look at that high Q ID right there.
322
00:14:51,288 --> 00:14:53,918
We just created that recently,
because there was no entry before.
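A minimal sketch of the crosswalk step: look up the Met's object name and designer in the crosswalk (here a dict standing in for the Google Sheet) and emit QuickStatements V1 lines, which are tab-separated item/property/value triples with optional qualifier pairs on the same line. Assumptions: Q160236 for the Met, inventory number (P217) for the accession number, and designed by (P287) for the designer. The function name and the placeholder Q IDs (Q_DRESS and so on) are hypothetical. An object name may map to two Q IDs, giving two "instance of" statements, and unmatched designers are returned so an item can be created first, as with Weeks.

```python
# Sketch: crosswalk lookup -> QuickStatements (V1, tab-separated) lines.
MET_QID = "Q160236"  # assumed item for the Metropolitan Museum of Art


def record_to_quickstatements(item, record, object_names, designers):
    """Return (QuickStatements lines, designer names that still need an item).

    object_names maps a Met object name to a list of Q IDs, so one name can
    yield several "instance of" statements (e.g., dress + evening wear).
    designers maps a Met designer name to a single Q ID.
    """
    lines = []
    for qid in object_names.get(record["object_name"], []):
        lines.append(f"{item}\tP31\t{qid}")  # instance of
    # Accession number as inventory number (P217), qualified with collection (P195)
    lines.append(f'{item}\tP217\t"{record["accession"]}"\tP195\t{MET_QID}')
    unmatched = []
    designer = record["designer"]
    if designer in designers:
        lines.append(f"{item}\tP287\t{designers[designer]}")  # designed by (assumed)
    else:
        unmatched.append(designer)  # create this item first, then re-run
    return lines, unmatched


if __name__ == "__main__":
    record = {"object_name": "Evening dress", "designer": "Valentina",
              "accession": "ACCESSION-NO"}  # placeholder values
    crosswalk = {"Evening dress": ["Q_DRESS", "Q_EVENING_WEAR"]}  # placeholder Q IDs
    lines, unmatched = record_to_quickstatements("Q_ITEM", record, crosswalk, {})
    print("\n".join(lines))
    print("needs a new item:", unmatched)
```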
323
00:14:53,918 --> 00:14:55,358
Does that make sense to everyone?
324
00:14:55,358 --> 00:14:57,727
- (man 2) What's the extra statement?
- (Andrew) I'm sorry?
325
00:14:57,727 --> 00:15:00,610
- (man 2) What's the extra statement?
- (Andrew) Oh, the extra statement.
326
00:15:00,610 --> 00:15:03,131
So, believe it or not, we have
an Evening blouse, Evening dress,
327
00:15:03,131 --> 00:15:05,010
Evening pants,
Evening ensemble, Evening hat--
328
00:15:05,010 --> 00:15:08,650
do we want to make a new Wikidata item
for Evening pants, Evening everything?
329
00:15:08,650 --> 00:15:10,444
So, we said, "No."
We probably don't want to.
330
00:15:10,444 --> 00:15:13,859
We'll just say, "It's a dress,
but it's also evening wear",
331
00:15:13,859 --> 00:15:15,117
which is what that is.
332
00:15:15,117 --> 00:15:17,301
So, we're saying it's an instance
of both things.
333
00:15:17,931 --> 00:15:21,398
I'm not sure it's the perfect solution,
but it's a solution at this point.
334
00:15:21,744 --> 00:15:22,944
So, does everyone get that?
335
00:15:22,944 --> 00:15:25,564
So, this is kind of a crosswalk database
that we maintain here.
336
00:15:25,564 --> 00:15:28,025
And the nice thing about it,
it's just Google Sheets.
337
00:15:28,025 --> 00:15:29,264
So, we can get people to help
338
00:15:29,264 --> 00:15:31,375
who don't need to know
anything about this database,
339
00:15:31,375 --> 00:15:34,384
don't need to know about QuickStatements,
don't need to know about queries.
340
00:15:34,384 --> 00:15:36,226
They just go in and fill in the Q number.
341
00:15:36,226 --> 00:15:37,244
Yeah.
342
00:15:37,244 --> 00:15:40,902
(woman) So, when you copy
object name and you find the Q ID,
343
00:15:40,902 --> 00:15:43,145
the initial 60%
that you mentioned as an example,
344
00:15:43,145 --> 00:15:45,223
is that by exact match?
345
00:15:46,483 --> 00:15:48,103
(Andrew) Well, it's through OpenRefine.
346
00:15:48,103 --> 00:15:52,014
So, it does its best guess,
and then we verify to make sure
347
00:15:52,014 --> 00:15:54,444
that the OpenRefine match makes sense.
348
00:15:54,444 --> 00:15:56,114
Yeah.
349
00:15:56,203 --> 00:15:57,794
Does that make sense to everyone?
350
00:15:57,794 --> 00:16:00,304
So, some folks might be doing
some variation on this,
351
00:16:00,304 --> 00:16:03,403
but I think the nice thing about this
is that, by using Google Sheets,
352
00:16:03,403 --> 00:16:08,234
we remove a lot of the complexities
of these two areas from this.
353
00:16:08,234 --> 00:16:11,193
And we'll show you some code
that does this later on.
354
00:16:11,813 --> 00:16:15,273
- (man 3) How do you generate [inaudible]?
- (Andrew) How do you generate this?
355
00:16:15,273 --> 00:16:17,272
- (man 3) Yes.
- (Andrew) Python code.
356
00:16:17,272 --> 00:16:19,134
I'll show you a line that does this.
357
00:16:19,134 --> 00:16:21,136
But you can also go up here.
358
00:16:21,136 --> 00:16:25,096
This is the whole Python program
that does this, this, and that,
359
00:16:25,096 --> 00:16:27,296
if you want to take a look at that.
360
00:16:28,026 --> 00:16:29,026
Yes.
361
00:16:29,026 --> 00:16:31,207
(man 4) Did you really use
your own vocabulary,
362
00:16:31,207 --> 00:16:35,426
or is there something [inaudible]?
363
00:16:35,426 --> 00:16:37,246
- (Andrew) This right here?
- (man 4) Yeah.
364
00:16:37,246 --> 00:16:39,721
(Andrew) Yeah. So, this
is the Met's own vocabulary.
365
00:16:39,721 --> 00:16:43,031
So, most museums use
a system called TMS.
366
00:16:43,031 --> 00:16:44,891
It's a collections management system.
367
00:16:44,891 --> 00:16:47,654
So, they'll usually--
this is the museum world--
368
00:16:47,654 --> 00:16:50,771
they'll usually roll
their own vocabulary for their own needs.
369
00:16:50,771 --> 00:16:54,022
Museums are very late
to interoperable metadata.
370
00:16:54,022 --> 00:16:57,282
Librarians and archivists have this
kind of baked into them.
371
00:16:57,282 --> 00:16:58,664
Museums are like, "Meh..."
372
00:16:58,664 --> 00:17:01,471
Our primary goal
is to put objects on display,
373
00:17:01,471 --> 00:17:04,141
and if it plays well with other people,
that's a side benefit.
374
00:17:04,141 --> 00:17:05,931
But it's not a primary thing that they do.
375
00:17:05,931 --> 00:17:08,031
So, that's why it's complicated
to work with museums.
376
00:17:08,031 --> 00:17:11,161
You need to map their vocabulary,
which might be a mish-mash
377
00:17:11,161 --> 00:17:14,576
of famous vocabularies,
like Getty AAT, and other things.
378
00:17:14,576 --> 00:17:17,911
But usually, it's to serve
their exact needs at their museum.
379
00:17:17,911 --> 00:17:19,591
And that's what's challenging.
380
00:17:19,591 --> 00:17:21,091
And I see a lot of heads nodding,
381
00:17:21,091 --> 00:17:23,161
so you've probably seen this a lot
at these museums.
382
00:17:23,161 --> 00:17:25,429
So, I'll move on to show you
how this actually is done.
383
00:17:25,429 --> 00:17:26,749
Oh, go ahead.
384
00:17:26,749 --> 00:17:28,711
(man 5) How do you
bring people in to collaborate
385
00:17:28,711 --> 00:17:31,595
and put some Q codes into your database?
386
00:17:31,595 --> 00:17:32,971
(Andrew) How do you-- I'm sorry?
387
00:17:32,971 --> 00:17:35,038
(man 5) How do you bring...
collaborate people?
388
00:17:35,038 --> 00:17:38,290
(Andrew) Ah, so for this,
these are projects we just go to,
389
00:17:38,780 --> 00:17:41,750
for better or for worse,
like Facebook chat groups that we know
390
00:17:41,750 --> 00:17:43,007
are active in these areas.
391
00:17:43,007 --> 00:17:45,685
Like Sum of All Paintings,
Wiki Loves Fashion--
392
00:17:45,685 --> 00:17:47,918
which is a group
of maybe five or seven folks.
393
00:17:48,548 --> 00:17:50,759
But we need a better way
to get this out to folks
394
00:17:50,759 --> 00:17:52,339
so we get more collaborators on this.
395
00:17:52,339 --> 00:17:53,879
This doesn't scale well right now.
396
00:17:53,879 --> 00:17:56,089
But for small groups,
it works pretty well.
397
00:17:56,108 --> 00:17:57,568
I'm open to ideas.
398
00:17:57,568 --> 00:17:59,619
(man 5) [inaudible]
399
00:17:59,619 --> 00:18:01,669
(Andrew) Oh yeah. Please come on up.
400
00:18:01,669 --> 00:18:02,948
If folks want to come up here,
401
00:18:02,948 --> 00:18:05,357
there's a little more room
in the aisle right here.
402
00:18:06,057 --> 00:18:09,629
So, we are utilizing Python
for this mostly.
403
00:18:09,774 --> 00:18:13,354
If you don't know, there is
a Python notebook system
404
00:18:13,354 --> 00:18:14,884
that WMFLabs has.
405
00:18:14,884 --> 00:18:17,345
So, you can actually go on
and start playing with this.
406
00:18:17,345 --> 00:18:19,624
So, it's pretty easy
to generate a lot of stuff
407
00:18:19,624 --> 00:18:21,401
if you know some of the code that's there.
408
00:18:21,401 --> 00:18:22,455
[inaudible], yeah.
409
00:18:22,485 --> 00:18:23,922
(woman 2) Why do you put everything
410
00:18:23,922 --> 00:18:27,821
into Wikidata,
and not into your own Wikibase?
411
00:18:29,401 --> 00:18:31,127
(Andrew) If you're using
your own Wikibase?
412
00:18:31,127 --> 00:18:33,741
(woman 2) Yeah. Why don't you
use your own Wikibase?
413
00:18:33,741 --> 00:18:35,990
And then go to [inaudible].
414
00:18:35,990 --> 00:18:38,390
(Andrew) That's its own ball of--
415
00:18:38,390 --> 00:18:41,630
I don't want to maintain
my own Wikibase at this point. (laughs)
416
00:18:42,190 --> 00:18:44,400
If I can avoid doing
the Wikibase maintenance,
417
00:18:44,400 --> 00:18:45,760
I would not do it.
418
00:18:46,530 --> 00:18:48,080
(man 6) Would you like a Wikibase?
419
00:18:48,080 --> 00:18:50,050
(Andrew) We could. It's possible.
420
00:18:50,050 --> 00:18:54,154
(man 7) But again,
what they use [inaudible]
421
00:18:54,154 --> 00:18:59,868
about 2,000, 8,000, 10,000,
of 400,000 digital [inaudible].
422
00:18:59,868 --> 00:19:04,300
So that's only 2.5%,
423
00:19:04,300 --> 00:19:08,782
[inaudible]
424
00:19:08,782 --> 00:19:12,601
(Andrew) So, I'd say, solve it for 1,500,
then scale up to 150,000.
425
00:19:12,601 --> 00:19:14,428
So, we're trying to solve it
426
00:19:14,428 --> 00:19:16,876
for the best-known
objects, and then--
427
00:19:16,876 --> 00:19:19,875
(man 7) When do you think
that will happen?
428
00:19:20,855 --> 00:19:25,788
I understand that those are people
that shouldn't go onto Wikidata.
429
00:19:25,788 --> 00:19:29,856
So you go to Commons
or your own Wikibase solution,
430
00:19:29,856 --> 00:19:31,695
not to be a [inaudible]--
431
00:19:31,695 --> 00:19:34,588
(Andrew) Right. That's why we're going
with the 2,000 and 8,000.
432
00:19:34,588 --> 00:19:37,460
We're pretty confident
these are highly notable objects
433
00:19:37,460 --> 00:19:39,085
that deserve to be in Wikidata.
434
00:19:39,085 --> 00:19:40,465
Beyond that, it's debatable.
435
00:19:40,465 --> 00:19:44,265
So, that's why we're not
vacuuming up 400,000 things in one shot.
436
00:19:44,265 --> 00:19:48,936
We're starting with notable 2,000,
notable 8,000, then we'll talk after that.
437
00:19:49,515 --> 00:19:52,775
So, these are the two lines of code
that do the most stuff here.
438
00:19:52,775 --> 00:19:54,217
So, even if you don't know Python,
439
00:19:54,217 --> 00:19:56,146
it's actually not that bad
if you look at this.
440
00:19:56,146 --> 00:19:58,105
There's a read_csv function.
441
00:19:58,105 --> 00:20:00,015
You're taking the crosswalk URL,
442
00:20:00,015 --> 00:20:02,336
basically, the URL
of that Google Spreadsheet.
443
00:20:02,336 --> 00:20:04,875
You're grabbing the spreadsheet
that's called "Object Name",
444
00:20:04,875 --> 00:20:06,685
and you're basically creating
a data structure
445
00:20:06,685 --> 00:20:08,165
that has the Object Name and the QID.
446
00:20:08,165 --> 00:20:09,645
That's it. That's all you're doing.
447
00:20:09,645 --> 00:20:11,655
Just pulling that into the Python code.
448
00:20:11,655 --> 00:20:15,914
Then, you're actually matching
whatever the entity's name is,
449
00:20:15,914 --> 00:20:17,754
and then looking up the QID.
450
00:20:17,754 --> 00:20:21,689
Okay, so, this is just to tell you
that it's not super hard.
451
00:20:21,689 --> 00:20:24,234
The code is available right there,
if you want to look at it.
452
00:20:24,234 --> 00:20:26,474
But these two lines of code
took a little while
453
00:20:26,474 --> 00:20:29,524
to write from scratch,
454
00:20:29,524 --> 00:20:30,904
but once you have an example,
455
00:20:30,904 --> 00:20:34,484
it's pretty darn easy to plug in
your own data set, your own crosswalk,
456
00:20:34,484 --> 00:20:36,844
to generate the QuickStatements.
457
00:20:36,844 --> 00:20:38,525
So, I've done a lot of the work already,
458
00:20:38,525 --> 00:20:41,385
and I invite you
to steal the code and try it.
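The two lines described above read the crosswalk and look up each Met Object Name's QID. A minimal sketch of that flow follows, assuming a crosswalk with "Object Name" and "QID" columns. The talk's version uses pandas `read_csv` against the Google Spreadsheet URL; this sketch inlines the CSV and uses Python's standard `csv` module so it runs offline. The target property (P31, instance of), the example QIDs, and the helper names `load_crosswalk` and `to_quickstatements` are illustrative assumptions, not the project's actual code.

```python
import csv
import io

# Hypothetical crosswalk, inlined for illustration. In the talk this came from
# a Google Spreadsheet sheet named "Object Name" (Met term -> Wikidata QID).
# Q3305213 (painting) and Q860861 (sculpture) are example targets; verify them.
CROSSWALK_CSV = """Object Name,QID
Painting,Q3305213
Sculpture,Q860861
"""


def load_crosswalk(csv_text):
    """Map each lowercased Object Name to its QID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Object Name"].strip().lower(): row["QID"].strip() for row in reader}


def to_quickstatements(records, crosswalk, prop="P31"):
    """Return one tab-separated QuickStatements V1 line per matched record.

    prop="P31" (instance of) is an assumption for this sketch.
    Records whose Object Name is missing from the crosswalk are skipped.
    """
    lines = []
    for rec in records:
        qid = crosswalk.get(rec["object_name"].strip().lower())
        if qid is not None:
            lines.append(f"{rec['item']}\t{prop}\t{qid}")
    return lines


# Placeholder records; Q4115189 is the Wikidata Sandbox item.
records = [
    {"item": "Q4115189", "object_name": "Painting"},
    {"item": "Q4115189", "object_name": "Teapot"},  # not in the crosswalk
]
crosswalk = load_crosswalk(CROSSWALK_CSV)
print("\n".join(to_quickstatements(records, crosswalk)))
```

The printed lines can be pasted into QuickStatements; unmatched names (like "Teapot" here) are the ones to add to the crosswalk next.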
459
00:20:42,365 --> 00:20:44,936
So, when it comes to images,
it's a little more challenging.
460
00:20:44,936 --> 00:20:48,215
So, at this point, Pattypan
is probably your best bet.
461
00:20:48,215 --> 00:20:51,385
Pattypan is a
spreadsheet-oriented tool.
462
00:20:51,385 --> 00:20:54,855
You fill in the metadata, you point
to the local file on your computer,
463
00:20:54,855 --> 00:20:57,435
and it uploads it to Commons
with all that information,
464
00:20:57,435 --> 00:21:02,125
or another alternative
is if you set P4765 to a URL--
465
00:21:03,105 --> 00:21:06,195
because this is the Commons-compatible
image available at URL,
466
00:21:06,195 --> 00:21:08,544
Maarten Dammers has a bot,
at least for paintings,
467
00:21:08,544 --> 00:21:12,020
that will just swoop through and say,
"Oh, we don't have this image.
468
00:21:12,020 --> 00:21:15,113
Here's a Commons-compatible one.
469
00:21:15,113 --> 00:21:17,709
Why don't I slurp it from that site
and put it into Commons?"
470
00:21:17,709 --> 00:21:18,995
And that's what his bot does.
471
00:21:18,995 --> 00:21:20,733
So, you can actually take
a look at his bot
472
00:21:20,733 --> 00:21:24,102
and modify it for your own purposes,
but that is also another alternative
473
00:21:24,102 --> 00:21:28,061
that doesn't require you
to do some spreadsheet work there.
474
00:21:28,061 --> 00:21:30,452
If you've heard
of the GLAMwiki Toolset,
475
00:21:30,452 --> 00:21:32,552
it's effectively end
of life at this point.
476
00:21:33,322 --> 00:21:37,362
It hasn't been updated, and even the folks
who have been working with it in the past
477
00:21:37,362 --> 00:21:39,332
have said Pattypan
is probably your best bet.
478
00:21:39,332 --> 00:21:41,722
Has anyone used GWT these days?
479
00:21:41,741 --> 00:21:43,591
A few of you, a little bit.
480
00:21:43,591 --> 00:21:45,161
It's just not being further developed,
481
00:21:45,161 --> 00:21:47,852
and it's not compatible with a lot
of our authentication protocols
482
00:21:47,852 --> 00:21:49,280
that we have now.
483
00:21:49,280 --> 00:21:52,928
Okay. So, right now, we have basic
metadata added to Wikidata,
484
00:21:52,928 --> 00:21:54,997
with pretty good results from the Met,
485
00:21:54,997 --> 00:21:58,117
and we have a Python script here
to also analyze that.
486
00:21:58,117 --> 00:22:00,307
You're welcome to steal
some of that code, as well.
487
00:22:00,307 --> 00:22:02,817
So, this is what we are showing
to the Met folks, now.
488
00:22:02,817 --> 00:22:06,087
We actually have Listeria lists
that are running
489
00:22:06,087 --> 00:22:07,627
to show all the inventory
490
00:22:07,627 --> 00:22:10,967
and all the information
that we have in Wikidata.
491
00:22:10,967 --> 00:22:15,612
And I'll show you very quickly
about a project that we ran to show folks.
492
00:22:15,612 --> 00:22:18,547
So, what are the benefits of adding
your collections to Wikidata?
493
00:22:18,547 --> 00:22:21,917
One is to use AI in the image classifier
494
00:22:21,917 --> 00:22:24,787
to actually help train
a machine learning model
495
00:22:24,787 --> 00:22:29,447
with all the Met's images and keywords,
and let that be an engine for other folks
496
00:22:29,447 --> 00:22:32,047
to recognize content.
497
00:22:32,047 --> 00:22:36,408
So, this is a hackathon that we had
with MIT and Microsoft last year.
498
00:22:36,408 --> 00:22:39,238
The way this works, is we have
the paintings from the Met,
499
00:22:39,238 --> 00:22:40,277
and we have the keywords
500
00:22:40,277 --> 00:22:43,157
that they actually paid a crew
for six months to add by hand
502
00:22:43,157 --> 00:22:46,937
to all the artworks.
502
00:22:47,567 --> 00:22:50,077
We ingested that
into an AI system right here,
503
00:22:50,077 --> 00:22:51,367
and then, what we did was say,
504
00:22:51,367 --> 00:22:55,428
"Let's feed in new images that
this AI ML system had never seen before,
505
00:22:55,428 --> 00:22:56,747
and see what comes out."
506
00:22:56,747 --> 00:23:00,037
And the problem is that it comes out
with pretty good results,
507
00:23:00,037 --> 00:23:02,267
but it's maybe only 60% accurate.
508
00:23:02,267 --> 00:23:04,797
And for most folks,
60% accurate is garbage.
509
00:23:04,797 --> 00:23:08,627
How do I get the 60% good
out of this pile of stuff?
510
00:23:08,627 --> 00:23:11,127
The good news is that our community
knows how to do that.
511
00:23:11,127 --> 00:23:13,157
We can actually feed this
into a Wikidata game
512
00:23:13,157 --> 00:23:14,997
and get the good stuff out of that.
513
00:23:14,997 --> 00:23:16,228
That's basically what we did.
514
00:23:16,228 --> 00:23:17,647
So, this is the Wikidata game--
515
00:23:17,647 --> 00:23:19,757
you'll notice this is
Magnus' interface right there--
516
00:23:19,757 --> 00:23:21,182
being played at the Met Museum,
517
00:23:21,182 --> 00:23:22,207
in the lobby.
518
00:23:22,207 --> 00:23:25,437
We actually had folks at a cocktail party
drinking champagne
519
00:23:25,437 --> 00:23:27,427
and hitting buttons on the screen.
520
00:23:27,427 --> 00:23:31,048
Hopefully, accurately. (chuckles)
521
00:23:31,048 --> 00:23:33,444
(applause)
522
00:23:33,444 --> 00:23:35,116
We had journalists, curators,
523
00:23:35,116 --> 00:23:37,506
we had some board members
from the Met there as well.
524
00:23:37,506 --> 00:23:38,810
And this was great.
525
00:23:38,810 --> 00:23:40,061
No login, whatever.
526
00:23:40,061 --> 00:23:42,106
(lowers voice) We created
an account just for this.
527
00:23:42,106 --> 00:23:44,117
So, they just hit yes-no-yes-no.
528
00:23:44,117 --> 00:23:45,256
This is great.
529
00:23:45,256 --> 00:23:47,526
You saw this, it said,
"Is there a tree in this picture?"
530
00:23:47,526 --> 00:23:49,148
You don't have to train anyone on this.
531
00:23:49,148 --> 00:23:52,213
You just hit yes--
depicts a tree, not depicted.
532
00:23:52,213 --> 00:23:55,910
I even had my eight-year-old boys
play this game with a finger tap.
533
00:23:56,540 --> 00:24:00,047
And we also created a little tool
that showed all the depictions going by
534
00:24:00,047 --> 00:24:01,505
so people could see them.
535
00:24:03,189 --> 00:24:06,453
It basically is like--
how do you sift good from bad?
536
00:24:06,453 --> 00:24:08,350
This is where the Wikimedia
community comes in,
537
00:24:08,350 --> 00:24:11,034
which is something no other entity could ever do.
538
00:24:12,084 --> 00:24:15,052
So, in the first few months
that we had this, there were
539
00:24:15,052 --> 00:24:19,017
over 7,000 judgments,
resulting in about 5,000 edits.
540
00:24:19,912 --> 00:24:22,227
We did really well on tree,
boat, flower, horse,
541
00:24:22,227 --> 00:24:24,907
things that are in landscape paintings.
542
00:24:25,146 --> 00:24:27,466
But when you go to things
like gender discrimination,
543
00:24:27,466 --> 00:24:29,901
and cats and dogs, not so good, I know.
544
00:24:29,901 --> 00:24:32,159
Because there's so many different
types of cats and dogs
545
00:24:32,159 --> 00:24:33,456
in different positions.
546
00:24:33,456 --> 00:24:36,105
But horses, a lot easier
than cats and dogs.
547
00:24:36,735 --> 00:24:38,742
But also, I should note
that Wikimedia Foundation
548
00:24:38,742 --> 00:24:42,697
is now looking into doing
image recognition on Commons uploads
549
00:24:42,697 --> 00:24:46,368
to do these suggestions as well,
which is an awesome development.
550
00:24:46,667 --> 00:24:49,627
Okay, so, dashboards.
551
00:24:50,750 --> 00:24:53,358
Let's just show you
some of these dashboards.
552
00:24:53,418 --> 00:24:55,097
Folks you work with love dashboards.
553
00:24:55,097 --> 00:24:56,817
They just want to see stats.
554
00:24:56,817 --> 00:24:58,797
So, we have them, like BaGLAMa.
555
00:24:58,797 --> 00:25:00,787
We have InteGraality.
556
00:25:00,787 --> 00:25:02,767
Is JeanFred here?
557
00:25:03,447 --> 00:25:06,247
I think this is a very new thing
relative to last WikidataCon.
558
00:25:06,247 --> 00:25:08,327
We actually have a tool
which will create
559
00:25:08,327 --> 00:25:10,967
this property completeness
chart right here.
560
00:25:10,967 --> 00:25:12,987
So, it's called InteGraality,
with two A's.
561
00:25:13,206 --> 00:25:15,526
It's on that big chart
that I showed you before.
562
00:25:15,526 --> 00:25:19,086
And it can just autogenerate
how complete your items are
563
00:25:19,086 --> 00:25:21,036
in any set, which is really cool.
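InteGraality's completeness chart boils down to a per-property percentage: of the items in a set, how many carry at least one statement for each property. The sketch below computes that from an in-memory mapping. It is a hypothetical illustration of the idea, not InteGraality's implementation; the item keys and property sets are made up (P170 creator, P571 inception, P180 depicts), and in practice the counts would come from a SPARQL query.

```python
from collections import Counter


def property_completeness(items, properties):
    """Percentage of items carrying at least one statement for each property.

    items: dict mapping an item ID to the set of property IDs it uses.
    properties: the property IDs to report on, in display order.
    """
    total = len(items)
    counts = Counter(p for props in items.values() for p in set(props) if p in properties)
    return {p: (100.0 * counts[p] / total if total else 0.0) for p in properties}


# Made-up sample set of four artworks.
sample = {
    "item1": {"P170", "P571", "P180"},
    "item2": {"P170"},
    "item3": {"P170", "P571"},
    "item4": set(),
}
for prop, pct in property_completeness(sample, ["P170", "P571", "P180"]).items():
    print(f"{prop}: {pct:.0f}%")
```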
564
00:25:21,566 --> 00:25:23,771
So, we can see that paintings
are by far the highest,
565
00:25:23,771 --> 00:25:26,057
we have sculptures, drawings, photographs.
566
00:25:26,121 --> 00:25:29,322
And then, they also like to see
what are the most popular artworks
567
00:25:29,322 --> 00:25:31,148
in the Wikisphere?
568
00:25:31,148 --> 00:25:33,417
So, just looking at the site links
in Wikidata--
569
00:25:33,417 --> 00:25:37,781
you can see and rank
all these different artworks there.
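Ranking artworks by their number of sitelinks can be done with a single SPARQL query against the Wikidata Query Service. The sketch below only builds the query and the request URL. It assumes the collection is modeled with P195 (collection) and that Q160236 is the Met's item, which you should verify before running it; the function names are hypothetical.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def most_linked_artworks_query(collection_qid, limit=50):
    """SPARQL ranking a collection's items (P195) by their sitelink count."""
    return f"""SELECT ?item ?itemLabel ?sitelinks WHERE {{
  ?item wdt:P195 wd:{collection_qid} ;
        wikibase:sitelinks ?sitelinks .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?sitelinks)
LIMIT {int(limit)}"""


def wdqs_url(query):
    """GET URL asking the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Q160236 is assumed to be the Metropolitan Museum of Art; check before use.
query = most_linked_artworks_query("Q160236", limit=10)
print(query)
print(wdqs_url(query))
```

The same query text can also be pasted straight into query.wikidata.org, where the results view can switch to a bubble chart.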
570
00:25:39,568 --> 00:25:41,926
Another thing they'd like to see
571
00:25:41,926 --> 00:25:46,879
is who the most frequent creators
of Met artworks are,
572
00:25:46,879 --> 00:25:49,193
and what the most commonly
depicted things are.
573
00:25:49,193 --> 00:25:51,982
So, these are very easy
to generate in SPARQL,
574
00:25:51,982 --> 00:25:54,622
you could look at it right there,
using bubble graphs.
575
00:25:54,673 --> 00:25:56,991
Then place of birth
of the most prominent artists,
576
00:25:56,991 --> 00:25:58,814
we have a chart there, as well.
577
00:25:58,814 --> 00:26:01,142
So, Structured Data on Commons.
578
00:26:01,142 --> 00:26:04,301
I just want to show you very briefly
in case you can't get to Sandra's session,
579
00:26:04,301 --> 00:26:06,226
but you definitely should go
to Sandra's session.
580
00:26:06,226 --> 00:26:10,693
You actually can search in Commons
for a specific Wikibase statement.
581
00:26:11,353 --> 00:26:15,333
I don't always remember the syntax,
but you have to burn it into your brain
582
00:26:15,333 --> 00:26:19,893
and say, it's haswbstatement:P1343=
583
00:26:19,893 --> 00:26:22,695
whatever-- basically, your last
two parts of the triple.
584
00:26:22,695 --> 00:26:26,162
I always get haswb and wbhas mixed up.
585
00:26:26,162 --> 00:26:28,183
I always get the colon
and the equals mixed up.
586
00:26:28,183 --> 00:26:32,022
So just do it once, remember it,
and you'll get the hang of it.
587
00:26:32,022 --> 00:26:34,772
But simple searches are much faster
than SPARQL queries.
588
00:26:34,772 --> 00:26:36,478
So, if you can just look
for one statement,
589
00:26:36,478 --> 00:26:38,392
boom, you'll get the results.
590
00:26:39,181 --> 00:26:43,711
So, things like this, you can look
for symbolically or semantically,
591
00:26:43,711 --> 00:26:47,511
things that depict
the Met museum, for example.
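The `haswbstatement:` keyword can also be sent to the Commons search API instead of being typed into the search box. A sketch, assuming P180 (depicts) and Q160236 as the Met's item (check the QID first). `commons_search_url` is a hypothetical helper; `action=query`, `list=search`, `srsearch`, `srnamespace`, and `srlimit` are standard MediaWiki API parameters.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def haswbstatement(prop, value):
    """Build the search keyword, e.g. haswbstatement:P180=Q160236."""
    return f"haswbstatement:{prop}={value}"


def commons_search_url(prop, value, limit=20):
    """Hypothetical helper: Commons API search URL for files with one statement."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": haswbstatement(prop, value),
        "srnamespace": 6,  # File: namespace
        "srlimit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


# P180 = depicts; Q160236 is assumed to be the Met's item.
print(haswbstatement("P180", "Q160236"))
print(commons_search_url("P180", "Q160236"))
```

Building the keyword in one place also sidesteps the haswb/wbhas and colon/equals mix-ups mentioned above.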
592
00:26:48,051 --> 00:26:50,051
So, finally, community campaigns.
593
00:26:50,051 --> 00:26:51,681
Richard has been a pioneer in this area.
594
00:26:51,681 --> 00:26:54,071
So, once you have the Wikidata items,
595
00:26:54,071 --> 00:26:57,050
they can actually assist
in creating Wikipedia articles.
596
00:26:57,050 --> 00:26:59,785
So, Richard, why don't you tell us
a little bit about the Mbabel tool
597
00:26:59,785 --> 00:27:01,009
that you created for this.
598
00:27:01,009 --> 00:27:03,192
(Richard) Hi, can I get this on?
599
00:27:04,649 --> 00:27:06,109
(Andrew) Oh, use [Joisey's].
600
00:27:06,109 --> 00:27:08,319
(Richard) It's on, now. I'm good.
601
00:27:08,949 --> 00:27:10,769
So, we had all this information
on Wikidata.
602
00:27:10,769 --> 00:27:13,729
[inaudible] browsing data
on our evenings and weekends
603
00:27:13,729 --> 00:27:15,649
to learn about art-- not everyone does.
604
00:27:15,649 --> 00:27:19,319
We have quite a few more people
[inaudible] Wikipedia,
605
00:27:19,319 --> 00:27:22,260
so how do we get this information
from Wikidata to Wikipedia?
606
00:27:22,260 --> 00:27:25,289
One of the ways of doing this
is this so-called Mbabel,
607
00:27:25,289 --> 00:27:28,069
which was developed with the help
of a lot of people in [inaudible].
608
00:27:28,069 --> 00:27:30,639
People like Martin and others.
609
00:27:31,689 --> 00:27:34,659
So, basically to take
some basic art information,
610
00:27:34,659 --> 00:27:37,688
and use it to populate
a Wikipedia article.
611
00:27:37,688 --> 00:27:40,241
So, who created this work,
who was the artist,
612
00:27:40,241 --> 00:27:42,313
when it was created, et cetera.
613
00:27:42,313 --> 00:27:44,626
The nice thing about this
is it can generate works.
614
00:27:44,626 --> 00:27:46,210
We started with English Wikipedia,
615
00:27:46,210 --> 00:27:48,608
but it's been developed
in other languages.
616
00:27:48,608 --> 00:27:50,938
So, Portuguese Wikipedia,
our Brazilian friends
617
00:27:50,938 --> 00:27:53,508
who've done a lot of work, taking it
to realms beyond art,
618
00:27:53,508 --> 00:27:57,283
to stuff like elections
and political work as well.
619
00:27:57,283 --> 00:28:01,128
And the nice thing about this
is we can query on Wikidata--
620
00:28:01,758 --> 00:28:06,928
so different artists-- so for example,
we've done projects with Women in Red,
621
00:28:06,928 --> 00:28:08,472
looking at women artists.
622
00:28:08,472 --> 00:28:12,753
Projects related to Wiki Loves Pride,
looking at LGBT-identified artists,
623
00:28:12,753 --> 00:28:14,073
African Diaspora Artists,
624
00:28:14,073 --> 00:28:16,493
and a lot of different groups
and time periods,
625
00:28:16,493 --> 00:28:19,293
different collections,
and also looking at articles
626
00:28:19,293 --> 00:28:22,213
that have been and haven't been
translated to different languages.
627
00:28:22,213 --> 00:28:24,923
So all of the articles that haven't
been translated to Arabic yet.
628
00:28:24,923 --> 00:28:28,329
You need to find some interesting articles
maybe that are relevant to a culture
629
00:28:28,329 --> 00:28:30,459
that haven't been translated
into that language yet.
630
00:28:30,459 --> 00:28:32,659
We actually have a number of works
in the Met collection
631
00:28:32,659 --> 00:28:35,199
that are in Wikipedias
that aren't in English yet,
632
00:28:35,199 --> 00:28:37,259
because it's a global collection.
633
00:28:37,769 --> 00:28:40,449
So, there are a lot of ways
of creating Wikipedia content
634
00:28:40,449 --> 00:28:44,709
that is driven by these Wikidata items,
and hopefully we can spread them around.
635
00:28:44,709 --> 00:28:47,549
That also may help feed
improvements back
636
00:28:47,549 --> 00:28:49,529
to the Wikidata items in the future.
637
00:28:49,529 --> 00:28:52,403
(Andrew) And there are a number of folks
here using Mbabel already, right?
638
00:28:52,403 --> 00:28:54,124
Who's using Mbabel
in the room? Brazilians?
639
00:28:54,124 --> 00:28:58,690
And also, if [Armin] is here,
we have our winner
640
00:28:59,165 --> 00:29:03,146
of the Wikipedia Asian Month,
and Wiki Loves Pride contest.
641
00:29:03,146 --> 00:29:05,720
So, thank you for joining,
and congratulations.
642
00:29:06,493 --> 00:29:09,993
We'll have another Wikipedia Asian Month
campaign in November.
643
00:29:10,173 --> 00:29:13,383
The way I like to describe it
[inaudible]
644
00:29:13,383 --> 00:29:15,443
It doesn't give you a blank page.
645
00:29:15,443 --> 00:29:16,863
It gives you the skeleton,
646
00:29:16,863 --> 00:29:18,962
which is really a much better
user experience
647
00:29:18,962 --> 00:29:21,472
for edit-a-thons and beginners.
648
00:29:21,472 --> 00:29:23,526
So, it's a lot of great work
that Richard has done,
649
00:29:23,526 --> 00:29:25,841
and people are building on it,
which is awesome.
650
00:29:25,906 --> 00:29:29,066
(woman 3) [inaudible] for some of them,
which is really nice.
651
00:29:29,066 --> 00:29:30,376
Yeah, exactly.
652
00:29:30,376 --> 00:29:32,956
(woman 3) [inaudible]
653
00:29:32,956 --> 00:29:35,815
Right. We should have put a URL here.
654
00:29:35,815 --> 00:29:38,196
(man 8) [inaudible]
655
00:29:38,196 --> 00:29:40,055
Oh, that's right.
We have the link right here.
656
00:29:40,055 --> 00:29:43,725
So if you click-- this is a Listeria list,
it's autogenerating all that for you.
657
00:29:43,725 --> 00:29:46,205
And then, you click on the red link,
it'll create the skeleton,
658
00:29:46,205 --> 00:29:47,491
which is pretty cool.
659
00:29:47,491 --> 00:29:49,172
Alright, we're on the final stretch here.
660
00:29:49,172 --> 00:29:51,990
The tool that we're going
to be announcing--
661
00:29:51,990 --> 00:29:55,047
well, we announced a few weeks ago,
but only to a small set of folks,
662
00:29:55,047 --> 00:29:57,038
but we're making a big splash here,
663
00:29:57,038 --> 00:29:59,345
is the depiction tool
that we just created.
664
00:29:59,345 --> 00:30:05,298
Wikipedia has shown that volunteer
contributors can add a lot of these things
665
00:30:05,298 --> 00:30:06,681
that museums can't.
666
00:30:06,681 --> 00:30:10,263
So, what if we created a tool
that could let you enrich
667
00:30:10,263 --> 00:30:15,907
the metadata about artworks
in terms of the depiction information?
668
00:30:15,907 --> 00:30:19,477
And what we did was we applied
for a grant from the Knight Foundation,
669
00:30:19,477 --> 00:30:22,684
and we created this tool--
and is Edward here?
670
00:30:22,760 --> 00:30:26,590
Edward is our wonderful developer
who in like a month, said,
671
00:30:26,590 --> 00:30:28,050
"Okay, here's a prototype."
672
00:30:28,050 --> 00:30:33,103
That was after we gave him a specification,
and it's pretty cool.
673
00:30:33,900 --> 00:30:35,849
- So what we can do--
- (applause)
674
00:30:35,849 --> 00:30:37,169
Thanks, Edward.
675
00:30:37,569 --> 00:30:39,269
We're working within collections of items.
676
00:30:39,269 --> 00:30:41,629
So, what we do, is we can
bring up a page like this.
677
00:30:41,629 --> 00:30:44,789
It's no longer looking
at a Wikidata item with a tiny picture.
678
00:30:44,789 --> 00:30:48,484
If we're working with what's depicted
in the image, we want the picture big.
679
00:30:48,484 --> 00:30:51,201
And we don't really have tools
that work with big images.
680
00:30:51,201 --> 00:30:53,348
We have tools that deal
with lexical and typing.
681
00:30:53,348 --> 00:30:56,715
So one of the big things Edward did
was make a big version of the picture,
682
00:30:56,715 --> 00:30:58,739
scrape whatever he could
from the object page
683
00:30:58,739 --> 00:31:00,633
of the GLAM organization,
and give you context.
684
00:31:00,633 --> 00:31:02,773
I can see dogs, children, wigwam.
685
00:31:02,773 --> 00:31:05,782
These are things that direct the user
to add meaningful information.
686
00:31:05,782 --> 00:31:09,024
You have some metadata
that's scraped from the site, too.
687
00:31:09,024 --> 00:31:11,868
Teepee, Comanche--
oh, it's Comanche, not Navajo,
688
00:31:11,868 --> 00:31:13,556
because I know the object page said that.
689
00:31:13,556 --> 00:31:15,702
And you can actually start typing
in the field, there.
690
00:31:15,702 --> 00:31:17,628
And the cool thing is that
it gives you context,
691
00:31:17,628 --> 00:31:19,566
it doesn't just match anything
to Wikidata,
692
00:31:19,566 --> 00:31:23,107
it first matches things that have already
been used in other depiction statements.
693
00:31:23,107 --> 00:31:25,456
Very simple thing,
but what a godsend it is
694
00:31:25,456 --> 00:31:27,166
for folks who have tried this in the past.
695
00:31:27,166 --> 00:31:29,116
Don't give me everything
that matches teepee.
696
00:31:29,116 --> 00:31:33,321
Show me what other paintings
have used teepee in the past.
697
00:31:33,355 --> 00:31:36,175
So, it's interactive, context-driven,
statistics-driven,
698
00:31:36,175 --> 00:31:37,936
by showing you what has been matched before.
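The suggestion ordering described above (prefer values already used in other depiction statements) can be sketched as a stable sort by prior-use count. This is a hypothetical illustration, not the depiction tool's code; the candidate IDs, labels, and counts are placeholders, and in practice the counts would come from existing P180 (depicts) statements.

```python
from collections import Counter


def rank_suggestions(candidates, prior_depicts):
    """Put candidates already used as depicts values first, most-used first.

    candidates: (qid, label) pairs from a plain text search, in search order.
    prior_depicts: Counter of QIDs already used in depicts (P180) statements.
    sorted() is stable, so candidates with equal counts keep their search order.
    """
    return sorted(candidates, key=lambda c: -prior_depicts[c[0]])


# Placeholder IDs and counts for a search on "teepee".
candidates = [("Q_song", "Teepee (song)"), ("Q_tipi", "tipi"), ("Q_place", "Teepee Creek")]
prior = Counter({"Q_tipi": 12, "Q_place": 1})
for qid, label in rank_suggestions(candidates, prior):
    print(qid, label)
```

A `Counter` returns 0 for unseen QIDs, so never-used matches simply fall to the end instead of being dropped.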
699
00:31:37,936 --> 00:31:40,336
And the cool thing is once you're done
with that painting,
700
00:31:40,336 --> 00:31:42,196
you can start to work in other areas.
701
00:31:42,196 --> 00:31:44,936
You want to work within the same artist,
the collection, location,
702
00:31:45,876 --> 00:31:47,295
other criteria here.
703
00:31:47,295 --> 00:31:49,146
And you can even browse
through the collections
704
00:31:49,146 --> 00:31:51,582
of different organizations,
just work on their paintings.
705
00:31:51,582 --> 00:31:53,670
So, we wanted people
to not live in Wikidata--
706
00:31:53,670 --> 00:31:56,307
kind of onesy-twosies with items,
but live in a space
707
00:31:56,307 --> 00:31:59,232
where you're looking at artworks
in collections that make sense.
708
00:31:59,683 --> 00:32:01,792
And then, you can actually
look through it visually.
709
00:32:01,792 --> 00:32:04,237
It kind of looks like Crotos
or these other tools,
710
00:32:04,237 --> 00:32:07,726
but you can actually live edit
on Wikidata at the same time.
711
00:32:07,726 --> 00:32:09,104
So, go ahead and try it out.
712
00:32:09,104 --> 00:32:10,609
We only have 14 users,
713
00:32:10,609 --> 00:32:14,667
but we've had 2,100 paintings worked on,
with 5,000 plus depict statements.
714
00:32:14,667 --> 00:32:16,126
That's pretty good for 14.
715
00:32:16,126 --> 00:32:18,119
So, multiply that by 10--
716
00:32:18,119 --> 00:32:20,515
imagine how many more things
we could do with that.
717
00:32:20,515 --> 00:32:23,797
So, you can go ahead and go
to art.wikidata.link and try out the tool.
718
00:32:23,797 --> 00:32:26,594
It uses OAuth authentication,
and you're off to the races.
719
00:32:26,594 --> 00:32:29,187
And it should be very natural
without any kind of training
720
00:32:29,187 --> 00:32:31,782
to add depiction statements to artworks.
721
00:32:31,837 --> 00:32:35,170
But you can put any object.
We don't restrict the object right now.
722
00:32:35,170 --> 00:32:37,278
So, you could put any Q number
723
00:32:38,468 --> 00:32:41,208
to edit this content if you want.
724
00:32:41,275 --> 00:32:44,645
But we primarily stick with paintings
and 2D artworks, right now.
725
00:32:46,184 --> 00:32:49,405
Okay. You can actually look
at the recent changes
726
00:32:49,405 --> 00:32:52,175
and see who's made edits recently to that.
727
00:32:52,815 --> 00:32:54,855
Okay? Okay, so we're going
to wind it down.
728
00:32:54,855 --> 00:32:58,386
Ooh, one minute, then we'll do some Q&A.
729
00:32:58,915 --> 00:33:03,081
So, the final thing that I think
is useful for museum types especially,
730
00:33:03,081 --> 00:33:07,307
is there's a very famous author
named Nina Simon in the museum world,
731
00:33:07,307 --> 00:33:11,204
and she likes to talk about
how we go from users,
732
00:33:11,204 --> 00:33:14,968
or I guess your audience,
contributing stuff to your collections
733
00:33:14,968 --> 00:33:18,004
to collaborating around content,
to actually being co-creative
734
00:33:18,004 --> 00:33:19,714
and creating new things.
735
00:33:19,714 --> 00:33:20,984
And that's always been tough.
736
00:33:20,984 --> 00:33:24,154
And I'd like to argue that Wikidata
is this co-creative level.
737
00:33:24,154 --> 00:33:26,914
So, it's not just uploading
a file to Commons,
738
00:33:26,914 --> 00:33:28,234
which is contributing something.
739
00:33:28,234 --> 00:33:31,194
It's not just editing an article
with someone else, which is collaborative.
740
00:33:31,194 --> 00:33:34,833
But we are now seeing these tools
that let you make timelines,
741
00:33:34,833 --> 00:33:36,133
and graphs, and bubble charts.
742
00:33:36,133 --> 00:33:38,833
And this is actually the co-creative part
that's really interesting.
743
00:33:38,833 --> 00:33:40,353
And that's what Wikidata provides you.
744
00:33:40,353 --> 00:33:42,235
Because suddenly,
it's not language dependent--
745
00:33:42,235 --> 00:33:45,146
we've got this database
that's got this rich information in it.
746
00:33:45,946 --> 00:33:48,606
So, it's not just pictures, not just text,
747
00:33:48,606 --> 00:33:50,522
but it's all this rich multimedia
748
00:33:50,522 --> 00:33:52,607
that we have the opportunity to work on.
749
00:33:52,607 --> 00:33:55,851
So, this is just another example
of this connected graph
750
00:33:55,851 --> 00:33:57,389
that you can take a look at later on
751
00:33:57,389 --> 00:33:59,860
to show another example
of The Death of Socrates,
752
00:33:59,860 --> 00:34:02,312
and the different themes
around that painting.
753
00:34:03,252 --> 00:34:05,653
And it's really easy
to make this graph yourself.
754
00:34:05,653 --> 00:34:08,172
So again, another scary graphic
that only makes sense
755
00:34:08,172 --> 00:34:09,822
for Wikidata folks, like you.
756
00:34:09,822 --> 00:34:13,682
You just give it a list of Wikidata items,
and it'll do the rest, that's it.
757
00:34:14,102 --> 00:34:15,662
You'll give the list.
758
00:34:15,705 --> 00:34:17,664
Keep all this code the same.
759
00:34:17,664 --> 00:34:21,364
So, fortunately, Martin and Lucas
helped do all this code here.
760
00:34:21,364 --> 00:34:23,864
Just give it a list of items
and the magic will happen.
761
00:34:23,864 --> 00:34:25,624
Hopefully, it won't blow up your computer,
762
00:34:25,624 --> 00:34:28,755
as long as you're putting in
a reasonable number of items there.
763
00:34:28,755 --> 00:34:31,593
But as long as you have the screen space,
it'll draw the graph,
764
00:34:31,593 --> 00:34:33,283
which is pretty darn cool.
765
00:34:33,283 --> 00:34:37,223
And then, finally, two tools--
I realized at 2 a.m. last night
766
00:34:37,223 --> 00:34:39,744
a few people said,
"I didn't know about these tools."
767
00:34:39,744 --> 00:34:41,343
And you should know about these tools.
768
00:34:41,343 --> 00:34:44,613
So, one is Recoin, which shows you
the relative completeness of an item
769
00:34:44,613 --> 00:34:46,773
compared to other items
of the same instance.
770
00:34:46,773 --> 00:34:49,473
And then, Cradle, which gives you
a forms-based way
771
00:34:49,473 --> 00:34:50,693
to create content.
772
00:34:50,693 --> 00:34:52,453
So, these are very useful for edit-a-thons
773
00:34:52,453 --> 00:34:54,753
where if you know that
you're working with just artworks,
774
00:34:54,753 --> 00:34:57,553
don't just let people create items
with a blank screen.
775
00:34:57,553 --> 00:35:00,275
Give them a form to fill out
to start entering in information
776
00:35:00,275 --> 00:35:01,818
that's structured.
777
00:35:01,818 --> 00:35:04,588
And then, finally, we've gone
through some of this, already.
778
00:35:06,268 --> 00:35:09,539
This is my big chart that I love
to get people's feedback on.
779
00:35:09,539 --> 00:35:14,296
How do we get people
across the chasm to be in this space?
780
00:35:14,328 --> 00:35:16,839
We have a lot of folks who, now,
can do template coding,
781
00:35:16,839 --> 00:35:20,040
spreadsheets, QuickStatements,
SPARQL queries, and then we got--
782
00:35:20,935 --> 00:35:24,259
how do we get people to this side
where we have Python
783
00:35:24,259 --> 00:35:26,694
and the things that can do more
sophisticated editing.
784
00:35:26,694 --> 00:35:28,625
It's really hard
to get people across this.
785
00:35:28,625 --> 00:35:30,785
But I would like to say
it's hard to get people across,
786
00:35:30,785 --> 00:35:32,847
but the content and the technology
are not that hard.
787
00:35:32,847 --> 00:35:35,380
We actually need more people
to learn about regular expressions.
788
00:35:35,380 --> 00:35:38,307
And once you get some kind
of experience here,
789
00:35:38,307 --> 00:35:41,830
you'll find that this is a wonderful world
that you can learn a lot in,
790
00:35:41,830 --> 00:35:44,700
but it does take some time
to get across this chasm.
791
00:35:44,829 --> 00:35:46,289
Yes, James.
792
00:35:46,289 --> 00:35:52,148
(James) [inaudible]
793
00:35:53,127 --> 00:35:57,192
No, what it means is that the graph
is not necessarily accurate
794
00:35:57,192 --> 00:35:59,178
in terms of its data points.
795
00:35:59,308 --> 00:36:03,427
But what it means-- I guess
it's more like this is a valley.
796
00:36:03,786 --> 00:36:06,716
It's like we need to get people
across this valley here.
797
00:36:06,716 --> 00:36:10,146
(woman 4) [inaudible]
798
00:36:10,146 --> 00:36:11,546
I would say this is the key.
799
00:36:11,546 --> 00:36:16,296
If we can get people who know this stuff,
but can grok this stuff,
800
00:36:16,296 --> 00:36:17,918
it gets them to this stuff.
801
00:36:17,918 --> 00:36:19,668
Does that make sense? Yeah.
802
00:36:19,668 --> 00:36:24,155
So, my vision for the next few years:
if we can get better training
803
00:36:24,155 --> 00:36:27,516
in our community to get people
from batch processing,
804
00:36:27,516 --> 00:36:29,847
which is pretty much what this is,
to kind of intelligent--
805
00:36:29,847 --> 00:36:32,726
I wouldn't say intelligent,
but more sophisticated programming,
806
00:36:32,726 --> 00:36:35,486
that would be a great thing,
because we're seeing that this is a bottleneck
807
00:36:35,486 --> 00:36:37,846
to a lot of the stuff
that I just showed you up there.
808
00:36:37,846 --> 00:36:39,086
Yes.
809
00:36:39,135 --> 00:36:42,105
(man 9) [inaudible]
810
00:36:42,105 --> 00:36:45,984
Okay, wait, you want to show me something,
show me after the session, does that work?
811
00:36:45,984 --> 00:36:47,584
Okay. Yes, Megan.
812
00:36:47,584 --> 00:36:50,804
- (Megan) Can I have a microphone?
- Microphone, yes.
813
00:36:50,834 --> 00:36:54,528
- (Megan) [inaudible]
- Yeah.
814
00:36:55,316 --> 00:36:56,636
And we have lunch after this,
815
00:36:56,636 --> 00:36:59,006
so if you want to stay
a little bit later, that's fine, too.
816
00:36:59,006 --> 00:37:01,009
- [inaudible]
- We're already at lunch break? Okay.
817
00:37:01,009 --> 00:37:03,094
(Megan) So, thank you so much
to both you and Richard
818
00:37:03,094 --> 00:37:04,799
for all the work you're doing at the Met.
819
00:37:04,799 --> 00:37:07,027
And I know that you're
very well supported in that.
820
00:37:07,027 --> 00:37:09,100
(mic feedback)
I don't know what happened there.
821
00:37:09,100 --> 00:37:15,071
For the average volunteer community,
how do you balance doing the work
822
00:37:15,071 --> 00:37:19,124
for the cultural heritage organization
versus training the professionals
823
00:37:19,124 --> 00:37:21,792
that are there to do that work?
824
00:37:21,792 --> 00:37:24,412
Where do you find the balance
in terms of labor?
825
00:37:25,672 --> 00:37:26,962
It's a good question.
826
00:37:27,397 --> 00:37:30,467
(Megan) One that really comes up,
I think, with this as well.
827
00:37:30,467 --> 00:37:33,158
- With this?
- (Megan) Yeah, and with building out...
828
00:37:33,187 --> 00:37:36,277
where we put efforts in terms
of building out competencies.
829
00:37:36,333 --> 00:37:39,398
Yeah. I don't have a great answer for you,
but it's a great question.
830
00:37:39,398 --> 00:37:40,658
(Megan) Cool.
831
00:37:40,658 --> 00:37:43,580
(Richard) There are a lot
of tech people at [inaudible]
832
00:37:43,580 --> 00:37:46,158
who understand this side of the graph,
and don't understand it--
833
00:37:46,158 --> 00:37:48,878
the people in [inaudible]
who understand this part of the graph,
834
00:37:48,878 --> 00:37:50,658
and don't understand
this part of the graph.
835
00:37:50,658 --> 00:37:53,928
So, the more we can get Wikimedians
who understand some of this,
836
00:37:53,928 --> 00:37:57,748
with some tech professionals at museums
who understand this,
837
00:37:57,748 --> 00:37:59,408
then that makes it a little bit easier--
838
00:37:59,408 --> 00:38:01,968
and hopefully, as well as
training up Wikimedians,
839
00:38:01,968 --> 00:38:05,587
we can also provide some guidance
and let the museums [inaudible]
840
00:38:05,587 --> 00:38:07,438
to take care of themselves
in the [inaudible].
841
00:38:07,496 --> 00:38:09,285
Yeah, that's a good point.
842
00:38:09,285 --> 00:38:11,961
How many people here know
what regular expressions are?
843
00:38:11,961 --> 00:38:13,216
Raise your hand.
844
00:38:13,216 --> 00:38:17,397
Okay, so how many people are comfortable
specifying a regular expression?
845
00:38:17,397 --> 00:38:19,267
So, yeah, we need more work here.
846
00:38:19,267 --> 00:38:20,771
(laughter)
847
00:38:20,771 --> 00:38:23,199
(man 10) I want to suggest that--
848
00:38:24,648 --> 00:38:28,575
maybe not getting
every Wikidata practitioner,
849
00:38:28,575 --> 00:38:33,607
or institution practitioner
to embrace Python programming is the way.
850
00:38:33,717 --> 00:38:39,657
But as Richard just said, finding more
bridging people-- people like you--
851
00:38:39,657 --> 00:38:41,137
who speak both--
852
00:38:41,137 --> 00:38:44,042
who speak Python,
but also speak GLAM institution--
853
00:38:44,812 --> 00:38:48,392
to help the GLAM's own
technical department, which may not--
854
00:38:49,233 --> 00:38:51,951
they know Python,
they don't know this stuff.
855
00:38:52,640 --> 00:38:54,186
That's, I think, what's needed.
856
00:38:54,235 --> 00:38:59,034
People like you, people like me,
people who speak both of these jargons
857
00:38:59,034 --> 00:39:01,835
to help make the connections,
to document the connections.
858
00:39:01,835 --> 00:39:03,344
You're already doing this, of course.
859
00:39:03,344 --> 00:39:05,534
You share your code, et cetera,
you're doing tutorials.
860
00:39:05,534 --> 00:39:07,044
But we need more of this.
861
00:39:07,044 --> 00:39:09,223
I'm not sure we need
to make everyone programmers.
862
00:39:09,223 --> 00:39:10,612
We already have programmers.
863
00:39:10,612 --> 00:39:12,332
We need to make them understand
864
00:39:12,332 --> 00:39:14,612
the non-programming
material they need to--
865
00:39:14,612 --> 00:39:15,782
I think that's a great point.
866
00:39:15,782 --> 00:39:18,062
We don't need to make everyone
highly proficient in this,
867
00:39:18,062 --> 00:39:20,312
but we do need people
knowledgeable enough to say,
868
00:39:20,312 --> 00:39:23,004
"Yeah, we can ingest 400 thousand rows
and do something with it."
869
00:39:23,004 --> 00:39:25,284
Whereas, if you're stuck
on this side, you're like,
870
00:39:25,284 --> 00:39:27,444
"400 thousand rows
sounds really big and scary."
871
00:39:27,444 --> 00:39:30,364
But if you know that it's possible,
you're like, "No problem."
872
00:39:30,364 --> 00:39:32,284
400 thousand is not a problem.
873
00:39:32,284 --> 00:39:35,414
(woman 5) I would just like to chime in
a little bit to say
874
00:39:35,414 --> 00:39:39,674
that there may be countries and areas
where you will not find a GLAM
875
00:39:39,674 --> 00:39:44,404
with any skilled technologists.
876
00:39:44,434 --> 00:39:47,834
So, you will have to invent
something there in the middle.
877
00:39:48,502 --> 00:39:49,634
That's a good point.
878
00:39:49,778 --> 00:39:51,378
Any questions? Sandra.
879
00:39:55,648 --> 00:39:57,807
(Sandra) Yeah, I just wanted
to add to this discussion.
880
00:39:57,807 --> 00:40:01,656
Actually, I've seen some very good cases
where it has indeed been successful
881
00:40:01,656 --> 00:40:05,476
to train GLAM professionals to work
with this entire environment,
882
00:40:05,476 --> 00:40:09,276
and where they've done fantastic work,
even at small institutions.
883
00:40:10,046 --> 00:40:14,986
It also requires that you have chapters
or volunteers that can train the staff.
884
00:40:15,163 --> 00:40:17,513
So, it's really like a bigger environment.
885
00:40:18,192 --> 00:40:22,044
But I think that's a model that,
if we can manage to make it grow,
886
00:40:22,044 --> 00:40:24,263
can scale very well.
887
00:40:24,673 --> 00:40:25,693
Good point.
888
00:40:25,693 --> 00:40:30,896
(woman 5) [inaudible]
889
00:40:32,029 --> 00:40:34,217
Sorry, just noting that we don't have
890
00:40:34,217 --> 00:40:37,820
any structured trainings
right now for that.
891
00:40:38,209 --> 00:40:42,498
We might want to develop those,
and that would be helpful.
892
00:40:42,608 --> 00:40:44,408
We have been doing that for education
893
00:40:44,408 --> 00:40:47,488
in terms of teaching people
Wikipedia and Wikidata.
894
00:40:47,488 --> 00:40:50,008
It's just a matter of taking it
one step further.
895
00:40:50,528 --> 00:40:52,168
Right. Stacy.
896
00:40:54,518 --> 00:40:56,988
(Stacy) Well, I'd just like to say
that a lot of professionals
897
00:40:56,988 --> 00:41:02,006
who work in this area of metadata
have all these skills already.
898
00:41:02,006 --> 00:41:08,966
So, I think part of it is just proving
the value to these organizations,
899
00:41:08,966 --> 00:41:13,126
but then it's also tapping
into professional associations who can--
900
00:41:13,195 --> 00:41:16,745
or ways of collaborating within
those professional communities
901
00:41:16,745 --> 00:41:21,374
to build this work, and the documentation
on how to do things
902
00:41:21,374 --> 00:41:23,234
is really, really important,
903
00:41:23,234 --> 00:41:27,454
because I'm not sure about the role
of depending on volunteers,
904
00:41:27,454 --> 00:41:32,294
when some of this work is actually work
GLAM organizations do anyway.
905
00:41:32,395 --> 00:41:35,355
We manage our collections
in a variety of ways through metadata,
906
00:41:35,355 --> 00:41:37,126
and this is actually one more way.
907
00:41:37,126 --> 00:41:40,495
So, should we also not be thinking
about ways to integrate this work
908
00:41:40,495 --> 00:41:43,946
into a GLAM professional's regular job?
909
00:41:43,985 --> 00:41:46,125
And then that way you're generating--
910
00:41:46,125 --> 00:41:48,885
and when you think
about sustainability and scalability,
911
00:41:48,885 --> 00:41:53,426
that's the real trick to making this
both sustainable and scalable,
912
00:41:53,745 --> 00:41:58,695
is that once this is the regular
work of GLAM folks,
913
00:41:58,695 --> 00:42:00,885
we're not worried as much about this part,
914
00:42:00,885 --> 00:42:03,503
because it's just turning
that little switch to get this
915
00:42:03,503 --> 00:42:05,763
to be a part of that work.
916
00:42:05,863 --> 00:42:08,063
Right. Good point. [Shani].
917
00:42:11,603 --> 00:42:13,229
(Shani) You're absolutely right.
918
00:42:13,229 --> 00:42:16,122
But I want to echo what you said before.
919
00:42:16,152 --> 00:42:21,566
And yes, Susana-- this might work
for more privileged countries
920
00:42:22,082 --> 00:42:25,042
where they have money,
they have people doing it.
921
00:42:25,682 --> 00:42:29,042
It doesn't work for places
that are still developing,
922
00:42:29,042 --> 00:42:32,282
that don't have resources--
they don't have all of that.
923
00:42:32,592 --> 00:42:36,832
And they can barely do
what they need to do.
924
00:42:36,886 --> 00:42:41,066
So, it's difficult for them, and then,
the community is really helpful.
925
00:42:41,906 --> 00:42:45,495
These are the cases where the community
can have a huge impact actually,
926
00:42:45,985 --> 00:42:50,349
working with the GLAMs,
because they can't do it all
927
00:42:50,979 --> 00:42:52,296
as part of their jobs.
928
00:42:52,834 --> 00:42:55,034
So, we need to think about that as well.
929
00:42:55,053 --> 00:42:58,223
And having these examples,
actually, is hugely important,
930
00:42:58,223 --> 00:43:00,763
because it's still helping
to convince them
931
00:43:00,763 --> 00:43:05,842
that it's critical to invest in it
and to work with volunteers,
932
00:43:05,842 --> 00:43:09,082
so, with non-professionals
of sorts, to get there.
933
00:43:10,003 --> 00:43:12,650
I can imagine a future where
you don't have to know all this code.
934
00:43:12,650 --> 00:43:14,379
These would just be
kind of like Lego bricks
935
00:43:14,379 --> 00:43:15,801
you can slap together,
936
00:43:15,801 --> 00:43:18,761
saying, "Here's my database.
Here's the crosswalk. Here's Wikidata,"
937
00:43:18,761 --> 00:43:21,311
and just put it together,
and you don't have to even code,
938
00:43:21,311 --> 00:43:23,835
you just have to make sure
the databases are in the right place.
939
00:43:23,835 --> 00:43:25,375
Yep. Okay.
940
00:43:26,747 --> 00:43:28,705
(man 11) Sorry. [inaudible]
941
00:43:28,705 --> 00:43:34,025
I think if I would have done this project,
I'd probably have done it the same way.
942
00:43:34,025 --> 00:43:36,146
So, I think that's maybe a good sign.
943
00:43:36,146 --> 00:43:39,725
I was wondering how
the whole financing of this project worked.
944
00:43:39,725 --> 00:43:40,840
How did the-- I'm sorry?
945
00:43:40,840 --> 00:43:43,255
How the financing of this project worked.
946
00:43:43,795 --> 00:43:45,755
- The financing?
- Yeah, the money.
947
00:43:46,425 --> 00:43:47,505
That's a good question.
948
00:43:47,505 --> 00:43:49,185
Well, so, there are different parts of it.
949
00:43:49,185 --> 00:43:53,073
So, the Knight grant funded
the Wiki Art Depiction Explorer.
950
00:43:53,198 --> 00:43:56,928
But I, for the last, maybe what--
nine months--
951
00:43:56,928 --> 00:43:58,768
I've been their Wikimedia strategist.
952
00:43:58,768 --> 00:44:01,618
So, I've been on
since February of this year.
953
00:44:01,618 --> 00:44:04,818
So, pretty much, they're paying
for my time to help with their--
954
00:44:04,818 --> 00:44:07,968
not only the upload of their collections,
but developing these tools, as well.
955
00:44:07,968 --> 00:44:11,659
- (Richard) So the Met's paying you?
- Yeah, that's right.
956
00:44:11,762 --> 00:44:14,894
(Richard) The grant, at least part
of it has come from--
957
00:44:14,894 --> 00:44:16,959
There was a grant for Open Access.
958
00:44:16,959 --> 00:44:20,176
And this is under that campaign
and with the digital department.
959
00:44:20,176 --> 00:44:24,297
So, working as contractors throughout
the Open Access campaign for the Met.
960
00:44:27,948 --> 00:44:30,116
(man 12) I'm sorry.
I guess before you were hired,
961
00:44:30,116 --> 00:44:31,313
and before there was a grant,
962
00:44:31,313 --> 00:44:33,780
there was probably a lot
of volunteer work done to make sure--
963
00:44:33,780 --> 00:44:35,303
Richard did a lot of work before that.
964
00:44:35,303 --> 00:44:37,219
And then, Wikimedia New York
did a lot of work,
965
00:44:37,219 --> 00:44:38,927
but it was kind of in bursts.
966
00:44:38,927 --> 00:44:41,045
It wasn't as comprehensive
as we're talking about now
967
00:44:41,045 --> 00:44:45,915
in terms of having-- making sure
those two layers are complete
968
00:44:45,915 --> 00:44:47,310
in Wikidata.
969
00:44:48,640 --> 00:44:50,543
Alright, yeah. I think that's it.
970
00:44:50,543 --> 00:44:53,843
So, I'm happy to talk after lunch,
or after the break, if you want.
971
00:44:54,683 --> 00:44:56,223
Okay. Thank you.
972
00:44:56,223 --> 00:44:59,197
(applause)