1 00:00:07,138 --> 00:00:08,288 Thanks folks. 2 00:00:09,627 --> 00:00:11,991 As I mentioned before, you can load up the slides here 3 00:00:11,991 --> 00:00:16,661 by either the QR code or the short URL, which is wikidatacon..., this is bit.ly, 4 00:00:16,661 --> 00:00:19,920 wikidatacon19glamstrategies. 5 00:00:19,980 --> 00:00:22,040 And the slides are also on the program page 6 00:00:22,040 --> 00:00:24,520 on the WikidataCon site. 7 00:00:24,549 --> 00:00:27,269 And then, there's also an Etherpad here that you can click on. 8 00:00:27,269 --> 00:00:28,959 So, I'll be talking about a lot of things 9 00:00:28,959 --> 00:00:31,629 that you might have heard about at Wikimania, if you were there, 10 00:00:31,629 --> 00:00:34,089 but we are going to go into a lot more implementation details. 11 00:00:34,089 --> 00:00:36,209 Because we're at WikidataCon, we can dive deeper 12 00:00:36,209 --> 00:00:38,430 into the Wikidata and technical aspects. 13 00:00:38,430 --> 00:00:41,821 But Richard and myself, we are working at the Met Museum right now 14 00:00:41,821 --> 00:00:43,200 and their Open Access. 15 00:00:43,200 --> 00:00:45,320 If you didn't know, about two plus years ago, 16 00:00:45,320 --> 00:00:46,920 entering to the third year, 17 00:00:46,920 --> 00:00:49,320 there's been an Open Access strategy at the Met, 18 00:00:49,320 --> 00:00:52,763 where they're releasing their images under CC0 license and their metadata. 19 00:00:52,763 --> 00:00:54,639 And one of the things they brought us on to do 20 00:00:54,639 --> 00:00:58,409 is what things could we imagine doing with this Open Access content. 21 00:00:58,409 --> 00:01:00,469 So, we're going to talk a little bit about that 22 00:01:00,469 --> 00:01:02,598 in terms of the experiments that we've been running, 23 00:01:02,598 --> 00:01:04,044 and we'd love to hear your feedback. 
24 00:01:04,044 --> 00:01:07,028 So, I hope to talk about 20 minutes, and then hope to get some conversation 25 00:01:07,028 --> 00:01:09,853 with you folks, since we have a lot of knowledge in this room. 26 00:01:09,923 --> 00:01:12,472 This is the announcement, and actually the one-year anniversary, 27 00:01:12,472 --> 00:01:16,452 where Katherine Maher was actually there, at the Met to talk about that anniversary. 28 00:01:16,452 --> 00:01:19,172 So, one of the things that's challenging I think for a lot of folks 29 00:01:19,172 --> 00:01:21,097 is how do you explain Wikidata, 30 00:01:21,097 --> 00:01:23,911 and this GLAM contribution strategy to Wikidata 31 00:01:23,911 --> 00:01:27,102 to C-level folks at an organization. 32 00:01:27,102 --> 00:01:31,392 We can talk about it with data scientists, Wikimedians, librarians, maybe curators, 33 00:01:31,392 --> 00:01:34,452 but when it comes to talking about this with a director of a museum, 34 00:01:34,452 --> 00:01:36,862 or a director of a library, what does it actually-- 35 00:01:36,862 --> 00:01:38,482 how does it resonate with them? 36 00:01:38,482 --> 00:01:41,352 So, one way that we actually talked about that I think makes sense, 37 00:01:41,352 --> 00:01:43,978 is everyone knows about Wikipedia, 38 00:01:43,978 --> 00:01:47,799 and for the English language edition, 39 00:01:47,799 --> 00:01:49,733 at least, we're talking about 6 million articles. 40 00:01:49,733 --> 00:01:51,792 And it sounds like a lot, but if you think about it, 41 00:01:51,792 --> 00:01:54,361 Wikipedia is not really the sum of all human knowledge, 42 00:01:54,361 --> 00:01:59,512 it's the sum of all reliably sourced, mostly western knowledge. 43 00:02:00,281 --> 00:02:02,211 And there's a lot of stuff out there. 44 00:02:02,211 --> 00:02:04,141 We have a lot of stuff in Commons already-- 45 00:02:04,141 --> 00:02:07,382 56 million media files going up every single day-- 46 00:02:07,382 --> 00:02:11,484 but these are very... 
a different type of standard 47 00:02:11,484 --> 00:02:13,011 to what goes into Wikimedia Commons. 48 00:02:13,011 --> 00:02:16,431 And the way that we have described Wikidata to GLAM professionals, 49 00:02:16,431 --> 00:02:18,231 and especially the C levels, 50 00:02:18,231 --> 00:02:22,061 is that what if we could have a repository that has a notability bar 51 00:02:22,061 --> 00:02:24,381 that is not as high as Wikipedia. 52 00:02:24,381 --> 00:02:26,001 So, we want all these paintings, 53 00:02:26,001 --> 00:02:28,161 but not every painting necessarily needs an article. 54 00:02:28,581 --> 00:02:30,241 Wikipedia is held back by the fact 55 00:02:30,241 --> 00:02:33,082 that you need to have language editions of Wikipedia. 56 00:02:33,171 --> 00:02:36,681 So, can we store the famous thing-- things, not strings. 57 00:02:36,681 --> 00:02:40,570 Can we be object oriented and not really lexical oriented? 58 00:02:40,570 --> 00:02:42,181 And can we store this in a database 59 00:02:42,181 --> 00:02:44,540 that stores facts, figures, and relationships? 60 00:02:44,540 --> 00:02:46,291 And that's pretty much what Wikidata does. 61 00:02:46,711 --> 00:02:50,736 And Wikidata is also a universal kind of crosswalk database to links 62 00:02:50,736 --> 00:02:52,321 to other collections out there. 63 00:02:52,321 --> 00:02:55,119 So, we think this really resonates with folks when you're talking about 64 00:02:55,119 --> 00:02:58,596 what is the value of Wikidata compared to what they're normally familiar with, 65 00:02:58,596 --> 00:03:00,326 which is just Wikipedia. 66 00:03:01,346 --> 00:03:02,876 Alright, so what are the benefits? 67 00:03:02,876 --> 00:03:05,086 You're interlinking your collections with others. 68 00:03:05,086 --> 00:03:07,676 So, unfortunately, I apologize to librarians here, 69 00:03:07,676 --> 00:03:09,337 I'll be talking mostly about museums, 70 00:03:09,337 --> 00:03:11,816 but a lot of this also is valid also for libraries. 
71 00:03:11,816 --> 00:03:15,867 But you're basically connecting your collection with the global collection 72 00:03:15,867 --> 00:03:18,166 of linked open data collections. 73 00:03:18,846 --> 00:03:22,276 You can also receive enriched and improved metadata back 74 00:03:22,276 --> 00:03:25,656 after contributing and linking your collections to the world. 75 00:03:25,656 --> 00:03:28,436 And there are some pretty neat interactive multimedia applications 76 00:03:28,436 --> 00:03:30,596 that you get-- I don't want to say for free, 77 00:03:30,596 --> 00:03:33,596 but your collection in Wikidata allows you to visualize things 78 00:03:33,596 --> 00:03:35,276 that you've never seen before. 79 00:03:35,276 --> 00:03:36,776 We'll show you some examples. 80 00:03:36,776 --> 00:03:39,737 And so, how do you convey this to GLAM professionals effectively? 81 00:03:39,737 --> 00:03:41,746 Well, I usually like to start with storytelling, 82 00:03:41,746 --> 00:03:43,536 and not technical explanations. 83 00:03:43,536 --> 00:03:46,368 Okay, so if everyone here has a cell phone, 84 00:03:46,368 --> 00:03:49,574 especially if you have an iPhone, I want you to scan this QR code 85 00:03:49,574 --> 00:03:51,645 and bring up the URL that it comes up with. 86 00:03:51,645 --> 00:03:53,393 Or if you don't have a QR scanner, 87 00:03:53,393 --> 00:03:58,963 just type in w.wiki/Aij in a web browser. 88 00:04:00,036 --> 00:04:01,942 So go ahead and scan that. 89 00:04:03,280 --> 00:04:04,864 And what comes up? 90 00:04:06,778 --> 00:04:09,458 Does anyone see a knowledge graph pop up on your screen? 91 00:04:09,516 --> 00:04:11,156 So, for folks here in WikidataCon, 92 00:04:11,156 --> 00:04:13,266 this is probably not revolutionary for you. 93 00:04:13,266 --> 00:04:16,386 But what it does, it does a SPARQL query with these objects, 94 00:04:16,386 --> 00:04:18,836 and it shows the linkages between them. 95 00:04:18,836 --> 00:04:20,897 And you can actually drag them around the screen. 
96 00:04:20,897 --> 00:04:22,204 You can actually click on nodes. 97 00:04:22,204 --> 00:04:24,458 If you're [inaudible] in a mobile, it will expand that-- 98 00:04:24,458 --> 00:04:27,554 you can actually start to surf through Wikidata this way. 99 00:04:27,554 --> 00:04:29,741 So, for Wikidata veterans this is pretty cool. 100 00:04:29,741 --> 00:04:31,206 One shot, you get this. 101 00:04:31,206 --> 00:04:33,313 For a lot of folks who have never seen Wikidata before, 102 00:04:33,313 --> 00:04:35,574 this is a revolutionary moment for them. 103 00:04:36,176 --> 00:04:39,236 To actually hand-manipulate a knowledge graph, 104 00:04:39,236 --> 00:04:42,186 and to start surfing through Wikidata without having to know SPARQL, 105 00:04:42,186 --> 00:04:43,823 without having to know what a Q item is, 106 00:04:43,823 --> 00:04:45,860 without having to know what a property proposal is, 107 00:04:45,860 --> 00:04:48,623 they can suddenly start seeing connections in a way that is magical. 108 00:04:48,623 --> 00:04:50,264 Hey, I see [Jacob's] here. 109 00:04:50,264 --> 00:04:52,143 Jacob's been using some of this code, as well. 110 00:04:52,143 --> 00:04:54,443 So, this is some code that we'll talk about later on 111 00:04:54,443 --> 00:04:57,254 that allows you to create these visualizations in Wikidata. 112 00:04:57,254 --> 00:04:59,283 And we've really seen this turn a lot of heads 113 00:04:59,283 --> 00:05:01,408 who have really never gotten Wikidata before. 114 00:05:01,408 --> 00:05:04,653 But after seeing these interactive knowledge graphs, they get it. 115 00:05:04,653 --> 00:05:06,233 They understand the power of this. 
116 00:05:06,233 --> 00:05:08,293 And especially this example here, 117 00:05:08,293 --> 00:05:11,304 this was a really big eye-opener for the folks at the Met, 118 00:05:11,304 --> 00:05:14,545 because this is the artifact that is the center of this graph, 119 00:05:14,545 --> 00:05:17,823 right there, the Portrait of Madame X, a very famous portrait. 120 00:05:17,823 --> 00:05:20,982 And they did not even know that this was the inspiration 121 00:05:20,982 --> 00:05:24,693 for the black dress that Rita Hayworth wore in the movie Gilda. 122 00:05:24,693 --> 00:05:26,783 So, just by seeing this graph, they said, 123 00:05:26,783 --> 00:05:29,353 "Wait a minute. This is one of our most visited portraits. 124 00:05:29,353 --> 00:05:31,683 I didn't know that this was true." 125 00:05:31,683 --> 00:05:35,214 And there's actually two other books published about that painting. 126 00:05:35,214 --> 00:05:38,983 You can see all these things, not just within the realm of GLAM, 127 00:05:38,983 --> 00:05:41,441 but it extends to fashion, it extends to literature. 128 00:05:41,441 --> 00:05:43,381 You're starting to see the global connections 129 00:05:43,381 --> 00:05:47,481 that your artworks have, or your collections have via Wikidata. 130 00:05:48,722 --> 00:05:50,342 So, how do we do this? 131 00:05:50,842 --> 00:05:53,098 If you can remember nothing else from this presentation, 132 00:05:53,098 --> 00:05:56,432 this one page is your one-stop shopping. 133 00:05:56,432 --> 00:05:58,592 Now, fortunately, you don't have to memorize all this. 134 00:05:58,592 --> 00:06:03,292 It's actually right here at Wikidata:Linked_open_data_workflow. 135 00:06:03,560 --> 00:06:06,170 So, we'll be talking about some of these different phases 136 00:06:06,170 --> 00:06:10,670 of how you first prepare, reconcile, and examine 137 00:06:11,160 --> 00:06:14,190 what the GLAM organization might have and what does Wikidata have. 
138 00:06:14,190 --> 00:06:15,374 And then, what are the tools 139 00:06:15,374 --> 00:06:18,664 to actually ingest and correct or enrich that 140 00:06:18,664 --> 00:06:20,241 once it's in Wikidata. 141 00:06:20,241 --> 00:06:22,691 And then, what are some of the ways to reuse that content, 142 00:06:22,691 --> 00:06:25,161 or to report and create new things out of it. 143 00:06:25,161 --> 00:06:31,191 So, this is the simpler version of a chart that Sandra and the GLAM folks 144 00:06:31,191 --> 00:06:33,111 at the foundation have created. 145 00:06:33,111 --> 00:06:35,534 But this is trying to sum up, in one shot-- 146 00:06:35,534 --> 00:06:38,133 because we know how hard things are to find in Wikidata-- 147 00:06:38,133 --> 00:06:41,733 to find in one shot all the different tools you should pay attention to 148 00:06:41,733 --> 00:06:43,475 as a GLAM organization. 149 00:06:44,969 --> 00:06:50,606 So, just using the Met as an example, we started with what is the ideal object 150 00:06:50,606 --> 00:06:53,398 that we have in Wikidata that comes from the Met? 151 00:06:53,398 --> 00:06:55,882 This is a typical shot of a Wikidata item, 152 00:06:55,882 --> 00:06:57,385 in the mobile mode there. 153 00:06:57,385 --> 00:06:59,244 And this is one of the more famous paintings 154 00:06:59,244 --> 00:07:00,729 we used as a model, here. 155 00:07:00,729 --> 00:07:03,315 We have the label, description, and aliases. 156 00:07:03,915 --> 00:07:05,225 And then, we found out, 157 00:07:05,225 --> 00:07:07,035 "What are the core statements that we wanted?" 158 00:07:07,035 --> 00:07:10,035 We wanted instance of, image, inception, collection. 159 00:07:10,035 --> 00:07:13,239 And what are some other properties we would like if we had it? 160 00:07:13,239 --> 00:07:15,960 Depiction information, material used, things like that. 161 00:07:16,879 --> 00:07:19,369 We actually do have an identifier. 162 00:07:19,369 --> 00:07:22,199 The Met object ID is P3634. 
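[Editor's sketch] The "core statements" checklist described above can be expressed as a small completeness check. This is only an illustration, not the speakers' code: P3634 is quoted in the talk, the other property IDs are the usual Wikidata IDs for the properties named, and `sample_claims` is a hypothetical record, not a real Met item.

```python
# Core statements named in the talk, keyed by Wikidata property ID.
CORE = {
    "P31": "instance of",
    "P18": "image",
    "P571": "inception",
    "P195": "collection",
}
# "Would like if we had it" properties, plus the identifier quoted in the talk.
NICE_TO_HAVE = {
    "P180": "depicts",
    "P186": "made from material",
    "P3634": "The Met object ID",
}


def missing_properties(claims, wanted):
    """Return the labels of wanted properties that have no value in `claims`."""
    return [label for pid, label in wanted.items() if not claims.get(pid)]


# Hypothetical item: only some statements are filled in.
sample_claims = {
    "P31": ["Q3305213"],   # painting
    "P195": ["Q160236"],   # Metropolitan Museum of Art
    "P3634": ["000000"],   # placeholder object ID
}

print(missing_properties(sample_claims, CORE))
print(missing_properties(sample_claims, NICE_TO_HAVE))
```

A report like this could be run over every item returned by a collection query to decide which records still need enrichment.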
163 00:07:22,199 --> 00:07:24,629 So, for some organizations, you might want to propose 164 00:07:24,629 --> 00:07:28,529 a property just to track your items using an object ID. 165 00:07:29,369 --> 00:07:31,899 And then, for the Met, just trying to circumscribe 166 00:07:31,899 --> 00:07:35,519 what objects do we want to upload and keep in Wikidata-- 167 00:07:35,519 --> 00:07:38,927 the thing that we first identified were collection highlights. 168 00:07:38,927 --> 00:07:43,649 These are like a hand-selected set of 1,500 to 2,000 items 169 00:07:43,678 --> 00:07:48,878 that were going to be given priority to upload to Wikidata. 170 00:07:48,939 --> 00:07:51,709 So, Richard and the crew out of Wikimedia in New York 171 00:07:51,709 --> 00:07:53,105 did a lot of this early work. 172 00:07:53,105 --> 00:07:55,571 And then, now, we're systematically going through to make sure 173 00:07:55,571 --> 00:07:56,689 they're all complete. 174 00:07:56,689 --> 00:07:58,221 And there's a secondary set 175 00:07:58,221 --> 00:08:01,390 called the Heilbrunn Timeline of Art History-- about 8,000 items 176 00:08:01,390 --> 00:08:07,149 that are seminal pieces of work, artists' works throughout history. 177 00:08:07,149 --> 00:08:09,499 And there are about 8,000 that the Met has identified, 178 00:08:09,499 --> 00:08:11,812 and we're also putting that on Wikidata, as well, 179 00:08:11,812 --> 00:08:13,143 using a different designation. 180 00:08:13,143 --> 00:08:16,271 Here, described by source-- Heilbrunn Timeline of Art History. 181 00:08:16,271 --> 00:08:19,841 So, the collection highlight is denoted here as collection-- 182 00:08:19,841 --> 00:08:21,265 Metropolitan Museum of Art, 183 00:08:21,265 --> 00:08:22,976 subject has role collection highlight. 184 00:08:22,976 --> 00:08:26,872 And then, these 8,000 or so are like that in Wikidata. 185 00:08:29,741 --> 00:08:33,816 I couldn't show this chart at Wikimania, because it's too complicated. 
186 00:08:33,816 --> 00:08:35,389 But at WikidataCon, we can. 187 00:08:35,389 --> 00:08:38,845 So, this is something that is really hard to answer sometimes. 188 00:08:39,490 --> 00:08:42,169 What makes something in Wikidata from the Met, 189 00:08:42,169 --> 00:08:44,658 or from the New York Public Library, or from your organization? 190 00:08:44,658 --> 00:08:47,609 And the answer is not easy. It's: it depends. 191 00:08:47,644 --> 00:08:49,684 It's complicated, it can be multi-factor. 192 00:08:49,684 --> 00:08:53,254 So, you could say, "Well, if I had an object ID in Wikidata, 193 00:08:53,254 --> 00:08:54,804 that is a Met object." 194 00:08:54,804 --> 00:08:56,674 But maybe someone didn't enter that. 195 00:08:56,674 --> 00:08:59,924 Maybe they only put in Collection: Met which is P195, 196 00:08:59,924 --> 00:09:02,684 or they put in the accession number, 197 00:09:02,684 --> 00:09:06,984 and they put collection as the qualifier to that accession number. 198 00:09:06,984 --> 00:09:11,454 So, there's actually, one, two, three different ways to try to find Met objects. 199 00:09:11,454 --> 00:09:14,214 And probably the best way to do it is through a union like this. 200 00:09:14,214 --> 00:09:16,173 So, you combine all three, and you come back, 201 00:09:16,173 --> 00:09:18,064 and you make a list out of it. 202 00:09:18,064 --> 00:09:20,813 So unfortunately, there is no one clean query 203 00:09:20,813 --> 00:09:23,684 that'll guarantee you all the Met objects. 204 00:09:23,684 --> 00:09:27,873 This is probably the best approach for this. 205 00:09:27,873 --> 00:09:29,384 And for some institutions, 206 00:09:29,384 --> 00:09:32,505 they're probably doing something similar to that right now. 207 00:09:32,505 --> 00:09:35,824 Alright, so example here, is that what you see here 208 00:09:35,824 --> 00:09:39,684 manifests itself differently-- not differently, but as this in a query, 209 00:09:39,684 --> 00:09:40,904 which can get pretty complex. 
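[Editor's sketch] The three-way union described above might look roughly like the query built below. The slide's actual query is not reproduced in the transcript, so this is an assumed reconstruction: P3634 and P195 come from the talk, P217 (inventory number) is assumed to be the accession-number property, Q160236 is assumed to be the item for the Metropolitan Museum of Art, and `build_met_union_query` is a hypothetical helper name.

```python
MET_QID = "Q160236"  # assumed: Metropolitan Museum of Art


def build_met_union_query(museum_qid=MET_QID, object_id_prop="P3634"):
    """Build a SPARQL query that unions three ways of spotting a museum's objects."""
    return f"""
SELECT DISTINCT ?item WHERE {{
  {{ ?item wdt:{object_id_prop} ?objectId . }}
  UNION
  {{ ?item wdt:P195 wd:{museum_qid} . }}
  UNION
  {{ ?item p:P217 ?inventory .
     ?inventory pq:P195 wd:{museum_qid} . }}
}}
""".strip()


query = build_met_union_query()
print(query)
```

The first branch finds items carrying the object ID, the second finds items whose collection is the museum, and the third finds items whose accession-number statement has the museum as a collection qualifier. The resulting string could be pasted into the Wikidata Query Service.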
210 00:09:40,904 --> 00:09:43,063 So, if we're looking for all the collection highlights, 211 00:09:43,063 --> 00:09:47,713 we'd break this out into the statement and then the qualifier as this: 212 00:09:47,782 --> 00:09:49,712 subject has role collection highlight. 213 00:09:49,712 --> 00:09:51,450 So, that's one way that we sort out 214 00:09:51,450 --> 00:09:54,124 some of these special designations in Wikidata. 215 00:09:55,166 --> 00:09:58,716 So, the summary is, representing "The Met" is multifaceted, 216 00:09:58,716 --> 00:10:01,536 and needs to balance simplicity and findability. 217 00:10:01,536 --> 00:10:04,896 How many people here have heard of Sum of All Paintings as a project? 218 00:10:04,995 --> 00:10:07,088 Ooh, God, good, a lot of you! 219 00:10:07,088 --> 00:10:09,105 So, it's probably one of the most active ones 220 00:10:09,105 --> 00:10:10,525 that deals with these issues. 221 00:10:10,525 --> 00:10:17,057 So, we always debate whether we should model things super-accurately, 222 00:10:17,057 --> 00:10:19,815 or should you model things so that they're findable. 223 00:10:19,815 --> 00:10:21,997 These are kind of at odds with each other. 224 00:10:21,997 --> 00:10:24,232 So, we usually prefer findability. 225 00:10:24,232 --> 00:10:27,001 It's no good if it's perfectly modeled, but no one can ever find it, 226 00:10:27,001 --> 00:10:30,013 because it's so strict in terms of how it's defined at Wikidata. 227 00:10:30,013 --> 00:10:31,882 And then, we have some challenges. 228 00:10:31,882 --> 00:10:35,367 Multiple artifacts might be tied to one object ID, 229 00:10:35,367 --> 00:10:37,396 which might be different in Wikidata. 230 00:10:37,396 --> 00:10:42,097 And then, mapping the Met classification to instances has some complex cases. 231 00:10:42,097 --> 00:10:44,282 So, the way that the Met classifies things 232 00:10:44,282 --> 00:10:46,775 doesn't always fit with how Wikidata classifies things. 
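[Editor's sketch] The statement-plus-qualifier pattern mentioned earlier for collection highlights (collection: the Met, qualified with "subject has role: collection highlight") could be queried along these lines. This is an assumption-laden illustration: P2868 is taken to be "subject has role", Q160236 is assumed to be the Met, and the QID for "collection highlight" is deliberately left as a caller-supplied placeholder because the transcript does not state it.

```python
def build_highlight_query(highlight_qid, museum_qid="Q160236"):
    """Build a SPARQL query for items whose collection statement carries a role qualifier."""
    return (
        "SELECT ?item WHERE {\n"
        "  ?item p:P195 ?stmt .\n"
        f"  ?stmt ps:P195 wd:{museum_qid} ;\n"
        f"        pq:P2868 wd:{highlight_qid} .\n"
        "}"
    )


# "Q_HIGHLIGHT" is a placeholder, not a real Wikidata ID.
highlight_query = build_highlight_query("Q_HIGHLIGHT")
print(highlight_query)
```

Using `p:`/`ps:`/`pq:` instead of the `wdt:` shortcut is what lets the query reach the qualifier on the statement.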
233 00:10:46,775 --> 00:10:49,982 So, we show you some examples here of how this works. 234 00:10:49,982 --> 00:10:53,602 So, this is a great example of using a Python library 235 00:10:53,602 --> 00:10:56,487 to actually ingest what we know from the Met, 236 00:10:56,487 --> 00:10:58,313 and then try to sort out what they have. 237 00:10:58,313 --> 00:10:59,887 So, this is just for textiles. 238 00:10:59,887 --> 00:11:02,076 You can see that they got a lot of detail here 239 00:11:02,076 --> 00:11:05,399 in terms of woven textiles, laces, printed, trimmings, velvets. 240 00:11:05,399 --> 00:11:07,907 We first looked into this in Wikidata. 241 00:11:07,907 --> 00:11:10,175 We did not have this level of detail in Wikidata. 242 00:11:10,175 --> 00:11:12,207 We still don't have all this resolved. 243 00:11:12,207 --> 00:11:14,764 You can see that this is really complex here. 244 00:11:14,764 --> 00:11:18,012 Anonymous is just not anonymous for a lot of databases. 245 00:11:18,012 --> 00:11:20,126 There's a lot of qualifications-- 246 00:11:20,126 --> 00:11:23,045 whether the nationality, or the century. 247 00:11:23,045 --> 00:11:26,282 So, trying to map all this to Wikidata can be complex, as well. 248 00:11:26,282 --> 00:11:30,450 And then, this shows you that of all the works in the Met, 249 00:11:30,450 --> 00:11:33,976 about 46% are open access right now. 250 00:11:33,976 --> 00:11:38,694 So, we still have about just over 50% that are not CC0 yet. 251 00:11:40,134 --> 00:11:43,444 (man) All the objects in the Met, or all objects on display? 252 00:11:43,444 --> 00:11:45,957 (Andrew) It's weird. It's not on display. 253 00:11:45,957 --> 00:11:47,866 But it's not all objects either. 254 00:11:47,866 --> 00:11:52,176 It's about 400 to 500 thousand objects in their database at this point. 255 00:11:52,176 --> 00:11:53,840 So, somewhere in between. 256 00:11:55,380 --> 00:11:57,609 So, starting points. This is always a hard one. 
257 00:11:57,609 --> 00:12:03,514 We just had this discussion on the Facebook group recently 258 00:12:03,514 --> 00:12:04,923 about where do people go 259 00:12:04,923 --> 00:12:07,887 to find out what the modeling should look like for a certain thing. 260 00:12:07,887 --> 00:12:09,271 It's not easy. 261 00:12:09,271 --> 00:12:12,115 So, normally, what we have to do is just point people to, 262 00:12:12,115 --> 00:12:15,281 I don't know, some project that does it well now? 263 00:12:15,281 --> 00:12:17,230 So, it's not a satisfying answer, 264 00:12:17,230 --> 00:12:19,910 but we usually tell folks to start at things like visual arts, 265 00:12:19,910 --> 00:12:22,308 or Sum of All Paintings does it pretty well, 266 00:12:22,308 --> 00:12:25,569 or just go to the project chat to find out where some of these things are. 267 00:12:25,569 --> 00:12:27,444 We need better solutions for this. 268 00:12:27,444 --> 00:12:30,939 This is just a basic flow of what we're doing with the Met here. 269 00:12:30,939 --> 00:12:33,119 We're basically taking their CSV, and their API, 270 00:12:33,119 --> 00:12:35,979 and we're consuming it into a Python data frame. 271 00:12:35,979 --> 00:12:38,159 We're taking the SPARQL code-- 272 00:12:38,159 --> 00:12:40,499 the one that you saw before, this super union-- 273 00:12:40,499 --> 00:12:43,779 bring that in, and we're doing a bi-directional diff, 274 00:12:43,779 --> 00:12:45,999 and then seeing what new things have been added here, 275 00:12:45,999 --> 00:12:47,729 what things have been subtracted there, 276 00:12:47,729 --> 00:12:51,529 and we're actually making those changes either through QuickStatements, 277 00:12:51,529 --> 00:12:53,439 or we're doing it through Pywikibot. 278 00:12:53,439 --> 00:12:55,512 So, directly editing Wikidata. 279 00:12:56,204 --> 00:12:59,405 So, this is the big slide I also couldn't show at Wikimania, 280 00:12:59,405 --> 00:13:01,485 because it would have flummoxed everyone. 
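[Editor's sketch] The bi-directional diff in the flow described a moment ago (museum export on one side, SPARQL results on the other) can be illustrated with plain sets. The talk mentions a Python data frame; this stdlib-only version swaps that for sets to stay self-contained, and the function name and the sample IDs are hypothetical.

```python
def bidirectional_diff(museum_ids, wikidata_ids):
    """Compare object IDs from the museum export with those already in Wikidata."""
    museum, wikidata = set(museum_ids), set(wikidata_ids)
    return {
        # In the museum export but not yet in Wikidata: candidates to add.
        "missing_from_wikidata": sorted(museum - wikidata),
        # In Wikidata but absent from the export: candidates to review.
        "not_in_museum_export": sorted(wikidata - museum),
        # Present on both sides: candidates for field-by-field comparison.
        "in_both": sorted(museum & wikidata),
    }


# Made-up object IDs for illustration only.
diff = bidirectional_diff(["1001", "1002", "1003"], ["1002", "1003", "2001"])
print(diff)
```

Each bucket would then feed a different action, for example generating QuickStatements for the first bucket and flagging the second for a human to check.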
281 00:13:01,485 --> 00:13:04,924 So, this is a great example of how we start with the Met database, 282 00:13:04,924 --> 00:13:06,824 we have this crosswalk database, 283 00:13:06,824 --> 00:13:09,209 and then we generate the changes in Wikidata. 284 00:13:09,209 --> 00:13:12,644 The way this works is this is an example of one record from the Met. 285 00:13:12,644 --> 00:13:15,744 This is an evening dress-- we're working with the Costume Institute recently, 286 00:13:15,744 --> 00:13:17,518 the one that puts on the Met Gala. 287 00:13:17,518 --> 00:13:20,442 So, we have one evening dress here, by Valentina. 288 00:13:20,442 --> 00:13:22,100 Here's a date, accession number. 289 00:13:22,100 --> 00:13:25,105 So, these things can be put into Wikidata directly. 290 00:13:25,105 --> 00:13:27,744 A field equals the date, accession number. 291 00:13:27,744 --> 00:13:29,404 But what do we do with things like this? 292 00:13:29,404 --> 00:13:33,868 This is an object name, which is basically like a classification of what it is, 293 00:13:33,868 --> 00:13:35,648 like an instance of for the Met. 294 00:13:35,648 --> 00:13:37,396 And the designer's Valentina. 295 00:13:37,396 --> 00:13:41,571 So, what we do is we take these and we run all the unique object names 296 00:13:41,571 --> 00:13:43,801 and all the unique designers through OpenRefine. 297 00:13:43,801 --> 00:13:46,720 So, we get maybe 60% matches if we're lucky. 298 00:13:46,720 --> 00:13:48,418 We put that into a spreadsheet. 299 00:13:48,418 --> 00:13:53,178 Then we ask volunteers or the curators at the Met 300 00:13:53,178 --> 00:13:55,333 to help fill in this crosswalk database. 301 00:13:55,333 --> 00:13:57,312 This is just simply Google Sheets. 302 00:13:57,312 --> 00:13:59,911 So, we say, here are all the object names, the unique object names 303 00:13:59,911 --> 00:14:02,731 that match lexically exactly with what's in the Met database, 304 00:14:02,731 --> 00:14:05,912 and then you say this maps to this Q ID. 
305 00:14:05,912 --> 00:14:08,556 So, we first started this maybe like only about-- 306 00:14:08,556 --> 00:14:11,233 well, 60% were failed, some of these were blank. 307 00:14:11,233 --> 00:14:13,751 So, we tap folks in specific groups. 308 00:14:13,751 --> 00:14:17,316 So there's like a Wiki Loves Fashion little chat group that we have. 309 00:14:17,316 --> 00:14:20,304 And folks like user PKM were super useful in this area. 310 00:14:20,304 --> 00:14:22,794 So she spent a lot of time looking through this, and saying, 311 00:14:22,794 --> 00:14:24,764 "Okay, Evening suit is this, Ewer is that." 312 00:14:24,764 --> 00:14:27,759 So, we looked through and made all these mappings here. 313 00:14:27,759 --> 00:14:30,719 And then, what happens is now, when we see this in the Met database, 314 00:14:30,719 --> 00:14:33,201 we look it up in the crosswalk database, and we say, "Oh, yeah. 315 00:14:33,201 --> 00:14:36,169 These are the two Q numbers we need to put into Wikidata." 316 00:14:36,169 --> 00:14:39,089 And then, it generates the QuickStatement right there. 317 00:14:39,089 --> 00:14:41,328 Same thing here with Designer: Valentina. 318 00:14:41,328 --> 00:14:44,138 If Valentina matches here, then it gets generated 319 00:14:44,138 --> 00:14:45,838 with that QuickStatement right there. 320 00:14:45,838 --> 00:14:48,069 If Valentina does not exist, then we'll create it. 321 00:14:48,069 --> 00:14:51,288 You can see here, Weeks-- look at that high Q ID right there. 322 00:14:51,288 --> 00:14:53,918 We just created that recently, because there was no entry before. 323 00:14:53,918 --> 00:14:55,358 Does that make sense to everyone? 324 00:14:55,358 --> 00:14:57,727 - (man 2) What's the extra statement? - (Andrew) I'm sorry? 325 00:14:57,727 --> 00:15:00,610 - (man 2) What's the extra statement? - (Andrew) Oh, the extra statement. 
326 00:15:00,610 --> 00:15:03,131 So, believe it or not, we have an Evening blouse, Evening dress, 327 00:15:03,131 --> 00:15:05,010 Evening pants, Evening ensemble, Evening hat-- 328 00:15:05,010 --> 00:15:08,650 do we want to make a new Wikidata item for Evening pants, Evening everything? 329 00:15:08,650 --> 00:15:10,444 So, we said, "No." We probably don't want to. 330 00:15:10,444 --> 00:15:13,859 We'll just say, "It's a dress, but it's also evening wear", 331 00:15:13,859 --> 00:15:15,117 which is what that is. 332 00:15:15,117 --> 00:15:17,301 So, we're saying an instance of both things. 333 00:15:17,931 --> 00:15:21,398 I'm not sure it's the perfect solution, but it's a solution at this point. 334 00:15:21,744 --> 00:15:22,944 So, does everyone get that? 335 00:15:22,944 --> 00:15:25,564 So, this is kind of a crosswalk database that we maintain here. 336 00:15:25,564 --> 00:15:28,025 And the nice thing about it, it's just Google Sheets. 337 00:15:28,025 --> 00:15:29,264 So, we can get people to help 338 00:15:29,264 --> 00:15:31,375 that don't need to know anything about this database, 339 00:15:31,375 --> 00:15:34,384 don't need to know about QuickStatements, don't need to know about queries. 340 00:15:34,384 --> 00:15:36,226 They just go in and fill in the Q number. 341 00:15:36,226 --> 00:15:37,244 Yeah. 342 00:15:37,244 --> 00:15:40,902 (woman) So, when you copy object name and you find the Q ID, 343 00:15:40,902 --> 00:15:43,145 the initial 60% that you mentioned as an example, 344 00:15:43,145 --> 00:15:45,223 is that by exact match? 345 00:15:46,483 --> 00:15:48,103 (Andrew) Well, it's through OpenRefine. 346 00:15:48,103 --> 00:15:52,014 So, it does its best guess, and then we verify to make sure 347 00:15:52,014 --> 00:15:54,444 that the OpenRefine match makes sense. 348 00:15:54,444 --> 00:15:56,114 Yeah. 349 00:15:56,203 --> 00:15:57,794 Does that make sense to everyone? 
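[Editor's sketch] The crosswalk lookup described above, where one Met object name such as "Evening dress" maps to two Q numbers that both become "instance of" statements, could be sketched as follows. This is not the speakers' program. The QIDs are placeholders (`Q_DRESS`, `Q_EVENING_WEAR`, `Q_ITEM`), the helper name is hypothetical, and the output assumes the tab-separated QuickStatements V1 layout of item, property, value.

```python
# Hypothetical crosswalk: Met object name -> list of Wikidata QIDs (placeholders).
CROSSWALK = {
    "Evening dress": ["Q_DRESS", "Q_EVENING_WEAR"],
    "Ewer": ["Q_EWER"],
}


def instance_of_statements(item_qid, object_name, crosswalk):
    """Return one tab-separated 'instance of' (P31) line per mapped QID."""
    qids = crosswalk.get(object_name, [])
    # An unmapped name yields no lines, so it can be sent back for manual matching.
    return [f"{item_qid}\tP31\t{qid}" for qid in qids]


lines = instance_of_statements("Q_ITEM", "Evening dress", CROSSWALK)
print("\n".join(lines))
```

Keeping the mapping as plain data is what allows volunteers to maintain it in a spreadsheet without touching the code that emits the statements.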
350 00:15:57,794 --> 00:16:00,304 So, some folks might be doing some variation on this, 351 00:16:00,304 --> 00:16:03,403 but I think the nice thing about this is that, by using Google Sheets, 352 00:16:03,403 --> 00:16:08,234 we remove a lot of the complexities of these two areas from this. 353 00:16:08,234 --> 00:16:11,193 And we'll show you some code that does this later on. 354 00:16:11,813 --> 00:16:15,273 - (man 3) How do you generate [inaudible]? - (Andrew) How do you generate this? 355 00:16:15,273 --> 00:16:17,272 - (man 3) Yes. - (Andrew) Python code. 356 00:16:17,272 --> 00:16:19,134 I'll show you a line that does this. 357 00:16:19,134 --> 00:16:21,136 But you can also go up here. 358 00:16:21,136 --> 00:16:25,096 This is the whole Python program that does this, this, and that, 359 00:16:25,096 --> 00:16:27,296 if you want to take a look at that. 360 00:16:28,026 --> 00:16:29,026 Yes. 361 00:16:29,026 --> 00:16:31,207 (man 4) Did you really use your own vocabulary, 362 00:16:31,207 --> 00:16:35,426 or is there something [inaudible]. 363 00:16:35,426 --> 00:16:37,246 - (Andrew) This right here? - (man 4) Yeah. 364 00:16:37,246 --> 00:16:39,721 (Andrew) Yeah. So, this is the Met's own vocabulary. 365 00:16:39,721 --> 00:16:43,031 So, most museums use a system called TMS. 366 00:16:43,031 --> 00:16:44,891 It's like their own management system. 367 00:16:44,891 --> 00:16:47,654 So, they'll usually-- this is the museum world-- 368 00:16:47,654 --> 00:16:50,771 they'll usually roll their own vocabulary for their own needs. 369 00:16:50,771 --> 00:16:54,022 Museums are very late to interoperable metadata. 370 00:16:54,022 --> 00:16:57,282 Librarians and archivists have this kind of as baked into them. 371 00:16:57,282 --> 00:16:58,664 Museums are like, "Meh..." 372 00:16:58,664 --> 00:17:01,471 Our primary goal is to put objects on display, 373 00:17:01,471 --> 00:17:04,141 and if it plays well with other people, that's a side benefit. 
374 00:17:04,141 --> 00:17:05,931 But it's not a primary thing that they do. 375 00:17:05,931 --> 00:17:08,031 So, that's why it's complicated to work with museums. 376 00:17:08,031 --> 00:17:11,161 You need to map their vocabulary, which might be a mish-mash 377 00:17:11,161 --> 00:17:14,576 of famous vocabularies, like Getty AAT, and other things. 378 00:17:14,576 --> 00:17:17,911 But usually, it's to serve their exact needs at their museum. 379 00:17:17,911 --> 00:17:19,591 And that's what's challenging. 380 00:17:19,591 --> 00:17:21,091 And I see a lot of heads nodding, 381 00:17:21,091 --> 00:17:23,161 so you've probably seen this a lot at these museums. 382 00:17:23,161 --> 00:17:25,429 So, I'll move on to show you how this actually is done. 383 00:17:25,429 --> 00:17:26,749 Oh, go ahead. 384 00:17:26,749 --> 00:17:28,711 (man 5) How do you bring people, to collaborate, 385 00:17:28,711 --> 00:17:31,595 and put some Q codes into your database? 386 00:17:31,595 --> 00:17:32,971 (Andrew) How do you-- I'm sorry? 387 00:17:32,971 --> 00:17:35,038 (man 5) How do you bring... collaborate people? 388 00:17:35,038 --> 00:17:38,290 (Andrew) Ah, so for this, these are projects we just go to, 389 00:17:38,780 --> 00:17:41,750 for better or for worse, like Facebook chat groups that we know, 390 00:17:41,750 --> 00:17:43,007 are active in these areas. 391 00:17:43,007 --> 00:17:45,685 Like Sum of All Paintings, Wiki Loves Fashion-- 392 00:17:45,685 --> 00:17:47,918 which is a group of maybe five or seven folks. 393 00:17:48,548 --> 00:17:50,759 But we need a better way to get this out to folks 394 00:17:50,759 --> 00:17:52,339 so we get more collaborators on this. 395 00:17:52,339 --> 00:17:53,879 This doesn't scale well, right now. 396 00:17:53,879 --> 00:17:56,089 But for small groups, it works pretty well. 397 00:17:56,108 --> 00:17:57,568 I'm open to ideas. 398 00:17:57,568 --> 00:17:59,619 (man 5) [inaudible] 399 00:17:59,619 --> 00:18:01,669 (Andrew) Oh yeah. 
Please come on up. 400 00:18:01,669 --> 00:18:02,948 If folks want to come up here, 401 00:18:02,948 --> 00:18:05,357 there's a little more room in the aisle right here. 402 00:18:06,057 --> 00:18:09,629 So, we are utilizing Python for this mostly. 403 00:18:09,774 --> 00:18:13,354 If you don't know, there is a Python notebook system 404 00:18:13,354 --> 00:18:14,884 that WMFLabs has. 405 00:18:14,884 --> 00:18:17,345 So, you can actually go on and start playing with this. 406 00:18:17,345 --> 00:18:19,624 So, it's pretty easy to generate a lot of stuff 407 00:18:19,624 --> 00:18:21,401 if you know some of the code that's there. 408 00:18:21,401 --> 00:18:22,455 [inaudible], yeah. 409 00:18:22,485 --> 00:18:23,922 (woman 2) Why do you put everything 410 00:18:23,922 --> 00:18:27,821 into Wikidata, and not into your own Wikibase? 411 00:18:29,401 --> 00:18:31,127 (Andrew) If you're using your own Wikibase? 412 00:18:31,127 --> 00:18:33,741 (woman 2) Yeah. Why don't you use your own Wikibase? 413 00:18:33,741 --> 00:18:35,990 and then go to [inaudible] 414 00:18:35,990 --> 00:18:38,390 (Andrew) That's its own ball of-- 415 00:18:38,390 --> 00:18:41,630 I don't want to maintain my own Wikibase at this point. (laughs) 416 00:18:42,190 --> 00:18:44,400 If I can avoid doing the Wikibase maintenance, 417 00:18:44,400 --> 00:18:45,760 I would not do it. 418 00:18:46,530 --> 00:18:48,080 (man 6) Would you like a Wikibase? 419 00:18:48,080 --> 00:18:50,050 (Andrew) We could. It's possible. 420 00:18:50,050 --> 00:18:54,154 (man 7) But again, what they use [inaudible] 421 00:18:54,154 --> 00:18:59,868 about 2,000, 8,000, 10,000, of 400,000 digital [inaudible]. 422 00:18:59,868 --> 00:19:04,300 So that's only 2.5%, 423 00:19:04,300 --> 00:19:08,782 [inaudible] 424 00:19:08,782 --> 00:19:12,601 (Andrew) So, I'd say, solve it for 1,500, then scale up to 150 thousand. 
425 00:19:12,601 --> 00:19:14,428 So, we're trying to solve it 426 00:19:14,428 --> 00:19:16,876 for the best well-known objects, and then-- 427 00:19:16,876 --> 00:19:19,875 (man 7) When do you think that will happen? 428 00:19:20,855 --> 00:19:25,788 I understand that those are people that shouldn't go onto Wikidata. 429 00:19:25,788 --> 00:19:29,856 So you go to Commons or your own Wikibase solution, 430 00:19:29,856 --> 00:19:31,695 not to be a [inaudible]-- 431 00:19:31,695 --> 00:19:34,588 (Andrew) Right. That's why we're going with the 2,000 and 8,000. 432 00:19:34,588 --> 00:19:37,460 We're pretty confident these are highly notable objects 433 00:19:37,460 --> 00:19:39,085 that deserve to be in Wikidata. 434 00:19:39,085 --> 00:19:40,465 Beyond that, it's debatable. 435 00:19:40,465 --> 00:19:44,265 So, that's why we're not vacuuming 400-thousand things at one shot. 436 00:19:44,265 --> 00:19:48,936 We're starting with notable 2,000, notable 8,000, then we'll talk after that. 437 00:19:49,515 --> 00:19:52,775 So, these are the two lines of code that do the most stuff here. 438 00:19:52,775 --> 00:19:54,217 So, even if you don't know Python, 439 00:19:54,217 --> 00:19:56,146 it's actually not that bad if you look at this. 440 00:19:56,146 --> 00:19:58,105 There's a read_csv function. 441 00:19:58,105 --> 00:20:00,015 You're taking the crosswalk URL, 442 00:20:00,015 --> 00:20:02,336 basically, the URL of that Google Spreadsheet. 443 00:20:02,336 --> 00:20:04,875 You're grabbing the spreadsheet that's called "Object Name", 444 00:20:04,875 --> 00:20:06,685 and you're basically creating a data structure 445 00:20:06,685 --> 00:20:08,165 that has the Object Name and the QID. 446 00:20:08,165 --> 00:20:09,645 That's it. That's all you're doing. 447 00:20:09,645 --> 00:20:11,655 Just pulling that in to the Python code. 
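[Editor's sketch] The crosswalk step described here can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual program: the talk mentions a read_csv function, while this version uses Python's standard csv module so it runs without extra packages. The inline CSV stands in for the "Object Name" sheet of the Google Spreadsheet, and the item ID, the sample rows, and the function names load_crosswalk and instance_of_statement are assumptions made for the example.

```python
import csv
import io

# Hypothetical crosswalk rows standing in for the "Object Name" sheet,
# exported as CSV. The real sheet and its values are not reproduced here.
CROSSWALK_CSV = """Object Name,QID
Painting,Q3305213
Print,Q11060274
Photograph,Q125191
"""


def load_crosswalk(csv_text):
    """Return a dict mapping each museum object name to a Wikidata QID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Object Name"].strip(): row["QID"].strip() for row in reader}


def instance_of_statement(item_qid, object_name, crosswalk):
    """Build one tab-separated QuickStatements line, or None if unmapped."""
    target = crosswalk.get(object_name)
    if target is None:
        # Unmapped vocabulary terms are skipped rather than guessed.
        return None
    return f"{item_qid}\tP31\t{target}"


crosswalk = load_crosswalk(CROSSWALK_CSV)
# Q4115189 is used here only as a placeholder item ID.
print(instance_of_statement("Q4115189", "Painting", crosswalk))
```

Returning None for unmapped terms keeps the museum's own vocabulary from leaking into Wikidata until someone has added the term to the crosswalk.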
448 00:20:11,655 --> 00:20:15,914 Then, you're actually matching whatever the entity's name is, 449 00:20:15,914 --> 00:20:17,754 and then looking up the QID. 450 00:20:17,754 --> 00:20:21,689 Okay, so, this is just to tell you that's not super hard. 451 00:20:21,689 --> 00:20:24,234 The code is available right there, if you want to look at it. 452 00:20:24,234 --> 00:20:26,474 But these two lines of code, which takes a little while 453 00:20:26,474 --> 00:20:29,524 when you're writing it from scratch to create these two lines of code, 454 00:20:29,524 --> 00:20:30,904 but once you have an example, 455 00:20:30,904 --> 00:20:34,484 it's pretty darn easy to plug in your own data set, your own crosswalk, 456 00:20:34,484 --> 00:20:36,844 to generate the QuickStatements. 457 00:20:36,844 --> 00:20:38,525 So, I've done a lot of the work already, 458 00:20:38,525 --> 00:20:41,385 and I invite you to steal the code and try it. 459 00:20:42,365 --> 00:20:44,936 So, when it comes to images, it's a little more challenging. 460 00:20:44,936 --> 00:20:48,215 So, at this point, Pattypan is probably your best bet. 461 00:20:48,215 --> 00:20:51,385 Pattypan is a tool that is a spreadsheet-oriented tool. 462 00:20:51,385 --> 00:20:54,855 You fill in the metadata, you point to the local file on your computer, 463 00:20:54,855 --> 00:20:57,435 and it uploads it to Commons with all that information, 464 00:20:57,435 --> 00:21:02,125 or another alternative is if you set P4765 to a URL-- 465 00:21:03,105 --> 00:21:06,195 because this is the Commons-compatible image available at URL, 466 00:21:06,195 --> 00:21:08,544 Maarten Dammers has a bot, at least for paintings, 467 00:21:08,544 --> 00:21:12,020 that will just swoop through and say, "Oh, we don't have this image. 468 00:21:12,020 --> 00:21:15,113 Here's a Commons compatible one. 469 00:21:15,113 --> 00:21:17,709 Why don't I slip it from that site and put it into Commons?"
470 00:21:17,709 --> 00:21:18,995 And that's what his bot does. 471 00:21:18,995 --> 00:21:20,733 So, you can actually take a look at his bot 472 00:21:20,733 --> 00:21:24,102 and modify it for your own purposes, but that is also another alternative 473 00:21:24,102 --> 00:21:28,061 that doesn't require you to do some spreadsheet work there. 474 00:21:28,061 --> 00:21:30,452 If you might have heard of GLAM Wiki Toolset, 475 00:21:30,452 --> 00:21:32,552 it's effectively end of life at this point. 476 00:21:33,322 --> 00:21:37,362 It hasn't been updated, and even the folks who have been working with it in the past 477 00:21:37,362 --> 00:21:39,332 have said Pattypan is probably your best bet. 478 00:21:39,332 --> 00:21:41,722 Has anyone used GWT these days? 479 00:21:41,741 --> 00:21:43,591 A few of you, a little bit. 480 00:21:43,591 --> 00:21:45,161 It's just not being further developed, 481 00:21:45,161 --> 00:21:47,852 and it's not compatible with a lot of our authentication protocols 482 00:21:47,852 --> 00:21:49,280 that we have now. 483 00:21:49,280 --> 00:21:52,928 Okay. So, right now, we have basic metadata added to Wikidata, 484 00:21:52,928 --> 00:21:54,997 with pretty good results from the Met, 485 00:21:54,997 --> 00:21:58,117 and we have a Python script here to also analyze that. 486 00:21:58,117 --> 00:22:00,307 You're welcome to steal some of that code, as well. 487 00:22:00,307 --> 00:22:02,817 So, this is what we are showing to the Met folks, now. 488 00:22:02,817 --> 00:22:06,087 We actually have Listeria lists that are running 489 00:22:06,087 --> 00:22:07,627 to show all the inventory 490 00:22:07,627 --> 00:22:10,967 and all the information that we have in Wikidata. 491 00:22:10,967 --> 00:22:15,612 And I'll show you very quickly about a project that we ran to show folks. 492 00:22:15,612 --> 00:22:18,547 So, what are the benefits of adding your collections to Wikidata? 
493 00:22:18,547 --> 00:22:21,917 One is to use AI in the image classifier 494 00:22:21,917 --> 00:22:24,787 to actually help train a machine learning model 495 00:22:24,787 --> 00:22:29,447 with all the Met's images and keywords, and let that be an engine for other folks 496 00:22:29,447 --> 00:22:32,047 to recognize content. 497 00:22:32,047 --> 00:22:36,408 So, this is a hack-a-thon that we had with MIT and Microsoft last year. 498 00:22:36,408 --> 00:22:39,238 The way this works, is we have the paintings from the Met, 499 00:22:39,238 --> 00:22:40,277 and we have the keywords 500 00:22:40,277 --> 00:22:43,157 that they actually paid a crew for six months to work on 501 00:22:43,157 --> 00:22:46,937 to add hand keyword tags to all the artworks. 502 00:22:47,567 --> 00:22:50,077 We ingested that into an AI system right here, 503 00:22:50,077 --> 00:22:51,367 and then, what we did was say, 504 00:22:51,367 --> 00:22:55,428 "Let's feed in new images that this AI ML system had never seen before, 505 00:22:55,428 --> 00:22:56,747 and see what comes out." 506 00:22:56,747 --> 00:23:00,037 And the problem is that it comes out with pretty good results, 507 00:23:00,037 --> 00:23:02,267 but it's maybe only 60% accurate. 508 00:23:02,267 --> 00:23:04,797 And for most folks, 60% accurate is garbage. 509 00:23:04,797 --> 00:23:08,627 How do I get the 60% good out of this pile of stuff? 510 00:23:08,627 --> 00:23:11,127 The good news is that our community knows how to do that. 511 00:23:11,127 --> 00:23:13,157 We can actually feed this into a Wikidata game 512 00:23:13,157 --> 00:23:14,997 and get the good stuff out of that. 513 00:23:14,997 --> 00:23:16,228 That's basically what we did. 514 00:23:16,228 --> 00:23:17,647 So, this is the Wikidata game-- 515 00:23:17,647 --> 00:23:19,757 you'll notice this is Magnus' interface right there-- 516 00:23:19,757 --> 00:23:21,182 being played at the Met Museum, 517 00:23:21,182 --> 00:23:22,207 in the lobby. 
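[Editor's sketch] The idea of sifting a roughly 60% accurate classifier through yes/no judgments can be illustrated like this. It is a simplified sketch under stated assumptions, not the code behind the Wikidata game: the Suggestion tuple, the accepted_depicts function, and the sample judgments are invented for the example, the item IDs are placeholders, and the output format is a tab-separated QuickStatements-style line using P180 (depicts).

```python
from collections import namedtuple

# One machine-generated guess: "this artwork may depict this thing".
Suggestion = namedtuple("Suggestion", ["item_qid", "tag_qid"])


def accepted_depicts(suggestions, judgments):
    """Keep only the suggestions that a person answered "yes" to."""
    statements = []
    for suggestion in suggestions:
        # Missing or "no" judgments are dropped; nothing is assumed correct.
        if judgments.get(suggestion) is True:
            statements.append(
                f"{suggestion.item_qid}\tP180\t{suggestion.tag_qid}"
            )
    return statements


# Placeholder artwork IDs paired with tree (Q10884) and horse (Q726).
suggestions = [
    Suggestion("Q4115189", "Q10884"),
    Suggestion("Q4115189", "Q726"),
    Suggestion("Q13406268", "Q10884"),
]
judgments = {
    suggestions[0]: True,   # a player confirmed the tree
    suggestions[1]: False,  # a player rejected the horse
    # suggestions[2] has not been judged yet
}

for line in accepted_depicts(suggestions, judgments):
    print(line)
```

Only confirmed guesses become edits, which is consistent with the talk's figures of more judgments than resulting edits.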
518 00:23:22,207 --> 00:23:25,437 We actually had folks at a cocktail party drinking champagne 519 00:23:25,437 --> 00:23:27,427 and hitting buttons on the screen. 520 00:23:27,427 --> 00:23:31,048 Hopefully, accurately. (chuckles) 521 00:23:31,048 --> 00:23:33,444 (applause) 522 00:23:33,444 --> 00:23:35,116 We had journalists, curators, 523 00:23:35,116 --> 00:23:37,506 we had some board members from the Met there as well. 524 00:23:37,506 --> 00:23:38,810 And this was great. 525 00:23:38,810 --> 00:23:40,061 No log in, whatever. 526 00:23:40,061 --> 00:23:42,106 (lowers voice) We created an account just for this. 527 00:23:42,106 --> 00:23:44,117 So, they just hit yes-no-yes-no. 528 00:23:44,117 --> 00:23:45,256 This is great. 529 00:23:45,256 --> 00:23:47,526 You saw this, it said, "Is there a tree in this picture?" 530 00:23:47,526 --> 00:23:49,148 You don't have to train anyone on this. 531 00:23:49,148 --> 00:23:52,213 You just hit yes-- depicts a tree, not depicted. 532 00:23:52,213 --> 00:23:55,910 I even had my eight-year-old boys play this game with a finger tap. 533 00:23:56,540 --> 00:24:00,047 And we also created a little tool that showed all the depictions going by 534 00:24:00,047 --> 00:24:01,505 so people could see them. 535 00:24:03,189 --> 00:24:06,453 It basically is like-- how do you sift good from bad? 536 00:24:06,453 --> 00:24:08,350 This is where the Wikimedia community comes in, 537 00:24:08,350 --> 00:24:11,034 that no other entity could ever do. 538 00:24:12,084 --> 00:24:15,052 So, in that first few months that we had this, 539 00:24:15,052 --> 00:24:19,017 over 7,000 judgments, resulting in about 5,000 edits. 540 00:24:19,912 --> 00:24:22,227 We did really well on tree, boat, flower, horse, 541 00:24:22,227 --> 00:24:24,907 things that are in landscape paintings. 542 00:24:25,146 --> 00:24:27,466 But when you go to things like gender discrimination, 543 00:24:27,466 --> 00:24:29,901 and cats and dogs, not so good, I know. 
544 00:24:29,901 --> 00:24:32,159 Because there's so many different types of cats and dogs 545 00:24:32,159 --> 00:24:33,456 in different positions. 546 00:24:33,456 --> 00:24:36,105 But horses, a lot easier than cats and dogs. 547 00:24:36,735 --> 00:24:38,742 But also, I should note that Wikimedia Foundation 548 00:24:38,742 --> 00:24:42,697 is now looking into doing image recognition on Commons uploads 549 00:24:42,697 --> 00:24:46,368 to do these suggestions as well, which is an awesome development. 550 00:24:46,667 --> 00:24:49,627 Okay, so, dashboards. 551 00:24:50,750 --> 00:24:53,358 Let's just show you some of these dashboards. 552 00:24:53,418 --> 00:24:55,097 Folks you work with love dashboards. 553 00:24:55,097 --> 00:24:56,817 They just want to see stats. 554 00:24:56,817 --> 00:24:58,797 So, we have them, like BaGLAMa. 555 00:24:58,797 --> 00:25:00,787 We have InteGraality. 556 00:25:00,787 --> 00:25:02,767 Is JeanFred here? 557 00:25:03,447 --> 00:25:06,247 I think this is a very new thing relative to last WikidataCon. 558 00:25:06,247 --> 00:25:08,327 We actually have a tool which will create 559 00:25:08,327 --> 00:25:10,967 this property completeness chart right here. 560 00:25:10,967 --> 00:25:12,987 So, it's called InteGraality, with two A's. 561 00:25:13,206 --> 00:25:15,526 It's on that big chart that I showed you before. 562 00:25:15,526 --> 00:25:19,086 And it can just autogenerate how complete your items are 563 00:25:19,086 --> 00:25:21,036 in any set, which is really cool. 564 00:25:21,566 --> 00:25:23,771 So, we can see that paintings are by far the highest, 565 00:25:23,771 --> 00:25:26,057 we have sculptures, drawings, photographs. 566 00:25:26,121 --> 00:25:29,322 And then, they also like to see what are the most popular artworks 567 00:25:29,322 --> 00:25:31,148 in the Wikisphere? 
568 00:25:31,148 --> 00:25:33,417 So, just looking at the site links in Wikidata-- 569 00:25:33,417 --> 00:25:37,781 you can see and rank all these different artworks there. 570 00:25:39,568 --> 00:25:41,926 Also another thing they'd like to see 571 00:25:41,926 --> 00:25:46,879 is what are the most frequent creators of content or Met artworks-- 572 00:25:46,879 --> 00:25:49,193 what are the most commonly depicted things. 573 00:25:49,193 --> 00:25:51,982 So, these are very easy to generate in SPARQL, 574 00:25:51,982 --> 00:25:54,622 you could look at it right there, using bubble graphs. 575 00:25:54,673 --> 00:25:56,991 Then place of birth of the most prominent artists, 576 00:25:56,991 --> 00:25:58,814 we have a chart there, as well. 577 00:25:58,814 --> 00:26:01,142 So, structured data on Commons. 578 00:26:01,142 --> 00:26:04,301 I just want to show you very briefly in case you can't get to Sandra's session, 579 00:26:04,301 --> 00:26:06,226 but you definitely should go to Sandra's session. 580 00:26:06,226 --> 00:26:10,693 You actually can search in Commons for a specific Wikibase statement. 581 00:26:11,353 --> 00:26:15,333 I don't always remember the syntax, but you have to burn it into your brain 582 00:26:15,333 --> 00:26:19,893 and say, it's haswbstatement:P1343= 583 00:26:19,893 --> 00:26:22,695 whatever-- basically, your last two parts of the triple. 584 00:26:22,695 --> 00:26:26,162 I always get haswb and wbhas mixed up. 585 00:26:26,162 --> 00:26:28,183 I always get the colon and the equals mixed up. 586 00:26:28,183 --> 00:26:32,022 So just do it once, remember it, and you'll get the hang of it. 587 00:26:32,022 --> 00:26:34,772 But simple searches are much faster than SPARQL queries. 588 00:26:34,772 --> 00:26:36,478 So, if you can just look for one statement, 589 00:26:36,478 --> 00:26:38,392 boom, you'll get the results.
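[Editor's sketch] Since the colon and the equals sign are easy to swap, a tiny helper can assemble the search string. This is only an illustration: the function names haswbstatement and commons_search_url are made up for the example, and the property and item used below (P180 for depicts, Q160236 for the Metropolitan Museum of Art) are chosen to match the "things that depict the Met museum" case mentioned in the talk rather than taken from the slide.

```python
from urllib.parse import urlencode


def haswbstatement(prop, value):
    """Return a search term of the form haswbstatement:PROPERTY=VALUE."""
    return f"haswbstatement:{prop}={value}"


def commons_search_url(query):
    """Return a Wikimedia Commons search URL for the given query string."""
    base = "https://commons.wikimedia.org/w/index.php"
    return base + "?" + urlencode({"search": query})


# Files whose structured data says they depict the Met (assumed IDs).
term = haswbstatement("P180", "Q160236")
print(term)
print(commons_search_url(term))
```

The first printed line is the text you would type into the Commons search box; the second is the same search as a link.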
590 00:26:39,181 --> 00:26:43,711 So, things like this, you can look for symbolically or semantically, 591 00:26:43,711 --> 00:26:47,511 things that depict the Met museum, for example. 592 00:26:48,051 --> 00:26:50,051 So, finally, community campaigns. 593 00:26:50,051 --> 00:26:51,681 Richard has been a pioneer in this area. 594 00:26:51,681 --> 00:26:54,071 So, once you have the Wikidata items, 595 00:26:54,071 --> 00:26:57,050 they can actually assist in creating Wikipedia articles. 596 00:26:57,050 --> 00:26:59,785 So, Richard, why don't you tell us a little bit about the Mbabel tool 597 00:26:59,785 --> 00:27:01,009 that you created for this. 598 00:27:01,009 --> 00:27:03,192 (Richard) Hi, can I get this on? 599 00:27:04,649 --> 00:27:06,109 (Andrew) Oh, use [Joisey's]. 600 00:27:06,109 --> 00:27:08,319 (Richard) It's on, now. I'm good. 601 00:27:08,949 --> 00:27:10,769 So, we had all this information on Wikidata. 602 00:27:10,769 --> 00:27:13,729 [inaudible] browsing data on our evenings and weekends 603 00:27:13,729 --> 00:27:15,649 to learn about art-- not everyone does. 604 00:27:15,649 --> 00:27:19,319 We have quite a bit more people [inaudible] Wikipedia, 605 00:27:19,319 --> 00:27:22,260 so how do we get this information from Wikidata to Wikipedia? 606 00:27:22,260 --> 00:27:25,289 One of the ways of doing this is this so-called Mbabel, 607 00:27:25,289 --> 00:27:28,069 which developed with the help of a lot of people in [inaudible]. 608 00:27:28,069 --> 00:27:30,639 People like Martin and others. 609 00:27:31,689 --> 00:27:34,659 So, basically to take some basic art information, 610 00:27:34,659 --> 00:27:37,688 and use it to populate a Wikipedia article. 611 00:27:37,688 --> 00:27:40,241 So, by who created this work, who was the artist, 612 00:27:40,241 --> 00:27:42,313 when it was created, et cetera. 613 00:27:42,313 --> 00:27:44,626 The nice thing about this is it can generate works. 
614 00:27:44,626 --> 00:27:46,210 We started with English Wikipedia, 615 00:27:46,210 --> 00:27:48,608 but it's been developed in other languages. 616 00:27:48,608 --> 00:27:50,938 So, Portuguese Wikipedia, our Brazilian friends 617 00:27:50,938 --> 00:27:53,508 who've done a lot of work and taking it to realms beyond art, 618 00:27:53,508 --> 00:27:57,283 to stuff like elections and political work as well. 619 00:27:57,283 --> 00:28:01,128 And the nice thing about this is we can query on Wikidata-- 620 00:28:01,758 --> 00:28:06,928 so different artists-- so for example, we've done projects with Women in Red, 621 00:28:06,928 --> 00:28:08,472 looking at women artists. 622 00:28:08,472 --> 00:28:12,753 Projects related to Wiki Loves Pride, looking at LGBT-identified artists, 623 00:28:12,753 --> 00:28:14,073 African Diaspora Artists, 624 00:28:14,073 --> 00:28:16,493 and a lot of different groups and things of time periods, 625 00:28:16,493 --> 00:28:19,293 different collections, and also looking at articles 626 00:28:19,293 --> 00:28:22,213 that have been and haven't been translated to different languages. 627 00:28:22,213 --> 00:28:24,923 So all of the articles that haven't been translated to Arabic yet. 628 00:28:24,923 --> 00:28:28,329 You need to find some interesting articles maybe that are relevant to a culture 629 00:28:28,329 --> 00:28:30,459 that haven't been translated into that language yet. 630 00:28:30,459 --> 00:28:32,659 We actually have a number of works in the Met collection 631 00:28:32,659 --> 00:28:35,199 that are in Wikipedias that aren't in English yet, 632 00:28:35,199 --> 00:28:37,259 because it's a global collection. 
633 00:28:37,769 --> 00:28:40,449 So, there are a lot of ways, and hopefully, we can spread it around 634 00:28:40,449 --> 00:28:44,709 of creating Wikipedia content, as well, that is driven by these Wikidata items, 635 00:28:44,709 --> 00:28:47,549 and that also maybe can help spread the improvement 636 00:28:47,549 --> 00:28:49,529 to Wikidata items, as well, in the future. 637 00:28:49,529 --> 00:28:52,403 (Andrew) And there's a number of folks here using Mbabel already, right? 638 00:28:52,403 --> 00:28:54,124 Who's using Mbabel in the room? Brazilians? 639 00:28:54,124 --> 00:28:58,690 And also, if [Armin] is here, we have our winner 640 00:28:59,165 --> 00:29:03,146 of the Wikipedia Asia Month, and Wiki Loves Pride contest. 641 00:29:03,146 --> 00:29:05,720 So, thank you for joining, and congratulations. 642 00:29:06,493 --> 00:29:09,993 We'll have another Wiki Asia Month campaign in November. 643 00:29:10,173 --> 00:29:13,383 The way I like to describe it [inaudible] 644 00:29:13,383 --> 00:29:15,443 It doesn't give you a blank page. 645 00:29:15,443 --> 00:29:16,863 It gives you the skeleton, 646 00:29:16,863 --> 00:29:18,962 which is really a much better user experience 647 00:29:18,962 --> 00:29:21,472 for edit-a-thons and beginners. 648 00:29:21,472 --> 00:29:23,526 So, it's a lot of great work that Richard has done, 649 00:29:23,526 --> 00:29:25,841 and people are building on it, which is awesome. 650 00:29:25,906 --> 00:29:29,066 (woman 3) [inaudible] for some of them, which is really nice. 651 00:29:29,066 --> 00:29:30,376 Yeah, exactly. 652 00:29:30,376 --> 00:29:32,956 (woman 3) [inaudible] 653 00:29:32,956 --> 00:29:35,815 Right. We should have put a URL here. 654 00:29:35,815 --> 00:29:38,196 (man 8) [inaudible] 655 00:29:38,196 --> 00:29:40,055 Oh, that's right. We have the link right here. 656 00:29:40,055 --> 00:29:43,725 So if you click-- this is a Listeria list, it's autogenerating all that for you.
657 00:29:43,725 --> 00:29:46,205 And then, you click on the red link, it'll create the skeleton, 658 00:29:46,205 --> 00:29:47,491 which is pretty cool. 659 00:29:47,491 --> 00:29:49,172 Alright, we're on the final stretch here. 660 00:29:49,172 --> 00:29:51,990 The tool that we're going to be announcing-- 661 00:29:51,990 --> 00:29:55,047 well, we announced a few weeks ago, but only to a small set of folks, 662 00:29:55,047 --> 00:29:57,038 but we're making a big splash here, 663 00:29:57,038 --> 00:29:59,345 is the depiction tool that we just created. 664 00:29:59,345 --> 00:30:05,298 Wikipedia has shown that volunteer contributors can add a lot of these things 665 00:30:05,298 --> 00:30:06,681 that museums can't. 666 00:30:06,681 --> 00:30:10,263 So, what if we created a tool that could let you enrich 667 00:30:10,263 --> 00:30:15,907 the metadata about artworks in terms of the depiction information? 668 00:30:15,907 --> 00:30:19,477 And what we did was we applied for a grant from the Knight Foundation, 669 00:30:19,477 --> 00:30:22,684 and we created this tool-- and is Edward here? 670 00:30:22,760 --> 00:30:26,590 Edward is our wonderful developer who in like a month, said, 671 00:30:26,590 --> 00:30:28,050 "Okay, here's a prototype." 672 00:30:28,050 --> 00:30:33,103 After we gave him a specification, and it's pretty cool. 673 00:30:33,900 --> 00:30:35,849 - So what we can do-- - (applause) 674 00:30:35,849 --> 00:30:37,169 Thanks, Edward. 675 00:30:37,569 --> 00:30:39,269 We're working within collections of items. 676 00:30:39,269 --> 00:30:41,629 So, what we do, is we can bring up a page like this. 677 00:30:41,629 --> 00:30:44,789 It's no longer looking at a Wikidata item with a tiny picture. 678 00:30:44,789 --> 00:30:48,484 If we're working with what's depicted in the image, we want the picture big. 679 00:30:48,484 --> 00:30:51,201 And we don't really have tools that work with big images. 
680 00:30:51,201 --> 00:30:53,348 We have tools that deal with lexical and typing. 681 00:30:53,348 --> 00:30:56,715 So one of the big things that Edward did was made a big version of the picture, 682 00:30:56,715 --> 00:30:58,739 scrape whatever you can from the object page 683 00:30:58,739 --> 00:31:00,633 from a GLAM organization, give you context. 684 00:31:00,633 --> 00:31:02,773 I can see dogs, children, wigwam. 685 00:31:02,773 --> 00:31:05,782 These are things that direct the user to add meaningful information. 686 00:31:05,782 --> 00:31:09,024 You have some metadata that's scraped from the site, too. 687 00:31:09,024 --> 00:31:11,868 Teepee, Comanche-- oh, it's Comanche, not Navajo, 688 00:31:11,868 --> 00:31:13,556 because I know the object page said that. 689 00:31:13,556 --> 00:31:15,702 And you can actually start typing in the field, there. 690 00:31:15,702 --> 00:31:17,628 And the cool thing is that it gives you context, 691 00:31:17,628 --> 00:31:19,566 It doesn't just match anything to Wikidata, 692 00:31:19,566 --> 00:31:23,107 it first matches things that have already been used in other depiction statements. 693 00:31:23,107 --> 00:31:25,456 Very simple thing, but what a godsend it is 694 00:31:25,456 --> 00:31:27,166 for folks who have tried this in the past. 695 00:31:27,166 --> 00:31:29,116 Don't give me everything that matches teepee. 696 00:31:29,116 --> 00:31:33,321 Show me what other paintings have used teepee in the past. 697 00:31:33,355 --> 00:31:36,175 So, it's interactive, context-driven, statistics-driven, 698 00:31:36,175 --> 00:31:37,936 by showing you what is matched before. 699 00:31:37,936 --> 00:31:40,336 And the cool thing is once you're done with that painting, 700 00:31:40,336 --> 00:31:42,196 you can start to work in other areas. 701 00:31:42,196 --> 00:31:44,936 You want to work within the same artist, the collection, location, 702 00:31:45,876 --> 00:31:47,295 other criteria here. 
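[Editor's sketch] The "show me what other paintings have used" behaviour can be approximated by ranking label matches by how often each item already appears in depiction statements. This is a guess at the general idea, not the depiction tool's implementation: the function rank_suggestions, the prefix matching, and the placeholder IDs (Q_DWELLING, Q_ALBUM, Q_TEEN) are all assumptions for illustration.

```python
from collections import Counter


def rank_suggestions(typed, labels, prior_depicts):
    """Rank items whose label starts with the typed text.

    labels maps an item ID to its label; prior_depicts lists the item IDs
    used as values in existing depiction statements (repeats allowed).
    Items used most often in earlier statements come first.
    """
    usage = Counter(prior_depicts)
    prefix = typed.casefold()
    matches = [
        qid for qid, label in labels.items()
        if label.casefold().startswith(prefix)
    ]
    # Sort by descending prior usage, then by label for a stable order.
    return sorted(matches, key=lambda qid: (-usage[qid], labels[qid]))


# Placeholder IDs, not real Wikidata items.
labels = {
    "Q_ALBUM": "Teepee (album)",
    "Q_DWELLING": "teepee (dwelling)",
    "Q_TEEN": "teenager",
}
prior_depicts = ["Q_DWELLING", "Q_DWELLING", "Q_TEEN", "Q_DWELLING"]

print(rank_suggestions("teepee", labels, prior_depicts))
```

The dwelling is listed before the album because it has already been used in depiction statements, even though both labels match what was typed.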
703 00:31:47,295 --> 00:31:49,146 And you can even browse through the collections 704 00:31:49,146 --> 00:31:51,582 of different organizations, just work on their paintings. 705 00:31:51,582 --> 00:31:53,670 So, we wanted people to not live in Wikidata-- 706 00:31:53,670 --> 00:31:56,307 kind of onesy-twosies with items, but live in a space 707 00:31:56,307 --> 00:31:59,232 where you're looking at artworks in collections that make sense. 708 00:31:59,683 --> 00:32:01,792 And then, you can actually look through it visually. 709 00:32:01,792 --> 00:32:04,237 It kind of looks like Crotos or these other tools, 710 00:32:04,237 --> 00:32:07,726 but you can actually live edit on Wikidata at the same time. 711 00:32:07,726 --> 00:32:09,104 So, go ahead and try it out. 712 00:32:09,104 --> 00:32:10,609 We've only had 14 users, 713 00:32:10,609 --> 00:32:14,667 but we've had 2,100 paintings worked on, with 5,000 plus depict statements. 714 00:32:14,667 --> 00:32:16,126 That's pretty good for 14. 715 00:32:16,126 --> 00:32:18,119 So, multiply that by 10-- 716 00:32:18,119 --> 00:32:20,515 imagine how many more things we could do with that. 717 00:32:20,515 --> 00:32:23,797 So, you can go ahead and go to art.wikidata.link and try out the tool. 718 00:32:23,797 --> 00:32:26,594 It uses OAuth authentication, and you're off to the races. 719 00:32:26,594 --> 00:32:29,187 And it should be very natural without any kind of training 720 00:32:29,187 --> 00:32:31,782 to add depiction statements to artworks. 721 00:32:31,837 --> 00:32:35,170 But you can put any object. We don't restrict the object right now. 722 00:32:35,170 --> 00:32:37,278 So, you could put any Q number 723 00:32:38,468 --> 00:32:41,208 to edit this content if you want. 724 00:32:41,275 --> 00:32:44,645 But we primarily stick with paintings and 2D artworks, right now. 725 00:32:46,184 --> 00:32:49,405 Okay.
You can actually look at the recent changes 726 00:32:49,405 --> 00:32:52,175 and see who's made edits recently to that. 727 00:32:52,815 --> 00:32:54,855 Okay? Okay, so we're going to wind it down. 728 00:32:54,855 --> 00:32:58,386 Ooh, one minute, then we'll do some Q&A. 729 00:32:58,915 --> 00:33:03,081 So, the final thing that I think is useful for museum types especially, 730 00:33:03,081 --> 00:33:07,307 is there's a very famous author named Nina Simon in the museum world, 731 00:33:07,307 --> 00:33:11,204 where she likes to talk about how do we go from users, 732 00:33:11,204 --> 00:33:14,968 or I guess your audience, contributing stuff to your collections 733 00:33:14,968 --> 00:33:18,004 to collaborating around content, to actually being co-creative 734 00:33:18,004 --> 00:33:19,714 and creating new things. 735 00:33:19,714 --> 00:33:20,984 And that's always been tough. 736 00:33:20,984 --> 00:33:24,154 And I'd like to argue that Wikidata is this co-creative level. 737 00:33:24,154 --> 00:33:26,914 So, it's not just uploading a file to Commons, 738 00:33:26,914 --> 00:33:28,234 which is contributing something. 739 00:33:28,234 --> 00:33:31,194 It's not just editing an article with someone else, which is collaborative. 740 00:33:31,194 --> 00:33:34,833 But we are now seeing these tools that let you make timelines, 741 00:33:34,833 --> 00:33:36,133 and graphs, and bubble charts. 742 00:33:36,133 --> 00:33:38,833 And this is actually the co-creative part that's really interesting. 743 00:33:38,833 --> 00:33:40,353 And that's what Wikidata provides you. 744 00:33:40,353 --> 00:33:42,235 Because suddenly, it's not language dependent-- 745 00:33:42,235 --> 00:33:45,146 we've got this database that's got this rich information in it. 746 00:33:45,946 --> 00:33:48,606 So, it's not just pictures, not just text, 747 00:33:48,606 --> 00:33:50,522 but it's all this rich multimedia 748 00:33:50,522 --> 00:33:52,607 that we have the opportunity to work on. 
749 00:33:52,607 --> 00:33:55,851 So, this is just another example of this connected graph 750 00:33:55,851 --> 00:33:57,389 that you can take a look at later on 751 00:33:57,389 --> 00:33:59,860 to show another example of The Death of Socrates, 752 00:33:59,860 --> 00:34:02,312 and the different themes around that painting. 753 00:34:03,252 --> 00:34:05,653 And it's really easy to make this graph yourself. 754 00:34:05,653 --> 00:34:08,172 So again, another scary graphic that only makes sense 755 00:34:08,172 --> 00:34:09,822 for Wikidata folks, like you. 756 00:34:09,822 --> 00:34:13,682 You just give it a list of Wikidata items, and it'll do the rest, that's it. 757 00:34:14,102 --> 00:34:15,662 You'll give the list. 758 00:34:15,705 --> 00:34:17,664 Keep all this code the same. 759 00:34:17,664 --> 00:34:21,364 So, fortunately, Martin and Lucas helped do all this code here. 760 00:34:21,364 --> 00:34:23,864 Just give it a list of items and the magic will happen. 761 00:34:23,864 --> 00:34:25,624 Hopefully, it won't blow up your computer, 762 00:34:25,624 --> 00:34:28,755 because you're putting in a reasonable number of items there. 763 00:34:28,755 --> 00:34:31,593 But as long as you have the screen space, it'll draw the graph, 764 00:34:31,593 --> 00:34:33,283 which is pretty darn cool. 765 00:34:33,283 --> 00:34:37,223 And then, finally, two tools-- I realized at 2 a.m. last night 766 00:34:37,223 --> 00:34:39,744 a few people said, "I didn't know about these tools." 767 00:34:39,744 --> 00:34:41,343 And you should know about these tools. 768 00:34:41,343 --> 00:34:44,613 So, one is Recoin, which shows you the relative completeness of an item 769 00:34:44,613 --> 00:34:46,773 compared to other items of the same instance. 770 00:34:46,773 --> 00:34:49,473 And then, Cradle, which is a way to have a forms-based way 771 00:34:49,473 --> 00:34:50,693 to create content. 
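[Editor's sketch] "Relative completeness compared to other items of the same instance" can be pictured as: find the properties that most peer items carry, then list the ones a given item lacks. The sketch below illustrates that idea only and is not Recoin's actual algorithm; the function names, the 0.5 threshold, and the sample peer data are assumptions (P170 creator, P571 inception, P180 depicts, P186 made from material).

```python
from collections import Counter


def property_frequencies(peers):
    """Return, per property, the share of peer items that use it."""
    counts = Counter()
    for props in peers.values():
        counts.update(set(props))
    total = len(peers)
    return {prop: count / total for prop, count in counts.items()}


def missing_common_properties(item_props, peers, threshold=0.5):
    """List properties used by at least `threshold` of peers but absent here."""
    freqs = property_frequencies(peers)
    missing = [
        prop for prop, share in freqs.items()
        if share >= threshold and prop not in item_props
    ]
    # Most widely used properties first, ties broken by property ID.
    return sorted(missing, key=lambda prop: (-freqs[prop], prop))


# Made-up peer paintings and the properties each one has.
peers = {
    "A": {"P170", "P571", "P186"},
    "B": {"P170", "P571"},
    "C": {"P170", "P180"},
    "D": {"P170", "P571", "P180"},
}

print(missing_common_properties({"P170"}, peers))
```

For an edit-a-thon, a list like this gives newcomers a short, concrete to-do list for one item instead of a blank screen.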
772 00:34:50,693 --> 00:34:52,453 So, these are very useful for edit-a-thons 773 00:34:52,453 --> 00:34:54,753 where if you know that you're working with just artworks, 774 00:34:54,753 --> 00:34:57,553 don't just let people create items with a blank screen. 775 00:34:57,553 --> 00:35:00,275 Give them a form to fill out to start entering in information 776 00:35:00,275 --> 00:35:01,818 that's structured. 777 00:35:01,818 --> 00:35:04,588 And then, finally, we've gone through some of this, already. 778 00:35:06,268 --> 00:35:09,539 This is my big chart that I love to get people's feedback on. 779 00:35:09,539 --> 00:35:14,296 How do we get people across the chasm to be in this space? 780 00:35:14,328 --> 00:35:16,839 We have a lot of folks who, now, can do template coding, 781 00:35:16,839 --> 00:35:20,040 spreadsheets, QuickStatements, SPARQL queries, and then we got-- 782 00:35:20,935 --> 00:35:24,259 how do we get people to this side where we have Python 783 00:35:24,259 --> 00:35:26,694 and the things that can do more sophisticated editing. 784 00:35:26,694 --> 00:35:28,625 It's really hard to get people across this. 785 00:35:28,625 --> 00:35:30,785 But I would like to say it's hard to get people across, 786 00:35:30,785 --> 00:35:32,847 but the content and the technology is not that hard. 787 00:35:32,847 --> 00:35:35,380 We actually need more people to learn about regular expressions. 788 00:35:35,380 --> 00:35:38,307 And once you get some kind of experience here, 789 00:35:38,307 --> 00:35:41,830 you'll find that this is a wonderful world that you can learn a lot in, 790 00:35:41,830 --> 00:35:44,700 but it does take some time to get across this chasm. 791 00:35:44,829 --> 00:35:46,289 Yes, James. 792 00:35:46,289 --> 00:35:52,148 (James) [inaudible] 793 00:35:53,127 --> 00:35:57,192 No, what it means is that the graph is not necessarily accurate 794 00:35:57,192 --> 00:35:59,178 in terms of its data points. 
795 00:35:59,308 --> 00:36:03,427 But what it means-- I guess it's more like this is a valley. 796 00:36:03,786 --> 00:36:06,716 It's like we need to get people across this valley here. 797 00:36:06,716 --> 00:36:10,146 (woman 4) [inaudible] 798 00:36:10,146 --> 00:36:11,546 I would say this is the key. 799 00:36:11,546 --> 00:36:16,296 If we can get people who know this stuff, but can grok this stuff, 800 00:36:16,296 --> 00:36:17,918 it gets them to this stuff. 801 00:36:17,918 --> 00:36:19,668 Does that make sense? Yeah. 802 00:36:19,668 --> 00:36:24,155 So, my vision for the next few years: if we can get better training 803 00:36:24,155 --> 00:36:27,516 in our community to get people from batch processing, 804 00:36:27,516 --> 00:36:29,847 which is pretty much what this is, to kind of intelligent-- 805 00:36:29,847 --> 00:36:32,726 I wouldn't say intelligent, but more sophisticated programming, 806 00:36:32,726 --> 00:36:35,486 that would be a great thing, because we're seeing this is a bottleneck 807 00:36:35,486 --> 00:36:37,846 to a lot of the stuff that I just showed you up there. 808 00:36:37,846 --> 00:36:39,086 Yes. 809 00:36:39,135 --> 00:36:42,105 (man 9) [inaudible] 810 00:36:42,105 --> 00:36:45,984 Okay, wait. You want to show me something? Show me after the session. Does that work? 811 00:36:45,984 --> 00:36:47,584 Okay. Yes, Megan. 812 00:36:47,584 --> 00:36:50,804 - (Megan) Can I have a microphone? - Microphone, yes. 813 00:36:50,834 --> 00:36:54,528 - (Megan) [inaudible] - Yeah. 814 00:36:55,316 --> 00:36:56,636 And we have lunch after this, 815 00:36:56,636 --> 00:36:59,006 so if you want to stay a little bit later, that's fine, too. 816 00:36:59,006 --> 00:37:01,009 - [inaudible] - We're already at lunch break? Okay. 817 00:37:01,009 --> 00:37:03,094 (Megan) So, thank you so much to both you and Richard 818 00:37:03,094 --> 00:37:04,799 for all the work you're doing at the Met.
819 00:37:04,799 --> 00:37:07,027 And I know that you're very well supported in that. 820 00:37:07,027 --> 00:37:09,100 (mic feedback) I don't know what happened there. 821 00:37:09,100 --> 00:37:15,071 For the average volunteer community, how do you balance doing the work 822 00:37:15,071 --> 00:37:19,124 for the cultural heritage organization versus training the professionals 823 00:37:19,124 --> 00:37:21,792 that are there to do that work? 824 00:37:21,792 --> 00:37:24,412 Where do you find the balance in terms of labor? 825 00:37:25,672 --> 00:37:26,962 It's a good question. 826 00:37:27,397 --> 00:37:30,467 (Megan) One that really comes up, I think, with this as well. 827 00:37:30,467 --> 00:37:33,158 - With this? - (Megan) Yeah, and with building out... 828 00:37:33,187 --> 00:37:36,277 where we put efforts in terms of building out competencies. 829 00:37:36,333 --> 00:37:39,398 Yeah. I don't have a great answer for you, but it's a great question. 830 00:37:39,398 --> 00:37:40,658 (Megan) Cool. 831 00:37:40,658 --> 00:37:43,580 (Richard) There are a lot of tech people at [inaudible] 832 00:37:43,580 --> 00:37:46,158 who understand this side of the graph, and don't understand it-- 833 00:37:46,158 --> 00:37:48,878 the people in [inaudible] who understand this part of the graph, 834 00:37:48,878 --> 00:37:50,658 and don't understand this part of the graph. 835 00:37:50,658 --> 00:37:53,928 So, the more we can get Wikimedians who understand some of this, 836 00:37:53,928 --> 00:37:57,748 with some tech professionals at museums who understand this, 837 00:37:57,748 --> 00:37:59,408 then that makes it a little bit easier-- 838 00:37:59,408 --> 00:38:01,968 and hopefully, as well as training up Wikimedians, 839 00:38:01,968 --> 00:38:05,587 we can also provide some guidance and let the museums [inaudible] 840 00:38:05,587 --> 00:38:07,438 to take care of themselves in the [inaudible]. 841 00:38:07,496 --> 00:38:09,285 Yeah, that's a good point. 
842 00:38:09,285 --> 00:38:11,961 How many people here know what regular expressions are? 843 00:38:11,961 --> 00:38:13,216 Raise your hand. 844 00:38:13,216 --> 00:38:17,397 Okay, so how many people are comfortable specifying a regular expression? 845 00:38:17,397 --> 00:38:19,267 So, yeah, we need more work here. 846 00:38:19,267 --> 00:38:20,771 (laughter) 847 00:38:20,771 --> 00:38:23,199 (man 10) I want to suggest that-- 848 00:38:24,648 --> 00:38:28,575 maybe not getting every Wikidata practitioner, 849 00:38:28,575 --> 00:38:33,607 or institution practitioner to embrace Python programming is the way. 850 00:38:33,717 --> 00:38:39,657 But as Richard just said, finding more bridging people-- people like you-- 851 00:38:39,657 --> 00:38:41,137 who speak both-- 852 00:38:41,137 --> 00:38:44,042 who speak Python, but also speak GLAM institution-- 853 00:38:44,812 --> 00:38:48,392 to help the GLAM's own technical department, which may not-- 854 00:38:49,233 --> 00:38:51,951 they know Python, they don't know this stuff. 855 00:38:52,640 --> 00:38:54,186 That's, I think, what's needed. 856 00:38:54,235 --> 00:38:59,034 People like you, people like me, people who speak both of these jargons 857 00:38:59,034 --> 00:39:01,835 to help make the connections, to document the connections. 858 00:39:01,835 --> 00:39:03,344 You're already doing this, of course. 859 00:39:03,344 --> 00:39:05,534 You share your code, et cetera, you're doing tutorials. 860 00:39:05,534 --> 00:39:07,044 But we need more of this. 861 00:39:07,044 --> 00:39:09,223 I'm not sure we need to make everyone programmers. 862 00:39:09,223 --> 00:39:10,612 We already have programmers. 863 00:39:10,612 --> 00:39:12,332 We need to make them understand 864 00:39:12,332 --> 00:39:14,612 the non-programming material they need to-- 865 00:39:14,612 --> 00:39:15,782 I think that's a great point. 
866 00:39:15,782 --> 00:39:18,062 We don't need to make everyone highly proficient in this, 867 00:39:18,062 --> 00:39:20,312 but we do need people knowledgeable enough to say, 868 00:39:20,312 --> 00:39:23,004 "Yeah, we can ingest 400 thousand rows and do something with it." 869 00:39:23,004 --> 00:39:25,284 Whereas, if you're stuck on this side, you're like, 870 00:39:25,284 --> 00:39:27,444 "400 thousand rows sounds really big and scary." 871 00:39:27,444 --> 00:39:30,364 But if you know that it's possible, you're like, "No problem." 872 00:39:30,364 --> 00:39:32,284 400 thousand is not a problem. 873 00:39:32,284 --> 00:39:35,414 (woman 5) I would just like to chime in a little bit, in that 874 00:39:35,414 --> 00:39:39,674 there may be countries and areas where you will not find a GLAM 875 00:39:39,674 --> 00:39:44,404 with any skilled technologists. 876 00:39:44,434 --> 00:39:47,834 So, you will have to invent something there in the middle. 877 00:39:48,502 --> 00:39:49,634 That's a good point. 878 00:39:49,778 --> 00:39:51,378 Any questions? Sandra. 879 00:39:55,648 --> 00:39:57,807 (Sandra) Yeah, I just wanted to add to this discussion. 880 00:39:57,807 --> 00:40:01,656 Actually, I've seen some very good cases where it indeed has been successful 881 00:40:01,656 --> 00:40:05,476 to train GLAM professionals to work with this entire environment, 882 00:40:05,476 --> 00:40:09,276 and where they've done fantastic jobs, also at small institutions. 883 00:40:10,046 --> 00:40:14,986 It also requires that you have chapters or volunteers that can train the staff. 884 00:40:15,163 --> 00:40:17,513 So, it's really like a bigger environment. 885 00:40:18,192 --> 00:40:22,044 But I think that's a model that if we can manage to make that grow, 886 00:40:22,044 --> 00:40:24,263 it can scale very well, I think. 887 00:40:24,673 --> 00:40:25,693 Good point.
888 00:40:25,693 --> 00:40:30,896 (woman 5) [inaudible] 889 00:40:32,029 --> 00:40:34,217 Sorry, just noting that we don't have 890 00:40:34,217 --> 00:40:37,820 any structured trainings right now for that. 891 00:40:38,209 --> 00:40:42,498 We might want to develop those, and that would be helpful. 892 00:40:42,608 --> 00:40:44,408 We have been doing that for education 893 00:40:44,408 --> 00:40:47,488 in terms of teaching people Wikipedia and Wikidata. 894 00:40:47,488 --> 00:40:50,008 It's just a matter of taking it one step further. 895 00:40:50,528 --> 00:40:52,168 Right. Stacy. 896 00:40:54,518 --> 00:40:56,988 (Stacy) Well, I'd just like to say that a lot of professionals 897 00:40:56,988 --> 00:41:02,006 who work in this area of metadata have all these skills already. 898 00:41:02,006 --> 00:41:08,966 So, I think part of it is just proving the value to these organizations, 899 00:41:08,966 --> 00:41:13,126 but then it's also tapping into professional associations who can-- 900 00:41:13,195 --> 00:41:16,745 or ways of collaborating within those professional communities 901 00:41:16,745 --> 00:41:21,374 to build this work, and the documentation on how to do things 902 00:41:21,374 --> 00:41:23,234 is really, really important, 903 00:41:23,234 --> 00:41:27,454 because I'm not sure about the role of depending on volunteers, 904 00:41:27,454 --> 00:41:32,294 when some of this work is actually work GLAM organizations do anyway. 905 00:41:32,395 --> 00:41:35,355 We manage our collections in a variety of ways through metadata, 906 00:41:35,355 --> 00:41:37,126 and this is actually one more way. 907 00:41:37,126 --> 00:41:40,495 So, should we also not be thinking about ways to integrate this work 908 00:41:40,495 --> 00:41:43,946 into a GLAM professional's regular job. 
909 00:41:43,985 --> 00:41:46,125 And then that way you're generating-- 910 00:41:46,125 --> 00:41:48,885 and when you think about sustainability and scalability, 911 00:41:48,885 --> 00:41:53,426 that's the real trick to making this both sustainable and scalable, 912 00:41:53,745 --> 00:41:58,695 is that once this is the regular work of GLAM folks, 913 00:41:58,695 --> 00:42:00,885 we're not worried as much about this part, 914 00:42:00,885 --> 00:42:03,503 because it's just turning that little switch to get this 915 00:42:03,503 --> 00:42:05,763 to be a part of that work. 916 00:42:05,863 --> 00:42:08,063 Right. Good point. [Shani]? 917 00:42:11,603 --> 00:42:13,229 (Shani) You're absolutely right. 918 00:42:13,229 --> 00:42:16,122 But I want to echo what you said before. 919 00:42:16,152 --> 00:42:21,566 And yes, Susana-- this might work for more privileged countries 920 00:42:22,082 --> 00:42:25,042 where they have money, they have people doing it. 921 00:42:25,682 --> 00:42:29,042 It doesn't work for places that are still developing, 922 00:42:29,042 --> 00:42:32,282 that don't have resources-- they don't have all of that. 923 00:42:32,592 --> 00:42:36,832 And they can barely do what they need to do. 924 00:42:36,886 --> 00:42:41,066 So, it's difficult for them, and then, the community is really helpful. 925 00:42:41,906 --> 00:42:45,495 These are the cases where the community can have a huge impact actually, 926 00:42:45,985 --> 00:42:50,349 working with the GLAMs, because they can't do it all 927 00:42:50,979 --> 00:42:52,296 as part of their jobs. 928 00:42:52,834 --> 00:42:55,034 So, we need to think about that as well.
929 00:42:55,053 --> 00:42:58,223 And having these examples, actually, is hugely important, 930 00:42:58,223 --> 00:43:00,763 because it's helping to still convince them, 931 00:43:00,763 --> 00:43:05,842 that it's critical to invest in it and to work with volunteers, 932 00:43:05,842 --> 00:43:09,082 so, with non-professionals of sorts, to get there. 933 00:43:10,003 --> 00:43:12,650 I can imagine a future where you don't have to know all this code. 934 00:43:12,650 --> 00:43:14,379 These would just be kind of like Lego bricks 935 00:43:14,379 --> 00:43:15,801 you can slap together, 936 00:43:15,801 --> 00:43:18,761 saying, "Here's my database. Here's the crosswalk. Here's Wikidata," 937 00:43:18,761 --> 00:43:21,311 and just put it together, and you don't have to even code, 938 00:43:21,311 --> 00:43:23,835 you just have to make sure the databases are in the right place. 939 00:43:23,835 --> 00:43:25,375 Yep. Okay. 940 00:43:26,747 --> 00:43:28,705 (man 11) Sorry. [inaudible] 941 00:43:28,705 --> 00:43:34,025 I think if I would have done this project, I'd probably have done it the same way. 942 00:43:34,025 --> 00:43:36,146 So, I think that's maybe a good sign. 943 00:43:36,146 --> 00:43:39,725 I was wondering how did the whole financing work of this project? 944 00:43:39,725 --> 00:43:40,840 How did the-- I'm sorry? 945 00:43:40,840 --> 00:43:43,255 The financing of this project work. 946 00:43:43,795 --> 00:43:45,755 - The financing? - Yeah, the money. 947 00:43:46,425 --> 00:43:47,505 That's a good question. 948 00:43:47,505 --> 00:43:49,185 Well, so, there are different parts of it. 949 00:43:49,185 --> 00:43:53,073 So, the Knight grant funded the Wiki Art Depiction Explorer. 950 00:43:53,198 --> 00:43:56,928 But I, for the last, maybe what-- nine months-- 951 00:43:56,928 --> 00:43:58,768 I've been their Wikimedia strategist. 952 00:43:58,768 --> 00:44:01,618 So, I've been on since February of this year. 
953 00:44:01,618 --> 00:44:04,818 So, that's pretty much they're paying for my time to help with their-- 954 00:44:04,818 --> 00:44:07,968 not only the upload of their collections, but developing these tools, as well. 955 00:44:07,968 --> 00:44:11,659 - (Richard) So the Met's paying you? - Yeah, that's right. 956 00:44:11,762 --> 00:44:14,894 (Richard) The grant, at least part of it has come from-- 957 00:44:14,894 --> 00:44:16,959 There was a grant for Open Access. 958 00:44:16,959 --> 00:44:20,176 And this is under that campaign and with the digital department. 959 00:44:20,176 --> 00:44:24,297 So, working as contractors throughout the Open Access campaign for the Met. 960 00:44:27,948 --> 00:44:30,116 (man 12) I'm sorry. I guess before you were hired, 961 00:44:30,116 --> 00:44:31,313 and before there was a grant, 962 00:44:31,313 --> 00:44:33,780 there was probably a lot of volunteer work done to make sure-- 963 00:44:33,780 --> 00:44:35,303 Richard did a lot of work before that. 964 00:44:35,303 --> 00:44:37,219 And then, Wikimedia New York did a lot of work, 965 00:44:37,219 --> 00:44:38,927 but it was kind of in bursts. 966 00:44:38,927 --> 00:44:41,045 It wasn't as comprehensive as we're talking about now 967 00:44:41,045 --> 00:44:45,915 in terms of having-- making sure those two layers are complete 968 00:44:45,915 --> 00:44:47,310 in Wikidata. 969 00:44:48,640 --> 00:44:50,543 Alright, yeah. I think that's it. 970 00:44:50,543 --> 00:44:53,843 So, I'm happy to talk after lunch, or after the break, if you want. 971 00:44:54,683 --> 00:44:56,223 Okay. Thank you. 972 00:44:56,223 --> 00:44:59,197 (applause)