
cdn.media.ccc.de/.../wikidatacon2019-1077-eng-Wikidata_Commons_contribution_strategies_for_GLAM_organizations_hd.mp4

  • 0:07 - 0:08
    Thanks folks.
  • 0:10 - 0:12
    As I mentioned before,
    you can load up the slides here
  • 0:12 - 0:17
    by either the QR code or the short URL,
    which is a bit.ly short link:
  • 0:17 - 0:20
    wikidatacon19glamstrategies.
  • 0:20 - 0:22
    And the slides are also
    on the program page
  • 0:22 - 0:25
    on the WikidataCon site.
  • 0:25 - 0:27
    And then, there's also an Etherpad here
    that you can click on.
  • 0:27 - 0:29
    So, I'll be talking about a lot of things
  • 0:29 - 0:32
    that you might have heard about
    at Wikimania, if you were there,
  • 0:32 - 0:34
    but we are going to go
    into a lot more implementation details.
  • 0:34 - 0:36
    Because we're at WikidataCon,
    we can dive deeper
  • 0:36 - 0:38
    into the Wikidata and technical aspects.
  • 0:38 - 0:42
    But Richard and myself, we are working
    at the Met Museum right now
  • 0:42 - 0:43
    on their Open Access program.
  • 0:43 - 0:45
    If you didn't know,
    about two plus years ago,
  • 0:45 - 0:47
    now entering its third year,
  • 0:47 - 0:49
    there's been an Open Access
    strategy at the Met,
  • 0:49 - 0:53
    where they're releasing their images
    and metadata under a CC0 license.
  • 0:53 - 0:55
    And one of the things
    they brought us on to do
  • 0:55 - 0:58
    is to imagine what we could do
    with this Open Access content.
  • 0:58 - 1:00
    So, we're going to talk
    a little bit about that
  • 1:00 - 1:03
    in terms of the experiments
    that we've been running,
  • 1:03 - 1:04
    and we'd love to hear your feedback.
  • 1:04 - 1:07
    So, I hope to talk for about 20 minutes,
    and then hope to get some conversation
  • 1:07 - 1:10
    with you folks, since we have
    a lot of knowledge in this room.
  • 1:10 - 1:12
    This is the announcement,
    and actually the one-year anniversary,
  • 1:12 - 1:16
    when Katherine Maher was actually there
    at the Met to talk about that anniversary.
  • 1:16 - 1:19
    So, one of the things that's challenging
    I think for a lot of folks
  • 1:19 - 1:21
    is how do you explain Wikidata,
  • 1:21 - 1:24
    and this GLAM
    contribution strategy to Wikidata
  • 1:24 - 1:27
    to C-level folks at an organization.
  • 1:27 - 1:31
    We can talk about it with data scientists,
    Wikimedians, librarians, maybe curators,
  • 1:31 - 1:34
    but when it comes to talking about this
    with a director of a museum,
  • 1:34 - 1:37
    or a director of a library,
    what does it actually--
  • 1:37 - 1:38
    how does it resonate with them?
  • 1:38 - 1:41
    So, one way that we've actually talked
    about it that I think makes sense,
  • 1:41 - 1:44
    is everyone knows about Wikipedia,
  • 1:44 - 1:48
    and for the English language edition,
  • 1:48 - 1:50
    at least, we're talking
    about 6 million articles.
  • 1:50 - 1:52
    And it sounds like a lot,
    but if you think about it,
  • 1:52 - 1:54
    Wikipedia is not really the sum
    of all human knowledge,
  • 1:54 - 2:00
    it's the sum of all reliably sourced,
    mostly western knowledge.
  • 2:00 - 2:02
    And there's a lot of stuff out there.
  • 2:02 - 2:04
    We have a lot of stuff
    in Commons already--
  • 2:04 - 2:07
    56 million media files, and going up
    every single day--
  • 2:07 - 2:11
    but there's a very
    different type of standard
  • 2:11 - 2:13
    for what goes into Wikimedia Commons.
  • 2:13 - 2:16
    And the way that we have described
    Wikidata to GLAM professionals,
  • 2:16 - 2:18
    and especially the C levels,
  • 2:18 - 2:22
    is that what if we could have a repository
    that has a notability bar
  • 2:22 - 2:24
    that is not as high as Wikipedia's.
  • 2:24 - 2:26
    So, we want all these paintings,
  • 2:26 - 2:28
    but not every painting
    necessarily needs an article.
  • 2:29 - 2:30
    Wikipedia is held back by the fact
  • 2:30 - 2:33
    that you need to have
    language editions of Wikipedia.
  • 2:33 - 2:37
    So, can we store the famous thing--
    things, not strings.
  • 2:37 - 2:41
    Can we be object-oriented
    and not lexically oriented?
  • 2:41 - 2:42
    And can we store this in a database
  • 2:42 - 2:45
    that stores facts, figures,
    and relationships?
  • 2:45 - 2:46
    And that's pretty much
    what Wikidata does.
  • 2:47 - 2:51
    And Wikidata is also a universal
    kind of crosswalk database that links
  • 2:51 - 2:52
    to other collections out there.
  • 2:52 - 2:55
    So, we think this really resonates
    with folks when you're talking about
  • 2:55 - 2:59
    what is the value of Wikidata compared
    to what they're normally familiar with,
  • 2:59 - 3:00
    which is just Wikipedia.
  • 3:01 - 3:03
    Alright, so what are the benefits?
  • 3:03 - 3:05
    You're interlinking
    your collections with others.
  • 3:05 - 3:08
    So, unfortunately, I apologize
    to librarians here,
  • 3:08 - 3:09
    I'll be talking mostly about museums,
  • 3:09 - 3:12
    but a lot of this is valid
    for libraries, too.
  • 3:12 - 3:16
    But you're basically connecting
    your collection with the global collection
  • 3:16 - 3:18
    of linked open data collections.
  • 3:19 - 3:22
    You can also receive enriched
    and improved metadata back
  • 3:22 - 3:26
    after contributing and linking
    your collections to the world.
  • 3:26 - 3:28
    And there are some pretty neat
    interactive multimedia applications
  • 3:28 - 3:31
    that you get-- I don't want
    to say for free,
  • 3:31 - 3:34
    but your collection in Wikidata
    allows you to visualize things
  • 3:34 - 3:35
    that you've never seen before.
  • 3:35 - 3:37
    We'll show you some examples.
  • 3:37 - 3:40
    And so, how do you convey this
    to GLAM professionals effectively?
  • 3:40 - 3:42
    Well, I usually like to start
    with storytelling,
  • 3:42 - 3:44
    and not technical explanations.
  • 3:44 - 3:46
    Okay, so if everyone here
    has a cell phone,
  • 3:46 - 3:50
    especially if you have an iPhone,
    I want you to scan this QR code
  • 3:50 - 3:52
    and bring up the URL
    that it comes up with.
  • 3:52 - 3:53
    Or if you don't have a QR scanner,
  • 3:53 - 3:59
    just type in w.wiki/Aij in a web browser.
  • 4:00 - 4:02
    So go ahead and scan that.
  • 4:03 - 4:05
    And what comes up?
  • 4:07 - 4:09
    Does anyone see a knowledge graph
    pop up on your screen?
  • 4:10 - 4:11
    So, for folks here in WikidataCon,
  • 4:11 - 4:13
    this is probably not
    revolutionary for you.
  • 4:13 - 4:16
    But what it does, it does a SPARQL query
    with these objects,
  • 4:16 - 4:19
    and it shows the linkages between them.
  • 4:19 - 4:21
    And you can actually drag them
    around the screen.
  • 4:21 - 4:22
    You can actually click on nodes.
  • 4:22 - 4:24
    If you're [inaudible] in a mobile,
    it will expand that--
  • 4:24 - 4:28
    you can actually start to surf
    through Wikidata this way.
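
For reference, a short link like that just wraps a Wikidata Query Service query whose first line is #defaultView:Graph. A minimal sketch of building such a shareable graph view, assuming the Mona Lisa (Q12418) as an illustrative seed item (the actual query behind w.wiki/Aij is not reproduced here):

    # A minimal sketch, not the actual query behind w.wiki/Aij: build a
    # query-service "embed" URL whose result renders as a draggable graph.
    from urllib.parse import quote

    query = """#defaultView:Graph
    SELECT ?item ?itemLabel ?linkTo ?linkToLabel WHERE {
      VALUES ?item { wd:Q12418 }             # illustrative seed item
      ?item ?claim ?linkTo .
      ?prop wikibase:directClaim ?claim .    # direct (wdt:) statements only
      FILTER(ISIRI(?linkTo))                 # keep item-to-item edges only
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }"""

    print("https://query.wikidata.org/embed.html#" + quote(query))
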
  • 4:28 - 4:30
    So, for Wikidata veterans
    this is pretty cool.
  • 4:30 - 4:31
    One shot, you get this.
  • 4:31 - 4:33
    For a lot of folks who have never seen
    Wikidata before,
  • 4:33 - 4:36
    this is a revolutionary moment for them.
  • 4:36 - 4:39
    To actually hand-manipulate
    a knowledge graph,
  • 4:39 - 4:42
    and to start surfing through Wikidata
    without having to know SPARQL,
  • 4:42 - 4:44
    without having to know what a Q item is,
  • 4:44 - 4:46
    without having to know
    what a property proposal is,
  • 4:46 - 4:49
    they can suddenly start seeing
    connections in a way that is magical.
  • 4:49 - 4:50
    Hey, I see [Jacob's] here.
  • 4:50 - 4:52
    Jacob's been using
    some of this code, as well.
  • 4:52 - 4:54
    So, this is some code
    that we'll talk about later on
  • 4:54 - 4:57
    that allows you to create
    these visualizations in Wikidata.
  • 4:57 - 4:59
    And we've really seen this
    turn a lot of heads
  • 4:59 - 5:01
    who have really
    never gotten Wikidata before.
  • 5:01 - 5:05
    But after seeing these interactive
    knowledge graphs, they get it.
  • 5:05 - 5:06
    They understand the power of this.
  • 5:06 - 5:08
    And especially this example here,
  • 5:08 - 5:11
    this was a really big eye-opener
    for the folks at the Met,
  • 5:11 - 5:15
    because this is the artifact
    that is the center of this graph,
  • 5:15 - 5:18
    right there, the Portrait of Madame X,
    a very famous portrait.
  • 5:18 - 5:21
    And they did not even know
    that this was the inspiration
  • 5:21 - 5:25
    for the black dress that Rita Hayworth
    wore in the movie Gilda.
  • 5:25 - 5:27
    So, just by seeing this graph, they said,
  • 5:27 - 5:29
    "Wait a minute. This is one
    of our most visited portraits.
  • 5:29 - 5:32
    I didn't know that this was true."
  • 5:32 - 5:35
    And there are actually two other books
    published about that painting.
  • 5:35 - 5:39
    You can see all these things,
    not just within the realm of GLAM,
  • 5:39 - 5:41
    but it extends to fashion,
    it extends to literature.
  • 5:41 - 5:43
    You're starting to see
    the global connections
  • 5:43 - 5:47
    that your artworks have,
    or your collections have via Wikidata.
  • 5:49 - 5:50
    So, how do we do this?
  • 5:51 - 5:53
    If you can remember nothing else
    from this presentation,
  • 5:53 - 5:56
    this one page is your one-stop shopping.
  • 5:56 - 5:59
    Now, fortunately, you don't have
    to memorize all this.
  • 5:59 - 6:03
    It's actually right here at
    Wikidata:Linked_open_data_workflow.
  • 6:04 - 6:06
    So, we'll be talking about some
    of these different phases
  • 6:06 - 6:11
    of how you first prepare,
    reconcile, and examine
  • 6:11 - 6:14
    what the GLAM organization might have
    and what Wikidata has.
  • 6:14 - 6:15
    And then, what are the tools
  • 6:15 - 6:19
    to actually ingest
    and correct or enrich that
  • 6:19 - 6:20
    once it's in Wikidata.
  • 6:20 - 6:23
    And then, what are some of ways
    to reuse that content,
  • 6:23 - 6:25
    or to report and create
    new things out of it.
  • 6:25 - 6:31
    So, this is the simpler version of a chart
    that Sandra and the GLAM folks
  • 6:31 - 6:33
    at the foundation have created.
  • 6:33 - 6:36
    But this is trying
    to sum up, in one shot--
  • 6:36 - 6:38
    because we know how hard things
    are to find in Wikidata--
  • 6:38 - 6:42
    to find in one shot all the different
    tools you should pay attention to
  • 6:42 - 6:43
    as a GLAM organization.
  • 6:45 - 6:51
    So, just using the Met as an example,
    we started with what is the ideal object
  • 6:51 - 6:53
    that we have in Wikidata
    that comes from the Met?
  • 6:53 - 6:56
    This is a typical shot of a Wikidata item,
  • 6:56 - 6:57
    in the mobile mode there.
  • 6:57 - 6:59
    And this is one
    of the more famous paintings
  • 6:59 - 7:01
    we used as a model, here.
  • 7:01 - 7:03
    We have the label,
    description, and aliases.
  • 7:04 - 7:05
    And then, we found out,
  • 7:05 - 7:07
    "What are the core statements
    that we wanted?"
  • 7:07 - 7:10
    We wanted instance of, image,
    inception, collection.
  • 7:10 - 7:13
    And what are some other properties
    we would like if we had it?
  • 7:13 - 7:16
    Depiction information,
    material used, things like that.
  • 7:17 - 7:19
    We actually do have an identifier.
  • 7:19 - 7:22
    The Met object ID is P3634.
  • 7:22 - 7:25
    So, for some organizations,
    you might want to propose
  • 7:25 - 7:29
    a property just to track your items
    using an object ID.
  • 7:29 - 7:32
    And then, for the Met,
    just trying to circumscribe
  • 7:32 - 7:36
    what objects do we want to upload
    and keep in Wikidata--
  • 7:36 - 7:39
    the thing that we first identified
    were collection highlights.
  • 7:39 - 7:44
    These are like a hand-selected set
    of 1,500 to 2,000 items
  • 7:44 - 7:49
    that were going to be given priority
    to upload to Wikidata.
  • 7:49 - 7:52
    So, Richard and the crew
    out of Wikimedia New York
  • 7:52 - 7:53
    did a lot of this early work.
  • 7:53 - 7:56
    And then, now, we're systematically
    going through to make sure
  • 7:56 - 7:57
    they're all complete.
  • 7:57 - 7:58
    And there's a secondary set
  • 7:58 - 8:01
    called the Heilbrunn Timeline
    of Art History-- about 8,000 items
  • 8:01 - 8:07
    that are seminal works
    by artists throughout history.
  • 8:07 - 8:09
    And there are about 8,000
    that the Met has identified,
  • 8:09 - 8:12
    and we're also putting that
    on Wikidata, as well,
  • 8:12 - 8:13
    using a different designation.
  • 8:13 - 8:16
    Here, described by source--
    Heilbrunn Timeline of Art History.
  • 8:16 - 8:20
    So, the collection highlight
    is denoted here as collection--
  • 8:20 - 8:21
    Metropolitan Museum of Art,
  • 8:21 - 8:23
    subject has role collection highlight.
  • 8:23 - 8:27
    And then, these 8,000
    or so are like that in Wikidata.
  • 8:30 - 8:34
    I couldn't show this chart at Wikimania,
    because it's too complicated.
  • 8:34 - 8:35
    But WikidataCon, we can.
  • 8:35 - 8:39
    So, this is something that is really hard
    to answer sometimes.
  • 8:39 - 8:42
    What makes something
    in Wikidata from the Met,
  • 8:42 - 8:45
    or from the New York Public Library,
    or from your organization?
  • 8:45 - 8:48
    And the answer is not easy.
    It depends.
  • 8:48 - 8:50
    It's complicated, it can be multi-factor.
  • 8:50 - 8:53
    So, you could say, "Well, if I had
    an object ID in Wikidata,
  • 8:53 - 8:55
    that is a Met object."
  • 8:55 - 8:57
    But maybe someone didn't enter that.
  • 8:57 - 9:00
    Maybe they only put in
    Collection: Met, which is P195,
  • 9:00 - 9:03
    or they put in the accession number,
  • 9:03 - 9:07
    and they put collection as the qualifier
    to that accession number.
  • 9:07 - 9:11
    So, there's actually, one, two, three
    different ways to try to find Met objects.
  • 9:11 - 9:14
    And probably the best way to do it
    is through a union like this.
  • 9:14 - 9:16
    So, you combine all three,
    and you come back,
  • 9:16 - 9:18
    and you make a list out of it.
  • 9:18 - 9:21
    So unfortunately, there is
    no one clean query
  • 9:21 - 9:24
    that'll guarantee you all the Met objects.
  • 9:24 - 9:28
    This is probably
    the best approach for this.
  • 9:28 - 9:29
    And for some institutions,
  • 9:29 - 9:33
    they're probably doing
    something similar to that right now.
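
For the record, a minimal sketch of that three-way union in Python (P3634 and P195 are the properties named in the talk; the Met's own item Q160236 and P217, inventory number, for the accession-number case are assumptions here):

    # Run the three-way UNION against the Wikidata Query Service.
    import requests

    QUERY = """
    SELECT DISTINCT ?item WHERE {
      { ?item wdt:P3634 [] }                # way 1: has a Met object ID
      UNION
      { ?item wdt:P195 wd:Q160236 }         # way 2: collection = the Met
      UNION
      { ?item p:P217/pq:P195 wd:Q160236 }   # way 3: accession number with the Met as qualifier
    }
    """

    r = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "met-union-sketch/0.1"},
        timeout=300,
    )
    met_items = {b["item"]["value"] for b in r.json()["results"]["bindings"]}
    print(len(met_items), "candidate Met items")
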
  • 9:33 - 9:36
    Alright, so example here,
    is that what you see here
  • 9:36 - 9:40
    manifests itself differently--
    not differently, but as this in a query,
  • 9:40 - 9:41
    which can get pretty complex.
  • 9:41 - 9:43
    So, if we're looking
    for all the collection highlights,
  • 9:43 - 9:48
    we'd break this out into the statement
    and then the qualifier as this:
  • 9:48 - 9:50
    subject has role collection highlight.
  • 9:50 - 9:51
    So, that's one way that we sort out
  • 9:51 - 9:54
    some of these special
    designations in Wikidata.
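
That statement-plus-qualifier pattern looks roughly like this as a query (a sketch only: pq:P2868, "subject has role", is an assumption, and QNNNNN stands in for the "collection highlight" item, whose Q number isn't given here):

    # Sketch of the collection-highlight query; QNNNNN is a placeholder.
    HIGHLIGHTS_QUERY = """
    SELECT ?item WHERE {
      ?item p:P195 ?statement .
      ?statement ps:P195 wd:Q160236 ;   # collection: Metropolitan Museum of Art
                 pq:P2868 wd:QNNNNN .   # qualifier: subject has role = collection highlight
    }
    """
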
  • 9:55 - 9:59
    So, the summary is,
    representing "The Met" is multifaceted,
  • 9:59 - 10:02
    and needs to balance simplicity
    and findability.
  • 10:02 - 10:05
    How many people here have heard
    of Sum of All Paintings as a project?
  • 10:05 - 10:07
    Ooh, God, good, a lot of you!
  • 10:07 - 10:09
    So, it's probably one
    of the most active ones
  • 10:09 - 10:11
    that deals with these issues.
  • 10:11 - 10:17
    So, we always debate whether we should
    model things super-accurately,
  • 10:17 - 10:20
    or should you model things
    so that they're findable.
  • 10:20 - 10:22
    These are kind of at odds with each other.
  • 10:22 - 10:24
    So, we usually prefer findability.
  • 10:24 - 10:27
    It's no good if it's perfectly modeled,
    but no one can ever find it,
  • 10:27 - 10:30
    because it's so strict
    in terms of how it's defined at Wikidata.
  • 10:30 - 10:32
    And then, we have some challenges.
  • 10:32 - 10:35
    Multiple artifacts might be tied
    to one object ID,
  • 10:35 - 10:37
    which might be different in Wikidata.
  • 10:37 - 10:42
    And then, mapping the Met classification
    to instances has some complex cases.
  • 10:42 - 10:44
    So, the way that the Met classifies things
  • 10:44 - 10:47
    doesn't always fit
    with how Wikidata classifies things.
  • 10:47 - 10:50
    So, we show you some examples here
    of how this works.
  • 10:50 - 10:54
    So, this is a great example
    of using a Python library
  • 10:54 - 10:56
    to actually ingest
    what we know from the Met,
  • 10:56 - 10:58
    and then try to sort out what they have.
  • 10:58 - 11:00
    So, this is just for textiles.
  • 11:00 - 11:02
    You can see that they got
    a lot of detail here
  • 11:02 - 11:05
    in terms of woven textiles, laces,
    printed, trimmings, velvets.
  • 11:05 - 11:08
    We first looked into this in Wikidata.
  • 11:08 - 11:10
    We did not have
    this level of detail in Wikidata.
  • 11:10 - 11:12
    We still don't have all this resolved.
  • 11:12 - 11:15
    You can see that this
    is really complex here.
  • 11:15 - 11:18
    Anonymous is just not anonymous
    for a lot of databases.
  • 11:18 - 11:20
    There's a lot of qualifications--
  • 11:20 - 11:23
    whether the nationality, or the century.
  • 11:23 - 11:26
    So, trying to map all this to Wikidata
    can be complex, as well.
  • 11:26 - 11:30
    And then, this shows you
    that of all the works in the Met,
  • 11:30 - 11:34
    about 46% are open access right now.
  • 11:34 - 11:39
    So, we still have just over 50%
    that are not CC0 yet.
  • 11:40 - 11:43
    (man) All the objects in the Met,
    or all objects on display?
  • 11:43 - 11:46
    (Andrew) It's weird. It's not on display.
  • 11:46 - 11:48
    But it's not all objects either.
  • 11:48 - 11:52
    It's about 400 to 500 thousand objects
    in their database at this point.
  • 11:52 - 11:54
    So, somewhere in between.
  • 11:55 - 11:58
    So, starting points.
    This is always a hard one.
  • 11:58 - 12:04
    We just had this discussion
    on the Facebook group recently
  • 12:04 - 12:05
    about where do people go
  • 12:05 - 12:08
    to find out what the modeling
    should look like for a certain thing.
  • 12:08 - 12:09
    It's not easy.
  • 12:09 - 12:12
    So, normally, what we have to do
    is just point people to,
  • 12:12 - 12:15
    I don't know, some project
    that does it well now?
  • 12:15 - 12:17
    So, it's not a satisfying answer,
  • 12:17 - 12:20
    but we usually tell folks
    to start at things like visual arts,
  • 12:20 - 12:22
    or Sum of All Paintings
    does it pretty well,
  • 12:22 - 12:26
    or just go to the project chat to find out
    where some of these things are.
  • 12:26 - 12:27
    We need better solutions for this.
  • 12:27 - 12:31
    This is just a basic flow
    of what we're doing with the Met here.
  • 12:31 - 12:33
    We're basically taking
    their CSV, and their API,
  • 12:33 - 12:36
    and we're consuming it
    into a Python data frame.
  • 12:36 - 12:38
    We're taking the SPARQL code--
  • 12:38 - 12:40
    the one that you saw
    before, this super union--
  • 12:40 - 12:44
    bring that in, and we're doing
    a bi-directional diff,
  • 12:44 - 12:46
    and then seeing what new things
    have been added here,
  • 12:46 - 12:48
    what things have been subtracted there,
  • 12:48 - 12:52
    and we're actually making those changes
    either through QuickStatements,
  • 12:52 - 12:53
    or we're doing it through Pywikibot.
  • 12:53 - 12:56
    So, directly editing Wikidata.
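
A minimal sketch of that pipeline, assuming pandas, the Met's open-access CSV, and a CSV export of the union query above (file and column names here are illustrative, not the actual schema):

    # Diff the Met's dump against Wikidata, then emit QuickStatements (v1)
    # for whatever is missing on the Wikidata side.
    import pandas as pd

    met = pd.read_csv("MetObjects.csv", dtype=str)         # from the Met CSV/API
    wd = pd.read_csv("wikidata_met_items.csv", dtype=str)  # from the SPARQL union

    met_ids = set(met["Object ID"].dropna())
    wd_ids = set(wd["object_id"].dropna())

    added = met_ids - wd_ids      # in the Met database, not yet in Wikidata
    removed = wd_ids - met_ids    # in Wikidata, gone from the dump: review by hand

    for object_id in sorted(added):
        print("CREATE")
        print(f'LAST\tP3634\t"{object_id}"')  # attach the Met object ID
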
  • 12:56 - 12:59
    So, this is the big slide
    I also couldn't show at Wikimania,
  • 12:59 - 13:01
    because it would have flummoxed everyone.
  • 13:01 - 13:05
    So, this is a great example
    of how we start with the Met database,
  • 13:05 - 13:07
    we have this crosswalk database,
  • 13:07 - 13:09
    and then we generate
    the changes in Wikidata.
  • 13:09 - 13:13
    The way this works is this is an example
    of one record from the Met.
  • 13:13 - 13:16
    This is an evening dress-- we're working
    with the Costume Institute recently,
  • 13:16 - 13:18
    the one that puts on the Met Gala.
  • 13:18 - 13:20
    So, we have one evening dress
    here, by Valentina.
  • 13:20 - 13:22
    Here's a date, accession number.
  • 13:22 - 13:25
    So, these things can be put
    into Wikidata directly.
  • 13:25 - 13:28
    A field equals the date, accession number.
  • 13:28 - 13:29
    But what do we do with things like this?
  • 13:29 - 13:34
    This is an object name, which is basically
    like a classification of what it is,
  • 13:34 - 13:36
    like an instance of for the Met.
  • 13:36 - 13:37
    And the designer's Valentina.
  • 13:37 - 13:42
    So, what we do is we take these
    and we run all the unique object names
  • 13:42 - 13:44
    and all the unique designers
    through OpenRefine.
  • 13:44 - 13:47
    So, we get maybe 60% matches
    if we're lucky.
  • 13:47 - 13:48
    We put that into a spreadsheet.
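
OpenRefine does that matching interactively; a rough programmatic equivalent, assuming the public Wikidata reconciliation endpoint that OpenRefine itself uses, would be:

    # Ask the reconciliation service for best-guess Q IDs for two object names.
    import json
    import requests

    queries = {"q0": {"query": "Evening dress"}, "q1": {"query": "Ewer"}}
    r = requests.post(
        "https://wikidata.reconci.link/en/api",
        data={"queries": json.dumps(queries)},
        timeout=60,
    )
    for key, res in r.json().items():
        if res["result"]:
            top = res["result"][0]  # highest-scoring candidate
            print(queries[key]["query"], "->", top["id"], top["name"], top["score"])
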
  • 13:48 - 13:53
    Then we ask volunteers
    or the curators at the Met
  • 13:53 - 13:55
    to help fill in this crosswalk database.
  • 13:55 - 13:57
    This is just simply Google Sheets.
  • 13:57 - 14:00
    So, we say, here are all the object names,
    the unique object names
  • 14:00 - 14:03
    that match lexically exactly
    with what's in the Met database,
  • 14:03 - 14:06
    and then you say this maps to this Q ID.
  • 14:06 - 14:09
    So, we first started
    this maybe like only about--
  • 14:09 - 14:11
    well, 60% were failed,
    some of these were blank.
  • 14:11 - 14:14
    So, we tap folks in specific groups.
  • 14:14 - 14:17
    So there's like a Wiki Loves Fashion
    little chat group that we have.
  • 14:17 - 14:20
    And folks like user PKM
    were super useful in this area.
  • 14:20 - 14:23
    So she spent a lot of time
    looking through this, and saying,
  • 14:23 - 14:25
    "Okay, Evening suit is this,
    Ewer is that."
  • 14:25 - 14:28
    So, we looked through
    and made all these mappings here.
  • 14:28 - 14:31
    And then, what happens is now,
    when we see this in the Met database,
  • 14:31 - 14:33
    we look it up in the crosswalk database,
    and we say, "Oh, yeah.
  • 14:33 - 14:36
    These are the two Q numbers
    we need to put into Wikidata."
  • 14:36 - 14:39
    And then, it generates
    the QuickStatement right there.
  • 14:39 - 14:41
    Same thing here with Designer: Valentina.
  • 14:41 - 14:44
    If Valentina matches here,
    then it gets generated
  • 14:44 - 14:46
    with that QuickStatement right there.
  • 14:46 - 14:48
    If Valentina does not exist,
    then we'll create it.
  • 14:48 - 14:51
    You can see here, Weeks--
    look at that high Q ID right there.
  • 14:51 - 14:54
    We just created that recently,
    because there was no entry before.
  • 14:54 - 14:55
    Does that make sense to everyone?
  • 14:55 - 14:58
    - (man 2) What's the extra statement?
    - (Andrew) I'm sorry?
  • 14:58 - 15:01
    - (man 2) What's the extra statement?
    - (Andrew) Oh, the extra statement.
  • 15:01 - 15:03
    So, believe it or not, we have
    an Evening blouse, Evening dress,
  • 15:03 - 15:05
    Evening pants,
    Evening ensemble, Evening hat--
  • 15:05 - 15:09
    do we want to make a new Wikidata item
    for Evening pants, Evening everything?
  • 15:09 - 15:10
    So, we said, "No."
    We probably don't want to.
  • 15:10 - 15:14
    We'll just say, "It's a dress,
    but it's also evening wear",
  • 15:14 - 15:15
    which is what that is.
  • 15:15 - 15:17
    So, we're saying an instance
    of both things.
  • 15:18 - 15:21
    I'm not sure it's the perfect solution,
    but it's a solution at this point.
  • 15:22 - 15:23
    So, does everyone get that?
  • 15:23 - 15:26
    So, this is kind of a crosswalk database
    that we maintain here.
  • 15:26 - 15:28
    And the nice thing about it,
    it's just Google Sheets.
  • 15:28 - 15:29
    So, we can get people to help
  • 15:29 - 15:31
    who don't need to know
    anything about this database,
  • 15:31 - 15:34
    don't need to know about QuickStatements,
    don't need to know about queries.
  • 15:34 - 15:36
    They just go in and fill in the Q number.
  • 15:36 - 15:37
    Yeah.
  • 15:37 - 15:41
    (woman) So, when you copy
    object name and you find the Q ID,
  • 15:41 - 15:43
    the initial 60%
    that you mentioned as an example,
  • 15:43 - 15:45
    is that by exact match?
  • 15:46 - 15:48
    (Andrew) Well, it's through OpenRefine.
  • 15:48 - 15:52
    So, it does its best guess,
    and then we verify to make sure
  • 15:52 - 15:54
    that the OpenRefine match makes sense.
  • 15:54 - 15:56
    Yeah.
  • 15:56 - 15:58
    Does that make sense to everyone?
  • 15:58 - 16:00
    So, some folks might be doing
    some variation on this,
  • 16:00 - 16:03
    but I think the nice thing about this
    is that, by using Google Sheets,
  • 16:03 - 16:08
    we remove a lot of the complexities
    of these two areas from this.
  • 16:08 - 16:11
    And we'll show you some code
    that does this later on.
  • 16:12 - 16:15
    - (man 3) How do you generate [inaudible]?
    - (Andrew) How do you generate this?
  • 16:15 - 16:17
    - (man 3) Yes.
    - (Andrew) Python code.
  • 16:17 - 16:19
    I'll show you a line that does this.
  • 16:19 - 16:21
    But you can also go up here.
  • 16:21 - 16:25
    This is the whole Python program
    that does this, this, and that,
  • 16:25 - 16:27
    if you want to take a look at that.
  • 16:28 - 16:29
    Yes.
  • 16:29 - 16:31
    (man 4) Did you really use
    your own vocabulary,
  • 16:31 - 16:35
    or is there something [inaudible].
  • 16:35 - 16:37
    - (Andrew) This right here?
    - (man 4) Yeah.
  • 16:37 - 16:40
    (Andrew) Yeah. So, this
    is the Met's own vocabulary.
  • 16:40 - 16:43
    So, most museums use
    a system called TMS.
  • 16:43 - 16:45
    It's like their own management system.
  • 16:45 - 16:48
    So, they'll usually--
    this is the museum world--
  • 16:48 - 16:51
    they'll usually roll
    their own vocabulary for their own needs.
  • 16:51 - 16:54
    Museums are very late
    to interoperable metadata.
  • 16:54 - 16:57
    Librarians and archivists have this
    kind of baked into them.
  • 16:57 - 16:59
    Museums are like, "Meh..."
  • 16:59 - 17:01
    Our primary goal
    is to put objects on display,
  • 17:01 - 17:04
    and if it plays well with other people,
    that's a side benefit.
  • 17:04 - 17:06
    But it's not a primary thing that they do.
  • 17:06 - 17:08
    So, that's why it's complicated
    to work with museums.
  • 17:08 - 17:11
    You need to map their vocabulary,
    which might be a mish-mash
  • 17:11 - 17:15
    of famous vocabularies,
    like Getty AAT, and other things.
  • 17:15 - 17:18
    But usually, it's to serve
    their exact needs at their museum.
  • 17:18 - 17:20
    And that's what's challenging.
  • 17:20 - 17:21
    And I see a lot of heads nodding,
  • 17:21 - 17:23
    so you've probably seen this a lot
    at these museums.
  • 17:23 - 17:25
    So, I'll move on to show you
    how this actually is done.
  • 17:25 - 17:27
    Oh, go ahead.
  • 17:27 - 17:29
    (man 5) How do you
    bring people, to collaborate,
  • 17:29 - 17:32
    and put some Q codes into your database?
  • 17:32 - 17:33
    (Andrew) How do you-- I'm sorry?
  • 17:33 - 17:35
    (man 5) How do you bring...
    collaborate people?
  • 17:35 - 17:38
    (Andrew) Ah, so for this,
    these are projects we just go to,
  • 17:39 - 17:42
    for better or for worse,
    like Facebook chat groups that we know,
  • 17:42 - 17:43
    are active in these areas.
  • 17:43 - 17:46
    Like Sum of All Paintings,
    Wiki Loves Fashion--
  • 17:46 - 17:48
    which is a group
    of maybe five or seven folks.
  • 17:49 - 17:51
    But we need a better way
    to get this out to folks
  • 17:51 - 17:52
    so we get more collaborators on this.
  • 17:52 - 17:54
    This doesn't scale well, right now.
  • 17:54 - 17:56
    But for small groups,
    it works pretty well.
  • 17:56 - 17:58
    I'm open to ideas.
  • 17:58 - 18:00
    (man 5) [inaudible]
  • 18:00 - 18:02
    (Andrew) Oh yeah. Please come on up.
  • 18:02 - 18:03
    If folks want to come up here,
  • 18:03 - 18:05
    there's a little more room
    in the aisle right here.
  • 18:06 - 18:10
    So, we are utilizing Python
    for this mostly.
  • 18:10 - 18:13
    If you don't know, there is
    a Python notebook system
  • 18:13 - 18:15
    that WMF Labs has (PAWS).
  • 18:15 - 18:17
    So, you can actually go on
    and start playing with this.
  • 18:17 - 18:20
    So, it's pretty easy
    to generate a lot of stuff
  • 18:20 - 18:21
    if you know some of the code that's there.
  • 18:21 - 18:22
    [inaudible], yeah.
  • 18:22 - 18:24
    (woman 2) Why do you put everything
  • 18:24 - 18:28
    into Wikidata,
    and not into your own Wikibase?
  • 18:29 - 18:31
    (Andrew) If you're using
    your own Wikibase?
  • 18:31 - 18:34
    (woman 2) Yeah. Why don't you
    use your own Wikibase?
  • 18:34 - 18:36
    and then go to [inaudible]
  • 18:36 - 18:38
    (Andrew) That's its own ball of--
  • 18:38 - 18:42
    I don't want to maintain
    my own Wikibase at this point. (laughs)
  • 18:42 - 18:44
    If I can avoid doing
    the Wikibase maintenance,
  • 18:44 - 18:46
    I would not do it.
  • 18:47 - 18:48
    (man 6) Would you like a Wikibase?
  • 18:48 - 18:50
    (Andrew) We could. It's possible.
  • 18:50 - 18:54
    (man 7) But again,
    what they use [inaudible]
  • 18:54 - 19:00
    about 2,000, 8,000, 10,000,
    of 400,000 digital [inaudible].
  • 19:00 - 19:04
    So that's only 2.5%,
  • 19:04 - 19:09
    [inaudible]
  • 19:09 - 19:13
    (Andrew) So, I'd say, solve it for 1,500,
    then scale up to 150 thousand.
  • 19:13 - 19:14
    So, we're trying to solve it
  • 19:14 - 19:17
    for the best
    well-known objects, and then--
  • 19:17 - 19:20
    (man 7) When do you think
    that will happen?
  • 19:21 - 19:26
    I understand that those are pieces
    that shouldn't go onto Wikidata.
  • 19:26 - 19:30
    So you go to Commons
    or your own Wikibase solution,
  • 19:30 - 19:32
    not to be a [inaudible]--
  • 19:32 - 19:35
    (Andrew) Right. That's why we're going
    with the 2,000 and 8,000.
  • 19:35 - 19:37
    We're pretty confident
    these are highly notable objects
  • 19:37 - 19:39
    that deserve to be in Wikidata.
  • 19:39 - 19:40
    Beyond that, it's debatable.
  • 19:40 - 19:44
    So, that's why we're not
    vacuuming 400 thousand things in one shot.
  • 19:44 - 19:49
    We're starting with notable 2,000,
    notable 8,000, then we'll talk after that.
  • 19:50 - 19:53
    So, these are the two lines of code
    that do the most stuff here.
  • 19:53 - 19:54
    So, even if you don't know Python,
  • 19:54 - 19:56
    it's actually not that bad
    if you look at this.
  • 19:56 - 19:58
    There's a read_csv function.
  • 19:58 - 20:00
    You're taking the crosswalk URL,
  • 20:00 - 20:02
    basically, the URL
    of that Google Spreadsheet.
  • 20:02 - 20:05
    You're grabbing the spreadsheet
    that's called "Object Name",
  • 20:05 - 20:07
    and you're basically creating
    a data structure
  • 20:07 - 20:08
    that has the Object Name and the QID.
  • 20:08 - 20:10
    That's it. That's all you're doing.
  • 20:10 - 20:12
    Just pulling that into the Python code.
  • 20:12 - 20:16
    Then, you're actually matching
    whatever the entity's name is,
  • 20:16 - 20:18
    and then looking up the QID.
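
A hedged reconstruction of those two lines (the sheet name "Object Name" is from the talk; the URL, column names, and example value are assumptions):

    # pandas can read a Google Sheet straight from its CSV-export URL.
    import pandas as pd

    crosswalk_url = (
        "https://docs.google.com/spreadsheets/d/<sheet-id>"   # <sheet-id> is a placeholder
        "/gviz/tq?tqx=out:csv&sheet=Object+Name"
    )

    # Line 1: read the "Object Name" sheet into an {object name -> Q ID} dict.
    crosswalk = pd.read_csv(crosswalk_url).set_index("Object Name")["QID"].to_dict()

    # Line 2: match the entity's name and look up its Q ID.
    qid = crosswalk.get("Evening dress")
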
  • 20:18 - 20:22
    Okay, so, this is just to tell you
    that's not super hard.
  • 20:22 - 20:24
    The code is available right there,
    if you want to look at it.
  • 20:24 - 20:26
    But these two lines of code,
    which take a little while
  • 20:26 - 20:30
    to write when you're starting
    from scratch--
  • 20:30 - 20:31
    but once you have an example,
  • 20:31 - 20:34
    it's pretty darn easy to plug in
    your own data set, your own crosswalk,
  • 20:34 - 20:37
    to generate the QuickStatements.
  • 20:37 - 20:39
    So, I've done a lot of the work already,
  • 20:39 - 20:41
    and I invite you
    to steal the code and try it.
  • 20:42 - 20:45
    So, when it comes to images,
    it's a little more challenging.
  • 20:45 - 20:48
    So, at this point, Pattypan
    is probably your best bet.
  • 20:48 - 20:51
    Pattypan is
    a spreadsheet-oriented tool.
  • 20:51 - 20:55
    You fill in the metadata, you point
    to the local file on your computer,
  • 20:55 - 20:57
    and it uploads it to Commons
    with all that information,
  • 20:57 - 21:02
    or another alternative
    is if you set P4765 to a URL--
  • 21:03 - 21:06
    because this is the Commons-compatible
    image available at URL,
  • 21:06 - 21:09
    Maarten Dammers has a bot,
    at least for paintings,
  • 21:09 - 21:12
    that will just swoop through and say,
    "Oh, we don't have this image.
  • 21:12 - 21:15
    Here's a Commons compatible one.
  • 21:15 - 21:18
    Why don't I pull it from that site
    and put it into Commons?"
  • 21:18 - 21:19
    And that's what his bot does.
  • 21:19 - 21:21
    So, you can actually take
    a look at his bot
  • 21:21 - 21:24
    and modify it for your own purposes,
    but that is also another alternative
  • 21:24 - 21:28
    that doesn't require you
    to do some spreadsheet work there.
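
In QuickStatements terms, that alternative is a single URL-valued claim per artwork; the item and image URL below are hypothetical:

    # One QuickStatements (v1) line setting P4765, which the bot then acts on.
    item = "Q123456"                                       # hypothetical item
    url = "https://images.example.org/painting-123.jpg"    # hypothetical CC0 image
    print(f'{item}\tP4765\t"{url}"')
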
  • 21:28 - 21:30
    You might have heard
    of GLAM Wiki Toolset--
  • 21:30 - 21:33
    it's effectively end
    of life at this point.
  • 21:33 - 21:37
    It hasn't been updated, and even the folks
    who have been working with it in the past
  • 21:37 - 21:39
    have said Pattypan
    is probably your best bet.
  • 21:39 - 21:42
    Has anyone used GWT these days?
  • 21:42 - 21:44
    A few of you, a little bit.
  • 21:44 - 21:45
    It's just not being further developed,
  • 21:45 - 21:48
    and it's not compatible with a lot
    of our authentication protocols
  • 21:48 - 21:49
    that we have now.
  • 21:49 - 21:53
    Okay. So, right now, we have basic
    metadata added to Wikidata,
  • 21:53 - 21:55
    with pretty good results from the Met,
  • 21:55 - 21:58
    and we have a Python script here
    to also analyze that.
  • 21:58 - 22:00
    You're welcome to steal
    some of that code, as well.
  • 22:00 - 22:03
    So, this is what we are showing
    to the Met folks, now.
  • 22:03 - 22:06
    We actually have Listeria lists
    that are running
  • 22:06 - 22:08
    to show all the inventory
  • 22:08 - 22:11
    and all the information
    that we have in Wikidata.
  • 22:11 - 22:16
    And I'll show you very quickly
    about a project that we ran to show folks.
  • 22:16 - 22:19
    So, what are the benefits of adding
    your collections to Wikidata?
  • 22:19 - 22:22
    One is to use AI image classification--
  • 22:22 - 22:25
    to actually help train
    a machine learning model
  • 22:25 - 22:29
    with all the Met's images and keywords,
    and let that be an engine for other folks
  • 22:29 - 22:32
    to recognize content.
  • 22:32 - 22:36
    So, this is a hackathon that we had
    with MIT and Microsoft last year.
  • 22:36 - 22:39
    The way this works, is we have
    the paintings from the Met,
  • 22:39 - 22:40
    and we have the keywords
  • 22:40 - 22:43
    that they actually paid a crew
    for six months to work on
  • 22:43 - 22:47
    to add hand keyword tags
    to all the artworks.
  • 22:48 - 22:50
    We ingested that
    into an AI system right here,
  • 22:50 - 22:51
    and then, what we did was say,
  • 22:51 - 22:55
    "Let's feed in new images that
    this AI ML system had never seen before,
  • 22:55 - 22:57
    and see what comes out."
  • 22:57 - 23:00
    And the problem is that it comes out
    with pretty good results,
  • 23:00 - 23:02
    but it's maybe only 60% accurate.
  • 23:02 - 23:05
    And for most folks,
    60% accurate is garbage.
  • 23:05 - 23:09
    How do I get the 60% good
    out of this pile of stuff?
  • 23:09 - 23:11
    The good news is that our community
    knows how to do that.
  • 23:11 - 23:13
    We can actually feed this
    into a Wikidata game
  • 23:13 - 23:15
    and get the good stuff out of that.
  • 23:15 - 23:16
    That's basically what we did.
  • 23:16 - 23:18
    So, this is the Wikidata game--
  • 23:18 - 23:20
    you'll notice this is
    Magnus' interface right there--
  • 23:20 - 23:21
    being played at the Met Museum,
  • 23:21 - 23:22
    in the lobby.
  • 23:22 - 23:25
    We actually had folks at a cocktail party
    drinking champagne
  • 23:25 - 23:27
    and hitting buttons on the screen.
  • 23:27 - 23:31
    Hopefully, accurately. (chuckles)
  • 23:31 - 23:33
    (applause)
  • 23:33 - 23:35
    We had journalists, curators,
  • 23:35 - 23:38
    we had some board members
    from the Met there as well.
  • 23:38 - 23:39
    And this was great.
  • 23:39 - 23:40
    No log in, whatever.
  • 23:40 - 23:42
    (lowers voice) We created
    an account just for this.
  • 23:42 - 23:44
    So, they just hit yes-no-yes-no.
  • 23:44 - 23:45
    This is great.
  • 23:45 - 23:48
    You saw this, it said,
    "Is there a tree in this picture?"
  • 23:48 - 23:49
    You don't have to train anyone on this.
  • 23:49 - 23:52
    You just hit yes--
    depicts a tree, not depicted.
  • 23:52 - 23:56
    I even had my eight-year-old boys
    play this game with a finger tap.
  • 23:57 - 24:00
    And we also created a little tool
    that showed all the depictions going by
  • 24:00 - 24:02
    so people could see them.
  • 24:03 - 24:06
    It basically is like--
    how do you sift good from bad?
  • 24:06 - 24:08
    This is where the Wikimedia
    community comes in,
  • 24:08 - 24:11
    that no other entity could ever do.
  • 24:12 - 24:15
    So, in that first few months
    that we had this,
  • 24:15 - 24:19
    over 7,000 judgments,
    resulting in about 5,000 edits.
  • 24:20 - 24:22
    We did really well on tree,
    boat, flower, horse,
  • 24:22 - 24:25
    things that are in landscape paintings.
  • 24:25 - 24:27
    But when you go to things
    like gender discrimination,
  • 24:27 - 24:30
    and cats and dogs, not so good, I know.
  • 24:30 - 24:32
    Because there's so many different
    types of cats and dogs
  • 24:32 - 24:33
    in different positions.
  • 24:33 - 24:36
    But horses, a lot easier
    than cats and dogs.
  • 24:37 - 24:39
    But also, I should note
    that Wikimedia Foundation
  • 24:39 - 24:43
    is now looking into doing
    image recognition on Commons uploads
  • 24:43 - 24:46
    to do these suggestions as well,
    which is an awesome development.
  • 24:47 - 24:50
    Okay, so, dashboards.
  • 24:51 - 24:53
    Let's just show you
    some of these dashboards.
  • 24:53 - 24:55
    Folks you work with love dashboards.
  • 24:55 - 24:57
    They just want to see stats.
  • 24:57 - 24:59
    So, we have them, like BaGLAMa.
  • 24:59 - 25:01
    We have InteGraality.
  • 25:01 - 25:03
    Is Jean-Fred here?
  • 25:03 - 25:06
    I think this is a very new thing
    relative to last WikidataCon.
  • 25:06 - 25:08
    We actually have a tool
    which will create
  • 25:08 - 25:11
    this property completeness
    chart right here.
  • 25:11 - 25:13
    So, it's called InteGraality,
    with two A's.
  • 25:13 - 25:16
    It's on that big chart
    that I showed you before.
  • 25:16 - 25:19
    And it can just autogenerate
    how complete your items are
  • 25:19 - 25:21
    in any set, which is really cool.
  • 25:22 - 25:24
    So, we can see that paintings
    are by far the highest,
  • 25:24 - 25:26
    we have sculptures, drawings, photographs.
  • 25:26 - 25:29
    And then, they also like to see
    what are the most popular artworks
  • 25:29 - 25:31
    in the Wikisphere?
  • 25:31 - 25:33
    So, just looking at the site links
    in Wikidata--
  • 25:33 - 25:38
    you can see and rank
    all these different artworks there.
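
A sketch of that ranking as a query (Q160236 for the Met is an assumption; wikibase:sitelinks is the per-item count of linked wiki pages):

    # Rank Met items by how many Wikimedia pages link to them.
    RANKING_QUERY = """
    SELECT ?item ?itemLabel ?links WHERE {
      ?item wdt:P195 wd:Q160236 ;       # collection: the Met
            wikibase:sitelinks ?links .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    ORDER BY DESC(?links)
    LIMIT 50
    """
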
  • 25:40 - 25:42
    Also another thing they'd like to see
  • 25:42 - 25:47
    is what are the most frequent creators
    of content or Met artworks--
  • 25:47 - 25:49
    what are the most commonly
    depicted things.
  • 25:49 - 25:52
    So, these are very easy
    to generate in SPARQL,
  • 25:52 - 25:55
    you could look at it right there,
    using bubble graphs.
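
And the bubble chart is just an aggregation with a view header, something like this (again assuming Q160236 for the Met):

    # Count depicts (P180) values across Met works; #defaultView:BubbleChart
    # sizes each bubble by the count.
    DEPICTS_QUERY = """#defaultView:BubbleChart
    SELECT ?thing ?thingLabel (COUNT(?work) AS ?count) WHERE {
      ?work wdt:P195 wd:Q160236 ;   # collection: the Met
            wdt:P180 ?thing .       # depicts
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    GROUP BY ?thing ?thingLabel
    ORDER BY DESC(?count)
    """
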
  • 25:55 - 25:57
    Then place of birth
    of the most prominent artists,
  • 25:57 - 25:59
    we have a chart there, as well.
  • 25:59 - 26:01
    So, structured data on Commons.
  • 26:01 - 26:04
    I just want to show you very briefly
    in case you can't get to Sandra's session,
  • 26:04 - 26:06
    but you definitely should go
    to Sandra's session.
  • 26:06 - 26:11
    You actually can search in Commons
    for a specific Wikibase statement.
  • 26:11 - 26:15
    I don't always remember the syntax,
    but you have to burn it into your brain
  • 26:15 - 26:20
    and say, it's haswbstatement:P1343=
  • 26:20 - 26:23
    whatever-- basically, your last
    two parts of the triple.
  • 26:23 - 26:26
    I always get haswb and wbhas mixed up.
  • 26:26 - 26:28
    I always get the colon
    and the equals mixed up.
  • 26:28 - 26:32
    So just do it once, remember it,
    and you'll get the hang of it.
  • 26:32 - 26:35
    But simple searches are much faster
    than SPARQL queries.
  • 26:35 - 26:36
    So, if you can just look
    for one statement,
  • 26:36 - 26:38
    boom, you'll get the results.
  • 26:39 - 26:44
    So, things like this, you can look
    for symbolically or semantically,
  • 26:44 - 26:48
    things that depict
    the Met museum, for example.
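
A sketch of that kind of search through the Commons API (P180, "depicts", and the Met's Q160236 are assumptions for the example; the srsearch string is exactly what you'd type into the Commons search box):

    # Find files whose structured data says depicts (P180) = the Met.
    import requests

    r = requests.get(
        "https://commons.wikimedia.org/w/api.php",
        params={
            "action": "query",
            "list": "search",
            "srsearch": "haswbstatement:P180=Q160236",
            "srnamespace": 6,     # File: namespace
            "format": "json",
        },
        timeout=60,
    )
    for hit in r.json()["query"]["search"]:
        print(hit["title"])
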
  • 26:48 - 26:50
    So, finally, community campaigns.
  • 26:50 - 26:52
    Richard has been a pioneer in this area.
  • 26:52 - 26:54
    So, once you have the Wikidata items,
  • 26:54 - 26:57
    they can actually assist
    in creating Wikipedia articles.
  • 26:57 - 27:00
    So, Richard, why don't you tell us
    a little bit about the Mbabel tool
  • 27:00 - 27:01
    that you created for this.
  • 27:01 - 27:03
    (Richard) Hi, can I get this on?
  • 27:05 - 27:06
    (Andrew) Oh, use [Joisey's].
  • 27:06 - 27:08
    (Richard) It's on, now. I'm good.
  • 27:09 - 27:11
    So, we had all this information
    on Wikidata.
  • 27:11 - 27:14
    [inaudible] browsing data
    on our evenings and weekends
  • 27:14 - 27:16
    to learn about art-- not everyone does.
  • 27:16 - 27:19
    We have quite a bit more people
    [inaudible] Wikipedia,
  • 27:19 - 27:22
    so how do we get this information
    from Wikidata to Wikipedia?
  • 27:22 - 27:25
    One of the ways of doing this
    is this so-called Mbabel,
  • 27:25 - 27:28
    which was developed with the help
    of a lot of people in [inaudible].
  • 27:28 - 27:31
    People like Martin and others.
  • 27:32 - 27:35
    So, basically, it takes
    some basic art information,
  • 27:35 - 27:38
    and uses it to populate
    a Wikipedia article.
  • 27:38 - 27:40
    So: who created this work,
    who was the artist,
  • 27:40 - 27:42
    when it was created, et cetera.
  • 27:42 - 27:45
    The nice thing about this
    is it can generate articles.
  • 27:45 - 27:46
    We started with English Wikipedia,
  • 27:46 - 27:49
    but it's been developed
    in other languages.
  • 27:49 - 27:51
    So, Portuguese Wikipedia,
    our Brazilian friends
  • 27:51 - 27:54
    who've done a lot of work in taking it
    to realms beyond art,
  • 27:54 - 27:57
    to stuff like elections
    and political work as well.
  • 27:57 - 28:01
    And the nice thing about this
    is we can query on Wikidata--
  • 28:02 - 28:07
    so different artists-- so for example,
    we've done projects with Women in Red,
  • 28:07 - 28:08
    looking at women artists.
  • 28:08 - 28:13
    Projects related to Wiki Loves Pride,
    looking at LGBT-identified artists,
  • 28:13 - 28:14
    African Diaspora Artists,
  • 28:14 - 28:16
    and a lot of different groups
    and things of time periods,
  • 28:16 - 28:19
    different collections,
    and also looking at articles
  • 28:19 - 28:22
    that have been and haven't been
    translated to different languages.
  • 28:22 - 28:25
    So all of the articles that haven't
    been translated to Arabic yet.
  • 28:25 - 28:28
    You need to find some interesting articles
    maybe that are relevant to a culture
  • 28:28 - 28:30
    that haven't been translated
    into that language yet.
  • 28:30 - 28:33
    We actually have a number of works
    in the Met collection
  • 28:33 - 28:35
    that are in Wikipedias
    that aren't in English yet,
  • 28:35 - 28:37
    because it's a global collection.
  • 28:38 - 28:40
    So, there are a lot of ways,
    and hopefully, we can spread it around
  • 28:40 - 28:45
    of creating Wikipedia content, as well,
    that is driven by these Wikidata items,
  • 28:45 - 28:48
    and that also maybe
    can help spread the improvement
  • 28:48 - 28:50
    to Wikidata items, as well, in the future.
  • 28:50 - 28:52
    (Andrew) And there's a number of folks
    here using Mbabel already, right?
  • 28:52 - 28:54
    Who's using Mbabel
    in the room? Brazilians?
  • 28:54 - 28:59
    And also, if [Armin] is here,
    we have our winner
  • 28:59 - 29:03
    of the Wikipedia Asian Month,
    and Wiki Loves Pride contest.
  • 29:03 - 29:06
    So, thank you for joining,
    and congratulations.
  • 29:06 - 29:10
    We'll have another Wikipedia Asian Month
    campaign in November.
  • 29:10 - 29:13
    The way I like to describe it
    [inaudible]
  • 29:13 - 29:15
    It doesn't give you a blank page.
  • 29:15 - 29:17
    It gives you the skeleton,
  • 29:17 - 29:19
    which is really a much better
    user experience
  • 29:19 - 29:21
    for edit-a-thons and beginners.
  • 29:21 - 29:24
    So, it's a lot of great work
    that Richard has done,
  • 29:24 - 29:26
    and people are building on it,
    which is awesome.
  • 29:26 - 29:29
    (woman 3) [inaudible] for some of them,
    which is really nice.
  • 29:29 - 29:30
    Yeah, exactly.
  • 29:30 - 29:33
    (woman 3) [inaudible]
  • 29:33 - 29:36
    Right. We should have put a URL here.
  • 29:36 - 29:38
    (man 8) [inaudible]
  • 29:38 - 29:40
    Oh, that's right.
    We have the link right here.
  • 29:40 - 29:44
    So if you click-- this is a Listeria list,
    it's autogenerating all that for you.
  • 29:44 - 29:46
    And then, you click on the red link,
    it'll create the skeleton,
  • 29:46 - 29:47
    which is pretty cool.
  • 29:47 - 29:49
    Alright, we're on the final stretch here.
  • 29:49 - 29:52
    The tool that we're going
    to be announcing--
  • 29:52 - 29:55
    well, we announced a few weeks ago,
    but only to a small set of folks,
  • 29:55 - 29:57
    but we're making a big splash here,
  • 29:57 - 29:59
    is the depiction tool
    that we just created.
  • 29:59 - 30:05
    Wikipedia has shown that volunteer
    contributors can add a lot of these things
  • 30:05 - 30:07
    that museums can't.
  • 30:07 - 30:10
    So, what if we created a tool
    that could let you enrich
  • 30:10 - 30:16
    the metadata about artworks
    in terms of the depiction information?
  • 30:16 - 30:19
    And what we did was we applied
    for a grant from the Knight Foundation,
  • 30:19 - 30:23
    and we created this tool--
    and is Edward here?
  • 30:23 - 30:27
    Edward is our wonderful developer
    who in like a month, said,
  • 30:27 - 30:28
    "Okay, here's a prototype."
  • 30:28 - 30:33
    After we gave him a specification,
    and it's pretty cool.
  • 30:34 - 30:36
    - So what we can do--
    - (applause)
  • 30:36 - 30:37
    Thanks, Edward.
  • 30:38 - 30:39
    We're working within collections of items.
  • 30:39 - 30:42
    So, what we do, is we can
    bring up a page like this.
  • 30:42 - 30:45
    It's no longer looking
    at a Wikidata item with a tiny picture.
  • 30:45 - 30:48
    If we're working with what's depicted
    in the image, we want the picture big.
  • 30:48 - 30:51
    And we don't really have tools
    that work with big images.
  • 30:51 - 30:53
    We have tools that deal
    with lexical and typing.
  • 30:53 - 30:57
    So one of the big things that Edward did
    was made a big version of the picture,
  • 30:57 - 30:59
    scrape whatever you can
    from the object page
  • 30:59 - 31:01
    from a GLAM organization,
    give you context.
  • 31:01 - 31:03
    I can see dogs, children, wigwam.
  • 31:03 - 31:06
    These are things that direct the user
    to add meaningful information.
  • 31:06 - 31:09
    You have some metadata
    that's scraped from the site, too.
  • 31:09 - 31:12
    Teepee, Comanche--
    oh, it's Comanche, not Navajo,
  • 31:12 - 31:14
    because I know the object page said that.
  • 31:14 - 31:16
    And you can actually start typing
    in the field, there.
  • 31:16 - 31:18
    And the cool thing is that
    it gives you context.
  • 31:18 - 31:20
    It doesn't just match anything
    to Wikidata,
  • 31:20 - 31:23
    it first matches things that have already
    been used in other depiction statements.
  • 31:23 - 31:25
    Very simple thing,
    but what a godsend it is
  • 31:25 - 31:27
    for folks who have tried this in the past.
  • 31:27 - 31:29
    Don't give me everything
    that matches teepee.
  • 31:29 - 31:33
    Show me what other paintings
    have used teepee in the past.
  • 31:33 - 31:36
    So, it's interactive, context-driven,
    statistics-driven,
  • 31:36 - 31:38
    by showing you what was matched before.
  • 31:38 - 31:40
    And the cool thing is once you're done
    with that painting,
  • 31:40 - 31:42
    you can start to work in other areas.
  • 31:42 - 31:45
    You want to work within the same artist,
    the collection, location,
  • 31:46 - 31:47
    other criteria here.
  • 31:47 - 31:49
    And you can even browse
    through the collections
  • 31:49 - 31:52
    of different organizations,
    just work on their paintings.
  • 31:52 - 31:54
    So, we wanted people
    to not live in Wikidata--
  • 31:54 - 31:56
    kind of onesy-twosies with items,
    but live in a space
  • 31:56 - 31:59
    where you're looking at artworks
    in collections that make sense.
  • 32:00 - 32:02
    And then, you can actually
    look through it visually.
  • 32:02 - 32:04
    It kind of looks like Crotos
    or these other tools,
  • 32:04 - 32:08
    but you can actually live edit
    on Wikidata at the same time.
  • 32:08 - 32:09
    So, go ahead and try it out.
  • 32:09 - 32:11
    We've only got 14 users,
  • 32:11 - 32:15
    but we've had 2,100 paintings worked on,
    with 5,000 plus depict statements.
  • 32:15 - 32:16
    That's pretty good for 14.
  • 32:16 - 32:18
    So, multiply that by 10--
  • 32:18 - 32:21
    imagine how many more things
    we could do with that.
  • 32:21 - 32:24
    So, you can go ahead and go
    to art.wikidata.link and try out the tool.
  • 32:24 - 32:27
    It uses OAuth authentication,
    and you're off to the races.
  • 32:27 - 32:29
    And it should be very natural
    without any kind of training
  • 32:29 - 32:32
    to add depiction statements to artworks.
  • 32:32 - 32:35
    But you can put any object.
    We don't restrict the object right now.
  • 32:35 - 32:37
    So, you could put any Q number
  • 32:38 - 32:41
    to edit this content if you want.
  • 32:41 - 32:45
    But we primarily stick with paintings
    and 2D artworks, right now.
  • 32:46 - 32:49
    Okay. You can actually look
    at the recent changes
  • 32:49 - 32:52
    and see who's made edits recently to that.
  • 32:53 - 32:55
    Okay? Okay, so we're going
    to wind it down.
  • 32:55 - 32:58
    Ooh, one minute, then we'll do some Q&A.
  • 32:59 - 33:03
    So, the final thing that I think
    is useful for museum types especially,
  • 33:03 - 33:07
    is there's a very famous author
    named Nina Simon in the museum world,
  • 33:07 - 33:11
    where she likes to talk about
    how do we go from users,
  • 33:11 - 33:15
    or I guess your audience,
    contributing stuff to your collections
  • 33:15 - 33:18
    to collaborating around content,
    to actually being co-creative
  • 33:18 - 33:20
    and creating new things.
  • 33:20 - 33:21
    And that's always been tough.
  • 33:21 - 33:24
    And I'd like to argue that Wikidata
    is this co-creative level.
  • 33:24 - 33:27
    So, it's not just uploading
    a file to Commons,
  • 33:27 - 33:28
    which is contributing something.
  • 33:28 - 33:31
    It's not just editing an article
    with someone else, which is collaborative.
  • 33:31 - 33:35
    But we are now seeing these tools
    that let you make timelines,
  • 33:35 - 33:36
    and graphs, and bubble charts.
  • 33:36 - 33:39
    And this is actually the co-creative part
    that's really interesting.
  • 33:39 - 33:40
    And that's what Wikidata provides you.
  • 33:40 - 33:42
    Because suddenly,
    it's not language dependent--
  • 33:42 - 33:45
    we've got this database
    that's got this rich information in it.
  • 33:46 - 33:49
    So, it's not just pictures, not just text,
  • 33:49 - 33:51
    but it's all this rich multimedia
  • 33:51 - 33:53
    that we have the opportunity to work on.
  • 33:53 - 33:56
    So, this is just another example
    of this connected graph
  • 33:56 - 33:57
    that you can take a look at later on
  • 33:57 - 34:00
    to show another example
    of The Death of Socrates,
  • 34:00 - 34:02
    and the different themes
    around that painting.
  • 34:03 - 34:06
    And it's really easy
    to make this graph yourself.
  • 34:06 - 34:08
    So again, another scary graphic
    that only makes sense
  • 34:08 - 34:10
    for Wikidata folks, like you.
  • 34:10 - 34:14
    You just give it a list of Wikidata items,
    and it'll do the rest, that's it.
  • 34:14 - 34:16
    You'll give the list.
  • 34:16 - 34:18
    Keep all this code the same.
  • 34:18 - 34:21
    So, fortunately, Martin and Lucas
    helped do all this code here.
  • 34:21 - 34:24
    Just give it a list of items
    and the magic will happen.
  • 34:24 - 34:26
    Hopefully, it won't blow up your computer,
  • 34:26 - 34:29
    because you're putting in
    a reasonable number of items there.
  • 34:29 - 34:32
    But as long as you have the screen space,
    it'll draw the graph,
  • 34:32 - 34:33
    which is pretty darn cool.
  • 34:33 - 34:37
    And then, finally, two tools--
    I realized at 2 a.m. last night
  • 34:37 - 34:40
    a few people said,
    "I didn't know about these tools."
  • 34:40 - 34:41
    And you should know about these tools.
  • 34:41 - 34:45
    So, one is Recoin, which shows you
    the relative completeness of an item
  • 34:45 - 34:47
    compared to other items
    of the same instance.
  • 34:47 - 34:49
    And then, Cradle, which is a way
    to have a forms-based way
  • 34:49 - 34:51
    to create content.
  • 34:51 - 34:52
    So, these are very useful for edit-a-thons
  • 34:52 - 34:55
    where if you know that
    you're working with just artworks,
  • 34:55 - 34:58
    don't just let people create items
    with a blank screen.
  • 34:58 - 35:00
    Give them a form to fill out
    to start entering in information
  • 35:00 - 35:02
    that's structured.
  • 35:02 - 35:05
    And then, finally, we've gone
    through some of this, already.
  • 35:06 - 35:10
    This is my big chart that I love
    to get people's feedback on.
  • 35:10 - 35:14
    How do we get people
    across the chasm to be in this space?
  • 35:14 - 35:17
    We have a lot of folks who, now,
    can do template coding,
  • 35:17 - 35:20
    spreadsheets, QuickStatements,
    SPARQL queries, and then we got--
  • 35:21 - 35:24
    how do we get people to this side
    where we have Python
  • 35:24 - 35:27
and the tools that can do more
sophisticated editing.
  • 35:27 - 35:29
    It's really hard
    to get people across this.
  • 35:29 - 35:31
But I would say that while
it's hard to get people across,
  • 35:31 - 35:33
the content and the technology
are not that hard.
  • 35:33 - 35:35
    We actually need more people
    to learn about regular expressions.
  • 35:35 - 35:38
    And once you get some kind
    of experience here,
  • 35:38 - 35:42
    you'll find that this is a wonderful world
    that you can learn a lot in,
  • 35:42 - 35:45
    but it does take some time
    to get across this chasm.
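Since regular expressions are named here as the gateway skill, a tiny example of the kind of cleanup they enable in collection data. The date strings below are invented for illustration:

```python
import re

# Free-text date fields like these are common in collection exports.
dates = ["ca. 1787", "1801-1805", "late 19th century", "1920 (?)"]

# Match a plausible four-digit year between 1000 and 2099.
year_pattern = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")

for d in dates:
    m = year_pattern.search(d)
    print(f"{d!r} -> {m.group(1) if m else 'no year found'}")
```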
  • 35:45 - 35:46
    Yes, James.
  • 35:46 - 35:52
    (James) [inaudible]
  • 35:53 - 35:57
    No, what it means is that the graph
    is not necessarily accurate
  • 35:57 - 35:59
    in terms of its data points.
  • 35:59 - 36:03
    But what it means-- I guess
    it's more like this is a valley.
  • 36:04 - 36:07
    It's like we need to get people
    across this valley here.
  • 36:07 - 36:10
    (woman 4) [inaudible]
  • 36:10 - 36:12
    I would say this is the key.
  • 36:12 - 36:16
If we can get people who know this stuff
to also grok this stuff,
  • 36:16 - 36:18
    it gets them to this stuff.
  • 36:18 - 36:20
    Does that make sense? Yeah.
  • 36:20 - 36:24
So, my vision for the next few years
is that we can get better training
  • 36:24 - 36:28
    in our community to get people
    from batch processing,
  • 36:28 - 36:30
    which is pretty much what this is,
    to kind of intelligent--
  • 36:30 - 36:33
    I wouldn't say intelligent,
    but more sophisticated programming,
  • 36:33 - 36:35
    that would be a great thing,
    because we're seeing this is a bottleneck
  • 36:35 - 36:38
    to a lot of the stuff
    that I just showed you up there.
  • 36:38 - 36:39
    Yes.
  • 36:39 - 36:42
    (man 9) [inaudible]
  • 36:42 - 36:46
    Okay, wait, you want to show me something,
    show me after the session, does that work?
  • 36:46 - 36:48
    Okay. Yes, Megan.
  • 36:48 - 36:51
    - (Megan) Can I have a microphone?
    - Microphone, yes.
  • 36:51 - 36:55
    - (Megan) [inaudible]
    - Yeah.
  • 36:55 - 36:57
    And we have lunch after this,
  • 36:57 - 36:59
    so if you want to stay
    a little bit later, that's fine, too.
  • 36:59 - 37:01
    - [inaudible]
    - We're already at lunch break? Okay.
  • 37:01 - 37:03
    (Megan) So, thank you so much
    to both you and Richard
  • 37:03 - 37:05
    for all the work you're doing at the Met.
  • 37:05 - 37:07
    And I know that you're
    very well supported in that.
  • 37:07 - 37:09
    (mic feedback)
    I don't know what happened there.
  • 37:09 - 37:15
    For the average volunteer community,
    how do you balance doing the work
  • 37:15 - 37:19
    for the cultural heritage organization
    versus training the professionals
  • 37:19 - 37:22
    that are there to do that work?
  • 37:22 - 37:24
    Where do you find the balance
    in terms of labor?
  • 37:26 - 37:27
    It's a good question.
  • 37:27 - 37:30
    (Megan) One that really comes up,
    I think, with this as well.
  • 37:30 - 37:33
    - With this?
    - (Megan) Yeah, and with building out...
  • 37:33 - 37:36
    where we put efforts in terms
    of building out competencies.
  • 37:36 - 37:39
    Yeah. I don't have a great answer for you,
    but it's a great question.
  • 37:39 - 37:41
    (Megan) Cool.
  • 37:41 - 37:44
    (Richard) There are a lot
    of tech people at [inaudible]
  • 37:44 - 37:46
    who understand this side of the graph,
    and don't understand it--
  • 37:46 - 37:49
    the people in [inaudible]
    who understand this part of the graph,
  • 37:49 - 37:51
    and don't understand
    this part of the graph.
  • 37:51 - 37:54
    So, the more we can get Wikimedians
    who understand some of this,
  • 37:54 - 37:58
    with some tech professionals at museums
    who understand this,
  • 37:58 - 37:59
    then that makes it a little bit easier--
  • 37:59 - 38:02
    and hopefully, as well as
    training up Wikimedians,
  • 38:02 - 38:06
    we can also provide some guidance
    and let the museums [inaudible]
  • 38:06 - 38:07
    to take care of themselves
    in the [inaudible].
  • 38:07 - 38:09
    Yeah, that's a good point.
  • 38:09 - 38:12
    How many people here know
    what regular expressions are?
  • 38:12 - 38:13
    Raise your hand.
  • 38:13 - 38:17
    Okay, so how many people are comfortable
    specifying a regular expression?
  • 38:17 - 38:19
    So, yeah, we need more work here.
  • 38:19 - 38:21
    (laughter)
  • 38:21 - 38:23
    (man 10) I want to suggest that--
  • 38:25 - 38:29
maybe getting
every Wikidata practitioner,
  • 38:29 - 38:34
or institution practitioner,
to embrace Python programming is not the way.
  • 38:34 - 38:40
    But as Richard just said, finding more
    bridging people-- people like you--
  • 38:40 - 38:41
    who speak both--
  • 38:41 - 38:44
    who speak Python,
    but also speak GLAM institution--
  • 38:45 - 38:48
    to help the GLAM's own
    technical department, which may not--
  • 38:49 - 38:52
    they know Python,
    they don't know this stuff.
  • 38:53 - 38:54
    That's, I think, what's needed.
  • 38:54 - 38:59
    People like you, people like me,
    people who speak both of these jargons
  • 38:59 - 39:02
    to help make the connections,
    to document the connections.
  • 39:02 - 39:03
    You're already doing this, of course.
  • 39:03 - 39:06
    You share your code, et cetera,
    you're doing tutorials.
  • 39:06 - 39:07
    But we need more of this.
  • 39:07 - 39:09
    I'm not sure we need
    to make everyone programmers.
  • 39:09 - 39:11
    We already have programmers.
  • 39:11 - 39:12
    We need to make them understand
  • 39:12 - 39:15
    the non-programming
    material they need to--
  • 39:15 - 39:16
    I think that's a great point.
  • 39:16 - 39:18
    We don't need to make everyone
    highly proficient in this,
  • 39:18 - 39:20
    but we do need people
    knowledgeable to say that,
  • 39:20 - 39:23
    "Yeah, we can ingest 400 thousand rows
    and do something with it."
  • 39:23 - 39:25
    Whereas, if you're stuck
    on this side, you're like,
  • 39:25 - 39:27
    "400 thousand rows
    sounds really big and scary."
  • 39:27 - 39:30
    But if you know that it's possible,
    you're like, "No problem."
  • 39:30 - 39:32
    400 thousand is not a problem.
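A hedged sketch of why 400 thousand rows stops being scary once you stream them: read the export one row at a time and emit QuickStatements V1 lines, so memory use never depends on file size. The file name and column names are hypothetical; adapt them to your own export.

```python
import csv

# Stream the CSV row by row; 400,000 rows works the same as 400.
with open("collection_export.csv", newline="", encoding="utf-8") as f, \
     open("statements.tsv", "w", encoding="utf-8") as out:
    for row in csv.DictReader(f):
        qid = row["wikidata_qid"]            # hypothetical column name
        accession = row["accession_number"]  # hypothetical column name
        # P217 is "inventory number"; qualifiers and references omitted.
        out.write(f'{qid}\tP217\t"{accession}"\n')
```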
  • 39:32 - 39:35
    (woman 5) I would just like to chime in
    a little bit in that
  • 39:35 - 39:40
there may be countries and areas
    where you will not find a GLAM
  • 39:40 - 39:44
    with any skilled technologists.
  • 39:44 - 39:48
    So, you will have to invent
    something there in the middle.
  • 39:49 - 39:50
    That's a good point.
  • 39:50 - 39:51
    Any questions? Sandra.
  • 39:56 - 39:58
    (Sandra) Yeah, I just wanted
    to add to this discussion.
  • 39:58 - 40:02
    Actually, I've seen some very good cases
    where it indeed has been successful
  • 40:02 - 40:05
    to train GLAM professionals to work
    with this entire environment,
  • 40:05 - 40:09
    and where they've done fantastic jobs,
    also at small institutions.
  • 40:10 - 40:15
    It also requires that you have chapters
    or volunteers that can train the staff.
  • 40:15 - 40:18
    So, it's really like a bigger environment.
  • 40:18 - 40:22
    But I think that's a model
that, if we can manage to make it grow,
  • 40:22 - 40:24
can scale very well.
  • 40:25 - 40:26
    Good point.
  • 40:26 - 40:31
    (woman 5) [inaudible]
  • 40:32 - 40:34
    Sorry, just noting that we don't have
  • 40:34 - 40:38
    any structured trainings
    right now for that.
  • 40:38 - 40:42
    We might want to develop those,
    and that would be helpful.
  • 40:43 - 40:44
    We have been doing that for education
  • 40:44 - 40:47
    in terms of teaching people
    Wikipedia and Wikidata.
  • 40:47 - 40:50
    It's just a matter of taking it
    one step further.
  • 40:51 - 40:52
    Right. Stacy.
  • 40:55 - 40:57
    (Stacy) Well, I'd just like to say
    that a lot of professionals
  • 40:57 - 41:02
    who work in this area of metadata
    have all these skills already.
  • 41:02 - 41:09
    So, I think part of it is just proving
    the value to these organizations,
  • 41:09 - 41:13
    but then it's also tapping
    into professional associations who can--
  • 41:13 - 41:17
    or ways of collaborating within
    those professional communities
  • 41:17 - 41:21
    to build this work, and the documentation
    on how to do things
  • 41:21 - 41:23
    is really, really important,
  • 41:23 - 41:27
    because I'm not sure about the role
    of depending on volunteers,
  • 41:27 - 41:32
    when some of this work is actually work
    GLAM organizations do anyway.
  • 41:32 - 41:35
    We manage our collections
    in a variety of ways through metadata,
  • 41:35 - 41:37
    and this is actually one more way.
  • 41:37 - 41:40
    So, should we also not be thinking
    about ways to integrate this work
  • 41:40 - 41:44
into a GLAM professional's regular job?
  • 41:44 - 41:46
    And then that way you're generating--
  • 41:46 - 41:49
    and when you think
    about sustainability and scalability,
  • 41:49 - 41:53
    that's the real trick to making this
both sustainable and scalable,
  • 41:54 - 41:59
    is that once this is the regular
    work of GLAM folks,
  • 41:59 - 42:01
    we're not worried as much about this part,
  • 42:01 - 42:04
    because it's just turning
    that little switch to get this
  • 42:04 - 42:06
    to be a part of that work.
  • 42:06 - 42:08
Right. Good point. [Shani]?
  • 42:12 - 42:13
    (Shani) You're absolutely right.
  • 42:13 - 42:16
    But I want to echo what you said before.
  • 42:16 - 42:22
    And yes, Susana-- this might work
    for more privileged countries
  • 42:22 - 42:25
    where they have money,
    they have people doing it.
  • 42:26 - 42:29
    It doesn't work for places
    that are still developing,
  • 42:29 - 42:32
    that don't have resources--
    they don't have all of that.
  • 42:33 - 42:37
    And they can barely do
    what they need to do.
  • 42:37 - 42:41
So, it's difficult for them, and that's where
the community is really helpful.
  • 42:42 - 42:45
    These are the cases where the community
    can have a huge impact actually,
  • 42:46 - 42:50
    working with the GLAMS,
    because they can't do it all
  • 42:51 - 42:52
    as part of their jobs.
  • 42:53 - 42:55
    So, we need to think about that as well.
  • 42:55 - 42:58
    And having these examples,
    actually, is hugely important,
  • 42:58 - 43:01
because it helps
to convince them
  • 43:01 - 43:06
    that it's critical to invest in it
    and to work with volunteers,
  • 43:06 - 43:09
    so, with non-professionals
    of sorts, to get there.
  • 43:10 - 43:13
    I can imagine a future where
    you don't have to know all this code.
  • 43:13 - 43:14
    These would just be
    kind of like Lego bricks
  • 43:14 - 43:16
    you can slap together,
  • 43:16 - 43:19
    saying, "Here's my database.
    Here's the crosswalk. Here's Wikidata,"
  • 43:19 - 43:21
    and just put it together,
and you don't even have to code,
  • 43:21 - 43:24
    you just have to make sure
    the databases are in the right place.
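In miniature, one of those Lego bricks could look like this: a crosswalk is ultimately just a mapping from local field names to Wikidata properties, applied record by record. The field names, QIDs, and sample record here are invented for illustration.

```python
# A crosswalk: local database fields mapped to Wikidata properties.
CROSSWALK = {
    "artist":           "P170",  # creator
    "medium":           "P186",  # made from material
    "accession_number": "P217",  # inventory number
}

# Invented sample record from a hypothetical collection database.
record = {"artist": "Q83155", "medium": "Q296955", "accession_number": "31.45"}

for field, value in record.items():
    prop = CROSSWALK.get(field)
    if prop:
        print(f"{prop} -> {value}")
```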
  • 43:24 - 43:25
    Yep. Okay.
  • 43:27 - 43:29
    (man 11) Sorry. [inaudible]
  • 43:29 - 43:34
I think if I had done this project,
    I'd probably have done it the same way.
  • 43:34 - 43:36
    So, I think that's maybe a good sign.
  • 43:36 - 43:40
I was wondering, how did
the whole financing of this project work?
  • 43:40 - 43:41
    How did the-- I'm sorry?
  • 43:41 - 43:43
How the financing of this project worked.
  • 43:44 - 43:46
    - The financing?
    - Yeah, the money.
  • 43:46 - 43:48
    That's a good question.
  • 43:48 - 43:49
    Well, so, there are different parts of it.
  • 43:49 - 43:53
    So, the Knight grant funded
    the Wiki Art Depiction Explorer.
  • 43:53 - 43:57
    But I, for the last, maybe what--
    nine months--
  • 43:57 - 43:59
    I've been their Wikimedia strategist.
  • 43:59 - 44:02
    So, I've been on
    since February of this year.
  • 44:02 - 44:05
So, pretty much, they're paying
for my time to help with their--
  • 44:05 - 44:08
    not only the upload of their collections,
    but developing these tools, as well.
  • 44:08 - 44:12
    - (Richard) So the Met's paying you?
    - Yeah, that's right.
  • 44:12 - 44:15
    (Richard) The grant, at least part
    of it has come from--
  • 44:15 - 44:17
    There was a grant for Open Access.
  • 44:17 - 44:20
    And this is under that campaign
    and with the digital department.
  • 44:20 - 44:24
So, we're working as contractors throughout
    the Open Access campaign for the Met.
  • 44:28 - 44:30
    (man 12) I'm sorry.
    I guess before you were hired,
  • 44:30 - 44:31
    and before there was a grant,
  • 44:31 - 44:34
    there was probably a lot
    of volunteer work done to make sure--
  • 44:34 - 44:35
    Richard did a lot of work before that.
  • 44:35 - 44:37
    And then, Wikimedia New York
    did a lot of work,
  • 44:37 - 44:39
    but it was kind of in bursts.
  • 44:39 - 44:41
    It wasn't as comprehensive
    as we're talking about now
  • 44:41 - 44:46
    in terms of having-- making sure
    those two layers are complete
  • 44:46 - 44:47
    in Wikidata.
  • 44:49 - 44:51
    Alright, yeah. I think that's it.
  • 44:51 - 44:54
    So, I'm happy to talk after lunch,
    or after the break, if you want.
  • 44:55 - 44:56
    Okay. Thank you.
  • 44:56 - 44:59
    (applause)