cdn.media.ccc.de/.../wikidatacon2019-18-eng-Lightning_talks_1_hd.mp4

  • 0:06 - 0:09
    (host) Hello, everyone. Thank you
    for coming to these lightning talks.
  • 0:09 - 0:12
    Our first speaker, I'm going
    to run straight into it,
  • 0:12 - 0:14
    is going to be Rosie
    Stephenson-Goodknight.
  • 0:14 - 0:15
    Did I get that right?
  • 0:15 - 0:20
    Yes. And so she's going to be talking
    about the Women Writers Project.
  • 0:20 - 0:23
    And we're going to--
    yeah, is that right? Great.
  • 0:23 - 0:24
    And so, we're going
    to just launch right in,
  • 0:24 - 0:27
    and I want to remind you,
    if there's time for questions,
  • 0:27 - 0:29
    to please not speak
    until you have the microphone.
  • 0:29 - 0:30
    Thank you.
  • 0:32 - 0:34
    (Rosie) Hi, everyone, and thanks
    for coming to this session,
  • 0:34 - 0:37
    where we're going to talk
    about Women Writers in Review,
  • 0:37 - 0:40
    cultures of reception associated
    with trans-Atlantic,
  • 0:40 - 0:44
    English language women writers,
    broadly construed.
  • 0:45 - 0:48
    Women Writers in Review is an initiative
    of the Women Writers Project
  • 0:48 - 0:51
    of Northeastern University.
  • 0:51 - 0:55
    It moved there from Brown University,
    approximately 15 years ago.
  • 0:56 - 1:00
    Women Writers in Review is a collection
    of 18th- and 19th-century reviews,
  • 1:00 - 1:04
    publication notices,
    literary histories, and other texts
  • 1:04 - 1:10
    corresponding to trans-Atlantic--
    so, UK and US mostly,
  • 1:10 - 1:13
    though a few Canadian--
    written works by women.
  • 1:13 - 1:16
    It's a project where the two universities,
  • 1:16 - 1:18
    Brown University
    and Northeastern University,
  • 1:18 - 1:23
    started collecting the manuscripts
    of women from this period.
  • 1:23 - 1:28
    And then they started collecting
    the reviews of these works,
  • 1:28 - 1:32
    and then they started scoring
    these reviews by giving them a rating.
  • 1:32 - 1:36
    It's designed to investigate
    the discourse of reception in connection
  • 1:36 - 1:39
    with the changing trans-Atlantic
    literary landscape
  • 1:39 - 1:43
    for the period 1770 to 1830.
  • 1:46 - 1:49
    You're going to pardon me if I speak fast,
    because I've got five minutes
  • 1:49 - 1:51
    to go over this.
  • 1:51 - 1:55
    It includes 690 English language texts
    responding to works
  • 1:55 - 2:00
    written or translated
    by 18th- and 19th-century women writers.
  • 2:00 - 2:05
    There are 74 authors in the corpus,
    using 112 different sources,
  • 2:05 - 2:08
    or periodicals, or magazines.
  • 2:08 - 2:11
    And there are 628 critical reviews.
  • 2:12 - 2:15
    Here's a picture that shows you
    what we're talking about
  • 2:15 - 2:17
    in terms of a review.
  • 2:17 - 2:19
    And you can also see what kind of scores
  • 2:19 - 2:25
    were given by the academics
    at Northeastern University.
  • 2:26 - 2:29
    Most of these are women
    who were giving scores
  • 2:29 - 2:34
    based on the reviews that were done
    by mostly, probably all, men,
  • 2:34 - 2:40
    back in this time period 1770 to 1830
    of works written by women.
  • 2:40 - 2:43
    By works, we're talking about plays,
    and novels, and poems,
  • 2:43 - 2:47
    essays, and other kinds of articles.
  • 2:49 - 2:50
    So, what are we talking about?
  • 2:50 - 2:55
    This required creating
    items for authors for their works,
  • 2:55 - 2:58
    like I said, novels and plays and poems.
  • 2:58 - 3:05
    It required creating new items
    for this period of time
  • 3:05 - 3:08
    where there are defunct periodicals.
  • 3:08 - 3:12
    It required creating items
    for the scholarly articles.
  • 3:13 - 3:17
    And then the review scores of each,
    and the review score by,
  • 3:17 - 3:20
    which in this case would be
    Women Writers in Review,
  • 3:20 - 3:23
    and what we still need to add
    is the described by source.
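The modelling described here can be sketched as a query. This is a minimal sketch, assuming the scores are stored as "review score" (P444) statements with a "review score by" (P447) qualifier; "described by source" is P1343 and is not used yet, matching the talk. The function name and the reviewer QID are hypothetical placeholders, not the project's actual identifiers.

```python
# P444 and P447 are Wikidata properties; the QID passed in below is a
# hypothetical placeholder, not the real item for Women Writers in Review.
REVIEW_SCORE = "P444"      # review score
REVIEW_SCORE_BY = "P447"   # review score by (used as a qualifier)


def review_score_query(reviewer_qid: str) -> str:
    """Build a SPARQL query listing items and the scores one reviewer gave them."""
    return f"""
SELECT ?item ?itemLabel ?score WHERE {{
  ?item p:{REVIEW_SCORE} ?statement .
  ?statement ps:{REVIEW_SCORE} ?score ;
             pq:{REVIEW_SCORE_BY} wd:{reviewer_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


query = review_score_query("Q00000000")  # placeholder QID
print(query)
```

The resulting string could be pasted into the Wikidata Query Service once the placeholder is replaced with the real item.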
  • 3:25 - 3:29
    This gives you a picture
    of the kind of spreadsheets,
  • 3:29 - 3:31
    Google Spreadsheets,
    that I have been working on.
  • 3:31 - 3:34
    I shouldn't just say I,
    because I've had a lot of help.
  • 3:34 - 3:38
    I've had a lot of people
    who were working on this project with me.
  • 3:38 - 3:40
    And you can see at the top,
    something about the authors,
  • 3:40 - 3:42
    about the works.
  • 3:42 - 3:45
    The third group is going to be
    the periodical,
  • 3:45 - 3:48
    and then, how the scores started showing.
  • 3:49 - 3:52
    And of course, this is how they look--
  • 3:52 - 3:57
    the beauty of being able to present
    the preliminary findings.
  • 3:58 - 4:02
    Once we have uploaded all of the data,
  • 4:03 - 4:06
    and I hope that that's going to be done
    by the end of this year,
  • 4:07 - 4:08
    this will obviously look different.
  • 4:10 - 4:11
    Appendix.
  • 4:11 - 4:15
    So, here's what the depiction looks like
  • 4:15 - 4:19
    at the Northeastern University website.
  • 4:19 - 4:22
    I don't think it's quite as clear
    as what we can do with Wikidata.
  • 4:23 - 4:27
    And so, this was probably the reason why,
    when I started as a visiting scholar
  • 4:27 - 4:32
    in 2017, they asked if this is one
    of the projects that I could work on.
  • 4:32 - 4:36
    They stopped their work
    the year before, in 2016.
  • 4:36 - 4:39
    And I think they just don't have
    the resources to continue.
  • 4:40 - 4:43
    Some parts of this presentation
    came from another
  • 4:43 - 4:46
    that was published in 2016.
  • 4:46 - 4:49
    And last but not least, here are links
  • 4:49 - 4:53
    to the different parts
    of the work that I'm doing.
  • 4:54 - 4:56
    Thank you very much.
  • 4:56 - 4:57
    Questions.
  • 4:57 - 4:59
    (applause)
  • 5:10 - 5:15
    (woman) So, when you have a work,
    and you have the review of the work,
  • 5:15 - 5:18
    are you looking
    at a particular edition of the work,
  • 5:18 - 5:21
    or are these all reviews
    of first editions?
  • 5:21 - 5:23
    It's a good question. No.
  • 5:23 - 5:26
    They are not just reviews
    of the first edition.
  • 5:26 - 5:29
    Some are reviews of the second
    or third edition.
  • 5:30 - 5:32
    I'm going to add something
    that maybe I should have said
  • 5:32 - 5:35
    before I closed
    and went to question and answers--
  • 5:35 - 5:37
    what's so special about this?
  • 5:37 - 5:40
    What's special is nobody else
    has done this on Wikidata.
  • 5:41 - 5:46
    Surely, there are other universities
    that have their own collections,
  • 5:46 - 5:51
    where their scholars have reviewed
    the reviews of someone's work
  • 5:52 - 5:53
    in some language.
  • 5:54 - 5:57
    So, hopefully,
    once this methodology gets--
  • 5:58 - 6:02
    once I write this up and the project
    is over and presented again,
  • 6:02 - 6:05
    that there will be other
    universities, other libraries
  • 6:05 - 6:08
    that will speak up and say,
    "We've got data sets, too,
  • 6:08 - 6:13
    and we're going to go ahead
    and upload them into Wikidata ourselves,"
  • 6:13 - 6:16
    and then it'd be lovely
    to start doing some comparisons.
  • 6:20 - 6:22
    Anyone? Jane.
  • 6:22 - 6:24
    (Jane) Do you actually have books?
  • 6:24 - 6:27
    Do you actually have the books--
    are the books in existence,
  • 6:27 - 6:29
    or are you actually
    doing metadata about books
  • 6:29 - 6:31
    where we don't even know
    where the books are?
  • 6:32 - 6:35
    Northeastern University
    actually has the book,
  • 6:35 - 6:37
    or the essay, or the poem.
  • 6:40 - 6:45
    And they have the critical review
    of the book, or the essay, or the poem.
  • 6:46 - 6:49
    And they're working
    on the transcription of these,
  • 6:49 - 6:51
    and they're not at 100% yet.
  • 6:52 - 6:56
    They're not at 100%, but it's like,
    all hands working on it.
  • 7:00 - 7:02
    Any other questions?
  • 7:06 - 7:07
    (host) We're going to wrap it up there.
  • 7:07 - 7:09
    Thanks for being such a nice audience.
  • 7:09 - 7:12
    (applause)
  • 7:14 - 7:19
    Lady bug for [inaudible].
  • 8:58 - 8:59
    (man) Finally got that.
  • 8:59 - 9:03
    What I'm going to do is I'm just going
    to click on these to load.
  • 9:03 - 9:06
    Just while-- is that new tab there?
  • 9:07 - 9:08
    [inaudible]
  • 9:08 - 9:11
    The first one? Yeah, perfect.
  • 9:11 - 9:14
    Sorry, my German is not even rusty,
  • 9:14 - 9:15
    it's simply non-existent.
  • 9:16 - 9:20
    So, I'll just let them load,
    because then these queries can run
  • 9:20 - 9:23
    while I'm sort of introducing
    what I was talking about and doing.
  • 9:23 - 9:25
    So, hi, I'm Nav from Histropedia.
  • 9:25 - 9:28
    And basically, for the last
    quite a few years,
  • 9:28 - 9:30
    we've been relatively quiet,
  • 9:30 - 9:32
    while we've been sort of working
    on technology and tools
  • 9:32 - 9:37
    that we need to sort of develop,
    ultimately, Histropedia version 2,
  • 9:37 - 9:39
    which is going to be, you know,
    this huge enhancement
  • 9:39 - 9:41
    on the first version.
  • 9:41 - 9:43
    Well, it's kind of in progress,
    but as we do it,
  • 9:43 - 9:45
    we've been experimenting
    with these other tools,
  • 9:45 - 9:47
    and building the technology
    that we're going to need.
  • 9:48 - 9:52
    One really crucial part for this
    is the ability to sort of see
  • 9:52 - 9:55
    the whole of history
    from the billions of years time scale,
  • 9:55 - 9:59
    to up to the current day,
  • 9:59 - 10:01
    and zooming all the way into single days.
  • 10:01 - 10:03
    And ultimately, in the end,
    down to hours and minutes.
  • 10:03 - 10:07
    We've managed to create
    a [inaudible] of update to our engine.
  • 10:07 - 10:08
    Other engines can already do this,
  • 10:08 - 10:11
    but unfortunately, they also can't handle
    the large data sets.
  • 10:11 - 10:13
    So, we finally got this update
    to our engine.
  • 10:13 - 10:15
    It allows us to zoom to billions of years.
  • 10:15 - 10:20
    So, recently-- the recently
    finished update,
  • 10:20 - 10:22
    and it's basically, it's an update
    to our query viewer tool,
  • 10:22 - 10:24
    which is like a live version
    of Histropedia
  • 10:24 - 10:27
    just linked straight to Wikidata.
  • 10:27 - 10:29
    So, it's literally based on a query,
  • 10:29 - 10:31
    a live query, and we see
    the results of it.
  • 10:31 - 10:34
    So, it's sort of separate
    to our main tool.
  • 10:34 - 10:38
    So, I'm going to flick to the first one,
    which is my first experiment.
  • 10:38 - 10:40
    And you'll forgive me, the queries--
  • 10:40 - 10:42
    the code was kind of finished
    not so long ago,
  • 10:42 - 10:45
    and the queries, I've been trying
    to find out what can I find
  • 10:45 - 10:48
    and what's interesting
    to look at, what's missing.
  • 10:48 - 10:52
    So, I started off
    with a kind of, sort of, well--
  • 10:52 - 10:54
    So, that's not the right--
    that's not Life on Earth.
  • 10:54 - 10:56
    Is this Life on Earth?
  • 10:56 - 10:57
    That will do, anyway.
  • 10:57 - 11:02
    So, I started off just trying to look
    at what sort of things
  • 11:02 - 11:05
    are actually in Wikidata.
  • 11:05 - 11:07
    And this particular one--
    sorry, it's in reverse.
  • 11:07 - 11:10
    So, this is the first one
    I wanted to show you.
  • 11:10 - 11:12
    So, this is a kind of
    a life on Earth query
  • 11:12 - 11:14
    that I wanted to develop.
  • 11:14 - 11:18
    And basically, what it is
    is all the taxons in Wikidata
  • 11:18 - 11:20
    that have a date.
  • 11:20 - 11:24
    And as you can probably see
    from the panel, there is not many of them.
  • 11:24 - 11:26
    But we do have the different taxon ranks.
  • 11:26 - 11:28
    So, you know, is it a species, a class--
  • 11:28 - 11:30
    for a biologist,
    this makes a lot of sense.
  • 11:30 - 11:32
    But if I was just to close that a bit,
  • 11:33 - 11:35
    we can see, we are going back
    to the earliest forms of life here.
  • 11:35 - 11:37
    3.5 billion years ago.
  • 11:37 - 11:43
    And as we zoom in here, we start to see
    the more modern forms of life,
  • 11:43 - 11:47
    and we see some really
    interesting things developing,
  • 11:47 - 11:51
    but we're still lacking a lot of data
    in terms of this kind of time range.
  • 11:52 - 11:55
    So, my next thought was,
    "Okay, well, why aren't--"
  • 11:56 - 11:57
    "I want to see a Tyrannosaurus Rex."
  • 11:57 - 12:00
    That's what I really wanted to see
    on my query, and it wasn't there.
  • 12:00 - 12:02
    So, had a little dig in,
    and I found out why.
  • 12:02 - 12:05
    It's because they're much more
    being stored
  • 12:05 - 12:09
    in terms of the temporal range
    or time period that they relate to.
  • 12:09 - 12:11
    So, on comes the next query,
  • 12:11 - 12:13
    where I actually sort of--
  • 12:14 - 12:18
    basically, this query
    is looking for any item
  • 12:18 - 12:22
    that has a temporal range start,
    and/or a temporal range end.
  • 12:23 - 12:26
    Which is basically in the form--
    in life forms, it kind of relates
  • 12:26 - 12:29
    to when they emerged
    and when they became extinct.
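A query of this shape might look like the sketch below. It assumes the Wikidata properties "temporal range start" (P523), "temporal range end" (P524), and "taxon rank" (P105); the function name and the limit are illustrative, and this is not the speaker's actual query.

```python
TEMPORAL_RANGE_START = "P523"  # temporal range start
TEMPORAL_RANGE_END = "P524"    # temporal range end
TAXON_RANK = "P105"            # taxon rank


def temporal_range_query(limit: int = 500) -> str:
    """Build a SPARQL query for items with a temporal range start and/or end."""
    return f"""
SELECT DISTINCT ?item ?itemLabel ?start ?end ?rank WHERE {{
  # Match items that have at least one of the two properties.
  ?item wdt:{TEMPORAL_RANGE_START}|wdt:{TEMPORAL_RANGE_END} ?period .
  OPTIONAL {{ ?item wdt:{TEMPORAL_RANGE_START} ?start . }}
  OPTIONAL {{ ?item wdt:{TEMPORAL_RANGE_END} ?end . }}
  OPTIONAL {{ ?item wdt:{TAXON_RANK} ?rank . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


range_query = temporal_range_query()
print(range_query)
```

The property path `wdt:P523|wdt:P524` covers the "and/or" case, while the OPTIONAL blocks return whichever of the two values exists.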
  • 12:29 - 12:31
    So, these are the periods
    on the side here.
  • 12:32 - 12:33
    If I just close that a bit--
  • 12:33 - 12:37
    you can see that we have
    quite a lot of interesting stuff.
  • 12:37 - 12:40
    And there's the Tyrannosaurus
    that I was looking for.
  • 12:40 - 12:43
    So, I finally got that,
    and I was like, "Yes! I've done it!"
  • 12:43 - 12:46
    I've got that Triceratops
    in there for bonus.
  • 12:46 - 12:49
    But of course, still loads missing.
  • 12:49 - 12:51
    And I'd love to see lots more here.
  • 12:51 - 12:53
    But at least, it gives you the idea.
  • 12:53 - 12:56
    The nice thing is, here as well,
    if I star some of these,
  • 12:56 - 12:58
    you can see that
    the time range is shown.
  • 12:58 - 13:01
    So, you can start to do
    what I really wanted to do, is say,
  • 13:01 - 13:04
    "Okay, when did this one end,
    and when did the next one begin?
  • 13:04 - 13:06
    When did things start going extinct?"
  • 13:06 - 13:10
    So, I was pretty excited, but, still,
    really hoping for a lot more.
  • 13:10 - 13:12
    So, there's a lot of editing to be done
  • 13:12 - 13:15
    in terms of these large geological
    and cosmic time scales.
  • 13:16 - 13:19
    You can see on the color code,
    I can also do extinction period.
  • 13:19 - 13:23
    So, I say, I want to find out stuff
    that went extinct in the late Cretaceous.
  • 13:23 - 13:26
    And I now know that two things did that.
  • 13:26 - 13:28
    There's obviously quite a few more.
  • 13:28 - 13:30
    And I put the taxon rank
    in there, as well,
  • 13:30 - 13:32
    just so that we can also see,
  • 13:32 - 13:35
    "Okay, which, what
    is its species, genus, et cetera."
  • 13:35 - 13:37
    So, pretty exciting.
  • 13:37 - 13:41
    I was quite happy, but it's unfolding,
    what needs to be done a lot.
  • 13:42 - 13:45
    So I went to the next one, which was--
  • 13:45 - 13:48
    I was thinking, "Well, I can't find
    all the data I'm looking for.
  • 13:48 - 13:49
    Let's go a bit more general,
  • 13:49 - 13:54
    and just look for all of a certain kind
    of dates in Wikidata that I can find
  • 13:54 - 13:57
    that are over 10,000 years old, basically.
  • 13:58 - 14:01
    And what type of thing are they?"
  • 14:01 - 14:04
    So, this color code is relatively okay,
    but it might be a bit misleading,
  • 14:04 - 14:06
    because some things are multiple types.
  • 14:06 - 14:08
    So, therefore,
    it's a bit random, at times.
  • 14:08 - 14:11
    But, you get some really
    fascinating stuff in here.
  • 14:11 - 14:14
    I've got for a start--
    I've got all of the millennia
  • 14:14 - 14:18
    that we have in Wikidata,
    which is, you know, there you go.
  • 14:18 - 14:22
    Read about everything that happened
    in all these different millennia.
  • 14:22 - 14:24
    No pictures for any
    of these, unfortunately.
  • 14:24 - 14:27
    So, there's nothing to really say
    what happened in them.
  • 14:27 - 14:29
    Taxon, which we were just looking at,
    which kind of led me on
  • 14:29 - 14:31
    to the other queries.
  • 14:31 - 14:34
    And of course, that sort of
    like all of them in one group.
  • 14:34 - 14:37
    Interesting stuff.
    Archaeological cultures.
  • 14:37 - 14:40
    And this is like, okay,
    this is more like up my street.
  • 14:40 - 14:43
    This is the sort of things
    I want to learn about.
  • 14:43 - 14:45
    Again, pictures would be nice.
  • 14:45 - 14:49
    But it's really showing you
    something interesting.
  • 14:49 - 14:50
    And it's just worth exploring here.
  • 14:50 - 14:53
    And of course, there's some
    that really make me excited
  • 14:53 - 14:54
    for what we could be doing.
  • 14:54 - 14:57
    For example, there was
    something here which was--
  • 14:58 - 15:01
    I mean, system, actually,
    was quite an interesting one.
  • 15:02 - 15:04
    And sorry, that's not actually
    the one I was thinking about.
  • 15:04 - 15:06
    In fact, that means nothing to me at all.
  • 15:06 - 15:08
    Someone might know what that means.
  • 15:08 - 15:11
    Art movements,
    archaeological sites, activities.
  • 15:11 - 15:12
    There was only two of these,
  • 15:12 - 15:16
    but I really like the idea, because--
    and they're both the same.
  • 15:16 - 15:18
    They're both hunting.
  • 15:18 - 15:19
    And of course, there's two of them.
  • 15:19 - 15:22
    And the reason is, is because
    there's a little qualifier on there.
  • 15:22 - 15:25
    If we were to just
    look through, we can see--
  • 15:25 - 15:28
    we can see somewhere down here,
    will be the start time.
  • 15:28 - 15:31
    And the qualifier is talking about
    when Homo erectus did it,
  • 15:31 - 15:33
    and when Homo sapiens did it.
  • 15:33 - 15:36
    So that should be
    in brackets on the query,
  • 15:36 - 15:39
    a little extension to do to show you
    what the two different versions mean.
  • 15:39 - 15:42
    But I would love to see
    all of human skills in here.
  • 15:42 - 15:45
    When did we first do farming,
    when did we first this--
  • 15:45 - 15:46
    when did fire come about?
  • 15:46 - 15:48
    All of these things,
    when did we first extract iron?
  • 15:48 - 15:50
    When did we first--
    all of these wonderful things
  • 15:50 - 15:54
    that developed
    into the modern world that we live in.
  • 15:54 - 15:57
    So, really exciting signs
    of what could be there,
  • 15:57 - 15:58
    if it all got populated.
  • 15:58 - 16:00
    So, you know, this is what
    we really need to work on,
  • 16:00 - 16:02
    is some of this historical info.
  • 16:03 - 16:05
    Last one, I just wanted to just show you,
  • 16:05 - 16:07
    which was just an extra
    bonus one I threw in,
  • 16:07 - 16:11
    just to look at the time periods
    that we actually have,
  • 16:11 - 16:14
    the historical ages
    that we have in Wikidata.
  • 16:14 - 16:18
    And so, this is actually just all
    sub-classes of unit of time.
  • 16:18 - 16:22
    And then, this is the actual
    instance that it was.
  • 16:22 - 16:24
    And it's just really interesting.
  • 16:24 - 16:26
    This is more the kind of thing--
  • 16:27 - 16:30
    In Histropedia Mark II,
    these are the kind of things
  • 16:30 - 16:32
    that actually will be displayed
    more under the timeline
  • 16:32 - 16:34
    as a sort of a range or period.
  • 16:34 - 16:36
    And so, we are particularly interested
    in these periods
  • 16:36 - 16:38
    being really tight and nice,
  • 16:38 - 16:41
    because it helps you to, then,
    say what happened when,
  • 16:41 - 16:44
    and you can sound really clever
    when you talk about when things happened,
  • 16:44 - 16:47
    in the Neolithic or the upper
    Paleolithic, or whatever.
  • 16:47 - 16:49
    I'm still pretty clueless on most of it,
  • 16:49 - 16:52
    because I'm just kind of just waiting
    for the data to be up to scratch.
  • 16:52 - 16:55
    Great. I think I can actually
    round it up there.
  • 16:55 - 16:57
    Loads more exciting queries to come.
  • 16:57 - 17:00
    A lot more features and cool stuff,
    actually, just around the corner for us,
  • 17:00 - 17:03
    because we've just finished
    a lot of cool things.
  • 17:03 - 17:05
    But there's a little bit of time
    to pull it all together.
  • 17:05 - 17:07
    So, look out for more.
  • 17:07 - 17:10
    If there's any questions,
    I think I've got one minute.
  • 17:10 - 17:11
    So, it would have to be one.
  • 17:12 - 17:13
    (host) Yes, Nav.
    I forgot to introduce you.
  • 17:13 - 17:17
    I'm sorry. That's Nav, as he said,
    Histropedia, Evans. Thank you very much.
  • 17:17 - 17:18
    Thank you. Cheers. Yeah.
  • 17:18 - 17:19
    (host) Very fast questions.
  • 17:19 - 17:22
    Anyone with a very fast question
    [inaudible].
  • 17:25 - 17:29
    (woman 2) Very quickly, how can
    I do my own, if I want languages,
  • 17:29 - 17:31
    when do we start, for instance.
  • 17:31 - 17:32
    Absolutely. Good question.
  • 17:32 - 17:34
    So just click on the--
    oh, I've shared this.
  • 17:34 - 17:37
    It's called cosmic timelines on the URL.
  • 17:37 - 17:41
    Should be cosmic and geological,
    but then it's not a short URL anymore.
  • 17:41 - 17:44
    So, you click on this icon
    in the top corner there,
  • 17:44 - 17:47
    and then, you get to the query page,
    which is like the home page of this tool.
  • 17:47 - 17:49
    This is where the query is pasted in.
  • 17:49 - 17:51
    So, at the moment,
    I've got the language there.
  • 17:51 - 17:53
    If I want to change it to something else,
  • 17:53 - 17:56
    Arabic, or French, or whatever--
  • 17:56 - 17:58
    and here are the-- this is the area
  • 17:58 - 18:03
    where you sort of enter in exactly
    which variables in your query
  • 18:03 - 18:05
    you would like to do each thing.
  • 18:05 - 18:07
    If you put nothing in,
    it will try and figure it out.
  • 18:07 - 18:10
    But if you want advanced stuff--
    and really important, is the precision,
  • 18:10 - 18:13
    because that's not available
    on the query service timeline.
  • 18:13 - 18:14
    So, you get everything--
  • 18:14 - 18:16
    is the first of January
    10 billion years ago,
  • 18:16 - 18:18
    you know, which is not
    what we want to see.
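To illustrate why precision matters, here is a minimal sketch of turning a Wikidata time value plus its precision code into a display string. It assumes the standard Wikibase precision codes (0 for a billion years up to 9 for a year, 10 for month, 11 for day); the function name and the rounding rule are illustrative and are not Histropedia's implementation.

```python
import re

# Years covered by one unit at each Wikidata precision code from 0 to 9.
YEARS_PER_UNIT = {
    0: 1_000_000_000, 1: 100_000_000, 2: 10_000_000, 3: 1_000_000,
    4: 100_000, 5: 10_000, 6: 1_000, 7: 100, 8: 10, 9: 1,
}


def display_time(time_value: str, precision: int) -> str:
    """Format a Wikidata time string without showing more detail than is known."""
    match = re.match(r"^([+-]?)(\d+)-(\d{2})-(\d{2})T", time_value)
    if match is None:
        raise ValueError(f"unrecognised time value: {time_value!r}")
    sign, year_digits, month, day = match.groups()
    year = int(year_digits)
    era = "BCE" if sign == "-" else "CE"
    if precision >= 11:   # day precision: keep the full date
        return f"{year}-{month}-{day} {era}"
    if precision == 10:   # month precision: drop the day
        return f"{year}-{month} {era}"
    unit = YEARS_PER_UNIT[precision]
    rounded = (year + unit // 2) // unit * unit  # rough rounding to the unit
    year_text = f"{rounded:,}" if rounded >= 10_000 else str(rounded)
    return f"{year_text} {era}"


print(display_time("-3500000000-01-01T00:00:00Z", 1))
print(display_time("+2019-10-25T00:00:00Z", 11))
```

With a coarse precision the month and day are dropped, so an early form of life is shown as billions of years ago rather than as 1 January of that year.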
  • 18:18 - 18:21
    And the rank, which is quite interesting.
  • 18:21 - 18:24
    My timelines are all based
    on a very simple rank of site link count,
  • 18:24 - 18:27
    how many different articles there are,
    or something else.
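Ranking by site link count could be sketched as follows, assuming the `wikibase:sitelinks` predicate that the Wikidata Query Service exposes for each item. The function name and the class QID are hypothetical placeholders; this is not the query used in the demo.

```python
def ranked_query(class_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query that orders instances of a class by site link count."""
    return f"""
SELECT ?item ?sitelinks WHERE {{
  ?item wdt:P31 wd:{class_qid} ;
        wikibase:sitelinks ?sitelinks .
}}
ORDER BY DESC(?sitelinks)
LIMIT {limit}
""".strip()


rank_query = ranked_query("Q00000000")  # placeholder class QID
print(rank_query)
```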
  • 18:27 - 18:29
    But that's how you go
    and mess around with it yourself,
  • 18:29 - 18:32
    and you put your color codes
    and your filters in down here.
  • 18:32 - 18:34
    Comma separate them,
    if you would like more,
  • 18:34 - 18:36
    and they come up as options
    in the final tool.
  • 18:36 - 18:38
    And I think that
    pretty much is it, isn't it.
  • 18:38 - 18:40
    So, any other questions,
    do find me afterwards.
  • 18:40 - 18:42
    Always happy to get cornered
    for this stuff.
  • 18:42 - 18:43
    I love talking about it.
  • 18:43 - 18:45
    Okay. So, thank you very much. Cheers.
  • 18:45 - 18:47
    (applause)
  • 19:28 - 19:30
    (mumbles)
  • 19:30 - 19:32
    So, where is the first one?
  • 19:34 - 19:35
    This one, no.
  • 19:46 - 19:47
    This? Sorry.
  • 19:48 - 19:50
    Is it full screen?
  • 19:50 - 19:52
    Yep. Full screen.
  • 19:55 - 19:56
    Well, good work.
  • 19:58 - 19:59
    [Strike.]
  • 19:59 - 20:02
    Yeah, so, okay. Thank you.
  • 20:05 - 20:07
    So, hi, I'm Thibaud Senalada.
  • 20:07 - 20:09
    As [inaudible] introduced me.
  • 20:10 - 20:14
    I'm a software engineer
    at the French National Library.
  • 20:15 - 20:18
    And I'm here today
    to talk to you about NOEMI,
  • 20:19 - 20:24
    which is a software, a proof of concept,
  • 20:24 - 20:27
    and a [inaudible] software
  • 20:27 - 20:30
    to the French Library to cataloging.
  • 20:31 - 20:33
    Sorry. [inaudible].
  • 20:33 - 20:35
    Sorry for my English. It's a bit fuzzy.
  • 20:37 - 20:39
    And so, what's NOEMI?
  • 20:39 - 20:42
    So, NOEMI stands for:
  • 20:42 - 20:45
    Nouer les Oeuvres, Expressions,
    Manifestations et Items.
  • 20:45 - 20:47
    Which, in English, is:
  • 20:47 - 20:50
    to link works, expressions,
    manifestations, and items.
  • 20:51 - 20:58
    It's based on the FRBR,
  • 20:58 - 21:01
    and [inaudible].
  • 21:01 - 21:03
    Yeah. Anyway.
  • 21:04 - 21:05
    So, yeah.
  • 21:05 - 21:10
    So, this software,
    we use to produce metadata.
  • 21:11 - 21:12
    It will be used
  • 21:12 - 21:18
    by 600 people on a daily basis.
  • 21:19 - 21:24
    And as I say in the title,
    it will be based on Wikibase.
  • 21:25 - 21:32
    So, there is also a format manager.
  • 21:32 - 21:39
    So, people using this software
    will use like a code editor,
  • 21:39 - 21:42
    but for MARC format.
  • 21:42 - 21:45
    So, it's [inaudible], things like that.
  • 21:47 - 21:50
    A data processing tool, like I said.
  • 21:50 - 21:53
    And also, authorization management,
  • 21:54 - 21:56
    because they will need a--
  • 21:57 - 22:01
    if there is some data,
    where it can be modified.
  • 22:06 - 22:08
    So, the PoC context.
  • 22:09 - 22:13
    So, this software will be replacing
    an old software,
  • 22:13 - 22:16
    called ADCAT02.
  • 22:17 - 22:21
    It is part of the bibliographic
    transition.
  • 22:21 - 22:25
    So, I say the [inaudible].
  • 22:25 - 22:29
    [inaudible]. [inaudible] in English?
  • 22:30 - 22:32
    Format.
  • 22:33 - 22:36
    And it will be the [inaudible] of the--
  • 22:40 - 22:41
    Sorry.
  • 22:42 - 22:47
    It will be [inaudible]
    all the [inaudible]
  • 22:47 - 22:50
    of the BnF with data.
  • 22:52 - 22:54
    And so, doing this work,
  • 22:54 - 23:00
    we assessed Wikibase to see
    if it fits our needs.
  • 23:01 - 23:03
    And [inaudible] pretty good.
  • 23:04 - 23:07
    So, why Wikibase?
  • 23:07 - 23:09
    Because of the flexibility of the format.
  • 23:09 - 23:12
    We arrive--
  • 23:12 - 23:16
    to inject MARC, INTERMARC for BnF--
  • 23:17 - 23:18
    in the database.
  • 23:18 - 23:23
    And use it to-- use this link management
  • 23:23 - 23:26
    between entities using Blazegraph,
  • 23:26 - 23:28
    so, as Wikibase does.
  • 23:29 - 23:33
    We also choose Wikibase,
    because it was already--
  • 23:35 - 23:39
    it handles history and user account.
  • 23:40 - 23:42
    So, it's easiest for us.
  • 23:43 - 23:48
    And it also has a good--
    it's pretty easy to create bots
  • 23:48 - 23:51
    to watch and curate data
  • 23:52 - 23:53
    and also to make statistics.
  • 23:55 - 23:57
    It's free and open, and sustainable.
  • 23:58 - 23:59
    Yeah, so.
  • 24:00 - 24:03
    I'm sorry if you don't
    understand what I say,
  • 24:03 - 24:05
    because I know my English
    is not that good.
  • 24:08 - 24:12
    But during this PoC,
    we encountered some trouble.
  • 24:13 - 24:14
    Okay.
  • 24:15 - 24:21
    First of all, as a search engine,
    I think we have to create
  • 24:21 - 24:24
    another--
  • 24:24 - 24:29
    not another, a supplementary
    search engine to use it with,
  • 24:29 - 24:31
    to fit our needs.
  • 24:32 - 24:37
    Because we need some search
  • 24:37 - 24:42
    like faceted search and filters.
  • 24:44 - 24:48
    Also we have the [inaudible],
  • 24:48 - 24:50
    of using PostgreSQL database.
  • 24:50 - 24:55
    And for the moment,
    I think Wikibase [inaudible].
  • 24:56 - 25:01
    And when we try to use PostgreSQL,
    it was a bit difficult,
  • 25:01 - 25:04
    and will cause some issues.
  • 25:06 - 25:09
    And we have also some fear
    about performance,
  • 25:09 - 25:15
    because the catalog is about
    20 million entities,
  • 25:16 - 25:19
    20 million bibliographic entities.
  • 25:19 - 25:23
    That can be more
    than 20 million entities, actually.
  • 25:23 - 25:28
    And we don't know the time
    that we'll have to inject them
  • 25:28 - 25:31
    in the Wikibase, and how to do it.
  • 25:32 - 25:34
    So, [inaudible],
  • 25:34 - 25:40
    but the real software development
    has already started.
  • 25:43 - 25:46
    We start by creating
    an interface with Wikibase.
  • 25:46 - 25:48
    We're using Java.
  • 25:48 - 25:50
    Like PyWikibase.
  • 25:52 - 25:55
    - (man) Pywikibot.
    - Pywikibot. Yeah, thank you.
  • 25:56 - 25:58
    The same way, but in Java.
  • 25:59 - 26:03
    We also inject already the format
    into the Wikibase.
  • 26:04 - 26:09
    And we do something
    like the INTERMARC editor,
  • 26:09 - 26:12
    [inaudible], et cetera.
  • 26:14 - 26:15
    Thank you.
  • 26:15 - 26:17
    (applause)
  • 26:24 - 26:25
    Yeah.
  • 26:28 - 26:30
    (man 2) Faceted search
    will be a nice feature
  • 26:30 - 26:32
    in the Wikidata UI itself.
  • 26:32 - 26:34
    So, have you talked
    to any of the developers,
  • 26:34 - 26:36
    or is that something
    that could be done?
  • 26:36 - 26:37
    Sorry, I don't understand.
  • 26:37 - 26:39
    (man 2) The faceted search idea.
  • 26:40 - 26:42
    It would be nice to be able
    to search only humans,
  • 26:42 - 26:44
    or search only works, or something, right?
  • 26:44 - 26:48
    Yeah. I'm sorry, I don't-- I don't--
  • 26:48 - 26:50
    (man 2) Yeah, I mean, so,
    it would be nice if we had that
  • 26:50 - 26:52
    in Wikidata itself in the UI.
  • 26:53 - 26:54
    Yeah, yeah, yeah.
  • 26:54 - 26:56
    [inaudible]
  • 26:56 - 26:58
    Yeah, okay, thank you.
  • 26:58 - 27:00
    I'm sorry. (laughs)
  • 27:01 - 27:04
    Yeah, yeah. But I think we will--
  • 27:05 - 27:07
    I don't know if we want
    to do it inside Wikibase,
  • 27:07 - 27:11
    or in our next systems.
  • 27:11 - 27:15
    For the moment,
    we don't really solve that.
  • 27:16 - 27:18
    For the moment, I think.
  • 27:18 - 27:19
    Sorry.
  • 27:28 - 27:31
    (man 3) I suppose on the topic
    of the faceted search,
  • 27:33 - 27:35
    Wikidata, SPARQL query, Wikibase--
  • 27:35 - 27:39
    SPARQL query is, I think,
    functionally equivalent
  • 27:39 - 27:41
    to a facetable search.
  • 27:42 - 27:44
    So, it's mostly an interface issue, right?
  • 27:44 - 27:48
    I mean, you could build an interface
    that starts with a query,
  • 27:48 - 27:51
    and then, gives you
    possible facets to filter by.
  • 27:51 - 27:53
    And when you click one of them,
  • 27:53 - 27:55
    it adds a condition
    to the SPARQL query, right?
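The idea in this question can be sketched in a few lines: start from a base query pattern and append one triple pattern per facet the user clicks. This is a minimal sketch; the class and method names are hypothetical, and only "instance of" (P31) and "human" (Q5) are real Wikidata identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacetedQuery:
    """A base SPARQL pattern plus the facet conditions chosen so far."""
    base_pattern: str
    facets: tuple = ()

    def add_facet(self, prop: str, value: str) -> "FacetedQuery":
        # Each chosen facet becomes one more condition on ?item.
        return FacetedQuery(self.base_pattern, self.facets + ((prop, value),))

    def to_sparql(self) -> str:
        lines = [self.base_pattern]
        lines += [f"?item wdt:{prop} wd:{value} ." for prop, value in self.facets]
        body = "\n  ".join(lines)
        return f"SELECT DISTINCT ?item WHERE {{\n  {body}\n}}"


search = FacetedQuery('?item rdfs:label "Paris"@en .')
humans_only = search.add_facet("P31", "Q5")  # facet: instance of human
print(humans_only.to_sparql())
```

Each click returns a new query object, so removing a facet only means going back to the previous object.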
  • 27:56 - 27:58
    Yeah, but I think the SPARQL--
  • 27:59 - 28:04
    they don't go as detailed
    as we want, as we have--
  • 28:06 - 28:10
    When we inject the format,
    we use a statement for--
  • 28:11 - 28:13
    the format is like XML.
  • 28:13 - 28:16
    So, it's a zone, subzone, and value.
  • 28:16 - 28:20
    And in the [inaudible] statement,
    we add the subzone,
  • 28:21 - 28:23
    because the zone was already there.
  • 28:23 - 28:29
    And we want to query
    some qualifier on this.
  • 28:29 - 28:35
    And I don't know if the SPARQL
    goes through that-- I'm sorry--
  • 28:36 - 28:38
    in a fast way.
  • 28:40 - 28:46
    I think we need some index
    for us to [inaudible].
  • 28:47 - 28:48
    Yeah.
  • 28:48 - 28:50
    (man 3) SPARQL doesn't do a query--
  • 28:52 - 28:56
    To do proper string searches
    in SPARQL is very hard.
  • 28:56 - 28:58
    You have to have filters, which are slow,
  • 28:58 - 29:00
    and it really doesn't work that well.
  • 29:00 - 29:03
    So, it's a different
    search problem, really.
  • 29:07 - 29:09
    More questions? If anyone has one?
  • 29:12 - 29:14
    - Great. Thank you.
    - Thank you.
  • 29:14 - 29:16
    (applause)
  • 29:38 - 29:42
    (host) Nielsen speaking about
    the tool Ordia. Thank you.
  • 30:05 - 30:06
    So, I'm Finn Årup Nielsen,
  • 30:06 - 30:09
    and a couple of years ago,
    I started Scholia
  • 30:09 - 30:15
    that displays data from Wikidata
    via a SPARQL Query
  • 30:15 - 30:16
    to the Wikidata Query Service
  • 30:16 - 30:19
    so we can generate, for example,
    a list of publications
  • 30:19 - 30:20
    for a specific author.
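As a rough illustration of the kind of request described here, the sketch below builds a SPARQL query for works by one author and the matching Wikidata Query Service URL. It is not Scholia's actual code: the helper names are hypothetical, Q42 is only a placeholder item ID, and the query assumes authorship is recorded with P50 (author).

```python
import re
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def author_works_query(author_qid: str) -> str:
    """Build a query listing items whose author (P50) is the given item."""
    if not re.fullmatch(r"Q[1-9]\d*", author_qid):
        raise ValueError(f"not a Wikidata item ID: {author_qid!r}")
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  ?work wdt:P50 wd:{author_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


def wdqs_url(query: str) -> str:
    """Return a GET URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Q42 is used purely as a placeholder author ID.
print(wdqs_url(author_works_query("Q42")))
```

The sketch stops at building the URL; fetching and rendering the results is left out.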
  • 30:21 - 30:27
    Now, last year, Wikidata
    introduced lexicographic data.
  • 30:29 - 30:33
    And I [inaudible] the idea of Scholia
  • 30:33 - 30:39
    that is using Wikidata
    and the Wikidata Query Service
  • 30:39 - 30:42
    to generate overviews
    of lexicographic data.
  • 30:43 - 30:46
    So, Ordia is the example of this one here.
  • 30:46 - 30:52
    So, it generates-- it's a web application
    run from the Toolforge service,
  • 30:52 - 30:57
    and for example, it will dynamically
    generate a page such as--
  • 30:57 - 31:02
    This one here is statistics over
    what there is of lexicographic data
  • 31:02 - 31:04
    in Wikidata.
  • 31:04 - 31:07
    For example, the number of lexemes,
    is currently over 200,000.
  • 31:09 - 31:10
    So, there's a range of things
    you can do here.
  • 31:10 - 31:13
    You can, for example,
    look in the aspects of that.
  • 31:13 - 31:16
    The menu, there's quite a lot
    of things here.
  • 31:16 - 31:18
    And so, I will search
    on a specific Danish lexeme.
  • 31:20 - 31:23
    "Rød"-- which is "red" in Danish.
  • 31:23 - 31:27
    So, you basically get,
    for the specific lexeme,
  • 31:28 - 31:31
    the same type of information
    that you could see
  • 31:31 - 31:34
    in the ordinary part of Wikidata, here.
  • 31:34 - 31:38
    Annotations about the lexeme,
    annotation about the forms,
  • 31:39 - 31:41
    singular or plural forms.
  • 31:42 - 31:44
    Annotation about the senses.
  • 31:45 - 31:48
    But what you can't see
    in ordinary Wikidata
  • 31:48 - 31:52
    is sort of aggregating across lexemes.
  • 31:52 - 31:54
    And this is, for example, down here--
  • 31:54 - 31:56
    down here with the compound.
  • 31:56 - 31:58
    So, in Danish, like in German,
  • 31:58 - 32:00
    words can be compounded.
  • 32:00 - 32:03
    For example, for "red",
    we have rødkælk
  • 32:03 - 32:06
    which is compounded by two words.
  • 32:07 - 32:10
    And we've got, on the second one here,
    rødvin-- red wine.
  • 32:11 - 32:16
    This list here is constructed
    by a SPARQL Query to the Wikidata Service.
  • 32:17 - 32:20
    And also, further down here,
    we've got a lot of Danish words here.
  • 32:21 - 32:26
    Further down here, we should have
    a graph of the words
  • 32:27 - 32:29
    which are compounded from rød.
  • 32:30 - 32:32
    We have [rød]-- red here in the middle.
  • 32:32 - 32:34
    And for example, around--
    somewhere around here,
  • 32:34 - 32:37
    we should have,
    for example, "red cabbage,"
  • 32:37 - 32:40
    "red cabbage salad,"
    "red cabbage soup," and so on.
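The compound graph shown here can be thought of as a traversal from a root word to everything built on it. The sketch below is only an illustration over a small hand-made table, not Ordia's implementation (which, per the talk, uses SPARQL queries); the `COMPOUNDS` data and the `derived_from` helper are assumptions for the example.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

# Toy data (an assumption for illustration): compound -> its parts.
COMPOUNDS: Dict[str, List[str]] = {
    "rødvin": ["rød", "vin"],
    "rødkål": ["rød", "kål"],
    "rødkålssalat": ["rødkål", "salat"],
}


def derived_from(root: str, compounds: Dict[str, List[str]]) -> Set[str]:
    """Return every compound that contains `root`, directly or via another compound."""
    # Reverse index: part -> compounds built from it.
    used_in: Dict[str, List[str]] = defaultdict(list)
    for compound, parts in compounds.items():
        for part in parts:
            used_in[part].append(compound)

    # Breadth-first walk outward from the root word.
    seen: Set[str] = set()
    queue = deque([root])
    while queue:
        word = queue.popleft()
        for compound in used_in.get(word, []):
            if compound not in seen:
                seen.add(compound)
                queue.append(compound)
    return seen


print(sorted(derived_from("rød", COMPOUNDS)))
```

Here "rødkålssalat" is reached through "rødkål", which is the multi-step structure the graph makes visible.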
  • 32:40 - 32:43
    So you can browse around,
    in this one here, and see it.
  • 32:44 - 32:51
    We can go a bit back here,
    and then look on the main sense
  • 32:51 - 32:55
    of the word rød-- red in Danish.
  • 32:56 - 33:02
    So, Ordia automatically generates
    information about hyponyms.
  • 33:03 - 33:04
    Subconcepts, for example,
  • 33:04 - 33:07
    light red, dark red,
    pink, purple, and so on,
  • 33:08 - 33:14
    are in the-- when we make
    a Wikidata Query service, SPARQL Query.
  • 33:15 - 33:21
    Then we go around in the Wikidata graph,
  • 33:21 - 33:22
    and get this information here.
  • 33:22 - 33:25
    And we can also get translation
    automatically,
  • 33:25 - 33:28
    even though it's not necessarily stated
    within the Wikidata lexemes items.
  • 33:28 - 33:33
    For example, here, we have translated
    rød to "red" in English,
  • 33:33 - 33:36
    and röd in Swedish, and so on.
  • 33:36 - 33:38
    There's not that very many there.
  • 33:39 - 33:40
    There's a range of other things here.
  • 33:40 - 33:43
    Let me show you,
    for example, this one here--
  • 33:44 - 33:51
    this is veninde-- now I go
    over to this one here.
  • 33:54 - 33:57
    -inde, which is a feminine suffix.
  • 33:58 - 34:00
    So, this is auto-generated there,
  • 34:00 - 34:03
    it's a combination of "instance of"--
  • 34:03 - 34:07
    lexemes that are "instance of"
    feminine suffixes.
  • 34:08 - 34:12
    And for example, for German,
    we have [inaudible].
  • 34:12 - 34:15
    So, -in would be
    a feminine suffix in German.
  • 34:16 - 34:21
    And I put in sort of the five Danish
    feminine suffixes
  • 34:23 - 34:24
    of Danish.
  • 34:25 - 34:29
    Another facility is, for example,
    if you have a text,
  • 34:29 - 34:34
    you can copy and paste it
    into this Text to lexemes here.
  • 34:35 - 34:36
    Let me--
  • 34:37 - 34:41
    "a car crashed into...
  • 34:42 - 34:44
    a green house."
  • 34:46 - 34:49
    Let me change that to "English".
  • 34:49 - 34:50
    Press Submit.
  • 34:50 - 34:53
    Now, Ordia will then extract
    each of the words here,
  • 34:53 - 34:55
    in this sentence here,
  • 34:55 - 34:58
    and try to see whether they
    are entered in the specific form,
  • 34:58 - 35:01
    a lexeme, are entered in Wikidata.
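The lookup described here can be approximated as follows. This is a simplified sketch, assuming a local set of known forms instead of a live Wikidata lookup; the `match_forms` helper and the `KNOWN` set are hypothetical and are not how Ordia is actually written.

```python
import re
from typing import Iterable, List, Set, Tuple


def match_forms(text: str, known_forms: Iterable[str]) -> List[Tuple[str, bool]]:
    """Split text into words and flag each one as known or unknown."""
    known: Set[str] = {form.lower() for form in known_forms}
    words = re.findall(r"\w+", text.lower())
    return [(word, word in known) for word in words]


# Stand-in for the forms already entered in Wikidata (assumption).
KNOWN = {"a", "car", "crashed", "into", "green", "house"}

for word, found in match_forms("A vancar crashed into a green house.", KNOWN):
    print(word, "found" if found else "missing: could be created")
```

Words flagged as missing correspond to the link in the demo that offers to create a new lexeme.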
  • 35:01 - 35:04
    And these simple words here
    are entered in Wikidata.
  • 35:04 - 35:09
    But if we, for example, change it to--
    there's nothing called "vancar"
  • 35:09 - 35:14
    but just let us do that here.
  • 35:15 - 35:20
    And you got down here--
    it's as a blue link
  • 35:20 - 35:23
    that you can create a new
    Wikidata lexeme item.
  • 35:25 - 35:29
    But there's a range of other things to explore
  • 35:30 - 35:31
    in this web application.
  • 35:31 - 35:36
    And if there's any suggestions,
    or comments, or notes, or something,
  • 35:36 - 35:39
    you can contact me, or put in
    an issue on GitHub.
  • 35:39 - 35:45
    So, this particular application
    is developed on GitHub,
  • 35:45 - 35:51
    and I'm open for new ideas
    and ways to represent information there.
  • 35:51 - 35:53
    Okay, thank you.
  • 35:53 - 35:55
    (applause)
  • 35:59 - 36:01
    Questions?
  • 36:03 - 36:05
    (woman 3) I love your tool.
  • 36:05 - 36:10
    Can you show the languages,
    that which is awesome for me, I think,
  • 36:10 - 36:12
    to show other languages.
  • 36:12 - 36:15
    So, this is a bit of statistics
    over the languages,
  • 36:15 - 36:17
    and the Russians
    have been scraping Wiktionary,
  • 36:17 - 36:20
    and that's why they have now
    100,000 lexemes.
  • 36:24 - 36:28
    There's also a lot of work on Basque here.
  • 36:30 - 36:32
    I think there's an organization
    putting that information in here.
  • 36:32 - 36:35
    And you can also see a graph of these--
  • 36:35 - 36:38
    this is Number of forms as a function
    of number of lexemes.
  • 36:39 - 36:41
    And all the way up here--
  • 36:41 - 36:45
    here, this is Russian,
    down here, Basque, I think.
  • 36:45 - 36:48
    And English, perhaps, down here.
  • 36:49 - 36:51
    And also in the Number of senses,
  • 36:52 - 36:58
    I think Basque, English, and Russian,
  • 37:00 - 37:02
    Hebrew, and so on.
  • 37:02 - 37:03
    Yeah.
  • 37:11 - 37:13
    (man 4) That looks
    like an incredible tool.
  • 37:13 - 37:15
    But I was just wondering,
    is it all fully live?
  • 37:15 - 37:18
    Is it all based on SPARQL Queries
    and live or are there some things--
  • 37:18 - 37:20
    - Yes. I believe, yes.
    - Fantastic.
  • 37:21 - 37:25
    But as they get more data into Wikidata,
  • 37:25 - 37:26
    there's a bit of an issue.
  • 37:26 - 37:27
    For example, for Russian here.
  • 37:27 - 37:32
    I started out this a year ago
    when there's not that very many lexemes,
  • 37:32 - 37:36
    and so there was no problems
    with the time-outs.
  • 37:36 - 37:38
    But representing it here--
  • 37:38 - 37:42
    but if I press Russian,
    I think there might be some issues.
  • 37:42 - 37:44
    There's a count that works here,
  • 37:44 - 37:46
    for example, longest words or phrases.
  • 37:46 - 37:49
    But I think the lexemes
    are sort of loading in.
  • 37:49 - 37:53
    I think I'll need to fix that
    as Wikidata grows here.
  • 37:53 - 37:56
    As you see, there's a lot
    of Russian nouns, apparently.
  • 37:57 - 37:58
    And I don't know whether the--
  • 37:59 - 38:02
    apparently, that's what
    they're working on.
  • 38:02 - 38:04
    There seems also to be
    a bit of time-out there.
  • 38:07 - 38:08
    [inaudible], oh, yes.
  • 38:08 - 38:10
    The first one there.
  • 38:11 - 38:16
    But apparently, the longest words
    and phrases is a bit too expensive.
  • 38:18 - 38:20
    But apparently, it can be loaded there,
    and it's probably--
  • 38:21 - 38:23
    it's loaded all the 100,000 there,
  • 38:23 - 38:28
    so you can click all 10,000 pages.
  • 38:37 - 38:39
    (host) If there aren't
    any other questions--
  • 38:40 - 38:41
    The longest word came now.
  • 38:41 - 38:43
    So, it's, yeah.
  • 38:45 - 38:46
    Probably--
  • 38:48 - 38:50
    [inaudible]
  • 38:50 - 38:52
    What is that?
  • 38:52 - 38:54
    - (audience) It's a chemical.
    - A chemical, yes.
  • 38:56 - 38:58
    (host) More questions? Or shall we?
  • 39:00 - 39:02
    Alright, alright. Thank you very much.
  • 39:02 - 39:04
    (applause)
  • 39:24 - 39:25
    (Nicolas) Is it good?
  • 39:31 - 39:32
    (host) Awesome.
  • 39:35 - 39:38
    Alright, now, to wrap it up,
    we have Nicolas Vigneron,
  • 39:38 - 39:41
    talking about Wikisource and Wikidata.
  • 39:41 - 39:43
    (Nicolas) This is good?
  • 39:45 - 39:46
    Who knows Wikisource?
  • 39:48 - 39:49
    Yay!
  • 39:51 - 39:54
    More and more people
    raising hands every year.
  • 39:54 - 39:55
    That's good.
  • 39:55 - 40:01
    So, this morning, [Lydia] said that
    Wikivoyage was the first real user of--
  • 40:03 - 40:06
    [inaudible]
  • 40:07 - 40:08
    Wikisource is not that far behind.
  • 40:09 - 40:13
    There's a lot to do,
    and I want to do some basic numbers,
  • 40:13 - 40:17
    statistics, about where we are,
    and where I want to go.
  • 40:18 - 40:23
    So first, there will be a lot of questions
    of what is a book,
  • 40:23 - 40:25
    what is bibliographical data.
  • 40:25 - 40:27
    People from the BnF can agree with me.
  • 40:27 - 40:30
    That can be a nightmare
    if you go into details.
  • 40:30 - 40:36
    But some big numbers that--
    Google Books tried to do an estimation
  • 40:36 - 40:40
    on how many "books," air quote books,
    there is in the world,
  • 40:40 - 40:43
    and there's 130 million books
    in the world.
  • 40:44 - 40:47
    And, yeah, let's put them all on Wikidata.
  • 40:48 - 40:49
    Or not. I don't know.
  • 40:49 - 40:51
    But where are we now?
  • 40:51 - 40:52
    And why is it books?
  • 40:52 - 40:56
    Because for Google Books,
    everything is scanned, basically.
  • 40:56 - 40:59
    They don't have exactly
    a very clear distinction.
  • 40:59 - 41:04
    There's sometimes, two-page books,
    which [inaudible], Google Books is a book.
  • 41:05 - 41:10
    But for many people, you have to have
    at least 50 pages to be a book.
  • 41:11 - 41:12
    So, that's always hard to count.
  • 41:13 - 41:16
    But here's what we know on Wikidata.
  • 41:16 - 41:19
    This is the graph of what
    is a book for Wikidata.
  • 41:19 - 41:22
    You have-- that's totally [inaudible]--
  • 41:22 - 41:24
    but that's Wikidata,
    literary work as well.
  • 41:24 - 41:27
    And this is all the subclasses,
    or subclasses of subclasses--
  • 41:27 - 41:30
    or subclasses of subclasses
    of what is a book.
  • 41:31 - 41:33
    So, that's very hard to do.
  • 41:33 - 41:34
    I can do a graph like that,
  • 41:34 - 41:37
    but SPARQL Query engine doesn't work
  • 41:37 - 41:42
    if I want to count everything
    that is instance of these subclasses,
  • 41:42 - 41:45
    and basically, SPARQL says no, time-out.
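The kind of counting query that times out here might look like the one built below. This is a sketch under assumptions: it takes Q571 (book) as the root class and uses the usual P31 (instance of) and P279 (subclass of) properties; the `count_instances_query` helper is hypothetical and is not the speaker's exact query.

```python
def count_instances_query(class_qid: str = "Q571", include_subclasses: bool = True) -> str:
    """Build a SPARQL query counting items that are instances of a class.

    With include_subclasses=True the property path also follows
    'subclass of' (P279) any number of times, which is the expensive part.
    """
    path = "wdt:P31/wdt:P279*" if include_subclasses else "wdt:P31"
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        f"  ?item {path} wd:{class_qid} .\n"
        "}"
    )


# Full subclass tree (the variant that tends to be slow) versus direct instances only.
print(count_instances_query())
print(count_instances_query(include_subclasses=False))
```

Counting direct instances of one class at a time is cheaper, but it gives only a partial figure, which is the difficulty the speaker describes.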
  • 41:46 - 41:47
    So, what's the problem?
  • 41:47 - 41:51
    But I know already that there's
    a lot of subclasses,
  • 41:51 - 41:52
    but we need to look into it.
  • 41:52 - 41:58
    And probably, if you know Wikidata,
    on the page Wikidata:Statistics,
  • 41:59 - 42:03
    you have all the numbers by big classes,
  • 42:03 - 42:07
    and you all probably know
    that the big chunk here
  • 42:07 - 42:09
    is scholarly articles,
  • 42:09 - 42:13
    which is, thanks to
    the WikiCite project, in particular,
  • 42:14 - 42:17
    which can be books or not,
    depending on definition.
  • 42:19 - 42:23
    You see that there's no subclass books,
  • 42:23 - 42:26
    because there's not enough to show.
  • 42:26 - 42:28
    It's probably somewhere in the others,
  • 42:28 - 42:30
    the purple area is others.
  • 42:30 - 42:34
    And there's a lot of things
    that's under one percent.
  • 42:34 - 42:39
    So, basically, we can say
    that we have less than one percent
  • 42:39 - 42:42
    of things identified as books in Wikidata.
  • 42:43 - 42:46
    Maybe there is more books,
    but not identified as such.
  • 42:48 - 42:49
    I'm talking about books,
  • 42:49 - 42:52
    but when we are talking
    about bibliographical data,
  • 42:52 - 42:54
    there's also the author, person,
  • 42:54 - 42:58
    so maybe some of the humans here
    are also authors, surely.
  • 43:00 - 43:03
    And we need to do another count,
    which is another big query to do.
  • 43:04 - 43:05
    That times out, so--
  • 43:05 - 43:08
    I have a lot of not number
    to this, sorry.
  • 43:11 - 43:14
    So, yeah, basically, this first slide
    is about how it's complicated
  • 43:14 - 43:19
    to know how much we have of what,
    and how to count them.
  • 43:19 - 43:21
    So, yeah, hard to count.
  • 43:22 - 43:23
    What we know--
  • 43:24 - 43:27
    that is we have a lot of properties--
  • 43:27 - 43:30
    7,000, I guess,
  • 43:30 - 43:32
    now on Wikidata.
  • 43:33 - 43:36
    We know that we have a lot of identifiers
    among these properties.
  • 43:37 - 43:43
    And we know that almost 4,000
    are properties for identifiers
  • 43:43 - 43:46
    relative to bibliographical,
  • 43:46 - 43:50
    like ID at the National Library of France,
  • 43:50 - 43:52
    National Library of Yaddi, Yaddi, Yada,
  • 43:52 - 43:57
    because we love identifier
    of National Library on Wikidata.
  • 43:57 - 44:00
    So, we have almost all libraries,
    national libraries and more.
  • 44:01 - 44:04
    So, we have a lot of properties.
    I know that.
  • 44:05 - 44:07
    And we are widely used.
  • 44:07 - 44:10
    I know that, for instance,
    BnF properties use--
  • 44:11 - 44:13
    BnF is National Library of France--
  • 44:13 - 44:19
    is used 1 million times--
    OCLC, VIAF, or the big ones like that.
  • 44:21 - 44:24
    A lot of uses in Wikidata.
  • 44:25 - 44:29
    But it's not because we have
    a lot of uses of various properties
  • 44:29 - 44:31
    in Wikidata that it's complete.
  • 44:31 - 44:34
    As Thibaud said, there's more
    than 20 million books,
  • 44:34 - 44:37
    [inaudible], which is more as entities.
  • 44:38 - 44:40
    And we have only 1 million,
  • 44:40 - 44:44
    so we have 19 million still to do.
  • 44:45 - 44:47
    Also, what we know from the Wikidata side,
  • 44:47 - 44:52
    is that we have a good--
    very quite active Wikidata project,
  • 44:52 - 44:54
    called WikiProject Books,
  • 44:54 - 44:58
    where we have a model we kind of agree on,
  • 44:58 - 45:01
    which is not always followed,
    which is, again, a problem.
  • 45:01 - 45:03
    What is a book? You know it.
  • 45:03 - 45:05
    I only have five minutes,
    so, I'll keep going.
  • 45:06 - 45:09
    And then, I'm a Wikisourcean,
    so, Wikisourcer.
  • 45:09 - 45:12
    So, I wanted to know
    the other way around
  • 45:12 - 45:13
    what is from Wikisource already,
  • 45:13 - 45:16
    because Wikisource is already
    inside the Wikimedia project.
  • 45:16 - 45:20
    A lot of bibliographical records
    and information.
  • 45:20 - 45:23
    So, in the 66 million items on Wikidata,
  • 45:23 - 45:29
    already 1 million are linked
    to Wikisource.
  • 45:29 - 45:32
    [inaudible].
  • 45:32 - 45:36
    So, that's very few,
    but that's quite a lot.
  • 45:37 - 45:40
    There's a lot of author.
  • 45:40 - 45:45
    There's some books, texts,
    work, edition, whatever.
  • 45:45 - 45:48
    Not always well-arranged.
  • 45:49 - 45:51
    And there's a lot of internal pages,
  • 45:51 - 45:53
    like categories and templates,
    and things like that.
  • 45:53 - 45:55
    But still, 1 million in total.
  • 45:58 - 46:02
    The Wikisource community
    are often small communities,
  • 46:02 - 46:05
    like on the French community Wikisource,
  • 46:05 - 46:08
    which is one of the biggest,
    there's 50 people.
  • 46:08 - 46:09
    That's the biggest we have.
  • 46:09 - 46:13
    So, we love Wikidata, because,
    hey, they did a lot of work for us.
  • 46:13 - 46:15
    So, just take it from Wikisource.
  • 46:15 - 46:20
    So, in this small community,
    we love to reuse Wikidata data.
  • 46:21 - 46:24
    Right now, we use a lot of a tool
    which is called WEF--
  • 46:24 - 46:28
    Wikidata Edit Framework-- thank you.
  • 46:29 - 46:33
    And we are eager to see
    how Wikidata Bridge will work.
  • 46:33 - 46:37
    And we are trying to do things
    with a team in Wikidata
  • 46:38 - 46:41
    in the Wikimedia Deutschland team,
    [inaudible].
  • 46:41 - 46:44
    And there's a lot
    of collaboration in the future
  • 46:44 - 46:47
    that we want to do: better integrate,
  • 46:48 - 46:51
    do everything in one click when you import
    a first book in Wikisource,
  • 46:51 - 46:52
    things like that.
  • 46:53 - 46:58
    Better-- do links between
    edition in Wikidata.
  • 46:58 - 46:59
    That needs to be done.
  • 47:00 - 47:02
    The Foundation is doing the wish list now,
  • 47:02 - 47:05
    and we have a lot of requests about that.
  • 47:06 - 47:07
    And yeah, that's it.
  • 47:07 - 47:09
    That was just a short overview.
  • 47:09 - 47:15
    So, if you have some questions,
    I'll take them and be available later,
  • 47:16 - 47:17
    if you want to.
  • 47:18 - 47:20
    (applause)
  • 47:26 - 47:28
    Come on, you love Wikisource,
    you have questions!
  • 47:34 - 47:36
    (woman 4) I asked you
    already this in August,
  • 47:36 - 47:38
    and I wonder if this has already changed.
  • 47:38 - 47:42
    What is the biggest problem you have
    in Wikisource right now,
  • 47:42 - 47:44
    from your perspective?
  • 47:44 - 47:46
    The first one, only? (chuckles)
  • 47:48 - 47:54
    I think because it's a small community,
    we need efficient tools that work easily,
  • 47:54 - 47:57
    because we have very few people,
  • 47:57 - 47:59
    so we need tools that are easy to use
  • 47:59 - 48:04
    and a one-click solution
    to [inaudible] a bit,
  • 48:04 - 48:06
    that's a big dream.
  • 48:06 - 48:07
    I think that's what's most important,
  • 48:07 - 48:10
    because that's the threshold
    in Wikisource, it's a small community.
  • 48:11 - 48:13
    I think this is the most important.
  • 48:15 - 48:16
    [inaudible]
  • 48:17 - 48:20
    (man 5) I'm curious if you can speak
    to your opinion,
  • 48:20 - 48:23
    or the French Wikisource opinion,
    or maybe you spoke to other communities
  • 48:23 - 48:30
    about the notion of not including
    metadata about all the world's books.
  • 48:30 - 48:32
    That was mentioned in the morning.
  • 48:32 - 48:35
    Maybe other Wikibases,
    and other federated databases
  • 48:35 - 48:38
    will have that information,
    and Wikidata won't.
  • 48:39 - 48:41
    How does that feel for Wikisource?
  • 48:44 - 48:46
    This is my very personal opinion.
  • 48:46 - 48:47
    I know that people
    in the Wikisource community
  • 48:47 - 48:49
    disagree with that.
  • 48:49 - 48:51
    But I think we need to stay--
  • 48:51 - 48:53
    an external Wikibase
    is not a good solution,
  • 48:53 - 48:55
    because we have Shakespeare on Wikisource,
  • 48:55 - 48:58
    and we have Shakespeare on Wikipedia.
  • 48:59 - 49:01
    So, we need to interlink,
    and interlink is there.
  • 49:01 - 49:04
    Or like, Romeo and Juliet,
    we have them both.
  • 49:04 - 49:07
    So, we are still pretty close
    to Wikipedia.
  • 49:07 - 49:09
    And the difference with WikiCites--
  • 49:09 - 49:13
    with WikiCite, we have a lot of items
    which are small.
  • 49:14 - 49:16
    Wikisource is the other way around.
  • 49:16 - 49:18
    We have few items, which are big.
  • 49:18 - 49:21
    Which can be a scaling problem
    and everything,
  • 49:21 - 49:24
    but it's quite a small subset of data.
  • 49:24 - 49:28
    So, my personal opinion
    is we should stay in the Wikidata.
  • 49:28 - 49:32
    Again, because we are not
    very much a lot of people,
  • 49:32 - 49:34
    so we need to stay,
    with the tool we know,
  • 49:34 - 49:36
    don't change too much the tools
  • 49:36 - 49:38
    for the small community, please.
  • 49:38 - 49:39
    So, that's it.
  • 49:39 - 49:41
    But I know that other people disagree.
  • 49:41 - 49:45
    You can talk to [Sadeep] if you want.
    He will have another point of view.
  • 49:46 - 49:49
    Thank you. I think, last question, maybe.
  • 49:51 - 49:54
    (man 6) Sometimes, I find it difficult
    to link the Wikidata item
  • 49:54 - 50:01
    with a Wikisource article,
    because there's a Wikisource novel--
  • 50:01 - 50:06
    might be split over several pages,
    and there's an index page,
  • 50:06 - 50:09
    and there's perhaps a front page,
    or something like that.
  • 50:09 - 50:12
    Do you have that problem,
    or is that a general problem, or--
  • 50:12 - 50:17
    Yeah, that's one of the first ideas
    on the wish list
  • 50:17 - 50:19
    for the Foundation, actually.
  • 50:19 - 50:21
    Yeah, because Wikipedia is on the--
  • 50:21 - 50:23
    if you know the [inaudible] organization,
  • 50:23 - 50:27
    Wikipedia is on the work level,
    and Wikisource on the edition level.
  • 50:27 - 50:29
    So, already, you have a problem there.
  • 50:29 - 50:31
    And then, we have several editions
    of the same work,
  • 50:31 - 50:34
    and we have sub-chapters
    and things inside the edition.
  • 50:34 - 50:41
    So, yeah, that's a one-to-many problem
    which is hard to solve by nature.
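The work, edition, and page levels mentioned in this answer can be pictured with a small data model. This is only an illustrative sketch: the class names, labels, and page titles are invented, and the comment about P629 and P747 reflects how Wikidata commonly links editions to works rather than anything stated in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Edition:
    """One published edition; on Wikisource it may span many pages."""
    label: str
    wikisource_pages: List[str] = field(default_factory=list)


@dataclass
class Work:
    """The abstract work; a Wikipedia article usually sits at this level."""
    label: str
    # On Wikidata this link is commonly expressed with P747 (has edition or
    # translation) and its inverse P629 (edition or translation of).
    editions: List[Edition] = field(default_factory=list)

    def all_pages(self) -> List[str]:
        """Flatten every Wikisource page across all editions of the work."""
        return [page for edition in self.editions for page in edition.wikisource_pages]


# Invented example data: one work, two editions, three pages.
novel = Work(
    "Example Novel",
    [
        Edition("1850 edition", ["Example Novel (1850)", "Example Novel (1850)/Chapter 1"]),
        Edition("1900 edition", ["Example Novel (1900)"]),
    ],
)
print(len(novel.editions), "editions,", len(novel.all_pages()), "pages")
```

A single sitelink has to pick one target out of all of these, which is where the difficulty raised in the question comes from.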
  • 50:42 - 50:45
    But there's maybe a tool
    that can help to solve that.
  • 50:46 - 50:47
    Hopefully.
  • 50:49 - 50:51
    And that's time, ladies and gentlemen.
  • 50:51 - 50:53
    So, thank you very much, Nicolas.
  • 50:53 - 50:55
    (applause)
  • 50:59 - 51:01
    And please join me giving
    one more round of applause
  • 51:01 - 51:03
    to all of our wonderful speakers.
  • 51:03 - 51:05
    (applause)
Title:
cdn.media.ccc.de/.../wikidatacon2019-18-eng-Lightning_talks_1_hd.mp4
Video Language:
English
Duration:
51:14
