cdn.media.ccc.de/.../wikidatacon2019-3-eng-Glimpse_over_Wikidata_hd.mp4

  • 0:06 - 0:09
    Hello, everyone.
  • 0:09 - 0:12
    It's awesome that you're all here,
    so many of you.
  • 0:12 - 0:13
    It's really, really great.
  • 0:15 - 0:20
    So Lea already talked a lot
    about this event,
  • 0:20 - 0:23
    and I'm going to talk a bit
    about Wikidata itself
  • 0:23 - 0:26
    and what has been happening
    around it over the last year
  • 0:26 - 0:28
    and where we are going.
  • 0:29 - 0:33
    So... what is this? Sorry.
  • 0:40 - 0:44
    So... where are we?
    Where are we going?
  • 0:45 - 0:50
    Over the last year there has been
    so much to celebrate
  • 0:50 - 0:52
    and I want to highlight some of that
  • 0:52 - 0:55
    because sometimes it goes unnoticed.
  • 0:57 - 1:04
    And first I want to take you through
    some statistics around editors
  • 1:04 - 1:07
    and our content and how our data is used.
  • 1:10 - 1:15
    Over the last year,
    we have grown our community
  • 1:15 - 1:17
    which is amazing.
  • 1:17 - 1:21
    We have around 3,000 new people
  • 1:21 - 1:26
    who edit once or more in 30 days.
  • 1:26 - 1:30
    So that's 3,000 new Wikidatans, yay!
  • 1:32 - 1:37
    Now if you look at people who do more,
    like five edits in 30 days,
  • 1:37 - 1:41
    we've got an additional 1,200 roughly.
  • 1:41 - 1:44
    And if you look
    at the people who do 100 edits or more--
  • 1:44 - 1:47
    I hope many of you in this room--
  • 1:47 - 1:49
    we have 300 more.
  • 1:49 - 1:51
    Raise your hand
    if you're in this last group.
  • 1:53 - 1:56
    Woot! You're awesome!
  • 1:58 - 2:04
    And while the number of edits
    is usually not something
  • 2:04 - 2:09
    we pay a lot of attention to,
  • 2:09 - 2:13
    we did cross
    the 1 billion edits mark this year.
  • 2:13 - 2:15
    (applause)
  • 2:21 - 2:23
    Alright, let's look at content.
  • 2:28 - 2:31
    So, we're now at 65 million items,
  • 2:31 - 2:34
    so entities to describe the world,
  • 2:34 - 2:41
    and we're doing this
    with around 6,700 properties.
  • 2:44 - 2:48
    Of those, around 4,300
    are external identifiers,
  • 2:48 - 2:53
    which gives us a lot of linking
    to other catalogues, databases,
  • 2:53 - 2:56
    websites and more
  • 2:56 - 2:59
    and really makes Wikidata
    the central place
  • 2:59 - 3:02
    in a linked open data web.
  • 3:02 - 3:07
    So using those properties and items,
  • 3:07 - 3:12
    we have around 800 million statements now,
  • 3:12 - 3:16
    and compared to last year,
    we know about half a statement more
  • 3:16 - 3:18
    about every single item.
  • 3:19 - 3:20
    (laughter)
  • 3:23 - 3:25
    So, yeah, Wikidata got smarter.
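
As a rough check on the figures quoted here, the average number of statements per item can be worked out directly. This is a minimal sketch that uses only the rounded numbers from the talk (65 million items, around 800 million statements). The previous-year value is inferred from the "half a statement more" remark and is an assumption, not a reported statistic.

```python
# Rounded figures quoted in the talk.
ITEMS = 65_000_000
STATEMENTS = 800_000_000


def statements_per_item(statements: int, items: int) -> float:
    """Average number of statements attached to each item."""
    return statements / items


current_avg = statements_per_item(STATEMENTS, ITEMS)

# Assumption: "half a statement more" means the average rose by about 0.5.
implied_previous_avg = current_avg - 0.5

print(f"about {current_avg:.1f} statements per item now")
print(f"about {implied_previous_avg:.1f} a year earlier (inferred)")
```

Because both inputs are rounded, the result is only good to roughly one decimal place.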
  • 3:27 - 3:30
    But we don't just have items
    and properties,
  • 3:30 - 3:34
    we also have new stuff
    like lexemes
  • 3:34 - 3:40
    and we are now at 204,000 lexemes
    that describe words
  • 3:40 - 3:42
    in many different languages.
  • 3:42 - 3:43
    It's very cool.
  • 3:44 - 3:48
    I will talk more about this
    in a session later today.
  • 3:49 - 3:53
    Last, the latest addition
    are entity schemas
  • 3:53 - 3:59
    that help us figure out
    how to consistently model data
  • 3:59 - 4:01
    across a certain area.
  • 4:02 - 4:04
    And of those, we have around 140 now.
  • 4:08 - 4:11
    Now numbers aren't everything
    around content, right,
  • 4:11 - 4:15
    amount of content--we also care
    about quality of the content.
  • 4:16 - 4:22
    And what we've done now is
    we've trained a machine learning system
  • 4:22 - 4:25
    to judge the quality of an item.
  • 4:26 - 4:30
    Now this is far from perfect,
    but it gives you an idea.
  • 4:30 - 4:35
    So every item in Wikidata gets a score
    between 1 and 5.
  • 4:35 - 4:38
    One is pretty terrible; five is amazing.
  • 4:38 - 4:42
    And it looks at things
    like how many statements does it have,
  • 4:42 - 4:44
    how many external identifiers
    does it have,
  • 4:44 - 4:46
    how many references are there,
  • 4:46 - 4:49
    how many different labels are there
    in different languages,
  • 4:49 - 4:51
    and so on.
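
To make the idea of a feature-based score concrete, here is a toy sketch. It is not the trained machine learning model described in the talk: the four inputs are the signals mentioned above, but the caps and the equal weighting are hypothetical values chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    """Counts for one item; the field names are illustrative."""
    statements: int
    external_ids: int
    references: int
    labels: int


# Hypothetical caps: counts above these values earn no extra credit.
CAPS = {"statements": 20, "external_ids": 5, "references": 20, "labels": 10}


def toy_quality_score(stats: ItemStats) -> int:
    """Map the four counts to a score from 1 (poor) to 5 (excellent)."""
    points = 0.0
    for name, cap in CAPS.items():
        # Each signal contributes between 0 and 1 point.
        points += min(getattr(stats, name), cap) / cap
    # points lies in [0, 4]; shift it onto the 1..5 scale.
    return 1 + round(points)


print(toy_quality_score(ItemStats(0, 0, 0, 0)))
print(toy_quality_score(ItemStats(20, 5, 0, 0)))
print(toy_quality_score(ItemStats(50, 8, 40, 30)))
```

A trained model learns its weighting from human-labeled examples rather than from fixed caps like these.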
  • 4:51 - 4:55
    And then we looked at Wikidata over time,
  • 4:55 - 5:00
    and as you can see,
    based on these measures,
  • 5:00 - 5:04
    we went from pretty terrible
    to much better.
  • 5:04 - 5:05
    (laughter)
  • 5:06 - 5:07
    So that's good.
  • 5:07 - 5:12
    But what you can also see,
    there's still a lot of room to 5.
  • 5:14 - 5:20
    Now I don't think
    this is where we will get to, right?
  • 5:20 - 5:23
    Not every item will be absolutely perfect
  • 5:23 - 5:26
    according to these measures
    that we have taken.
  • 5:26 - 5:31
    But I'm really happy to see
    that consistently the quality of our data
  • 5:31 - 5:32
    is getting better and better.
  • 5:37 - 5:43
    Okay, but creating that data isn't enough.
  • 5:44 - 5:47
    We want this--we do this for a reason.
  • 5:47 - 5:49
    We want it to be used.
  • 5:49 - 5:55
    And now we looked at how many articles
  • 5:55 - 6:01
    on each of the other Wikimedia projects
    use data from Wikidata,
  • 6:02 - 6:07
    and we looked at the percentage
    of all articles on those projects.
  • 6:07 - 6:10
    Now if you look across all of Wikimedia
  • 6:10 - 6:12
    and all of the articles there,
  • 6:12 - 6:19
    then 56.35% of them today
    make use of some data from Wikidata.
  • 6:20 - 6:22
    Which I think is pretty good,
  • 6:22 - 6:27
    but of course,
    there's still a lot of room to 100.
  • 6:29 - 6:34
    And then I looked at which projects
    are actually making most use
  • 6:34 - 6:36
    of Wikidata's data,
  • 6:36 - 6:39
    and I split this
    by language versions and so on.
  • 6:40 - 6:45
    And now what do you think
    the top five projects--
  • 6:46 - 6:48
    which ones are all of them?
  • 6:48 - 6:51
    Which project family do they belong to?
  • 6:51 - 6:53
    (several in audience) Commons.
  • 6:53 - 6:57
    Okay, that's pretty uniformly Commons.
  • 6:57 - 6:59
    You would actually be wrong.
  • 6:59 - 7:02
    All of the top five are Wikivoyage.
  • 7:02 - 7:04
    (audience) Oh!
  • 7:04 - 7:05
    (laughter)
  • 7:05 - 7:08
    So yeah, applause to Wikivoyage.
  • 7:09 - 7:11
    (applause)
  • 7:17 - 7:20
    If you would like to check
    where Commons actually is
  • 7:20 - 7:22
    and where all of your other projects are,
  • 7:22 - 7:24
    there is a dashboard.
  • 7:24 - 7:25
    Come to me and we can check it out.
  • 7:28 - 7:32
    Of course, inside Wikimedia is
    not the only place where our data is used.
  • 7:32 - 7:35
    It's also used outside,
    and so much has happened.
  • 7:35 - 7:39
    I can't begin to mention it all,
    but to highlight some
  • 7:40 - 7:44
    there are great uses of our data
    at the Met, at the Wellcome Trust,
  • 7:44 - 7:46
    at the Library of Congress,
  • 7:46 - 7:48
    in GeneWiki and so many more.
  • 7:48 - 7:51
    And if you go through some of the sessions
    later in the program,
  • 7:51 - 7:53
    you will hear about some of them.
  • 7:57 - 8:00
    Alright, enough statistics.
  • 8:00 - 8:02
    Let's look at some other highlights.
  • 8:03 - 8:07
    So we already talked
    about data quality improving,
  • 8:07 - 8:11
    and when you look at data quality,
    there are a lot of dimensions
  • 8:11 - 8:16
    that you can look at,
    and we've improved on some of those,
  • 8:16 - 8:19
    like how accurate is the data,
  • 8:19 - 8:21
    how trustworthy is the data,
  • 8:21 - 8:23
    how referenced is it,
  • 8:23 - 8:25
    how consistently is it modeled,
  • 8:26 - 8:29
    how complete is it and so on.
  • 8:31 - 8:36
    Just to pick out one--
    for consistency for example,
  • 8:36 - 8:42
    we have created the ability to store
    entity schemas now in Wikidata
  • 8:42 - 8:47
    so that you can describe
    how certain domains should be modeled.
  • 8:47 - 8:49
    So you can find--
  • 8:50 - 8:54
    you can create an entity schema,
    say, for Dutch painters,
  • 8:54 - 8:56
    and then you can look how--
  • 8:56 - 8:59
    which items that are for Dutch painters
  • 8:59 - 9:02
    do not, for example,
    have a date of birth but should
  • 9:02 - 9:05
    and similar things like that.
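
The kind of check described here can be sketched in a few lines. Entity schemas on Wikidata are written in ShEx (Shape Expressions); the Python below is only a simplified stand-in that tests one rule, "the item must have a date of birth". P569 is Wikidata's date-of-birth property; the item IDs and dates are hypothetical sample data.

```python
# Property IDs that every item in the domain is expected to have.
# P569 is Wikidata's "date of birth" property.
REQUIRED_PROPERTIES = {"P569": "date of birth"}

# Hypothetical, hand-written sample data: item ID -> property ID -> values.
sample_items = {
    "Q_EXAMPLE_1": {"P569": ["1600-01-01"]},
    "Q_EXAMPLE_2": {},
}


def missing_properties(item_claims, required=REQUIRED_PROPERTIES):
    """Return the required property IDs that have no value on an item."""
    return [pid for pid in required if not item_claims.get(pid)]


def items_failing_schema(items):
    """Map each non-conforming item ID to its missing property IDs."""
    report = {}
    for item_id, claims in items.items():
        missing = missing_properties(claims)
        if missing:
            report[item_id] = missing
    return report


print(items_failing_schema(sample_items))
```

A real schema can also constrain value types and cardinality, which this sketch leaves out.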
  • 9:06 - 9:10
    And I hope that a lot more
    WikiProjects and so on
  • 9:10 - 9:13
    will be able to make use
    of entity schemas to take good care
  • 9:13 - 9:16
    of their data, and if you want
    to learn how to do that,
  • 9:16 - 9:18
    there's a session later
    in the program as well
  • 9:18 - 9:23
    by people who know all about this
    and will make this less
  • 9:23 - 9:25
    of a black box for you.
  • 9:28 - 9:29
    Alright.
  • 9:31 - 9:35
    Another thing that really got traction
  • 9:35 - 9:38
    over the last year
    is the Wikibase ecosystem, right?
  • 9:38 - 9:44
    This idea that not all open data
    should and has to happen
  • 9:44 - 9:47
    in Wikidata, but instead, we want
    a thriving ecosystem
  • 9:47 - 9:51
    of different places, of different actors,
  • 9:51 - 9:54
    like institutions, companies,
  • 9:54 - 9:57
    volunteer projects opening up their data
    in a similar way
  • 9:57 - 10:00
    that Wikidata does it
    and then connecting all of it,
  • 10:00 - 10:03
    exchanging data between those,
    linking that data.
  • 10:04 - 10:09
    And over the last year,
    the interest in that
  • 10:09 - 10:12
    and the interest in institutions
    and people running
  • 10:12 - 10:15
    their own Wikibase instance
    has really exploded,
  • 10:15 - 10:20
    and especially in the sector
    of libraries.
  • 10:23 - 10:26
    There's a lot of testing, evaluating,
  • 10:26 - 10:29
    and to be honest, trailblazing,
  • 10:29 - 10:34
    going on there at the moment
    where adventurous institutions
  • 10:34 - 10:39
    work with us to really figure out
    how Wikibase can work
  • 10:39 - 10:42
    for their collections,
    for their catalogues and so on.
  • 10:43 - 10:45
    Among them, the German National Library,
  • 10:45 - 10:46
    the French National Library,
  • 10:46 - 10:49
    OCLC, and it's really exciting to see.
  • 10:55 - 10:57
    One of the reasons
    why I think this is so exciting
  • 10:57 - 11:03
    is that we are helping these institutions
    open up data in a way that is
  • 11:03 - 11:08
    not just putting it on a website
    and someone can access it
  • 11:08 - 11:12
    but really thinking about this--
    the next step after that, right?
  • 11:12 - 11:15
    Letting people help you maintain
    that data, augment that data,
  • 11:15 - 11:20
    enrich it, and that's really a shift
  • 11:20 - 11:25
    that I hope will bring good things.
  • 11:26 - 11:28
    And the other thing it helps us with
  • 11:28 - 11:31
    is that it lets experts curate the data
  • 11:31 - 11:37
    in their space, keep it in good shape
    so that we can then set up
  • 11:37 - 11:42
    synchronizing processes
    to Wikidata, for example,
  • 11:42 - 11:46
    instead of having to take care of it
    ourselves all the time.
  • 11:47 - 11:50
    And at the end of the day,
    I hope it will take some pressure
  • 11:50 - 11:54
    off of Wikidata to be that place
    where everything has to go.
  • 11:58 - 12:00
    Lexicographical data--
  • 12:02 - 12:07
    Over the last year,
    people started describing words
  • 12:07 - 12:12
    in their language in Wikidata
    so that we can build things
  • 12:12 - 12:15
    like automated translation tools,
  • 12:16 - 12:21
    and we are at the point
    where in some languages
  • 12:21 - 12:26
    we are starting to get nearer
    to reaching that critical mass
  • 12:26 - 12:29
    that is needed to actually
    build a serious application.
  • 12:30 - 12:33
    In a lot of languages,
    we still have a long way to go,
  • 12:33 - 12:35
    but in some,
    we're really starting to get there,
  • 12:35 - 12:37
    and that's really great to see.
  • 12:39 - 12:41
    If you want to know more about this,
    come to my session later today.
  • 12:46 - 12:49
    And, of course, not to forget,
  • 12:49 - 12:51
    structured data on Commons.
  • 12:51 - 12:52
    (audience member whistles)
  • 12:52 - 12:54
    Yes! (laughs)
  • 12:54 - 12:56
    (applause)
  • 12:59 - 13:02
    The Structured Data on Commons
    team at the Foundation
  • 13:02 - 13:06
    has really gotten...
  • 13:07 - 13:11
    everything together and made it possible
  • 13:11 - 13:15
    to add statements to files
    on Commons over the last year,
  • 13:16 - 13:19
    and people are starting to add
    those statements to images
  • 13:19 - 13:23
    to then make them easier to find,
    to build better applications on top of them,
  • 13:23 - 13:24
    and so much more.
  • 13:24 - 13:27
    It's really exciting to see how
    that is growing,
  • 13:27 - 13:30
    and I think what's really important
  • 13:30 - 13:33
    for the Wikidata community
    to understand here
  • 13:33 - 13:37
    is that when you see "depicts"
  • 13:37 - 13:42
    or "house cat" or "sitting," "lizard"
    and "wall" here,
  • 13:42 - 13:45
    those are links to Wikidata items
    and properties.
  • 13:45 - 13:50
    That means when we create items
    and properties,
  • 13:50 - 13:54
    those are no longer just providing
    the vocabulary for Wikidata itself.
  • 13:54 - 13:58
    They are providing the vocabulary
    for Commons as well.
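
A small sketch may help show why this matters. A statement on a Commons file stores only identifiers, and the readable text comes from Wikidata's labels. P180 ("depicts") and Q146 ("house cat") are real Wikidata IDs; the label table and the `Statement` type are hypothetical stand-ins for illustration.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One statement on a file: a property ID and an item ID."""
    property_id: str
    value_id: str


# Hypothetical local copy of labels that would normally come from Wikidata.
LABELS = {"P180": "depicts", "Q146": "house cat"}

file_statements = [Statement("P180", "Q146")]


def render(statement: Statement, labels=LABELS) -> str:
    """Show a statement using labels, falling back to the raw IDs."""
    prop = labels.get(statement.property_id, statement.property_id)
    value = labels.get(statement.value_id, statement.value_id)
    return f"{prop}: {value}"


for statement in file_statements:
    print(render(statement))
```

Since the file keeps only the IDs, any change to an item or property on Wikidata changes what every file that uses it appears to say.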
  • 13:58 - 14:01
    And this will only get more and more so,
  • 14:01 - 14:03
    so we have to pay a lot more attention
  • 14:03 - 14:07
    to how our ontology, our vocabulary
  • 14:07 - 14:10
    is actually used in other places
    than we had before.
  • 14:14 - 14:20
    And the last one I have is that
    we've started building stronger bridges
  • 14:20 - 14:22
    to the other Wikimedia projects.
  • 14:23 - 14:26
    My team and I are working
    on a project called the Wikidata Bridge,
  • 14:26 - 14:29
    and you should totally come
    to the UX booth
  • 14:29 - 14:33
    and do some testing of the current state
  • 14:33 - 14:36
    that will have
    for example Wikipedia editors
  • 14:36 - 14:39
    edit Wikidata directly
    from their projects
  • 14:39 - 14:41
    without having to go to Wikidata
  • 14:41 - 14:44
    and having to understand
    everything around it.
  • 14:44 - 14:51
    I hope that this will take away
    one more hurdle that makes it difficult
  • 14:51 - 14:54
    for Wikimedia projects
    to adopt more data from Wikidata.
  • 14:57 - 15:01
    Alright, now to strategy
    and where are we going?
  • 15:03 - 15:07
    Since December, the Wikidata team
    at Wikimedia Deutschland,
  • 15:07 - 15:12
    and people from the Wikimedia Foundation
    have been working on strategy
  • 15:12 - 15:15
    papers around Wikidata.
  • 15:15 - 15:16
    It's basically writing down
  • 15:16 - 15:20
    what a lot of us have been
    talking about already
  • 15:20 - 15:23
    over the last four or five years.
  • 15:24 - 15:29
    And I don't know if all of you
    have read those papers.
  • 15:29 - 15:34
    They're published on Meta, open for comments
    until the end of the month.
  • 15:34 - 15:36
    It would be great
    if you haven't read them,
  • 15:36 - 15:39
    go read them,
    leave your comments and so on.
  • 15:40 - 15:44
    Now the very quick overview
    of what is in there
  • 15:44 - 15:51
    is that we think about Wikidata
    and Wikibase in three pieces.
  • 15:52 - 15:55
    The first one is Wikidata as a platform.
  • 15:55 - 15:57
    You can see it in the lower corner,
  • 15:57 - 16:04
    and that is really around
    Wikidata enables every person
  • 16:04 - 16:06
    to access and share information
  • 16:06 - 16:09
    regardless of their language
    and technology,
  • 16:09 - 16:14
    and we do that by providing
    general purpose data about the world.
  • 16:14 - 16:18
    So basically what you do every day.
  • 16:21 - 16:25
    The second thing is
    the Wikibase ecosystem part
  • 16:25 - 16:30
    where Wikibase, the software
    running Wikidata, powers
  • 16:30 - 16:35
    not just Wikidata, but a thriving
    open data web that is the backbone
  • 16:35 - 16:37
    of free and open knowledge.
  • 16:38 - 16:43
    And the third and last thing
    is Wikidata for the Wikimedia projects
  • 16:43 - 16:47
    at the top where Wikidata is there
  • 16:47 - 16:50
    to help the Wikimedia projects--
  • 16:51 - 16:54
    help make them ready for the future.
  • 16:58 - 17:03
    Concretely, what does that mean
    for the near or midterm future?
  • 17:04 - 17:06
    Wikidata as a platform--
  • 17:07 - 17:11
    We want to have better data quality,
    so we will continue working
  • 17:11 - 17:14
    on better tools,
    improving the tools we have and so on.
  • 17:15 - 17:19
    We need to make our data
    more accessible
  • 17:19 - 17:24
    through better APIs,
    a more robust SPARQL endpoint
  • 17:24 - 17:27
    but also things like more consistently
    modeling our data
  • 17:27 - 17:31
    so it actually is easy to reuse
    in applications.
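
For readers who have not used the SPARQL endpoint, here is a minimal sketch of how a request to it can be put together. It only builds the URL and sends nothing. The endpoint address is the public Wikidata Query Service; the query (a few items that are an instance of house cat) and the helper name `build_request_url` are illustrative choices.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: five items that are an instance of (P31) house cat (Q146).
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {"query": query.strip(), "format": "json"}
    return endpoint + "?" + urlencode(params)


url = build_request_url(QUERY)
print(url)
```

Small, bounded queries like this one are far less likely to time out than queries that scan a large share of the graph.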
  • 17:32 - 17:37
    And the last thing I had was
    setting up feedback processes
  • 17:37 - 17:39
    with our partners.
  • 17:40 - 17:44
    Unlike Wikipedia, Wikidata is not
  • 17:44 - 17:46
    what I call a destination project, right?
  • 17:46 - 17:49
    Someone goes to Wikipedia and reads it
  • 17:49 - 17:51
    whereas Wikidata is usually not
  • 17:51 - 17:53
    someone goes to Wikidata and reads it.
  • 17:53 - 17:54
    It would be awesome,
  • 17:54 - 17:58
    but realistically
    it's not what it is, right?
  • 17:58 - 18:01
    A lot of the people who are exposed
  • 18:01 - 18:03
    to our data are not on Wikidata itself,
  • 18:03 - 18:07
    but they are seeing it through Wikipedia
    and many other places.
  • 18:08 - 18:12
    Now these other places do get feedback
    on that data, right?
  • 18:12 - 18:15
    Their users tell them,
    "Hey, here's something that's wrong,"
  • 18:17 - 18:21
    and I would like to have that
    so that we can make it available
  • 18:21 - 18:24
    to the people who actually edit
    on Wikidata, meaning you.
  • 18:24 - 18:27
    And figuring out how to do that
    in a meaningful way
  • 18:27 - 18:32
    without overwhelming everyone
    will be one of the things to do
  • 18:32 - 18:33
    over the next year.
  • 18:35 - 18:37
    Alright, Wikibase ecosystem.
  • 18:37 - 18:41
    There, we will continue to work
    with the libraries,
  • 18:41 - 18:46
    but also look into science,
    for example, and more.
  • 18:46 - 18:52
    There is a Wikibase showcase later today
    that you should totally go to
  • 18:52 - 18:53
    and see what's already there
  • 18:53 - 18:56
    and what people are already doing
    with Wikibase.
  • 18:56 - 18:57
    It's really worth it.
  • 18:58 - 19:01
    And what's needed there is
  • 19:01 - 19:03
    also setting up
    good processes around that.
  • 19:04 - 19:08
    Helping people figure out
    who to talk to about what,
  • 19:08 - 19:10
    where they can find help,
  • 19:10 - 19:12
    all these kinds of things.
  • 19:13 - 19:17
    And, of course, making it easier
    to install and maintain
  • 19:17 - 19:20
    a Wikibase because that's still
    a bit of a pain.
  • 19:21 - 19:25
    And the last thing is federation
    which is basically
  • 19:25 - 19:27
    what we've been talking about
    for Commons earlier
  • 19:27 - 19:31
    where Commons uses
    Wikidata's items and properties
  • 19:31 - 19:34
    but for other Wikibase instances out there
  • 19:34 - 19:36
    so they can also use
    Wikidata's vocabulary.
  • 19:38 - 19:42
    And that, as I was saying earlier,
    increases yet again
  • 19:42 - 19:48
    the need to be mindful
    of how our vocabulary is used out there
  • 19:48 - 19:51
    more than we have had to so far.
  • 19:54 - 19:57
    And Wikidata for the Wikimedia projects--
  • 19:57 - 20:01
    of course, tighter integration
    through the Wikidata Bridge
  • 20:01 - 20:04
    and helping people edit directly
    from their projects
  • 20:04 - 20:09
    and the other thing that we all need
    to think about together, I think,
  • 20:09 - 20:15
    is figuring out how to reduce
    the language barriers.
  • 20:15 - 20:19
    The more Wikidata is integrated
    in the Wikimedia projects,
  • 20:19 - 20:22
    the more people will have
    a need to talk to each other
  • 20:22 - 20:26
    about that data without
    speaking the same language,
  • 20:26 - 20:32
    and we have to figure out
    how to deal with that.
  • 20:33 - 20:37
    If people have smart ideas,
    I would love to talk to you.
  • 20:39 - 20:41
    And with that,
    I come to the end of my talk.
  • 20:42 - 20:44
    Thank you, everyone, for giving
    more people more access
  • 20:44 - 20:46
    to more knowledge every day.
  • 20:47 - 20:49
    (applause)
  • 20:58 - 21:00
    We have some time for questions
  • 21:00 - 21:02
    so if there are any questions
    in the audience
  • 21:02 - 21:05
    or if you are remotely watching
    the livestream--Hi, Mom--
  • 21:05 - 21:08
    you can ask the question
    on the EtherPad
  • 21:08 - 21:11
    or on the Telegram Channel
    and we'll do our best.
  • 21:11 - 21:13
    So anything?
  • 21:16 - 21:17
    Ah.
  • 21:21 - 21:25
    (person 1) Hi, everyone, this is more
    of a meme than a question,
  • 21:25 - 21:32
    so when will the time extension
    be able to also get
  • 21:32 - 21:36
    hours and minutes and seconds
  • 21:36 - 21:38
    because up till now
    the precision is just to the date.
  • 21:38 - 21:42
    - I know... it's not my question--
    - (laughing)
  • 21:42 - 21:44
    That's why I said it's a meme.
  • 21:44 - 21:46
    Every time is always like that,
  • 21:46 - 21:49
    but it comes always from remote so...
  • 21:50 - 21:53
    I do not have a very good answer to that.
  • 21:53 - 21:54
    I'm sorry.
  • 21:56 - 22:02
    But maybe as some background,
    people need it even more
  • 22:02 - 22:08
    to describe images on Commons
    so it might bubble up the long list
  • 22:08 - 22:11
    of things that need to be done
    a bit faster through that.
  • 22:15 - 22:16
    Any more questions?
  • 22:25 - 22:28
    (person 2) [Linda] from Wikimedia
    Foundation's research team--
  • 22:28 - 22:31
    I have a question about your thoughts
  • 22:31 - 22:38
    on patrolling, and that may be related
    to quality of content on Wikidata,
  • 22:38 - 22:40
    but if you can speak to that
  • 22:40 - 22:44
    like how do you see the near- to medium-term
    patrolling efforts changing,
  • 22:44 - 22:46
    especially with the Bridge project
  • 22:46 - 22:48
    which I'm looking forward to
    going out and trying it.
  • 22:48 - 22:49
    Yeah, thank you.
  • 22:52 - 22:57
    So as you say, with things
    like the Wikidata Bridge,
  • 22:59 - 23:03
    a lot more effort will have to be spent
    on patrolling, I think.
  • 23:04 - 23:09
    But we are at a size where this
    is probably not feasible
  • 23:09 - 23:11
    to do it by hand, by a human,
  • 23:11 - 23:15
    so we need to spend a lot more effort
    on improving, for example,
  • 23:15 - 23:18
    ORES, the machine learning system
    to help us with that,
  • 23:18 - 23:25
    to help us figure out which edits
    a human really needs to look at
  • 23:25 - 23:26
    and which is probably just like yeah,
  • 23:26 - 23:30
    the regular stuff
    I don't need to look at this.
  • 23:34 - 23:39
    Currently, ORES is not super good
    at judging what--
  • 23:39 - 23:41
    if an edit on Wikidata is good or bad.
  • 23:41 - 23:45
    There's currently a campaign going on
  • 23:45 - 23:50
    that is training
    the machine learning system,
  • 23:51 - 23:52
    with your help,
  • 23:53 - 23:56
    to teach it basically what a good edit is
  • 23:56 - 23:57
    and what a bad edit is,
  • 23:57 - 24:03
    and we haven't reached the threshold
    of enough humans teaching it yet
  • 24:03 - 24:08
    to really improve it,
    but if you have a few minutes,
  • 24:08 - 24:11
    it would be great if you help teach ORES
  • 24:11 - 24:14
    to make better judgements
    about Wikidata edits.
  • 24:14 - 24:16
    And it's really simple--
    it shows you an edit,
  • 24:16 - 24:18
    and you say this is a good edit,
  • 24:18 - 24:20
    this is a bad edit, and that's it.
  • 24:20 - 24:23
    You can do this in front of the TV
    in the evening on the couch.
  • 24:26 - 24:27
    (person 3) Share a link.
  • 24:28 - 24:31
    We will share a link
    in the Telegram Group, yes.
  • 24:32 - 24:36
    And once we've reached
    the threshold we need--
  • 24:36 - 24:39
    I think it's around 7,000,
    but I might be wrong--
  • 24:40 - 24:44
    then we can rerun the training
    for ORES and then it will be
  • 24:44 - 24:48
    hopefully considerably better
    at judging the edits on Wikidata.
  • 24:50 - 24:52
    And then I hope more of you can use that
  • 24:52 - 24:56
    to filter recent changes, for example,
    or your watch list
  • 24:56 - 24:58
    for edits that really need your attention.
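
The filtering idea can be sketched as a simple threshold over per-edit scores. This is an illustration only: the revision IDs, the probabilities, and the 0.7 cut-off are all hypothetical, and no real scoring service is called.

```python
# Hypothetical scores: revision ID -> estimated probability that the edit
# is damaging. In practice these would come from the scoring service.
scores = {1001: 0.02, 1002: 0.91, 1003: 0.40, 1004: 0.75}


def needs_review(scores, threshold=0.7):
    """Return the revision IDs a human should look at, in ascending order."""
    return sorted(rev for rev, p in scores.items() if p >= threshold)


print(needs_review(scores))
print(needs_review(scores, threshold=0.3))
```

Lowering the threshold catches more bad edits at the cost of more edits to review, which is why the quality of the underlying scores matters so much.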
  • 24:59 - 25:00
    Yeah.
  • 25:03 - 25:04
    Hi.
  • 25:07 - 25:10
    (person 4) I'm just curious to know,
    and this is a question not from me,
  • 25:10 - 25:13
    but from partners
    that I've been working with,
  • 25:13 - 25:16
    the more partners we have joining Wikidata
  • 25:16 - 25:20
    and starting to experiment with queries,
  • 25:20 - 25:23
    the more issues we are having
    with timeout of queries
  • 25:23 - 25:26
    so what's happening with that?
  • 25:28 - 25:30
    So, some people
    at the Wikimedia Foundation
  • 25:30 - 25:34
    are looking into that,
    and--small spoiler--
  • 25:34 - 25:37
    be there for the birthday present session.
  • 25:37 - 25:38
    (laughter)
  • 25:43 - 25:46
    (person 5) Hello, I'm Bart Magnus
    from Belgium (PACKED).
  • 25:46 - 25:49
    I would like to know
    what the current state of affairs is
  • 25:49 - 25:52
    regarding federation
    so reusing properties
  • 25:52 - 25:54
    in your own Wikibase instance--
  • 25:54 - 25:57
    is there anything to mention about that?
  • 25:57 - 26:01
    So over the last year,
    a lot of people have told us
  • 26:01 - 26:04
    that they want federation, right?
  • 26:04 - 26:07
    But the problem was
    that a lot of people understood
  • 26:07 - 26:09
    very different things
    when they said federation.
  • 26:11 - 26:14
    Some of those things
    were very easily doable.
  • 26:14 - 26:16
    Some of those things were
    really, really hard.
  • 26:17 - 26:22
    And my team and I have been talking
    to a lot of people, for example,
  • 26:22 - 26:27
    the partners we work with at libraries
    to figure out what is it actually
  • 26:27 - 26:29
    precisely that they need.
  • 26:30 - 26:34
    And we finished that now,
    though, of course, I'm happy
  • 26:34 - 26:38
    to take more feedback
    if you want to talk to me about that,
  • 26:38 - 26:41
    and now I'm at a stage where
    I'm comfortable to say,
  • 26:41 - 26:43
    "Okay, we're going to start with that."
  • 26:45 - 26:48
    And that will happen over the next
    I would say two or three months
  • 26:48 - 26:51
    that we actually write
    the first lines of code
  • 26:51 - 26:54
    and then hopefully have people able
  • 26:54 - 26:57
    to test it early next year, I would say.
  • 27:00 - 27:01
    (presenter) Okay, last questions.
  • 27:02 - 27:06
    (person 6) Finn Årup Nielsen
    from Copenhagen, Denmark.
  • 27:06 - 27:10
    In relation to the other language,
    there's been a sort of discussion
  • 27:10 - 27:14
    in the WikiCite community
    about whether we should continue
  • 27:14 - 27:16
    to put more scientific papers in there--
  • 27:16 - 27:20
    this relates to how much data
    we can put into Wikidata.
  • 27:20 - 27:23
    Timeout in the Wikidata Query Service
    is one issue
  • 27:23 - 27:24
    but also the maintaining
  • 27:24 - 27:30
    so what are your thoughts about...
  • 27:31 - 27:35
    Is the size of Wikidata
    beginning to be a problem
  • 27:35 - 27:36
    in general?
  • 27:36 - 27:39
    Should we stop putting in lexeme data?
  • 27:39 - 27:41
    Should we stop putting
    in scientific data
  • 27:41 - 27:46
    into Wikidata or do we have
    any research on this
  • 27:46 - 27:50
    or technical problems inflating?
  • 27:50 - 27:51
    Yeah...
  • 27:53 - 27:57
    Wikidata is definitely coming
    to some...
  • 27:59 - 28:03
    scalability boundaries, let's say,
  • 28:04 - 28:06
    both technically and socially.
  • 28:06 - 28:09
    And for both we need solutions, right?
  • 28:09 - 28:13
    Socially, we have things like more editors
  • 28:13 - 28:16
    and recent changes to the point
    where it's completely unfeasible
  • 28:16 - 28:20
    for a human to patrol that
    because it's simply too much.
  • 28:21 - 28:26
    But also technically,
    and we've been addressing some of that.
  • 28:26 - 28:30
    For example, some database
    re-architecting
  • 28:30 - 28:34
    around the wb_terms table,
    if that says anything for anyone.
  • 28:36 - 28:38
    But those only get us so far,
  • 28:39 - 28:41
    and one of the things we want
    to look at next year
  • 28:41 - 28:46
    is where the other pain points are
    and what to do about them
  • 28:46 - 28:48
    on the technical side.
  • 28:49 - 28:51
    So that's a general picture.
  • 28:51 - 28:54
    At the same time, I am very hesitant
  • 28:54 - 28:58
    to tell anyone, "No, no, no,
    stop putting data into Wikidata."
  • 28:58 - 29:02
    That would kind of defeat the purpose.
  • 29:04 - 29:07
    But, for example, the Wikibase ecosystem
  • 29:07 - 29:09
    is one way to address that, right,
  • 29:09 - 29:14
    to not require everything
    in Wikidata.
  • 29:14 - 29:16
    That's the whole beauty
    of linked open data.
  • 29:16 - 29:18
    You don't have
    to have it all in the same place.
  • 29:18 - 29:20
    You can connect different places.
  • 29:20 - 29:21
    It's amazing.
  • 29:22 - 29:28
    So around WikiCite specifically, yes--
  • 29:30 - 29:35
    okay, WikiCite specifically,
    I think we need
  • 29:35 - 29:36
    to look at it in proportion.
  • 29:36 - 29:41
    I don't have an exact percentage
    of what percentage
  • 29:41 - 29:45
    of the items in Wikidata
    are around WikiCite topics,
  • 29:45 - 29:47
    but it's a big percentage.
  • 29:47 - 29:50
    And maybe that's the thing
    we need to talk about...
  • 29:50 - 29:52
    in the break.
  • 29:53 - 29:55
    Well, thank you very much!
  • 29:55 - 29:56
    (applause)
Title:
cdn.media.ccc.de/.../wikidatacon2019-3-eng-Glimpse_over_Wikidata_hd.mp4
Video Language:
English
Duration:
30:07
