
#rC3 - Introduction to Wikidata

  • 0:00 - 0:15
    rC3 Wikipaka intro music
  • 0:15 - 0:21
    Léa: Hi, everyone, I'm Léa. Here's
    Mohammed, and we're going to introduce you
  • 0:21 - 0:31
    to Wikidata today.
    Mohammed: Yes, hi everyone. So in the
  • 0:31 - 0:35
    course of the talk, if you do have a
    question, just feel free to ask them in a
  • 0:35 - 0:41
    chat. And then we are going to try and
    answer them at the end of the talk. Yes.
  • 0:41 - 0:48
    So let's dive straight in. What is
    Wikidata? Wikidata is a free knowledge
  • 0:48 - 0:55
    base that is based on facts and references
    that anyone can edit and reuse. It is part
  • 0:55 - 1:00
    of the Wikimedia projects. And like all of
    its sister projects, Wikidata is
  • 1:00 - 1:06
    multilingual and has no language barriers.
    Data in Wikidata is released under the CC0
  • 1:06 - 1:12
    license. That means Wikidata's data is in
    the public domain and it has no exclusive
  • 1:12 - 1:18
    intellectual property rights
    applied to it. Wikidata is not a primary
  • 1:18 - 1:24
    source of information. It only aggregates
    or collects structured data that is
  • 1:24 - 1:29
    already available, some of which are links
    to other databases. So it is not meant to
  • 1:29 - 1:35
    be a place for original research. Wikidata
    is made for humans and machines, and is
  • 1:35 - 1:41
    available for everyone to use, whether on
    other Wikimedia projects or outside of them.
  • 1:41 - 1:51
    Next slide. So what is in Wikidata?
    Wikidata was launched some eight years ago
  • 1:51 - 1:57
    and was originally created to solve the
    problem of unstructuredness in the plain
  • 1:57 - 2:02
    text format that information in Wikipedia
    is rendered in, and also to provide a
  • 2:02 - 2:07
    central storage location where all of the
    different language Wikipedias can connect
  • 2:07 - 2:12
    and talk to each other. Today, Wikidata
    has since outgrown its intended purpose
  • 2:12 - 2:18
    and has become so big and successful that
    it is not only, you know, the most edited
  • 2:18 - 2:25
    Wikimedia project, Wikidata's data is now
    used more outside of the Wikimedia projects than
  • 2:25 - 2:32
    within them. There are more than 25,000
    active editors. That means people who make
  • 2:32 - 2:38
    at least one edit every month. Wikidata
    is used across 800+ Wikimedia projects in
  • 2:38 - 2:43
    more than 300 languages. And it's
    interesting to note that the largest
  • 2:43 - 2:48
    proportion of Wikidata's items are in the
    category of scholarly items comprising
  • 2:48 - 3:00
    about 30% of the whole. Next slide. So
    far, people and bots have made more than
  • 3:00 - 3:07
    1.3 billion edits to Wikidata and created
    more than 91 million items. This map you
  • 3:07 - 3:12
    see here is a visual impression of
    geolocated items currently existing on
  • 3:12 - 3:19
    Wikidata. So, the bright areas are items
    that have the coordinate location property
  • 3:19 - 3:30
    added as a statement. Next slide. So
    Wikidata has a vision, and what is this
  • 3:30 - 3:38
    vision? Wikidata's vision is to give more
    people more access to more knowledge. So
  • 3:38 - 3:42
    Wikidata gives access to information,
    regardless of the language that people
  • 3:42 - 3:49
    speak, because Wikidata is multilingual,
    it expects translations of so-called Q
  • 3:49 - 3:54
    numbers into different languages. And in so
    doing, Wikidata helps us support the
  • 3:54 - 4:00
    smaller Wikimedia projects better, you
    know, by helping them to benefit from all
  • 4:00 - 4:04
    of the work that the bigger projects are
    doing. And applications and projects
  • 4:04 - 4:08
    outside of Wikimedia are also able to
    benefit from the rich datasets in
  • 4:08 - 4:16
    Wikidata. So in a nutshell, Wikidata can
    be thought of as an online repository of
  • 4:16 - 4:25
    structured data that anyone can edit and
    reuse. Next slide. OK, now, how is
  • 4:25 - 4:30
    Wikidata connected to Wikipedia and other
    Wikimedia projects? Among other things,
  • 4:30 - 4:36
    Wikidata can assist sister projects
    with more easily maintainable infoboxes.
  • 4:36 - 4:41
    So the table at the right corner of this
    article on Wikipedia is called an infobox,
  • 4:41 - 4:46
    which I'm sure you've seen before, and
    Wikipedia is able to retrieve content from
  • 4:46 - 4:56
    Wikidata into those infoboxes [distorted].
    And smaller language Wikipedias like,
  • 4:56 - 5:03
    you know, Catalan Wikipedia or Welsh
    Wikipedia readily leverage
  • 5:03 - 5:09
    Wikidata to seed their content. And it is
    helpful because it helps to reduce the
  • 5:09 - 5:19
    editing workload for volunteers. Next
    slide. So what should we expect to see on
  • 5:19 - 5:26
    a typical Wikidata item? Wikidata
    expresses relationships in the form of
  • 5:26 - 5:36
    triples that use items starting with "Q"
    and property starting with "P", OK, and
  • 5:36 - 5:41
    the item will typically be made up of at
    least one statement. So in this example
  • 5:41 - 5:45
    you see on the screen we have two
    statements about an entity called Douglas
  • 5:45 - 5:54
    Adams. The first statement, Douglas Adams
    was educated at P69 St John's College.
  • 5:55 - 5:59
    What this means is that this statement is
    qualified by further properties. That is
  • 5:59 - 6:06
    the academic major, academic degree, the
    start time and then the end time and
  • 6:06 - 6:13
    qualifiers add more meaning to statements.
    So Wikidata records not just statements,
  • 6:13 - 6:19
    but also their sources. And as you can see
    here, this helps us to reflect the notion
  • 6:19 - 6:26
    of verifiability on the project, so that
    the statement "Douglas Adams was educated at
  • 6:26 - 6:31
    St John's College" has two open references
    that point to the source of that
  • 6:31 - 6:39
    information. And the second statement,
    Douglas Adams, Q42, was educated at P69,
  • 6:39 - 6:47
    Brentwood School, only has the qualifiers
    start time and end time, and it has no
  • 6:47 - 6:53
    references. So a single statement consists
    of a property and a value,
  • 6:54 - 7:07
    with or without references and with or
    without qualifiers. Next slide. So a
  • 7:07 - 7:12
    typical Wikidata item looks like this, and
    you can edit by clicking on the edit
  • 7:12 - 7:18
    button, it has this pen symbol with edit
    next to it. As you can see, each item has
  • 7:18 - 7:24
    a unique ID that is Q followed by some
    number. In this case, the item Douglas
  • 7:24 - 7:32
    Adams has the QID Q42. And when you look at
    the top, there's a termbox. We
  • 7:32 - 7:37
    call it the termbox, at the
    top, that contains the label in different
  • 7:37 - 7:44
    languages, and a description of the item that
    is more of a short phrase telling us what
  • 7:44 - 7:48
    the item represents. It says here in
    English that Douglas Adams is an English
  • 7:48 - 7:54
    writer and humorist. Then there is the
    alias next to the description which, aside
  • 7:54 - 8:01
    from the label, tells us what the item
    could also be known by here. Next slide.
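
The structure described so far (a QID, a termbox with labels, descriptions and aliases, plus statements that may carry qualifiers and references) can be sketched as a small data structure. This is only an illustrative model, not Wikidata's actual storage format: the class names `Item` and `Statement`, the helper `unreferenced`, and the `PLACEHOLDER` marker are hypothetical, and qualifier values and reference sources that the talk does not spell out are left as placeholders rather than filled in.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Marker for details shown on the slide but not read out in the talk.
PLACEHOLDER = "<not given in the talk>"


@dataclass
class Statement:
    """One claim on an item: a property and a value, plus optional extras."""
    property_id: str                                   # e.g. "P69" (educated at)
    value: str                                         # label of the value
    qualifiers: dict[str, str] = field(default_factory=dict)
    references: list[str] = field(default_factory=list)


@dataclass
class Item:
    """Simplified view of an item: termbox fields plus statements."""
    qid: str
    labels: dict[str, str] = field(default_factory=dict)        # language -> label
    descriptions: dict[str, str] = field(default_factory=dict)  # language -> phrase
    aliases: dict[str, list[str]] = field(default_factory=dict)
    statements: list[Statement] = field(default_factory=list)


douglas_adams = Item(
    qid="Q42",
    labels={"en": "Douglas Adams"},
    descriptions={"en": "English writer and humorist"},
    statements=[
        # First statement: four qualifiers and two references.
        Statement(
            property_id="P69",
            value="St John's College",
            qualifiers={
                "academic major": PLACEHOLDER,
                "academic degree": PLACEHOLDER,
                "start time": PLACEHOLDER,
                "end time": PLACEHOLDER,
            },
            references=[PLACEHOLDER, PLACEHOLDER],
        ),
        # Second statement: two qualifiers, no references.
        Statement(
            property_id="P69",
            value="Brentwood School",
            qualifiers={"start time": PLACEHOLDER, "end time": PLACEHOLDER},
        ),
    ],
)


def unreferenced(item: Item) -> list[Statement]:
    """Return the statements that have no reference attached."""
    return [s for s in item.statements if not s.references]
```

Calling `unreferenced(douglas_adams)` returns only the Brentwood School statement, which mirrors the point made on the slide: a statement is valid with or without references, but the model makes it easy to see which ones still lack a source.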
  • 8:05 - 8:14
    So, creating a new item is as simple as
    going to any page on Wikidata and clicking
  • 8:14 - 8:21
    on create a new item. And once you click
    on create a new item, you get to fill in
  • 8:21 - 8:26
    the form that is asking for a label,
    description and an alias and QIDs are
  • 8:26 - 8:51
    assigned automatically. Next slide. Next
    slide. Next slide, please. Alright, so
  • 8:51 - 8:57
    there are tools that allow us to edit
    Wikidata more efficiently and make bulk
  • 8:57 - 9:06
    edits to Wikidata, such as
    QuickStatements and OpenRefine. Please go to
  • 9:06 - 9:20
    the previous slide. OK, yeah, right, so,
    yeah, QuickStatements and OpenRefine
  • 9:20 - 9:26
    allow us to make automated edits and
    changes to Wikidata. Other tools are
  • 9:26 - 9:32
    available that allow us to visualize
    Wikidata's data. Some of them enhance the
  • 9:32 - 9:37
    user interface of Wikidata, and these
    could include scripts that editors can
  • 9:37 - 9:41
    install or they could be gadgets that may
    be enabled in your preferences settings.
  • 9:42 - 9:56
    Next slide.
    Léa: Alright. So, um, so far, Mohammed
  • 9:56 - 10:02
    told you about how we describe concepts in
    Wikidata, and that's what we've been doing
  • 10:02 - 10:09
    for the first years of the project, but in
    2018, we also started storing a new type
  • 10:09 - 10:15
    of information in Wikidata, which is
    lexicographical data, which is basically
  • 10:15 - 10:23
    information about words and phrases in all
    kinds of languages. And so you see on the
  • 10:23 - 10:28
    left the data model that is a bit complex
    and that's why I'm not going to get too
  • 10:28 - 10:32
    much into details now but we can talk
    about this later. And you can see an
  • 10:32 - 10:38
    example on the right where we basically
    describe the word "Luftballon" in German
  • 10:38 - 10:42
    and we indicate the language, the lexical
    category and all kinds of information that
  • 10:42 - 10:47
    are not about the object any more, but
    actually about the word and how it's
  • 10:47 - 10:56
    composed of two words, as we like to do in
    German and things like this. So, again, if
  • 10:56 - 11:00
    you want to know more about this, you can
    have a look at lexicographical data in
  • 11:00 - 11:08
    Wikidata or we can talk about it together
    later in the questions, for example. So
  • 11:08 - 11:14
    Wikidata doesn't come alone, it comes with
    a bunch of tools that have been, some of
  • 11:14 - 11:17
    them have been developed by the
    development team of Wikidata, some of them
  • 11:17 - 11:22
    have been developed by the community
    themselves in order to do things more
  • 11:22 - 11:27
    efficiently. That can be, for example,
    adding data and some of the tools have
  • 11:27 - 11:31
    already been mentioned by Mohammed, that
    can also be matching data with other
  • 11:31 - 11:37
    databases, querying the data, reusing the
    data. There are also a bunch of tools that
  • 11:37 - 11:43
    are about watching the data and watching
    its quality, watching what edits have been
  • 11:43 - 11:48
    done recently and so on. And you can find
    the page that is called Wikidata Tools on
  • 11:48 - 11:56
    Wikidata to discover plenty of these tools
    and you can, of course, create your own.
  • 11:59 - 12:04
    So we mentioned that the goal of Wikidata
    is to be reused by everyone, but you may
  • 12:04 - 12:10
    wonder who is actually reusing the data.
    Well, the first reusers of Wikidata's data
  • 12:10 - 12:16
    are actually the Wikidata community itself,
    the Wikidata editors, because all of these
  • 12:16 - 12:22
    items are connected. So one item can be
    linked from another, the content of one
  • 12:22 - 12:28
    item can be reused on another and so on.
    Then the Wikimedia projects, such as Wikipedia,
  • 12:28 - 12:33
    but not only: Wikimedia Commons,
    Wikisource, almost all of the Wikimedia
  • 12:33 - 12:40
    projects at that point reuse part of the
    data that is coming from Wikidata, and
  • 12:40 - 12:49
    then we have companies, from the biggest
    ones to the small ones. Because the data is
  • 12:49 - 12:55
    under CC0, everyone can just reuse the content
    that they need. We have, of course, public
  • 12:55 - 13:02
    institutions such as museums, libraries
    and so on. We also have journalists and,
  • 13:02 - 13:08
    for example, data journalists. We have
    scientists and researchers and probably
  • 13:08 - 13:12
    many more. And the thing is that we don't
    necessarily know who's reusing the data
  • 13:12 - 13:17
    because it's here in the open but there
    are probably many usages that we don't
  • 13:17 - 13:21
    even imagine. So if you're using Wikidata,
    or if you would like to use Wikidata's
  • 13:21 - 13:27
    data, let us know, because we are always
    interested to discover more. Now, the
  • 13:27 - 13:34
    question is: How can one reuse Wikidata?
    I'm going to present very quickly one of
  • 13:34 - 13:38
    the most popular ways to query the data.
    I'm not going to get into details right
  • 13:38 - 13:45
    now because there will actually be a
    workshop at the conference in two days on
  • 13:45 - 13:50
    day three about the query service so I'm
    gonna let you go there and discover more
  • 13:50 - 13:56
    about how to use it. The query service is
    basically a SPARQL endpoint, SPARQL being
  • 13:56 - 14:02
    a query language where you can basically
    ask questions to Wikidata and get lists or
  • 14:02 - 14:09
    visualizations as a result. For example,
    here's a map of the airports of the
  • 14:09 - 14:16
    world named after a person, and the color
    of the dot represents the gender of the
    person. Or you can make a list of country
    flags that are including a sun, because if
  • 14:24 - 14:29
    the data is properly modeled in Wikidata,
    you're able to describe, what are the
  • 14:29 - 14:38
    different elements that compose a country
    flag? Or you can have this bubble chart
  • 14:38 - 14:44
    with the occupation of accused witches,
    because why not? That's the kind of data
  • 14:44 - 14:52
    we have in Wikidata. Now, there are other
    ways, of course, to query the data, I'm
  • 14:52 - 14:55
    not going to get into details right now,
    but if you want to talk more about this,
  • 14:55 - 15:01
    you can, for example, join the Wikidata
    meetups that are gonna happen tomorrow. We
  • 15:01 - 15:07
    have dumps of the data where you can
    download part of or all of the data in a
  • 15:07 - 15:13
    file. We have a bunch of APIs to access
    the data directly from your program. And
  • 15:13 - 15:18
    on the Wikimedia projects specifically, the
    community developed a bunch of templates
  • 15:18 - 15:29
    that are using Wikidata's data using Lua.
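
Two of the access routes mentioned here, the SPARQL endpoint of the query service and fetching an item directly from a program, can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the endpoint `https://query.wikidata.org/sparql` and the `Special:EntityData` URL pattern are the public ones as far as I know, the query is a reconstruction of the "airports named after a person" example rather than the one shown on the slide, and the identifiers in it (Q1248784 airport, P138 named after, P625 coordinate location, P21 sex or gender, Q5 human) do not come from the talk. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed public URLs; they are not quoted in the talk itself.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

# Reconstruction of the "airports named after a person" example.
AIRPORTS_NAMED_AFTER_PEOPLE = """
SELECT ?airport ?airportLabel ?personLabel ?genderLabel ?coords WHERE {
  ?airport wdt:P31/wdt:P279* wd:Q1248784 ;  # instance of (a subclass of) airport
           wdt:P138 ?person ;               # named after
           wdt:P625 ?coords .               # coordinate location
  ?person wdt:P31 wd:Q5 ;                   # the namesake is a human
          wdt:P21 ?gender .                 # sex or gender
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 50
"""


def sparql_url(query: str) -> str:
    """Build a GET URL that asks the query service for JSON results."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query.strip(), "format": "json"})


def entity_url(qid: str) -> str:
    """Build the URL of the JSON representation of a single item."""
    return ENTITY_DATA.format(qid=qid)


def fetch_json(url: str, user_agent: str = "wikidata-intro-example/0.1"):
    """Download and decode a JSON document (performs a network request)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With network access, `fetch_json(sparql_url(AIRPORTS_NAMED_AFTER_PEOPLE))` should return the rows behind the airport map, and `fetch_json(entity_url("Q42"))` the full item for Douglas Adams. For anything beyond a quick experiment, a descriptive User-Agent with contact information is the polite choice.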
    And now for something a bit different,
  • 15:29 - 15:34
    Wikibase. You may have heard of it and you
    may even have wondered, OK, what's the
  • 15:34 - 15:39
    difference between Wikibase and Wikidata?
    Well, Wikibase is basically the software
  • 15:39 - 15:45
    powering Wikidata and, more precisely, the
    MediaWiki extension that is turning
  • 15:45 - 15:53
    MediaWiki into a database. And so,
    Wikibase was started to power Wikidata
  • 15:53 - 15:58
    but it also started developing on its own.
    Wikidata is still for now the biggest
  • 15:58 - 16:05
    existing Wikibase instance, but people can
    also install Wikibase directly on their
  • 16:05 - 16:13
    server and basically create their own
    little personal or public Wikidata. And
  • 16:13 - 16:17
    the development is still ongoing, there
    are all kinds of super exciting features
  • 16:17 - 16:23
    coming up soon. And, for example, the
    ability to connect better Wikidata and
  • 16:23 - 16:29
    your own instance of Wikibase, for
    example, to be able to reuse data that is
  • 16:29 - 16:34
    already in Wikidata and to connect it to
    the data that you have in your own
  • 16:34 - 16:44
    Wikibase. So, if you're interested in
    Wikidata, if you want to know more, there
  • 16:44 - 16:48
    are a bunch of pages that you can find.
    There is a help portal, the Project Chat
  • 16:48 - 16:52
    is the main discussion page on the wiki
    where you can interact with the other
  • 16:52 - 16:56
    editors, the community. It's super
    important to get in touch with them if you
  • 16:56 - 17:01
    want to get started with Wikidata. We also
    have a mailing list. We have a newsletter
  • 17:01 - 17:06
    that is called Weekly Summary that you can
    find on wiki but also if you subscribe to
  • 17:06 - 17:10
    the mailing list, you will also receive
    it. And then we have some accounts on
  • 17:10 - 17:14
    social media, on Twitter, there is a
    Facebook group, there is a Telegram, um,
  • 17:14 - 17:20
    that is linked from the Project Chat and
    there is also an IRC channel. So you can
  • 17:20 - 17:29
    basically find people from the Wikidata
    community everywhere. So we are
  • 17:29 - 17:36
    approaching the end of the session, but
    it's not done, we have more Wikidata
  • 17:36 - 17:42
    related sessions at the c3 in the
    Wikipaka. So, for example, tomorrow you're
  • 17:42 - 17:47
    going to get an introduction to Wikidata,
    specifically for journalists and
  • 17:47 - 17:51
    especially data journalists. Then in the
    afternoon, we're gonna have two Wikidata
  • 17:51 - 17:55
    meetups. The first one is gonna be in
    German. The second one is gonna be in
  • 17:55 - 17:59
    English. So depending on your preferred
    language, you can attend one or the other
  • 17:59 - 18:06
    or both, and on day three, as I mentioned
    before, we're going to have a workshop to
  • 18:06 - 18:12
    learn how to query Wikidata's data with
    SPARQL. So feel free to have a look and
  • 18:12 - 18:22
    check them also in the main schedule of
    Wikipaka. Thank you very much for
  • 18:22 - 18:27
    attending this session. These are our
    contact details if you want to contact
  • 18:27 - 18:33
    us. And of course, you can now ask
    questions, as we mentioned in the chat or
  • 18:33 - 18:41
    with the hashtag. And we will be very
    happy to answer all your questions right
  • 18:41 - 18:47
    now.
    Herald: Thank you for your input and the
  • 18:47 - 18:52
    overview about Wikidata. There have been a
    few questions already answered
  • 18:52 - 18:58
    by Joel in the IRC channel. One was about
    the big dump of scholarly data and what
  • 18:58 - 19:04
    scholarly data is and how this came to be
    in Wikidata. But there is one more
  • 19:04 - 19:10
    question from the chat right now. Till asks:
    Can I add new types of data that are not
  • 19:10 - 19:18
    yet tracked in Wikidata?
    Léa: So I'm wondering, what do you mean
  • 19:18 - 19:23
    exactly by type of data? Maybe you can
    give a bit more details because that can
  • 19:23 - 19:30
    mean a lot of things. Wikidata, the data
    model of Wikidata is very flexible and
  • 19:30 - 19:36
    it's absolutely not set in stone. Every,
    every week the community comes up with
  • 19:36 - 19:41
    some new ways to describe things.
    Sometimes we realize that there is an area
  • 19:41 - 19:47
    of the world that we completely forgot to
    cover, and then we create new properties
  • 19:47 - 19:52
    to describe, for example, a certain type
    of, I don't know, of concept, a certain type
  • 19:52 - 20:00
    of building or object or
    philosophical concept that we didn't
  • 20:00 - 20:05
    describe yet. So this is always in
    movement, in action. When it comes to what
  • 20:05 - 20:12
    we actually call data types, which is, for
    example, a string of text or a date or a
  • 20:12 - 20:16
    picture, we have all kinds of data types
    like this, this is a bit more complicated
  • 20:16 - 20:22
    and overall, it's quite rare that we add a
    new data type and it needs a strong, like,
  • 20:22 - 20:28
    use case so we add that to the software. I
    hope that it answers your question and if I
  • 20:28 - 20:32
    didn't, feel free to ask again.
    Herald: Yeah, we've got feedback. The
  • 20:32 - 20:38
    example Till meant was, there's a, there's
    an organization or a project called
  • 20:38 - 20:45
    Parliamentwatch in Germany. There was one
    talk earlier today where they try to track
  • 20:45 - 20:50
    and scrape and analyze the parliamentary
    protocols. And one big issue they had was
  • 20:50 - 20:56
    with structural data about all the members
    of parliament and how they are organized
  • 20:56 - 21:02
    and stuff like that. And, um, well, if I
    remember correctly, there actually was a
  • 21:02 - 21:06
    project that tried to include the
    structural data of members of
  • 21:06 - 21:08
    parliament in Wikidata, if I'm not
    mistaken.
  • 21:08 - 21:16
    Léa: Absolutely. It's a WikiProject
    that is called, um, something politicians,
  • 21:16 - 21:21
    all politicians. I don't remember the
    exact name right now, but indeed. Some
  • 21:21 - 21:28
    people are already working on members of
    parliaments and, like, political people in
  • 21:28 - 21:34
    general. So it's very likely that there is
    already a way to structure the data. The
  • 21:34 - 21:41
    best way is to contact the people directly
    involved on this, on this WikiProject.
  • 21:41 - 21:46
    WikiProjects, by the way, are pages where
    basically people who have a specific topic
  • 21:46 - 21:50
    of interest gather and can discuss
    specific questions about the topic.
  • 21:51 - 21:58
    Um, so have a look at this, at this
    project about politics and, um, yeah. Try
  • 21:58 - 22:03
    to see if, if anything is missing, but
    generally Wikidata definitely welcomes
  • 22:03 - 22:09
    information about politicians, about
    members of parliament, this kind of stuff.
  • 22:09 - 22:13
    What we do not do, however, is store the
    full, like, documents, for example, in
  • 22:13 - 22:18
    that case, the reports or the documents,
    that belongs elsewhere. Maybe on Wikimedia
  • 22:18 - 22:22
    Commons, for example, if it's possible, if
    the license allows it. But on Wikidata,
  • 22:22 - 22:24
    we'll be happy to store the metadata about
    them.
  • 22:26 - 22:31
    Herald: Alright, Joel just posted the link
    to the WikiProject, Every Politician, so
  • 22:31 - 22:36
    if anybody looks for Every Politician on
    Wikidata, they will find the project. So
  • 22:36 - 22:40
    basically, the bottom line is pretty much
    anything is possible in Wikidata, right?
  • 22:40 - 22:49
    Léa: Yeah, thank you Joel, and hi. Almost
    everything. So on Wikidata, just like on
  • 22:49 - 22:55
    Wikipedia, we still have some criteria to
    define what can get in Wikidata and what
  • 22:55 - 23:01
    not, because we are aware that this
    knowledge base, it needs to stay quite
  • 23:01 - 23:06
    general and it cannot contain absolutely
    everything. For example, the community
  • 23:06 - 23:12
    decided a while ago that they would not
    create one item for each human living or
  • 23:12 - 23:18
    who used to live on Earth, that's just not
    possible, so there are some notability
  • 23:18 - 23:26
    criteria that you can find in the help
    pages and I would say that the level of,
  • 23:26 - 23:31
    like, how fine-grained the data should be has
    to be discussed with the community and the
  • 23:31 - 23:35
    good thing about having Wikibase also
    available as a separate instance of
  • 23:35 - 23:40
    Wikidata is that if some people want to
    work on a topic where they have some
  • 23:40 - 23:44
    information that is very, very specific
    and would maybe not fit the scope of
  • 23:44 - 23:48
    Wikidata, they can create their own
    Wikibase and then they can connect the
  • 23:48 - 23:54
    content with what is already in Wikidata.
    So altogether, in this Wikibase ecosystem,
  • 23:54 - 23:58
    yes, pretty much everything is possible.
    Herald: Well, the future is certainly
  • 23:58 - 24:02
    here, at least, with Wikidata. Thank you
    again, Léa and Mohammed, for your
  • 24:02 - 24:06
    insightful introduction to Wikidata and
    we're looking forward to more people
  • 24:06 - 24:09
    joining you in your efforts. Thanks for
    your presentation.
  • 24:09 - 24:13
    Léa: Thank you. See you soon.
  • 24:13 - 24:19
    rC3 Wikipaka outro music
  • 24:19 - 24:22
    Subtitles created by c3subtitles.de
    in the year 2021. Join, and help us!
Video Language:
English
Duration:
24:22
