-
Hello, everyone.
-
It's awesome that you're all here,
so many of you.
-
It's really, really great.
-
So Lea already talked a lot
about this event,
-
and I'm going to talk a bit
about Wikidata itself
-
and what has been happening
around it over the last year
-
and where we are going.
-
So... what is this? Sorry.
-
So... where are we?
Where are we going?
-
Over the last year there has been
so much to celebrate
-
and I want to highlight some of that
-
because sometimes it goes unnoticed.
-
And first I want to take you through
some statistics around editors
-
and our content and how our data is used.
-
Over the last year,
we have grown our community
-
which is amazing.
-
We have around 3,000 new people
-
who edit once or more in 30 days.
-
So that's 3,000 new Wikidatans, yay!
-
Now if you look at people who do more,
like five edits in 30 days,
-
we've got an additional 1,200 roughly.
-
And if you look
at the people who do 100 edits or more--
-
I hope many of you in this room--
-
we have 300 more.
-
Raise your hand
if you're in this last group.
-
Woot! You're awesome!
-
And while the number of edits
is usually not something
-
we pay a lot of attention to,
-
we did cross
the 1 billion edits mark this year.
-
(applause)
-
Alright, let's look at content.
-
So, we're now at 65 million items,
-
so entities to describe the world,
-
and we're doing this
with around 6,700 properties.
-
Of those, around 4,300
are external identifiers,
-
which gives us a lot of linking
to other catalogues, databases,
-
websites and more
-
and really makes Wikidata
the central place
-
in a linked open data web.
-
So using those properties and items,
-
we have around 800 million statements now,
-
and compared to last year,
we know about half a statement more
-
about every single item.
-
(laughter)
-
So, yeah, Wikidata got smarter.
-
But we don't just have items
and properties,
-
we also have new stuff
like lexemes
-
and we are now at 204,000 lexemes
that describe words
-
in many different languages.
-
It's very cool.
-
I will talk more about this
in a session later today.
-
Last, the latest addition
are entity schemas
-
that help us figure out
how to consistently model data
-
across a certain area.
-
And of those, we have around 140 now.
-
Now numbers aren't everything
around content, right,
-
amount of content--we also care
about quality of the content.
-
And what we've done now is
we've trained a machine learning system
-
to judge the quality of an item.
-
Now this is far from perfect,
but it gives you an idea.
-
So every item in Wikidata gets a score
between 1 and 5.
-
One is pretty terrible; five is amazing.
-
And it looks at things
like how many statements does it have,
-
how many external identifiers
does it have,
-
how many references are there,
-
how many different labels are there
in different languages,
-
and so on.
-
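The feature list above can be sketched as a toy heuristic. This is an illustrative guess at how such features might combine into a 1-to-5 score, NOT the actual trained model Wikidata uses:

```python
def toy_item_score(n_statements, n_external_ids, n_references, n_labels):
    """Map the features mentioned above to a 1-5 quality score.

    Hand-rolled illustration only -- the real system is a trained
    machine learning model whose features and weights differ.
    """
    # Cap each feature so one dimension can't dominate the score.
    raw = (min(n_statements, 20) / 20
           + min(n_external_ids, 10) / 10
           + min(n_references, 10) / 10
           + min(n_labels, 20) / 20) / 4  # normalized to 0..1
    return 1 + round(raw * 4)  # 1 = pretty terrible, 5 = amazing

print(toy_item_score(0, 0, 0, 0))      # a bare item scores 1
print(toy_item_score(25, 12, 10, 30))  # a rich item scores 5
```

The capping is one plausible design choice: it rewards breadth across dimensions rather than, say, a hundred statements with no references.
-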
And then we looked at Wikidata over time,
-
and as you can see,
based on these measures,
-
we went from pretty terrible
to much better.
-
(laughter)
-
So that's good.
-
But what you can also see,
there's still a lot of room to 5.
-
Now I don't think
this is where we will get to, right?
-
Not every item will be absolutely perfect
-
according to these measures
that we have taken.
-
But I'm really happy to see
that consistently the quality of our data
-
is getting better and better.
-
Okay, but creating that data isn't enough.
-
We want this--we do this for a reason.
-
We want it to be used.
-
And now we looked at how many articles
-
on each of the other Wikimedia projects
use data from Wikidata,
-
and we looked at the percentage
of all articles on those projects.
-
Now if you look across all of Wikimedia
-
and all of the articles there,
-
then 56.35% of them today
make use of some data from Wikidata.
-
Which I think is pretty good,
-
but of course,
there's still a lot of room to 100.
-
And then I looked at which projects
are actually making most use
-
of Wikidata's data,
-
and I split this
by language versions and so on.
-
And now what do you think
the top five projects--
-
which ones are all of them?
-
Which project family do they belong to?
-
(several in audience) Commons.
-
Okay, that's pretty uniformly Commons.
-
You would actually be wrong.
-
All of the top five are Wikivoyage.
-
(audience) Oh!
-
(laughter)
-
So yeah, applause to Wikivoyage.
-
(applause)
-
If you would like to check
where Commons actually is
-
and where all of your other projects are,
-
there is a dashboard.
-
Come to me and we can check it out.
-
Of course, inside Wikimedia is
not the only place where our data is used.
-
It's also used outside,
and so much has happened.
-
I can't begin to mention it all,
but to highlight some
-
there are great uses of our data
at the Met, at the Wellcome Trust,
-
at the Library of Congress,
-
in GeneWiki and so many more.
-
And if you go through some of the sessions
later in the program,
-
you will hear about some of them.
-
Alright, enough statistics.
-
Let's look at some other highlights.
-
So we already talked
about data quality improving,
-
and when you look at data quality,
there are a lot of dimensions
-
that you can look at,
and we've improved on some of those,
-
like how accurate is the data,
-
how trustworthy is the data,
-
how referenced is it,
-
how consistently it is modeled,
-
how complete is it and so on.
-
Just to pick out one--
for consistency for example,
-
we have created the ability to store
entity schemas now in Wikidata
-
so that you can describe
how certain domains should be modeled.
-
So you can find--
-
you can create an entity schema,
say, for Dutch painters,
-
and then you can look at
which items for Dutch painters
-
do not, for example,
have a date of birth but should,
-
and similar things like that.
-
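That kind of gap check can also be phrased as a query against the Wikidata Query Service. Here is a minimal sketch in Python that just builds the SPARQL string; P106 (occupation), P27 (country of citizenship), P569 (date of birth), Q1028181 (painter) and Q55 (Netherlands) are real Wikidata identifiers:

```python
def painters_missing_birthdate_query(occupation_qid="Q1028181",
                                     country_qid="Q55",
                                     limit=50):
    """Build a SPARQL query for items with a given occupation and
    country of citizenship that have no date of birth (P569)."""
    return f"""
SELECT ?item WHERE {{
  ?item wdt:P106 wd:{occupation_qid} .  # occupation: painter
  ?item wdt:P27 wd:{country_qid} .      # country of citizenship
  MINUS {{ ?item wdt:P569 [] . }}       # no date of birth recorded
}}
LIMIT {limit}
""".strip()

print(painters_missing_birthdate_query())
```

The generated query can be pasted into query.wikidata.org; an entity schema expresses the same kind of constraint declaratively, in ShEx, so it can be checked for a whole domain at once.
-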
And I hope that a lot more
wiki projects and so on
-
will be able to make use
of entity schemas to take good care
-
of their data, and if you want
to learn how to do that,
-
there's a session later
in the program as well
-
by people who know all about this
and will make this less
-
of a black box for you.
-
Alright.
-
Another thing that really got traction
-
over the last year
is the Wikibase ecosystem, right?
-
This idea that not all open data
should and has to happen
-
in Wikidata, but instead, we want
a thriving ecosystem
-
of different places, of different actors,
-
like institutions, companies,
-
volunteer projects opening up their data
in a similar way
-
that Wikidata does it
and then connecting all of it,
-
exchanging data between those,
linking that data.
-
And over the last year,
the interest in that
-
and the interest in institutions
and people running
-
their own Wikibase instance
has really exploded,
-
and especially in the sector
of libraries.
-
There's a lot of testing, evaluating,
-
and to be honest, trailblazing,
-
going on there at the moment
where adventurous institutions
-
work with us to really figure out
how Wikibase can work
-
for their collections,
for their catalogues and so on.
-
Among them, the German National Library,
-
the French National Library,
-
OCLC, and it's really exciting to see.
-
One of the reasons
why I think this is so exciting
-
is that we are helping these institutions
open up data in a way that is
-
not just putting it on a website
and someone can access it
-
but really thinking about this--
the next step after that, right?
-
Letting people help you maintain
that data, augment that data,
-
enrich it, and that's really a shift
-
that I hope will bring good things.
-
And the other thing it helps us with
-
is that it lets experts curate the data
-
in their space, keep it in good shape
so that we can then set up
-
synchronizing processes
to Wikidata, for example,
-
instead of having to take care of it
ourselves all the time.
-
And at the end of the day,
I hope it will take some pressure
-
off of Wikidata to be that place
where everything has to go.
-
Lexicographical data--
-
Over the last year,
people started describing words
-
in their language in Wikidata
so that we can build things
-
like automated translation tools,
-
and we are at the point
where in some languages
-
we are starting to get nearer
to reaching that critical mass
-
that is needed to actually
build a serious application.
-
In a lot of languages,
we still have a long way to go,
-
but in some,
we're really starting to get there,
-
and that's really great to see.
-
If you want to know more about this,
come to my session later today.
-
And, of course, not to forget,
-
structured data on Commons.
-
(audience member whistles)
-
Yes! (laughs)
-
(applause)
-
The Structured Data on Commons
team at the Foundation
-
has really gotten
everything together and made it possible
-
to add statements to files
on Commons over the last year,
-
and people are starting to add
those statements to images
-
to then make it easier to find
to build better applications on top of it,
-
and so much more.
-
It's really exciting to see how
that is growing,
-
and I think what's really important
-
for the Wikidata community
to understand here
-
is that when you see "depicts"
-
or "house cat" or "sitting," "lizard"
and "wall" here,
-
those are links to Wikidata items
and properties.
-
That means when we create items
and properties,
-
those are no longer just providing
the vocabulary for Wikidata itself.
-
They are providing the vocabulary
for Commons as well.
-
And this will only get more and more so,
-
so we have to pay a lot more attention
-
to how our ontology, our vocabulary
-
is actually used in other places
than we had before.
-
And the last one I have is that
we've started building stronger bridges
-
to the other Wikimedia projects.
-
My team and I are working
on a project called the Wikidata Bridge,
-
and you should totally come
to the UX booth
-
and do some testing of the current state
-
that will have
for example Wikipedia editors
-
edit Wikidata directly
from their projects
-
without having to go to Wikidata
-
and having to understand
everything around it.
-
I hope that this will take away
one more hurdle that makes it difficult
-
for Wikimedia projects
to adopt more data from Wikidata.
-
Alright, now to strategies
and where are we going?
-
Since December, the Wikidata team
at Wikimedia Deutschland
-
and people from the Wikimedia Foundation
have been working
-
on strategy papers around Wikidata.
-
It's basically writing down
-
what a lot of us have been
talking about already
-
over the last four or five years.
-
And I don't know if all of you
have read those papers.
-
They're published on Meta,
open for comments until the end of the month.
-
If you haven't read them,
it would be great
-
if you go read them,
leave your comments and so on.
-
Now the very quick overview
of what is in there
-
is that we think about Wikidata
and Wikibase in three pieces.
-
The first one is Wikidata as a platform.
-
You can see it in the lower corner,
-
and that is really around
Wikidata enables every person
-
to access and share information
-
regardless of their language
and technology,
-
and we do that by providing
general purpose data about the world.
-
So basically what you do every day.
-
The second thing is
the Wikibase ecosystem part
-
where Wikibase, the software
running Wikidata, powers
-
not just Wikidata, but a thriving
open data web that is the backbone
-
of free and open knowledge.
-
And the third and last thing
is Wikidata for the Wikimedia projects
-
at the top where Wikidata is there
-
to help the Wikimedia projects--
-
help make them ready for the future.
-
Concretely, what does that mean
for the near or midterm future?
-
Wikidata as a platform--
-
We want to have better data quality,
so we will continue working
-
on better tools,
improving the tools we have and so on.
-
We need to make our data
more accessible
-
through better APIs,
a more robust SPARQL endpoint
-
but also things like more consistently
modeling our data
-
so it actually is easy to reuse
in applications.
-
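As a small aside on what that access looks like today, the main programmatic entry point is the wbgetentities action of the MediaWiki API. A minimal sketch of pulling a label out of its JSON response -- the sample payload here is hand-written, mirroring the real response shape:

```python
import json

# Real endpoint shape; `ids` takes one or more Q-ids separated by "|".
WBGETENTITIES_URL = ("https://www.wikidata.org/w/api.php"
                     "?action=wbgetentities&ids={ids}&format=json")

def label_of(payload, qid, lang="en"):
    """Pull one label out of a wbgetentities JSON response."""
    return payload["entities"][qid]["labels"][lang]["value"]

# Hand-written sample mirroring the API's response structure:
sample = json.loads("""
{"entities": {"Q42": {"labels": {"en":
  {"language": "en", "value": "Douglas Adams"}}}}}
""")
print(label_of(sample, "Q42"))  # Douglas Adams
```

The same response shape carries claims and sitelinks, which is why consistent modeling matters: reusers navigate this structure blindly, by property ID.
-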
And the last thing I had was
setting up feedback processes
-
with our partners.
-
Unlike Wikipedia, Wikidata is not
-
what I call a destination project, right?
-
Someone goes to Wikipedia and reads it
-
whereas Wikidata is usually not
-
someone goes to Wikidata and reads it.
-
It would be awesome,
-
but realistically
it's not what it is, right?
-
A lot of the people who are exposed
-
to our data are not on Wikidata itself,
-
but they are seeing it through Wikipedia
and many other places.
-
Now these other places do get feedback
on that data, right?
-
Their users tell them,
"Hey, here's something that's wrong,"
-
and I would like to have that
so that we can make it available
-
to the people who actually edit
on Wikidata, meaning you.
-
And figuring out how to do that
in a meaningful way
-
without overwhelming everyone
will be one of the things to do
-
over the next year.
-
Alright, Wikibase ecosystem.
-
There, we will continue to work
with the libraries,
-
but also look into science,
for example, and more.
-
There is a Wikibase showcase later today
that you should totally go to
-
and see what's already there
-
and what people are already doing
with Wikibase.
-
It's really worth it.
-
And what's needed there is
-
also setting up
good processes around that.
-
Helping people figure out
who to talk to about what,
-
where they can find help,
-
all these kinds of things.
-
And, of course, making it easier
to install and maintain
-
a Wikibase because that's still
a bit of a pain.
-
And the last thing is federation
which is basically
-
what we've been talking about
for Commons earlier
-
where Commons uses
Wikidata's items and properties
-
but for other Wikibase instances out there
-
so they can also use
Wikidata's vocabulary.
-
And that, as I was saying earlier,
increases yet again
-
the need to be mindful
of how our vocabulary is used out there
-
more than we have had to so far.
-
And Wikidata for the Wikimedia projects--
-
of course, tighter integration
through the Wikidata Bridge
-
and helping people edit directly
from their projects
-
and the other thing that we all need
to think about together, I think,
-
is figuring out how to reduce
the language barriers.
-
The more Wikidata is integrated
in the Wikimedia projects,
-
the more people will have
a need to talk to each other
-
about that data without
speaking the same language,
-
and we have to figure out
how to deal with that.
-
If people have smart ideas,
I would love to talk to you.
-
And with that,
I come to the end of my talk.
-
Thank you, everyone, for giving
more people more access
-
to more knowledge every day.
-
(applause)
-
We have some time for questions
-
so if there are any questions
in the audience
-
or if you are remotely watching
the livestream--Hi, Mom--
-
you can ask the question
on the EtherPad
-
or on the Telegram Channel
and we'll do our best.
-
So anything?
-
Ah.
-
(person 1) Hi, everyone, this is more
of a meme than a question,
-
so when will the time extension
be able to also get
-
hours and minutes and seconds,
-
because up till now
the precision is just the date.
-
- I know... it's not my question--
- (laughing)
-
That's why I said it's a meme.
-
Every time it's like that,
-
but it always comes from remote so...
-
I do not have a very good answer to that.
-
I'm sorry.
-
But maybe as some background,
people need it even more
-
to describe images on Commons
so it might bubble up the long list
-
of things that need to be done
a bit faster through that.
-
Any more questions?
-
(person 2) [Linda] from Wikimedia
Foundation's research team--
-
I have a question about your thoughts
-
on patrolling, and that may be related
to quality of content on Wikidata,
-
but if you can speak to that
-
like how do you see the near to medium term
patrolling efforts changing,
-
especially with the Bridge project,
-
which I'm looking forward to
trying when it goes out.
-
Yeah, thank you.
-
So as you say, with things
like the Wikidata Bridge,
-
a lot more effort will have to be spent
on patrolling, I think.
-
But we are at a size where this
is probably not feasible
-
to do it by hand, by a human,
-
so we need to spend a lot more effort
on improving, for example,
-
ORES, the machine learning system
to help us with that,
-
to help us figure out which edits
a human really needs to look at
-
and which is probably just like yeah,
-
the regular stuff
I don't need to look at this.
-
Currently, ORES is not super good
at judging
-
whether an edit on Wikidata is good or bad.
-
There's currently a campaign going on
-
that is training
the machine learning system,
-
with your help,
-
to teach it basically what a good edit is
-
and what a bad edit is,
-
and we haven't reached the threshold
of enough humans teaching it yet
-
to really improve it,
but if you have a few minutes,
-
it would be great if you helped teach ORES
-
to make better judgements
about Wikidata edits.
-
And it's really simple--
it shows you an edit,
-
and you say this is a good edit,
-
this is a bad edit, and that's it.
-
You can do this in front of the TV
in the evening on the couch.
-
(person 3) Share a link.
-
We will share a link
in the Telegram Group, yes.
-
And once we've reached
the threshold we need--
-
I think it's around 7,000,
but I might be wrong--
-
then we can rerun the training
for ORES and then it will be
-
hopefully considerably better
at judging the edits on Wikidata.
-
And then I hope more of you can use that
-
to filter recent changes, for example,
or your watch list
-
for edits that really need your attention.
-
Yeah.
-
Hi.
-
(person 4) I'm just curious to know,
and this is a question not from me,
-
but from partners
that I've been working with,
-
the more partners we have joining Wikidata
-
and starting to experiment with queries,
-
the more issues we are having
with timeout of queries
-
so what's happening with that?
-
So, some people
at the Wikimedia Foundation
-
are looking into that,
and--small spoiler--
-
be there for the birthday present session.
-
(laughter)
-
(person 5) Hello, I'm Bart Magnus
from Belgium (PACKED).
-
I would like to know
what the current state of affairs is
-
regarding federation,
so reusing Wikidata's properties
-
in your own Wikibase instance--
-
is there anything to mention about that?
-
So over the last year,
a lot of people have told us
-
that they want federation, right?
-
But the problem was
that a lot of people understood
-
very different things
when they said federation.
-
Some of those things
were very easily doable.
-
Some of those things were
really, really hard.
-
And my team and I have been talking
to a lot of people, for example,
-
the partners we work with at libraries
to figure out what is it actually
-
precisely that they need.
-
And we finished that now,
though, of course, I'm happy
-
to take more feedback
if you want to talk to me about that,
-
and now I'm at a stage where
I'm comfortable to say,
-
"Okay, we're going to start with that."
-
And that will happen over the next
I would say two or three months
-
that we actually write
the first lines of code
-
and then hopefully have people able
-
to test it early next year, I would say.
-
(presenter) Okay, last questions.
-
(person 6) Finn Årup Nielsen
from Copenhagen, Denmark.
-
In relation to the other language,
there's been a sort of discussion
-
in the WikiCite community
about whether we should continue
-
to put more scientific papers in there--
-
this relates to how much data
we can put into Wikidata.
-
Timeout in the Wikidata Query Service
is one issue
-
but also the maintenance,
-
so what are your thoughts about...
-
Is the size of Wikidata
beginning to be a problem
-
in general?
-
Should we stop putting in lexeme data?
-
Should we stop putting
scientific data
-
into Wikidata, or do we have
any research on this
-
or technical problems inflating?
-
Yeah...
-
Wikidata is definitely coming
to some...
-
scalability boundaries, let's say,
-
both technically and socially.
-
And for both we need solutions, right?
-
Socially, we have things like more editors
-
and recent changes to the point
where it's completely unfeasible
-
for a human to patrol that
because it's simply too much.
-
But also technically,
and we've been addressing some of that.
-
For example, some database
re-architecting
-
around the wb_terms table,
if that means anything to anyone.
-
But those only get us so far,
-
and one of the things we want
to look at next year
-
is where the other pain points are
and what to do about them
-
on the technical side.
-
So that's a general picture.
-
At the same time, I am very hesitant
-
to tell anyone, "No, no, no,
stop putting data into Wikidata."
-
That would kind of defeat the purpose.
-
But, for example, the Wikibase ecosystem
-
is one way to address that, right,
-
to not require everything
in Wikidata.
-
That's the whole beauty
of linked open data.
-
You don't have
to have it all in the same place.
-
You can connect different places.
-
It's amazing.
-
So around WikiCite specifically, yes--
-
okay, WikiCite specifically,
I think we need
-
to look at it in proportion.
-
I don't have an exact percentage
of what percentage
-
of the items in Wikidata
are around WikiCite topics,
-
but it's a big percentage.
-
And maybe that's the thing
we need to talk about...
-
in the break.
-
Well, thank you very much!
-
(applause)