Hello, everyone.
It's awesome that you're all here,
so many of you.
It's really, really great.
So Lea already talked a lot
about this event,
and I'm going to talk a bit
about Wikidata itself
and what has been happening
around it over the last year
and where we are going.
So... what is this? Sorry.
So... where are we?
Where are we going?
Over the last year there has been
so much to celebrate
and I want to highlight some of that
because sometimes it goes unnoticed.
And first I want to take you through
some statistics around editors
and our content and how our data is used.
Over the last year,
we have grown our community
which is amazing.
We have around 3,000 new people
who edit once or more in 30 days.
So that's 3,000 new Wikidatans, yay!
Now if you look at people who do more,
like five edits in 30 days,
we've got an additional 1,200 roughly.
And if you look
at the people who do 100 edits or more--
I hope many of you in this room--
we have 300 more.
Raise your hand
if you're in this last group.
Woot! You're awesome!
And while the number of edits
is usually not something
we pay a lot of attention to,
we did cross
the 1 billion edits mark this year.
(applause)
Alright, let's look at content.
So, we're now at 65 million items--
entities that describe the world--
and we're doing this
with around 6,700 properties.
Of those, around 4,300
are external identifiers,
which gives us a lot of linking
to other catalogues, databases,
websites and more
and really makes Wikidata
the central place
in a linked open data web.
So using those properties and items,
we have around 800 million statements now,
and compared to last year,
we know about half a statement more
about every single item.
(laughter)
So, yeah, Wikidata got smarter.
But we don't just have items
and properties,
we also have new stuff
like lexemes
and we are now at 204,000 lexemes
that describe words
in many different languages.
It's very cool.
I will talk more about this
in a session later today.
Last, the latest addition
are entity schemas
that help us figure out
how to consistently model data
across a certain area.
And of those, we have around 140 now.
Now numbers aren't everything
around content, right--
the amount of content--we also care
about the quality of the content.
And what we've done now is
we've trained a machine learning system
to judge the quality of an item.
Now this is far from perfect,
but it gives you an idea.
So every item in Wikidata gets a score
between 1 and 5.
One is pretty terrible; five is amazing.
And it looks at things
like how many statements does it have,
how many external identifiers
does it have,
how many references are there,
how many different labels are there
in different languages,
and so on.
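(To make that concrete, here is a minimal illustrative sketch in Python of such a feature-based score. The weights, caps, and squashing are invented for demonstration; the actual system is a trained machine learning model, not a hand-tuned formula like this.)

```python
# Illustrative only: a toy item-quality score on a 1-to-5 scale,
# built from the kinds of features mentioned above. The weights and
# caps are made up; the real system is a trained ML model.

def item_quality_score(n_statements, n_external_ids, n_references, n_labels):
    raw = (
        0.4 * min(n_statements, 50)      # statements on the item
        + 0.8 * min(n_external_ids, 20)  # external identifiers
        + 0.6 * min(n_references, 30)    # references backing statements
        + 0.2 * min(n_labels, 40)        # labels in different languages
    )
    # Squash the raw feature sum into the 1-to-5 range.
    return 1 + 4 * min(raw / 50.0, 1.0)

# A modestly filled item scores somewhere in the middle.
print(round(item_quality_score(12, 5, 8, 15), 1))  # 2.3
```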
And then we looked at Wikidata over time,
and as you can see,
based on these measures,
we went from pretty terrible
to much better.
(laughter)
So that's good.
But what you can also see,
there's still a lot of room to 5.
Now I don't think
this is where we will get to, right?
Not every item will be absolutely perfect
according to these measures
that we have taken.
But I'm really happy to see
that consistently the quality of our data
is getting better and better.
Okay, but creating that data isn't enough.
We want this--we do this for a reason.
We want it to be used.
And now we looked at how many articles
on each of the other Wikimedia projects
use data from Wikidata,
and we looked at the percentage
of all articles on those projects.
Now if you look across all of Wikimedia
and all of the articles there,
then 56.35% of them today
make use of some data from Wikidata.
Which I think is pretty good,
but of course,
there's still a lot of room to 100.
And then I looked at which projects
are actually making most use
of Wikidata's data,
and I split this
by language versions and so on.
And now what do you think
the top five projects are--
which project family
do they all belong to?
(several in audience) Commons.
Okay, that's pretty uniformly Commons.
You would actually be wrong.
All of the top five are Wikivoyage.
(audience) Oh!
(laughter)
So yeah, applause to Wikivoyage.
(applause)
If you would like to check
where Commons actually is
and where all of your other projects are,
there is a dashboard.
Come to me and we can check it out.
Of course, inside Wikimedia is
not the only place where our data is used.
It's also used outside,
and so much has happened.
I can't begin to mention it all,
but to highlight some
there are great uses of our data
at the Met, at the Wellcome Trust,
at the Library of Congress,
in GeneWiki and so many more.
And if you go through some of the sessions
later in the program,
you will hear about some of them.
Alright, enough statistics.
Let's look at some other highlights.
So we already talked
about data quality improving,
and when you look at data quality,
there are a lot of dimensions
that you can look at,
and we've improved on some of those,
like how accurate is the data,
how trustworthy is the data,
how well referenced is it,
how consistently is it modeled,
how complete is it, and so on.
Just to pick out one--
for consistency for example,
we have created the ability to store
entity schemas now in Wikidata
so that you can describe
how certain domains should be modeled.
So you can create an entity schema,
say, for Dutch painters,
and then you can check
which items about Dutch painters
do not, for example,
have a date of birth but should,
and similar things like that.
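(The schema itself lives in Wikidata as ShEx, but the kind of gap it surfaces can also be sketched as a query. Here is a minimal Python example against the public Wikidata SPARQL endpoint that looks for Dutch painters with no date of birth (P569); the user-agent string is just a placeholder.)

```python
# A sketch of the completeness check described above: Dutch painters
# (human, occupation painter, citizenship Netherlands) missing a
# date of birth (P569), via the public Wikidata SPARQL endpoint.
import requests

QUERY = """
SELECT ?painter ?painterLabel WHERE {
  ?painter wdt:P31 wd:Q5;          # instance of: human
           wdt:P106 wd:Q1028181;   # occupation: painter
           wdt:P27 wd:Q55.         # country of citizenship: Netherlands
  FILTER NOT EXISTS { ?painter wdt:P569 ?dob. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "entity-schema-demo/0.1 (example)"},  # placeholder
)
for row in resp.json()["results"]["bindings"]:
    print(row["painter"]["value"],
          row.get("painterLabel", {}).get("value", ""))
```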
And I hope that a lot more
wiki projects and so on
will be able to make use
of entity schemas to take good care
of their data, and if you want
to learn how to do that,
there's a session later
in the program as well
by people who know all about this
and will make this less
of a black box for you.
Alright.
Another thing that really got traction
over the last year
is the Wikibase ecosystem, right?
This idea that not all open data
should or has to happen
in Wikidata, but instead, we want
a thriving ecosystem
of different places, of different actors,
like institutions, companies,
volunteer projects opening up their data
in a similar way
that Wikidata does it
and then connecting all of it,
exchanging data between those,
linking that data.
And over the last year,
the interest in that
and the interest in institutions
and people running
their own Wikibase instance
has really exploded,
and especially in the sector
of libraries.
There's a lot of testing, evaluating,
and to be honest, trailblazing,
going on there at the moment
where adventurous institutions
work with us to really figure out
how Wikibase can work
for their collections,
for their catalogues and so on.
Among them, the German National Library,
the French National Library,
OCLC, and it's really exciting to see.
One of the reasons
why I think this is so exciting
is that we are helping these institutions
open up their data in a way that is
not just putting it on a website
where someone can access it,
but really thinking about
the next step after that, right?
Letting people help you maintain
that data, augment that data,
enrich it, and that's really a shift
that I hope will bring good things.
And the other thing it helps us with
is that it lets experts curate the data
in their space, keep it in good shape
so that we can then set up
synchronizing processes
to Wikidata, for example,
instead of having to take care of it
ourselves all the time.
And at the end of the day,
I hope it will take some pressure
off of Wikidata to be that place
where everything has to go.
Lexicographical data--
Over the last year,
people started describing words
in their language in Wikidata
so that we can build things
like automated translation tools,
and we are at the point
where in some languages
we are starting to get nearer
to reaching that critical mass
that is needed to actually
build a serious application.
In a lot of languages,
we still have a long way to go,
but in some,
we're really starting to get there,
and that's really great to see.
If you want to know more about this,
come to my session later today.
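(As a taste of what that data looks like programmatically, here is a minimal sketch fetching one lexeme through the standard wbgetentities API and printing its lemmas; the lexeme ID and user agent are just example placeholders.)

```python
# A sketch of reading a lexeme: fetch it by ID via the Wikidata API
# and print its lemmas per language. L1 is just an example ID.
import requests

resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={"action": "wbgetentities", "ids": "L1", "format": "json"},
    headers={"User-Agent": "lexeme-demo/0.1 (example)"},  # placeholder
)
lexeme = resp.json()["entities"]["L1"]
for lang, lemma in lexeme["lemmas"].items():
    print(lang, lemma["value"])
```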
And, of course, not to forget,
structured data on Commons.
(audience member whistles)
Yes! (laughs)
(applause)
The Structured Data on Commons
team at the Foundation
has really gotten
everything together and made it possible
to add statements to files
on Commons over the last year,
and people are starting to add
those statements to images
to make them easier to find,
to build better applications on top of them,
and so much more.
It's really exciting to see how
that is growing,
and I think what's really important
for the Wikidata community
to understand here
is that when you see "depicts"
or "house cat" or "sitting," "lizard"
and "wall" here,
those are links to Wikidata items
and properties.
That means when we create items
and properties,
those are no longer just providing
the vocabulary for Wikidata itself.
They are providing the vocabulary
for Commons as well.
And this will only become more so,
which means we have to pay a lot more attention
to how our ontology, our vocabulary,
is actually used in other places
than we had to before.
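(To see those links in practice, here is a minimal sketch reading the structured data of a Commons file and printing the Wikidata items its "depicts" (P180) statements point to; the file title and user agent are placeholders.)

```python
# A sketch of reading structured data on Commons: list the Wikidata
# items that a file's "depicts" (P180) statements point to.
import requests

resp = requests.get(
    "https://commons.wikimedia.org/w/api.php",
    params={
        "action": "wbgetentities",
        "sites": "commonswiki",
        "titles": "File:Example.jpg",  # placeholder title
        "format": "json",
    },
    headers={"User-Agent": "sdc-demo/0.1 (example)"},  # placeholder
)
entity = next(iter(resp.json()["entities"].values()))
statements = entity.get("statements") or {}
if isinstance(statements, dict):  # empty sets can serialize as a list
    for statement in statements.get("P180", []):
        # Each value is a Wikidata item ID, e.g. Q146 for "house cat".
        print(statement["mainsnak"]["datavalue"]["value"]["id"])
```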
And the last one I have is that
we've started building stronger bridges
to the other Wikimedia projects.
My team and I are working
on a project called the Wikidata Bridge,
and you should totally come
to the UX booth
and do some testing of the current state.
It will let,
for example, Wikipedia editors
edit Wikidata directly
from their projects
without having to go to Wikidata
and having to understand
everything around it.
I hope that this will take away
one more hurdle that makes it difficult
for Wikimedia projects
to adopt more data from Wikidata.
Alright, now to strategy
and where we are going.
Since December, the Wikidata team
at Wikimedia Deutschland
and people from the Wikimedia Foundation
have been working on strategy
papers around Wikidata.
It's basically writing down
what a lot of us have been
talking about already
over the last four or five years.
And I don't know if all of you
have read those papers.
They're published on Meta
for comments until the end of the month.
If you haven't read them,
it would be great if you
go read them,
leave your comments and so on.
Now the very quick overview
of what is in there
is that we think about Wikidata
and Wikibase in three pieces.
The first one is Wikidata as a platform.
You can see it in the lower corner,
and that is really about
Wikidata enabling every person
to access and share information
regardless of their language
and technology,
and we do that by providing
general purpose data about the world.
So basically what you do every day.
The second thing is
the Wikibase ecosystem part
where Wikibase, the software
running Wikidata, powers
not just Wikidata, but a thriving
open data web that is the backbone
of free and open knowledge.
And the third and last thing
is Wikidata for the Wikimedia projects
at the top where Wikidata is there
to help the Wikimedia projects--
help make them ready for the future.
Concretely, what does that mean
for the near- or mid-term future?
Wikidata as a platform--
We want to have better data quality,
so we will continue working
on better tools,
improving the tools we have and so on.
We need to make our data
more accessible
through better APIs,
a more robust SPARQL endpoint
but also things like more consistently
modeling our data
so it actually is easy to reuse
in applications.
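(For comparison, the simplest per-item access path that already exists today is the Special:EntityData JSON view. A minimal sketch, using the well-known item Q42, Douglas Adams; the user agent is a placeholder.)

```python
# A sketch of the simplest per-item access path: Special:EntityData
# returns an item's full JSON. Q42 (Douglas Adams) as a known example.
import requests

resp = requests.get(
    "https://www.wikidata.org/wiki/Special:EntityData/Q42.json",
    headers={"User-Agent": "entitydata-demo/0.1 (example)"},  # placeholder
)
item = resp.json()["entities"]["Q42"]
print(item["labels"]["en"]["value"])            # Douglas Adams
print(len(item.get("claims", {})), "properties with statements")
```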
And the last thing I had was
setting up feedback processes
with our partners.
Unlike Wikipedia, Wikidata is not
what I call a destination project, right?
Someone goes to Wikipedia and reads it,
whereas with Wikidata, it's usually not
the case that someone goes there and reads it.
It would be awesome,
but realistically,
that's not what it is, right?
A lot of the people who are exposed
to our data are not on Wikidata itself,
but they are seeing it through Wikipedia
and many other places.
Now these other places do get feedback
on that data, right?
Their users tell them,
"Hey, here's something that's wrong,"
and I would like to capture that feedback
so that we can make it available
to the people who actually edit
on Wikidata, meaning you.
And figuring out how to do that
in a meaningful way
without overwhelming everyone
will be one of the things to do
over the next year.
Alright, Wikibase ecosystem.
There, we will continue to work
with the libraries,
but also look into science,
for example, and more.
There is a Wikibase showcase later today
that you should totally go to
and see what's already there
and what people are already doing
with Wikibase.
It's really worth it.
And what's needed there is
also setting up
good processes around that.
Helping people figure out
who to talk to about what,
where they can find help,
all these kinds of things.
And, of course, making it easier
to install and maintain
a Wikibase because that's still
a bit of a pain.
And the last thing is federation,
which is basically
what we've been talking about
for Commons earlier,
where Commons uses
Wikidata's items and properties,
but for other Wikibase instances out there,
so they can also use
Wikidata's vocabulary.
And that, as I was saying earlier,
increases yet again
the need to be mindful
of how our vocabulary is used out there,
more than we have had to be so far.
And Wikidata for the Wikimedia projects--
of course, tighter integration
through the Wikidata Bridge
and helping people edit directly
from their projects.
And the other thing that we all need
to think about together, I think,
is figuring out how to reduce
the language barriers.
The more Wikidata is integrated
in the Wikimedia projects,
the more people will have
a need to talk to each other
about that data without
speaking the same language,
and we have to figure out
how to deal with that.
If people have smart ideas,
I would love to talk to you.
And with that,
I come to the end of my talk.
Thank you, everyone, for giving
more people more access
to more knowledge every day.
(applause)
We have some time for questions
so if there are any questions
in the audience
or if you are remotely watching
the livestream--Hi, Mom--
you can ask the question
on the EtherPad
or on the Telegram Channel
and we'll do our best.
So anything?
Ah.
(person 1) Hi, everyone, this is more
of a meme than a question:
when will the time datatype
also be able to store
hours and minutes and seconds?
Because up till now,
the precision only goes down to the date.
- I know... it's not my question--
- (laughing)
That's why I said it's a meme.
Every time it's like that,
but it always comes from remote, so...
I do not have a very good answer to that.
I'm sorry.
But maybe as some background,
people need it even more
to describe images on Commons,
so it might bubble up the long list
of things that need to be done
a bit faster because of that.
Any more questions?
(person 2) [Linda] from the Wikimedia
Foundation's research team--
I have a question about your thoughts
on patrolling, and that may be related
to quality of content on Wikidata.
If you can speak to that:
how do you see near- to medium-term
patrolling efforts changing,
especially with the Bridge project,
which I'm looking forward to
trying when it goes out?
Yeah, thank you.
So as you say, with things
like the Bridge,
a lot more effort will have to be spent
on patrolling, I think.
But we are at a size where it
is probably not feasible
to do this by hand, by a human,
so we need to spend a lot more effort
on improving, for example,
ORES, the machine learning system
to help us with that,
to help us figure out which edits
a human really needs to look at
and which are probably just, like,
the regular stuff
that I don't need to look at.
Currently, ORES is not super good
at judging whether
an edit on Wikidata is good or bad.
There's currently a campaign going on
that is training
the machine learning system,
with your help,
to teach it basically what a good edit is
and what a bad edit is,
and we haven't reached the threshold
of enough humans teaching it yet
to really improve it,
but if you have a few minutes,
it would be great if you helped teach ORES
to make better judgements
about Wikidata edits.
And it's really simple--
it shows you an edit,
and you say this is a good edit,
this is a bad edit, and that's it.
You can do this in front of the TV
in the evening on the couch.
(person 3) Share a link.
We will share a link
in the Telegram Group, yes.
And once we've reached
the threshold we need--
I think it's around 7,000,
but I might be wrong--
then we can rerun the training
for ORES and then it will be
hopefully considerably better
at judging the edits on Wikidata.
And then I hope more of you can use that
to filter recent changes, for example,
or your watch list
for edits that really need your attention.
Yeah.
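(For anyone who wants to check edits programmatically in the meantime, here is a minimal sketch of asking ORES for a "damaging" score on a Wikidata revision, assuming the public ORES v3 endpoint; the revision ID and user agent are placeholders.)

```python
# A sketch of querying ORES for an edit-quality judgement on Wikidata:
# the "damaging" model returns a prediction plus probabilities.
import requests

rev_id = "1000000000"  # placeholder revision ID; substitute a real one
resp = requests.get(
    "https://ores.wikimedia.org/v3/scores/wikidatawiki/",
    params={"models": "damaging", "revids": rev_id},
    headers={"User-Agent": "ores-demo/0.1 (example)"},  # placeholder
)
score = resp.json()["wikidatawiki"]["scores"][rev_id]["damaging"]["score"]
print(score["prediction"], score["probability"])
```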
Hi.
(person 4) I'm just curious to know,
and this is a question not from me,
but from partners
that I've been working with,
the more partners we have joining Wikidata
and starting to experiment with queries,
the more issues we are having
with query timeouts,
so what's happening with that?
So, some people
at the Wikimedia Foundation
are looking into that,
and--small spoiler--
be there for the birthday present session.
(laughter)
(person 5) Hello, I'm Bart Magnus
from PACKED in Belgium.
I would like to know
what the current state of affairs is
regarding federation,
so reusing Wikidata's properties
in your own Wikibase instance--
is there anything to mention about that?
So over the last year,
a lot of people have told us
that they want federation, right?
But the problem was
that a lot of people understood
very different things
when they said federation.
Some of those things
were very easily doable.
Some of those things were
really, really hard.
And my team and I have been talking
to a lot of people, for example,
the partners we work with at libraries
to figure out what exactly
it is that they need.
And we've finished that now,
though, of course, I'm happy
to take more feedback
if you want to talk to me about that,
and now I'm at a stage where
I'm comfortable to say,
"Okay, we're going to start with that."
And that will happen over the next,
I would say, two or three months--
we will actually write
the first lines of code
and then hopefully have people able
to test it early next year.
(presenter) Okay, last questions.
(person 6) Finn Årup Nielsen
from Copenhagen, Denmark.
In relation to the other question,
there's been a sort of discussion
in the WikiCite community
about whether we should continue
to put more scientific papers in there--
this relates to how much data
we can put into Wikidata.
Timeouts in the Wikidata Query Service
are one issue,
but also the maintenance,
so what are your thoughts about...
Is the size of Wikidata
beginning to be a problem
in general?
Should we stop putting in lexeme data?
Should we stop putting
scientific data
into Wikidata, or do we have
any research on this,
or are the technical problems inflating?
Yeah...
Wikidata is definitely coming
to some...
scalability boundaries, let's say,
both technically and socially.
And for both we need solutions, right?
Socially, we have things like more edits
in recent changes, to the point
where it's completely unfeasible
for a human to patrol it all
because it's simply too much.
But also technically,
and we've been addressing some of that.
For example, some database
re-architecting
around database views turned into tables,
if that says anything to anyone.
But those only get us so far,
and one of the things we want
to look at next year
is where the other pain points are
and what to do about them
on the technical side.
So that's a general picture.
At the same time, I am very hesitant
to tell anyone, "No, no, no,
stop putting data into Wikidata."
That would kind of defeat the purpose.
But, for example, the Wikibase ecosystem
is one way to address that, right,
to not require everything
in Wikidata.
That's the whole beauty
of linked open data.
You don't have
to have it all in the same place.
You can connect different places.
It's amazing.
So around WikiCite specifically, yes--
okay, WikiCite specifically,
I think we need
to look at it in proportion.
I don't have an exact number
for what percentage
of the items in Wikidata
are around WikiCite topics,
but it's a big percentage.
And maybe that's the thing
we need to talk about...
in the break.
Well, thank you very much!
(applause)