Hello, everyone. It's awesome that you're all here, so many of you. It's really, really great. So Lea already talked a lot about this event, and I'm going to talk a bit about Wikidata itself, what has been happening around it over the last year, and where we are going. So... what is this? Sorry. So... where are we? Where are we going?

Over the last year there has been so much to celebrate, and I want to highlight some of that, because sometimes it goes unnoticed. First I want to take you through some statistics around our editors, our content, and how our data is used. Over the last year, we have grown our community, which is amazing. We have around 3,000 new people who edit once or more in 30 days. So that's 3,000 new Wikidatans, yay! Now if you look at people who do more, like five edits in 30 days, we've got an additional 1,200, roughly. And if you look at the people who do 100 edits or more-- I hope many of you in this room-- we have 300 more. Raise your hand if you're in this last group. Woot! You're awesome! And while the number of edits is usually not something we pay a lot of attention to, we did cross the 1 billion edits mark this year. (applause)

Alright, let's look at content. We're now at 65 million items, so entities to describe the world, and we're doing this with around 6,700 properties. Of those, around 4,300 are external identifiers, which gives us a lot of linking to other catalogues, databases, websites and more, and really makes Wikidata the central place in a linked open data web. Using those properties and items, we have around 800 million statements now, and compared to last year, we know on average about half a statement more about every single item. (laughter) So, yeah, Wikidata got smarter. But we don't just have items and properties, we also have new stuff like lexemes, and we are now at 204,000 lexemes that describe words in many different languages. It's very cool. I will talk more about this in a session later today. Last, the latest addition is entity schemas, which help us figure out how to consistently model data across a certain area. Of those, we have around 140 now.

Now, numbers aren't everything around content-- we also care about the quality of the content. What we've done now is train a machine learning system to judge the quality of an item. This is far from perfect, but it gives you an idea. Every item in Wikidata gets a score between 1 and 5. One is pretty terrible; five is amazing. It looks at things like how many statements the item has, how many external identifiers it has, how many references there are, how many labels there are in different languages, and so on. And then we looked at Wikidata over time, and as you can see, based on these measures, we went from pretty terrible to much better. (laughter) So that's good. But as you can also see, there's still a lot of room up to 5. Now, I don't think we will get all the way there, right? Not every item will be absolutely perfect according to these measures. But I'm really happy to see that the quality of our data is consistently getting better and better.
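To give a feel for what such a scoring function does, here is a minimal sketch in Python. To be clear, the real system is a trained machine learning model, not a hand-written formula; the function name, weights and caps below are invented for illustration, and only the features (statements, external identifiers, references, labels) come from the description above.

```python
# Toy sketch of an item-quality heuristic. The real scorer is a trained
# ML model; these weights and caps are made up for illustration only.

def toy_item_quality(statements: int, external_ids: int,
                     references: int, labels: int) -> float:
    """Map simple item features onto the 1-5 scale described above."""
    score = 1.0                            # every item starts at "pretty terrible"
    score += min(statements / 10, 1.5)     # breadth: how much we say about the item
    score += min(external_ids / 5, 1.0)    # linking: ties to external catalogues
    score += min(references / 10, 1.0)     # trust: how well sourced the claims are
    score += min(labels / 20, 0.5)         # languages: multilingual coverage
    return round(min(score, 5.0), 2)

# A mid-sized item: a dozen statements, a few identifiers, some references.
print(toy_item_quality(statements=12, external_ids=4, references=6, labels=15))
# -> 4.1
```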
Okay, but creating that data isn't enough. We do this for a reason: we want it to be used. So we looked at how many articles on each of the other Wikimedia projects use data from Wikidata, as a percentage of all articles on those projects. If you look across all of Wikimedia and all of the articles there, then 56.35% of them today make use of some data from Wikidata. Which I think is pretty good, but of course, there's still a lot of room to 100. And then I looked at which projects are actually making the most use of Wikidata's data, split by language versions and so on. Now, what do you think the top five projects are-- which project family do they all belong to? (several in audience) Commons. Okay, that's pretty uniformly Commons. You would actually be wrong. All of the top five are Wikivoyage. (audience) Oh! (laughter) So yeah, applause to Wikivoyage. (applause) If you would like to check where Commons actually is, and where all of your other projects are, there is a dashboard. Come to me and we can check it out.

Of course, inside Wikimedia is not the only place where our data is used. It's also used outside, and so much has happened. I can't begin to mention it all, but to highlight some: there are great uses of our data at the Met, at the Wellcome Trust, at the Library of Congress, in Gene Wiki and so many more. And if you go through some of the sessions later in the program, you will hear about some of them.

Alright, enough statistics. Let's look at some other highlights. We already talked about data quality improving. When you look at data quality, there are a lot of dimensions you can consider, and we've improved on some of those: how accurate the data is, how trustworthy it is, how well referenced it is, how consistently it is modeled, how complete it is, and so on. To pick out one-- consistency, for example-- we have created the ability to store entity schemas in Wikidata, so that you can describe how certain domains should be modeled. You can create an entity schema, say, for Dutch painters, and then check which items about Dutch painters do not, for example, have a date of birth but should, and similar things. I hope a lot more WikiProjects and so on will be able to make use of entity schemas to take good care of their data, and if you want to learn how to do that, there's a session later in the program as well, by people who know all about this and will make it less of a black box for you.
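The "Dutch painters without a date of birth" check just mentioned maps directly onto a SPARQL query against the public query service, which is one way such gaps get surfaced even without the full entity schema machinery. A sketch in Python-- the IDs used (P106 occupation, Q1028181 painter, P27 country of citizenship, Q55 Netherlands, P569 date of birth) are the real Wikidata IDs, but the script itself is just a demo:

```python
# Sketch: list items for Dutch painters that are missing a date of birth.
import requests

QUERY = """
SELECT ?painter ?painterLabel WHERE {
  ?painter wdt:P106 wd:Q1028181 ;               # occupation: painter
           wdt:P27  wd:Q55 .                    # citizenship: Netherlands
  FILTER NOT EXISTS { ?painter wdt:P569 ?dob }  # no date of birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 20
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "entity-schema-gap-demo/0.1"},  # be polite to the service
)
for row in response.json()["results"]["bindings"]:
    print(row["painter"]["value"], "-", row.get("painterLabel", {}).get("value", ""))
```

Keep the LIMIT low; as comes up in the questions later, heavy queries on the public endpoint can run into timeouts.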
Alright. Another thing that really got traction over the last year is the Wikibase ecosystem: this idea that not all open data should or has to live in Wikidata. Instead, we want a thriving ecosystem of different places and different actors-- institutions, companies, volunteer projects-- opening up their data in a similar way to how Wikidata does it, and then connecting all of it, exchanging data between those places, linking that data. Over the last year, the interest in institutions and people running their own Wikibase instance has really exploded, especially in the library sector. There's a lot of testing, evaluating and, to be honest, trailblazing going on there at the moment, where adventurous institutions work with us to figure out how Wikibase can work for their collections, their catalogues and so on. Among them are the German National Library, the French National Library and OCLC, and it's really exciting to see. One of the reasons I think this is so exciting is that we are helping these institutions open up data in a way that is not just putting it on a website for someone to access, but really thinking about the next step after that: letting people help you maintain that data, augment it, enrich it. That's really a shift that I hope will bring good things. The other thing it helps us with is that it lets experts curate the data in their space and keep it in good shape, so that we can then set up synchronizing processes to Wikidata, for example, instead of having to take care of it all ourselves all the time. And at the end of the day, I hope it will take some pressure off Wikidata to be that place where everything has to go.

Lexicographical data-- over the last year, people started describing words in their language in Wikidata, so that we can build things like automated translation tools. In some languages we are starting to get near the critical mass that is needed to actually build a serious application. In a lot of languages, we still have a long way to go, but in some we're really starting to get there, and that's really great to see. If you want to know more about this, come to my session later today.

And, of course, not to forget, structured data on Commons. (audience member whistles) Yes! (laughs) (applause) The Structured Data on Commons team at the Foundation has really gotten everything together and made it possible to add statements to files on Commons over the last year, and people are starting to add those statements to images, to make them easier to find, to build better applications on top of them, and so much more. It's really exciting to see how that is growing. And I think what's really important for the Wikidata community to understand here is that when you see "depicts" or "house cat" or "sitting," "lizard" and "wall" here, those are links to Wikidata items and properties. That means when we create items and properties, those are no longer just providing the vocabulary for Wikidata itself. They are providing the vocabulary for Commons as well. And this will only become more and more the case, so we have to pay a lot more attention to how our ontology, our vocabulary, is actually used in other places than we had to before.
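Those "depicts" statements are ordinary Wikibase statements on a MediaInfo entity, so they can be read with the same API used for Wikidata itself. A sketch, assuming a hypothetical file name (MediaInfo IDs are "M" plus the file's page ID, and P180 is the real "depicts" property):

```python
# Sketch: read the "depicts" (P180) statements of a file on Commons.
import requests

API = "https://commons.wikimedia.org/w/api.php"
TITLE = "File:Cat_on_a_wall.jpg"  # hypothetical example file

# 1. Look up the file's page ID, which determines its MediaInfo M-id.
pages = requests.get(API, params={
    "action": "query", "titles": TITLE, "format": "json",
}).json()["query"]["pages"]
media_id = "M" + next(iter(pages))

# 2. Fetch the MediaInfo entity and print what the image depicts.
entity = requests.get(API, params={
    "action": "wbgetentities", "ids": media_id, "format": "json",
}).json()["entities"][media_id]

for claim in (entity.get("statements") or {}).get("P180", []):
    # Each value is a Wikidata item ID, e.g. Q146 for "house cat".
    print(claim["mainsnak"]["datavalue"]["value"]["id"])
```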
And the last one I have is that we've started building stronger bridges to the other Wikimedia projects. My team and I are working on a project called the Wikidata Bridge, and you should totally come to the UX booth and do some testing of its current state. It will let, for example, Wikipedia editors edit Wikidata directly from their projects, without having to go to Wikidata and understand everything around it. I hope this will take away one more hurdle that makes it difficult for the Wikimedia projects to adopt more data from Wikidata.

Alright, now to strategy, and where are we going? Since December, the Wikidata team at Wikimedia Deutschland and people from the Wikimedia Foundation have been working on strategy papers around Wikidata. It's basically writing down what a lot of us have been talking about already over the last four or five years. I don't know if all of you have read those papers. They're published on Meta, with comments open until the end of the month. If you haven't read them, it would be great if you go read them and leave your comments. Now, the very quick overview of what is in there is that we think about Wikidata and Wikibase in three pieces.

The first one is Wikidata as a platform. You can see it in the lower corner, and that is really about how Wikidata enables every person to access and share information regardless of their language and technology, and we do that by providing general-purpose data about the world. So basically what you do every day. The second is the Wikibase ecosystem part, where Wikibase, the software running Wikidata, powers not just Wikidata but a thriving open data web that is the backbone of free and open knowledge. And the third and last is Wikidata for the Wikimedia projects, at the top, where Wikidata is there to help the Wikimedia projects-- to help make them ready for the future.

Concretely, what does that mean for the near or mid-term future? Wikidata as a platform: we want to have better data quality, so we will continue working on better tools, improving the tools we have, and so on. We need to make our data more accessible, through better APIs and a more robust SPARQL endpoint, but also through things like modeling our data more consistently, so it is actually easy to reuse in applications. And the last thing I had was setting up feedback processes with our partners. Unlike Wikipedia, Wikidata is not what I call a destination project, right? Someone goes to Wikipedia and reads it, whereas with Wikidata, it's usually not that someone goes to Wikidata and reads it. It would be awesome, but realistically that's not what it is. A lot of the people who are exposed to our data are not on Wikidata itself; they are seeing it through Wikipedia and many other places. Now, these other places do get feedback on that data, right? Their users tell them, "Hey, here's something that's wrong," and I would like to have that feedback, so that we can make it available to the people who actually edit on Wikidata, meaning you. Figuring out how to do that in a meaningful way, without overwhelming everyone, will be one of the things to do over the next year.

Alright, the Wikibase ecosystem. There, we will continue to work with the libraries, but also look into science, for example, and more. There is a Wikibase showcase later today that you should totally go to, to see what's already there and what people are already doing with Wikibase. It's really worth it. What's needed there is also setting up good processes around all of that: helping people figure out who to talk to about what, where they can find help, all these kinds of things. And, of course, making it easier to install and maintain a Wikibase, because that's still a bit of a pain. The last thing is federation, which is basically what we were talking about for Commons earlier-- where Commons uses Wikidata's items and properties-- but for other Wikibase instances out there, so they can also use Wikidata's vocabulary. And that, as I was saying earlier, increases yet again the need to be mindful of how our vocabulary is used out there, more than we have had to be so far.

And Wikidata for the Wikimedia projects: of course, tighter integration through the Wikidata Bridge, helping people edit directly from their projects. The other thing that we all need to think about together, I think, is figuring out how to reduce the language barriers. The more Wikidata is integrated into the Wikimedia projects, the more people will need to talk to each other about that data without speaking the same language, and we have to figure out how to deal with that. If you have smart ideas, I would love to talk to you. And with that, I come to the end of my talk. Thank you, everyone, for giving more people more access to more knowledge every day.
(applause)

We have some time for questions, so if there are any questions in the audience, or if you are watching the livestream remotely-- hi, Mom-- you can ask your question on the Etherpad or in the Telegram channel, and we'll do our best. So, anything? Ah.

(person 1) Hi, everyone. This is more of a meme than a question: when will the time datatype also be able to store hours, minutes and seconds? Because up till now, the precision only goes down to the day. - I know... - It's not my question-- (laughing) That's why I said it's a meme. Every time it's like this, and it always comes from remote, so...

I do not have a very good answer to that, I'm sorry. But maybe as some background: people need it even more to describe images on Commons, so it might bubble up the long list of things that need to be done a bit faster through that. Any more questions?

(person 2) [Linda] from the Wikimedia Foundation's research team. I have a question about your thoughts on patrolling, which may be related to the quality of content on Wikidata. Can you speak to how you see patrolling efforts changing in the near to medium term, especially with the Bridge project, which I'm looking forward to trying?

Yeah, thank you. So as you say, with things like the Bridge, a lot more effort will have to be spent on patrolling, I think. But we are at a size where it is probably not feasible to do it all by hand, by a human, so we need to spend a lot more effort on improving, for example, ORES, the machine learning system, to help us with that-- to help us figure out which edits a human really needs to look at, and which are probably just the regular stuff that doesn't need a look. Currently, ORES is not super good at judging whether an edit on Wikidata is good or bad. There's a campaign going on right now that is training the machine learning system, with your help, to teach it basically what a good edit is and what a bad edit is, and we haven't yet reached the threshold of enough human judgements to really improve it. So if you have a few minutes, it would be great if you helped teach ORES to make better judgements about Wikidata edits. It's really simple-- it shows you an edit, and you say this is a good edit or this is a bad edit, and that's it. You can do this in front of the TV in the evening, on the couch. (person 3) Share a link. We will share a link in the Telegram group, yes. And once we've reached the threshold we need-- I think it's around 7,000, but I might be wrong-- then we can rerun the training for ORES, and then it will hopefully be considerably better at judging edits on Wikidata. And then I hope more of you can use that to filter recent changes, for example, or your watchlist, for edits that really need your attention. Yeah.
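For the curious: ORES exposes its scores through a public HTTP API, so tools can already ask how likely a given edit is to be damaging and use that to prioritize patrolling. A sketch-- the revision ID is a placeholder, and the endpoint and response shape shown are as ORES served them around the time of this talk:

```python
# Sketch: ask ORES how likely a Wikidata edit is to be damaging.
import requests

REV_ID = 123456789  # placeholder revision ID
url = f"https://ores.wikimedia.org/v3/scores/wikidatawiki/{REV_ID}/damaging"

result = requests.get(url).json()
score = result["wikidatawiki"]["scores"][str(REV_ID)]["damaging"]["score"]
print(f"P(damaging) = {score['probability']['true']:.2f}")

# A patrolling tool might surface only edits above some threshold,
# leaving "the regular stuff" out of the review queue.
```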
(person 4) Hi, I'm just curious to know-- and this is a question not from me, but from partners I've been working with-- the more partners we have joining Wikidata and starting to experiment with queries, the more issues we are having with queries timing out. So what's happening with that?

So, some people at the Wikimedia Foundation are looking into that, and-- small spoiler-- be there for the birthday present session. (laughter)

(person 5) Hello, I'm Bart Magnus from Belgium (PACKED). I would like to know what the current state of affairs is regarding federation, so reusing properties in your own Wikibase instance-- is there anything to mention about that?

So, over the last year, a lot of people have told us that they want federation, right? But the problem was that a lot of people understood very different things when they said federation. Some of those things were very easily doable; some of those things were really, really hard. My team and I have been talking to a lot of people, for example the partners we work with at libraries, to figure out what it is, precisely, that they actually need. We have finished that now-- though, of course, I'm happy to take more feedback if you want to talk to me about it-- and I'm now at a stage where I'm comfortable saying, "Okay, we're going to start with that." Over the next two or three months, I would say, we will actually write the first lines of code, and then hopefully have people able to test it early next year.

(presenter) Okay, last questions.

(person 6) Finn Årup Nielsen from Copenhagen, Denmark. In relation to the language data, there's been a sort of discussion in the WikiCite community about whether we should continue to put more scientific papers in there-- this relates to how much data we can put into Wikidata. Timeouts in the Wikidata Query Service are one issue, but so is the maintenance. So what are your thoughts? Is the size of Wikidata beginning to be a problem in general? Should we stop putting in lexeme data? Should we stop putting scientific data into Wikidata? Do we have any research on this, or are the technical problems growing?

Yeah... Wikidata is definitely coming up against some... scalability boundaries, let's say, both technically and socially. And for both we need solutions, right? Socially, we have things like more and more edits in recent changes, to the point where it's completely unfeasible for a human to patrol them, because it's simply too much. But also technically, and we've been addressing some of that-- for example, some database re-architecting around the wb_terms table, if that says anything to anyone. But those only get us so far, and one of the things we want to look at next year is where the other pain points are and what to do about them on the technical side. So that's the general picture. At the same time, I am very hesitant to tell anyone, "No, no, no, stop putting data into Wikidata." That would kind of defeat the purpose. But, for example, the Wikibase ecosystem is one way to address this, right-- to not require everything to be in Wikidata. That's the whole beauty of linked open data: you don't have to have it all in the same place. You can connect different places. It's amazing. So around WikiCite specifically-- I think we need to look at proportions. I don't know exactly what percentage of the items in Wikidata are around WikiCite topics, but it's a big percentage. And maybe that's the thing we need to talk about... in the break. Well, thank you very much! (applause)