Hello.
The two of us are starting
a level on a side-effect
or side-project or whatever,
something which
is loosely connected to Wikidata,
which is open data
and we're glad to see you're here.
I'm Alice Wiegand.
I'm the project lead for open data
in the municipality of Düsseldorf,
and this is Knut Huhne, who is a student.
You may introduce yourself.
Yeah, I'm a software developer by day,
and in my spare time
I do a lot of work at Code for Germany,
which is a community organization
that I'll talk a bit about,
and we try to build civic tech tools
based on open data.
Yeah, that's exactly what we need.
And so let's see where we are [on this].
[inaudible]
So if we talk about open government data,
this is something where I think
the rest of the world is much further ahead
than Europe, and especially Germany, is.
But in Germany,
where we both come from and live,
this is gaining some momentum
because laws are changing.
And overall, we have just data
which is used, produced,
and cared for
and maintained by government,
which is just a reliable data source,
and it's official data with a high value,
and it is sometimes
really surprising to see
what kind of data there is,
openly kind of published.
So this is, for example...
I hope it opens soon.
This, for example, is...
it's the measure of radioactivity in kale.
And I think it's surprising,
I wonder why is it kale
and not red cabbage?
And I wonder why is this a fixed date?
You know, 20th of November in 2013.
And I wonder why is it that long ago?
What are we doing
with radioactivity in kale today?
I don't know.
So you find a lot
of these surprising things
when you start to...
What have I to do, do you know?
...when you start to
look at open data in Germany.
I'm confused with this computer.
Oh, yes. Thanks.
Yeah, and this data usually is up to date.
Well, it should be, of course.
As in all data, we have our gaps there.
And overall if I just look
at the region I know best,
we have 86
individual portals
with open data within Germany,
which is on municipality level,
on the country level,
on the federal country level,
and on state level.
And in Austria, it's 19;
and in Switzerland, it's 6,
and numbers are growing.
So, of course, also,
the question is why we are all doing
the same thing in different places.
It doesn't seem to be that efficient,
I'm not sure, but this is how
our world today works.
So now I find the right key, thanks.
And there are a lot of challenges
which we have to face
and kind of a huge gap
between wish and reality.
So, after all, I do think there is a huge,
you know, kind of [friendliness]
between open data and Wikidata.
It's all about essential data.
It is about being as actual
or being as up to date as possible.
But in the end, when we look
at the open data platforms
in mostly Europe,
we find incompatible licenses.
So usually mainly municipalities
choose a BY license,
because they think it would be good
to know where this data came from
and to be named there.
And this is really a crazy thing.
I looked at open data portals,
and we have a portal in Düsseldorf
for two years now
and by design, we chose the CC0 license.
And I found that open data in Zurich--
Okay, it's not Germany, but it's Zurich--
and they are doing
a lot of cool stuff there as well.
And they also use the CC0 license.
But usually municipalities
like CC BY licenses, sadly.
And another thing we have to face
is that, especially in municipalities,
this kind of task to publish
this internal data
under a free and open license,
on a platform, wherever,
is just given to a person
who usually does something else.
So it's not, you know, a 100 percent task
for this person to do,
but something to do, you know,
with all the other things.
Overall, I think we can say
that of course there are people
who are really doing a great job.
Usually, we don't find
that level of expertise
on data analysis and data management
that we would need
to really find high-quality data
within the open data
which comes from governments.
And I think this is a problem,
and I realized also
that there's a language issue.
So if I just think about
putting my colleagues into this room,
into the session we had just before,
about data quality,
it would be problematic
to find a common language,
to figure out how we can start
to improve our data quality
so that Wikidata's data quality
is also improved.
Another thing
is that we have no standards
in terms of ontologies,
in terms of how we prepare data.
There is a metadata standard,
which is great,
but this, after all, does not mean
that we all do the same thing
and that we find the same kind of data,
just because it is named in the same way.
But, overall, it's a lot of official data
you can get from open data.
I made an example here
which is about street names,
and usually you find a lot
of different forms of street names.
Sometimes something like the Karlsplatz
is written with a C,
or with a K, or separated,
and sometimes this is also developing
over time.
And in the end, there's
only one official name
of a place or of a street,
and it's the municipality
which can give you that name.
And this part, like a list
of official street names
is something which is regularly published
by a lot of municipalities
in their open data portals.
And I think that overall
is a good start to figure out
what we can do with this
in Wikidata as well.
So this is my short introduction,
and I'm happy to hear about
community work with open data.
Yeah, I thought I would just kind of give
a quick introduction from the other side,
of the movement, from the community side.
So, as I said, I work in my spare time
for an organization
called Code for Germany.
We've been running for about five years
where we have labs,
that is groups of people
that meet once a week,
some once a month in Germany
in local, what we call labs.
And we try to build tools
that somehow make it easier
for people to participate in politics,
to get an understanding
of the environment around them,
to collect data about air pollution.
And, of course, we'd like to use
governmentally provided
open data for that,
but we've also realized
that there's difficulties with that,
that sometimes the data isn't there,
it's under a difficult license,
which is kind of how we found our way
to Wikidata also, I think.
We also happened to meet in Berlin
in the offices of Wikimedia Deutschland,
so this kind of brought us
very close to Wikidata.
And I think it's cool to see
that we're kind of strengthening
the relationship
between the Wikidata community in Germany
and the Code for Germany community.
We also would like to work
even closer with the government,
but talking about bridging gaps.
I mean, there's very basic problems
such as us meeting after we work
and the people from the government
wanting to meet when they work.
So I think when we think about
how these communities can work together,
there's very mundane things,
such as working times,
that we need to keep in mind.
So just a quick introduction
to what we do at Code for Germany
especially with regards to Wikidata.
We've had a couple of hackathons now
within the last years
where people from the Wikidata community
and the Code for Germany community
kind of came together to meet
and just spend a weekend
to work on Wikidata.
And we've done
all kinds of different things.
We've usually been very interested
in political data,
so we've been importing a lot of data
regarding politicians
and regarding elections.
We've thought about how to model
election data in Wikidata a lot
and we've also had a lot of people
that built games with Wikidata.
One of the nice examples for this
would be the Wikidata card game,
where you can put in any Q number
and you get a nice trading card game.
You might have seen that.
If not, I encourage you to look for that.
I think that's a really cool way
to sell Wikidata to other people.
Selling-- this is also
something that we've realized
when we talk to data providers,
that often they're quite scared
to give data to you
with the traditional argument
of "Our data is so complicated,
you won't understand it,
and you'll build bad applications
that will make us look bad."
And our strategy usually
is to just take the data anyway,
build an application, share it with them,
and then their response is usually,
"Oh, this is pretty cool.
Can we link to that from our website?"
And then, at some point,
maybe you can start having
a discussion with them.
But, yeah, I think this is kind
of what we can do as a community.
We can build little small games
and tools to showcase:
okay, there is Wikidata,
and it's pretty cool,
and you have open data,
and we can build cool things with it,
but you'll need to give it to us,
you'll need to publish it
under a license that we can work with.
And this is one of the things
that we try to do at Code for Germany.
[inaudible], thanks.
(applause)
Yeah, thank you.
Before we open
the room for questions from you,
we would like to just open
or ask some questions to you.
I think that Knut has really described
the challenges we face quite well.
But, still, I do think there's a lot
of opportunities in these data,
and we just need to kind of harvest it
better than we do it right now.
And so my questions--
and maybe it helps you a bit
to think about that--
is how could we integrate
more open government data
into Wikidata in a more structured way.
Just keeping in mind that the people
who are kind of providing these data
are not the experts you may expect.
And at the same time,
there already is a WikiProject,
Open Government Data,
and I'm not sure if you, Christina,
had opened it quite a while ago.
And I wonder in which way we can
kind of reanimate it
and make the best out of it
because we still have this place,
and we have people
who are engaged
in the municipalities, in governments,
to open up data.
And maybe it's an opportunity
to just match these different
languages and expectations.
So, yeah, I'm open
for any ideas to do that,
and I'm happy to engage
a bit in that as well.
So, questions?
(person 1) Hi, thank you, guys.
Maybe an idea is one
we could be taking
from the Wikipedia beginnings,
where I think it was Matthias Schindler,
who started
with his Content Liberation Army.
And the idea that,
you know, you have to really go in,
and the data is there.
But for example,
I had a project with a student
where we were looking
at where the trees
are geolocated in Berlin,
and this is sometimes on paper,
it's sometimes on a stupid database.
We were accused of being terrorists
by the people who didn't want
to give us the data.
We had to get really, really
picky about this and point to the laws
saying, "This is open data,
and you have to give it to us."
But we have to sort of go in friendly,
as you were saying
and try and explain to them
what they will get out of it.
Many of them don't see
that it is of any use to them
because it's more work for them
having to deal with us.
I think that's one
of the main kind of fears
which is that people are coming
who are just putting more work onto us.
And at the same time,
there's so little understanding
that this is just part
of what they are doing already.
And that they can really also
learn and get a lot of input
from the people
who are asking about that data.
But this is really culture change,
a cultural change
especially here in Germany.
So we are working on it.
We are working hard,
but it's really kind of a tough thing.
- Maybe I can add?
- Yes.
I think what's also
really interesting to see
from the community's perspective
is that when we talk to different cities,
it so depends on who happens
to work in the cities.
Like we have this very small city of Moers
that is very unknown,
but if you talk to people
in the open data community,
everyone will know it
because they happen to pay someone
to do work on open data.
And when I talk to people
from the government in Berlin,
they tell me, "Okay, I now know
I have to publish open data,
but I don't know how, for whom, or why."
And I think this is actually
a chance for the smaller cities
to kind of champion this idea
because it's so much easier for them
to kind of get a movement
and to liberate some data
whereas if we talk in Berlin,
we always need to talk to 12 districts,
and they'll never align
on what data they want to publish.
(person 2) And we have a remote comment
from Beat Estermann
who wants to point out
he has some links in Etherpad
about "Interest in open government data
helps Swiss authorities
prioritize base registers
and controlled vocabularies."
And I'm told he just came in
while I'm reading his Etherpad entry.
So if you could just take the mic from me.
(person 2) Go on.
(Beat) Okay, thank you.
I missed the first introduction.
What did you start on?
- (person 2) I was just reading--
- (Beat) Oh, you were reading. Okay.
So we're currently running--
In Switzerland, we're running a survey
to kind of prioritize data
from within the government.
There are like base registers
or controlled vocabularies.
Because we think
that they would be crucial
to actually promote and boost
the publication of linked open data
across the public authorities,
so we're running a survey
to prioritize them.
And for some authorities
to know which ones to publish now
and for others--
for the community to know
where to put pressure on
and how to actually,
yeah, argue why they should publish it.
We're also collecting use cases.
I posted the link to the Etherpad.
It's in German and French only,
the questionnaires.
I'm sorry we're still not like up
five language count here,
but you said four languages-
(person 3) Just switch to English.
(Beat) Yeah, we could switch
to English, right.
Yeah, so that's one point.
The other point I think is we could...
and I'll put a little bit more love
into kind of documenting
the whole WikiProject
Open Government Data,
and that's something
we're not really doing
if you compare it
to what is going on in GLAM.
I think that is definitely something
which I probably will try to figure out
after my vacation time,
which is starting on Monday.
There is this WikiProject,
and we need to figure out
who is interested in it
what can we do there,
and how can we motivate people
from kind of [out] the Wikidata community
to add this important information to that.
So I do think there is a huge opportunity
to figure out how we can include
more of this really, really valuable
and reliable data into Wikidata.
But overall, there's a lot
of challenges as well,
and still it's kind of
a different crowd of people,
and we need to figure out
how to bring them together.
Any idea is welcome.
(Beat) Yeah, there is another point
which we're currently not focusing on
with this base register
and vocabulary thing.
But what I have had as a request
is to be able
to actually store tabular data
and to be able to pull it.
Because it does not make sense
to put like 200 years
of population statistics from Zurich
into that Wikidata item for Zurich.
Maybe I'll just pick that up
with an anecdote from my day job.
So I started to introduce Wikidata
to my colleagues.
We are a small team doing open data,
and it was fine,
and they were really, really interested,
but in the end we started
to add some of the population data,
and then, you know, there isn't any order.
So it's so hard to figure out
if you find a population figure
for year Y or X or something,
and if it is still missing.
So, of course,
there are still a lot of things
to improve in Wikidata as well,
and tabular data could be one of them also.
(person 4) [inaudible] Is it working?
I have a comment on the tabular data.
I remember we had also discussions
with the canton and the city
of Zurich about this,
and that it might make sense to start
discussions on whether
we should maybe consider
setting up a Wikibase
for open governmental data
and having such kind of datasets
and then link them to Wikidata
or link them from Wikidata to them,
because mostly
the linked open data technology
is actually enabling that
and is one of the key advantages
of this technology.
It is, of course, something
that doesn't relate only to OGD data,
it's a global divide
in the whole Wikidata community.
Because the larger we make
the central endpoint or the graph
the more difficult it is to handle it--
I think we all agree on that.
So I think there should be
a deeper conversation and discussion
on whether we should
start building this network.
Well, actually, there is already
a network of Wikibases.
We also work in the university
with publications and research data
with our own Wikibase.
Yeah, and then another comment
about the WikiProject.
So we continued working
and documenting the materials
of the events,
so we actually now have
two upcoming events in November.
We have a full weekend
technical training on Wikidata
in collaboration
with the open data Zurich people
and the canton of Zurich,
and also Wikimedia Switzerland,
and we have a hackathon.
But I totally agree that it would be great
to start having conversations
with all the participants
that have been listed already
in the project,
and start more discussions,
especially with all the countries
that have many good initiatives,
like Germany, like what you described
and start documenting
what are the specific needs
of these institutions,
what are the problems,
and what specific tools
we need to develop, or procedures,
so that we can help them import
or link data in Wikidata.
I think we're out of time.
One last question.
(person 5) So a proposal
to use Wikibase for that?
I'm not sure whether
that actually would solve
this tabular data problem.
And when thinking of statistical data,
like population data,
that is not data
that we want to really edit,
that's data we just want to consume.
So it means we have to ask ourselves
whether we want to build in
the capability to actually pull data
directly from external third-party
SPARQL endpoints,
and not just from
within this Wikibase ecosystem
that we're planning to build up as well.
(person 4) So I agree
that it doesn't solve the tabular data,
but what I was trying to say
is that the information
that is more specific,
it might be the case that we want
to export it to something else
and I see Wikibase also
as a very good data modeling example.
So not only because you want
to have humans editing,
but also because the whole data modeling
happening in Wikidata
with all the qualifiers and references
adds a lot to all the datasets.
So if we would do it from scratch in RDF
we would be missing these features
that Wikidata has,
and I see it as an advantage.
So that was a reason why I mentioned
that it would be very helpful
to maybe think of
Wikibases around the OGD data.
(moderator) So, I'm sorry,
but I think we just ran out of time,
and I encourage you
to keep talking with our speakers,
[inaudible] during all the conference
and please, a round of applause for them.
(applause)
Thank you.