cdn.media.ccc.de/.../wikidatacon2019-1035-eng-Data_import_process_overview_hd.mp4

0:00 - 0:08

Good afternoon, everybody.
0:09 - 0:12

Welcome to our GLAM panel.
0:13 - 0:17

Before we start, I just have
two announcements to make.
0:17 - 0:23

First of all, please make extensive use
of our Etherpad to take notes.
0:24 - 0:28

And the second one is directed
at our audience at home,
0:28 - 0:30

or wherever you are.
0:30 - 0:31

If you have any questions,
0:31 - 0:34

you can also write them into the Etherpad,
0:34 - 0:38

and our room angels
will keep track of them.
0:39 - 0:44

So, we decided that for this year's panel,
0:45 - 0:49

after seeing all the contributions
that were made,
0:49 - 0:54

we would focus on the role of Wikidata
within data ecosystems
0:54 - 0:57

that go beyond the actual
Wikimedia projects,
0:57 - 1:00

which is also absolutely in line
1:00 - 1:04

with the new Wikimedia
Foundation strategy.
1:05 - 1:08

And we have, today, four panelists.
1:08 - 1:10

Three plus one.
1:10 - 1:14

So, I would like to ask you on stage,
1:14 - 1:16

so we can introduce you.
1:22 - 1:25

So, we have Susanna Ånäs.
1:25 - 1:29

She's a long-time free-knowledge activist
1:29 - 1:31

involved in many WikiProjects.
1:32 - 1:36

And she will be reporting today
on the project in cooperation
1:36 - 1:38

with the Finnish National Library.
1:39 - 1:43

Then we have, next to me, Mike Dickison,
1:43 - 1:46

who will be second in this order.
1:47 - 1:50

He is a museum curator from New Zealand.
1:50 - 1:54

He's a zoologist and a Wikipedia editor.
1:54 - 1:59

And he was New Zealand's
first Wikipedian at Large
1:59 - 2:03

in 2018 and 2019.
2:03 - 2:07

And he will tell us
about his experience in that role,
2:07 - 2:13

and what kind of role Wikidata
is starting to play in that context.
2:16 - 2:18

Then we have Joachim Neubert
2:18 - 2:23

from the Leibniz Information Center
for Economics in Kiel and Hamburg.
2:24 - 2:29

He has been working on making the largest
public press archives worldwide
2:29 - 2:35

more accessible to the public,
and he's using Wikidata to do that.
2:36 - 2:39

And then I will go last.
My name is Beat Estermann.
2:39 - 2:43

I work for Bern University
of Applied Sciences, in Switzerland.
2:44 - 2:50

And I've been a long-time promoter
for OpenGLAM in Switzerland and Austria.
2:50 - 2:55

And I will today report
about my activities in connection
2:55 - 2:59

with the mandate from the Canadian Arts
Presenting Association,
2:59 - 3:01

focusing on performing arts.
3:02 - 3:04

Not primarily on Wikidata,
3:04 - 3:08

but you will see Wikidata
is starting to play a role there, as well.
3:09 - 3:13

So now, most of us
will take our seat here,
3:13 - 3:17

and I will give the floor to Susanna.
3:18 - 3:23

Okay. So, hello. My name is Susanna Ånäs,
3:23 - 3:26

and I work part-time for Wikimedia Finland
3:26 - 3:27

as a GLAM coordinator,
3:27 - 3:33

and I also do consulting
in the open knowledge sphere.
3:33 - 3:36

And this is a discourse,
maybe, of [inaudible].
3:36 - 3:39

So, I have been involved in the workings
3:39 - 3:46

of geographic data group of the--
3:48 - 3:51

well, I looked it up,
but it isn't in English,
3:51 - 3:54

but, cultural heritage initiative
of the Finnish government.
3:55 - 4:00

So, this is about place names
4:00 - 4:03

and how they are represented
4:03 - 4:07

in different repositories
in the GLAM sector in Finland,
4:07 - 4:12

and how they are trying to pull together
these different sources,
4:12 - 4:18

and how they are informed
by modeling in Wikidata and elsewhere.
4:18 - 4:23

So, here we see the three main sources
for these YSO places,
4:23 - 4:28

which is part of the national ontology--
general ontology.
4:28 - 4:30

AHAA is for Finnish archives,
4:30 - 4:32

Melinda is for Finnish libraries,
4:32 - 4:34

and KOOKOS is for Finnish museums.
4:34 - 4:38

So, there are three, also,
content management systems
4:38 - 4:40

that come together in these YSO places.
4:41 - 4:47

And there are exchanges between Wikidata
already taking place,
4:48 - 4:53

as well as the names project
for the National Land Survey.
4:53 - 4:56

And then, there's a third project,
the Finnish Names Archive,
4:56 - 5:00

which doesn't yet contribute to this,
5:00 - 5:03

but there are plans for that.
5:03 - 5:09

So, one of the key modeling issues
in this whole problem area
5:09 - 5:15

is that there are three types
of elements in place names
5:16 - 5:18

represented in this project.
5:18 - 5:21

One of them is the place,
the one that has location.
5:21 - 5:25

And one of them is the place name,
the toponym, for example.
5:25 - 5:28

And then, there are sources,
which are documents
5:28 - 5:31

from which both of these can be derived,
5:31 - 5:33

or like, backed up with.
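
The three kinds of elements described here (place, place name, source) could be sketched as a small data model. This is only an illustration: the class and field names are assumptions and do not come from any of the repositories mentioned in the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Source:
    """A document that backs up a place or a place name (e.g. a field note)."""
    source_id: str
    description: str


@dataclass
class PlaceName:
    """A toponym: the name itself, in one language."""
    text: str
    language: str  # language code such as "fi" or "sv"
    sources: List[Source] = field(default_factory=list)


@dataclass
class Place:
    """The thing that has a location."""
    place_id: str
    coordinates: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    names: List[PlaceName] = field(default_factory=list)
    sources: List[Source] = field(default_factory=list)


def names_in(place: Place, language: str) -> List[str]:
    """Return all names recorded for a place in the given language."""
    return [n.text for n in place.names if n.language == language]


# Hypothetical identifiers, used only to show how the pieces connect.
note = Source("note-1", "hypothetical field note")
place = Place(place_id="place-1", names=[PlaceName("Sevettijärvi", "fi", [note])])
```

Keeping names as separate objects, rather than plain strings on the place, is what lets each name carry its own language and its own sources.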
5:33 - 5:36

The YSO places--
here, on the top right,
5:36 - 5:39

you will see the same diagram again.
5:39 - 5:41

It focuses mainly on the places.
5:43 - 5:46

The main thing of this
is the Finnish National Library,
5:46 - 5:49

and the Finto project.
5:50 - 5:56

There are now more than 7,000 places
in Finnish and Swedish
5:56 - 5:59

and over 3,000 in English,
5:59 - 6:03

and they are licensed with CC0.
6:03 - 6:06

So, here you can see the service of Finto.
6:06 - 6:10

And a place-- I chose Sevettijärvi.
6:10 - 6:14

It is now also related
to our language project
6:14 - 6:15

with the Skolt Sámi--
6:15 - 6:19

this is a place
in the very north of Finland
6:19 - 6:22

inhabited by Skolt Sámi.
6:22 - 6:27

So, here you can see the place
which belongs to the--
6:27 - 6:33

well, you will see the data
about this place.
6:33 - 6:38

You can see that it is connected
to Wikidata,
6:38 - 6:42

as well as this National Land Survey data.
6:43 - 6:47

Here we go. And you will see
this in more detail, here.
6:49 - 6:52

It is also hierarchically arranged
6:52 - 6:56

inside this repository.
6:58 - 7:00

Well, actually,
the actual place is not seen,
7:00 - 7:06

but it is underneath this municipality,
7:06 - 7:08

as well as the region,
7:08 - 7:10

and Finland as a country,
and Nordic countries,
7:10 - 7:13

the broader region.
7:13 - 7:14

Here you can see that many of these
7:14 - 7:18

have been matched
with Wikidata previously
7:19 - 7:22

through Mix'n'Match,
and there are still remaining ones.
7:22 - 7:28

But then, the amount of names
is not that high.
7:28 - 7:31

It's only less than 5,000.
7:32 - 7:34

So, then there is this other repository
7:34 - 7:38

by the Finnish Geospatial
Platform Project--
7:38 - 7:39

Place Names Cards.
7:39 - 7:42

These are all the place names
that are on Finnish maps.
7:42 - 7:48

And they have the linked data,
which is licensed CC BY 4.0.
7:49 - 7:54

800,000 map labels in Finnish, Swedish,
and all those three Saami languages
7:54 - 7:56

that are in Finland.
7:56 - 7:59

And they have
two different types of entities.
7:59 - 8:01

The other ones are places,
and the other ones
8:01 - 8:03

are place names, toponyms.
8:03 - 8:05

And they both have persistent URIs.
8:06 - 8:10

Here's, for example,
the same Sevettijärvi, in first Finnish,
8:10 - 8:14

and then all those three Saami languages,
as well as the geographic data,
8:14 - 8:19

and then there is more information
about that, like the place type,
8:20 - 8:21

et cetera.
8:22 - 8:28

Here is the card for the place name,
the toponym, having its own URI.
8:30 - 8:34

Sorry, it seems that it's not translated
into the English list.
8:34 - 8:39

So, multilinguality
is not covering the whole project.
8:40 - 8:43

Okay, we come
to the Finnish Names Archive.
8:43 - 8:46

This is a project by the Institute
for the Languages of Finland,
8:46 - 8:50

and these represent not the places,
not the place names,
8:50 - 8:53

but they are actually sources for those.
8:53 - 8:57

So, these are three million
field notes of place names,
8:58 - 9:00

and it is a Wikibase project.
9:00 - 9:03

They are in a Wikibase,
mainly in Finnish, some in Swedish.
9:03 - 9:08

An outstanding collection of Saami names,
which we are very interested in.
9:08 - 9:10

And they are licensed CC BY.
9:10 - 9:15

And that is also a challenge
from the Wikidata point of view.
9:15 - 9:18

But if there was a Finnish local Wikibase,
9:18 - 9:23

we might be able to first work
on them in that project.
9:23 - 9:25

So, here's a screenshot of that,
9:26 - 9:31

showing that there's information
about the place, the maps--
9:31 - 9:35

the maps that the collectors
initially use,
9:35 - 9:41

and the card that they produce
of the information they collected.
9:41 - 9:46

So, here's one of those cards
9:46 - 9:49

broken down into data
9:49 - 9:51

that is included in them.
9:51 - 9:54

So, then there is
this linked data project
9:54 - 9:56

by the Helsinki Digital Humanities Lab
9:56 - 9:58

and the Semantic Computing
9:58 - 10:01

Research Group of Aalto University--
10:01 - 10:07

and together with this Institute
for the Languages of Finland--
10:07 - 10:08

the Names Sampo.
10:08 - 10:11

And this is an aggregated
research interface
10:11 - 10:14

to several place name sources.
10:14 - 10:18

Here you can see that many
of the sources are out there on the left,
10:18 - 10:21

and then, you can make
different kinds of visualizations
10:21 - 10:23

based on this data.
10:23 - 10:24

And, yeah.
10:25 - 10:31

So, I've been bringing up this idea
of modeling for a local Wikibase
10:31 - 10:33

that we could do with this data.
10:33 - 10:37

But when we enter
these modeling questions,
10:37 - 10:38

how do we model?
10:38 - 10:42

There are different ways,
different traditions in each of these.
10:46 - 10:50

And the good thing about it
is it could also serve minority languages
10:50 - 10:52

with very little effort.
10:53 - 10:57

Okay. So, here we have
the two basic options:
10:57 - 11:02

the SAPO model, which is
the Finnish Space-Time Ontology,
11:03 - 11:04

and the Wikidata model.
11:04 - 11:08

Here you can see
that Wikidata items tend to zero.
11:08 - 11:13

Ideally, they remain the same
with the changing properties.
11:13 - 11:17

Whereas, in the SAPO model,
these items become new
11:17 - 11:20

when there is a change,
such as area change and name change.
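
The difference between the two options can be sketched in code. This is a simplified illustration, not the actual SAPO or Wikidata data model: all class names are assumptions, and the renamed place is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatedValue:
    value: str
    start: Optional[int] = None  # year the value became valid
    end: Optional[int] = None    # year it stopped being valid


def _covers(start: Optional[int], end: Optional[int], year: int) -> bool:
    """True if the half-open period [start, end) contains the year."""
    return (start is None or start <= year) and (end is None or year < end)


# Style A (Wikidata-like): one persistent item; changes become dated values.
@dataclass
class PersistentPlace:
    item_id: str
    names: List[DatedValue]

    def name_in(self, year: int) -> Optional[str]:
        for n in self.names:
            if _covers(n.start, n.end, year):
                return n.value
        return None


# Style B (SAPO-like): every change yields a new item valid for one period.
@dataclass
class PlaceVersion:
    item_id: str
    name: str
    start: Optional[int]
    end: Optional[int]


def version_in(versions: List[PlaceVersion], year: int) -> Optional[PlaceVersion]:
    for v in versions:
        if _covers(v.start, v.end, year):
            return v
    return None


# A hypothetical place that was renamed in 1950.
a = PersistentPlace("item-1", [DatedValue("Old Name", None, 1950),
                               DatedValue("New Name", 1950, None)])
b = [PlaceVersion("item-1a", "Old Name", None, 1950),
     PlaceVersion("item-1b", "New Name", 1950, None)]
```

In style A the identifier stays `item-1` across the rename; in style B a query for 1940 and one for 1960 return two different identifiers.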
11:21 - 11:26

So here, come back to this division
11:26 - 11:32

between these three different dimensions
of places, place names.
11:32 - 11:38

So, should we make these place names
into entities or properties?
11:38 - 11:39

Wikidata uses properties,
11:39 - 11:43

whereas this land survey
project has entities.
11:44 - 11:46

Or should we make them into lexemes?
11:46 - 11:51

Wikidata has chosen to work
with properties,
11:51 - 11:55

textual properties
for place names over lexemes.
11:56 - 11:58

I'm sorry, the other way around.
11:58 - 12:00

So, the names are...
12:03 - 12:05

properties, not lexemes.
12:06 - 12:07

Right.
12:07 - 12:11

And maybe the shortcoming of the Wikibase
12:11 - 12:16

is the lack of geographical
shapes inside that--
12:16 - 12:21

like in the basic setup of it,
12:21 - 12:25

so one would have to add
more technology into the stack
12:25 - 12:30

to be able to use local geographic shapes.
12:30 - 12:32

And a federation is really needed
12:32 - 12:38

to be able to take advantage
of the Wikidata corpus.
12:39 - 12:43

So, I'm done already. Thank you.
12:44 - 12:46

(applause)
13:01 - 13:03

Okay.
13:03 - 13:05

(speaking in Maori)
13:05 - 13:08

Welcome, everyone.
My name is Mike Dickison.
13:08 - 13:10

And for a year,
13:10 - 13:13

I was New Zealand Wikipedian at Large.
13:14 - 13:17

You might wonder
what a Wikipedian at Large is.
13:18 - 13:22

Because if you actually look out for it,
there is no such thing, as we can see.
13:23 - 13:26

It's a term that I made up
in the grant proposal,
13:26 - 13:29

which the foundation
seemed to like very much.
13:30 - 13:32

And so, we ran with it.
13:32 - 13:37

So, for a year, I went through
35 different institutions,
13:37 - 13:41

resident in most of them,
running training sessions,
13:41 - 13:44

organizing public events,
and trying to develop
13:44 - 13:47

a Wikimedia strategy for each one.
13:48 - 13:49

It was a very interesting experience,
13:49 - 13:53

and you encounter a wide range
of different projects and people.
13:53 - 13:58

And I wanted to try and talk through
some of the different projects
13:58 - 14:00

that dealt with Wikidata
14:01 - 14:05

in interesting or, perhaps,
illuminating ways,
14:05 - 14:08

that might be useful for folks to discuss.
14:09 - 14:12

The project was initially
a Wikipedia project by the name,
14:12 - 14:15

simply because that was what people
were familiar with,
14:15 - 14:18

and so we organized
multiple different events
14:18 - 14:23

at very traditional edit-a-thons,
gender gap work, and so forth.
14:25 - 14:27

[And a bunch you can see] [inaudible],
14:27 - 14:31

and a bunch of very successful
new editors recruited, and so forth.
14:32 - 14:34

We did bulk uploads into Commons.
14:35 - 14:41

In this case, there was a collection
of over 1,000 original artworks
14:41 - 14:46

by an entomological
illustrator, Des Helmore,
14:46 - 14:48

which had been sitting on a hard drive,
14:48 - 14:50

[lacking] research for ten years,
14:50 - 14:52

and we were able
to get clearance to release those
14:52 - 14:54

all under a CC BY license.
14:54 - 14:58

So, easy wins to show to people there.
14:58 - 15:01

Everyone can understand
lots of pictures of beetles.
15:01 - 15:07

Everyone can understand workshops
devoted to fixing the gender gap.
15:07 - 15:10

But Wikidata
is much more difficult to sell
15:10 - 15:12

to people in the GLAM sector,
15:12 - 15:15

or anyone outside
of our particular movement.
15:16 - 15:20

So, I began to realize that Wikidata
15:20 - 15:23

was going to be a more
and more important part
15:23 - 15:26

of the Wikipedian at Large projects.
15:26 - 15:30

So, as we went through, it became
a larger and larger component
15:30 - 15:32

of what I was doing.
15:32 - 15:36

And I began to try and teach myself
more about Wikidata as well,
15:37 - 15:40

because I was beginning to see
how important it was.
15:40 - 15:42

So, this one project--
15:42 - 15:46

the kakapo is a native
New Zealand flightless parrot.
15:48 - 15:51

We worked with
the Department of Conservation,
15:51 - 15:54

whose job is to save
this species from extinction,
15:54 - 15:56

and pitched the idea,
15:56 - 15:59

"What if we put every
single kakapo into Wikidata?"
16:01 - 16:03

And that may seem ridiculous,
16:03 - 16:06

but it's actually
a perfectly doable project.
16:07 - 16:08

A few of them are in there already.
16:09 - 16:12

A key thing to notice here
is there are not many kakapos.
16:12 - 16:13

So, it's a manageable task.
16:13 - 16:17

There were 148 when I started,
and then one died.
16:17 - 16:21

And they've just had
a great breeding season up to 213.
16:22 - 16:25

This is great. This is the most kakapo
there have been for over 50 years.
16:26 - 16:28

So, this was also a big deal.
16:28 - 16:31

This was on the news
every day in New Zealand.
16:31 - 16:33

Each new one that was born--
16:33 - 16:34

(man) In the New York Times.
16:34 - 16:36

(Mike) Did it? Oh, lovely.
16:36 - 16:39

Yeah, this was national news.
Everyone likes these birds.
16:39 - 16:41

But something interesting about them
16:41 - 16:44

is that, unlike species
that are more populous,
16:44 - 16:48

every single kakapo is named,
has a unique name
16:48 - 16:50

and a unique ID number.
16:50 - 16:52

And often has good biographical data
16:52 - 16:55

about where and when they were born,
16:55 - 16:57

were hatched, who their father
and mother was,
16:57 - 16:59

when they died, if they died.
16:59 - 17:01

So, there is, in fact,
a Department of Conservation database
17:01 - 17:03

of all this information.
17:03 - 17:07

And one of the most famous kakapos,
of course, is Sirocco,
17:07 - 17:10

who you can see is named
after a wind, was born there.
17:10 - 17:13

Sirocco has a Twitter account,
17:14 - 17:16

which Wikidata had some problems with,
17:16 - 17:19

because, apparently,
they just can't have Twitter accounts.
17:19 - 17:20

I don't know about that.
17:21 - 17:23

He's even featured
on an album cover, and so forth.
17:23 - 17:26

So there are multiple properties of this,
17:26 - 17:28

probably one of the most famous
individual kakapo.
17:28 - 17:30

So, I pitched to the Department
of Conservation,
17:30 - 17:33

"Why don't we try and do this
with every single one?"
17:33 - 17:38

And so, they had to think about
how much of the biographical data
17:38 - 17:39

could be made public.
17:39 - 17:41

And they came up with a short list.
17:41 - 17:47

And now we've got, I think, 212,
210--I think a couple died--
17:47 - 17:51

living kakapo that are all candidates now.
17:51 - 17:53

And they only get a name when they fledge.
17:53 - 17:56

They have a code number until that
while they're still babies.
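
A record for an individual bird, following the description above, might look roughly like this. It is a sketch only: the field names and the code format are assumptions, and the Department of Conservation database schema is not described in the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Kakapo:
    code: str                      # code number used before naming (format is hypothetical)
    name: Optional[str] = None     # given only once the bird has fledged
    hatched: Optional[str] = None  # ISO date string
    died: Optional[str] = None     # ISO date string, None while alive
    mother: Optional[str] = None   # code of the mother
    father: Optional[str] = None   # code of the father


def label(bird: Kakapo) -> str:
    """Use the name when there is one, otherwise fall back to the code."""
    return bird.name if bird.name else bird.code


def is_candidate(bird: Kakapo) -> bool:
    """Assumed rule for this sketch: a bird is a candidate once it is named and alive."""
    return bird.name is not None and bird.died is None


chick = Kakapo(code="X-001")
adult = Kakapo(code="X-002", name="Sirocco")
```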
17:56 - 17:58

So, when we've got the full-fledged crop,
17:58 - 18:02

we're going to create
a complete Wikidata--
18:02 - 18:04

the entire species will be in Wikidata.
18:05 - 18:07

But we need to come up
with a property for DOC ID--
18:07 - 18:09

I actually would like to talk
with folks about that.
18:09 - 18:11

Should we be using a very specific ID,
18:11 - 18:13

or should we be coming up with an ID
18:13 - 18:18

that would work for all individual birds
or plants or animals
18:18 - 18:22

that have been tagged
in any scientific research project?
18:22 - 18:24

It's a good question.
18:25 - 18:27

Second project was
Christchurch Art Gallery.
18:28 - 18:32

There are very few paintings
of Colin McCahon,
18:32 - 18:34

New Zealand's most famous
artist in existence.
18:34 - 18:37

This is a drawing he did
for the New Zealand School Journal,
18:37 - 18:38

which was government-funded at the time.
18:38 - 18:41

So, it's actually in Archives New Zealand
18:41 - 18:42

who own the copyright for that.
18:42 - 18:44

This is a very unusual situation.
18:45 - 18:47

So, I worked with
Christchurch Art Gallery
18:47 - 18:49

who, along with Auckland Art Gallery,
18:49 - 18:53

maintain a site called
Find New Zealand Artists.
18:53 - 18:56

The job of which is to keep track
of the holdings--
18:56 - 18:58

every institution that has holdings
of the New Zealand artist.
18:58 - 19:03

So, about 18,000 different artists
in their database,
19:03 - 19:06

and most with very little
information at all.
19:06 - 19:09

So, we did a standard sort of Mix'n'Match.
19:09 - 19:14

We did an export of the ones
that had at least a birth date,
19:14 - 19:18

or a death date, or a place of birth,
or a place of death.
19:18 - 19:21

So, that's not restricting it very much.
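
The export rule described here (keep an artist if at least one of the four biographical fields is filled in) could be sketched as follows. The column names and sample rows are hypothetical; the real export format is not given in the talk.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical column names for the four biographical fields.
BIO_FIELDS = ("birth_date", "death_date", "birth_place", "death_place")


def has_any_bio_field(row: Dict[str, str]) -> bool:
    """True if at least one biographical field is non-empty."""
    return any((row.get(f) or "").strip() for f in BIO_FIELDS)


def select_for_matching(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only the rows worth sending to a matching tool."""
    return [r for r in rows if has_any_bio_field(r)]


SAMPLE = """id,name,birth_date,death_date,birth_place,death_place
1,Artist A,1900,,,
2,Artist B,,,,
3,Artist C,,,,Christchurch
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
selected = select_for_matching(rows)
```

Artist B has no biographical data at all and is dropped; the other two are kept.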
19:21 - 19:23

And even then, we were not able
to match quite a few,
19:23 - 19:26

but we've got about 1,500 now
19:26 - 19:29

that are matched
to known artists in Wikidata,
19:29 - 19:30

which is nice.
19:30 - 19:32

But what was appealing to them--
19:32 - 19:34

this is their website,
19:34 - 19:39

which really just maintains
the holdings links there.
19:39 - 19:45

But this biographical data,
which they create by hand, currently,
19:45 - 19:46

for every single artist.
19:46 - 19:49

And the act of exporting
and putting into Mix'n'Match
19:49 - 19:52

exposed numerous typos
and mistakes and such
19:52 - 19:54

that they haven't noticed.
19:54 - 19:56

And it's only when you start
running things through [Excel],
19:56 - 19:57

these things show up.
19:57 - 20:02

And the value of Wikidata
was suddenly conveyed to them
20:02 - 20:06

when I said, "You can just suck in
that information from Wikidata."
20:07 - 20:10

And that made them sit up straight.
20:10 - 20:12

So this, I think, is one
of the selling points.
20:12 - 20:15

When you have this carefully
hand-curated website
20:15 - 20:19

with 18,000 entries, full of mistakes,
and tell them there's another way,
20:19 - 20:21

that they can get other people
20:21 - 20:23

to do some of this fact-checking
and correction for them--
20:23 - 20:25

that's when it sinks home.
20:25 - 20:27

And then announced I was pitching the idea
20:27 - 20:30

that they "Wikidatafy"
this entire history book
20:30 - 20:33

of the New Zealand artists
in Christchurch in the '30s,
20:33 - 20:37

and run through--just published--
and run through every single person,
20:37 - 20:39

connection, place, exhibition, and such.
20:39 - 20:43

But it's a manageable sized project,
and they're very excited by this.
20:44 - 20:47

And thirdly, I wanted to show you
Maori Subject Headings.
20:47 - 20:51

A waka is a Maori name
for a particular kind of canoe,
20:51 - 20:53

a war canoe.
20:53 - 20:56

So, in the National Library
of New Zealand,
20:56 - 20:59

there's a listing for waka,
because the National Library
20:59 - 21:03

actually has its own dictionary
of Maori Subject Headings,
21:03 - 21:04

in the Maori language.
21:04 - 21:06

So, there it defines a waka,
21:07 - 21:10

in Maori and English.
21:10 - 21:12

But it also has a whole lot
of narrower terms,
21:12 - 21:14

you can see there on the side there.
21:14 - 21:16

A typical one would be taurapa.
21:16 - 21:20

And a definition first in Maori,
and then in English.
21:20 - 21:22

It's the carved sternpost
that you can see there.
21:23 - 21:24

And in English, you would say "sternpost,"
21:24 - 21:27

but you can't use
the word "sternpost" for taurapa,
21:27 - 21:31

because taurapa only works
for particular kinds of war canoes.
21:31 - 21:34

So, there's no English word
equivalent for that.
21:35 - 21:38

And I suddenly realized
that here is an entire ontology
21:38 - 21:42

of cultural-specific terms that have been
very carefully worked out
21:42 - 21:45

and verified by the National
Library with Maori,
21:45 - 21:50

constantly being added to and improved
with definitions, with descriptions,
21:50 - 21:52

in both English and Maori.
21:52 - 21:53

Really exciting.
21:53 - 21:56

I suddenly thought we could put
this whole lot into Wikidata--
21:56 - 22:01

Maori first, and then translated
into English, as required.
22:01 - 22:02

Be a nice change, wouldn't it!
22:03 - 22:05

And here's the copyright licensing.
22:05 - 22:09

Unfortunately, NonCommercial-NoDerivs.
22:10 - 22:12

So now I have to start
the conversation with them
22:12 - 22:15

about why did they pick that license.
22:16 - 22:20

And possibly because they only got
[buy in] from Maori,
22:20 - 22:23

who agreed to sit down
and [inaudible] this stuff
22:23 - 22:24

if there was a guarantee
22:24 - 22:27

that none of this information
could be used for commercial purposes.
22:28 - 22:32

So, that's one of the frustrating
aspects of the task
22:32 - 22:34

is coming up against
these sorts of restrictions.
22:34 - 22:37

So, those are the three things
I wanted to put out in front
22:37 - 22:38

and sparking discussion.
22:38 - 22:41

Putting an entire species into Wikidata,
22:41 - 22:44

what it takes to actually change
an art gallery's curator's mind
22:44 - 22:46

about the value of Wikidata,
22:46 - 22:50

and what do we do when we would see
a complete ontology
22:50 - 22:52

in another language that,
unfortunately, has been slapped
22:52 - 22:56

with a restrictive
Creative Commons license.
22:56 - 22:57

Thank you.
22:57 - 22:59

(applause)
23:11 - 23:14

Hello. My name is Joachim Neubert.
23:14 - 23:16

I'm working for the ZBW,
23:18 - 23:21

that is, Information Center
for Economics in Hamburg,
23:21 - 23:24

as a scientific software developer.
23:25 - 23:31

And one of my tasks last year
was preparing a data donation to Wikidata.
23:32 - 23:37

And I want to give some report on this
on our first experiences
23:38 - 23:43

from donating metadata
from the 20th-Century Press Archives.
23:46 - 23:48

To our best knowledge,
23:48 - 23:53

this is the largest public
press archive in the world.
23:54 - 23:59

It has been collected
between 1908 and 2005,
24:01 - 24:04

and has been drawn from
24:05 - 24:09

more than 1,500 newspapers
and periodicals
24:09 - 24:13

from Germany, and also internationally.
24:15 - 24:19

And it has covered everything
which could be of interest
24:19 - 24:23

for the Hamburg,
24:26 - 24:28

the Hamburg businesspeople
24:28 - 24:32

who wanted to expand over the world.
24:35 - 24:39

As you can see, this material
has been clipped from newspapers
24:39 - 24:42

and put onto paper,
24:42 - 24:45

and then collected in folders.
24:46 - 24:50

Here you see a small corner
of the Persons Archive,
24:51 - 24:56

and, similarly, information
has been collected on companies,
24:56 - 25:00

on general topics, on wares,
on everybody,
25:02 - 25:06

on everything which could be interesting.
25:07 - 25:11

These folders have been scanned
25:13 - 25:16

up to roughly 1949
25:17 - 25:23

by the DFG-funded project in 2004 to 2007.
25:24 - 25:31

As a result, up to now,
there are 25,000 thematic dossiers
25:32 - 25:34

of this time.
25:34 - 25:38

This contained about 2 million,
or more than 2 million pages.
25:39 - 25:42

And these are online.
25:44 - 25:48

This application developed
at that time by ZBW,
25:50 - 25:54

which now looks a bit outdated,
25:55 - 25:58

not so fancy,
and what’s more of a problem.
25:59 - 26:04

It's an application which was built
architecturally on Oracle,
26:04 - 26:09

it was built on ColdFusion,
it runs on Windows servers,
26:09 - 26:15

so it's not very sustainable
in the long term.
26:16 - 26:19

And we have discussed
should we migrate this
26:19 - 26:23

to a more fancy linked data application,
26:24 - 26:28

or should we take a radical step
26:28 - 26:32

and put all this data in the open.
26:33 - 26:37

We have assigned CC0 license to that data
26:37 - 26:41

and, currently, moving some main--
26:42 - 26:46

access layer, some main discovery layer--
so it's a primary access layer
26:48 - 26:51

to the open linked data web,
26:51 - 26:57

where it actually makes most sense
26:57 - 27:01

to put some metadata into Wikidata,
27:02 - 27:07

and to make sure that all folders
27:08 - 27:11

of the collections are linked to Wikidata,
27:11 - 27:13

so they are findable,
27:14 - 27:18

and that all metadata about these folders
27:18 - 27:23

is also transferred to Wikidata.
27:23 - 27:28

So it can be used there,
and it can be enriched there, possibly.
27:29 - 27:32

Corrections can be made to that data.
27:33 - 27:39

What is still maintained by ZBW is,
of course, the storage of the images,
27:40 - 27:44

which we can't put in any way,
27:46 - 27:47

or we can't give a license on that
27:47 - 27:51

because this was owned
by the original creators.
27:52 - 27:55

But we make sure that they are accessible
27:56 - 28:02

by some, again, metadata files
via DFG Viewer
28:03 - 28:06

and, in the future, by IIIF manifests.
28:07 - 28:11

And we will prepare
some static landing pages
28:12 - 28:18

which will serve as a data point
of reference for Wikidata,
28:18 - 28:23

as well as still making available data
28:23 - 28:26

which doesn't fit well into Wikidata.
28:31 - 28:37

[For us] is migration
and data donation to Wikidata
28:37 - 28:41

with our custom infrastructure
28:41 - 28:45

of SPARQL endpoint with that data,
28:46 - 28:49

and we basically used federated queries
28:50 - 28:54

between that endpoint
and the Wikidata Query Service
28:54 - 28:58

to create the corresponding statements,
28:59 - 29:02

either concatenated
29:02 - 29:07

in SPARQL queries themselves,
or transformed via a script,
29:08 - 29:12

which also generated references
for the statements.
29:13 - 29:19

And then put that into QuickStatements
of the code to use this online.
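
The workflow described here (a federated query between a local endpoint and the Wikidata Query Service, then QuickStatements input with references) could look roughly like the sketch below. It is not the ZBW code: the `ex:` vocabulary, the function names, and the identifier property passed in are assumptions. The code only builds strings and makes no network calls.

```python
from typing import Dict, List

WDQS = "https://query.wikidata.org/sparql"  # public Wikidata Query Service endpoint


def build_federated_query(id_property: str) -> str:
    """Build a query meant to run on a local SPARQL endpoint.

    The SERVICE clause asks Wikidata which item carries each folder ID.
    The ex: predicates are invented for this sketch.
    """
    return f"""
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX ex: <http://example.org/archive#>
SELECT ?item ?birthDate ?folderUrl WHERE {{
  ?folder ex:folderId ?folderId ;
          ex:birthDate ?birthDate ;
          ex:landingPage ?folderUrl .
  SERVICE <{WDQS}> {{
    ?item wdt:{id_property} ?folderId .
  }}
}}
""".strip()


def to_quickstatements(rows: List[Dict[str, str]]) -> List[str]:
    """Turn result rows into tab-separated QuickStatements V1 lines.

    Each line sets P569 (date of birth) and adds S854 (reference URL) as the source.
    Dates are assumed to arrive as YYYY-MM-DD strings.
    """
    lines = []
    for row in rows:
        qid = row["item"].rsplit("/", 1)[-1]        # entity URI -> Q-id
        date = f"+{row['birthDate']}T00:00:00Z/11"  # /11 means day precision
        lines.append("\t".join([qid, "P569", date, "S854", f'"{row["folderUrl"]}"']))
    return lines


query = build_federated_query("P9999999")  # placeholder property ID
lines = to_quickstatements([{
    "item": "http://www.wikidata.org/entity/Q4115189",  # Wikidata sandbox item
    "birthDate": "1900-01-02",                          # made-up value
    "folderUrl": "https://example.org/folder/1",        # made-up landing page
}])
```

Generating the reference in the same step as the statement means every imported value points back to the folder it came from.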
29:23 - 29:24

So, this is what we get.
29:24 - 29:29

It's not only simple things
like birth dates, but, sorry--
29:30 - 29:35

but also complex statements
29:35 - 29:40

about already existing items,
29:40 - 29:45

like this person was a supervisory
board member of said company
29:47 - 29:49

during this period of time,
29:50 - 29:57

and referenced for use in...
29:58 - 30:02

in the scientific context.
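
A statement of this kind, with qualifiers for the period and a reference, can be written as a single QuickStatements V1 line. The sketch below shows one possible modeling, which is an assumption: P39 (position held), P580 (start time), P582 (end time) and S854 (reference URL) exist in Wikidata, but the talk does not say which properties were used, and every `Q_...` and `P_ORG` value is a placeholder.

```python
from typing import List, Optional, Tuple


def qs_year(year: int) -> str:
    """QuickStatements V1 time value with year precision (/9)."""
    return f"+{year:04d}-01-01T00:00:00Z/9"


def statement_with_qualifiers(
    subject: str,
    prop: str,
    value: str,
    qualifiers: List[Tuple[str, str]],
    reference_url: Optional[str] = None,
) -> str:
    """Build one tab-separated line: statement, then qualifiers, then the source."""
    parts = [subject, prop, value]
    for q_prop, q_value in qualifiers:
        parts += [q_prop, q_value]
    if reference_url:
        parts += ["S854", f'"{reference_url}"']
    return "\t".join(parts)


line = statement_with_qualifiers(
    "Q_PERSON",                        # placeholder for the person item
    "P39",                             # position held
    "Q_SUPERVISORY_BOARD_MEMBER",      # placeholder for the role item
    [
        ("P_ORG", "Q_COMPANY"),        # placeholder qualifier linking the company
        ("P580", qs_year(1920)),       # start of the period (made-up year)
        ("P582", qs_year(1925)),       # end of the period (made-up year)
    ],
    reference_url="https://example.org/folder/1",  # made-up landing page
)
```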
30:08 - 30:11

The first part of this data donation
has been finished.
30:13 - 30:17

The Persons Archive
is completely linked to Wikidata.
30:18 - 30:24

And this is also an information tool.
30:24 - 30:27

A lot of items had not had
30:27 - 30:30

any external references before.
30:31 - 30:36

And we had about more
than 6,000 statements,
30:36 - 30:42

which are now sourced
in this archive's metadata.
30:45 - 30:50

Well, this was the most easy part,
30:51 - 30:55

because persons are easily
identifiable in Wikidata.
30:56 - 31:00

More than 90% already existed here,
31:00 - 31:02

so we could link to that.
31:02 - 31:06

We created some 100 items for these,
31:06 - 31:09

for the ones which were missing.
31:09 - 31:14

But now, we are working
31:14 - 31:18

on the rest of the archive,
31:18 - 31:20

particularly on the topics archive.
31:21 - 31:27

Which means mapping a historic system
for the organization of knowledge
31:27 - 31:30

about the whole world,
31:30 - 31:34

materialized as newspaper
clippings to Wikidata.
31:36 - 31:42

To give you a basic idea,
the Countries and Topics archive
31:43 - 31:49

is organized by a hierarchy of countries
31:49 - 31:51

and other geographic entities,
31:52 - 31:56

which is translated to English,
which makes this more easy.
31:56 - 32:02

And German deeply nested...
32:04 - 32:08

deeply nested classification of topics.
32:08 - 32:12

And this combination defines one...
32:13 - 32:16

one folder.
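
The structure described here, two hierarchies whose combination identifies a folder, can be sketched as follows. The notations and labels are invented for illustration and are not the archive's actual classification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Category:
    notation: str                 # hypothetical notation
    label: str
    parent: Optional[str] = None  # notation of the broader category


def ancestors(cats: Dict[str, Category], notation: str) -> List[str]:
    """Walk up the hierarchy from a category to its root."""
    chain: List[str] = []
    current = cats[notation].parent
    while current is not None:
        chain.append(current)
        current = cats[current].parent
    return chain


countries = {
    "C1": Category("C1", "Hypothetical continent"),
    "C1.1": Category("C1.1", "Hypothetical country", parent="C1"),
}
topics = {
    "T1": Category("T1", "Hypothetical broad topic"),
    "T1.1": Category("T1.1", "Hypothetical narrower topic", parent="T1"),
}

# One folder is identified by a (country, topic) pair.
FolderKey = Tuple[str, str]
folders: Dict[FolderKey, str] = {("C1.1", "T1.1"): "hypothetical folder"}
```

Mapping such a structure to Wikidata means finding a target for each category in both hierarchies, and then deciding how to represent the pair.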
32:16 - 32:21

So, what we now want to do
is to match this
32:21 - 32:25

as a structure to Wikidata,
and to bring the data in.
32:25 - 32:29

And I want to invite you
32:29 - 32:34

to join this really nice challenge
32:34 - 32:36

in terms of knowledge organization.
32:38 - 32:41

So, it's a WikiProject
where this work is tracked,
32:41 - 32:46

and you can follow this
or participate in this.
32:47 - 32:49

And, yes, thank you very much.
32:50 - 32:52

(applause)
33:04 - 33:07

So, we're taking
performing arts to Wikidata.
33:08 - 33:12

And we're taking performing arts
to the linked open data cloud,
33:12 - 33:16

by building a linked open data
ecosystem for the performing arts.
33:16 - 33:21

And the question I'm trying to answer,
33:21 - 33:24

and I hope you'll help me
in answering it,
33:24 - 33:27

is which place for Wikidata in all that.
33:27 - 33:31

But let me first start with my experiences
33:31 - 33:34

which I made this year,
33:35 - 33:38

the first half of the year,
when I had the pleasure
33:38 - 33:39

to work with CAPACOA,
33:39 - 33:42

which is the Canadian Arts
Presenting Association,
33:42 - 33:47

which actually launched a project
called Linked Digital Future Initiative,
33:48 - 33:53

to actually get the entire art sector
in Canada to embrace linked open data.
33:53 - 33:57

And they did that based on the observation
33:57 - 33:59

that over the past five years,
34:00 - 34:04

the [inaudible]-- the important topic
within performing arts
34:04 - 34:09

was the fact that metadata
was not around in sufficient quality
34:09 - 34:12

and not interlinked, not interoperable.
34:12 - 34:16

And that was why some of the performances,
34:16 - 34:20

some of the events
are not so well findable
34:20 - 34:25

by Google and by personal
computer-based assistants, and so on.
34:26 - 34:30

So, the vision we kind
of developed together
34:30 - 34:33

is that we want to have a knowledge base
34:34 - 34:36

for many stakeholders at once.
34:36 - 34:40

So we looked at the entire
performing arts value network,
34:40 - 34:42

we identified key stakeholders in there,
34:42 - 34:47

we looked at the usage scenarios
that we like to pursue,
34:48 - 34:52

and we kind of mapped it
to the whole architecture
34:52 - 34:57

of such a knowledge base,
or of the different platforms in there,
34:57 - 35:00

which, obviously,
is a distributed architecture,
35:00 - 35:01

and not one big monolith.
35:02 - 35:06

I'm just going to run
through that quite quickly
35:06 - 35:08

because we have ten minutes each.
35:09 - 35:14

But I think we'll have plenty of time
tonight or tomorrow to deepen that
35:14 - 35:16

if anybody's interested in the details.
35:16 - 35:19

So, we started from
that Performing Arts Value Network,
35:19 - 35:23

which, interestingly,
was just published last year.
35:23 - 35:28

So, we're lucky to be able
to build on previous work,
35:28 - 35:31

like you have the primary value chain
of the performing arts in the middle,
35:31 - 35:34

and various stakeholders around that.
35:34 - 35:37

All in all, we identified
20 stakeholder groups,
35:37 - 35:43

which then we kind of boiled down
into seven larger categories.
35:43 - 35:45

For each of the stakeholder groups,
35:45 - 35:52

we kind of formulated what kind of needs
35:52 - 35:55

they would have in terms
of such an infrastructure,
35:55 - 35:59

and what would they be able to achieve
if the whole thing was interlinked
35:59 - 36:02

and the data was publicly accessible.
36:03 - 36:05

And so, you can see the types here,
36:05 - 36:09

the different types are Production,
then Presentation & Promotion,
36:09 - 36:12

Coverage & Reuse, Live Audiences,
36:12 - 36:14

Online Consumption, Heritage,
36:14 - 36:16

Research & Education.
36:16 - 36:19

And after kind of setting up a big table,
36:19 - 36:21

of which you can see
just the first part here,
36:21 - 36:25

we kind of compared [over there],
had a look at which type of data
36:25 - 36:27

were actually used across the board
36:27 - 36:31

by all different groups of stakeholders.
36:31 - 36:37

And there's quite a large basis of data
that is common to all of them,
36:37 - 36:38

and that really is the area
36:38 - 36:43

where it makes a lot of sense, actually,
to cooperate and to keep that--
36:43 - 36:46

to maintain the data together.
36:48 - 36:51

So, when talking about
platform architecture,
36:51 - 36:54

you can see that we have four layers here.
36:54 - 36:56

At the bottom, there's the data layer.
36:56 - 36:59

Of course, Wikidata plays a part in it,
36:59 - 37:03

but also a lot of other databases,
distributed databases
37:03 - 37:08

that can expose data
through SPARQL endpoints.
37:09 - 37:13

The yellow part in the middle,
that's the semantic layer.
37:13 - 37:16

It's our common language
to describe our things,
37:16 - 37:22

to make statements about things
around the performing arts, the ontology.
37:22 - 37:25

Then we have an application layer
37:25 - 37:31

that consists of various modules,
for example, data analysis,
37:31 - 37:35

data extraction-- so, how do you
actually get unstructured data
37:35 - 37:36

into structured data--
37:36 - 37:39

how can we support that by tools.
37:39 - 37:42

Then, obviously, there's
a visualization of data--
37:42 - 37:47

so if there are large quantities of data,
you want to visualize it in some way.
37:48 - 37:50

And on the top, you have
the presentation layer,
37:50 - 37:55

that's what the ordinary people
are actually interacting with
37:55 - 37:56

on a daily basis--
37:56 - 38:00

search engines, encyclopedias,
cultural agendas,
38:00 - 38:02

and a variety of other services.
38:03 - 38:05

We're not starting from scratch.
38:05 - 38:09

Some work has already
been done in this area.
38:09 - 38:13

I'll just cite a few examples
from a project
38:13 - 38:15

which I have been involved in.
38:15 - 38:18

Some other stuff going on as well.
38:18 - 38:21

And so, I started in this area
38:21 - 38:24

with the Swiss Archive
of the Performing Arts.
38:25 - 38:28

While building a Swiss
Performing Arts database,
38:28 - 38:31

we created the performing arts ontology,
38:31 - 38:34

that's currently being
implemented into RDF.
38:35 - 38:40

And there we have the database
of like 60, 70 years
38:40 - 38:43

of performance history in Switzerland.
38:43 - 38:45

So, that's something we can build on,
38:45 - 38:49

and that's something
that's been transformed into RDF.
38:50 - 38:55

And there was a builder platform
where this data can be accessed.
38:56 - 39:02

Then we have done
several ingests into Wikidata,
39:02 - 39:03

partly from Switzerland,
39:03 - 39:09

partly also from
the performance arts institutes,
39:10 - 39:12

for example, Bart Magnus
was involved in that.
39:13 - 39:15

He was the driving force behind that.
39:15 - 39:17

There's also stuff from Wikimedia Commons,
39:17 - 39:21

but not very well interlinked
with all the rest of our metadata.
39:21 - 39:25

And obviously, by doing this ingest,
39:25 - 39:29

we also kind of started to implement
parts of this Swiss data model
39:29 - 39:31

into Wikidata.
39:33 - 39:38

Then one of the Canadian
implementation partners
39:38 - 39:39

is Culture Creates.
39:39 - 39:44

They're running a platform that actually
scrapes information from theater websites,
39:44 - 39:47

and inputs it into a knowledge graph,
39:48 - 39:54

to then expose it to search engines
and other search devices.
39:56 - 40:03

And there again, we kind of had
to implement and extend this ontology.
40:03 - 40:08

And as you can see from the slide,
there are so many empty spaces,
40:08 - 40:10

but there's also some overlap,
40:10 - 40:13

and an important overlap, obviously,
is the common shared language,
40:13 - 40:19

which will help us actually interlink
the various data sets.
40:21 - 40:23

What is also important, obviously,
40:23 - 40:26

is that we're using the same
base registers and authority files.
40:26 - 40:31

And this is a place where Wikidata
plays an important role
40:31 - 40:34

by kind of interlinking these.
40:35 - 40:38

Now, I'd like to share the recommendations
40:38 - 40:42

by the Linked Digital Future Initiative's
Advisory Committee.
40:43 - 40:45

At least the first two recommendations.
40:45 - 40:48

So, for the Canadians,
now it's absolutely crucial
40:48 - 40:53

to kind of fill in their own Canadian
performing arts knowledge graph,
40:53 - 40:56

because unlike the Swiss Archive
of the Performing Arts,
40:56 - 40:59

they're not starting
with an already existing database,
40:59 - 41:02

but they're kind of
creating it from scratch.
41:02 - 41:04

And it's absolutely crucial
to have data in there.
41:04 - 41:09

And second, as you can see,
Wikidata already comes in.
41:09 - 41:12

Wikidata, by the Advisory Committee,
41:12 - 41:18

has been seen as complementary
to Artsdata.ca, this knowledge graph,
41:18 - 41:21

and, therefore, efforts should
be undertaken to contribute
41:21 - 41:25

to its population
with performing arts-related data.
41:26 - 41:31

And that's what we're going to work on
over the coming months and years,
41:31 - 41:35

and that's also why
I'm kind of on the lookout here
41:35 - 41:39

to see who else will join that effort.
41:41 - 41:45

So, right now, obviously,
we're saying they're complementary.
41:45 - 41:48

So, we have to think about
the pluses and the minuses
41:48 - 41:50

of each of the approaches.
41:50 - 41:52

And you can see here a comparison
41:52 - 41:56

between Wikidata and the Classical
Linked Open Data approach.
41:57 - 42:00

I would be happy to discuss
that further with you guys,
42:00 - 42:03

what your experiences are there.
42:03 - 42:08

But, as I see it, Wikidata is a huge plus
because it's a crowdsourcing platform,
42:08 - 42:12

and it's easy to invite further parties
to actually contribute.
42:12 - 42:17

On the negative side, obviously,
you get this problem of loss of control.
42:18 - 42:23

Data owners have to give up control
over their graphs, data quality,
42:23 - 42:24

and completeness.
42:27 - 42:31

It's harder to track on Wikidata
than if you have it under your control.
42:31 - 42:34

And the other strength of Wikidata
42:34 - 42:40

is that it requires immediate integration
into that worldwide graph.
42:40 - 42:42

And you kind of just do it--
42:43 - 42:47

kind of reconcile step by step
against other databases,
42:47 - 42:50

which may also be seen by some
as an advantage,
42:50 - 42:54

but of course, if you're looking
for integration and interoperability,
42:54 - 42:57

Wikidata forces you to go for that
from the beginning.
42:59 - 43:03

And then, obviously, harmonizing
data modeling practices
43:03 - 43:06

is an issue in both cases.
43:06 - 43:11

But it may seem, at the beginning,
easier to do just in your own silo,
43:11 - 43:13

because at some point,
you're done with the task,
43:13 - 43:17

and it would be
an ongoing task on Wikidata.
43:18 - 43:23

So, when it now comes to prioritizing
the data to be ingested,
43:24 - 43:28

that's like the rules
I kind of go by at the moment.
43:30 - 43:32

First of all, we'd like to ingest it
43:32 - 43:36

where it's unclear who would be
the natural authority in the given area.
43:36 - 43:40

So that's definitely data
that will be managed in a shared manner.
43:41 - 43:44

And we'd like to ingest it where we see
43:44 - 43:47

a high potential
for crowdsourcing approaches.
43:47 - 43:52

We'd like to ingest data where the data
is likely to be reused
43:52 - 43:54

in the context of Wikipedia.
43:55 - 44:00

And there's also hope that some part
of the international coordination
44:00 - 44:04

around the whole data modeling,
about the standardization,
44:04 - 44:08

could actually take place
directly on Wikidata,
44:08 - 44:09

if it's not taking place elsewhere,
44:09 - 44:12

because it kind of forces people
to start interacting
44:12 - 44:15

if they ingest data in the same part.
44:16 - 44:22

And we'd like to focus now next
on base registers and authority files
44:22 - 44:26

because they kind of help us
create the linkages
44:26 - 44:29

between different data
and on controlled vocabularies
44:29 - 44:33

as an extension of the existing ontology.
44:34 - 44:36

So, just two more slides.
44:36 - 44:41

The next steps will be that we're taking
the sum of all GLAMs approach
44:41 - 44:43

to Wiki Loves Performing Arts.
44:43 - 44:48

That means we're describing
venues and organizations,
44:48 - 44:51

and trying to push the data to Wikipedia
44:51 - 44:54

in the form of infoboxes
and [bubble] templates.
44:54 - 45:00

And the other one, the other project
I'm going to pursue, is a COST Action
45:00 - 45:02

that we'll submit next year
45:03 - 45:06

around that Linked Open Data Ecosystem
for the Performing Arts.
45:06 - 45:10

COST is a European program
that supports networking activities,
45:10 - 45:14

and the topics to be covered
are listed here.
45:14 - 45:16

Two of them, I have highlighted--
45:16 - 45:21

one of them is like the question
of federation between Wikidata
45:21 - 45:24

and the classical linked
open data approaches.
45:24 - 45:28

And the other one, I think,
is very important also,
45:28 - 45:31

where we have a huge potential still,
45:31 - 45:36

is implementing international campaigns
to supplement data on Wikidata.
45:38 - 45:41

So, that's it. Thank you
for your attention.
45:41 - 45:46

Now, I would like to ask
my colleagues up here.
45:47 - 45:51

To the panel, maybe you'll get them
microphones as well.
45:54 - 45:56

And then I would like to...
45:57 - 46:00

give you the chance to ask questions.
46:01 - 46:05

And obviously, also ask my colleagues
46:06 - 46:08

whether they have questions to each other.
46:12 - 46:15

So, do we have maybe a question
from the audience?
46:21 - 46:23

(man) [inaudible]
46:24 - 46:27

I would like to ask from each of you
46:27 - 46:31

where would you draw the line,
46:31 - 46:33

basically, how you define--
46:33 - 46:36

when do you need to run your own Wikibase,
46:36 - 46:39

and what do you want to put on Wikidata?
46:39 - 46:44

Like, is this a clear delineation
of what is seen
46:44 - 46:46

behind of putting it [into order.]
46:48 - 46:51

I can answer first because I have the mic.
46:51 - 46:57

So, I've been thinking
that one of the issues is notability.
46:59 - 47:02

I'm addressing that
in a different project.
47:02 - 47:06

And I think licensing could be one,
47:06 - 47:10

because you can apply your own terms
in your own database,
47:10 - 47:14

and then I think wherever it's possible.
47:14 - 47:20

And then, the third one
is just to have it as a sandbox,
47:20 - 47:23

prepare it for ingestion into Wikidata.
47:23 - 47:26

These are the three main things
that I come up with now,
47:26 - 47:29

but I can come up with more.
47:30 - 47:32

For me, rights are always
going to be an issue.
47:32 - 47:37

So, if the National Library
wanted to move towards Wikibase,
47:37 - 47:40

that would enable them to continue
to control the licensing
47:40 - 47:43

for the work they've done
with Maori language terms.
47:43 - 47:46

The kakapo database only contains data
47:46 - 47:50

that the Department of Conservation
felt could be made public,
47:50 - 47:53

but I suspect if they see it
up and running,
47:53 - 47:56

they might be tempted
to use a private Wikibase
47:56 - 47:58

to maintain their own database,
47:58 - 48:01

simply because of some
of the visualization tools
48:01 - 48:04

that could be applied might be better
48:04 - 48:07

than the sort of Excel spreadsheet system
that they currently run.
48:12 - 48:17

Well, I think this very much depends
on the kind of data.
48:18 - 48:22

We are, with the Press Archive, of course,
in a quite lucky position,
48:22 - 48:27

in that this was material
which was published,
48:27 - 48:30

it was published at the time,
48:30 - 48:32

but it was expensive to publish.
48:33 - 48:36

So, this is quite easy.
48:36 - 48:39

I think, also, projects--
48:40 - 48:42

and this is a typical project,
48:42 - 48:46

so it was funded for some time,
and then funding ended,
48:46 - 48:52

and what happens with the data
which is enclosed in some silo,
48:52 - 48:55

and some software
which will not run forever.
48:56 - 48:59

And so, it makes
absolute sense in my eyes.
49:00 - 49:03

At the time, Wikidata
wasn't around, but now it is,
49:03 - 49:07

and it makes absolute sense
for our project to early on
49:07 - 49:13

discuss sustainability in the context
of how could we put this
49:13 - 49:17

into a larger ecosystem like Wikidata,
49:19 - 49:21

and discuss with the Wikidata community
49:21 - 49:27

what is notable and what makes sense
to add to Wikidata,
49:27 - 49:32

and what makes sense to keep
in a proprietary form.
49:32 - 49:38

Maybe in a simpler form
than a sophisticated application,
49:38 - 49:43

but make it discoverable
and make it linked to the large data cloud
49:43 - 49:46

instead of investing lots of money
49:46 - 49:53

into some silo which will not sustain.
49:55 - 50:00

Yeah, as I said before
in the project I was presenting here,
50:00 - 50:05

there are dualities between Wikidata
and classical linked open data approaches.
50:05 - 50:08

So, it's not so much about
setting up a private Wikibase.
50:11 - 50:15

Like one challenge we have had,
and, of course, in Wikidata,
50:15 - 50:18

is that when you ingest
your own data there,
50:18 - 50:20

you also have to do some housekeeping
50:21 - 50:24

of people, of other people, actually.
50:24 - 50:28

And that can put people off,
[or it also means] that we will address it
50:28 - 50:30

just step by step.
50:30 - 50:33

So, there will be, at the moment,
a database living--
50:34 - 50:36

in classical linked open data
50:36 - 50:38

and we're starting to link it
with Wikidata,
50:38 - 50:41

and it's a continuous process to find out
50:42 - 50:48

for which areas most data
will eventually be on Wikidata,
50:48 - 50:52

and for which areas it will actually
live on other databases.
50:53 - 50:57

Obviously, we'll have challenges
regarding synchronization,
50:57 - 50:59

as we probably all have,
50:59 - 51:02

because that's the linked data field,
51:02 - 51:05

where we still have
to negotiate who we trust,
51:05 - 51:09

who has authority about what.
51:14 - 51:16

(assistant) Other questions?
51:24 - 51:26

(woman) Thank you.
51:26 - 51:31

So, fully agree with that issue of--
51:34 - 51:41

where to put the boundary
between why do we put data on Wikidata,
51:43 - 51:49

or why do we keep them,
and create, manage, and maintain them
51:49 - 51:53

in local databases and for what purposes.
51:54 - 51:57

And I think that
this is a large discussion
51:57 - 52:02

that goes beyond just the excitement
52:02 - 52:07

of putting data on Wikidata
because it is public,
52:07 - 52:11

because it serves humanity, because--
52:11 - 52:13

well, there are cool tools,
52:13 - 52:18

and things are more complicated
in real life, I think.
52:19 - 52:24

Well, despite this,
it's quite an interesting discussion.
52:24 - 52:30

And then this is another issue, also,
or another problem that is being discussed
52:30 - 52:35

in this event in different panels.
52:36 - 52:41

It is on one side, have your own database,
52:41 - 52:43

whatever the technology is
52:43 - 52:47

and publish things on Wikidata,
52:47 - 52:51

or build your own system
52:51 - 52:55

of creating and managing information
52:55 - 52:58

on the Wikibase technology.
52:59 - 53:04

And then, synchronize or whatever--
do federation or things,
53:04 - 53:08

so it's a matter
of technology that is used,
53:09 - 53:15

and the fact that you use Wikidata
just for publishing,
53:15 - 53:19

or the infrastructure
that is underneath Wikidata
53:19 - 53:23

to create and manage your data.
53:27 - 53:31

I mean, we had a discussion
53:31 - 53:34

about the Wikibase panel,
53:34 - 53:37

and there will be other discussions here,
53:37 - 53:41

but things are
on different levels, I think.
53:42 - 53:48

Maybe [you sort of get] to that discussion
about Wikibase or Wikidata--
53:49 - 53:52

I think it's problematic
that we are focusing so much
53:52 - 53:56

on this Wikibase infrastructure,
because there are other infrastructures,
53:56 - 53:59

like in the area of performing arts.
54:00 - 54:04

We have another complementary community,
which is MusicBrainz
54:04 - 54:09

that runs on their own platform
that provides linked open data,
54:10 - 54:13

and as I understand it,
54:14 - 54:17

there's agreement
within the Wikidata community
54:17 - 54:20

that we're not going
to double all their data--
54:20 - 54:24

we're not going to copy all their data,
but we accept that they're complementary.
54:25 - 54:30

So, what will happen when you start
integrating this data in Wikipedia?
54:30 - 54:32

Infoboxes, for example.
54:32 - 54:36

Would we be able to pull that data
directly from their SPARQL endpoint?
54:37 - 54:40

Or would we be obliged
to kind of copy all the data,
54:40 - 54:42

and what kind of processes
are involved in that?
54:42 - 54:45

(woman) Discussions are open, I think,
54:45 - 54:50

because within this event,
you have both interested communities--
54:50 - 54:52

those that are interested in Wikibase,
54:52 - 54:54

and those that are interested in Wikidata,
54:54 - 54:56

and those who are interested in both.
54:56 - 55:00

Yeah, but we're not going
to oblige them to move to Wikibase.
55:00 - 55:03

- (woman) Not necessarily.
- MusicBrainz is not running on Wikibase.
55:03 - 55:07

(woman) No, I just wanted to say
that you have separate problems,
55:07 - 55:11

sometimes interrelated,
sometimes not completely separated.
55:12 - 55:17

And I had another question or remark
55:17 - 55:22

regarding the management of hierarchies
in controlled vocabularies,
55:22 - 55:26

like thesauri, like you have in Finto.
55:28 - 55:31

You do have the places
55:32 - 55:35

in the Maori
55:36 - 55:41

Subject Headings,
55:42 - 55:48

Well, they have to deal with
the management of concepts in hierarchy.
55:48 - 55:52

What is your take, your opinion
55:52 - 55:57

about the possibility
of managing these controlled
55:59 - 56:02

knowledge organization
systems in Wikidata?
56:07 - 56:10

I think in the case
of Finto and YSO places,
56:11 - 56:14

the repository will be a collection
56:14 - 56:19

of several sources, eventually.
56:19 - 56:22

So, it is in flux, anyway.
56:22 - 56:25

So, we don't have to necessarily--
56:25 - 56:28

well, I don't represent
the National Library,
56:28 - 56:32

but in that possible project,
56:32 - 56:36

we would not have
to maintain an existing--
56:36 - 56:39

or fight with an existing structure.
56:39 - 56:45

So, in that sense, it is an area
open for exploration.
56:49 - 56:52

The Maori Subject Headings
seem to lend themselves ideally
56:52 - 56:54

to Wikidata structure,
56:54 - 56:57

but the licensing,
of course, forbids that.
56:57 - 56:59

I suspect that if the licensing
were different
56:59 - 57:02

and they were put into Wikidata,
57:02 - 57:05

as soon as somebody decided
they didn't like the hierarchy
57:05 - 57:06

and started to change things,
57:06 - 57:10

there would be an immediate outcry
from people who worked very hard
57:10 - 57:12

to create that structure
57:12 - 57:16

and get the sign-off
from various different Maori
57:16 - 57:18

that that was the correct hierarchy.
57:18 - 57:21

So, that's an issue to try and resolve.
57:24 - 57:27

I think in terms of knowledge
organization systems,
57:27 - 57:28

they are all different.
57:28 - 57:32

And I'm not sure
if it would be a good idea
57:32 - 57:37

to represent different hierarchies
in Wikidata as such,
57:38 - 57:42

but it maybe makes sense
to think about overlays
57:43 - 57:45

of the data.
57:45 - 57:48

So, to do mappings on the content level.
57:49 - 57:54

For example, ZBW publishes the
Thesaurus for Economics.
57:55 - 57:59

And this thesaurus has its own hierarchy,
58:00 - 58:04

and, of course, it would be possible
to project the hierarchy
58:04 - 58:08

of this thesaurus into Wikidata concepts
58:08 - 58:12

without actually storing
this kind of structure
58:12 - 58:15

as an alternative structure
within Wikidata
58:15 - 58:19

which would create a lot of confusion.
58:19 - 58:25

But I think we should think
of Wikidata, also, as a pool of concepts
58:25 - 58:30

which can be connected on layers
which are outside,
58:30 - 58:33

and which give another view of the world
58:33 - 58:39

which does not necessarily have to be
within Wikidata.
58:46 - 58:48

(assistant) Alright. Some other questions?
58:49 - 58:52

Otherwise-- okay.
58:55 - 58:58

(man 2) Joachim, I just wanted
to follow up on that last point.
58:58 - 59:01

So, these layers, as you picture it,
59:02 - 59:04

they would be maintained externally
59:04 - 59:07

and somehow integrated
59:09 - 59:12

with Wikidata from the Wikidata side,
59:12 - 59:17

or have you thought a bit further
59:17 - 59:19

about how that might be managed?
59:22 - 59:25

Actually, no, I have no--
59:25 - 59:30

I have done experiments
with ZBW and Wikidata.
59:31 - 59:33

I was [inaudible] here at Wikidata.
59:33 - 59:39

But I think this is
a whole new complex thing,
59:39 - 59:46

and so, it's up to [discuss],
[to give up a lot of control]
59:46 - 59:48

to do such things.
59:48 - 59:50

But it has to be figured out.
59:57 - 59:58

Should we take one more?
59:58 - 60:00

(man 3) Ah, great.
60:00 - 60:03

I was just wondering
about the kakapo project.
60:04 - 60:05

Uh-hmm.
60:05 - 60:11

(man 3) Okay. So, did you get
any pushback from the Wikidata community
60:11 - 60:15

about having individual animals
out of those items?
60:16 - 60:17

Not so far.
60:17 - 60:19

(man 3) Has anyone heard
about this before?
60:19 - 60:22

Is it "not so far" because
no one has heard about it yet?
60:23 - 60:26

There's been a small discussion
for quite some time now--
60:26 - 60:29

those people interested
in this sort of thing in Wikidata,
60:29 - 60:32

and we all seem to think
that it's a natural extension
60:32 - 60:36

of giving individual Wikidata items
to a famous racehorse
60:36 - 60:40

or someone's cat, which--
that's modeled pretty well.
60:40 - 60:44

I guess just the audacious thing
is putting the entire species in there.
60:44 - 60:48

But I think it's perfectly manageable.
60:48 - 60:50

(man 3) Don't try it with cats and dogs.
60:50 - 60:52

(laughter)
60:52 - 60:54

(assistant) Okay. I think
the time is finished.
60:54 - 60:56

Thank you very much for attending.
60:56 - 60:59

I think the speakers will still be open
for questions in the break.
60:59 - 61:01

And have fun.
61:01 - 61:02

Thank you very much.
61:02 - 61:04

(applause)

Title:: cdn.media.ccc.de/.../wikidatacon2019-1035-eng-Data_import_process_overview_hd.mp4
Video Language:: English
Duration:: 54:29
