Good afternoon, everybody.
Welcome to our GLAM panel.
Before we start, I just have
two announcements to make.
First of all, please make extensive use
of our Etherpad to take notes.
And the second one is directed
at our audience at home,
or wherever you are.
If you have any questions,
you can also write that into the Etherpad,
and our room angels
will keep track of them.
So, we decided that for this year's panel,
after seeing all the contributions
that were made,
we would focus on the role of Wikidata
within data ecosystems
that go beyond the actual
Wikimedia projects,
which is also absolutely in line
with the new Wikimedia
Foundation strategy.
And we have, today, four panelists.
Three plus one.
So, I would like to ask you on stage,
so we can introduce you.
So, we have Susanna Ånäs.
She's a long time free-knowledge activist
involved in many WikiProjects.
And she will be reporting today
on the project in cooperation
with the Finnish National Library.
Then we have, next to me, Mike Dickison,
who will be second in this order.
He is a museum curator from New Zealand.
He's a zoologist and a Wikipedia editor.
And he was New Zealand's
first Wikipedian at Large
in 2018 and 2019.
And he will tell us
about his experience in that role,
and what kind of role Wikidata
is starting to play in that context.
Then we have Joachim Neubert
from the Leibniz Information Center
for Economics in Kiel and Hamburg.
He has been working on making the largest
public press archives worldwide
more accessible to the public,
and he's using Wikidata to do that.
And then I will go last.
My name is Beat Estermann.
I work for Bern University
of Applied Sciences, in Switzerland.
And I've been a long-time promoter
for OpenGLAM in Switzerland and Austria.
And I will today report
about my activities in connection
with the mandate from the Canadian Arts
Presenting Association,
focusing on performing arts.
Not primarily on Wikidata,
but you will see Wikidata
is starting to play a role there, as well.
So now, most of us
will take our seat here,
and I will give the floor to Susanna.
Okay. So, hello. My name is Susanna Ånäs,
and I work part-time for Wikimedia Finland
as a GLAM coordinator,
and I also do consulting
in the open knowledge sphere.
And this is a discourse,
maybe, of [inaudible].
So, I have been involved in the workings
of the geographic data group of the--
well, I looked it up,
but it isn't in English--
the cultural heritage initiative
of the Finnish government.
So, this is about place names
and how they are represented
in different repositories
in the GLAM sector in Finland,
and how they are trying to pull together
these different sources,
and how they are informed
by modeling in Wikidata and elsewhere.
So, here we see the three main sources
for these YSO places,
which is part of the national ontology--
general ontology.
AHAA is for Finnish archives,
Melinda is for Finnish libraries,
and KOOKOS is for Finnish museums.
So, there are also three
content management systems
that come together in these YSO places.
And there are exchanges with Wikidata
already taking place,
as well as with the names project
of the National Land Survey.
And then, there's a third project,
the Finnish Names Archive,
which doesn't yet contribute to this,
but there are plans for that.
So, one of the key modeling issues
in this whole problem area
is that there are three types
of elements in place names
represented in this project.
One of them is the place,
the one that has a location.
Another is the place name,
the toponym, for example.
And then, there are sources,
which are documents
from which both of these can be derived,
or backed up.
The YSO places--
here, on the top right,
you will see the same diagram again.
It focuses mainly on the places.
The main actor behind this
is the Finnish National Library,
with the Finto project.
There are now more than 7,000 places
in Finnish and Swedish
and over 3,000 in English,
and they are licensed CC0.
So, here you can see the service of Finto.
And a place-- I chose Sevettijärvi.
It is now also related
to our language project
with the Skolt Sami--
this is a place
in the very north of Finland
inhabited by Skolt Sámi.
So, here you can see the place
which belongs to the--
well, you will see the data
about this place.
You can see that it is connected
to Wikidata,
as well as to this National Land Survey data.
Here we go. And you will see
this in more detail, here.
It is also hierarchically arranged
inside this repository.
Well, actually,
the actual place is not seen,
but it is underneath this municipality,
as well as the region,
and Finland as a country,
and Nordic countries,
the broader region.
Here you can see that many of these
have been matched
with Wikidata previously
through Mix'n'Match,
and there are still remaining ones.
But then, the number of names
is not that high.
It's less than 5,000.
So, then there is this other repository
by the Finnish Geospatial
Platform Project--
Place Names Cards.
These are all the place names
that are on Finnish maps.
And they are available as linked data,
which is licensed CC BY 4.0--
800,000 map labels in Finnish, Swedish,
and the three Sámi languages
spoken in Finland.
And they have
two different types of entities.
One type is places,
and the other
is place names, toponyms.
And they both have persistent URIs.
Here's, for example,
the same Sevettijärvi, first in Finnish,
and then in the three Sámi languages,
as well as the geographic data,
and then there is more information
about that, like the place type,
et cetera.
Here is the card for the place name,
the toponym, having its own URI.
Sorry, it seems that it's not translated
into English here.
So, multilinguality
doesn't cover the whole project.
Okay, we come
to the Finnish Names Archive.
This is a project by the Institute
for the Languages of Finland,
and these represent not the places,
not the place names,
but they are actually sources for those.
So, these are three million
field notes of place names,
and it is a Wikibase project.
They are in a Wikibase,
mainly in Finnish, some in Swedish.
An outstanding collection of Saami names,
which we are very interested in.
And they are licensed CC BY.
And that is also a challenge
from the Wikidata point of view.
But if there was a Finnish local Wikibase,
we might be able to first work
on them in that project.
So, here's a screenshot of that,
showing that there's information
about the place, the maps--
the maps that the collectors
initially used,
and the card that they produced
with the information they collected.
So, here's one of those cards
broken down into data
that is included in them.
So, then there is this linked data project
by the Helsinki Digital Humanities Lab
and the Semantic Computing
research group of Aalto University--
together with the Institute
for the Languages of Finland--
the Names Sampo.
And this is an aggregated
research interface
to several place name sources.
Here you can see that many
of the sources are out there on the left,
and then, you can make
different kinds of visualizations
based on this data.
And, yeah.
So, I've been bringing up this idea
of modeling a local Wikibase
that we could build with this data.
But when we enter
these modeling questions,
how do we model?
There are different ways,
different traditions in each of these.
And the good thing about it
is it could also serve minority languages
with very little effort.
Okay. So, here we have
the two basic options:
the SAPO model, which is
the Finnish Space-Time Ontology,
and the Wikidata model.
Here you can see
that Wikidata items tend to persist:
ideally, they remain the same
while their properties change.
Whereas, in the SAPO model,
these items become new
when there is a change,
such as an area change or a name change.
So here, we come back to this division
between the three different dimensions
of places and place names.
So, should we make these place names
into entities or properties?
Wikidata uses properties,
whereas this land survey
project has entities.
Or should we make them into lexemes?
Wikidata has chosen to work
with properties--
textual properties
for place names-- rather than lexemes.
So, the names are
properties, not lexemes.
Right.
And maybe the shortcoming of Wikibase
is the lack of geographic
shapes inside it--
in the basic setup of it--
so one would have to add
more technology to the stack
to be able to use local geographic shapes.
And federation is really needed
to be able to take advantage
of the Wikidata corpus.
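To make that federation idea a bit more concrete, here is a minimal Python sketch against the public Wikidata Query Service, showing how a local place-name Wikibase could pull multilingual labels and coordinates from the Wikidata corpus instead of storing them itself. The example place and the query shape are purely illustrative, not the Finnish project's actual setup; a local Wikibase would typically run the same pattern through a SPARQL SERVICE clause.

```python
# Minimal sketch: look up a place on Wikidata and retrieve its labels in
# Finnish, Swedish and the three Sámi languages used in Finland, plus its
# coordinates. A local Wikibase could obtain the same data via SPARQL
# federation (a SERVICE clause) instead of duplicating it.
import requests

WDQS = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?place ?label ?lang ?coord WHERE {
  ?place rdfs:label "Sevettijärvi"@fi ;
         rdfs:label ?label .
  FILTER(LANG(?label) IN ("fi", "sv", "se", "smn", "sms"))
  BIND(LANG(?label) AS ?lang)
  OPTIONAL { ?place wdt:P625 ?coord }   # P625 = coordinate location
}
"""

response = requests.get(
    WDQS,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "glam-panel-demo/0.1 (example script)"},
    timeout=30,
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row["place"]["value"], row["lang"]["value"], row["label"]["value"])
```

This is also where the minority-language point comes in: labels added once in Wikidata would be reusable in the local repository without extra modeling effort.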
So, I'm done already. Thank you.
(applause)
Okay.
(speaking in Maori)
Welcome, everyone.
My name is Mike Dickison.
And for a year,
I was New Zealand Wikipedian at Large.
You might wonder
what a Wikipedian at Large is.
Because if you actually look out for it,
there is no such thing, as we can see.
It's a term that I made up
in the grant proposal,
which the foundation
seemed to like very much.
And so, we ran with it.
So, for a year, I went through
35 different institutions,
doing residencies at most of them,
running training sessions,
organizing public events,
and trying to develop
a Wikimedia strategy for each one.
It was a very interesting experience,
and you encounter a wide range
of different projects and people.
And I wanted to try and talk through
some of the different projects
that dealt with Wikidata
in interesting or, perhaps,
illuminating ways,
that might be useful for folks to discuss.
The project was initially
a Wikipedia project in name,
simply because that was what people
were familiar with,
and so we organized
multiple different events:
very traditional edit-a-thons,
gender gap work, and so forth.
[And a bunch you can see] [inaudible],
and a bunch of very successful
new editors recruited, and so forth.
We did bulk uploads into Commons.
In this case, there was a collection
of over 1,000 original artworks
by an entomological
illustrator, Des Helmore,
which had been sitting on a hard drive,
[lacking] research for ten years,
and we were able
to get clearance to release those
all under CC BY license.
So, easy wins to show to people there.
Everyone can understand
lots of pictures of beetles.
Everyone can understand workshops
devoted to fixing the gender gap.
But Wikidata
is much more difficult to sell
to people in the GLAM sector,
or anyone outside
of our particular movement.
So, I began to realize that Wikidata
was going to be a more
and more important part
of the Wikipedian at Large projects.
So, as we went through, it became
a larger and larger component
of what I was doing.
And I began to try and teach myself
more about Wikidata as well,
because I was beginning to see
how important it was.
So, this one project--
the kakapo is a native
New Zealand flightless parrot.
We worked with
the Department of Conservation,
whose job is to save
this species from extinction,
and pitched the idea,
"What if we put every
single kakapo into Wikidata?"
And that may seem ridiculous,
but it's actually
a perfectly doable project.
A few of them are in there already.
A key thing to notice here
is there are not many kakapos.
So, it's a manageable task.
There were 148 when I started,
and then one died.
And they've just had
a great breeding season up to 213.
This is great. This is the most kakapo
there have been for over 50 years.
So, this was also a big deal.
This was on the news
every day in New Zealand.
Each new one that was born--
(man) In the New York Times.
(Mike) Did it? Oh, lovely.
Yeah, this was national news.
Everyone likes these birds.
But something interesting about them
is that, unlike species
that are more populous,
every single kakapo is named,
has a unique name
and a unique ID number.
And often has good biographical data
about where and when they were born,
were hatched, who their father
and mother was,
when they died, if they died.
So, there is, in fact,
a Department of Conservation database
of all this information.
And one of the most famous kakapos,
of course, is Sirocco,
who, as you can see, is named
after a wind, and was born there.
Sirocco has a Twitter account,
which Wikidata had some problems with,
because, apparently,
they just can't have Twitter accounts.
I don't know about that.
He's even featured
on an album cover, and so forth.
So there are multiple properties for this,
probably the most famous
individual kakapo.
So, I pitched to the Department
of Conservation,
"Why don't we try and do this
with every single one?"
And so, they had to think about
how much of the biographical data
could be made public.
And they came up with a short list.
And now we've got, I think, 212,
210--I think a couple died--
living kakapo that are all candidates now.
And they only get a name when they fledge.
They have a code number until then,
while they're still babies.
So, when we've got the fully fledged crop,
we're going to create
a complete set in Wikidata--
the entire species will be in Wikidata.
But we need to come up
with a property for DOC ID--
I actually would like to talk
with folks about that.
Should we be using a very specific ID,
or should we be coming up with an ID
that would work for all individual birds
or plants or animals
that have been tagged
in any scientific research project?
It's a good question.
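As a rough illustration of what such a batch could look like, here is a short Python sketch that turns a hypothetical DOC export into QuickStatements commands. It is not the project's actual workflow: the DOC ID property is a placeholder (the property discussed above does not exist yet), the species QID is left to fill in, and the CSV columns are invented for the example.

```python
# Minimal sketch: turn a (hypothetical) Department of Conservation export into
# QuickStatements V1 commands that create one Wikidata item per kakapo.
# "Pxxxx" stands for the DOC ID property discussed above, which does not exist
# yet; KAKAPO_QID would be the QID of the kakapo species item.
import csv

DOC_ID_PROPERTY = "Pxxxx"   # hypothetical: a property for DOC bird IDs
KAKAPO_QID = "Q......"      # fill in: the Wikidata item for the kakapo species

def quickstatements_for(row: dict) -> str:
    """Build QuickStatements V1 lines (tab-separated) for one bird."""
    lines = [
        "CREATE",
        f'LAST\tLen\t"{row["name"]}"',           # English label
        'LAST\tDen\t"individual kakapo"',        # English description
        f"LAST\tP31\t{KAKAPO_QID}",              # P31 = instance of
        f'LAST\t{DOC_ID_PROPERTY}\t"{row["doc_id"]}"',  # external DOC identifier
    ]
    return "\n".join(lines)

with open("kakapo_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):   # assumed columns: name, doc_id
        print(quickstatements_for(row))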
Second project was
Christchurch Art Gallery.
There are very few freely usable works
by Colin McCahon,
New Zealand's most famous artist,
in existence.
This is a drawing he did
for the New Zealand School Journal,
which was government-funded at the time.
So, it's actually Archives New Zealand
who own the copyright for that.
This is a very unusual situation.
So, I worked with
Christchurch Art Gallery
who, along with Auckland Art Gallery,
maintain a site called
Find New Zealand Artists,
whose job is to keep track
of the holdings--
every institution that has holdings
of a New Zealand artist.
So, about 18,000 different artists
in their database,
and most with very little
information at all.
So, we did a standard sort of Mix'n'Match.
We did an export of the ones
that had at least a birth date,
or a death date, or a place of birth,
or a place of death.
So, that's not restricting it very much.
And even then, we were not able
to match quite a few,
but we've got about 1,500 now
that are matched
to known artists in Wikidata,
which is nice.
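As an illustration of that export step, here is a minimal Python sketch under assumed column names (the real Find New Zealand Artists export certainly looks different): keep only entries with at least one biographical anchor and write a simple tab-separated candidate list from which a Mix'n'Match catalogue could then be built.

```python
# Minimal sketch of the export step described above: keep only artists that
# have at least one of birth date, death date, birthplace or place of death,
# and write a tab-separated candidate file. Column names are illustrative.
import csv

REQUIRED_ANY = ("birth_date", "death_date", "birth_place", "death_place")

with open("find_nz_artists_export.csv", newline="", encoding="utf-8") as src, \
     open("mixnmatch_candidates.tsv", "w", newline="", encoding="utf-8") as dst:
    writer = csv.writer(dst, delimiter="\t")
    for row in csv.DictReader(src):
        if not any(row.get(field, "").strip() for field in REQUIRED_ANY):
            continue  # skip entries with no biographical anchor to match on
        description = ", ".join(
            f"{field.replace('_', ' ')}: {row[field]}"
            for field in REQUIRED_ANY if row.get(field, "").strip()
        )
        writer.writerow([row["artist_id"], row["name"], description])
```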
But what was appealing to them--
this is their website,
which really just maintains
the holdings links there--
was this biographical data,
which they currently create by hand
for every single artist.
And the act of exporting
and putting into Mix'n'Match
exposed numerous typos
and mistakes and such
that they hadn't noticed.
And it's only when you start
running things through [Excel]
that these things show up.
And the value of Wikidata
was suddenly conveyed to them
when I said, "You can just suck in
that information from Wikidata."
And that made them sit up straight.
So this, I think, is one
of the selling points.
When you have this carefully
hand-curated website
with 18,000 entries, full of mistakes,
and tell them there's another way,
that they can get other people
to do some of this fact-checking
and correction for them--
that's when it sinks home.
And then I pitched the idea
that they "Wikidatafy"
an entire history book
of New Zealand artists
in Christchurch in the '30s--
just published--
and run through every single person,
connection, place, exhibition, and such.
But it's a manageable sized project,
and they're very excited by this.
And thirdly, I wanted to show you
Maori Subject Headings.
A waka is a Maori name
for a particular kind of canoe,
a war canoe.
So, in the National Library
of New Zealand,
there's a listing for waka,
because the National Library
actually has its own dictionary
of Maori Subject Headings,
in the Maori language.
So, there it defines a waka,
in Maori and English.
But it also has a whole lot
of narrower terms,
which you can see there on the side.
A typical one would be taurapa.
And a definition first in Maori,
and then in English.
It's the carved sternpost
that you can see there.
And in English, you would say "sternpost,"
but you can't use
the word "sternpost" for taurapa,
because taurapa only works
for particular kinds of war canoes.
So, there's no English word
equivalent for that.
And I suddenly realized
that here is an entire ontology
of culture-specific terms that have been
very carefully worked out
and verified by the National
Library with Maori,
constantly being added to and improved
with definitions, with descriptions,
in both English and Maori.
Really exciting.
I suddenly thought we could put
this whole lot into Wikidata--
Maori first, and then translated
into English, as required.
Be a nice change, wouldn't it!
And here's the copyright licensing.
Unfortunately, NonCommercial-NoDerivs.
So now I have to start
the conversation with them
about why they picked that license.
It's possibly because they only got
[buy-in] from Maori,
who agreed to sit down
and [inaudible] this stuff
only if there was a guarantee
that none of this information
could be used for commercial purposes.
So, one of the frustrating
aspects of the task
is coming up against
these sorts of restrictions.
So, those are the three things
I wanted to put out in front
to spark discussion:
Putting an entire species into Wikidata,
what it takes to actually change
an art gallery's curator's mind
about the value of Wikidata,
and what do we do when we see
a complete ontology
in another language that,
unfortunately, has been slapped
with a restrictive
Creative Commons license.
Thank you.
(applause)
Hello. My name is Joachim Neubert.
I'm working for the ZBW,
that is, the Information Center
for Economics in Hamburg,
as a scientific software developer.
And one of my tasks last year
was preparing a data donation to Wikidata.
And I want to report
on our first experiences
of donating metadata
from the 20th-Century Press Archives.
To the best of our knowledge,
this is the largest public
press archive in the world.
It was collected
between 1908 and 2005,
drawn from more than 1,500 newspapers
and periodicals
from Germany, and also internationally.
And it covered everything
which could be of interest
for the Hamburg businesspeople
who wanted to expand across the world.
As you can see, this material
has been clipped from newspapers
and put onto paper,
and then collected in folders.
Here you see a small corner
of the Persons Archive,
and, similarly, information
has been collected on companies,
on general topics, on wares,
on everybody,
on everything which could be interesting.
These folders have been scanned
up to roughly 1949
by a DFG-funded project from 2004 to 2007.
As a result, there are now
about 25,000 thematic dossiers
from this period,
containing more than two million pages.
And these are online.
They are served by an application
developed at that time by ZBW,
which now looks a bit outdated,
not so fancy,
and, what is more of a problem,
it was built on Oracle
and on ColdFusion,
and it runs on Windows servers,
so it's not very sustainable
in the long term.
And we have discussed whether
we should migrate this
to a fancier linked data application,
or whether we should take a radical step
and put all this data in the open.
We have assigned a CC0 license to that data
and are currently moving our main
access layer, our main discovery layer--
the primary access layer--
to the open linked data web,
where it actually makes most sense
to put some metadata into Wikidata,
and to make sure that all folders
of the collections are linked to Wikidata,
so they are findable,
and that all metadata about these folders
is also transferred to Wikidata.
So it can be used there,
and it can be enriched there, possibly.
Corrections can be made to that data.
What is still maintained by ZBW is,
of course, the storage of the images,
which we can't put out in any way--
we can't give a license on that
because the rights are owned
by the original creators.
But we make sure that they are accessible
through metadata files
via the DFG Viewer,
and in the future via IIIF manifests.
And we will prepare
some static landing pages
which will serve as a point
of reference for Wikidata,
as well as still make available data
which doesn't fit well into Wikidata.
For the migration
and data donation to Wikidata,
we set up our own infrastructure:
a SPARQL endpoint with that data.
And we basically used federated queries
between that endpoint
and the Wikidata Query Service
to create the corresponding statements,
either concatenated
in the SPARQL queries themselves,
or transformed via a script,
which also generated references
for the statements.
And then we put that into QuickStatements
to load it online.
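What follows is a minimal sketch of that kind of pipeline, for illustration only: the local endpoint URL, the schema predicates, the folder-ID property, and the reference item are placeholders rather than ZBW's actual configuration; only the Wikidata Query Service URL, P227 (GND ID), and the QuickStatements syntax are real.

```python
# Minimal sketch: run a federated SPARQL query that joins a local press-archive
# endpoint with the Wikidata Query Service, then emit QuickStatements that add
# the archive's folder ID to matching items, with a "stated in" reference.
import requests

LOCAL_ENDPOINT = "https://example.org/sparql"   # hypothetical archive endpoint
FOLDER_ID_PROPERTY = "Pxxxx"                    # hypothetical folder-ID property
ARCHIVE_ITEM = "Q......"                        # item to cite via S248 (stated in)

# The archive's persons carry a GND identifier; Wikidata items do too (P227),
# so the GND is used as the join key between the two graphs.
FEDERATED_QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?folderId ?wdItem WHERE {
  ?person <https://example.org/schema#gnd> ?gnd ;
          <https://example.org/schema#folderId> ?folderId .
  SERVICE <https://query.wikidata.org/sparql> {
    ?wdItem wdt:P227 ?gnd .
  }
}
"""

rows = requests.post(
    LOCAL_ENDPOINT,
    data={"query": FEDERATED_QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=120,
).json()["results"]["bindings"]

for row in rows:
    qid = row["wdItem"]["value"].rsplit("/", 1)[-1]
    folder_id = row["folderId"]["value"]
    # QuickStatements V1: one statement per line, plus a reference (S248).
    print(f'{qid}\t{FOLDER_ID_PROPERTY}\t"{folder_id}"\tS248\t{ARCHIVE_ITEM}')
```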
So, this is what we get.
It's not only simple things
like birth dates, but, sorry--
but also complex statements
about already existing items,
like this person was a supervisory
board member of said company
during this period of time,
and referenced, so it can be used
in a scientific context.
The first part of this data donation
has been finished.
The Persons Archive
is completely linked to Wikidata.
And this also adds information:
a lot of items did not have
any external references before.
And we added more
than 6,000 statements,
which are now sourced
from this archive's metadata.
Well, this was the easiest part,
because persons are easily
identifiable in Wikidata.
More than 90% already existed there,
so we could link to them.
We created some 100 items
for the ones which were missing.
But now, we are working
on the rest of the archive,
particularly on the topics archive,
which means mapping a historic system
for the organization of knowledge
about the whole world,
materialized as newspaper
clippings, to Wikidata.
To give you a basic idea,
the Countries and Topics archive
is organized by a hierarchy of countries
and other geographic entities,
which is translated to English,
which makes this easier,
and a German, deeply nested
classification of topics.
The combination of the two
defines one folder.
So, what we now want to do
is to match this structure to Wikidata,
and to bring the data in.
And I want to invite you
to join this really nice challenge
in terms of knowledge organization.
So, it's a WikiProject
where this work is tracked,
and you can follow this
or participate in this.
And, yes, thank you very much.
(applause)
So, we're taking
performing arts to Wikidata.
And we're taking performing arts
to the linked open data cloud,
by building a linked open data
ecosystem for the performing arts.
And the question I'm trying to answer--
and I hope you'll help me
in answering it--
is what place Wikidata has in all that.
But let me first start with the experiences
I made this year,
in the first half of the year,
when I had the pleasure
to work with CAPACOA,
which is the Canadian Arts
Presenting Association,
which actually launched a project
called Linked Digital Future Initiative,
to actually get the entire art sector
in Canada to embrace linked open data.
And they did that based on the observation
that over the past five years,
the [inaudible]-- the important topic
within performing arts
was the fact that metadata
was not available in sufficient quality
and not interlinked, not interoperable.
And that was why some of the performances,
some of the events
are not so well findable
by Google and by personal
computer-based assistants, and so on.
So, the vision we kind
of developed together
is that we want to have a knowledge base
for many stakeholders at once.
So we looked at the entire
performing arts value network,
we identified key stakeholders in there,
we looked at the usage scenarios
that we like to pursue,
and we kind of mapped it
to the whole architecture
of such a knowledge base,
or of the different platforms in there,
which, obviously,
is a distributed architecture,
and not one big monolith.
I'm just going to run
through that quite quickly
because we have ten minutes each.
But I think we'll have plenty of time
tonight or tomorrow to deepen that
if anybody's interested in the details.
So, we started from
that Performing Arts Value Network,
which, interestingly,
was just published last year.
So, we're lucky to be able
to build on previous work,
like you have the primary value chain
of the performing arts in the middle,
and various stakeholders around that.
All in all, we identified
20 stakeholder groups,
which then we kind of boiled down
into seven larger categories.
For each of the stakeholder groups,
we kind of formulated what kind of needs
they would have in terms
of such an infrastructure,
and what would they be able to achieve
if the whole thing was interlinked
and the data was publicly accessible.
And so, you can see the types here;
the different types are Production,
then Presentation & Promotion,
Coverage & Reuse, Live Audiences,
Online Consumption, Heritage,
and Research & Education.
And after kind of setting up a big table,
of which you can see
just the first part here,
we kind of compared [over there]
which types of data
were actually used across the board
by all different groups of stakeholders.
And there's quite a large basis of data
that is common to all of them,
and that is really the area
where it makes a lot of sense, actually,
to cooperate and to keep that--
to maintain the data together.
So, when talking about
platform architecture,
you can see that we have four layers here.
At the bottom is the data layer.
Of course, Wikidata plays a part in it,
but also a lot of other databases,
distributed databases
that can expose data
through SPARQL endpoints.
The yellow part in the middle,
that's the semantic layer.
It's our common language
to describe our things,
to make statements about things
around the performing arts, the ontology.
Then we have an application layer
that consists of various modules,
for example, data analysis,
data extraction-- so, how do you
actually get unstructured data
into structured data--
how can we support that with tools.
Then, obviously, there's
a visualization of data--
so if there are large quantities of data,
you want to visualize it in some way.
And on the top, you have
the presentation layer,
that's what the ordinary people
are actually interacting with
on a daily basis--
search engines, encyclopedias,
cultural agendas,
and a variety of other services.
We're not starting from scratch.
Some work has already
been done in this area.
I'll just cite a few examples
from a project
which I have been involved in.
Some other stuff going on as well.
And so, I started in this area
with the Swiss Archive
of the Performing Arts.
[While] building a Swiss
Performing Arts database,
we created a performing arts ontology
that's currently being
implemented in RDF.
And there we have a database
of some 60 or 70 years
of performance history in Switzerland.
So, that's something we can build on,
and that's something
that's been transformed into RDF.
And there is a [builder] platform
where this data can be accessed.
Then we have done
several ingests into Wikidata,
partly from Switzerland,
partly also from
performing arts institutes--
for example, Bart Magnus
was involved in that.
He was the driving force behind that.
There's also stuff from Wikimedia Commons,
but not very well interlinked
with all the rest of our metadata.
And obviously, by doing this ingest,
we also kind of started to implement
parts of this Swiss data model
into Wikidata.
Then one of the Canadian
implementation partners
is Culture Creates.
They're running a platform that actually
scrapes information from theater websites,
and inputs it into a knowledge graph,
to then expose it to search engines
and other search devices.
And there again, we kind of had
to implement and extend this ontology.
And as you can see from the slide,
there are many empty spaces,
but there's also some overlap,
and an important overlap, obviously,
is the common shared language,
which will help us actually interlink
the various data sets.
What is also important, obviously,
is that we're using the same
base registers and authority files.
And this is a place where Wikidata
plays an important role
by kind of interlinking these.
Now, I'd like to share the recommendations
by the Linked Digital Future Initiative's
Advisory Committee--
at least the first two recommendations.
So, for the Canadians,
now it's absolutely crucial
to kind of fill in their own Canadian
performing arts knowledge graph,
because unlike the Swiss Archive
of the Performing Arts,
they're not starting
with an already existing database,
but they're kind of
creating it from scratch.
And it's absolutely crucial
to have data in there.
And second, as you can see,
Wikidata already comes in.
Wikidata, by the Advisory Committee,
has been seen as complementary
to Artsdata.ca, this knowledge graph,
and, therefore, efforts should
be undertaken to contribute
to its population
with performing arts-related data.
And that's where we're going to work on
over the coming months and years,
and that's also why
I'm kind of on the lookout here
to see who else will join that effort.
So, right now, obviously,
we're saying they're complementary.
So, we have to think about
the pluses and the minuses
of each of the approaches.
And you can see here a comparison
between Wikidata and the Classical
Linked Open Data approach.
I would be happy to discuss
that further with you guys,
how your experiences are in there.
But, as I see it, Wikidata is a huge plus
because it's a crowdsourcing platform,
and it's easy to invite further parties
to actually contribute.
On the negative side, obviously,
you get this problem of loss of control.
Data owners have to give up control
over their graphs; data quality
and completeness
are harder to track on Wikidata
than if you have it under your control.
And the other strength of Wikidata
is that it requires immediate integration
into that worldwide graph.
With the classical approach,
you kind of reconcile step by step
against other databases,
which may also be seen by some
as an advantage,
but of course, if you're looking
for integration and interoperability,
Wikidata forces you to go for that
from the beginning.
And then, obviously, harmonizing
data modeling practices
is an issue in both cases.
But it may seem easier, at the beginning,
to do it just in your own silo,
because at some point,
you're done with the task,
whereas on Wikidata
it's an ongoing task.
So, when it now comes to prioritizing
the data to be ingested,
that's like the rules
I kind of go by at the moment.
First of all, we'd like to ingest it
where it's unclear who would be
the natural authority in the given area.
So that's definitely data
that will be managed in a shared manner.
And we'd like to ingest it where we see
a high potential
for crowdsourcing approaches.
We'd like to ingest data where the data
is likely to be reused
in the context of Wikipedia.
And there's also hope that some part
of the international coordination
around the whole data modeling,
around standardization,
could actually take place
directly on Wikidata,
if it's not taking place elsewhere,
because it kind of forces people
to start interacting
if they ingest data in the same place.
And we'd like to focus next
on base registers and authority files,
because they kind of help us
create the linkages
between different data sets,
and on controlled vocabularies
as an extension of the existing ontology.
So, just two more slides.
The next steps will be that we're taking
the sum of all GLAMs approach
to Wiki Loves Performing Arts.
That means we're describing
venues and organizations,
and trying to push the data to Wikipedia
in the form of infoboxes
and [bubble] templates.
And the other project
I'm going to pursue is a COST Action
that we'll submit next year
around that Linked Open Data Ecosystem
for the Performing Arts.
COST is a European program
that supports networking activities,
and the topics to be covered
are listed here.
Two of them, I have highlighted--
one of them is like the question
of federation between Wikidata
and the classical linked
open data approaches.
And the other one, I think,
is very important also,
where we have a huge potential still,
is implementing international campaigns
to supplement data on Wikidata.
So, that's it. Thank you
for your attention.
Now, I would like to ask
my colleagues up here
to the panel-- maybe you'll get
microphones as well.
And then I would like to
give you the chance to ask questions,
and obviously, also ask my colleagues
whether they have questions for each other.
So, do we have maybe a question
from the audience?
(man) [inaudible]
I would like to ask each of you
where you would draw the line,
basically, how you define--
when do you need to run your own Wikibase,
and what do you want to put on Wikidata?
Like, is there a clear delineation
of what belongs where?
I can answer first because I have the mic.
So, I've been thinking
that one of the issues is notability.
I'm addressing that
in a different project.
And I think licensing could be one,
because you can apply your own terms
in your own database,
and then I think wherever it's possible.
And then, the third one
is just to have it as a sandbox,
to prepare data for ingestion into Wikidata.
These are the three main things
that I come up with now,
but I can come up with more.
For me, rights are always
going to be an issue.
So, if the National Library
wanted to move towards Wikibase,
that would enable them to continue
to control the licensing
for the work they've done
with Maori language terms.
The kakapo database only contains data
that the Department of Conservation
felt could be made public,
but I suspect if they see it
up and running,
they might be tempted
to use a private Wikibase
to maintain their own database,
simply because some
of the visualization tools
that could be applied might be better
than the sort of Excel spreadsheet system
that they currently run.
Well, I think this very much depends
on the kind of data.
We are, with the Press Archive, of course,
in a quite lucky position,
in that this was material
which was published,
it was published at the time,
but it was expensive to publish.
So, this is quite easy.
I think, also, about projects--
and this was a typical project,
funded for some time,
and then funding ended--
what happens with the data
which is enclosed in some silo,
in some software
which will not run forever?
And so, it makes
absolute sense in my eyes.
At the time, Wikidata
wasn't around, but now it is,
and it makes absolute sense
for such projects to discuss
sustainability early on,
in terms of how we could put this
into a larger ecosystem like Wikidata,
and to discuss with the data community
what is notable and what makes sense
to add to Wikidata,
and what makes sense to keep
in a proprietary form--
maybe in a simpler form
than a sophisticated application,
but discoverable
and linked to the large data cloud,
instead of investing lots of money
into some silo which will not be sustained.
Yeah, as I said before,
in the project I was presenting here,
there are dualities between Wikidata
and classical linked open data approaches.
So, it's not so much about
setting up a private Wikibase.
One challenge we have had
with Wikidata, of course,
is that when you ingest
your own data there,
you also have to do some housekeeping
of other people's data, actually.
And that can put people off,
[or it also means] that we will address it
just step by step.
So, there will be, at the moment,
a database living
in classical linked open data,
and we're starting to link it
with Wikidata,
and it's a continuous process to find out
for which areas most of the data
will eventually be on Wikidata,
and for which areas it will actually
live in other databases.
Obviously, we'll have challenges
regarding synchronization,
as we probably all have,
because in the linked data field,
we still have
to negotiate whom we trust,
and who has authority over what.
(assistant) Other questions?
(woman) Thank you.
So, I fully agree with that issue of
where to put the boundary--
why we put data on Wikidata,
or why we keep them,
and create, manage, and maintain them
in local databases, and for what purposes.
And I think that
this is a large discussion
that goes beyond just the excitement
of putting data on Wikidata
because it is public,
because it serves humanity, because--
while there are cool tools,
things are more complicated
in real life, I think.
Well, despite this,
it's quite an interesting discussion.
And then there is another issue, also,
or another problem, that is being discussed
at this event in different panels.
It is, on one side, have your own database,
whatever the technology is,
and publish things on Wikidata;
or build your own system
for creating and managing information
on the Wikibase technology,
and then synchronize or whatever--
do federation or such things.
So it's a matter
of the technology that is used,
and whether you use Wikidata
just for publishing,
or the infrastructure
that is underneath Wikidata
to create and manage your data.
I mean, we had a discussion
at the Wikibase panel,
and there will be other discussions here,
but things are
on different levels, I think.
Maybe to come back to that discussion
about Wikibase or Wikidata--
I think it's problematic
that we are focusing so much
on this Wikibase infrastructure,
because there are other infrastructures,
like in the area of performing arts.
We have another complementary community,
which is MusicBrainz,
which runs its own platform
that provides linked open data,
and as I understand it,
there's agreement
within the Wikidata community
that we're not going
to duplicate all their data--
we're not going to copy all their data,
but we accept that they're complementary.
So, what will happen when you start
integrating this data in Wikipedia?
Infoboxes, for example.
Would we be able to pull that data
directly from their SPARQL endpoint?
Or would we be obliged
to kind of copy all the data,
and what kind of processes
are involved in that?
(woman) Discussions are open, I think,
because within this event,
you have both interested communities--
those that are interested in Wikibase,
and those that are interested in Wikidata,
and those who are interested in both.
Yeah, but we're not going
to oblige them to move to Wikibase.
- (woman) Not necessarily.
- MusicBrainz is not running on Wikibase.
(woman) No, I just wanted to say
that you have separate problems,
sometimes interrelated,
sometimes not completely separated.
And I had another question, or remark,
regarding the management of hierarchies
in controlled vocabularies,
like thesauri, like yours in Finto.
You do have the places,
and the Maori
Subject Headings--
well, they have to deal with
the management of concepts in a hierarchy.
What is your take, your opinion,
about the possibility
of managing these controlled
knowledge organization
systems in Wikidata?
I think in the case
of Finto and YSO places,
the repository will be a collection
of several sources, eventually.
So, it is in flux, anyway.
So, we don't have to necessarily--
well, I don't represent
the National Library,
but in that possible project,
we would not have
to maintain an existing--
or fight with an existing structure.
So, in that sense, it is an area
open for exploration.
The Maori Subject Headings
seems to lend themselves ideally
to Wikidata structure,
but the licensing,
of course, forbids that.
I suspect that if the licensing
were different
and they were put into Wikidata,
as soon as somebody decided
they didn't like the hierarchy
and started to change things,
there would be an immediate outcry
from people who worked very hard
to create that structure
and get the sign-off
from various different Maori
that this was the current hierarchy.
So, that's an issue to try and resolve.
I think in terms of knowledge
organization systems,
they are all different.
And I'm not sure
if it would be a good idea
to represent different hierarchies
in Wikidata as such,
but it maybe makes sense
to think about overlays
of the data.
So, to do mappings on the content level.
For example, ZBW publishes
the Thesaurus for Economics.
And this thesaurus has its own hierarchy,
and, of course, it would be possible
to project the hierarchy
of this thesaurus onto Wikidata concepts
without actually storing
this kind of structure
as an alternative structure
within Wikidata,
which would create a lot of confusion.
But I think we should think
of Wikidata, also, as a pool of concepts
which can be connected on layers
which are outside of it,
and which give another view of the world
which does not necessarily have to be
within Wikidata.
(assistant) Alright. Some other questions?
Otherwise-- okay.
(man 2) Joachim, I just wanted
to follow up on that last point.
So, these layers, as you picture it,
they would be maintained externally
and somehow integrated
with Wikidata from the Wikidata side,
or have you thought a bit further
about how that might be managed?
Actually, no, I have no--
I have done experiments
with ZBW and Wikidata.
I was [inaudible] here at Wikidata.
But I think this is
a whole new complex thing,
and so, it's up for [discussion],
[to give up a lot of control]
to do such things.
But it has to be figured out.
Should we take one more?
(man 3) Ah, great.
I was just wondering
about the kakapo project.
Uh-hmm.
(man 3) Okay. So, did you get
any pushback from the Wikidata community
about having individual animals
as items?
Not so far.
(man 3) Has anyone heard
about this before?
Is it "not so far" because
no one has heard about it yet?
There's been a small discussion
for quite some time now
among those people interested
in this sort of thing in Wikidata,
and we all seem to think
that it's a natural extension
of giving individual Wikidata items
to a famous racehorse
or someone's cat, which--
that's modeled pretty well.
I guess just the audacious thing
is putting the entire species in there.
But I think it's perfectly manageable.
(man 3) Don't try it with cats and dogs.
(laughter)
(assistant) Okay. I think
our time is up.
Thank you very much for attending.
I think the speakers will still be open
for questions during the break.
And have fun.
Thank you very much.
(applause)