WEBVTT
00:00:05.973 --> 00:00:07.908
Hi, guys! Can everybody hear me?
00:00:09.170 --> 00:00:11.898
So, hi! Nice to meet you all.
I'm Erica Azzellini.
00:00:11.898 --> 00:00:14.606
I'm one of Wiki Movimento Brasil's
liaisons,
00:00:14.606 --> 00:00:17.829
and this is my first international
Wikimedia event,
00:00:17.829 --> 00:00:21.023
so I'm super excited to be here,
and I hope I
00:00:21.023 --> 00:00:24.311
will share something interesting with you
all here in this lightning talk.
00:00:25.247 --> 00:00:30.441
So this work starts with research
that I was developing in Brazil,
00:00:30.441 --> 00:00:34.219
Computational Journalism
and Structured Narratives with Wikidata.
00:00:34.276 --> 00:00:35.958
So in journalism,
00:00:35.958 --> 00:00:39.616
they're using some natural language
generation software
00:00:39.616 --> 00:00:41.418
for automating news
00:00:41.418 --> 00:00:46.535
stories that have
quite similar narrative structures.
00:00:46.535 --> 00:00:51.600
And we developed this concept here
of structured narratives,
00:00:51.600 --> 00:00:54.548
thinking about this practice
in computational journalism,
00:00:54.548 --> 00:00:58.361
that is, the development of verbal text
understandable by humans,
00:00:58.361 --> 00:01:01.274
automated from predetermined
arrangements that process information
00:01:01.274 --> 00:01:05.395
from structured databases,
which looks a lot like
00:01:05.395 --> 00:01:10.043
the Wikimedia universe
and this tool that we developed.
00:01:10.043 --> 00:01:13.555
So, when I'm talking about verbal text
understandable by humans,
00:01:13.555 --> 00:01:15.808
I'm talking about Wikipedia entries.
00:01:15.808 --> 00:01:17.778
When I'm talking about
structured databases,
00:01:17.778 --> 00:01:20.017
of course, I'm talking about
Wikidata here.
00:01:20.017 --> 00:01:22.777
And by predetermined arrangements,
I'm talking about Mbabel,
00:01:22.777 --> 00:01:24.271
which is this tool.
00:01:25.467 --> 00:01:31.216
The Mbabel tool was inspired by a template
by user Pharos, right here in front of me,
00:01:31.279 --> 00:01:33.356
thank you very much,
00:01:33.356 --> 00:01:39.114
and it was developed with Ederporto,
who is right here too,
00:01:39.114 --> 00:01:40.974
the brilliant Ederporto.
00:01:42.599 --> 00:01:44.498
We developed this tool
00:01:44.498 --> 00:01:47.780
that automatically generates
Wikipedia entries
00:01:47.780 --> 00:01:50.600
based on information from Wikidata.
00:01:53.189 --> 00:01:58.130
We actually make some thematic templates
00:01:58.130 --> 00:02:01.152
that are built on a Wikidata module,
00:02:01.573 --> 00:02:03.716
the WikidataIB module,
00:02:03.716 --> 00:02:07.835
and these templates are predetermined,
generic and editable templates
00:02:07.835 --> 00:02:09.677
for various article themes.
00:02:09.677 --> 00:02:15.411
We realized that many Wikipedia entries
had quite similar structured narratives,
00:02:15.411 --> 00:02:18.922
so we could create a tool
that automatically generates them
00:02:18.922 --> 00:02:21.598
for many Wikidata items.
00:02:24.207 --> 00:02:28.571
So far we have templates for museums,
works of art, books, films,
00:02:28.571 --> 00:02:31.265
journals, earthquakes, libraries,
archives,
00:02:31.265 --> 00:02:34.855
and Brazilian municipal
and state elections, and the list is growing.
00:02:34.855 --> 00:02:38.984
So, everybody here is able to contribute
and create new templates.
00:02:38.984 --> 00:02:43.508
Each narrative template includes
an introduction, a Wikidata infobox,
00:02:43.508 --> 00:02:46.158
section suggestions for the users,
00:02:46.158 --> 00:02:50.499
content tables or lists with Listeria,
depending on the case,
00:02:50.499 --> 00:02:53.713
references and categories,
and of course the sentences
00:02:53.713 --> 00:02:55.776
that are created
from the Wikidata information.
00:02:55.776 --> 00:02:58.642
I'm gonna show you in a sec
an example of that.
00:03:00.137 --> 00:03:05.749
It's an integration between Wikipedia
and Wikidata,
00:03:05.749 --> 00:03:08.760
so the more properties are properly filled in
on Wikidata,
00:03:08.760 --> 00:03:12.311
the more text you'll get
in your article stub.
00:03:12.857 --> 00:03:15.623
That's very important to highlight here.
00:03:16.343 --> 00:03:18.969
Structuring this data in Wikidata
can get more complex,
00:03:18.969 --> 00:03:22.017
as I'm going to show you
with the election projects that we've made.
00:03:22.017 --> 00:03:26.552
So I'm going to leave this
Wikidata Lab XIV here for you
00:03:26.552 --> 00:03:29.471
after this lightning talk,
00:03:29.471 --> 00:03:32.259
which is very brief,
so you'll be able to check out
00:03:32.259 --> 00:03:34.554
the work that we've been doing
on structuring Wikidata
00:03:34.554 --> 00:03:36.005
for this purpose too.
00:03:37.272 --> 00:03:39.725
We have this challenge to build
a narrative template
00:03:39.725 --> 00:03:44.383
that is generic enough
to cover different Wikidata items
00:03:44.383 --> 00:03:46.347
and to handle the gender
00:03:46.347 --> 00:03:50.359
and number difficulties
of languages,
00:03:52.054 --> 00:03:54.252
while still sounding natural for the user,
00:03:54.252 --> 00:03:59.252
because we don't want it to sound so unnatural
that the user doesn't feel like
00:03:59.252 --> 00:04:00.546
editing it after that.
00:04:01.956 --> 00:04:07.625
This is what Mbabel looks like
in the button form.
00:04:07.625 --> 00:04:14.507
You just have to insert the item number there
and call the desired template,
00:04:14.507 --> 00:04:21.673
and then you have an article to edit
and expand, and everything.
00:04:22.135 --> 00:04:26.856
So, more importantly, why did we do it?
Not because it's cool to develop
00:04:26.856 --> 00:04:30.922
things here in Wikidata,
we all know about that.
00:04:30.922 --> 00:04:36.178
But we are experimenting with this integration
from Wikidata to Wikipedia
00:04:36.178 --> 00:04:39.226
and we want to focus
on meaningful individual contributions.
00:04:39.226 --> 00:04:42.608
So we've been working
on education programs
00:04:42.608 --> 00:04:45.067
and we want the students to feel the value
00:04:45.067 --> 00:04:47.280
of their entries too, but not only--
00:04:47.280 --> 00:04:49.405
Oh, only five minutes?
Geez, I'm gonna rush here.
00:04:49.405 --> 00:04:50.599
(laughing)
00:04:50.794 --> 00:04:54.160
And we want to make tasks easier
for users in general,
00:04:54.270 --> 00:04:57.801
especially tables
and this kind of content
00:04:57.801 --> 00:04:59.988
that is a bit of a chore to do.
00:05:02.456 --> 00:05:05.523
And we're working on this concept
of Abstract Wikipedia.
00:05:05.523 --> 00:05:09.269
Denny Vrandečić wrote a super interesting
article about it,
00:05:09.269 --> 00:05:11.500
so I linked it here too.
00:05:11.500 --> 00:05:14.792
And now we also want to support
small language communities
00:05:14.792 --> 00:05:17.845
in filling the lack of content there.
00:05:18.784 --> 00:05:23.885
This is an example of how we've been using
this Mbabel tool for GLAM
00:05:23.885 --> 00:05:25.748
and education programs,
00:05:25.748 --> 00:05:29.861
and I showed you earlier
the button form of the Mbabel tool,
00:05:29.861 --> 00:05:34.264
but we can also make red links
that aren't exactly empty.
00:05:34.264 --> 00:05:35.931
So you click on this red link
00:05:35.931 --> 00:05:38.862
and you automatically have
this article draft
00:05:38.862 --> 00:05:41.660
on your user page to edit.
00:05:42.964 --> 00:05:48.762
And I'm going to briefly talk about it
because I only have a few more minutes.
00:05:50.009 --> 00:05:51.356
On educational projects,
00:05:51.356 --> 00:05:56.799
we've been doing this with elections
in Brazil for journalism students.
00:05:56.799 --> 00:06:01.993
We had this experience
with the [inaudible] students
00:06:02.087 --> 00:06:05.314
with user Joalpe--
he's not here right now,
00:06:05.314 --> 00:06:07.867
but we all know him, I think.
00:06:07.867 --> 00:06:11.930
And we realized that we have the data
about Brazilian elections
00:06:11.930 --> 00:06:14.748
but we don't have media coverage of it.
00:06:15.049 --> 00:06:18.249
So we were also lacking
Wikipedia entries on it.
00:06:19.029 --> 00:06:23.000
How do we insert this meaningful
information into Wikipedia,
00:06:23.000 --> 00:06:24.672
which people really access?
00:06:24.672 --> 00:06:27.989
Next year we're going
to have elections,
00:06:27.989 --> 00:06:30.710
people are going to look for
this kind of information on Wikipedia
00:06:30.710 --> 00:06:32.433
and they simply won't find it.
00:06:32.433 --> 00:06:35.726
So this tool looks quite useful
for this purpose
00:06:35.726 --> 00:06:40.214
and the students were introduced,
not only to Wikipedia,
00:06:40.214 --> 00:06:42.701
but also to Wikidata.
00:06:42.701 --> 00:06:46.575
Actually, they were introduced
to Wikipedia with Wikidata,
00:06:46.575 --> 00:06:50.675
which was a super interesting experience,
and we had a lot of fun,
00:06:50.675 --> 00:06:52.823
and it was quite challenging
to organize all that.
00:06:52.823 --> 00:06:54.513
We can talk about it later too.
00:06:54.979 --> 00:06:58.582
And they also added the background
and the analysis sections
00:06:58.582 --> 00:07:01.663
to these election articles,
00:07:01.663 --> 00:07:05.336
because we didn't want them
to simply automate the content there.
00:07:05.336 --> 00:07:06.660
We can do better.
00:07:06.660 --> 00:07:09.247
So this is the example
I'm going to show you.
00:07:09.247 --> 00:07:13.106
This is from a municipal election
in Brazil.
00:07:15.603 --> 00:07:17.121
Two minutes... oh my!
00:07:18.577 --> 00:07:23.268
This example here was entirely created
with the Mbabel tool.
00:07:23.268 --> 00:07:29.496
You have here this introduction text.
It really sounds natural for the reader.
00:07:29.496 --> 00:07:32.165
The Wikidata infobox here--
00:07:32.165 --> 00:07:34.907
it's a masterpiece
by Ederporto right there.
00:07:34.907 --> 00:07:36.769
(laughter)
00:07:37.438 --> 00:07:42.456
And we have here the tables with the
election results for each position.
00:07:42.456 --> 00:07:46.415
And we also have these results here
in textual form too,
00:07:46.415 --> 00:07:51.767
so it really looks like an article
that was made, that was handcrafted.
00:07:53.893 --> 00:07:57.814
The references here were also made
with the Mbabel tool
00:07:57.814 --> 00:08:01.393
and we used identifiers
to build these references here
00:08:01.393 --> 00:08:03.167
and the categories too.
00:08:10.726 --> 00:08:14.999
So, to wrap things up here,
it is still a work in progress,
00:08:14.999 --> 00:08:19.326
and we have some outreach
and technical challenges
00:08:19.326 --> 00:08:22.999
in bringing Mbabel
to other language communities,
00:08:22.999 --> 00:08:24.844
especially the smaller ones,
00:08:24.844 --> 00:08:27.210
and in supporting these tools
00:08:27.210 --> 00:08:29.819
in lower-resource
language communities too.
00:08:29.819 --> 00:08:33.991
And finally, is it possible
to create an Mbabel
00:08:33.991 --> 00:08:36.261
that overcomes language barriers?
00:08:36.261 --> 00:08:39.740
I think that's a very interesting
question for this conference,
00:08:39.740 --> 00:08:43.835
and hopefully we can figure
that out together.
00:08:44.818 --> 00:08:49.799
So, thank you very much,
and look for the Mbabel poster downstairs
00:08:49.799 --> 00:08:53.615
if you'd like to have all this information
wrapped up, okay?
00:08:53.615 --> 00:08:55.038
Thank you.
00:08:55.288 --> 00:08:57.564
(audience clapping)
00:09:00.311 --> 00:09:02.778
(moderator) I'm afraid
we're a little too short on time for questions
00:09:02.778 --> 00:09:05.783
but yes, Erica, as she said,
has a poster and is very friendly.
00:09:05.783 --> 00:09:07.518
So I'm sure you can talk to her
afterwards,
00:09:07.518 --> 00:09:09.389
and if there's time at the end,
I'll allow it.
00:09:09.389 --> 00:09:12.131
But in the meantime,
I'd like to bring up our next speaker...
00:09:12.237 --> 00:09:13.611
Thank you.
00:09:15.549 --> 00:09:17.140
(audience chattering)
00:09:23.058 --> 00:09:27.016
Next we've got Yolanda Gil,
talking about Wikidata and Geosciences.
00:09:27.908 --> 00:09:29.031
Thank you.
00:09:29.031 --> 00:09:31.624
I come from the University
of Southern California
00:09:31.624 --> 00:09:35.164
and I've been working with
semantic technologies for a long time.
00:09:35.164 --> 00:09:37.894
I want to talk about geosciences
in particular,
00:09:37.894 --> 00:09:41.225
where this idea of crowdsourcing
from the community is very important.
00:09:41.791 --> 00:09:45.033
So, to give you a sense,
individual scientists,
00:09:45.033 --> 00:09:47.070
most of them in colleges,
00:09:47.070 --> 00:09:50.085
collect their own data
for their particular project.
00:09:50.085 --> 00:09:51.932
They describe it in their own way.
00:09:51.932 --> 00:09:55.352
They use their own properties,
their own metadata characteristics.
00:09:55.352 --> 00:09:58.560
This is an example
of some collaborators of mine
00:09:58.560 --> 00:10:00.124
that collect data from a river.
00:10:00.124 --> 00:10:02.091
They have their own sensors,
their own robots,
00:10:02.091 --> 00:10:05.339
and they study the water quality.
00:10:05.339 --> 00:10:11.423
I'm going to talk today about an effort
that we did to crowdsource metadata
00:10:11.423 --> 00:10:14.712
for a community that works
in paleoclimate.
00:10:14.712 --> 00:10:17.747
The article just came out
so it's in the slides if you're curious,
00:10:17.747 --> 00:10:20.619
but it's a pretty large community
that works together
00:10:20.619 --> 00:10:24.042
to integrate data more efficiently
through crowdsourcing.
00:10:24.042 --> 00:10:28.631
So, if you've heard of the
hockey stick graphs for climate,
00:10:28.631 --> 00:10:31.680
this is the community that does this.
00:10:31.680 --> 00:10:34.520
This is a study of climate
over the last 200 years,
00:10:34.520 --> 00:10:38.188
and it takes them literally many years
to look at data
00:10:38.188 --> 00:10:39.618
from different parts of the globe.
00:10:39.618 --> 00:10:42.607
Each dataset is collected by
a different investigator.
00:10:42.699 --> 00:10:44.433
The data is very, very different,
00:10:44.433 --> 00:10:47.017
so it takes them a long time
to put together
00:10:47.017 --> 00:10:49.230
these global studies of climate,
00:10:49.230 --> 00:10:51.665
and our goal is to make that
more efficient.
00:10:51.665 --> 00:10:53.690
So, I've done a lot of work
over the years.
00:10:53.690 --> 00:10:56.585
Going back to 2005, we used to call it
00:10:56.585 --> 00:10:59.615
"Knowledge Collection from Web Volunteers"
00:10:59.615 --> 00:11:02.236
or from netizens at that time.
00:11:02.236 --> 00:11:04.267
We had a system called "Learner."
00:11:04.267 --> 00:11:07.048
It collected 700,000 common sense,
00:11:07.048 --> 00:11:09.368
common knowledge statements
about the world.
00:11:09.368 --> 00:11:11.367
We used a lot of different techniques.
00:11:11.367 --> 00:11:15.333
The forms that we built
to extract knowledge from volunteers
00:11:15.333 --> 00:11:19.136
really fit the knowledge models,
the data models that we used
00:11:19.136 --> 00:11:21.381
and the properties that we wanted to use.
00:11:21.381 --> 00:11:25.051
I worked with Denny
on a system called "Shortipedia"
00:11:25.051 --> 00:11:27.259
when he was a postdoc at ISI,
00:11:27.259 --> 00:11:31.946
looking at keeping track
of the provenance of the assertions,
00:11:31.946 --> 00:11:35.129
and we started to build
on Semantic MediaWiki software.
00:11:35.129 --> 00:11:37.113
So everything that
I'm going to describe today
00:11:37.113 --> 00:11:38.936
builds on that software,
00:11:38.936 --> 00:11:41.117
but now that we have Wikibase,
00:11:41.117 --> 00:11:43.676
I think we'll be starting to work more
on Wikibase.
00:11:43.676 --> 00:11:48.935
So LinkedEarth is the project
where we work with paleoclimate scientists
00:11:48.935 --> 00:11:50.636
to crowdsource the metadata,
00:11:50.636 --> 00:11:54.328
and as you can see in the title, we call it
"controlled crowdsourcing."
00:11:54.328 --> 00:11:57.101
So we found a nice niche
00:11:57.101 --> 00:12:00.538
where we could let them create
new properties
00:12:00.538 --> 00:12:02.599
but we had an editorial process for it.
00:12:02.599 --> 00:12:04.444
So I'll describe to you how it works.
00:12:04.444 --> 00:12:10.055
For them, if you're looking at a sample
from lake sediments from 200 years ago,
00:12:10.055 --> 00:12:12.622
you use different properties
to describe it
00:12:12.622 --> 00:12:15.692
than if you have coral sediments
that you're looking at
00:12:15.692 --> 00:12:18.979
or coral samples
that you extract from the ocean.
00:12:18.979 --> 00:12:23.532
Palmyra is a coral atoll in the Pacific.
00:12:23.532 --> 00:12:27.918
So if you have coral, you care
about the species and the genus,
00:12:27.918 --> 00:12:31.691
but if you're just looking at lake sand,
you don't have that.
00:12:31.691 --> 00:12:35.313
So each type of sample
has very different properties.
00:12:35.313 --> 00:12:38.798
In LinkedEarth,
they're able to see on a map
00:12:38.798 --> 00:12:40.264
where the datasets are.
00:12:40.264 --> 00:12:45.500
They actually annotate their own datasets
or the datasets of other researchers
00:12:45.500 --> 00:12:46.787
when they're using them.
00:12:46.787 --> 00:12:50.254
So they have a reason
why they want certain properties
00:12:50.254 --> 00:12:52.289
to describe those datasets.
00:12:52.289 --> 00:12:56.683
Whenever there are disagreements,
or whenever there are agreements,
00:12:56.683 --> 00:12:58.595
there are community discussions
about them,
00:12:58.595 --> 00:13:02.894
and there are also polls to decide
which properties to settle on.
00:13:02.894 --> 00:13:05.659
So it's a nice ecosystem.
I'll give you examples.
00:13:05.659 --> 00:13:11.322
You look at a particular dataset,
in this case it's a lake in Africa.
00:13:11.322 --> 00:13:14.241
So you have the category of the page;
it can be a dataset,
00:13:14.241 --> 00:13:15.491
it can be other things.
00:13:15.491 --> 00:13:21.181
You can download the dataset itself
and you have kind of canonical properties
00:13:21.181 --> 00:13:23.737
that they have all agreed to have
for datasets,
00:13:23.737 --> 00:13:25.992
and then under Extra Information,
00:13:25.992 --> 00:13:29.369
those are properties
that the person describing this dataset
00:13:29.369 --> 00:13:31.007
added of their own accord.
00:13:31.007 --> 00:13:32.628
So these can be new properties.
00:13:32.628 --> 00:13:36.730
We call them "crowd properties,"
rather than "core properties."
00:13:37.291 --> 00:13:41.319
And then when you're describing
your dataset,
00:13:41.319 --> 00:13:43.774
in this case
it's an ice core that you got
00:13:43.774 --> 00:13:45.716
from a glacier,
00:13:45.765 --> 00:13:49.178
and you're adding a dataset
and want to talk about measurements,
00:13:49.178 --> 00:13:54.073
you're offered
all the existing properties
00:13:54.073 --> 00:13:55.278
that match what you're entering.
00:13:55.278 --> 00:13:58.409
So we do this search completion
so that you can adopt them.
00:13:58.409 --> 00:14:00.140
That promotes normalization.
00:14:00.140 --> 00:14:04.260
The core of the properties
has been agreed on by the community
00:14:04.260 --> 00:14:06.220
so we're really extending that core.
00:14:06.220 --> 00:14:08.795
And that core is very important
because it gives structure
00:14:08.795 --> 00:14:10.735
to all the extensions.
00:14:10.735 --> 00:14:14.382
We engage the community
in many different ways.
00:14:14.382 --> 00:14:17.260
We had one face-to-face meeting
at the beginning,
00:14:17.260 --> 00:14:21.611
and after about a year and a half,
we now have a new standard
00:14:21.611 --> 00:14:25.154
and a new way for them
to continue to evolve that standard.
00:14:25.154 --> 00:14:30.569
They have editors, very much
in the Wikipedia style
00:14:30.569 --> 00:14:31.582
of editorial boards.
00:14:31.582 --> 00:14:34.098
They have working groups
for different types of data.
00:14:34.098 --> 00:14:36.090
They do polls with the community,
00:14:36.090 --> 00:14:40.879
and they have pretty nice engagement
of the community at large,
00:14:40.879 --> 00:14:43.706
even those who have never visited our wiki.
00:14:43.706 --> 00:14:46.183
The metadata evolves
00:14:46.183 --> 00:14:48.775
so what we do is that people annotate
their datasets,
00:14:48.775 --> 00:14:52.321
then the schema evolves,
the properties evolve
00:14:52.321 --> 00:14:55.379
and we have an entire infrastructure
and mechanisms
00:14:55.379 --> 00:15:00.336
to re-annotate the datasets
with the new structure of the ontology
00:15:00.336 --> 00:15:01.711
and the new properties.
00:15:01.711 --> 00:15:05.210
This is described in the paper.
I won't go into the details.
00:15:05.210 --> 00:15:07.583
But I think that
having that kind of capability
00:15:07.583 --> 00:15:10.342
in Wikibase would be really interesting.
00:15:10.342 --> 00:15:14.041
We basically extended
Semantic MediaWiki and MediaWiki
00:15:14.041 --> 00:15:15.722
to create our own infrastructure.
00:15:15.722 --> 00:15:18.855
I think a lot of this is now something
that we find in Wikibase,
00:15:18.961 --> 00:15:20.615
but this is older than that.
00:15:20.615 --> 00:15:24.999
And in general, we have many projects
where we look at crowdsourcing
00:15:24.999 --> 00:15:29.885
not just descriptions of datasets
but also descriptions of hydrology models,
00:15:29.885 --> 00:15:33.563
descriptions of multi-step
data analytic workflows
00:15:33.563 --> 00:15:36.080
and many other things in the sciences.
00:15:36.080 --> 00:15:42.833
So we are also interested in including
in Wikidata additional things
00:15:42.833 --> 00:15:46.250
that are not just datasets or entities
00:15:46.250 --> 00:15:48.512
but also other things
that have to do with science.
00:15:48.512 --> 00:15:53.770
I think geosciences are more complex
in this sense than biology, for example.
00:15:54.923 --> 00:15:56.233
That's it.
00:15:56.513 --> 00:15:57.885
Thank you.
(audience clapping)
00:16:01.640 --> 00:16:03.772
- Do I have time for questions?
- Yes.
00:16:03.772 --> 00:16:06.871
(moderator) We have time
for just a couple of short questions.
00:16:07.751 --> 00:16:11.342
When answering,
can you go back to the microphone?
00:16:12.529 --> 00:16:14.520
- Yes.
- Hopefully, yeah.
00:16:21.314 --> 00:16:25.002
(audience 1) Does the structure allow
tabular datasets to be described
00:16:25.002 --> 00:16:26.988
and can you talk a bit about that?
00:16:27.225 --> 00:16:32.667
Yes. So the properties of the datasets
talk more about who collected them,
00:16:32.667 --> 00:16:36.759
what kind of data was collected,
what kind of sample it was,
00:16:36.759 --> 00:16:39.790
and then there's a separate standard
which is called "LiPD"
00:16:39.790 --> 00:16:43.065
that's complementary and mapped
to the properties
00:16:43.065 --> 00:16:46.994
that describes the format
of the actual files
00:16:47.075 --> 00:16:49.343
and the actual structure of the data.
00:16:49.343 --> 00:16:53.631
So, you're right that there's both,
"how do I find data about x"
00:16:53.631 --> 00:16:55.557
but also, "Now, how do I use it?
00:16:55.557 --> 00:17:00.211
How do I know where
the temperature that I'm looking for
00:17:00.211 --> 00:17:03.013
is actually in the file?"
00:17:03.656 --> 00:17:05.394
(moderator) This will be the last.
00:17:06.887 --> 00:17:09.034
(audience 2) I'll have
to make it relevant.
00:17:09.504 --> 00:17:15.667
So, you have shown this process
of how users can suggest
00:17:15.667 --> 00:17:18.985
or actually already put in
properties,
00:17:18.985 --> 00:17:22.705
and I didn't fully understand
how this thing works,
00:17:22.705 --> 00:17:24.027
or what's the process behind it.
00:17:24.027 --> 00:17:28.045
Is there some kind of
folksonomy approach--obviously--
00:17:28.045 --> 00:17:33.387
but how is it promoted
into the core vocabulary
00:17:33.387 --> 00:17:36.255
if something is promoted?
00:17:36.255 --> 00:17:37.882
Yes, yes. It is.
00:17:37.882 --> 00:17:42.202
So what we do is we have a core ontology
and the initial one was actually
00:17:42.202 --> 00:17:45.618
very thoughtfully put together
through a lot of discussion
00:17:45.618 --> 00:17:47.964
by very few people.
00:17:47.964 --> 00:17:51.052
And then the idea was
the whole community can extend that
00:17:51.052 --> 00:17:52.971
or propose changes to that.
00:17:52.971 --> 00:17:56.919
So, as they are describing datasets,
they can add new properties
00:17:56.919 --> 00:17:59.526
and those become "crowd properties."
00:17:59.526 --> 00:18:02.941
And every now and then,
the Editorial Committee
00:18:02.941 --> 00:18:04.367
looks at all of those properties,
00:18:04.367 --> 00:18:07.795
the working groups look at all of those
crowd properties,
00:18:07.795 --> 00:18:11.714
and decide whether to incorporate them
into the main ontology.
00:18:11.714 --> 00:18:15.804
So it could be because they're used
for a lot of dataset descriptions.
00:18:15.804 --> 00:18:18.920
It could be because
they are proposed by somebody
00:18:18.920 --> 00:18:23.339
and they're found to be really interesting
or key, or uncontroversial.
00:18:23.339 --> 00:18:30.267
So there's an entire editorial process
to incorporate those new crowd properties
00:18:30.267 --> 00:18:32.188
or the folksonomy part of it,
00:18:32.188 --> 00:18:36.308
but they are really built around the core
of the ontology.
00:18:36.404 --> 00:18:40.280
The core ontology then grows
with more crowd properties
00:18:40.280 --> 00:18:44.311
and then people propose
additional crowd properties again.
00:18:44.311 --> 00:18:46.979
So we've gone through a couple
of these iterations
00:18:46.979 --> 00:18:51.386
of rolling out a new core,
and then extending it,
00:18:51.386 --> 00:18:55.570
and then rolling out a new core
and then extending it.
00:18:55.570 --> 00:18:57.779
- (audience 2) Great. Thank you.
- Thanks.
00:18:57.779 --> 00:19:00.437
(moderator) Thank you.
(audience applauding)
00:19:02.295 --> 00:19:03.777
(moderator) Thank you, Yolanda.
00:19:03.777 --> 00:19:07.494
And now we have Adam Shorland
with "Something About Wikibase,"
00:19:07.599 --> 00:19:09.299
according to the title.
00:19:09.708 --> 00:19:12.956
Uh... where's the internet? There it is.
00:19:13.245 --> 00:19:18.925
So, I'm going to do a live demo,
which is probably a bad idea
00:19:18.925 --> 00:19:21.362
but I'm going to try and do it
as the birthday present later
00:19:21.362 --> 00:19:24.268
so I figure I might as well try it here.
00:19:24.292 --> 00:19:27.304
And I also have some notes on my phone
because I have no slides.
00:19:29.349 --> 00:19:32.248
So, two years ago,
I made these Wikibase Docker images
00:19:32.248 --> 00:19:34.052
that quite a few people have tried out,
00:19:34.052 --> 00:19:38.087
and even before then,
I was working on another project,
00:19:38.087 --> 00:19:42.363
which is kind of ready now,
and here it is.
00:19:43.690 --> 00:19:46.832
It's a website that allows you
to instantly create a Wikibase
00:19:46.900 --> 00:19:48.930
with a query service and QuickStatements,
00:19:48.930 --> 00:19:51.616
without needing to know about
any of the technical details,
00:19:51.616 --> 00:19:54.295
without needing to manage
any of them either.
00:19:54.295 --> 00:19:57.054
There are still lots of features to go
and there are still some bugs,
00:19:57.054 --> 00:19:59.348
but here goes the demo.
00:19:59.348 --> 00:20:02.628
Let me get my emails ready...
because I need them too...
00:20:03.315 --> 00:20:06.514
Da da da... Stopwatch.
00:20:07.272 --> 00:20:08.488
Okay.
00:20:08.829 --> 00:20:14.253
So it's as simple as...
at the moment it's locked down behind...
00:20:14.337 --> 00:20:16.495
Oh no! German keyboard!
00:20:16.495 --> 00:20:18.703
(audience laughing)
00:20:22.556 --> 00:20:23.923
Foiled... okay.
00:20:24.955 --> 00:20:26.214
Okay.
00:20:26.634 --> 00:20:28.417
(audience continues to laugh)
00:20:30.434 --> 00:20:31.989
Aha! Okay.
00:20:32.950 --> 00:20:35.335
I'll remember that for later.
(laughs)
00:20:36.911 --> 00:20:38.119
Yes.
00:20:39.438 --> 00:20:40.855
♪ (humming) ♪
00:20:40.961 --> 00:20:44.932
Oh my god... now it's American.
00:20:53.871 --> 00:20:56.131
All you have to do is create an account...
00:20:58.570 --> 00:21:00.007
da da da...
00:21:00.566 --> 00:21:02.432
Click this button up here...
00:21:02.478 --> 00:21:05.512
Come up with a name for the wiki--
"Demo1"
00:21:05.862 --> 00:21:07.299
"Demo1"
00:21:07.568 --> 00:21:09.135
"Demo user"
00:21:09.203 --> 00:21:11.864
Agree to the terms
which don't really exist yet.
00:21:12.298 --> 00:21:14.247
(audience laughing)
00:21:15.264 --> 00:21:17.698
Click on this thing which isn't a link.
00:21:21.519 --> 00:21:23.886
And then you have your Wikibase.
00:21:23.886 --> 00:21:26.602
(audience cheers and claps)
00:21:28.554 --> 00:21:30.421
Anmelden ("log in") in German.
00:21:30.421 --> 00:21:35.126
Demo... oh god! I'm learning lots about
my demo later.
00:21:35.569 --> 00:21:40.069
1-6-1-4-S-G...
00:21:40.166 --> 00:21:42.567
- (audience 3) Y...
- (Adam) It's random.
00:21:43.016 --> 00:21:44.567
(audience laughing)
00:21:46.237 --> 00:21:47.958
Oh, come on....
(audience laughing)
00:21:48.001 --> 00:21:50.543
Oh no. It's because this is a capital U...
00:21:51.333 --> 00:21:53.283
(audience chattering)
00:21:54.453 --> 00:21:56.545
6-1-4....
00:21:57.465 --> 00:22:01.248
S-G-ENJ...
00:22:01.623 --> 00:22:03.794
Is J... oh no. That's... oh yeah. Okay.
00:22:03.843 --> 00:22:06.242
I'm really... I'm gonna have to look
at the laptop
00:22:06.242 --> 00:22:07.836
that I'm doing this on later.
00:22:07.836 --> 00:22:09.129
Cool...
00:22:11.046 --> 00:22:13.709
Da da da da da...
00:22:14.687 --> 00:22:17.040
Maybe I should have some things
in my clipboard ready.
00:22:17.539 --> 00:22:19.093
Okay, so now I'm logged in.
00:22:22.631 --> 00:22:25.065
Oh... keyboards.
00:22:28.083 --> 00:22:30.012
So you can go and create an item...
00:22:36.194 --> 00:22:38.508
Yeah, maybe I should make a video.
It might be easier.
00:22:38.927 --> 00:22:42.207
So, yeah. You can make items,
you have QuickStatements here
00:22:42.207 --> 00:22:43.901
that have... oh... it's all in German.
00:22:43.901 --> 00:22:45.088
(audience laughing)
00:22:45.088 --> 00:22:46.297
(sighs)
00:22:46.926 --> 00:22:49.021
Oh, log in? Log in?
00:22:50.348 --> 00:22:52.088
It has... Oh, set up ready.
00:22:52.088 --> 00:22:53.482
Da da da...
00:22:55.965 --> 00:22:57.850
It's as easy as...
00:22:58.966 --> 00:23:01.350
I learned how to use
QuickStatements yesterday...
00:23:01.350 --> 00:23:03.245
that's what I know how to do.
00:23:04.657 --> 00:23:07.089
I can then go back to the Wiki...
00:23:08.008 --> 00:23:09.804
We can go and see in Recent Changes
00:23:09.804 --> 00:23:11.942
that there are now two items,
the one that I made
00:23:11.942 --> 00:23:13.759
and the one from QuickStatements...
00:23:13.759 --> 00:23:14.881
and then you go to Quick...
00:23:14.881 --> 00:23:16.511
♪ (hums a tune) ♪
00:23:17.637 --> 00:23:18.770
Stop...no...
00:23:18.927 --> 00:23:20.120
No...
00:23:20.454 --> 00:23:22.437
(audience laughing)
00:23:28.394 --> 00:23:30.006
Oh god...
00:23:30.061 --> 00:23:32.012
I'm glad I tried this out in advance.
00:23:33.464 --> 00:23:35.678
There you go.
And the query service is updated.
00:23:35.830 --> 00:23:37.763
(audience clapping)
00:23:42.357 --> 00:23:45.359
And the idea of this is it'll allow
people to try out Wikibases.
00:23:45.359 --> 00:23:48.493
Hopefully, it'll even be able
to allow people to...
00:23:49.110 --> 00:23:50.945
have their real Wikibases here.
00:23:50.945 --> 00:23:53.783
At the moment you can create
as many as you want
00:23:53.783 --> 00:23:55.653
and they all just appear
in this lovely list.
00:23:55.653 --> 00:23:59.182
As I said, there are lots of bugs,
but it's all super quick.
00:23:59.914 --> 00:24:03.392
Exactly how this is going to continue
in the future, we don't know yet
00:24:03.392 --> 00:24:05.757
because I only finished writing this
in the last few days.
00:24:05.757 --> 00:24:09.286
It's currently behind an invitation code,
so if you want to come try it out,
00:24:09.286 --> 00:24:10.888
come and talk to me.
00:24:11.645 --> 00:24:15.730
And if you have any other comments
or thoughts, let me know.
00:24:15.861 --> 00:24:19.711
Oh, three minutes 40. That's...
That's not that bad.
00:24:19.986 --> 00:24:21.022
Thanks.
00:24:21.022 --> 00:24:22.622
(audience clapping)
00:24:28.435 --> 00:24:30.006
Any questions?
00:24:31.020 --> 00:24:35.553
(audience 5) Are the Quick Statements
and the Query Service
00:24:35.553 --> 00:24:38.602
automatically updated?
00:24:39.553 --> 00:24:42.345
Yes. So the idea is that
there will be somebody,
00:24:42.345 --> 00:24:43.500
at the moment, me,
00:24:43.500 --> 00:24:45.144
maintaining all of the horrible stuff
00:24:45.144 --> 00:24:47.290
that you don't have to behind the scenes.
00:24:47.657 --> 00:24:50.157
So kind of think of it like GitHub.com,
00:24:50.157 --> 00:24:54.058
but you don't have to know anything
about Git to use it. It's just all there.
00:24:55.241 --> 00:24:56.886
- [inaudible]
- Yeah, we'll get that.
00:24:56.886 --> 00:25:00.247
But any of those
big hosted solution things.
00:25:00.833 --> 00:25:03.263
- (audience 6) A feature request.
- Yes.
00:25:03.263 --> 00:25:05.479
Is there any-- In scope,
00:25:05.479 --> 00:25:09.799
do you have plans on making it
so you can easily import existing...
00:25:09.799 --> 00:25:12.549
- Wikidata...
- I have loads of plans.
00:25:12.549 --> 00:25:14.909
Like I want there to be a button
where you can just import
00:25:14.909 --> 00:25:17.348
another whole Wikibase and all of--yeah.
00:25:17.436 --> 00:25:20.723
It's on the future list,
which is really long. Yeah.
00:25:24.454 --> 00:25:28.406
(audience 7) I understand that it's...
you want to make it user-friendly
00:25:28.406 --> 00:25:32.242
but if I want to access
the machine itself, can I do that?
00:25:32.242 --> 00:25:34.673
Nope.
(audience laughing)
00:25:37.006 --> 00:25:40.863
So again, like, in the longer term future,
there are possib...
00:25:40.863 --> 00:25:43.810
Everything's possible,
but at the moment, no.
00:25:45.156 --> 00:25:49.743
(audience 8) Two questions.
Is there a plan to have export tools
00:25:49.743 --> 00:25:52.791
so that you can export it
to your own Wikibase maybe at some point?
00:25:52.791 --> 00:25:53.824
- Yes.
- Great.
00:25:53.824 --> 00:25:55.565
And is this a business?
00:25:56.003 --> 00:25:58.164
I have no idea.
(audience laughing)
00:26:00.015 --> 00:26:01.545
Not currently.
00:26:05.754 --> 00:26:08.451
(audience 9) What if I stop
using it tomorrow,
00:26:08.451 --> 00:26:11.096
how long will the data be there?
00:26:11.181 --> 00:26:14.632
So my plan was at the end of WikidataCon
I was going to delete all of the data
00:26:14.632 --> 00:26:18.060
and there's a Wikibase Workshop
on Sunday,
00:26:18.060 --> 00:26:21.671
and we will maybe be using this
for the Wikibase workshop
00:26:21.671 --> 00:26:23.801
so that everyone can have
their own Wikibase.
00:26:23.801 --> 00:26:27.366
And then, from that point,
I probably won't be deleting the data
00:26:27.366 --> 00:26:29.008
so it will all just stay there.
00:26:31.763 --> 00:26:32.923
(moderator) Question.
00:26:34.524 --> 00:26:36.114
(audience 10) It's two minutes...
00:26:36.175 --> 00:26:39.505
Alright, fine. I'll allow two more
questions if you talk quickly.
00:26:39.505 --> 00:26:41.550
(audience laughing)
00:26:47.370 --> 00:26:49.999
- Alright, good people.
- Thank you, Adam.
00:26:49.999 --> 00:26:52.418
Thank you for letting me test
my demo... I mean...
00:26:52.418 --> 00:26:54.640
I'm going to do it differently.
(audience clapping)
00:26:59.512 --> 00:27:00.753
(moderator) Thank you.
00:27:00.753 --> 00:27:03.869
Now we have Dennis Diefenbach
presenting QAnswer.
00:27:04.489 --> 00:27:08.129
Hello, I'm Dennis Diefenbach,
I would like to present QAnswer
00:27:08.129 --> 00:27:11.392
which is a question-answering system
on top of Wikidata.
00:27:11.392 --> 00:27:16.203
So, what we need are some questions
and this is the interface of QAnswer.
00:27:16.203 --> 00:27:23.460
For example, where is WikidataCon?
00:27:23.901 --> 00:27:25.975
Alright, I think it's written like this.
00:27:27.432 --> 00:27:32.432
2019... And we get this response
which is Berlin.
00:27:32.458 --> 00:27:38.425
So, other questions. For example,
"When did Wikidata start?"
00:27:38.430 --> 00:27:42.383
It started on 30 October 2012,
so its birthday is approaching.
00:27:44.079 --> 00:27:48.014
It is 6 years old,
so it will be its 7th birthday.
00:27:49.133 --> 00:27:51.583
Who is developing Wikidata?
00:27:51.583 --> 00:27:54.371
The Wikimedia Foundation
and Wikimedia Deutschland,
00:27:54.371 --> 00:27:55.988
so thank you very much to them.
00:27:57.013 --> 00:28:02.947
Something like museums in Berlin...
I don't know why this is not so...
00:28:05.494 --> 00:28:07.737
Only one museum... no, yeah, a few more.
00:28:09.167 --> 00:28:10.995
So, when you ask something like this,
00:28:10.995 --> 00:28:14.178
we allow the user
to explore the information
00:28:14.178 --> 00:28:16.308
with different aggregations.
00:28:16.308 --> 00:28:18.953
For example,
if there are many geo coordinates
00:28:18.953 --> 00:28:21.476
attached to the entities,
we will display a map.
00:28:21.476 --> 00:28:26.357
If there are many images attached to them,
we will display the images,
00:28:26.357 --> 00:28:29.057
and otherwise there is a list
where you can explore
00:28:29.057 --> 00:28:30.855
the different entities.
00:28:33.236 --> 00:28:35.605
You can ask something like
"Who is the mayor of Berlin,"
00:28:36.643 --> 00:28:40.201
"Give me politicians born in Berlin,"
and things like this.
00:28:40.201 --> 00:28:44.428
So you can ask both keyword questions
and full natural language questions.
00:28:45.171 --> 00:28:48.604
All of the data comes from Wikidata,
00:28:48.604 --> 00:28:55.346
so all entities which are in Wikidata
are queryable by this service.
00:28:55.869 --> 00:28:59.244
And the data is really all from Wikidata
00:28:59.244 --> 00:29:01.207
in the sense that
there are some Wikipedia snippets,
00:29:01.207 --> 00:29:04.851
there are images from Wikimedia Commons,
00:29:04.851 --> 00:29:07.644
but the rest is all Wikidata data.
00:29:08.760 --> 00:29:11.678
We can do this in several languages.
This is now in Chinese.
00:29:11.678 --> 00:29:15.441
I don't know what is written there
so do not ask me.
00:29:15.441 --> 00:29:19.893
We are currently supporting these languages
with more or less good quality
00:29:19.893 --> 00:29:22.094
because... yeah.
00:29:23.332 --> 00:29:27.563
So, how can this be useful
for the Wikidata community?
00:29:27.968 --> 00:29:30.052
I think there are different reasons.
00:29:30.052 --> 00:29:33.786
First of all, this thing helps you
to generate SPARQL queries
00:29:33.786 --> 00:29:37.043
and I know there are even some workshops
about how to use SPARQL.
00:29:37.043 --> 00:29:39.444
It's not a language that everyone speaks.
00:29:39.444 --> 00:29:45.147
So, if you ask something like
"a philosopher born before 1908,"
00:29:45.147 --> 00:29:48.697
figuring out how to construct
a SPARQL query like this could be tricky.
00:29:50.001 --> 00:29:54.257
In fact when you ask a question,
we generate many SPARQL queries
00:29:54.301 --> 00:29:57.486
and the first one is always
the SPARQL query that we think
00:29:57.486 --> 00:29:59.008
is the right one.
00:29:59.017 --> 00:30:02.651
So, if you ask your question
and then you go to the SPARQL list,
00:30:02.691 --> 00:30:06.468
then there is this button
for the Wikidata Query Service
00:30:06.468 --> 00:30:11.811
and you have the SPARQL query right there
and you will get the same result
00:30:11.811 --> 00:30:15.184
as you would get in the interface.
00:30:16.906 --> 00:30:19.289
Another thing it could be useful for
00:30:19.289 --> 00:30:23.468
is finding missing
contextual information.
00:30:23.468 --> 00:30:27.057
For example, if you ask for actors
in "The Lord of the Rings,"
00:30:27.057 --> 00:30:30.776
most of these entities
will have an associated image,
00:30:30.776 --> 00:30:32.490
but not all of them.
00:30:32.490 --> 00:30:37.861
So here there is some missing metadata
that could be added.
00:30:37.861 --> 00:30:40.376
You could go to this entity and add an image,
00:30:40.376 --> 00:30:45.462
because you see first
that there is an image missing, and so on.
00:30:46.457 --> 00:30:52.047
Another thing is that you could find
schema issues.
00:30:52.047 --> 00:30:55.424
For example, if you ask
"books by Andrea Camilleri,"
00:30:55.428 --> 00:30:57.711
who is a famous Italian writer,
00:30:57.711 --> 00:30:59.981
you would currently get
these three books.
00:30:59.981 --> 00:31:02.681
But he wrote many more.
He wrote more than 50.
00:31:02.681 --> 00:31:05.701
And so the question is,
are they not in Wikidata
00:31:05.701 --> 00:31:09.704
or is the knowledge maybe
not modeled correctly?
00:31:09.704 --> 00:31:12.804
And in this case, I know
there is another book from him,
00:31:12.804 --> 00:31:14.737
which is "Un mese con Montalbano."
00:31:14.737 --> 00:31:18.207
It has only an Italian label
so you can only search it in Italian.
00:31:18.207 --> 00:31:22.103
And if you go to this entity,
you will see that he has written it.
00:31:22.103 --> 00:31:27.504
It's a short story by Andrea Camilleri
and it's an instance of literary work,
00:31:27.504 --> 00:31:29.220
but it's not an instance of book,
00:31:29.220 --> 00:31:31.338
so that's the reason why
it doesn't appear.
00:31:31.338 --> 00:31:35.904
This is a way to track
where things are set up
00:31:35.904 --> 00:31:37.499
in the Wikidata model
00:31:37.499 --> 00:31:39.539
differently than you would expect.
00:31:40.794 --> 00:31:42.968
Another reason is just to have fun.
00:31:43.588 --> 00:31:47.546
I imagine that many of you added
many Wikidata entities
00:31:47.546 --> 00:31:50.776
so just search for the ones
that you care about most
00:31:50.776 --> 00:31:52.529
or that you have edited yourself.
00:31:52.529 --> 00:31:56.893
So in this case, who developed
QAnswer, and that's it.
00:31:56.893 --> 00:32:00.226
For any other questions,
go to www.QAnswer.eu/qa
00:32:00.226 --> 00:32:03.575
and hopefully we'll find
an answer for you.
00:32:03.782 --> 00:32:05.649
(audience clapping)
00:32:13.994 --> 00:32:17.040
- Sorry.
- I'm just the dumbest person here.
00:32:17.530 --> 00:32:22.722
(audience 11) So I want to know
how agnostic this is
00:32:22.752 --> 00:32:25.104
to the Wikibase instance,
00:32:25.104 --> 00:32:29.020
or whether it has been tied to the exact,
like, property numbers
00:32:29.020 --> 00:32:31.054
and things in Wikidata?
00:32:31.054 --> 00:32:33.442
Has it learned in some way
or how was it set up?
00:32:33.442 --> 00:32:36.456
There is training data
and we rely on training data
00:32:36.456 --> 00:32:40.585
and in most cases, this is also
why you will not get good results.
00:32:40.585 --> 00:32:44.881
But we're training the system
with simple yes and no answers.
00:32:44.881 --> 00:32:48.936
When you ask a question,
we always ask for feedback, yes or no,
00:32:48.936 --> 00:32:51.899
and this feedback is used by
the machine learning algorithm.
00:32:51.899 --> 00:32:54.124
This is where machine learning
comes into play.
00:32:54.124 --> 00:32:58.600
But basically, we put up separate
Wikibase instances
00:32:58.600 --> 00:33:00.482
and we can plug this in.
00:33:00.482 --> 00:33:04.249
In fact, the system is agnostic
in the sense that it only wants RDF.
00:33:04.249 --> 00:33:06.618
And every Wikibase has RDF.
00:33:06.618 --> 00:33:08.059
There are a few configuration steps,
00:33:08.059 --> 00:33:10.432
but you can have this on top
of any Wikibase.
00:33:11.654 --> 00:33:13.039
(audience 11) Awesome.
00:33:23.573 --> 00:33:27.004
(audience 12) You mentioned that
it's being trained by yes/no answers.
00:33:27.073 --> 00:33:32.662
So I guess this is assuming that
the Wikidata instance is free of errors
00:33:32.722 --> 00:33:34.356
or is it also...?
00:33:34.356 --> 00:33:37.140
You assume that the Wikidata instances...
00:33:37.140 --> 00:33:40.731
(audience 12) I guess I'm asking, like,
are you distinguishing
00:33:40.731 --> 00:33:46.289
between source-level errors
or misunderstanding the question
00:33:46.289 --> 00:33:50.856
versus a bad mapping, etc.?
00:33:51.706 --> 00:33:55.474
Generally, we assume that the data
in Wikidata is true.
00:33:55.474 --> 00:33:59.172
So if you click "no"
and the data in Wikidata is actually false,
00:33:59.172 --> 00:34:03.023
then yeah... we would not catch
this difference.
00:34:03.023 --> 00:34:05.081
But honestly, Wikidata quality
is very good,
00:34:05.081 --> 00:34:08.231
so I have rarely had this problem.
00:34:16.592 --> 00:34:22.068
(audience 12) Is this data available
as a dataset by any chance, sir?
00:34:22.209 --> 00:34:27.218
- What is... direct service?
- The... dataset of...
00:34:27.218 --> 00:34:30.803
"is this answer correct
versus the query versus the answer?"
00:34:30.872 --> 00:34:33.340
Is that something you're publishing
as part of this?
00:34:33.340 --> 00:34:38.040
- The training data that you've...
- We published the training data.
00:34:38.040 --> 00:34:43.423
We published some old training data
but no, just a--
00:34:44.573 --> 00:34:47.313
There is a question there.
I don't know if we still have time.
00:34:51.215 --> 00:34:55.104
(audience 13) Maybe I just missed this
but is it running on a live,
00:34:55.104 --> 00:34:57.080
like the Live Query Service,
00:34:57.080 --> 00:34:59.393
or is it running on
some static dump you loaded
00:34:59.393 --> 00:35:01.690
or where is the data source
for Wikidata?
00:35:01.784 --> 00:35:07.014
Yes. The problem is
to apply this technology,
00:35:07.014 --> 00:35:08.414
you need a local dump.
00:35:08.414 --> 00:35:10.673
Because we do not rely only
on the SPARQL endpoint,
00:35:10.673 --> 00:35:12.873
we rely on special indexes.
00:35:12.873 --> 00:35:16.192
So, we are currently loading
the Wikidata dump.
00:35:16.192 --> 00:35:18.699
We are updating this every two weeks.
00:35:18.699 --> 00:35:20.756
We would like to do it more often,
00:35:20.756 --> 00:35:23.823
in fact we would like to get the diffs
for each day, for example,
00:35:23.823 --> 00:35:25.271
to put them in our index.
00:35:25.271 --> 00:35:28.719
But unfortunately, right now,
the Wikidata dumps are released
00:35:28.719 --> 00:35:31.753
only once every week.
00:35:31.753 --> 00:35:35.150
So, we cannot be faster than that
and we also need some time
00:35:35.150 --> 00:35:39.073
to re-index the data,
so it takes one or two days.
00:35:39.073 --> 00:35:41.833
So we are always behind. Yeah.
00:35:48.202 --> 00:35:49.780
(moderator) Any more?
00:35:50.430 --> 00:35:53.268
- Okay, thank you very much.
- Thank you all very much.
00:35:53.547 --> 00:35:54.966
(audience clapping)
00:35:57.266 --> 00:36:00.165
(moderator) And now last, we have
Eugene Alvin Villar,
00:36:00.165 --> 00:36:02.049
talking about Panandâ.
00:36:10.630 --> 00:36:12.637
Good afternoon,
my name is Eugene Alvin Villar
00:36:12.637 --> 00:36:15.297
and I'm from the Philippines,
and I'll be talking about Panandâ:
00:36:15.297 --> 00:36:18.185
a mobile app powered by Wikidata.
00:36:18.862 --> 00:36:21.678
This is a follow-up to my lightning talk
that I presented two years ago
00:36:21.678 --> 00:36:25.004
at WikidataCon 2017
together with Carlo Moskito.
00:36:25.004 --> 00:36:26.557
You can download the slides
00:36:26.557 --> 00:36:28.727
and there's a link
to that presentation there.
00:36:28.727 --> 00:36:30.868
I'll give you a bit of a background.
00:36:30.868 --> 00:36:33.471
Wiki Society of the Philippines,
formerly Wikimedia Philippines,
00:36:33.471 --> 00:36:37.477
had a series of projects related
to Philippine heritage and history.
00:36:37.477 --> 00:36:41.705
So we have the usual photo contests,
Wikipedia Takes Manila,
00:36:41.705 --> 00:36:43.238
Wiki Loves Monuments,
00:36:43.238 --> 00:36:46.657
and then our major project
was the Cultural Heritage Mapping Project
00:36:46.657 --> 00:36:49.094
back in 2014-2015.
00:36:50.044 --> 00:36:53.039
In that project, we trained volunteers
to edit articles
00:36:53.039 --> 00:36:54.389
related to cultural heritage.
00:36:54.914 --> 00:36:59.032
This was our biggest
and most successful project.
00:36:59.032 --> 00:37:03.037
794 articles were created or improved,
including 37 "Did You Knows"
00:37:03.037 --> 00:37:05.238
and 4 "Good Articles,"
00:37:05.308 --> 00:37:08.688
and more than 5,000 images were uploaded
to Commons.
00:37:08.688 --> 00:37:11.039
As a result of that, we then launched
00:37:11.039 --> 00:37:13.689
the Encyclopedia
of Philippine Heritage program
00:37:13.689 --> 00:37:18.444
in order to expand the scope
and also include Wikidata in the scope.
00:37:18.444 --> 00:37:21.695
Here's the Core Team: myself,
Carlo and Roel.
00:37:21.695 --> 00:37:26.870
Our first pilot project was to document
the country's historical markers
00:37:26.870 --> 00:37:29.153
in Wikidata and Commons,
00:37:29.153 --> 00:37:34.053
starting with those created by
our national historical agency, the NHCP.
00:37:34.053 --> 00:37:38.904
For example, they installed a marker
for our national hero, here in Berlin,
00:37:38.904 --> 00:37:41.421
so there's now a Wikidata page
for that marker
00:37:41.421 --> 00:37:45.102
and a collection of photos of that marker
in Commons.
00:37:46.166 --> 00:37:50.397
Unfortunately, the government agency
does not keep a good database
00:37:50.397 --> 00:37:53.480
of their markers, up to date or complete,
00:37:53.480 --> 00:37:58.004
so we have to painstakingly input these
into Wikidata manually.
00:37:58.004 --> 00:38:02.772
After careful research and confirmation,
here's a graph of the number of markers
00:38:02.772 --> 00:38:07.466
that we've added to Wikidata over time,
over the past three years.
00:38:07.466 --> 00:38:11.230
And we've developed
this Historical Markers Map web app
00:38:11.230 --> 00:38:15.289
that lets users view
these markers on a map,
00:38:15.289 --> 00:38:21.051
or browse them as a list,
and view a good visualization of the markers
00:38:21.051 --> 00:38:23.253
with information and inscriptions.
00:38:23.253 --> 00:38:28.885
All of this is powered by live queries
to the Wikidata Query Service.
00:38:29.732 --> 00:38:32.005
There's the link
if you want to play around with it.
00:38:33.349 --> 00:38:37.428
And so we developed
a mobile app for this one.
00:38:37.428 --> 00:38:42.117
To better publicize our project,
I developed Panandâ,
00:38:42.117 --> 00:38:45.434
which is Tagalog for "marker,"
as an Android app,
00:38:45.434 --> 00:38:48.393
which was published back in 2018,
00:38:48.393 --> 00:38:53.934
and I'll publish the iOS version
sometime in the future, hopefully.
00:38:54.868 --> 00:38:57.892
I'd like to demo the app
but we have no time,
00:38:57.892 --> 00:39:00.935
so here are some
of the features of the app.
00:39:00.935 --> 00:39:04.586
There's a Map and a List view,
with text search,
00:39:04.586 --> 00:39:07.452
so you can drill down as needed.
00:39:07.452 --> 00:39:10.169
You can filter by region or by distance,
00:39:10.169 --> 00:39:12.193
and whether you have marked
these markers,
00:39:12.193 --> 00:39:15.499
as either you have visited them
or you'd like to bookmark them
00:39:15.499 --> 00:39:16.949
for future visits.
00:39:16.949 --> 00:39:19.482
Then you can use your GPS
on your mobile phone
00:39:19.482 --> 00:39:21.860
for distance filtering.
00:39:21.860 --> 00:39:26.765
For example, if I want markers
that are near me, you can do that.
00:39:26.765 --> 00:39:30.918
And when you click on the Details page,
you can see the same thing,
00:39:30.918 --> 00:39:35.850
photos from Commons,
the inscription on the marker,
00:39:35.850 --> 00:39:40.484
how to find the marker,
its location and address, etc.
00:39:41.601 --> 00:39:45.993
And one thing that's unique for this app
is you can, again, mark these as visited
00:39:46.011 --> 00:39:50.407
or bookmark them,
so on the map or on the list,
00:39:50.407 --> 00:39:51.692
or on the Details page,
00:39:51.692 --> 00:39:54.891
you can just tap on those buttons
and say that you've visited them,
00:39:54.891 --> 00:39:58.520
or you'd like to bookmark them
for future visits.
00:39:58.520 --> 00:40:03.527
And my app has been covered by the press
and given recognition,
00:40:03.527 --> 00:40:06.743
so plenty of local press articles.
00:40:06.743 --> 00:40:11.281
Recently, it was selected
as one of the Top 5 finalists
00:40:11.281 --> 00:40:15.247
for the Android Masters competition
in the App for Social Good category.
00:40:15.247 --> 00:40:17.351
The final event will be next month.
00:40:17.351 --> 00:40:18.999
Hopefully, we'll win.
00:40:20.380 --> 00:40:22.378
Okay, so some behind the scenes.
00:40:22.378 --> 00:40:25.477
How did I develop this app?
00:40:25.477 --> 00:40:28.578
Panandâ is actually a hybrid app,
it's not native.
00:40:28.578 --> 00:40:30.745
Basically it's just a web app
packaged as a mobile app
00:40:30.745 --> 00:40:32.518
using Apache Cordova.
00:40:32.518 --> 00:40:34.026
That reduces development time
00:40:34.026 --> 00:40:36.181
because I don't have to learn
a different language.
00:40:36.181 --> 00:40:37.769
I know JavaScript, HTML.
00:40:37.879 --> 00:40:42.131
It's cross-platform, allows code reuse
from the Historical Markers Map.
00:40:42.385 --> 00:40:46.311
And the app is also free and open source,
under the MIT license.
00:40:46.311 --> 00:40:49.429
So there's the GitHub repository
over there.
00:40:50.469 --> 00:40:53.624
The challenge is that
the app's data is not live.
00:40:54.750 --> 00:40:56.820
Because if you query the data live,
00:40:56.843 --> 00:41:00.638
it means you're pulling around half
a megabyte of compressed JSON every time
00:41:00.638 --> 00:41:03.594
which is not friendly
for those on mobile data,
00:41:03.594 --> 00:41:06.723
incurs too much delay when starting
the app,
00:41:06.723 --> 00:41:13.097
and if there are any errors in Wikidata,
that may result in poor user experience.
00:41:14.253 --> 00:41:18.046
So instead, what I did was
the app is updated every few months
00:41:18.046 --> 00:41:20.468
with fresh data, compiled using
a Perl script
00:41:20.468 --> 00:41:23.037
that queries Wikidata Query Service,
00:41:23.037 --> 00:41:25.678
and this script also does
some data validation
00:41:25.678 --> 00:41:30.944
to highlight consistency or schema errors,
which allows fixes before updates
00:41:30.944 --> 00:41:34.735
in order to provide a good experience
for the mobile user.
00:41:35.174 --> 00:41:39.274
And here's the... if you're tech-oriented,
here's the more or less,
00:41:39.274 --> 00:41:41.644
the technologies that I'm using.
00:41:41.644 --> 00:41:43.976
So a bunch of JavaScript libraries.
00:41:43.976 --> 00:41:46.287
Here's the Perl script
that queries Wikidata,
00:41:46.287 --> 00:41:48.598
some Cordova plug-ins,
00:41:48.598 --> 00:41:53.035
and building it using Cordova
and then publishing this app.
00:41:53.763 --> 00:41:55.586
And that's it.
00:41:55.748 --> 00:41:58.164
(audience clapping)
00:42:01.800 --> 00:42:04.072
(moderator) I hope you win.
Alright, questions.
00:42:16.286 --> 00:42:17.990
(audience 14) Sorry if I missed this.
00:42:17.990 --> 00:42:21.317
Are you opening your code
so that people can adapt your app
00:42:21.317 --> 00:42:24.501
and do it for other cities?
00:42:24.501 --> 00:42:28.516
Yes, as I've mentioned,
the app is free and open source,
00:42:28.516 --> 00:42:31.095
- (audience 14) But where is it?
- There's the GitHub repository.
00:42:31.095 --> 00:42:33.610
You can download the slides,
and there's a link
00:42:33.610 --> 00:42:36.841
in one of the previous slides
to the repository.
00:42:36.841 --> 00:42:38.732
(audience 14) Okay. Can you put it?
00:42:42.392 --> 00:42:43.747
Yeah, at the bottom.
00:42:46.577 --> 00:42:49.222
(audience 15) Hi. Sorry, maybe
I also missed this,
00:42:49.222 --> 00:42:51.628
but how do you check for schema errors?
00:42:53.055 --> 00:42:56.007
Basically, we have a Wikiproject
on Wikidata,
00:42:56.106 --> 00:43:02.425
so we try to put up guidelines
on how to model these markers correctly,
00:43:02.425 --> 00:43:05.190
although they're not up to date right now.
00:43:06.197 --> 00:43:09.023
As far as I know, we're the only country
00:43:09.023 --> 00:43:12.874
that's currently modeling these
in Wikidata.
00:43:13.930 --> 00:43:20.152
There's also an effort
to add [inaudible]
00:43:20.161 --> 00:43:22.411
in Wikidata,
00:43:22.474 --> 00:43:25.705
but I think that's
a different thing altogether.
00:43:34.056 --> 00:43:35.895
(audience 16) So I guess this may be part
00:43:35.895 --> 00:43:37.725
of this Wikiproject you just described,
00:43:37.725 --> 00:43:42.800
but for the consistency checks,
have you considered moving those
00:43:42.800 --> 00:43:46.743
into like complex schema constraints
that then can be flagged
00:43:46.743 --> 00:43:50.583
on the Wikidata side for
what there is to fix on there?
00:43:52.930 --> 00:43:55.547
I'm actually interested in seeing
if I can do, for example,
00:43:55.598 --> 00:44:00.296
shape expressions, so that, yeah,
we can do those things.
00:44:04.256 --> 00:44:06.776
(moderator) At this point,
we have quite a few minutes left.
00:44:06.776 --> 00:44:09.026
The speakers did very well,
so if Erica is okay with it,
00:44:09.026 --> 00:44:11.238
I'm also going to allow
some time for questions,
00:44:11.238 --> 00:44:13.407
still about this presentation,
but also about Mbabel,
00:44:13.407 --> 00:44:15.498
if anyone wants to jump in
with something there,
00:44:15.498 --> 00:44:17.318
either presentation is fair game.
00:44:22.790 --> 00:44:25.639
Unless like me, you're all so dazzled
that you just want to go to snacks
00:44:25.639 --> 00:44:27.955
and think about it.
(audience giggles)
00:44:29.308 --> 00:44:31.179
- (moderator) You know...
- Yeah.
00:44:31.953 --> 00:44:34.491
(audience 17) I will always have
questions about everything.
00:44:34.491 --> 00:44:37.642
So, I came in late for the Mbabel tool.
00:44:37.642 --> 00:44:40.350
But I was looking through
and I saw there's a number of templates,
00:44:40.350 --> 00:44:43.232
and I was wondering
if there's a place to contribute
00:44:43.232 --> 00:44:45.564
to adding more templates
for different types
00:44:45.564 --> 00:44:47.620
or different languages and the like?
00:44:50.497 --> 00:44:53.683
(Erica) So for now, we're developing
those narrative templates
00:44:53.683 --> 00:44:55.566
on Portuguese Wikipedia.
00:44:55.566 --> 00:44:57.856
I can show you if you like.
00:44:57.856 --> 00:45:02.051
We're inserting those templates
on English Wikipedia too.
00:45:02.051 --> 00:45:07.017
It's not complicated to do
but we have to expand to other languages.
00:45:07.017 --> 00:45:08.236
- French?
- French.
00:45:08.236 --> 00:45:10.465
- Yes.
- French and German already have.
00:45:10.465 --> 00:45:11.465
(laughing)
00:45:12.002 --> 00:45:13.018
Yeah.
00:45:15.755 --> 00:45:18.287
(inaudible chatter)
00:45:21.756 --> 00:45:24.446
(audience 18) I also have a question
about Mbabel,
00:45:24.446 --> 00:45:27.676
which is, is this really just templates?
00:45:27.676 --> 00:45:33.893
Is this based on the Lua scripting?
Is that all? Wow. Okay.
00:45:33.956 --> 00:45:37.404
Yeah, so it's very deployable. Okay. Cool.
00:45:38.102 --> 00:45:40.199
(moderator) Just to catch that
for the live stream,
00:45:40.199 --> 00:45:42.745
the answer was an emphatic nod
of the head, and a yes.
00:45:42.915 --> 00:45:44.648
(audience laughing)
00:45:44.754 --> 00:45:47.203
- (Erica) Super simple.
- (moderator) Super simple.
00:45:47.745 --> 00:45:49.819
(audience 19) Yeah.
I would also like to ask.
00:45:49.819 --> 00:45:53.386
Sorry I haven't delved
into Mbabel earlier.
00:45:53.386 --> 00:45:57.018
I'm wondering, you're working also
with the links, the red links.
00:45:57.018 --> 00:46:00.052
Are you adding some code there?
00:46:03.987 --> 00:46:07.970
- (Erica) For the lists?
- Wherever the link comes from...
00:46:07.970 --> 00:46:11.595
(audience 19) The architecture.
Maybe I will have to look into it.
00:46:11.595 --> 00:46:13.355
(Erica) I'll show you later.
00:46:20.506 --> 00:46:23.221
(moderator) Alright. You're all ready
for snack break, I can tell.
00:46:23.221 --> 00:46:24.456
So let's wrap it up.
00:46:24.456 --> 00:46:26.429
But our kind speakers,
I'm sure will stick around
00:46:26.429 --> 00:46:27.958
if you have questions for them.
00:46:27.958 --> 00:46:31.179
Please join me in giving... first of all
we didn't give a round of applause yet.
00:46:31.179 --> 00:46:33.221
I can tell you're interested in doing so.
00:46:33.221 --> 00:46:34.886
(audience clapping)