1
00:00:05,973 --> 00:00:07,908
Hi, guys! Can everybody hear me?
2
00:00:09,170 --> 00:00:11,898
So, hi! Nice to meet you all.
I'm Erica Azzellini.
3
00:00:11,898 --> 00:00:14,606
I'm one of the Wiki Movement
Brazil's liaisons,
4
00:00:14,606 --> 00:00:17,829
and this is my first international
Wikimedia event,
5
00:00:17,829 --> 00:00:21,023
so I'm super excited to be here
and hopefully I
6
00:00:21,023 --> 00:00:24,311
will share something interesting for you
all here in this lightning talk.
7
00:00:25,247 --> 00:00:30,441
So this work starts with research
that I was developing in Brazil,
8
00:00:30,441 --> 00:00:34,219
Computational Journalism
and Structured Narratives with Wikidata.
9
00:00:34,276 --> 00:00:35,958
So in journalism,
10
00:00:35,958 --> 00:00:39,616
they're using some natural language
generation software
11
00:00:39,616 --> 00:00:41,418
for automating news
12
00:00:41,418 --> 00:00:46,535
for news that have
quite similar narrative structures.
13
00:00:46,535 --> 00:00:51,600
And we developed this concept here
of structured narratives,
14
00:00:51,600 --> 00:00:54,548
thinking about this practice
on computational journalism,
15
00:00:54,548 --> 00:00:58,361
that is the development of verbal text,
understandable by humans,
16
00:00:58,361 --> 00:01:01,274
automated from predetermined
arrangements that process information
17
00:01:01,274 --> 00:01:05,395
from structured databases,
which looks like that,
18
00:01:05,395 --> 00:01:10,043
the Wikimedia universe
and on this tool that we developed.
19
00:01:10,043 --> 00:01:13,555
So, when I'm talking about verbal text
understandable by humans,
20
00:01:13,555 --> 00:01:15,808
I'm talking about Wikipedia entries.
21
00:01:15,808 --> 00:01:17,778
When I'm talking about
structured databases,
22
00:01:17,778 --> 00:01:20,017
of course, I'm talking about
Wikidata here.
23
00:01:20,017 --> 00:01:22,777
And predetermined arrangement,
I'm talking about Mbabel,
24
00:01:22,777 --> 00:01:24,271
that is this tool.
25
00:01:25,467 --> 00:01:31,216
The Mbabel tool was inspired by a template
by user Pharos, right here in front of me,
26
00:01:31,279 --> 00:01:33,356
thank you very much,
27
00:01:33,356 --> 00:01:39,114
and it was developed with Ederporto
that is right here too,
28
00:01:39,114 --> 00:01:40,974
the brilliant Ederporto.
29
00:01:42,599 --> 00:01:44,498
We developed this tool
30
00:01:44,498 --> 00:01:47,780
that automatically generates
Wikipedia entries
31
00:01:47,780 --> 00:01:50,600
based on information from Wikidata.
32
00:01:53,189 --> 00:01:58,130
We actually make some thematic templates
33
00:01:58,130 --> 00:02:01,152
that are created on the Wikidata module,
34
00:02:01,573 --> 00:02:03,716
the WikidataIB module,
35
00:02:03,716 --> 00:02:07,835
and these templates are pre-determined,
generic and editable templates
36
00:02:07,835 --> 00:02:09,677
for various article themes.
37
00:02:09,677 --> 00:02:15,411
We realized that many Wikipedia entries
had a quite similar structured narrative
38
00:02:15,411 --> 00:02:18,922
so we could create a tool
that automatically generates that
39
00:02:18,922 --> 00:02:21,598
for many Wikidata items.
40
00:02:24,207 --> 00:02:28,571
Until now we have templates for museums,
works of art, books, films,
41
00:02:28,571 --> 00:02:31,265
journals, earthquakes, libraries,
archives,
42
00:02:31,265 --> 00:02:34,855
and Brazilian municipal
and state elections, and growing.
43
00:02:34,855 --> 00:02:38,984
So, everybody here is able to contribute
and create new templates.
44
00:02:38,984 --> 00:02:43,508
Each narrative template includes
an introduction, a Wikidata infobox,
45
00:02:43,508 --> 00:02:46,158
section suggestions for the users,
46
00:02:46,158 --> 00:02:50,499
content tables or lists with Listeria,
depending on the case,
47
00:02:50,499 --> 00:02:53,713
references and categories,
and of course the sentences,
48
00:02:53,713 --> 00:02:55,776
that are created
with the Wikidata information.
49
00:02:55,776 --> 00:02:58,642
I'm gonna show you in a sec
an example of that.
50
00:03:00,137 --> 00:03:05,749
It's an integration with Wikipedia,
integration with Wikidata,
51
00:03:05,749 --> 00:03:08,760
so the more properties properly filled
on Wikidata,
52
00:03:08,760 --> 00:03:12,311
the more text entries you'll get
on your article stub.
53
00:03:12,857 --> 00:03:15,623
That's very important to highlight here.
54
00:03:16,343 --> 00:03:18,969
Structuring this data on Wikidata
can get more complex
55
00:03:18,969 --> 00:03:22,017
as I'm going to show you
on the election projects that we've made.
56
00:03:22,017 --> 00:03:26,552
So I'm going to leave this
Wikidata Lab XIV here for you
57
00:03:26,552 --> 00:03:29,471
after this lightning talk
58
00:03:29,471 --> 00:03:32,259
that is very brief,
so you'll be able to check
59
00:03:32,259 --> 00:03:34,554
on the work that we've been doing
on structuring Wikidata
60
00:03:34,554 --> 00:03:36,005
for this purpose too.
61
00:03:37,272 --> 00:03:39,725
We have this challenge to build
a narrative template
62
00:03:39,725 --> 00:03:44,383
that is generic enough
to cover different Wikidata items
63
00:03:44,383 --> 00:03:46,347
and to overcome the gender
64
00:03:46,347 --> 00:03:50,359
and number difficulties
of languages,
65
00:03:52,054 --> 00:03:54,252
and still sounding natural for the user
66
00:03:54,252 --> 00:03:59,252
because we don't want to sound like
it doesn't click for the user
67
00:03:59,252 --> 00:04:00,546
to edit after that.
68
00:04:01,956 --> 00:04:07,625
This is what Mbabel looks like
in the bottom form.
69
00:04:07,625 --> 00:04:14,507
You just have to insert the item number there
and call the desired template
70
00:04:14,507 --> 00:04:21,673
and then you have an article to edit
and expand, and everything.
71
00:04:22,135 --> 00:04:26,856
So, more importantly, why did we do it?
Not because it's cool to develop
72
00:04:26,856 --> 00:04:30,922
things here in Wikidata,
we know, we all here know about it.
73
00:04:30,922 --> 00:04:36,178
But we are experimenting with this
integration from Wikidata to Wikipedia
74
00:04:36,178 --> 00:04:39,226
and we want to focus
on meaningful individual contributions.
75
00:04:39,226 --> 00:04:42,608
So we've been working
on education programs
76
00:04:42,608 --> 00:04:45,067
and we want the students to feel the value
77
00:04:45,067 --> 00:04:47,280
of their entries too, but not only--
78
00:04:47,280 --> 00:04:49,405
Oh, five minutes only,
Geez, I'm gonna rush here.
79
00:04:49,405 --> 00:04:50,599
(laughing)
80
00:04:50,794 --> 00:04:54,160
And we want to ease tasks
for users in general,
81
00:04:54,270 --> 00:04:57,801
especially on tables
and this kind of content
82
00:04:57,801 --> 00:04:59,988
that is a bit of a rush to do.
83
00:05:02,456 --> 00:05:05,523
And we're working on this concept
of Abstract Wikipedia.
84
00:05:05,523 --> 00:05:09,269
Denny Vrandečić wrote a super
interesting article about it
85
00:05:09,269 --> 00:05:11,500
so I linked it here too.
86
00:05:11,500 --> 00:05:14,792
And we also want to now support
small language communities
87
00:05:14,792 --> 00:05:17,845
to fill the lack of content there.
88
00:05:18,784 --> 00:05:23,885
This is an example of how we've been using
this Mbabel tool for GLAM
89
00:05:23,885 --> 00:05:25,748
and education programs,
90
00:05:25,748 --> 00:05:29,861
and I showed you earlier
the bottom form of the Mbabel tool
91
00:05:29,861 --> 00:05:34,264
but also we can make red links
that aren't exactly empty.
92
00:05:34,264 --> 00:05:35,931
So you click on this red link
93
00:05:35,931 --> 00:05:38,862
and you automatically have
this article draft
94
00:05:38,862 --> 00:05:41,660
on your user page to edit.
95
00:05:42,964 --> 00:05:48,762
And I'm going to talk about it briefly
because I only have a few minutes more.
96
00:05:50,009 --> 00:05:51,356
On educational projects,
97
00:05:51,356 --> 00:05:56,799
we've been doing this with elections
in Brazil for journalism students.
98
00:05:56,799 --> 00:06:01,993
We have the experience
with the [inaudible] students
99
00:06:02,087 --> 00:06:05,314
with user Joalpe--
he's not here right now,
100
00:06:05,314 --> 00:06:07,867
but we all know him, I think.
101
00:06:07,867 --> 00:06:11,930
And we realize that we have the data
about Brazilian elections
102
00:06:11,930 --> 00:06:14,748
but we don't have media coverage of it.
103
00:06:15,049 --> 00:06:18,249
So we were also lacking
Wikipedia entries on it.
104
00:06:19,029 --> 00:06:23,000
How do we insert this meaningful
information on Wikipedia
105
00:06:23,000 --> 00:06:24,672
that people really access?
106
00:06:24,672 --> 00:06:27,989
Next year we're going
to have some elections,
107
00:06:27,989 --> 00:06:30,710
people are going to look for
this kind of information on Wikipedia
108
00:06:30,710 --> 00:06:32,433
and they simply won't find it.
109
00:06:32,433 --> 00:06:35,726
So this tool looks quite useful
for this purpose
110
00:06:35,726 --> 00:06:40,214
and the students were introduced,
not only to Wikipedia,
111
00:06:40,214 --> 00:06:42,701
but also to Wikidata.
112
00:06:42,701 --> 00:06:46,575
Actually, they were introduced
to Wikipedia with Wikidata,
113
00:06:46,575 --> 00:06:50,675
which is a super interesting experience
and we had a lot of fun,
114
00:06:50,675 --> 00:06:52,823
and it was quite challenging
to organize all that.
115
00:06:52,823 --> 00:06:54,513
We can talk about it later too.
116
00:06:54,979 --> 00:06:58,582
And they also added the background
and the analysis sections
117
00:06:58,582 --> 00:07:01,663
on these election articles,
118
00:07:01,663 --> 00:07:05,336
because we don't want them
to just simply automate the content there.
119
00:07:05,336 --> 00:07:06,660
We can do better.
120
00:07:06,660 --> 00:07:09,247
So this is the example
I'm going to show you.
121
00:07:09,247 --> 00:07:13,106
This is from a municipal election
in Brazil.
122
00:07:15,603 --> 00:07:17,121
Two minutes... oh my!
123
00:07:18,577 --> 00:07:23,268
This example here was entirely created
with the Mbabel tool.
124
00:07:23,268 --> 00:07:29,496
You have here this introduction text.
It really sounds natural for the reader.
125
00:07:29,496 --> 00:07:32,165
The Wikidata infobox here--
126
00:07:32,165 --> 00:07:34,907
it's a masterpiece
of Ederporto right there.
127
00:07:34,907 --> 00:07:36,769
(laughter)
128
00:07:37,438 --> 00:07:42,456
And we have here the tables with the
election results for each position.
129
00:07:42,456 --> 00:07:46,415
And we also have these results here
in textual form too,
130
00:07:46,415 --> 00:07:51,767
so it really looks like an article
that was made, that was handcrafted.
131
00:07:53,893 --> 00:07:57,814
The references here were also made
with the Mbabel tool
132
00:07:57,814 --> 00:08:01,393
and we used identifiers
to build these references here
133
00:08:01,393 --> 00:08:03,167
and the categories too.
134
00:08:10,726 --> 00:08:14,999
So, to wrap things up here,
it is still a work in progress,
135
00:08:14,999 --> 00:08:19,326
and we have some outreach
and technical challenges
136
00:08:19,326 --> 00:08:22,999
to bring Mbabel
to other language communities,
137
00:08:22,999 --> 00:08:24,844
especially the smaller ones,
138
00:08:24,844 --> 00:08:27,210
and how do we support those tools
139
00:08:27,210 --> 00:08:29,819
in lower-resource
language communities too.
140
00:08:29,819 --> 00:08:33,991
And finally, is it possible
to create an Mbabel
141
00:08:33,991 --> 00:08:36,261
that overcomes language barriers?
142
00:08:36,261 --> 00:08:39,740
I think that's a very interesting
question for the conference
143
00:08:39,740 --> 00:08:43,835
and hopefully we can figure
that out together.
144
00:08:44,818 --> 00:08:49,799
So, thank you very much,
and look for the Mbabel poster downstairs
145
00:08:49,799 --> 00:08:53,615
if you'd like to have all this information
wrapped up, okay?
146
00:08:53,615 --> 00:08:55,038
Thank you.
147
00:08:55,288 --> 00:08:57,564
(audience clapping)
148
00:09:00,311 --> 00:09:02,778
(moderator) I'm afraid
we're a little too short for questions
149
00:09:02,778 --> 00:09:05,783
but yes, Erica, as she said,
has a poster and is very friendly.
150
00:09:05,783 --> 00:09:07,518
So I'm sure you can talk to her
afterwards,
151
00:09:07,518 --> 00:09:09,389
and if there's time at the end,
I'll allow it.
152
00:09:09,389 --> 00:09:12,131
But in the meantime,
I'd like to bring up our next speaker...
153
00:09:12,237 --> 00:09:13,611
Thank you.
154
00:09:15,549 --> 00:09:17,140
(audience chattering)
155
00:09:23,058 --> 00:09:27,016
Next we've got Yolanda Gil,
talking about Wikidata and Geosciences.
156
00:09:27,908 --> 00:09:29,031
Thank you.
157
00:09:29,031 --> 00:09:31,624
I come from the University
of Southern California
158
00:09:31,624 --> 00:09:35,164
and I've been working with
Semantic Technologies for a long time.
159
00:09:35,164 --> 00:09:37,894
I want to talk about geosciences
in particular,
160
00:09:37,894 --> 00:09:41,225
where this idea of crowd-sourcing
from the community is very important.
161
00:09:41,791 --> 00:09:45,033
So I'll give you a sense
that individual scientists,
162
00:09:45,033 --> 00:09:47,070
most of them in colleges,
163
00:09:47,070 --> 00:09:50,085
collect their own data
for their particular project.
164
00:09:50,085 --> 00:09:51,932
They describe it in their own way.
165
00:09:51,932 --> 00:09:55,352
They use their own properties,
their own metadata characteristics.
166
00:09:55,352 --> 00:09:58,560
This is an example
of some collaborators of mine
167
00:09:58,560 --> 00:10:00,124
that collect data from a river.
168
00:10:00,124 --> 00:10:02,091
They have their own sensors,
their own robots,
169
00:10:02,091 --> 00:10:05,339
and they study the water quality.
170
00:10:05,339 --> 00:10:11,423
I'm going to talk today about an effort
that we did to crowdsource metadata
171
00:10:11,423 --> 00:10:14,712
for a community that works
in paleoclimate.
172
00:10:14,712 --> 00:10:17,747
The article just came out
so it's in the slides if you're curious,
173
00:10:17,747 --> 00:10:20,619
but it's a pretty large community
that works together
174
00:10:20,619 --> 00:10:24,042
to integrate data more efficiently
through crowdsourcing.
175
00:10:24,042 --> 00:10:28,631
So, if you've heard of the
hockey stick graph for climate,
176
00:10:28,631 --> 00:10:31,680
this is the community that does this.
177
00:10:31,680 --> 00:10:34,520
This is a study for climate
in the last 200 years,
178
00:10:34,520 --> 00:10:38,188
and it takes them literally many years
to look at data
179
00:10:38,188 --> 00:10:39,618
from different parts of the globe.
180
00:10:39,618 --> 00:10:42,607
Each dataset is collected by
a different investigator.
181
00:10:42,699 --> 00:10:44,433
The data is very, very different,
182
00:10:44,433 --> 00:10:47,017
so it takes them a long time
to put together
183
00:10:47,017 --> 00:10:49,230
these global studies of climate,
184
00:10:49,230 --> 00:10:51,665
and our goal is to make that
more efficient.
185
00:10:51,665 --> 00:10:53,690
So, I've done a lot of work
over the years.
186
00:10:53,690 --> 00:10:56,585
Going back to 2005, we used to call it,
187
00:10:56,585 --> 00:10:59,615
"Knowledge Collection from Web Volunteers"
188
00:10:59,615 --> 00:11:02,236
or from netizens at that time.
189
00:11:02,236 --> 00:11:04,267
We had a system called "Learner."
190
00:11:04,267 --> 00:11:07,048
It collected 700,000 common sense,
191
00:11:07,048 --> 00:11:09,368
common knowledge statements
about the world.
192
00:11:09,368 --> 00:11:11,367
We did a lot of different techniques.
193
00:11:11,367 --> 00:11:15,333
The forms that we did
to extract knowledge from volunteers
194
00:11:15,333 --> 00:11:19,136
really fit the knowledge models,
the data models that we used
195
00:11:19,136 --> 00:11:21,381
and the properties that we wanted to use.
196
00:11:21,381 --> 00:11:25,051
I worked with Denny
on the system called "Shortipedia"
197
00:11:25,051 --> 00:11:27,259
when he was a Post Doc at ISI,
198
00:11:27,259 --> 00:11:31,946
looking at keeping track
of the provenance of the assertions,
199
00:11:31,946 --> 00:11:35,129
and we started to build
on Semantic MediaWiki software.
200
00:11:35,129 --> 00:11:37,113
So everything that
I'm going to describe today
201
00:11:37,113 --> 00:11:38,936
builds on that software,
202
00:11:38,936 --> 00:11:41,117
but I think that now we have Wikibase,
203
00:11:41,117 --> 00:11:43,676
we'll be starting to work more
on Wikibase.
204
00:11:43,676 --> 00:11:48,935
So LinkedEarth is the project
where we worked with paleoclimate scientists
205
00:11:48,935 --> 00:11:50,636
to crowdsource the metadata,
206
00:11:50,636 --> 00:11:54,328
and, as you see in the title, we call it
"controlled crowdsourcing."
207
00:11:54,328 --> 00:11:57,101
So we found a nice niche
208
00:11:57,101 --> 00:12:00,538
where we could let them create
new properties
209
00:12:00,538 --> 00:12:02,599
but we had an editorial process for it.
210
00:12:02,599 --> 00:12:04,444
So I'll describe to you how it works.
211
00:12:04,444 --> 00:12:10,055
For them, if you're looking at a sample
from lake sediments from 200 years ago,
212
00:12:10,055 --> 00:12:12,622
you use different properties
to describe it
213
00:12:12,622 --> 00:12:15,692
than if you have coral sediments
that you're looking at
214
00:12:15,692 --> 00:12:18,979
or coral samples that you're looking at
that you extract from the ocean.
215
00:12:18,979 --> 00:12:23,532
Palmyra is a coral atoll in the Pacific.
216
00:12:23,532 --> 00:12:27,918
So if you have coral, you care
about the species and the genus,
217
00:12:27,918 --> 00:12:31,691
but if you're just looking at lake sand,
you don't have that.
218
00:12:31,691 --> 00:12:35,313
So each type of sample
has very different properties.
219
00:12:35,313 --> 00:12:38,798
In LinkedEarth,
they're able to see on a map
220
00:12:38,798 --> 00:12:40,264
where the datasets are.
221
00:12:40,264 --> 00:12:45,500
They actually annotate their own datasets
or the datasets of other researchers
222
00:12:45,500 --> 00:12:46,787
when they're using it.
223
00:12:46,787 --> 00:12:50,254
So they have a reason
why they want certain properties
224
00:12:50,254 --> 00:12:52,289
to describe those datasets.
225
00:12:52,289 --> 00:12:56,683
Whenever there are disagreements,
or whenever there are agreements,
226
00:12:56,683 --> 00:12:58,595
there's community discussions
about them
227
00:12:58,595 --> 00:13:02,894
and there are also polls to decide on
what properties to settle.
228
00:13:02,894 --> 00:13:05,659
So it's a nice ecosystem.
I'll give you examples.
229
00:13:05,659 --> 00:13:11,322
You look at a particular dataset,
in this case it's a lake in Africa.
230
00:13:11,322 --> 00:13:14,241
So you have the category of the page;
it can be a dataset,
231
00:13:14,241 --> 00:13:15,491
it can be other things.
232
00:13:15,491 --> 00:13:21,181
You can download the dataset itself
and you have kind of canonical properties
233
00:13:21,181 --> 00:13:23,737
that they have all agreed to have
for datasets,
234
00:13:23,737 --> 00:13:25,992
and then under Extra Information,
235
00:13:25,992 --> 00:13:29,369
those are properties
that the person describing this dataset,
236
00:13:29,369 --> 00:13:31,007
added of their own accord.
237
00:13:31,007 --> 00:13:32,628
So these can be new properties.
238
00:13:32,628 --> 00:13:36,730
We call them "crowd properties,"
rather than "core properties."
239
00:13:37,291 --> 00:13:41,319
And then when you're describing
your dataset,
240
00:13:41,319 --> 00:13:43,774
in this case
it's an ice core that you got
241
00:13:43,774 --> 00:13:45,716
from a glacier,
242
00:13:45,765 --> 00:13:49,178
and you're adding a dataset,
you want to talk about measurements,
243
00:13:49,178 --> 00:13:54,073
you have an offering
of all the existing properties
244
00:13:54,073 --> 00:13:55,278
that match what you're saying.
245
00:13:55,278 --> 00:13:58,409
So we do this search completion
so that you can adopt that.
246
00:13:58,409 --> 00:14:00,140
That promotes normalization.
247
00:14:00,140 --> 00:14:04,260
The core of the properties
has been agreed on by the community
248
00:14:04,260 --> 00:14:06,220
so we're really extending that core.
249
00:14:06,220 --> 00:14:08,795
And that core is very important
because it gives structure
250
00:14:08,795 --> 00:14:10,735
to all the extensions.
251
00:14:10,735 --> 00:14:14,382
We engage the community
through many different ways.
252
00:14:14,382 --> 00:14:17,260
We had one face-to-face meeting
at the beginning
253
00:14:17,260 --> 00:14:21,611
and after about a year and a half,
we do have a new standard,
254
00:14:21,611 --> 00:14:25,154
and a new way for them
to continue to evolve that standard.
255
00:14:25,154 --> 00:14:30,569
They have editors, very much
in the Wikipedia style
256
00:14:30,569 --> 00:14:31,582
of editorial boards.
257
00:14:31,582 --> 00:14:34,098
They have working groups
for different types of data.
258
00:14:34,098 --> 00:14:36,090
They do polls with the community,
259
00:14:36,090 --> 00:14:40,879
and they have pretty nice engagement
of the community at large,
260
00:14:40,879 --> 00:14:43,706
even if they've never visited our Wiki.
261
00:14:43,706 --> 00:14:46,183
The metadata evolves
262
00:14:46,183 --> 00:14:48,775
so what we do is that people annotate
their datasets,
263
00:14:48,775 --> 00:14:52,321
then the schema evolves,
the properties evolve
264
00:14:52,321 --> 00:14:55,379
and we have an entire infrastructure
and mechanisms
265
00:14:55,379 --> 00:15:00,336
to re-annotate the datasets
with the new structure of the ontology
266
00:15:00,336 --> 00:15:01,711
and the new properties.
267
00:15:01,711 --> 00:15:05,210
This is described in the paper.
I won't go into the details.
268
00:15:05,210 --> 00:15:07,583
But I think that
having that kind of capability
269
00:15:07,583 --> 00:15:10,342
in Wikibase would be really interesting.
270
00:15:10,342 --> 00:15:14,041
We basically extended
Semantic MediaWiki and MediaWiki
271
00:15:14,041 --> 00:15:15,722
to create our own infrastructure.
272
00:15:15,722 --> 00:15:18,855
I think a lot of this is now something
that we find in Wikibase,
273
00:15:18,961 --> 00:15:20,615
but this is older than that.
274
00:15:20,615 --> 00:15:24,999
And in general, we have many projects
where we look at crowdsourcing
275
00:15:24,999 --> 00:15:29,885
not just descriptions of datasets
but also descriptions of hydrology models,
276
00:15:29,885 --> 00:15:33,563
descriptions of multi-step
data analytic workflows
277
00:15:33,563 --> 00:15:36,080
and many other things in the sciences.
278
00:15:36,080 --> 00:15:42,833
So we are also interested in including
in Wikidata additional things
279
00:15:42,833 --> 00:15:46,250
that are not just datasets or entities
280
00:15:46,250 --> 00:15:48,512
but also other things
that have to do with science.
281
00:15:48,512 --> 00:15:53,770
I think Geosciences are more complex
in this sense than Biology, for example.
282
00:15:54,923 --> 00:15:56,233
That's it.
283
00:15:56,513 --> 00:15:57,885
Thank you.
(audience clapping)
284
00:16:01,640 --> 00:16:03,772
- Do I have time for questions?
- Yes.
285
00:16:03,772 --> 00:16:06,871
(moderator) We have time
for just a couple of short questions.
286
00:16:07,751 --> 00:16:11,342
When answering,
can you go back to the microphone?
287
00:16:12,529 --> 00:16:14,520
- Yes.
- Hopefully, yeah.
288
00:16:21,314 --> 00:16:25,002
(audience 1) Does the structure allow
tabular datasets to be described
289
00:16:25,002 --> 00:16:26,988
and can you talk a bit about that?
290
00:16:27,225 --> 00:16:32,667
Yes. So the properties of the datasets
talk more about who collected them,
291
00:16:32,667 --> 00:16:36,759
what kind of data was collected,
what kind of sample it was,
292
00:16:36,759 --> 00:16:39,790
and then there's a separate standard
which is called "LiPD"
293
00:16:39,790 --> 00:16:43,065
that's complementary and mapped
to the properties
294
00:16:43,065 --> 00:16:46,994
that describes the format
of the actual files
295
00:16:47,075 --> 00:16:49,343
and the actual structure of the data.
296
00:16:49,343 --> 00:16:53,631
So, you're right that there's both,
"how do I find data about x"
297
00:16:53,631 --> 00:16:55,557
but also, "Now, how do I use it?
298
00:16:55,557 --> 00:17:00,211
How do I know where
the temperature that I'm looking for
299
00:17:00,211 --> 00:17:03,013
is actually in the file?"
300
00:17:03,656 --> 00:17:05,394
(moderator) This will be the last.
301
00:17:06,887 --> 00:17:09,034
(audience 2) I'll have
to make it relevant.
302
00:17:09,504 --> 00:17:15,667
So, you have shown this process
of how users can suggest
303
00:17:15,667 --> 00:17:18,985
or like actually already put in
properties,
304
00:17:18,985 --> 00:17:22,705
and I didn't fully understand
how this thing works,
305
00:17:22,705 --> 00:17:24,027
or what's the process behind it.
306
00:17:24,027 --> 00:17:28,045
Is there some kind of
folksonomy approach--obviously--
307
00:17:28,045 --> 00:17:33,387
but how is it promoted
into the core vocabulary
308
00:17:33,387 --> 00:17:36,255
if something is promoted?
309
00:17:36,255 --> 00:17:37,882
Yes, yes. It is.
310
00:17:37,882 --> 00:17:42,202
So what we do is we have a core ontology
and the initial one was actually
311
00:17:42,202 --> 00:17:45,618
very thoughtfully put together
through a lot of discussion
312
00:17:45,618 --> 00:17:47,964
by very few people.
313
00:17:47,964 --> 00:17:51,052
And then the idea was
the whole community can extend that
314
00:17:51,052 --> 00:17:52,971
or propose changes to that.
315
00:17:52,971 --> 00:17:56,919
So, as they are describing datasets,
they can add new properties
316
00:17:56,919 --> 00:17:59,526
and those become "crowd properties."
317
00:17:59,526 --> 00:18:02,941
And every now and then,
the Editorial Committee
318
00:18:02,941 --> 00:18:04,367
looks at all of those properties,
319
00:18:04,367 --> 00:18:07,795
the working groups look at all of those
crowd properties,
320
00:18:07,795 --> 00:18:11,714
and decide whether to incorporate them
into the main ontology.
321
00:18:11,714 --> 00:18:15,804
So it could be because they're used
for a lot of dataset descriptions.
322
00:18:15,804 --> 00:18:18,920
It could be because
they are proposed by somebody
323
00:18:18,920 --> 00:18:23,339
and they're found to be really interesting
or key, or uncontroversial.
324
00:18:23,339 --> 00:18:30,267
So there's an entire editorial process
to incorporate those new crowd properties
325
00:18:30,267 --> 00:18:32,188
or the folksonomy part of it,
326
00:18:32,188 --> 00:18:36,308
but they are really built around the core
of the ontology.
327
00:18:36,404 --> 00:18:40,280
The core ontology then grows
with more crowd properties
328
00:18:40,280 --> 00:18:44,311
and then people propose
additional crowd properties again.
329
00:18:44,311 --> 00:18:46,979
So we've gone through a couple
of these iterations
330
00:18:46,979 --> 00:18:51,386
of rolling out a new core,
and then extending it,
331
00:18:51,386 --> 00:18:55,570
and then rolling out a new core
and then extending it.
332
00:18:55,570 --> 00:18:57,779
- (audience 2) Great. Thank you.
- Thanks.
333
00:18:57,779 --> 00:19:00,437
(moderator) Thank you.
(audience applauding)
334
00:19:02,295 --> 00:19:03,777
(moderator) Thank you, Yolanda.
335
00:19:03,777 --> 00:19:07,494
And now we have Adam Shorland
with "Something About Wikibase,"
336
00:19:07,599 --> 00:19:09,299
according to the title.
337
00:19:09,708 --> 00:19:12,956
Uh... where's the internet? There it is.
338
00:19:13,245 --> 00:19:18,925
So, I'm going to do a live demo,
which is probably a bad idea
339
00:19:18,925 --> 00:19:21,362
but I'm going to try and do it
as the birthday present later
340
00:19:21,362 --> 00:19:24,268
so I figure I might as well try it here.
341
00:19:24,292 --> 00:19:27,304
And I also have some notes on my phone
because I have no slides.
342
00:19:29,349 --> 00:19:32,248
So, two years ago,
I made these Wikibase Docker images
343
00:19:32,248 --> 00:19:34,052
that quite a few people have tried out,
344
00:19:34,052 --> 00:19:38,087
and even before then,
I was working on another project,
345
00:19:38,087 --> 00:19:42,363
which is kind of ready now,
and here it is.
346
00:19:43,690 --> 00:19:46,832
It's a website that allows you
to instantly create a Wikibase
347
00:19:46,900 --> 00:19:48,930
with a query service and QuickStatements,
348
00:19:48,930 --> 00:19:51,616
without needing to know about
any of the technical details,
349
00:19:51,616 --> 00:19:54,295
without needing to manage
any of them either.
350
00:19:54,295 --> 00:19:57,054
There are still lots of features to go
and there's still some bugs,
351
00:19:57,054 --> 00:19:59,348
but here goes the demo.
352
00:19:59,348 --> 00:20:02,628
Let me get my emails up ready...
because I need them too...
353
00:20:03,315 --> 00:20:06,514
Da da da... Stopwatch.
354
00:20:07,272 --> 00:20:08,488
Okay.
355
00:20:08,829 --> 00:20:14,253
So it's as simple as...
at the moment it's locked down behind...
356
00:20:14,337 --> 00:20:16,495
Oh no! German keyboard!
357
00:20:16,495 --> 00:20:18,703
(audience laughing)
358
00:20:22,556 --> 00:20:23,923
Foiled... okay.
359
00:20:24,955 --> 00:20:26,214
Okay.
360
00:20:26,634 --> 00:20:28,417
(audience continues to laugh)
361
00:20:30,434 --> 00:20:31,989
Aha! Okay.
362
00:20:32,950 --> 00:20:35,335
I'll remember that for later.
(laughs)
363
00:20:36,911 --> 00:20:38,119
Yes.
364
00:20:39,438 --> 00:20:40,855
♪ (humming) ♪
365
00:20:40,961 --> 00:20:44,932
Oh my god... now it's American.
366
00:20:53,871 --> 00:20:56,131
All you have to do is create an account...
367
00:20:58,570 --> 00:21:00,007
da da da...
368
00:21:00,566 --> 00:21:02,432
Click this button up here...
369
00:21:02,478 --> 00:21:05,512
Come up with a name for Wiki--
"Demo1"
370
00:21:05,862 --> 00:21:07,299
"Demo1"
371
00:21:07,568 --> 00:21:09,135
"Demo user"
372
00:21:09,203 --> 00:21:11,864
Agree to the terms
which don't really exist yet.
373
00:21:12,298 --> 00:21:14,247
(audience laughing)
374
00:21:15,264 --> 00:21:17,698
Click on this thing which isn't a link.
375
00:21:21,519 --> 00:21:23,886
And then you have your Wikibase.
376
00:21:23,886 --> 00:21:26,602
(audience cheers and claps)
377
00:21:28,554 --> 00:21:30,421
Anmelden in German.
378
00:21:30,421 --> 00:21:35,126
Demo... oh god! I'm learning lots about
my demo later.
379
00:21:35,569 --> 00:21:40,069
1-6-1-4-S-G...
380
00:21:40,166 --> 00:21:42,567
- (audience 3) Y...
- (Adam) It's random.
381
00:21:43,016 --> 00:21:44,567
(audience laughing)
382
00:21:46,237 --> 00:21:47,958
Oh, come on....
(audience laughing)
383
00:21:48,001 --> 00:21:50,543
Oh no. It's because this is a capital U...
384
00:21:51,333 --> 00:21:53,283
(audience chattering)
385
00:21:54,453 --> 00:21:56,545
6-1-4....
386
00:21:57,465 --> 00:22:01,248
S-G-ENJ...
387
00:22:01,623 --> 00:22:03,794
Is J... oh no. That's... oh yeah. Okay.
388
00:22:03,843 --> 00:22:06,242
I'm really... I'm gonna have to look
at the laptop
389
00:22:06,242 --> 00:22:07,836
that I'm doing this on later.
390
00:22:07,836 --> 00:22:09,129
Cool...
391
00:22:11,046 --> 00:22:13,709
Da da da da da...
392
00:22:14,687 --> 00:22:17,040
Maybe I should have some things
in my clipboard ready.
393
00:22:17,539 --> 00:22:19,093
Okay, so now I'm logged in.
394
00:22:22,631 --> 00:22:25,065
Oh... keyboards.
395
00:22:28,083 --> 00:22:30,012
So you can go and create an item...
396
00:22:36,194 --> 00:22:38,508
Yeah, maybe I should make a video.
It might be easier.
397
00:22:38,927 --> 00:22:42,207
So, yeah. You can make items,
you have QuickStatements here
398
00:22:42,207 --> 00:22:43,901
that have... oh... it is all in German.
399
00:22:43,901 --> 00:22:45,088
(audience laughing)
400
00:22:45,088 --> 00:22:46,297
(sighs)
401
00:22:46,926 --> 00:22:49,021
Oh, log in? Log in?
402
00:22:50,348 --> 00:22:52,088
It has... Oh, set up ready.
403
00:22:52,088 --> 00:22:53,482
Da da da...
404
00:22:55,965 --> 00:22:57,850
It's as easy as...
405
00:22:58,966 --> 00:23:01,350
I learned how to use
Quick Statements yesterday...
406
00:23:01,350 --> 00:23:03,245
that's what I know how to do.
407
00:23:04,657 --> 00:23:07,089
I can then go back to the Wiki...
408
00:23:08,008 --> 00:23:09,804
We can go and see in Recent Changes
409
00:23:09,804 --> 00:23:11,942
that there are now two items,
the one that I made
410
00:23:11,942 --> 00:23:13,759
and the one from Quick Statements...
411
00:23:13,759 --> 00:23:14,881
and then you go to Quick...
412
00:23:14,881 --> 00:23:16,511
♪ (hums a tune) ♪
413
00:23:17,637 --> 00:23:18,770
Stop...no...
414
00:23:18,927 --> 00:23:20,120
No...
415
00:23:20,454 --> 00:23:22,437
(audience laughing)
416
00:23:28,394 --> 00:23:30,006
Oh god...
417
00:23:30,061 --> 00:23:32,012
I'm glad I tried this out in advance.
418
00:23:33,464 --> 00:23:35,678
There you go.
And the query service is updated.
419
00:23:35,830 --> 00:23:37,763
(audience clapping)
420
00:23:42,357 --> 00:23:45,359
And the idea of this is it'll allow
people to try out Wikibases.
421
00:23:45,359 --> 00:23:48,493
Hopefully, it'll even be able
to allow people to...
422
00:23:49,110 --> 00:23:50,945
have their real Wikibases here.
423
00:23:50,945 --> 00:23:53,783
At the moment you can create
as many as you want
424
00:23:53,783 --> 00:23:55,653
and they all just appear
in this lovely list.
425
00:23:55,653 --> 00:23:59,182
As I said, there's lots of bugs
but it's all super quick.
426
00:23:59,914 --> 00:24:03,392
Exactly how this is going to continue
in the future, we don't know yet
427
00:24:03,392 --> 00:24:05,757
because I only finished writing this
in the last few days.
428
00:24:05,757 --> 00:24:09,286
It's currently behind an invitation code
so that if you want to come try it out,
429
00:24:09,286 --> 00:24:10,888
come and talk to me.
430
00:24:11,645 --> 00:24:15,730
And if you have any other comments
or thoughts, let me know.
431
00:24:15,861 --> 00:24:19,711
Oh, three minutes...40. That's...
That's not that bad.
432
00:24:19,986 --> 00:24:21,022
Thanks.
433
00:24:21,022 --> 00:24:22,622
(audience clapping)
434
00:24:28,435 --> 00:24:30,006
Any questions?
435
00:24:31,020 --> 00:24:35,553
(audience 5) Are the Quick Statements
and the Query Service
436
00:24:35,553 --> 00:24:38,602
automatically updated?
437
00:24:39,553 --> 00:24:42,345
Yes. So the idea is that
there will be somebody,
438
00:24:42,345 --> 00:24:43,500
at the moment, me,
439
00:24:43,500 --> 00:24:45,144
maintaining all of the horrible stuff
440
00:24:45,144 --> 00:24:47,290
that you don't have to, behind the scenes.
441
00:24:47,657 --> 00:24:50,157
So kind of think of it like GitHub.com,
442
00:24:50,157 --> 00:24:54,058
but you don't have to know anything
about Git to use it. It's just all there.
443
00:24:55,241 --> 00:24:56,886
- [inaudible]
- Yeah, we'll get that.
444
00:24:56,886 --> 00:25:00,247
But any of those
big hosted solution things.
445
00:25:00,833 --> 00:25:03,263
- (audience 6) A feature request.
- Yes.
446
00:25:03,263 --> 00:25:05,479
Is there anything in scope--
447
00:25:05,479 --> 00:25:09,799
do you have plans on making it
so you can easily import existing...
448
00:25:09,799 --> 00:25:12,549
- Wikidata...
- I have loads of plans.
449
00:25:12,549 --> 00:25:14,909
Like I want there to be a button
where you can just import
450
00:25:14,909 --> 00:25:17,348
another whole Wikibase and all of--yeah.
451
00:25:17,436 --> 00:25:20,723
That will be on the future list
that's really long. Yeah.
452
00:25:24,454 --> 00:25:28,406
(audience 7) I understand that it's...
you want to make it user-friendly
453
00:25:28,406 --> 00:25:32,242
but if I want to access
to the machine itself, can I do that?
454
00:25:32,242 --> 00:25:34,673
Nope.
(audience laughing)
455
00:25:37,006 --> 00:25:40,863
So again, like, in the longer term future,
there are possib...
456
00:25:40,863 --> 00:25:43,810
Everything's possible,
but at the moment, no.
457
00:25:45,156 --> 00:25:49,743
(audience 8) Two questions.
Is there a plan to have export tools
458
00:25:49,743 --> 00:25:52,791
so that you can export it
to your own Wikibase maybe at some point?
459
00:25:52,791 --> 00:25:53,824
- Yes.
- Great.
460
00:25:53,824 --> 00:25:55,565
And is this a business?
461
00:25:56,003 --> 00:25:58,164
I have no idea.
(audience laughing)
462
00:26:00,015 --> 00:26:01,545
Not currently.
463
00:26:05,754 --> 00:26:08,451
(audience 9) What if I stop
using it tomorrow,
464
00:26:08,451 --> 00:26:11,096
how long will the data be there?
465
00:26:11,181 --> 00:26:14,632
So my plan was at the end of WikidataCon
I was going to delete all of the data
466
00:26:14,632 --> 00:26:18,060
and there's a Wikibase Workshop
on a Sunday,
467
00:26:18,060 --> 00:26:21,671
and we will maybe be using this
for the Wikibase workshop
468
00:26:21,671 --> 00:26:23,801
so that everyone can have
their own Wikibase.
469
00:26:23,801 --> 00:26:27,366
And then, from that point,
I probably won't be deleting the data
470
00:26:27,366 --> 00:26:29,008
so it will all just stay there.
471
00:26:31,763 --> 00:26:32,923
(moderator) Question.
472
00:26:34,524 --> 00:26:36,114
(audience 10) It's two minutes...
473
00:26:36,175 --> 00:26:39,505
Alright, fine. I'll allow two more
questions if you talk quickly.
474
00:26:39,505 --> 00:26:41,550
(audience laughing)
475
00:26:47,370 --> 00:26:49,999
- Alright, good people.
- Thank you, Adam.
476
00:26:49,999 --> 00:26:52,418
Thank you for letting me test
my demo... I mean...
477
00:26:52,418 --> 00:26:54,640
I'm going to do it different.
(audience clapping)
478
00:26:59,512 --> 00:27:00,753
(moderator) Thank you.
479
00:27:00,753 --> 00:27:03,869
Now we have Dennis Diefenbach
presenting QAnswer.
480
00:27:04,489 --> 00:27:08,129
Hello, I'm Dennis Diefenbach,
I would like to present QAnswer
481
00:27:08,129 --> 00:27:11,392
which is a question-answering system
on top of Wikidata.
482
00:27:11,392 --> 00:27:16,203
So, what we need are some questions
and this is the interface of QAnswer.
483
00:27:16,203 --> 00:27:23,460
For example, where is WikidataCon?
484
00:27:23,901 --> 00:27:25,975
Alright, I think it's written like this.
485
00:27:27,432 --> 00:27:32,432
2019... And we get this response
which is Berlin.
486
00:27:32,458 --> 00:27:38,425
So, other questions. For example,
"When did Wikidata start?"
487
00:27:38,430 --> 00:27:42,383
It started on 30 October 2012
so its birthday is approaching.
488
00:27:44,079 --> 00:27:48,014
It is 6 years old,
so it will be its 7th birthday.
489
00:27:49,133 --> 00:27:51,583
Who is developing Wikidata?
490
00:27:51,583 --> 00:27:54,371
The Wikimedia Foundation
and Wikimedia Deutschland,
491
00:27:54,371 --> 00:27:55,988
so thank you very much to them.
492
00:27:57,013 --> 00:28:02,947
Something like museums in Berlin...
I don't know why this is not so...
493
00:28:05,494 --> 00:28:07,737
Only one museum... no, yeah, a few more.
494
00:28:09,167 --> 00:28:10,995
So, when you ask something like this,
495
00:28:10,995 --> 00:28:14,178
we allow the user
to explore the information
496
00:28:14,178 --> 00:28:16,308
with different aggregations.
497
00:28:16,308 --> 00:28:18,953
For example,
if there are many geo coordinates
498
00:28:18,953 --> 00:28:21,476
attached to the entities,
we will display a map.
499
00:28:21,476 --> 00:28:26,357
If there are many images attached to them,
we will display the images,
500
00:28:26,357 --> 00:28:29,057
and otherwise there is a list
where you can explore
501
00:28:29,057 --> 00:28:30,855
the different entities.
502
00:28:33,236 --> 00:28:35,605
You can ask something like
"Who is the mayor of Berlin,"
503
00:28:36,643 --> 00:28:40,201
"Give me politicians born in Berlin,"
and things like this.
504
00:28:40,201 --> 00:28:44,428
So you can both ask keyword questions
and full natural language questions.
505
00:28:45,171 --> 00:28:48,604
The whole data is coming from Wikidata
506
00:28:48,604 --> 00:28:55,346
so all entities which are in Wikidata
are queryable by this service.
507
00:28:55,869 --> 00:28:59,244
And the data is really all from Wikidata
508
00:28:59,244 --> 00:29:01,207
in the sense,
there are some Wikipedia snippets,
509
00:29:01,207 --> 00:29:04,851
there are images from Wikimedia Commons,
510
00:29:04,851 --> 00:29:07,644
but the rest is all Wikidata data.
511
00:29:08,760 --> 00:29:11,678
We can do this in several languages.
This is now in Chinese.
512
00:29:11,678 --> 00:29:15,441
I don't know what is written there
so do not ask me.
513
00:29:15,441 --> 00:29:19,893
We are currently supporting these languages
with more or less good quality
514
00:29:19,893 --> 00:29:22,094
because... yeah.
515
00:29:23,332 --> 00:29:27,563
So, how can this be useful
for the Wikidata community?
516
00:29:27,968 --> 00:29:30,052
I think there are different reasons.
517
00:29:30,052 --> 00:29:33,786
First of all, this thing helps you
to generate SPARQL queries
518
00:29:33,786 --> 00:29:37,043
and I know there are even some workshops
about how to use SPARQL.
519
00:29:37,043 --> 00:29:39,444
It's not a language that everyone speaks.
520
00:29:39,444 --> 00:29:45,147
So, if you ask something like
"a philosopher born before 1908,"
521
00:29:45,147 --> 00:29:48,697
to figure out, to construct
a SPARQL query like this could be tricky.
522
00:29:50,001 --> 00:29:54,257
In fact when you ask a question,
we generate many SPARQL queries
523
00:29:54,301 --> 00:29:57,486
and the first one is always
the SPARQL query that we think
524
00:29:57,486 --> 00:29:59,008
is the right one.
525
00:29:59,017 --> 00:30:02,651
So, if you ask your question
and then you go to the SPARQL list,
526
00:30:02,691 --> 00:30:06,468
then there is this button
for the Wikidata query service
527
00:30:06,468 --> 00:30:11,811
and you have the SPARQL query right there
and you will get the same result
528
00:30:11,811 --> 00:30:15,184
as you would get in the interface.
529
00:30:16,906 --> 00:30:19,289
Another thing it could be useful for
530
00:30:19,289 --> 00:30:23,468
is for finding missing
contextual information.
531
00:30:23,468 --> 00:30:27,057
For example, if you ask for actors
in "The Lord of the Rings,"
532
00:30:27,057 --> 00:30:30,776
most of these entities
will have an associated image
533
00:30:30,776 --> 00:30:32,490
but not all of them.
534
00:30:32,490 --> 00:30:37,861
So here there is some missing metadata
that could be added.
535
00:30:37,861 --> 00:30:40,376
You could go to this entity and add an image,
536
00:30:40,376 --> 00:30:45,462
and then see first
that there is an image missing and so on.
537
00:30:46,457 --> 00:30:52,047
Another thing is that you could find
schema issues.
538
00:30:52,047 --> 00:30:55,424
For example, if you ask
"books by Andrea Camilleri,"
539
00:30:55,428 --> 00:30:57,711
who is a famous Italian writer,
540
00:30:57,711 --> 00:30:59,981
you would currently get
these three books.
541
00:30:59,981 --> 00:31:02,681
But he wrote many more.
He wrote more than 50.
542
00:31:02,681 --> 00:31:05,701
And so the question is,
are they not in Wikidata
543
00:31:05,701 --> 00:31:09,704
or is maybe my knowledge
not modeled correctly as it currently is?
544
00:31:09,704 --> 00:31:12,804
And in this case, I know
there is another book from him,
545
00:31:12,804 --> 00:31:14,737
which is "Un mese con Montalbano."
546
00:31:14,737 --> 00:31:18,207
It has only an Italian label
so you can only search it in Italian.
547
00:31:18,207 --> 00:31:22,103
And if you go to this entity,
you will see that he has written it.
548
00:31:22,103 --> 00:31:27,504
It's a short story by Andrea Camilleri
and it's an instance of literary work,
549
00:31:27,504 --> 00:31:29,220
but it's not instance of book
550
00:31:29,220 --> 00:31:31,338
so that's the reason why
it doesn't appear.
551
00:31:31,338 --> 00:31:35,904
This is a way to track
where things are missing
552
00:31:35,904 --> 00:31:37,499
or modeled in the Wikidata model
553
00:31:37,499 --> 00:31:39,539
not as you would expect.
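[The Camilleri example can be sketched as two query shapes: a strict one matching only instance of book (Q571), and a looser one that follows subclass of (P279) up from written work (Q47461344). P31, P50, and those Q-IDs are real Wikidata identifiers; that both "book" and "literary work" sit under written work in the class tree is an assumption here, and "Q_AUTHOR" is a placeholder, not a real ID.]

```python
# Two SPARQL patterns for "books by <author>". STRICT misses items typed
# only as literary work; LOOSE follows subclass-of (P279) up from
# written work (Q47461344), assumed here to cover both classes.
# "Q_AUTHOR" below is a placeholder, not a real Wikidata ID.
STRICT = "?work wdt:P31 wd:Q571 ; wdt:P50 wd:{author} ."            # instance of: book
LOOSE = "?work wdt:P31/wdt:P279* wd:Q47461344 ; wdt:P50 wd:{author} ."

def books_query(author_qid, loose=False):
    """Build a query body for works written (P50) by the given author."""
    pattern = (LOOSE if loose else STRICT).format(author=author_qid)
    return "SELECT ?work WHERE { %s }" % pattern

print(books_query("Q_AUTHOR", loose=True))
```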
554
00:31:40,794 --> 00:31:42,968
Another reason is just to have fun.
555
00:31:43,588 --> 00:31:47,546
I imagine that many of you added
many Wikidata entities
556
00:31:47,546 --> 00:31:50,776
so just search for the ones
that you care most
557
00:31:50,776 --> 00:31:52,529
or you have edited yourself.
558
00:31:52,529 --> 00:31:56,893
So in this case, who developed
QAnswer, and that's it.
559
00:31:56,893 --> 00:32:00,226
For any other questions,
go to www.QAnswer.eu/qa
560
00:32:00,226 --> 00:32:03,575
and hopefully we'll find
an answer for you.
561
00:32:03,782 --> 00:32:05,649
(audience clapping)
562
00:32:13,994 --> 00:32:17,040
- Sorry.
- I'm just the dumbest person here.
563
00:32:17,530 --> 00:32:22,722
(audience 11) So I want to know
how is this kind of agnostic
564
00:32:22,752 --> 00:32:25,104
to the Wikibase instance,
565
00:32:25,104 --> 00:32:29,020
or has it been tied to the exact
like property numbers
566
00:32:29,020 --> 00:32:31,054
and things in Wikidata?
567
00:32:31,054 --> 00:32:33,442
Has it learned in some way
or how was it set up?
568
00:32:33,442 --> 00:32:36,456
There is training data
and we rely on training data
569
00:32:36,456 --> 00:32:40,585
and this is also, in most cases,
why you will not get good results.
570
00:32:40,585 --> 00:32:44,881
But we're training the system
with simple yes and no answers.
571
00:32:44,881 --> 00:32:48,936
When you ask a question,
and we always ask for feedback, yes or no,
572
00:32:48,936 --> 00:32:51,899
and this feedback is used by
the machine learning algorithm.
573
00:32:51,899 --> 00:32:54,124
This is where machine learning
comes into play.
574
00:32:54,124 --> 00:32:58,600
But basically, we put up separate
Wikibase instances
575
00:32:58,600 --> 00:33:00,482
and we can plug this in.
576
00:33:00,482 --> 00:33:04,249
In fact, the system is agnostic
in the sense that it only wants RDF.
577
00:33:04,249 --> 00:33:06,618
And RDF, you have in each Wikibase,
578
00:33:06,618 --> 00:33:08,059
there are a few configurations
579
00:33:08,059 --> 00:33:10,432
but you can have this on top
of any Wikibase.
580
00:33:11,654 --> 00:33:13,039
(audience 11) Awesome.
581
00:33:23,573 --> 00:33:27,004
(audience 12) You mentioned that
it's being trained by yes/no answers.
582
00:33:27,073 --> 00:33:32,662
So I guess this is assuming that
the Wikidata instance is free of errors
583
00:33:32,722 --> 00:33:34,356
or is it also...?
584
00:33:34,356 --> 00:33:37,140
You assume that the Wikidata instances...
585
00:33:37,140 --> 00:33:40,731
(audience 12) I guess I'm asking, like,
are you distinguishing
586
00:33:40,731 --> 00:33:46,289
between source level errors
or misunderstanding the question
587
00:33:46,289 --> 00:33:50,856
versus a bad mapping, etc.?
588
00:33:51,706 --> 00:33:55,474
Generally, we assume that the data
in Wikidata is true.
589
00:33:55,474 --> 00:33:59,172
So if you click "no"
and the data in Wikidata would be false,
590
00:33:59,172 --> 00:34:03,023
then yeah... we would not catch
this difference.
591
00:34:03,023 --> 00:34:05,081
But sincerely, Wikidata quality
is very good,
592
00:34:05,081 --> 00:34:08,231
so I rarely have had this problem.
593
00:34:16,592 --> 00:34:22,068
(audience 12) Is this data available
as a dataset by any chance, sir?
594
00:34:22,209 --> 00:34:27,218
- What is... direct service?
- The... dataset of...
595
00:34:27,218 --> 00:34:30,803
"is this answer correct
versus the query versus the answer?"
596
00:34:30,872 --> 00:34:33,340
Is that something you're publishing
as part of this?
597
00:34:33,340 --> 00:34:38,040
- The training data that you've...
- We published the training data.
598
00:34:38,040 --> 00:34:43,423
We published some old training data
but no, just a--
599
00:34:44,573 --> 00:34:47,313
There is a question there.
I don't know if we have still time.
600
00:34:51,215 --> 00:34:55,104
(audience 13) Maybe I just missed this
but is it running on a live,
601
00:34:55,104 --> 00:34:57,080
like the Live Query Service,
602
00:34:57,080 --> 00:34:59,393
or is it running on
some static dump you loaded
603
00:34:59,393 --> 00:35:01,690
or where is the data source
for Wikidata?
604
00:35:01,784 --> 00:35:07,014
Yes. The problem is
to apply this technology,
605
00:35:07,014 --> 00:35:08,414
you need a local dump.
606
00:35:08,414 --> 00:35:10,673
Because we do not rely only
on the SPARQL endpoint,
607
00:35:10,673 --> 00:35:12,873
we rely on special indexes.
608
00:35:12,873 --> 00:35:16,192
So, we are currently loading
the Wikidata dump.
609
00:35:16,192 --> 00:35:18,699
We are updating this every two weeks.
610
00:35:18,699 --> 00:35:20,756
We would like to do it more often,
611
00:35:20,756 --> 00:35:23,823
in fact we would like to get the diffs
for each day, for example,
612
00:35:23,823 --> 00:35:25,271
to put them in our index.
613
00:35:25,271 --> 00:35:28,719
But unfortunately, right now,
the Wikidata dumps are released
614
00:35:28,719 --> 00:35:31,753
only once every week.
615
00:35:31,753 --> 00:35:35,150
So, we cannot be faster than that
and we also need some time
616
00:35:35,150 --> 00:35:39,073
to re-index the data,
so it takes one or two days.
617
00:35:39,073 --> 00:35:41,833
So we are always behind. Yeah.
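[The per-day diffs he wishes for can be approximated with the standard MediaWiki recent-changes API on wikidata.org. The endpoint and parameters below are the real MediaWiki API; wiring this into an indexing pipeline is omitted, and this is a sketch, not QAnswer's actual code.]

```python
# Sketch: listing recently edited Wikidata items via the MediaWiki
# recent-changes API, as an alternative to waiting for weekly dumps.
# Real endpoint and parameters; the pipeline around it is illustrative.
from urllib.parse import urlencode

def recent_changes_url(limit=50):
    """URL listing the latest edits to items (namespace 0 holds Q-ids)."""
    params = urlencode({
        "action": "query", "list": "recentchanges",
        "rcnamespace": 0, "rclimit": limit, "format": "json",
    })
    return "https://www.wikidata.org/w/api.php?" + params

def changed_titles(payload):
    """Extract the changed entity IDs from a decoded API response."""
    return [rc["title"] for rc in payload["query"]["recentchanges"]]

# Offline example with the response shape the API returns:
sample = {"query": {"recentchanges": [{"title": "Q42"}, {"title": "Q64"}]}}
print(changed_titles(sample))
```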
618
00:35:48,202 --> 00:35:49,780
(moderator) Any more?
619
00:35:50,430 --> 00:35:53,268
- Okay, thank you very much.
- Thank you all very much.
620
00:35:53,547 --> 00:35:54,966
(audience clapping)
621
00:35:57,266 --> 00:36:00,165
(moderator) And now last, we have
Eugene Alvin Villar,
622
00:36:00,165 --> 00:36:02,049
talking about Panandâ.
623
00:36:10,630 --> 00:36:12,637
Good afternoon,
my name is Eugene Alvin Villar
624
00:36:12,637 --> 00:36:15,297
and I'm from the Philippines,
and I'll be talking about Panandâ:
625
00:36:15,297 --> 00:36:18,185
a mobile app powered by Wikidata.
626
00:36:18,862 --> 00:36:21,678
This is a follow-up to my lightning talk
that I presented two years ago
627
00:36:21,678 --> 00:36:25,004
at WikidataCon 2017
together with Carlo Moskito.
628
00:36:25,004 --> 00:36:26,557
You can download the slides
629
00:36:26,557 --> 00:36:28,727
and there's a link
to that presentation there.
630
00:36:28,727 --> 00:36:30,868
I'll give you a bit of a background.
631
00:36:30,868 --> 00:36:33,471
Wiki Society of the Philippines,
formerly Wikimedia Philippines,
632
00:36:33,471 --> 00:36:37,477
had a series of projects related
to Philippine heritage and history.
633
00:36:37,477 --> 00:36:41,705
So we have the usual photo contests,
Wikipedia Takes Manila,
634
00:36:41,705 --> 00:36:43,238
Wiki Loves Monuments,
635
00:36:43,238 --> 00:36:46,657
and then our major project
was the Cultural Heritage Mapping Project
636
00:36:46,657 --> 00:36:49,094
back in 2014-2015.
637
00:36:50,044 --> 00:36:53,039
In that project, we trained volunteers
to edit articles
638
00:36:53,039 --> 00:36:54,389
related to cultural heritage.
639
00:36:54,914 --> 00:36:59,032
This is the biggest
and most successful project that we've had.
640
00:36:59,032 --> 00:37:03,037
794 articles were created or improved,
including 37 "Did You Knows"
641
00:37:03,037 --> 00:37:05,238
and 4 "Good Articles,"
642
00:37:05,308 --> 00:37:08,688
and more than 5,000 images were uploaded
to Commons.
643
00:37:08,688 --> 00:37:11,039
As a result of that, we then launched
644
00:37:11,039 --> 00:37:13,689
the Encyclopedia
of Philippine Heritage program
645
00:37:13,689 --> 00:37:18,444
in order to expand the scope
and also include Wikidata in the scope.
646
00:37:18,444 --> 00:37:21,695
Here's the Core Team: myself,
Carlo and Roel.
647
00:37:21,695 --> 00:37:26,870
Our first pilot project was to document
the country's historical markers
648
00:37:26,870 --> 00:37:29,153
in Wikidata and Commons,
649
00:37:29,153 --> 00:37:34,053
starting with those created by
our national historical agency, the NHCP.
650
00:37:34,053 --> 00:37:38,904
For example, they installed a marker
for our national hero, here in Berlin,
651
00:37:38,904 --> 00:37:41,421
so there's now a Wikidata page
for that marker
652
00:37:41,421 --> 00:37:45,102
and a collection of photos of that marker
in Commons.
653
00:37:46,166 --> 00:37:50,397
Unfortunately, the government agency
does not keep a database of their markers
654
00:37:50,397 --> 00:37:53,480
that is good, up-to-date, or complete,
655
00:37:53,480 --> 00:37:58,004
so we have to painstakingly input these
into Wikidata manually.
656
00:37:58,004 --> 00:38:02,772
After careful research and confirmation,
here's a graph of the number of markers
657
00:38:02,772 --> 00:38:07,466
that we've added to Wikidata over time,
over the past three years.
658
00:38:07,466 --> 00:38:11,230
And we've developed
this Historical Markers Map web app
659
00:38:11,230 --> 00:38:15,289
that lets users view
these markers on a map,
660
00:38:15,289 --> 00:38:21,051
so we can browse it as a list,
view a good visualization of the markers
661
00:38:21,051 --> 00:38:23,253
with information and inscriptions.
662
00:38:23,253 --> 00:38:28,885
All of this is powered by Live Query
from Wikidata Query Service.
663
00:38:29,732 --> 00:38:32,005
There's the link
if you want to play around with it.
664
00:38:33,349 --> 00:38:37,428
And so we developed
a mobile app for this one.
665
00:38:37,428 --> 00:38:42,117
To better publicize our project,
I developed Panandâ,
666
00:38:42,117 --> 00:38:45,434
which is Tagalog for "marker",
as an Android app,
667
00:38:45,434 --> 00:38:48,393
that was published back in 2018,
668
00:38:48,393 --> 00:38:53,934
and I'll publish the iOS version
sometime in the future, hopefully.
669
00:38:54,868 --> 00:38:57,892
I'd like to demo the app
but we have no time,
670
00:38:57,892 --> 00:39:00,935
so here are some
of the features of the app.
671
00:39:00,935 --> 00:39:04,586
There's a Map and a List view,
with text search,
672
00:39:04,586 --> 00:39:07,452
so you can drill down as needed.
673
00:39:07,452 --> 00:39:10,169
You can filter by region or by distance,
674
00:39:10,169 --> 00:39:12,193
and whether you have marked
these markers,
675
00:39:12,193 --> 00:39:15,499
as either you have visited them
or you'd like to bookmark them
676
00:39:15,499 --> 00:39:16,949
for future visits.
677
00:39:16,949 --> 00:39:19,482
Then you can use your GPS
on your mobile phone
678
00:39:19,482 --> 00:39:21,860
for distance filtering.
679
00:39:21,860 --> 00:39:26,765
For example, if you want markers
that are near you, you can do that.
680
00:39:26,765 --> 00:39:30,918
And when you click on the Details page,
you can see the same thing,
681
00:39:30,918 --> 00:39:35,850
photos from Commons,
inscription about the marker,
682
00:39:35,850 --> 00:39:40,484
how to find the marker,
its location and address, etc.
683
00:39:41,601 --> 00:39:45,993
And one thing that's unique for this app
is you can, again, visit
684
00:39:46,011 --> 00:39:50,407
or put a bookmark of these,
so on the map or on the list,
685
00:39:50,407 --> 00:39:51,692
or on the Details page,
686
00:39:51,692 --> 00:39:54,891
you can just tap on those buttons
and say that you've visited them,
687
00:39:54,891 --> 00:39:58,520
or you'd like to bookmark them
for future visits.
688
00:39:58,520 --> 00:40:03,527
And my app has been covered by the press
and given recognition,
689
00:40:03,527 --> 00:40:06,743
so plenty of local press articles.
690
00:40:06,743 --> 00:40:11,281
Recently, it was selected
as one of the Top 5 finalists
691
00:40:11,281 --> 00:40:15,247
for the Android Masters competition
in the App for Social Good category.
692
00:40:15,247 --> 00:40:17,351
The final event will be next month.
693
00:40:17,351 --> 00:40:18,999
Hopefully, we'll win.
694
00:40:20,380 --> 00:40:22,378
Okay, so some behind the scenes.
695
00:40:22,378 --> 00:40:25,477
How did I develop this app?
696
00:40:25,477 --> 00:40:28,578
Panandâ is actually a hybrid app,
it's not native.
697
00:40:28,578 --> 00:40:30,745
Basically it's just a web app
packaged as a mobile app
698
00:40:30,745 --> 00:40:32,518
using Apache Cordova.
699
00:40:32,518 --> 00:40:34,026
That reduces development time
700
00:40:34,026 --> 00:40:36,181
because I don't have to learn
a different language.
701
00:40:36,181 --> 00:40:37,769
I know JavaScript, HTML.
702
00:40:37,879 --> 00:40:42,131
It's cross-platform, allows code reuse
from the Historical Markers Map.
703
00:40:42,385 --> 00:40:46,311
And the app is also free and open source,
under the MIT license.
704
00:40:46,311 --> 00:40:49,429
So there's the GitHub repository
over there.
705
00:40:50,469 --> 00:40:53,624
The challenge is
the app's data is not live.
706
00:40:54,750 --> 00:40:56,820
Because if you query the data live,
707
00:40:56,843 --> 00:41:00,638
it means you're pulling around half
a megabyte of compressed JSON every time
708
00:41:00,638 --> 00:41:03,594
which is not friendly
for those on mobile data,
709
00:41:03,594 --> 00:41:06,723
incurs too much delay when starting
the app,
710
00:41:06,723 --> 00:41:13,097
and if there are any errors in Wikidata,
that may result in poor user experience.
711
00:41:14,253 --> 00:41:18,046
So instead, what I did was
the app is updated every few months
712
00:41:18,046 --> 00:41:20,468
with fresh data, compiled using
a Perl script
713
00:41:20,468 --> 00:41:23,037
that queries Wikidata Query Service,
714
00:41:23,037 --> 00:41:25,678
and this script also does
some data validation
715
00:41:25,678 --> 00:41:30,944
to highlight consistency or schema errors,
which allows fixes before updates
716
00:41:30,944 --> 00:41:34,735
in order to provide a good experience
for the mobile user.
717
00:41:35,174 --> 00:41:39,274
And here... if you're tech-oriented,
here are, more or less,
718
00:41:39,274 --> 00:41:41,644
the technologies that I'm using.
719
00:41:41,644 --> 00:41:43,976
So a bunch of JavaScript libraries.
720
00:41:43,976 --> 00:41:46,287
Here's the Perl script
that queries Wikidata,
721
00:41:46,287 --> 00:41:48,598
some Cordova plug-ins,
722
00:41:48,598 --> 00:41:53,035
and building it using Cordova
and then publishing this app.
723
00:41:53,763 --> 00:41:55,586
And that's it.
724
00:41:55,748 --> 00:41:58,164
(audience clapping)
725
00:42:01,800 --> 00:42:04,072
(moderator) I hope you win.
Alright, questions.
726
00:42:16,286 --> 00:42:17,990
(audience 14) Sorry if I missed this.
727
00:42:17,990 --> 00:42:21,317
Are you opening your code
so the people can adapt your app
728
00:42:21,317 --> 00:42:24,501
and do it for other cities?
729
00:42:24,501 --> 00:42:28,516
Yes, as I've mentioned,
the app is free and open source,
730
00:42:28,516 --> 00:42:31,095
- (audience 14) But where is it?
- There's the GitHub repository.
731
00:42:31,095 --> 00:42:33,610
You can download the slides,
and there's a link
732
00:42:33,610 --> 00:42:36,841
in one of the previous slides
to the repository.
733
00:42:36,841 --> 00:42:38,732
(audience 14) Okay. Can you put it?
734
00:42:42,392 --> 00:42:43,747
Yeah, at the bottom.
735
00:42:46,577 --> 00:42:49,222
(audience 15) Hi. Sorry, maybe
I also missed this,
736
00:42:49,222 --> 00:42:51,628
but how do you check for a schema errors?
737
00:42:53,055 --> 00:42:56,007
Basically, we have a Wikiproject
on Wikidata,
738
00:42:56,106 --> 00:43:02,425
so we try to put the other guidelines
on how to model these markers correctly.
739
00:43:02,425 --> 00:43:05,190
Although it's not updated right now.
740
00:43:06,197 --> 00:43:09,023
As far as I know, we're the only country
741
00:43:09,023 --> 00:43:12,874
that's currently modeling these
in Wikidata.
742
00:43:13,930 --> 00:43:20,152
There's also an effort
to add [inaudible]
743
00:43:20,161 --> 00:43:22,411
in Wikidata,
744
00:43:22,474 --> 00:43:25,705
but I think that's
a different thing altogether.
745
00:43:34,056 --> 00:43:35,895
(audience 16) So I guess this may be part
746
00:43:35,895 --> 00:43:37,725
of this Wikiproject you just described,
747
00:43:37,725 --> 00:43:42,800
but for the consistency checks,
have you considered moving those
748
00:43:42,800 --> 00:43:46,743
into like complex schema constraints
that then can be flagged
749
00:43:46,743 --> 00:43:50,583
on the Wikidata side for
what there is to fix on there?
750
00:43:52,930 --> 00:43:55,547
I'm actually interested in seeing
if I can do, for example,
751
00:43:55,598 --> 00:44:00,296
shape expressions, so that, yeah,
we can do those things.
752
00:44:04,256 --> 00:44:06,776
(moderator) At this point,
we have quite a few minutes left.
753
00:44:06,776 --> 00:44:09,026
The speakers did very well,
so if Erica is okay with it,
754
00:44:09,026 --> 00:44:11,238
I'm also going to allow
some time for questions,
755
00:44:11,238 --> 00:44:13,407
still about this presentation,
but also about Mbabel,
756
00:44:13,407 --> 00:44:15,498
if anyone wants to jump in
with something there,
757
00:44:15,498 --> 00:44:17,318
either presentation is fair game.
758
00:44:22,790 --> 00:44:25,639
Unless like me, you're all so dazzled
that you just want to go to snacks
759
00:44:25,639 --> 00:44:27,955
and think about it.
(audience giggles)
760
00:44:29,308 --> 00:44:31,179
- (moderator) You know...
- Yeah.
761
00:44:31,953 --> 00:44:34,491
(audience 17) I will always have
questions about everything.
762
00:44:34,491 --> 00:44:37,642
So, I came in late for the Mbabel tool.
763
00:44:37,642 --> 00:44:40,350
But I was looking through
and I saw there's a number of templates,
764
00:44:40,350 --> 00:44:43,232
and I was wondering
if there's a place to contribute
765
00:44:43,232 --> 00:44:45,564
to adding more templates
for different types
766
00:44:45,564 --> 00:44:47,620
or different languages and the like?
767
00:44:50,497 --> 00:44:53,683
(Erica) So for now, we're developing
those narrative templates
768
00:44:53,683 --> 00:44:55,566
on Portuguese Wikipedia.
769
00:44:55,566 --> 00:44:57,856
I can show you if you like.
770
00:44:57,856 --> 00:45:02,051
We're inserting those templates
on English Wikipedia too.
771
00:45:02,051 --> 00:45:07,017
It's not complicated to do
but we have to expand for other languages.
772
00:45:07,017 --> 00:45:08,236
- French?
- French.
773
00:45:08,236 --> 00:45:10,465
- Yes.
- French and German already have.
774
00:45:10,465 --> 00:45:11,465
(laughing)
775
00:45:12,002 --> 00:45:13,018
Yeah.
776
00:45:15,755 --> 00:45:18,287
(inaudible chatter)
777
00:45:21,756 --> 00:45:24,446
(audience 18) I also have a question
about Mbabel,
778
00:45:24,446 --> 00:45:27,676
which is, is this really just templates?
779
00:45:27,676 --> 00:45:33,893
Is this based on Lua scripting?
Is that all? Wow. Okay.
780
00:45:33,956 --> 00:45:37,404
Yeah, so it's very deployable. Okay. Cool.
781
00:45:38,102 --> 00:45:40,199
(moderator) Just to catch that
for the live stream,
782
00:45:40,199 --> 00:45:42,745
the answer was an emphatic nod
of the head, and a yes.
783
00:45:42,915 --> 00:45:44,648
(audience laughing)
784
00:45:44,754 --> 00:45:47,203
- (Erica) Super simple.
- (moderator) Super simple.
785
00:45:47,745 --> 00:45:49,819
(audience 19) Yeah.
I would also like to ask.
786
00:45:49,819 --> 00:45:53,386
Sorry I haven't delved
into Mbabel earlier.
787
00:45:53,386 --> 00:45:57,018
I'm wondering, you're working also
with the links, the red links.
788
00:45:57,018 --> 00:46:00,052
Are you adding some code there?
789
00:46:03,987 --> 00:46:07,970
- (Erica) For the lists?
- Wherever the link comes from...
790
00:46:07,970 --> 00:46:11,595
(audience 19) The architecture.
Maybe I will have to look into it.
791
00:46:11,595 --> 00:46:13,355
(Erica) I'll show you later.
792
00:46:20,506 --> 00:46:23,221
(moderator) Alright. You're all ready
for snack break, I can tell.
793
00:46:23,221 --> 00:46:24,456
So let's wrap it up.
794
00:46:24,456 --> 00:46:26,429
But our kind speakers,
I'm sure will stick around
795
00:46:26,429 --> 00:46:27,958
if you have questions for them.
796
00:46:27,958 --> 00:46:31,179
Please join me in giving... first of all
we didn't give a round of applause yet.
797
00:46:31,179 --> 00:46:33,221
I can tell you're interested in doing so.
798
00:46:33,221 --> 00:46:34,886
(audience clapping)