-
Hi, guys! Can everybody hear me?
-
So, hi! Nice to meet you all.
I'm Erica Azzellini.
-
I'm one of the Wiki Movement
Brazil liaisons,
-
and this is my first international
Wikimedia event,
-
so I'm super excited to be here
and hopefully I
-
will share something interesting for you
all here in this lightning talk.
-
So this work starts with research
that I was developing in Brazil,
-
Computational Journalism
and Structured Narratives with Wikidata.
-
So in journalism,
-
they're using some natural language
generation software
-
for automating news
-
for news stories that have
quite similar narrative structures.
-
And we developed this concept here
of structured narratives,
-
thinking about this practice
on computational journalism,
-
that is the development of verbal text,
understandable by humans,
-
automated from predetermined
arrangements that process information
-
from structured databases,
which looks a lot like
-
the Wikimedia universe
and this tool that we developed.
-
So, when I'm talking about verbal text
understandable by humans,
-
I'm talking about Wikipedia entries.
-
When I'm talking about
structured databases,
-
of course, I'm talking about
Wikidata here.
-
And when I say predetermined arrangements,
I'm talking about Mbabel,
-
which is this tool.
-
The Mbabel tool was inspired by a template
by user Pharos, right here in front of me,
-
thank you very much,
-
and it was developed with Ederporto,
who is right here too,
-
the brilliant Ederporto.
-
We developed this tool
-
that automatically generates
Wikipedia entries
-
based on information from Wikidata.
-
We actually make some thematic templates
-
that are created with the Wikidata module,
-
the WikidataIB module,
-
and these templates are pre-determined,
generic and editable templates
-
for various article themes.
-
We realized that many Wikipedia entries
had a quite similar structured narrative
-
so we could create a tool
that automatically generates that
-
for many Wikidata items.
-
Until now we have templates for museums,
works of art, books, films,
-
journals, earthquakes, libraries,
archives,
-
and Brazilian municipal
and state elections, and growing.
-
So, everybody here is able to contribute
and create new templates.
-
Each narrative template includes
an introduction, Wikidata infobox,
-
section suggestions for the users,
-
content tables or lists with Listeria,
depending on the case,
-
references and categories,
and of course the sentences,
-
that are created
with the Wikidata information.
-
I'm gonna show you in a sec
an example of that.
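(For illustration only: a minimal Python sketch of the idea Erica describes, not the actual Mbabel wikitext/Lua code. The "museum" sentence patterns, and Q64 as the example item, are assumptions chosen for the demo; the real templates live on Wikipedia and use the WikidataIB module.)

```python
# Minimal sketch of a "structured narrative": fill sentence templates only
# for Wikidata properties that are actually present, so the richer the item,
# the longer the generated stub. Not the actual Mbabel implementation.
import requests

API = "https://www.wikidata.org/wiki/Special:EntityData/{}.json"

def entity(qid):
    return requests.get(API.format(qid), timeout=30).json()["entities"][qid]

def label(qid, lang="en"):
    return entity(qid)["labels"].get(lang, {}).get("value", qid)

def first_item_value(item, pid):
    # return the QID of the first item-valued statement for property pid, if any
    for claim in item.get("claims", {}).get(pid, []):
        snak = claim["mainsnak"]
        if snak.get("snaktype") == "value" and snak["datavalue"]["type"] == "wikibase-entityid":
            return snak["datavalue"]["value"]["id"]
    return None

# Hypothetical narrative template: (property, sentence pattern)
TEMPLATE = [
    ("P31", "{name} is a {value}."),       # instance of
    ("P17", "It is located in {value}."),  # country
]

def build_stub(qid, lang="en"):
    item = entity(qid)
    name = item["labels"].get(lang, {}).get("value", qid)
    sentences = []
    for pid, pattern in TEMPLATE:
        value_qid = first_item_value(item, pid)
        if value_qid:  # only filled properties become sentences
            sentences.append(pattern.format(name=name, value=label(value_qid, lang)))
    return " ".join(sentences)

print(build_stub("Q64"))  # Q64 (Berlin) used only as a well-known example item
```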
-
It's an integration between Wikipedia
and Wikidata,
-
so the more properties properly filled
on Wikidata,
-
the more text entries you'll get
on your article stub.
-
That's very important to highlight here.
-
Structuring this data on Wikidata
can get more complex
-
as I'm going to show you
on the election projects that we've made.
-
So I'm going to leave this
Wikidata Lab XIV here for you
-
after this lightning talk,
-
which is very brief,
so you'll be able to check out
-
the work that we've been doing
on structuring Wikidata
-
for this purpose too.
-
We have this challenge to build
a narrative template
-
that is generic enough
to cover different Wikidata items
-
and to overcome the gender
-
and number difficulties
of languages,
-
and still sound natural for the user,
-
because if it doesn't sound natural,
it doesn't click for the user
-
to edit after that.
-
This is what Mbabel looks like
in the form at the bottom.
-
You just have to insert the item number there
and call the desired template
-
and then you have an article to edit
and expand, and everything.
-
So, more importantly, why did we do it?
Not because it's cool to develop
-
things here in Wikidata,
we know, we all here know about that.
-
But we are experimenting with this integration
from Wikidata to Wikipedia
-
and we want to focus
on meaningful individual contributions.
-
So we've been working
on education programs
-
and we want the students to feel the value
-
of their entries too, but not only--
-
Oh, five minutes only,
Geez, I'm gonna rush here.
-
(laughing)
-
And we want to ease tasks
for users in general,
-
especially tables
and this kind of content
-
that are a bit of a chore to do.
-
And we're working on this concept
of Abstract Wikipedia.
-
Denny Vrandečić wrote a super interesting
article about it,
-
which I linked here too.
-
And we also want to now support
small language communities
-
to help fill the content gaps there.
-
This is an example of how we've been using
this Mbabel tool for GLAM
-
and education programs,
-
and I showed you earlier
the bottom form of the Mbabel tool
-
but we can also make red links
that aren't exactly empty.
-
So you click on this red link
-
and you automatically have
this article draft
-
on your user page to edit.
-
And I'm going to briefly talk about it
because I only have a few minutes left.
-
On educational projects,
-
we've been doing this with elections
in Brazil for journalism students.
-
We have the experience
with the [inaudible] students
-
with user Joalpe--
he's not here right now,
-
but we all know him, I think.
-
And we realized that we have the data
about Brazilian elections
-
but we don't have media coverage of it.
-
So we were also lacking
Wikipedia entries on it.
-
How do we insert this meaningful
information on Wikipedia
-
that people really access?
-
Next year we're going
to have some elections,
-
people are going to look for
this kind of information on Wikipedia
-
and they simply won't find it.
-
So this tool looks quite useful
for this purpose
-
and the students were introduced,
not only to Wikipedia,
-
but also to Wikidata.
-
Actually, they were introduced
to Wikipedia with Wikidata,
-
which is a super interesting experience,
and we had a lot of fun,
-
and it was quite challenging
to organize all that.
-
We can talk about it later too.
-
And they also added the background
and the analysis sections
-
on these election articles,
-
because we don't want them
to just simply automate the content there.
-
We can do better.
-
So this is the example
I'm going to show you.
-
This is from a municipal election
in Brazil.
-
Two minutes... oh my!
-
This example here was entirely created
with the Mbabel tool.
-
You have here this introduction text.
It really sounds natural for the reader.
-
The Wikidata infobox here--
-
it's a masterpiece
of Ederporto right there.
-
(laughter)
-
And we have here the tables with the
election results for each position.
-
And we also have these results here
in textual form too,
-
so it really looks like an article
that was made, that was handcrafted.
-
The references here were also made
with the Mbabel tool
-
and we used identifiers
to build these references here
-
and the categories too.
-
So, to wrap things up here,
it is still a work in progress,
-
and we have some outreach
and technical challenges
-
in bringing Mbabel
to other language communities,
-
especially the smaller ones,
-
and in how we support those tools
-
in lower-resource
language communities too.
-
And finally, is it possible
to create an Mbabel
-
that overcomes language barriers?
-
I think that's a very interesting
question for the conference
-
and hopefully we can figure
that out together.
-
So, thank you very much,
and look for the Mbabel poster downstairs
-
if you'd like to have all this information
wrapped up, okay?
-
Thank you.
-
(audience clapping)
-
(moderator) I'm afraid
we're a little too short for questions
-
but yes, Erica, as she said,
has a poster and is very friendly.
-
So I'm sure you can talk to her
afterwards,
-
and if there's time at the end,
I'll allow it.
-
But in the meantime,
I'd like to bring up our next speaker...
-
Thank you.
-
(audience chattering)
-
Next we've got Yolanda Gil,
talking about Wikidata and Geosciences.
-
Thank you.
-
I come from the University
of Southern California
-
and I've been working with
Semantic Technologies for a long time.
-
I want to talk about geosciences
in particular,
-
where this idea of crowd-sourcing
from the community is very important.
-
So I'll give you a sense
that individual scientists,
-
most of them in colleges,
-
collect their own data
for their particular project.
-
They describe it in their own way.
-
They use their own properties,
their own metadata characteristics.
-
This is an example
of some collaborators of mine
-
that collect data from a river.
-
They have their own sensors,
their own robots,
-
and they study the water quality.
-
I'm going to talk today about an effort
that we did to crowdsource metadata
-
for a community that works
in paleoclimate.
-
The article just came out
so it's in the slides if you're curious,
-
but it's a pretty large community
that works together
-
to integrate data more efficiently
through crowdsourcing.
-
So, if you've heard of the
hockey stick graphs for climate,
-
this is the community that does this.
-
This is a study for climate
in the last 200 years,
-
and it takes them literally many years
to look at data
-
from different parts of the globe.
-
Each dataset is collected by
a different investigator.
-
The data is very, very different,
-
so it takes them a long time
to put together
-
these global studies of climate,
-
and our goal is to make that
more efficient.
-
So, I've done a lot of work
over the years.
-
Going back to 2005, we used to call it,
-
"Knowledge Collection from Web Volunteers"
-
or from netizens at that time.
-
We had a system called "Learner."
-
It collected 700,000 common sense,
-
common knowledge statements
about the world.
-
We used a lot of different techniques.
-
The forms that we built
to extract knowledge from volunteers
-
really fit the knowledge models,
the data models that we used
-
and the properties that we wanted to use.
-
I worked with Denny
on a system called "Shortipedia"
-
when he was a postdoc at ISI,
-
looking at keeping track
of the provenance of the assertions,
-
and we started to build
on the Semantic MediaWiki software.
-
So everything that
I'm going to describe today
-
builds on that software,
-
but I think that, now that we have Wikibase,
-
we'll be starting to work more
on Wikibase.
-
So LinkedEarth is the project
where we work with paleoclimate scientists
-
to crowdsource the metadata,
-
and as you can see in the title, we call it
"controlled crowdsourcing."
-
So we found a nice niche
-
where we could let them create
new properties
-
but we had an editorial process for it.
-
So I'll describe to you how it works.
-
For them, if you're looking at a sample
from lake sediments from 200 years ago,
-
you use different properties
to describe it
-
than if you have coral sediments
that you're looking at
-
or coral samples that you're looking at
that you extract from the ocean.
-
Palmyra is a coral atoll in the Pacific.
-
So if you have coral, you care
about the species and the genus,
-
but if you're just looking at lake sand,
you don't have that.
-
So each type of sample
has very different properties.
-
In LinkedEarth,
they're able to see in a map
-
where the datasets are.
-
They actually annotate their own datasets
or the datasets of other researchers
-
when they're using it.
-
So they have a reason
why they want certain properties
-
to describe those datasets.
-
Whenever there are disagreements,
or whenever there are agreements,
-
there are community discussions
about them,
-
and there are also polls to decide
which properties to settle on.
-
So it's a nice ecosystem.
I'll give you examples.
-
You look at a particular dataset,
in this case it's a lake in Africa.
-
So you have the category of the page;
it can be a dataset,
-
it can be other things.
-
You can download the dataset itself
and you have kind of canonical properties
-
that they have all agreed to have
for datasets,
-
and then under Extra Information,
-
those are properties
that the person describing this dataset
-
added of their own accord.
-
So these can be new properties.
-
We call them "crowd properties,"
rather than "core properties."
-
And then when you're describing
your dataset,
-
in this case
it's an ice core that you got
-
from a glacier,
-
and you're adding a dataset
and want to talk about measurements,
-
you have an offering
of all the existing properties
-
that match what you're saying.
-
So we do this search completion
so that you can adopt that.
-
That promotes normalization.
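(For illustration: a tiny Python sketch of the search-completion idea, with made-up property names; not the LinkedEarth code. Core properties are offered first so annotators reuse the agreed vocabulary instead of inventing near-duplicates.)

```python
# Suggest existing properties (core first, then crowd) that match what the
# annotator is typing; reuse of names is what "promotes normalization" here.
CORE_PROPERTIES = {"archiveType", "proxyVariable", "collectedFrom"}   # assumed names
CROWD_PROPERTIES = {"coralSpecies", "coralGenus", "sedimentColor"}    # assumed names

def suggest(fragment, limit=5):
    fragment = fragment.lower()
    core = sorted(p for p in CORE_PROPERTIES if fragment in p.lower())
    crowd = sorted(p for p in CROWD_PROPERTIES if fragment in p.lower())
    return (core + crowd)[:limit]

print(suggest("coral"))    # -> ['coralGenus', 'coralSpecies']
print(suggest("archive"))  # -> ['archiveType']
```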
-
The core of the properties
has been agreed by the community
-
so we're really extending that core.
-
And that core is very important
because it gives structure
-
to all the extensions.
-
We engage the community
through many different ways.
-
We had one face-to-face meeting
at the beginning
-
and after about a year and a half,
we do have a new standard,
-
and a new way for them
to continue to evolve that standard.
-
They have editors, very much
in the Wikipedia style
-
of editorial boards.
-
They have working groups
for different types of data.
-
They do polls with the community,
-
and they have pretty nice engagement
of the community at large,
-
even if they've never visited our Wiki.
-
The metadata evolves
-
so what we do is that people annotate
their datasets,
-
then the schema evolves,
the properties evolve
-
and we have an entire infrastructure
and mechanisms
-
to re-annotate the datasets
with the new structure of the ontology
-
and the new properties.
-
This is described in the paper.
I won't go into the details.
-
But I think that
having that kind of capability
-
in Wikibase would be really interesting.
-
We basically extended
Semantic MediaWiki and MediaWiki
-
to create our own infrastructure.
-
I think a lot of this is now something
that we find in Wikibase,
-
but this is older than that.
-
And in general, we have many projects
where we look at crowdsourcing
-
not just descriptions of datasets
but also descriptions of hydrology models,
-
descriptions of multi-step
data analytic workflows
-
and many other things in the sciences.
-
So we are also interested in including
in Wikidata additional things
-
that are not just datasets or entities
-
but also other things
that have to do with science.
-
I think Geosciences are more complex
in this sense than Biology, for example.
-
That's it.
-
Thank you.
(audience clapping)
-
- Do I have time for questions?
- Yes.
-
(moderator) We have time
for just a couple of short questions.
-
When answering,
can you go back to the microphone?
-
- Yes.
- Hopefully, yeah.
-
(audience 1) Does the structure allow
tabular datasets to be described
-
and can you talk a bit about that?
-
Yes. So the properties of the datasets
talk more about who collected them,
-
what kind of data was collected,
what kind of sample it was,
-
and then there's a separate standard
which is called "LiPD"
-
that's complementary and mapped
to the properties
-
that describes the format
of the actual files
-
and the actual structure of the data.
-
So, you're right that there's both,
"how do I find data about x"
-
but also, "Now, how do I use it?
-
How do I know where
the temperature that I'm looking for
-
is actually in the file?"
-
(moderator) This will be the last.
-
(audience 2) I'll have
to make it relevant.
-
So, you have shown this process
of how users can suggest
-
or like actually already put in
properties,
-
and I didn't fully understand
how this thing works,
-
or what's the process behind it.
-
Is there some kind of
folksonomy approach--obviously--
-
but how is it promoted
into the core vocabulary
-
if something is promoted?
-
Yes, yes. It is.
-
So what we do is we have a core ontology
and the initial one was actually
-
very thoughtfully put together
through a lot of discussion
-
by very few people.
-
And then the idea was
the whole community can extend that
-
or propose changes to that.
-
So, as they are describing datasets,
they can add new properties
-
and those become "crowd properties."
-
And every now and then,
the Editorial Committee
-
looks at all of those properties,
-
the working groups look at all of those
crowd properties,
-
and decide whether to incorporate them
into the main ontology.
-
So it could be because they're used
for a lot of dataset descriptions.
-
It could be because
they are proposed by somebody
-
and they're found to be really interesting
or key, or uncontroversial.
-
So there's an entire editorial process
to incorporate those new crowd properties
-
or the folksonomy part of it,
-
but they are really built around the core
of the ontology.
-
The core ontology then grows
with more crowd properties
-
and then people propose
additional crowd properties again.
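(For illustration: one promotion signal mentioned in this answer, usage counts, sketched in Python with made-up annotations; the real decision also involves proposals and working-group review.)

```python
# Flag crowd properties used in at least `min_uses` dataset descriptions
# as candidates for promotion into the core ontology.
from collections import Counter

annotations = {  # hypothetical: dataset id -> crowd properties used
    "lake-africa-01":   {"sedimentColor", "coringDevice"},
    "coral-palmyra-07": {"coralSpecies", "coringDevice"},
    "icecore-andes-03": {"coringDevice"},
}

def promotion_candidates(annotations, min_uses=2):
    usage = Counter(p for props in annotations.values() for p in props)
    return [prop for prop, n in usage.most_common() if n >= min_uses]

print(promotion_candidates(annotations))  # -> ['coringDevice']
```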
-
So we've gone through a couple
of these iterations
-
of rolling out a new core,
and then extending it,
-
and then rolling out a new core
and then extending it.
-
- (audience 2) Great. Thank you.
- Thanks.
-
(moderator) Thank you.
(audience applauding)
-
(moderator) Thank you, Yolanda.
-
And now we have Adam Shorland
with "Something About Wikibase,"
-
according to the title.
-
Uh... where's the internet? There it is.
-
So, I'm going to do a live demo,
which is probably a bad idea
-
but I'm going to try and do it
as the birthday present later
-
so I figure I might as well try it here.
-
And I also have some notes on my phone
because I have no slides.
-
So, two years ago,
I made these Wikibase Docker images
-
that quite a few people have tried out,
-
and even before then,
I was working on another project,
-
which is kind of ready now,
and here it is.
-
It's a website that allows you
to instantly create a Wikibase
-
with a query service and quick statements,
-
without needing to know about
any of the technical details,
-
without needing to manage
any of them either.
-
There are still lots of features to go
and there's still some bugs,
-
but here goes the demo.
-
Let me get my emails up ready...
because I need them too...
-
Da da da... Stopwatch.
-
Okay.
-
So it's as simple as...
at the moment it's locked down behind...
-
Oh no! German keyboard!
-
(audience laughing)
-
Foiled... okay.
-
Okay.
-
(audience continues to laugh)
-
Aha! Okay.
-
I'll remember that for later.
(laughs)
-
Yes.
-
♪ (humming) ♪
-
Oh my god... now it's American.
-
All you have to do is create an account...
-
da da da...
-
Click this button up here...
-
Come up with a name for Wiki--
"Demo1"
-
"Demo1"
-
"Demo user"
-
Agree to the terms
which don't really exist yet.
-
(audience laughing)
-
Click on this thing which isn't a link.
-
And then you have your Wikibase.
-
(audience cheers and claps)
-
"Anmelden"... that's "log in" in German.
-
Demo... oh god! I'm learning lots about
my demo later.
-
1-6-1-4-S-G...
-
- (audience 3) Y...
- (Adam) It's random.
-
(audience laughing)
-
Oh, come on....
(audience laughing)
-
Oh no. It's because this is a capital U...
-
(audience chattering)
-
6-1-4....
-
S-G-ENJ...
-
Is J... oh no. That's... oh yeah. Okay.
-
I'm really... I'm gonna have to look
at the laptop
-
that I'm doing this on later.
-
Cool...
-
Da da da da da...
-
Maybe I should have some things
in my clipboard ready.
-
Okay, so now I'm logged in.
-
Oh... keyboards.
-
So you can go and create an item...
-
Yeah, maybe I should make a video.
It might be easier.
-
So, yeah. You can make items,
you have quick statements here
-
that have... oh... it is all in German.
-
(audience laughing)
-
(sighs)
-
Oh, log in? Log in?
-
It has... Oh, set up ready.
-
Da da da...
-
It's as easy as...
-
I learned how to use
Quick Statements yesterday...
-
that's what I know how to do.
-
I can then go back to the Wiki...
-
We can go and see in Recent Changes
-
that there are now two items,
the one that I made
-
and the one from Quick Statements...
-
and then you go to Quick...
-
♪ (hums a tune) ♪
-
Stop...no...
-
No...
-
(audience laughing)
-
Oh god...
-
I'm glad I tried this out in advance.
-
There you go.
And the query service is updated.
-
(audience clapping)
-
And the idea of this is it'll allow
people to try out Wikibases.
-
Hopefully, it'll even be able
to allow people to...
-
have their real Wikibases here.
-
At the moment you can create
as many as you want
-
and they all just appear
in this lovely list.
-
As I said, there's lots of bugs
but it's all super quick.
-
Exactly how this is going to continue
in the future, we don't know yet
-
because I only finished writing this
in the last few days.
-
It's currently behind an invitation code
so that if you want to come try it out,
-
come and talk to me.
-
And if you have any other comments
or thoughts, let me know.
-
Oh, three minutes...40. That's...
That's not that bad.
-
Thanks.
-
(audience clapping)
-
Any questions?
-
(audience 5) Are the Quick Statements
and the Query Service
-
automatically updated?
-
Yes. So the idea is that
there will be somebody,
-
at the moment, me,
-
maintaining all of the horrible stuff
-
that you don't have to behind the scenes.
-
So kind of think of it like GitHub.com,
-
but you don't have to know anything
about Git to use it. It's just all there.
-
- [inaudible]
- Yeah, we'll get that.
-
But any of those
big hosted solution things.
-
- (audience 6) A feature request.
- Yes.
-
Is there any-- In Scope
-
do you have plans on making it
so you can easily import existing...
-
- Wikidata...
- I have loads of plans.
-
Like I want there to be a button
where you can just import
-
another whole Wikibase and all of--yeah.
-
That will be on the future list
that's really long. Yeah.
-
(audience 7) I understand that it's...
you want to make it user-friendly
-
but if I want to access
to the machine itself, can I do that?
-
Nope.
(audience laughing)
-
So again, like, in the longer term future,
there are possib...
-
Everything's possible,
but at the moment, no.
-
(audience 8) Two questions.
Is there a plan to have export tools
-
so that you can export it
to your own Wikibase maybe at some point?
-
- Yes.
- Great.
-
And is this a business?
-
I have no idea.
(audience laughing)
-
Not currently.
-
(audience 9) What if I stop
using it tomorrow,
-
how long will the data be there?
-
So my plan was at the end of WikidataCon
I was going to delete all of the data
-
and there's a Wikibase Workshop
on a Sunday,
-
and we will maybe be using this
for the Wikibase workshop
-
so that everyone can have
their own Wikibase.
-
And then, from that point,
I probably won't be deleting the data
-
so it will all just stay there.
-
(moderator) Question.
-
(audience 10) It's two minutes...
-
Alright, fine. I'll allow two more
questions if you talk quickly.
-
(audience laughing)
-
- Alright, good people.
- Thank you, Adam.
-
Thank you for letting me test
my demo... I mean...
-
I'm going to do it different.
(audience clapping)
-
(moderator) Thank you.
-
Now we have Dennis Diefenbach
presenting QAnswer.
-
Hello, I'm Dennis Diefenbach,
I would like to present QAnswer
-
which is a question-answering system
on top of Wikidata.
-
So, what we need are some questions
and this is the interface of QAnswer.
-
For example, where is WikidataCon?
-
Alright, I think it's written like this.
-
2019... And we get this response
which is Berlin.
-
So, other questions. For example,
"When did Wikidata start?"
-
It started on 30 October 2012,
so its birthday is approaching.
-
It is 6 years old,
so it will be its 7th birthday.
-
Who is developing Wikidata?
-
The Wikimedia Foundation
and Wikimedia Deutschland,
-
so thank you very much to them.
-
Something like museums in Berlin...
I don't know why this is not so...
-
Only one museum... no, yeah, a few more.
-
So, when you ask something like this,
-
we allow the user
to explore the information
-
with different aggregations.
-
For example,
if there are many geo coordinates
-
attached to the entities,
we will display a map.
-
If there are many images attached to them,
we will display the images,
-
and otherwise there is a list
where you can explore
-
the different entities.
-
You can ask something like
"Who is the mayor of Berlin,"
-
"Give me politicians born in Berlin,"
and things like this.
-
So you can both ask keyword questions
and full natural language questions.
-
The whole data is coming from Wikidata
-
so all entities which are in Wikidata
are queryable by this service.
-
And the data is really all from Wikidata
-
in the sense,
there are some Wikipedia snippets,
-
there are images from Wikimedia Commons,
-
but the rest is all Wikidata data.
-
We can do this in several languages.
This is now in Chinese.
-
I don't know what is written there
so do not ask me.
-
We are currently supporting these languages
with more or less good quality
-
because... yeah.
-
So, how can this be useful
for the Wikidata community?
-
I think there are different reasons.
-
First of all, this thing helps you
to generate SPARQL queries
-
and I know there are even some workshops
about how to use SPARQL.
-
It's not a language that everyone speaks.
-
So, if you ask something like
"a philosopher born before 1908,"
-
figuring out how to construct
a SPARQL query for this could be tricky.
-
In fact when you ask a question,
we generate many SPARQL queries
-
and the first one is always the one,
the SPARQL query that we think
-
is the right one.
-
So, if you ask your question
and then you go to the SPARQL list,
-
then there is this button
for the Wikidata Query Service
-
and you have the SPARQL query right there
and you will get the same result
-
as you would get in the interface.
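(For illustration: the kind of SPARQL a question like "museums in Berlin" maps to, run directly against the public Wikidata Query Service in Python. The query is a plausible hand-written equivalent, not necessarily the exact one QAnswer generates.)

```python
import requests

QUERY = """
SELECT ?museum ?museumLabel WHERE {
  ?museum wdt:P31/wdt:P279* wd:Q33506 ;  # instance of (a subclass of) museum
          wdt:P131 wd:Q64 .              # located in Berlin (simplified; a real
                                         # query might follow P131 transitively)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "qa-sparql-example/0.1"},
    timeout=60,
)
for row in response.json()["results"]["bindings"]:
    print(row["museumLabel"]["value"])
```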
-
Another thing it could be useful for
-
is finding missing
contextual information.
-
For example, if you ask for actors
in "The Lord of the Rings,"
-
most of these entities
will have associated an image
-
but not all of them.
-
So here there is some missing metadata
that could be added.
-
You could go to this entity and add an image,
-
since you can see
that there is an image missing, and so on.
-
Another thing is that you could find
schema issues.
-
For example, if you ask
"books by Andrea Camilleri,"
-
which is a famous Italian writer,
-
you would currently get
these three books.
-
But he wrote many more.
He wrote more than 50.
-
And so the question is,
are they not in Wikidata
-
or is the knowledge maybe
not modeled correctly as it currently is?
-
And in this case, I know
there is another book from him,
-
which is "Un mese con Montalbano."
-
It has only an Italian label
so you can only search it in Italian.
-
And if you go to this entity,
you will see that he has written it.
-
It's a short story by Andrea Camilleri
and it's an instance of literary work,
-
but it's not instance of book
-
so that's the reason why
it doesn't appear.
-
This is a way to track
where things are missing
-
or where the Wikidata model is
-
not as you would expect.
-
Another reason is just to have fun.
-
I imagine that many of you added
many Wikidata entities
-
so just search for the ones
that you care about most
-
or you have edited yourself.
-
So in this case, who developed
QAnswer, and that's it.
-
For any other questions,
go to www.QAnswer.eu/qa
-
and hopefully we'll find
an answer for you.
-
(audience clapping)
-
- Sorry.
- I'm just the dumbest person here.
-
(audience 11) So I want to know
how is this kind of agnostic
-
to Wikibase instance,
-
or has it been tied to the exact
like property numbers
-
and things in Wikidata?
-
Has it learned in some way
or how was it set up?
-
There is training data
and we rely on training data
-
and in most cases this is also
why you will not get good results.
-
But we're training the system
with simple yes and no answers.
-
When you ask a question,
and we ask always for feedback, yes or no,
-
and this feedback is used by
the machine learning algorithm.
-
This is where machine learning
comes into play.
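(For illustration: a generic sketch of learning from yes/no feedback, not QAnswer's actual algorithm. Each candidate interpretation of a question gets some features, and a tiny logistic model is nudged toward the interpretations users confirmed; the feature names are invented.)

```python
import math

def predict(weights, features):
    z = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def update(weights, features, answered_yes, lr=0.1):
    # one step of logistic-regression gradient ascent on a single yes/no label
    error = (1.0 if answered_yes else 0.0) - predict(weights, features)
    for f, v in features.items():
        weights[f] = weights.get(f, 0.0) + lr * error * v
    return weights

weights = {}
candidate_features = {"matched_entity_label": 1.0, "property_popularity": 0.7}
weights = update(weights, candidate_features, answered_yes=True)  # user clicked "yes"
print(weights)
```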
-
But basically, we put up separate
Wikibase instances
-
and we can plug this in.
-
In fact, the system is agnostic
in the sense that it only wants RDF.
-
And RDF, you have in each Wikibase,
-
there are a few configurations
-
but you can have this on top
of any Wikibase.
-
(audience 11) Awesome.
-
(audience 12) You mentioned that
it's being trained by yes/no answers.
-
So I guess this is assuming that
the Wikidata instance is free of errors
-
or is it also...?
-
You assume that the Wikidata instances...
-
(audience 12) I guess I'm asking, like,
are you distinguishing
-
between source level errors
or misunderstanding the question
-
versus a bad mapping, etc.?
-
Generally, we assume that the data
in Wikidata is true.
-
So if you click "no"
and the data in Wikidata would be false,
-
then yeah... we would not catch
this difference.
-
But sincerely, Wikidata quality
is very good,
-
so I rarely have had this problem.
-
(audience 12) Is this data available
as a dataset by any chance, sir?
-
- What is... direct service?
- The... dataset of...
-
"is this answer correct
versus the query versus the answer?"
-
Is that something you're publishing
as part of this?
-
- The training data that you've...
- We published the training data.
-
We published some old training data
but no, just a--
-
There is a question there.
I don't know if we have still time.
-
(audience 13) Maybe I just missed this
but is it running on a live,
-
like the Live Query Service,
-
or is it running on
some static dump you loaded
-
or where is the data source
for Wikidata?
-
Yes. The problem is
to apply this technology,
-
you need a local dump.
-
Because we do not rely only
on the SPARQL endpoint,
-
we rely on special indexes.
-
So, we are currently loading
the Wikidata dump.
-
We are updating this every two weeks.
-
We would like to do it more often,
-
in fact we would like to get the diffs
for each day, for example,
-
to put them in our index.
-
But unfortunately, right now,
the Wikidata dumps are released
-
only once every week.
-
So, we cannot be faster than that
and we also need some time
-
to re-index the data,
so it takes one or two days.
-
So we are always behind. Yeah.
-
(moderator) Any more?
-
- Okay, thank you very much.
- Thank you all very much.
-
(audience clapping)
-
(moderator) And now last, we have
Eugene Alvin Villar,
-
talking about Panandâ.
-
Good afternoon,
my name is Eugene Alvin Villar
-
and I'm from the Philippines,
and I'll be talking about Panandâ:
-
a mobile app powered by Wikidata.
-
This is a follow-up to my lightning talk
that I presented two years ago
-
at WikidataCon 2017
together with Carlo Moskito.
-
You can download the slides
-
and there's a link
to that presentation there.
-
I'll give you a bit of a background.
-
Wiki Society of the Philippines,
formerly Wikimedia Philippines,
-
had a series of projects related
to Philippine heritage and history.
-
So we have the usual photo contests,
Wikipedia Takes Manila,
-
Wiki Loves Monuments,
-
and then our major project
was the Cultural Heritage Mapping Project
-
back in 2014-2015.
-
In that project, we trained volunteers
to edit articles
-
related to cultural heritage.
-
This is the biggest
and most successful project that we've had.
-
794 articles were created or improved,
including 37 "Did You Knows"
-
and 4 "Good Articles,"
-
and more than 5,000 images were uploaded
to Commons.
-
As a result of that, we then launched
-
the Encyclopedia
of Philippine Heritage program
-
in order to expand the scope
and also include Wikidata in the scope.
-
Here's the Core Team: myself,
Carlo and Roel.
-
Our first pilot project was to document
the country's historical markers
-
in Wikidata and Commons,
-
starting with those created by
our national historical agency, the NHCP.
-
For example, they installed a marker
for our national hero, here in Berlin,
-
so there's now a Wikidata page
for that marker
-
and a collection of photos of that marker
in Commons.
-
Unfortunately, the government agency
does not keep a good database
-
up-to-date or complete of their markers,
-
so we have to painstakingly input these
to Wikidata manually.
-
After careful research and confirmation,
here's a graph of the number of markers
-
that we've added to Wikidata over time,
over the past three years.
-
And we've developed
this Historical Markers Map web app
-
that lets users view
these markers on a map,
-
so we can browse it as a list,
view a good visualization of the markers
-
with information and inscriptions.
-
All of this is powered by Live Query
from Wikidata Query Service.
-
There's the link
if you want to play around with it.
-
And so we developed
a mobile app for this one.
-
To better publicize our project,
I developed the Panandâ
-
which is Tagalog for "marker",
as an Android app,
-
that was published back in 2018,
-
and I'll publish the iOS version
sometime in the future, hopefully.
-
I'd like to demo the app
but we have no time,
-
so here are some
of the features of the app.
-
There's a Map and a List view,
with text search,
-
so you can drill down as needed.
-
You can filter by region or by distance,
-
and whether you have marked
these markers,
-
as either you have visited them
or you'd like to bookmark them
-
for future visits.
-
Then you can use your GPS
on your mobile phone
-
to use for distance filtering.
-
For example, if I want markers
that are near me, you can do that.
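(For illustration: the distance filter described here, sketched in Python with made-up marker records; the app itself is JavaScript, so this only shows the idea.)

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance between two points, in kilometres
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def markers_near(markers, my_lat, my_lon, max_km=5.0):
    return [m for m in markers
            if haversine_km(my_lat, my_lon, m["lat"], m["lon"]) <= max_km]

markers = [{"name": "Example marker", "lat": 14.5832, "lon": 120.9794}]  # made-up record
print(markers_near(markers, 14.58, 120.98))
```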
-
And when you click on the Details page,
you can see the same thing,
-
photos from Commons,
inscription about the marker,
-
how to find the marker,
its location and address, etc.
-
And one thing that's unique for this app
is you can, again, visit
-
or put a bookmark of these,
so on the map or on the list,
-
or on the Details page,
-
you can just tap on those buttons
and say that you've visited them,
-
or you'd like to bookmark them
for future visits.
-
And my app has been covered by the press
and given recognition,
-
so plenty of local press articles.
-
Recently, it was selected
as one of the Top 5 finalists
-
for the Android Masters competition
in the App for Social Good category.
-
The final event will be next month.
-
Hopefully, we'll win.
-
Okay, so some behind the scenes.
-
How did I develop this app?
-
Panandâ is actually a hybrid app,
it's not native.
-
Basically it's just a web app
packaged as a mobile app
-
using Apache Cordova.
-
That reduces development time
-
because I don't have to learn
a different language.
-
I know JavaScript, HTML.
-
It's cross-platform, allows code reuse
from the Historical Markers Map.
-
And the app is also free and open source,
under the MIT license.
-
So there's the GitHub repository
over there.
-
The challenge is
the app's data is not live.
-
Because if you query the data live,
-
it means pulling around half
a megabyte of compressed JSON every time,
-
which is not friendly
for those on mobile data,
-
incurs too much delay when starting
the app,
-
and if there are any errors in Wikidata,
that may result in poor user experience.
-
So instead, what I did was
the app is updated every few months
-
with fresh data, compiled using
a Perl script
-
that queries Wikidata Query Service,
-
and this script also does
some data validation
-
to highlight consistency or schema errors,
so that allows fixes before updates
-
in order to provide a good experience
for the mobile user.
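(For illustration: a Python analogue of the validation step, since the talk mentions a Perl script but does not show it. The field names and sample records are assumptions; the point is flagging incomplete markers before the data is baked into the app.)

```python
REQUIRED = ("label", "lat", "lon", "inscription", "image")

def validation_report(markers):
    # return (id, missing fields) for every record with gaps
    problems = []
    for m in markers:
        missing = [f for f in REQUIRED if not m.get(f)]
        if missing:
            problems.append((m.get("qid", "?"), missing))
    return problems

sample = [
    {"qid": "Qxxx1", "label": "Marker A", "lat": 14.6, "lon": 121.0,
     "inscription": "...", "image": "a.jpg"},
    {"qid": "Qxxx2", "label": "Marker B", "lat": None, "lon": 121.0,
     "inscription": "", "image": "b.jpg"},
]
print(validation_report(sample))  # -> [('Qxxx2', ['lat', 'inscription'])]
```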
-
And here's the... if you're tech-oriented,
here are, more or less,
-
the technologies that I'm using.
-
So a bunch of JavaScript libraries.
-
Here's the Perl script
that queries Wikidata,
-
some Cordova plug-ins,
-
and building it using Cordova
and then publishing this app.
-
And that's it.
-
(audience clapping)
-
(moderator) I hope you win.
Alright, questions.
-
(audience 14) Sorry if I missed this.
-
Are you opening your code
so the people can adapt your app
-
and do it for other cities?
-
Yes, as I've mentioned,
the app is free and open source,
-
- (audience 14) But where is it?
- There's the GitHub repository.
-
You can download the slides,
and there's a link
-
in one of the previous slides
to the repository.
-
(audience 14) Okay. Can you put it?
-
Yeah, at the bottom.
-
(audience 15) Hi. Sorry, maybe
I also missed this,
-
but how do you check for schema errors?
-
Basically, we have a Wikiproject
on Wikidata,
-
so we try to put there the guidelines
on how to model these markers correctly.
-
Although it's not updated right now.
-
As far as I know, we're the only country
-
that's currently modeling these
in Wikidata.
-
There's also an effort
to add [inaudible]
-
in Wikidata,
-
but I think that's
a different thing altogether.
-
(audience 16) So I guess this may be part
-
of this Wikiproject you just described,
-
but for the consistency checks,
have you considered moving those
-
into like complex schema constraints
that then can be flagged
-
on the Wikidata side for
what there is to fix on there?
-
I'm actually interested in seeing
if I can do, for example,
-
shape expressions, so that, yeah,
we can do those things.
-
(moderator) At this point,
we have quite a few minutes left.
-
The speakers did very well,
so if Erica is okay with it,
-
I'm also going to allow
some time for questions,
-
still about this presentation,
but also about Mbabel,
-
if anyone wants to jump in
with something there,
-
either presentation is fair game.
-
Unless like me, you're all so dazzled
that you just want to go to snacks
-
and think about it.
(audience giggles)
-
- (moderator) You know...
- Yeah.
-
(audience 17) I will always have
questions about everything.
-
So, I came in late for the Mbabel tool.
-
But I was looking through
and I saw there's a number of templates,
-
and I was wondering
if there's a place to contribute
-
to adding more templates
for different types
-
or different languages and the like?
-
(Erica) So for now, we're developing
those narrative templates
-
on Portuguese Wikipedia.
-
I can show you if you like.
-
We're inserting those templates
on English Wikipedia too.
-
It's not complicated to do
but we have to expand to other languages.
-
- French?
- French.
-
- Yes.
- French and German already have.
-
(laughing)
-
Yeah.
-
(inaudible chatter)
-
(audience 18) I also have a question
about Mbabel,
-
which is, is this really just templates?
-
Is this based on the Lua scripting?
Is that all? Wow. Okay.
-
Yeah, so it's very deployable. Okay. Cool.
-
(moderator) Just to catch that
for the live stream,
-
the answer was an emphatic nod
of the head, and a yes.
-
(audience laughing)
-
- (Erica) Super simple.
- (moderator) Super simple.
-
(audience 19) Yeah.
I would also like to ask.
-
Sorry I haven't delved
into Mbabel earlier.
-
I'm wondering, you're working also
with the links, the red links.
-
Are you adding some code there?
-
- (Erica) For the lists?
- Wherever the link comes from...
-
(audience 19) The architecture.
Maybe I will have to look into it.
-
(Erica) I'll show you later.
-
(moderator) Alright. You're all ready
for snack break, I can tell.
-
So let's wrap it up.
-
But our kind speakers,
I'm sure will stick around
-
if you have questions for them.
-
Please join me in giving... first of all
we didn't give a round of applause yet.
-
I can tell you're interested in doing so.
-
(audience clapping)