-
(Dan) Hello everyone.
-
So this session is about teaching SPARQL.
-
The presenter is Martin Poulter,
so I leave you the stage.
-
Have fun.
-
(Martin) Thank you very much.
-
Hi, everybody.
-
I trust you'll agree
that Wikidata is great,
-
it has lots of interesting data
on different topics,
-
the tools people make with it
are fun to use and fun to explore,
-
and easy to use.
-
And maybe you'll agree with the suggestion
that to get the best out of Wikidata
-
you need to know SPARQL,
-
you need to be able to phrase
your own queries.
-
So you might see that
as a barrier, an obstacle,
-
that we ideally need a big program
of training for developers,
-
for librarians, for curators,
for ordinary people
-
to get them literate in this language,
and that's a big effort,
-
an aspect of Wikidata outreach.
-
My suggestion is to kind of
turn that around,
-
that Wikidata,
especially the Query Service,
-
because it's so helpful,
because it's so full of good stuff,
-
because it's so colorful,
-
because it has so many
visualization abilities,
-
is the ideal platform
for people to learn SPARQL,
-
also to learn about databases,
-
learn about knowledge representation,
-
learn about data and computers.
-
There's no necessity
that someone's first encounter
-
with data and computers,
has to be a relational database system.
-
So I'm going to put forward,
-
I'm going to report on
a training workshop
-
I've delivered to library staff
at the University of Oxford,
-
and I've also done as a public event,
-
so just with members of the public
coming to an open data week
-
that the university hosted.
-
And also done some of this
with researchers as well.
-
So I teach in a way
that is very particular to me,
-
so it's not like
I hand over materials to you.
-
I'll show you my approach
and then you'll take it up
-
and improve on it,
and make it personal to you
-
and the audiences you're dealing with.
-
And I want to avoid this.
-
So in my career, I had to learn
data technologies, and SQL, and XML,
-
and the content of tutorials,
-
or examples, is very much like this.
-
I'm not objecting to the language--
because that's what you've got to learn--
-
but employees, invoices.
-
So your task might be
you have a sales force
-
and you've got to identify
the person who sold the most items,
-
and calculate their bonus
-
and then issue the invoices
to the customers,
-
and it's the most boring--
I can't get excited about that,
-
or I don't feel like I'm learning a topic.
-
With Wikidata, we have so many topics
we can engage people in,
-
and it might be things
in the solar system,
-
or characters in Shakespeare,
-
or things in the solar system
named after characters in Shakespeare,
-
which is what most of this is.
-
So when you have a teaching approach,
-
one question is
what things do you leave out.
-
So in the workshop I run,
I don't explain what SPARQL stands for,
-
that doesn't help you write SPARQL at all.
-
It doesn't help to explain what RDF is.
-
Obviously, it's historically
really important,
-
but telling people there's a format
for describing resources
-
that's called Resource Description Framework,
-
and resource is whatever's described,
it's not really a format.
-
That doesn't help people,
-
that gets people no closer to actually,
practically, using this.
-
Linked open data, LOD, I may mention.
-
So the library museum professionals
that come to my training
-
have definitely heard about
linked open data,
-
and know that it's the future
of their discipline,
-
and it's going to
revolutionize their work.
-
But at the moment,
they're not using that kind of system.
-
So they've not seen a real
practical example of that technology.
-
So that's what
they're going to get from this.
-
So I might mention linked open data,
-
but I don't get into the definition.
-
I basically say, this is a service
you can use for free.
-
It's been given to you to use for free,
-
and that gets the point across.
-
Semantic identifiers and namespaces,
-
I want to get across implicitly,
-
I don't want to teach people
these concepts,
-
I want them to pick up the concepts
even if I don't use the terms.
-
Reification, so people already
using an RDF database want to know
-
does Wikidata have statement IDs,
and I try to avoid that.
-
I hardly even mention Wikidata.
-
So these workshops are advertised
as like Introduction to SPARQL,
-
or for the public event one, it was
-
Asking and Answering Questions
with Open Data.
-
And then in the blurb, I'd say
we're going to be using this platform,
-
and I'll introduce it and say,
well, this is the best platform
-
on which to learn
this language, this skill.
-
It's the most helpful,
it's got the most interesting stuff.
-
And then in the course of the workshop,
-
maybe we'll get into more about Wikidata,
-
why this exists, who put this data here.
-
So there's a whole lot of background
-
that kind of professional RDF
or linked data people will have,
-
but you don't need.
-
I just want to get people thinking
about nodes and arcs,
-
and thinking in triples,
-
and imagining how a triple representation
can be created and queried.
-
I want them to phrase questions
in their own language,
-
and translate into SPARQL,
via a kind of a baby talk intermediary.
-
But I want them to think in triples
-
and get used to asking questions
in that way, and just to get to the point
-
where they ask interesting questions
relevant to their work, or their hobbies,
-
or whatever, and they come away
with something.
-
So it's not the theoretical understanding
-
that I'm getting
in these quite short sessions.
-
And the first thing I present them with
is this, they've got to look at this.
-
And there's a "what the hell?" reaction
-
in the workshop
and probably in the room now,
-
because, "I thought this was
about technology skills!
-
Why have we got to look at a cute dog?"
-
But this is to introduce my toy world.
-
So there are three human beings.
Two of them are a married couple.
-
One is the child from that couple.
-
There are two beings
that are pets of this couple,
-
and we've got the types of the pets.
-
Clearly, this is not official data.
-
This knowledge representation,
which it is,
-
only exists in this slide,
it's not a database.
-
So I'm getting people thinking
of a toy world.
-
And there's loads that can be learnt
-
with just discussing this,
and kind of role-playing about this.
-
And you're going to
make your own toy world.
-
So a point to come from this
is this isn't a representation
-
of all of my family
or of all my parents' pets.
-
It's a tiny fragment.
-
When we query things,
-
we're querying a representation
of the world, not the world.
-
There's so much that's missed out.
-
That's a really important first lesson
to get about any database, any querying.
-
So everything's expressed
in triples, and nodes, and arcs.
-
Arcs have a direction.
-
How do the names work?
-
So one of these nodes is marked Bob.
-
Is that the name Bob,
does that stand for the name Bob?
-
Well, not quite, because other people
use the name Bob.
-
And Dan, you probably know a Bob.
-
(Dan) Like Bob [inaudible].
-
Yeah, you know a Bob.
-
And that's the Bob I think--
no, that isn't this Bob.
-
So we talk about that.
-
So names are relative
to the system that they're in,
-
and we could talk about Martin's Bob
and Dan's Bob not being the same person.
-
So it's not the names.
-
So we could think of them
as relative to a system.
-
So we can even say Martin:Bob
is the name for one thing,
-
and Dan:Bob identifies another thing
in another system.
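The Martin:Bob and Dan:Bob idea can be sketched in Python as prefixed names that expand to full identifiers. This is only an illustration: the two base addresses are made-up placeholders, not anything used in the workshop.

```python
# Hypothetical namespaces: each "system" gets its own base address.
PREFIXES = {
    "Martin": "https://example.org/martins-world/",
    "Dan": "https://example.org/dans-world/",
}


def expand(prefixed_name):
    """Turn 'Prefix:Local' into a full identifier inside that prefix's system."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


martins_bob = expand("Martin:Bob")
dans_bob = expand("Dan:Bob")

print(martins_bob)
print(dans_bob)
print(martins_bob == dans_bob)  # same label "Bob", but two different things
```

The label "Bob" is shared, but the expanded identifiers differ, which is the point about names being relative to a system.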
-
And I emphasize triples, so three things.
-
You might be tempted to say,
"Cindy and Bob, together, have a pet dog,"
-
but you can't do that in this system
unless you have a node for the couple.
-
Things have to have a direction.
That may not make much sense.
-
There's a married couple--
that doesn't have a direction,
-
that's a relation between two people,
-
but we are modeling it
with things that have a direction
-
so we have to have the two directions.
-
There are arbitrary choices.
-
So why have "Cindy has child, Martin,
and not Martin has parent, Cindy?"
-
It's an arbitrary choice.
-
Arbitrary choices like that--
choices of name, choices of direction--
-
are built into this system and intrinsic.
-
So there are arbitrary choices to be made,
-
how to represent this,
-
even the same facts
could be represented in different ways.
-
Who makes that decision?
-
Well, whoever creates the system,
-
whoever sets up
the knowledge-based system.
-
So people can see that this--
called serializable--
-
this could be expressed
as triple statements.
-
So, "Cindy has pet, Tilly,
Martin is a human,"
-
and getting to the core insight
-
is comparing how do we make
a question in English?
-
Well, we have a statement
and it's incomplete,
-
like, "Who has pet, Tilly?"
-
So we go from "Cindy has pet Tilly,"
to "Who has pet Tilly?"
-
We've taken something out,
-
we've put in a placeholder,
and we've introduced a question mark.
-
I say that's just like
what we do with SPARQL.
-
We take something out,
we have an incomplete statement,
-
or incomplete statements,
-
we put a placeholder in the missing place,
and we have a question mark
-
to mark that that's a placeholder.
-
So it can be a role play
where I'm the query service
-
for this knowledge base.
-
And so people can learn
what a query service does
-
by seeing a query service and role-playing
-
and being a query service,
which we'll get to.
-
So people can see that
working on the level of triples.
-
"Who has pet, Tilly?"
-
If you say that to me, and I can say,
"results Cindy, Bob."
-
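The role play above can be sketched in a few lines of Python. This is a minimal sketch, not the workshop's material: the toy world is written as a list of triples, the "married to" label is an assumed name for the marriage arcs, and the second pet is left out because the talk does not name it.

```python
# The toy world as (subject, predicate, object) triples.
TRIPLES = [
    ("Cindy", "married to", "Bob"),    # predicate label is an assumption
    ("Bob", "married to", "Cindy"),    # the marriage is stored in both directions
    ("Cindy", "has child", "Martin"),
    ("Cindy", "has pet", "Tilly"),
    ("Bob", "has pet", "Tilly"),
    ("Cindy", "is a", "human"),
    ("Bob", "is a", "human"),
    ("Martin", "is a", "human"),
    ("Tilly", "is a", "dog"),
]


def ask(pattern, triples=TRIPLES):
    """Play the query service: fill each ?placeholder from every matching triple."""
    answers = []
    for triple in triples:
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                if binding.setdefault(want, got) != got:
                    break  # the same placeholder would need two different values
            elif want != got:
                break      # a fixed term does not match this triple
        else:
            answers.append(binding)
    return answers


# "Who has pet Tilly?" becomes an incomplete triple with a placeholder.
print(ask(("?who", "has pet", "Tilly")))
```

Each answer is one way of filling the placeholder so that the completed statement exists in the toy world.
-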
Then I put it to the trainees,
-
how do you ask more complicated questions?
-
So, "Who has a dog as a pet?"
-
And some will get it straightaway,
some will say, "Oh, it's a triple--
-
Who? has pet dog?"
-
So my role as the query service
is to look at this and match your triple,
-
"Who? has pet dog,"
-
so I've got to find things that have pet dog,
-
and results None.
-
So this is the discussion--
what is this node I've called dog?
-
It's not a dog.
-
Although it's called dog,
it's not a dog, it stands for a class.
-
Obvious when you're a SPARQL user,
but this is getting people
-
over the threshold
of thinking in this way.
-
And then you've got to do,
"What kinds of things have pets?"
-
People see that they can't do that
in one triple,
-
you've got to do multiple triples,
-
and those multiple triples
ask for multiple things.
-
So if you've got,
"What kinds of things have pets?"
-
then you're going to identify people,
-
and then you've got to
identify those types,
-
and it naturally comes up,
"How do I specify the columns I want?
-
How do I specify that I want the types?"
That's the question.
-
And then you say,
"You have these partial statements,
-
and you enclose them
in curly brackets and put Select."
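A rough Python sketch of that step, assuming the same toy world written as triples: several partial statements share placeholders, and a select step picks which placeholders become columns. The "married to" label and the function names are illustrative assumptions, and the result is de-duplicated and sorted for readability, which is closer to SELECT DISTINCT than to a plain SELECT.

```python
TRIPLES = [
    ("Cindy", "married to", "Bob"),    # predicate label is an assumption
    ("Bob", "married to", "Cindy"),
    ("Cindy", "has child", "Martin"),
    ("Cindy", "has pet", "Tilly"),
    ("Bob", "has pet", "Tilly"),
    ("Cindy", "is a", "human"),
    ("Bob", "is a", "human"),
    ("Martin", "is a", "human"),
    ("Tilly", "is a", "dog"),
]


def extend(binding, pattern, triple):
    """Return the binding extended so that pattern fits triple, or None."""
    extended = dict(binding)
    for want, got in zip(pattern, triple):
        if want.startswith("?"):
            if extended.setdefault(want, got) != got:
                return None  # placeholder already holds a different value
        elif want != got:
            return None      # fixed term does not match
    return extended


def select(columns, patterns, triples=TRIPLES):
    """Roughly SELECT DISTINCT columns WHERE { patterns }, with rows sorted."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return sorted({tuple(b[c] for c in columns) for b in bindings})


# "Who has a dog as a pet?" needs two triples joined on ?pet.
dog_owners = select(["?who"], [("?who", "has pet", "?pet"), ("?pet", "is a", "dog")])
# "What kinds of things have pets?" keeps only the ?kind column.
owner_kinds = select(["?kind"], [("?owner", "has pet", "?pet"), ("?owner", "is a", "?kind")])

print(dog_owners)
print(owner_kinds)
```

The second query binds ?owner and ?pet along the way but reports only ?kind, which is what specifying the columns amounts to.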
-
So this is kind of the first half hour
of the workshop,
-
and it's not on computers,
it's all with role play
-
and thinking about this.
-
And I invite people in the workshop
to make their own toy world,
-
and you'll be making a toy world,
I hope, after this.
-
So five minutes, eight to ten nodes
to represent your family, your work place,
-
the thing you're working on,
the TV you were watching last night,
-
and to have some
meaningful links between them.
-
And the lesson that--
you make arbitrary decisions,
-
you name things, you create properties,
-
but they're the creation of the person
who sets up the knowledge system.
-
And then, in pairs, they explain
their graphs to each other, and query.
-
So, "What's a query you could ask
about this little world,
-
and then what would be the answer?"
-
So, like I say, people mostly get it,
-
but people want a four-
or five-part relation,
-
so they might want to say,
-
"This couple, together, have a pet."
-
Or they might want to say,
"Tilly is a pet, is a dog."
-
And you can reinforce nodes, triples,
and triples have a direction.
-
So I'll explain what a triple is
and say also, not in this example,
-
but, "Triples, generally,
they have an item, they have a property,
-
and then they have
a number of other things
-
which could be values,
could be time periods,
-
could be locations on a globe."
-
So with that role-play exercise,
we're 40 minutes into a 2-hour workshop,
-
and in a computer room,
and we haven't touched computers yet.
-
But I think it's useful
to get people thinking in that way,
-
and to think about
how they would make the model
-
and what the query is,
and to actually translate,
-
so your translation exercise.
-
And then I'd direct people to
query.wikidata.org.
-
So there's a bunch of things
they've got to take on.
-
We've been doing--
I will have a flip chart, and we will--
-
Is that six?
-
Six minutes elapsed?
-
(man) [inaudible]
-
Right.
-
So I'll give them a task.
-
I don't want them to learn
Q numbers and P numbers.
-
So I'll tell them what the names are
and show them the Ctrl+Space trick.
-
But there's a lot to take on,
-
so they're taking on
Q numbers and P numbers,
-
they've seen the triple format,
and they've seen Select,
-
but they've got to apply this
all in one go.
-
So I'll give people a task.
-
Some will get it immediately,
some will struggle
-
because they missed a bit of discussion,
-
or more often, because they're familiar
with another kind of database system,
-
and they have
particular expectations from that.
-
So I set bonus things
or more complicated things
-
if people are getting bored.
-
Or I say, "If you get bored and you work
on an entirely different question,
-
that's fine, but show me."
-
So I'll run through this in front of them,
-
tell them to do it, just show the hints
of what properties they'll be using,
-
and then run through it again.
-
And then, go through the cycle
of adding on extra things
-
to enhance the query.
-
So we might have done a query
and I'll say,
-
"Here's how you add on
an optional property."
-
And then give them a task
involving optional property.
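What an optional property does can be sketched on the toy world in Python. This is an illustration under assumptions, not the workshop exercise: every human is kept in the results, and the pet column is simply left empty where no "has pet" triple exists.

```python
TRIPLES = [
    ("Cindy", "has child", "Martin"),
    ("Cindy", "has pet", "Tilly"),
    ("Bob", "has pet", "Tilly"),
    ("Cindy", "is a", "human"),
    ("Bob", "is a", "human"),
    ("Martin", "is a", "human"),
    ("Tilly", "is a", "dog"),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == predicate)


def humans_with_optional_pet():
    """Like: ?who is-a human, with an optional ?who has-pet ?pet."""
    rows = []
    humans = sorted(s for s, p, o in TRIPLES if p == "is a" and o == "human")
    for who in humans:
        pets = objects(who, "has pet")
        if pets:
            rows.extend((who, pet) for pet in pets)
        else:
            rows.append((who, None))  # row is kept, pet column stays empty
    return rows


def humans_with_required_pet():
    """The same question with the pet triple required instead of optional."""
    return [(who, pet) for who, pet in humans_with_optional_pet() if pet is not None]


print(humans_with_optional_pet())
print(humans_with_required_pet())
```

With the pet triple required, Martin drops out of the results; with it optional, he stays with an empty pet column.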
-
In the Bodleian, I say,
"Find manuscripts in Latin."
-
For a public event
at the University of Bristol,
-
there are lots of celebrities
who studied at the University of Bristol,
-
so I use that as an example.
-
So going to the interface,
-
there's still a hump in the learning curve
-
because they've got
to put the query into action,
-
they've got to think in this language,
-
and they've got to look up
Q numbers and P numbers,
-
and then there's all the things
they can do with the query,
-
once they've done it.
-
And the visualization options,
the bookmarking, getting the data.
-
So I'll suggest refinements.
-
So we can take a succession of steps
of getting people doing a query,
-
and taking it up to the next level.
-
Like, "Find landscape paintings
taller than they are wide."
-
So within the two-hour thing,
we get people doing basic queries,
-
adding refinements onto them,
-
not doing much filtering,
-
but starting to introduce measurements,
-
and so on.
-
Not getting into qualifiers
or another level.
-
If it's a whole day thing,
you probably could.
-
It comes up, inevitably, "Where else
can I use the SPARQL language?"
-
And I observe that that is a question,
and questions can be framed in SPARQL,
-
and put to Wikidata,
and you'll get answers,
-
and there is a Wikidata property
called SPARQL endpoint.
-
So when they ask that,
that becomes their task.
-
And then they get
that list of institutions
-
that have SPARQL endpoints.
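As a hedged sketch of that task, the Python below builds a request URL for the Wikidata Query Service. It assumes the property meant here is P5305 ("SPARQL endpoint URL"); check that ID by searching Wikidata before relying on it. The sketch only constructs the URL and does not send the request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Assumption: P5305 is the "SPARQL endpoint URL" property; verify before use.
QUERY = """
SELECT ?item ?itemLabel ?endpoint WHERE {
  ?item wdt:P5305 ?endpoint .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def query_url(query):
    """Build a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query.strip(), "format": "json"})


url = query_url(QUERY)
print(url)
```

Pasting the query text itself into query.wikidata.org should work equally well; the URL form is only useful when calling the service from a script.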
-
And it's worth pointing out,
-
so in an introductory session
on other computer languages,
-
people will typically
learn how to do loops,
-
how to do functions,
how to do conditionals.
-
They'll learn the basic grammar
-
but they won't make something
fantastic and useful,
-
they'll just learn the basic grammar.
-
But in an introductory session
on Wikidata SPARQL you can make--
-
if you're interested
in German literature--
-
a map of the birthplace
of German poets, and so on.
-
And so we get feedback like this.
-
This is how great
the Wikidata Query Service is
-
as an educational tool.
-
"What is this sorcery?"
Isn't even from someone in the room.
-
A trainee in the room made a map,
-
emailed it to her colleagues
and got back, "What is this sorcery!?
-
How have you made this?"
-
And was just not expecting this to happen.
-
People are not expecting to look at
the picture of the cute dog,
-
they're not expecting to do the role play
where they represent their family
-
and query each other.
-
They're not expecting
to actually make something concrete
-
which they take away as a link
and show to their colleagues.
-
And all of this, being unexpected,
-
makes it memorable
and makes them want to go away
-
and talk to other people about it.
-
It's not like your run-of-the-mill
IT training.
-
The lower quote is from a researcher
who saw how he could make a map
-
of famous people with his first name
-
and another one of famous people
with his wife's first name.
-
And then he just had more and more ideas
of things and charts, and so on,
-
he's going to create with Wikidata,
-
and so he's glad to say,
-
"You've destroyed my productivity
for the next month."
-
So that's my recommendation.
-
I think we can take it as a positive,
-
and we can take it beyond
training people about Wikidata,
-
training people about data.
-
The stuff that came up
in the keynote this morning,
-
making people literate
about ideas of representation
-
and starting people off
and being involved in that discussion,
-
involves this [inaudible].
-
So this could be done--
-
doesn't have to be like
a workplace training thing,
-
it could be a public event,
-
to get people familiar
with these technologies.
-
But I will stop there for discussion.
-
And like I say, it's respectfully
submitted to people in the room
-
who do SPARQL training a different way,
but I hope this is useful to you.
-
(audience applause)
-
(Dan) Okay, are there any questions?
-
(man) Hi, it's [Mohammed Hijah]
from Palestine.
-
Thank you for the session.
-
I was wondering if there are resources
-
that we can get to learn
SPARQL language professionally?
-
I've got the SPARQL book,
the O'Reilly book.
-
I find the Wikibook on SPARQL
-
is really, really useful.
-
That's like the most useful
and accessible reference.
-
The tutorials on Wikidata itself
are going to vary in quality.
-
(Mohammed) I think
that they are for beginners.
-
I can handle SPARQL,
but at the beginner level,
-
but I want to deal with it professionally.
-
So my concern is to get
as many people as possible
-
across the threshold
into being aware of how this works,
-
and dabbling.
-
I'd like it to be a deeper course
by going into more of the...
-
how it works--
qualifiers and references, and so on.
-
Where in a professional context,
you're probably aiming towards
-
people using a particular SPARQL endpoint,
-
and Wikidata has some customizations.
-
We've discussed on Twitter
that there's some things we use
-
that actually aren't a SPARQL standard.
-
They're like an optimization.
-
So in the professional context,
-
I'd hope it would be tailored
to that particular data set and endpoint,
-
but there's not a demand for that yet,
-
because like I said, I deal with people
who are aware of linked open data,
-
and have heard the word that it's a good thing,
but haven't seen an example yet,
-
haven't got an example
they can apply to their work,
-
they're not enthusiastic about it yet.
-
So I think we want to
get my whole workplace
-
and other workplaces and developers
across that threshold
-
to where they're demanding
that kind of really in-depth,
-
like using an endpoint in a library
kind of training.
-
(Mohammed) Thank you.
-
(woman) It's just a question.
I really liked that, thank you so much.
-
Is it documented step-by-step anywhere?
-
I can share my succession of tasks.
-
That's very much tailored
to where I'm presenting it.
-
Like I said, with librarians,
I start with manuscripts and go on.
-
You want to end up
with people asking a question
-
which is the question they came,
in their heads, to the event with.
-
So there's an order
of querying with a triple,
-
and then with multiple triples,
and then with an optional triple,
-
and then with a measurement
in a filter, and so on.
-
And, yeah, I can share...
-
Yeah, I'll share a separate set of slides
-
for those exercises.
-
(woman) Thank you so much
because I will take that
-
and customize it for my own needs.
Thank you.
-
(Dan) Okay. No questions?
-
(man) What would you recommend
if you also want to teach editing,
-
apart from just querying?
-
I'm pleased to report
that people find Wikidata editing,
-
when I demonstrate it, to be so simple,
-
that it just takes them by surprise.
-
It's Wikidata editing,
and I've got to add knowledge
-
to this huge knowledge base.
-
Sounds like something
that really technical people can do.
-
And then you show it,
and they go, "Oh, right.
-
Martin is instance of human."
-
So I haven't done that systematically yet.
-
I think a precondition would be
getting people thinking in triples,
-
and maybe underline that
triples need references,
-
and triples need qualifiers,
and that triples can have
-
multiple conflicting values.
-
So I'd still do the toy world,
-
maybe a more professionally relevant
toy world, and translation exercise,
-
but then go to, "So now the exercise
we're going to do with triples
-
is adding them."
-
There's a lot of work done,
and maybe Jason's done,
-
with guessing a table of identifiers.
-
So something I'd like to do,
-
there's an online database
-
of people who've won a Rhodes Scholarship.
-
There's a scholarship to Oxford University
from other countries.
-
But it's not in Wikidata yet.
-
So you can kind of divide up
the room and say,
-
"You're going to find
these people in Wikidata
-
and your task is to add
-
with the reference
to this online database."
-
And then you can do a query
to see how many have been added
-
in that session.
-
So I think, with all the training I do,
-
I think the comprehension
is more important
-
than the taking action immediately.
-
So when I'm training people on Wikipedia,
-
I first show them article histories,
contribution records, talk page,
-
quality scale, so they're comprehending
the process before they edit,
-
and actually change something.
-
(man) Not really a question but a comment.
-
There is, for beginners,
a good tutorial on YouTube,
-
How to Query and Start with SPARQL,
-
and if you want to go deeper, also,
-
How to Add Data with OpenRefine.
-
And I've also made some videos
-
and uploaded them in German language.
-
Oh, great! Thanks.
-
I should also mention Hilary Thorsen,
who's from Stanford Library,
-
did, last week,
a really good video capture
-
of adding a data set to Wikidata
with OpenRefine.
-
This is for the LD4P, the Linked Data
for Production project,
-
and that was a really good video tutorial
-
I'd recommend to anybody for--
-
That's the next couple of levels up
from what I'm doing.
-
(Dan) Is there a last question?
-
(man) So SPARQL's sort of SQL-ish.
-
If someone walked into your tutorial
with an SQL background,
-
is that a blessing or a curse?
-
It's a bit of a curse
because I had to learn SQL,
-
so I did the...
-
generate the invoices
using SQL for your fictitious company,
-
and definitely had to unlearn
an SQL way of thinking about things
-
to get to SPARQL.
-
But it was freeing, it was freeing.
-
Databases without built-in schemas
are liberating.
-
When you think about
how many columns there are,
-
and it's this number
of columns for a book,
-
and it's this number of columns
for the address,
-
and it's just three columns.
-
Well, three and a bit more.
-
That's really liberating.
-
So that's my point, I kind of glanced at,
-
that people make different progress
in these workshops as in all training,
-
but it's not like intelligent versus dumb,
-
it's like the preconceptions
you're coming with,
-
are more the obstacle.
-
So it's actually more--
-
I'm more optimistic about training people
who have never encountered databases,
-
coding, or any of that before, than...
-
The worst people to try and train
are linked data experts
-
because they've used DBpedia a lot.
-
They used a particular approach
of querying
-
and expecting to get certain things,
-
and it looks odd when Wikidata
does things differently.
-
And they need to get with the program.
-
(Dan) Okay, let's thank Martin
for his insights.
-
Thanks very much.
-
(audience applause)