(Dan) Hello everyone.
So this session is about teaching SPARQL.
The presenter is Martin Poulter,
so I leave you the stage.
Have fun.
(Martin) Thank you very much.
Hi, everybody.
I trust you'll agree
that Wikidata is great,
it has lots of interesting data
on different topics,
the tools people make with it
are fun to use and fun to explore,
and easy to use.
And maybe you'll agree with the suggestion
that to get the best out of Wikidata
you need to know SPARQL,
you need to be able to phrase
your own queries.
So you might see that
as a barrier, an obstacle,
that we ideally need a big program
of training for developers,
for librarians, for curators,
for ordinary people
to get them literate in this language,
and that's a big effort,
an aspect of Wikidata outreach.
My suggestion is to kind of
turn that around,
that Wikidata,
especially the Query Service,
because it's so helpful,
because it's so full of good stuff,
because it's so colorful,
because it has so many
visualization abilities,
is the ideal platform
for people to learn SPARQL,
also to learn about databases,
learn about knowledge representation,
learn about data and computers.
There's no necessity
that someone's first encounter
with data and computers
has to be a relational database system.
So I'm going to put forward,
I'm going to report on
a training workshop
I've delivered to library staff
at the University of Oxford,
and I've also done as a public event,
so just with members of the public
coming to an open data week
that the university hosted.
And I've also done some of this
with researchers as well.
So I teach in a way
that is very particular to me,
so it's not like
I hand over materials to you.
I'll show you my approach
and then you'll take it up
and improve on it,
and make it personal to you
and the audiences you're dealing with.
And I want to avoid this.
So in my career, I had to learn
data technologies, and SQL, and XML,
and the content of tutorials,
or examples, is very much like this.
I'm not objecting to the language--
because that's what you got to learn--
but employees, invoices.
So your task might be
you have a sales force
and you've got to identify
the person who sold the most items,
and calculate their bonus
and then issue the invoices
to the customers,
and it's the most boring--
I can't get excited about that,
or I don't feel like I'm learning a topic.
With Wikidata, we have so many topics
we can engage people in,
and it might be things
in the solar system,
or characters in Shakespeare,
or things in the solar system
named after characters in Shakespeare,
which is what most of this is.
So when you have a teaching approach,
one question is
what things do you leave out.
So in the workshop I run,
I don't explain what SPARQL stands for,
that doesn't help you write SPARQL at all.
It doesn't help to explain what RDF is.
Obviously, it's historically
really important,
but telling people there's a format
for describing resources
that's called resource description format,
and resource is whatever's described,
it's not really a format.
That doesn't help people,
that gets people no closer to actually,
practically, using this.
Linked open data, LOD, I may mention.
So the library and museum professionals
that come to my training
have definitely heard about
linked open data,
and know that it's the future
of their discipline,
and it's going to
revolutionize their work.
But at the moment,
they're not using that kind of system.
So they've not seen a real
practical example of that technology.
So that's what
they're going to get from this.
So I might mention linked open data,
but I don't get into the definition.
I basically say, this is a service
you can use for free.
It's been given to you to use for free,
and that gets the point across.
Semantic identifiers and namespaces,
I want to get across implicitly,
I don't want to teach people
these concepts,
I want them to pick up the concepts
even if I don't use the terms.
Reification, so people already
using an RDF database want to know
does Wikidata have statement IDs,
and I try to avoid that.
I hardly even mention Wikidata.
So these workshops are advertised
as like Introduction to SPARQL,
or for the public event one, it was
Asking and Answering Questions
with Open Data.
And then in the blurb, I'd say
we're going to be using this platform.
And I'll introduce it and say,
well, this is the best platform
on which to learn
this language, this skill.
It's the most helpful,
it's got the most interesting stuff.
And then in the course of the workshop,
maybe we'll get into more about Wikidata,
why this exists, who put this data here.
So there's a whole lot of background
that kind of professional RDF
or linked data people will have,
but you don't need.
I just want to get people thinking
about nodes and arcs,
and thinking in triples,
and imagining how a triple representation
can be created and queried.
I want them to phrase questions
in their own language,
and translate into SPARQL,
via a kind of a baby talk intermediary.
But I want them to think in triples
and get used to asking questions
in that way, and just to get to the point
where they ask interesting questions
relevant to their work, or their hobbies,
or whatever, and they come away
with something.
So it's not the theoretical understanding
that I'm getting
in these quite short sessions.
And the first thing I present them with
is this, they've got to look at this.
And there's a "what the hell?" reaction
in the workshop
and probably in the room now,
because, "I thought this was
about technology skills!
Why have we got to look at a cute dog?"
But this is to introduce my toy world.
So there are three human beings.
Two of them are a married couple.
One is the child from that couple.
There are two beings
that are pets of this couple,
and we've got the types of the pets.
Clearly, this is not official data.
This knowledge representation,
which it is,
only exists in this slide,
it's not a database.
So I'm getting people thinking
of a toy world.
And there's loads that can be learnt
with just discussing this,
and kind of role-playing about this.
And you're going to
make your own toy world.
So a point to come from this
is this isn't a representation
of all of my family
or of all my parents' pets.
It's a tiny fragment.
When we query things,
we're querying a representation
of the world, not the world.
There's so much that's missed out.
That's a really important first lesson
to get about any database, any querying.
So everything's expressed
in triples, and nodes, and arcs.
Arcs have a direction.
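One rough way to picture the slide in code, assuming Python and plain tuples (the talk itself uses only a drawing): each arc becomes a (subject, property, object) triple. The property wording and the "Bob has child Martin" arc are assumptions, and the second pet is left out because the talk does not name it.

```python
# Toy world from the slide as (subject, property, object) triples.
# Property labels are assumed wording; "Bob has child Martin" is inferred
# from "the child from that couple" and is not quoted in the talk.
TOY_WORLD = [
    ("Bob", "is a", "human"),
    ("Cindy", "is a", "human"),
    ("Martin", "is a", "human"),
    ("Cindy", "has child", "Martin"),
    ("Bob", "has child", "Martin"),
    ("Cindy", "has pet", "Tilly"),
    ("Bob", "has pet", "Tilly"),
    ("Tilly", "is a", "dog"),
]


def nodes(triples):
    """Every name that appears at either end of an arc."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


def arcs_from(triples, node):
    """Arcs are directed: list only those that start at `node`."""
    return [(p, o) for s, p, o in triples if s == node]
```

Asking for `arcs_from(TOY_WORLD, "Martin")` gives only the "is a" arc, since "has child" points at Martin rather than away from him.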
How do the names work?
So one of these nodes is marked Bob.
Is that the name Bob,
does that stand for the name Bob?
Well, not quite, because other people
use the name Bob.
And Dan, you probably know a Bob.
(Dan) Like Bob [inaudible].
(Martin) Yeah, you know a Bob.
And that's the Bob I think--
no, that isn't this Bob.
So we talk about that.
So names are relative
to the system that they're in,
and we could talk about Martin's Bob
and Dan's Bob not being the same person.
So it's not the names.
So we could think of them
as relative to a system.
So we can even say Martin:Bob
is the name for one thing,
and Dan:Bob identifies another thing
in another system.
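A small sketch of the Martin:Bob and Dan:Bob idea, assuming Python and two made-up base addresses on example.org: the part before the colon picks the system, so the same local name expands to different identifiers.

```python
# Hypothetical base addresses; any two distinct strings would do.
PREFIXES = {
    "Martin": "https://example.org/martin/",
    "Dan": "https://example.org/dan/",
}


def expand(prefixed_name):
    """Turn 'Martin:Bob' into a full identifier by swapping in the prefix."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


martins_bob = expand("Martin:Bob")
dans_bob = expand("Dan:Bob")
```

The two results differ even though both end in "Bob", which is the point being made about names being relative to a system.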
And I emphasize triples, so three things.
You might be tempted to say,
"Cindy and Bob, together, have a pet dog,"
but you can't do that in this system
unless you have a node for the couple.
Things have to have a direction.
That may not make much sense.
There's a married couple--
that doesn't have a direction,
that's a relation between two people,
but we are modeling it
with things that have a direction
so we have to have the two directions.
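As a sketch of "the two directions", assuming Python and an assumed property label "married to": an undirected relation can be stored as a pair of directed triples.

```python
def add_symmetric(triples, a, prop, b):
    """Store an undirected relation as two directed arcs, one each way."""
    triples.append((a, prop, b))
    triples.append((b, prop, a))


toy_world = []
# "married to" is assumed wording for the arc label on the slide.
add_symmetric(toy_world, "Cindy", "married to", "Bob")
```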
There are arbitrary choices.
So why have "Cindy has child, Martin,"
and not "Martin has parent, Cindy"?
It's an arbitrary choice.
Arbitrary choices like that--
choices of name, choices of direction--
are built into this system and intrinsic.
So there are arbitrary choices to be made,
how to represent this,
even the same facts
could be represented in different ways.
Who makes that decision?
Well, whoever creates the system,
whoever sets up
the knowledge base system.
So people can see that this--
called serializable--
this could be expressed
as triple statements.
So, "Cindy has pet, Tilly,
Martin is a human,"
and getting to the core insight
is comparing how do we make
a question in English?
Well, we have a statement
and it's incomplete,
like, "Who has pet, Tilly?"
So we go from "Cindy has pet Tilly,"
to "Who has pet Tilly?"
We've taken something out,
we've put in a placeholder,
and we've introduced a question mark.
I say that's just like
what we do with SPARQL.
We take something out,
we have an incomplete statement,
or incomplete statements,
we put a placeholder in the missing place,
and we have a question mark
to mark that that's a placeholder.
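The placeholder idea can be sketched in a few lines, assuming Python and a cut-down copy of the toy world. The function name `ask` is invented for illustration; it plays the part of the query service for a single incomplete statement.

```python
TOY_WORLD = [
    ("Cindy", "has pet", "Tilly"),
    ("Bob", "has pet", "Tilly"),
    ("Cindy", "has child", "Martin"),
    ("Martin", "is a", "human"),
]


def ask(pattern, triples):
    """Return the values that fill the single '?' placeholder in `pattern`."""
    results = []
    for triple in triples:
        filled = None
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                filled = have          # the placeholder matches anything
            elif want != have:
                break                  # a fixed part disagrees: no match
        else:
            results.append(filled)     # reached only if nothing disagreed
    return results


answers = ask(("?who", "has pet", "Tilly"), TOY_WORLD)
```

Here `answers` holds Cindy and Bob, in the order their triples appear in the list.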
So it can be a role play
where I'm the query service
for this knowledge base.
And so people can learn
what a query service does
by seeing a query service and role-playing
and being a query service,
which we'll get to.
So people can see that
working on the level of triples.
"Who has pet, Tilly?"
If you say that to me, I can say,
"results Cindy, Bob."
Then I put it to the trainees,
how do you ask more complicated questions?
So, "Who has a dog as a pet?"
And some will get it straightaway,
some will say, "Oh, it's a triple--
Who? has pet dog?"
So my role as the query service
is to look at this and match your triple,
"Who? has pet dog,"
so I got to find things that have pet dog,
and results None.
So this is the discussion--
what is this node I've called dog?
It's not a dog.
Although it's called dog,
it's not a dog, it stands for a class.
Obvious when you're a SPARQL user,
but this is getting people
over the threshold
of thinking in this way.
And then you've got to do,
"What kinds of things have pets?"
People see that they can't do that
in one triple,
you got to do multiple triples,
and those multiple triples
ask for multiple things.
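A minimal sketch of why the dog question needs two triples, assuming Python (3.8 or later) and invented helper names `match` and `query`: a placeholder that appears in both patterns has to take the same value in each.

```python
TOY_WORLD = [
    ("Cindy", "has pet", "Tilly"),
    ("Bob", "has pet", "Tilly"),
    ("Tilly", "is a", "dog"),
    ("Cindy", "is a", "human"),
    ("Bob", "is a", "human"),
]


def match(pattern, triple, binding):
    """Extend `binding` so `pattern` equals `triple`, or return None."""
    binding = dict(binding)
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            if binding.setdefault(want, have) != have:
                return None            # placeholder already holds another value
        elif want != have:
            return None
    return binding


def query(patterns, triples):
    """Find every assignment of placeholders that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return solutions


one_triple = query([("?who", "has pet", "dog")], TOY_WORLD)
two_triples = query(
    [("?who", "has pet", "?pet"), ("?pet", "is a", "dog")], TOY_WORLD
)
owners = [row["?who"] for row in two_triples]
```

The one-triple version finds nothing, because nobody has the class "dog" as a pet; the two-triple version finds the owners of something that is a dog.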
So if you've got,
"What kinds of things have pets?"
then you're going to identify people,
and then you've got to
identify those types,
and it naturally comes up,
"How do I specify the columns I want?
How do I specify that I want the types?"
That's the question.
And then you say,
"You have these partial statements,
and you enclose them
in curly brackets and put Select."
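To make the "curly brackets and Select" step concrete, here is a small sketch, assuming Python and a made-up `toy:` prefix for the toy-world properties (these are not real Wikidata terms, and a real endpoint would also need a PREFIX declaration).

```python
def to_select(columns, patterns):
    """Wrap partial statements in curly brackets and put SELECT in front."""
    lines = [" ".join(pattern) + " ." for pattern in patterns]
    body = "\n".join("  " + line for line in lines)
    return "SELECT " + " ".join(columns) + " WHERE {\n" + body + "\n}"


# "What kinds of things have pets?" with only the kind asked for as a column.
sketch = to_select(
    ["?kind"],
    [("?owner", "toy:hasPet", "?pet"), ("?owner", "toy:isA", "?kind")],
)
```

The list after SELECT is where the wanted columns go; the other placeholders still take part in the matching but are not reported.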
So this is kind of the first half hour
of the workshop,
and it's not on computers,
it's all with role play
and thinking about this.
And I invite people in the workshop
to make their own toy world,
and you'll be doing a toy world,
I hope, after this.
So five minutes, eight to ten nodes
to represent your family, your work place,
the thing you're working on,
the TV you were watching last night,
and to have some
meaningful links between them.
And the lesson that--
you make arbitrary decisions,
you name things, you create properties,
but they're the creation of the person
who sets up the knowledge system.
And then, in pairs, they explain
their graphs to each other, and query.
So, "What's a query you could ask
about this little world,
and then what would be the answer?"
So, like I say, people mostly get it,
but people want a four-
or five-part relation,
so they might want to say,
"This couple, together, have a pet."
Or they might want to say,
"Tilly is a pet, is a dog."
And you can enforce nodes, triples,
and triples have a direction.
So I'll explain what a triple is
and say also, not in this example,
but, "Triples, generally,
they have an item, they have a property,
and then they have
a number of other things
which could be values,
could be time periods,
could be locations on a globe."
So with that role-play exercise,
we're 40 minutes into a 2-hour workshop,
and in a computer room,
and we haven't touched computers yet.
But I think it's useful
to get people thinking in that way,
and to think about
how they would make the model
and what the query is,
and to actually translate,
so your translation exercise.
And then I'd direct people to
query.wikidata.org.
So there's a bunch of things
they've got to take on.
We've been doing--
I will have a flip chart, and we will--
Is that six?
Six minutes elapsed?
(man) [inaudible]
(Martin) Right.
So I'll give them a task.
I don't want them to learn
Q numbers and P numbers.
So I'll tell them what the names are
and show them the Ctrl+Space trick.
But there's a lot to take on,
so they're taking on
Q numbers and P numbers,
they've seen the triple format,
and they've seen Select,
but they've got to apply this
all in one go.
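The talk does not say what the first task is, so the following is an assumed stand-in: a one-triple query for cats, where P31 is "instance of" and Q146 is "house cat". The sketch, in Python, only builds the request address for the public endpoint and does not send anything; the `wdt:` and `wd:` prefixes and the label service are ones the Wikidata Query Service predeclares.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed example task: list cats. P31 is "instance of", Q146 is "house cat".
FIRST_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""


def request_url(query):
    """Build the GET address that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = request_url(FIRST_QUERY)
```

Pasting the query text into query.wikidata.org does the same job interactively, which is what trainees would actually do in the room.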
So I'll give people a task.
Some will get it immediately,
some will struggle
because they missed a bit of discussion,
or more often, because they're familiar
with another kind of database system,
and they have
particular expectations from that.
So I set bonus things
or more complicated things
if people are getting bored.
Or I say, "If you get bored and you work
on an entirely different question,
that's fine, but show me."
So I'll run through this in front of them,
tell them to do it, just show the hints
of what properties they'll be using,
and then run through it again.
And then, go through the cycle
of adding on extra things
to enhance the query.
So we might have done a query
and I'll say,
"Here's how you add on
an optional property."
And then give them a task
involving optional property.
In the Bodleian, I say,
"Find manuscripts in Latin."
For a public event
at the University of Bristol,
there are lots of celebrities
who studied at the University of Bristol,
so we used that as an example.
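A sketch of the "add an optional property" step for the Latin manuscripts task, assuming Python and assuming these identifiers: Q87167 manuscript, P407 language of work or name, Q397 Latin, P195 collection. They are worth checking in the query editor before use, and the choice of collection as the optional property is an illustration, not something stated in the talk.

```python
# Assumed IDs: Q87167 manuscript, P407 language of work, Q397 Latin,
# P195 collection. Check each in the query editor before relying on it.
REQUIRED = ["?item wdt:P31 wd:Q87167 .", "?item wdt:P407 wd:Q397 ."]


def with_optional(required, optional, columns):
    """Add one OPTIONAL block so rows survive when that property is missing."""
    lines = ["  " + line for line in required]
    lines.append("  OPTIONAL { " + optional + " }")
    head = "SELECT " + " ".join(columns) + " WHERE {\n"
    return head + "\n".join(lines) + "\n}"


query = with_optional(
    REQUIRED,
    "?item wdt:P195 ?collection .",
    ["?item", "?collection"],
)
```

Without OPTIONAL, a manuscript with no collection stated would drop out of the results; with it, the row stays and the collection column is simply empty.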
So going to the interface,
there's still a hump in the learning curve
because they've got
to put the query into action,
they've got to think in this language,
and they've got to look up
Q numbers and P numbers,
and then there's all the things
they can do with the query,
once they've done it.
And the visualization options,
the bookmarking, getting the data.
So I'll suggest refinements.
So we can take a succession of steps
of getting people doing a query,
and taking it up to the next level.
Like, "Find landscape paintings
taller than they are wide."
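The landscape paintings task might look roughly like this, assuming Python for the wrapper and assuming these identifiers: Q3305213 painting, P136 genre, Q191163 landscape art, P2048 height, P2049 width. The sketch compares the raw numbers and ignores units, which a careful query would need to handle.

```python
# Assumed IDs: Q3305213 painting, P136 genre, Q191163 landscape art,
# P2048 height, P2049 width. Units are not compared in this sketch.
TALL_LANDSCAPES = """
SELECT ?painting ?height ?width WHERE {
  ?painting wdt:P31 wd:Q3305213 ;
            wdt:P136 wd:Q191163 ;
            wdt:P2048 ?height ;
            wdt:P2049 ?width .
  FILTER(?height > ?width)
}
LIMIT 50
"""


def is_portrait_shaped(height, width):
    """The same test as the FILTER line, for checking rows by hand."""
    return height > width
```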
So within the two-hour thing,
we get people doing basic queries,
adding refinements onto them,
not doing much filtering,
but starting to introduce measurements,
and so on.
Not getting into qualifiers
or another level.
If it's a whole day thing,
you probably could.
It comes up, inevitably, "Where else
can I use the SPARQL language?"
And I observe that that is a question,
and questions can be framed in SPARQL,
and put to Wikidata,
and you'll get answers,
and there is a Wikidata property
called SPARQL endpoint.
So when they ask that,
that becomes their task.
And then they get
that list of institutions
that have SPARQL endpoints.
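A sketch of that task, assuming Python and assuming P5305 is the "SPARQL endpoint URL" property. The helper reads the standard SPARQL JSON results layout; the sample result is made up purely to show the shape and is not real data.

```python
# Assumed ID: P5305 is the "SPARQL endpoint URL" property.
ENDPOINTS_QUERY = """
SELECT ?item ?itemLabel ?endpoint WHERE {
  ?item wdt:P5305 ?endpoint .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def endpoint_urls(result):
    """Pull the ?endpoint column out of a SPARQL JSON result."""
    return [row["endpoint"]["value"] for row in result["results"]["bindings"]]


# Made-up sample in the SPARQL JSON results shape, for illustration only.
sample = {
    "head": {"vars": ["item", "itemLabel", "endpoint"]},
    "results": {
        "bindings": [
            {"endpoint": {"type": "uri", "value": "https://example.org/sparql"}}
        ]
    },
}
urls = endpoint_urls(sample)
```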
And it's worth pointing out,
so in an introductory session
on other computer languages,
people will typically
learn how to do loops,
how to do functions,
how to do conditionals.
They'll learn the basic grammar
but they won't make something
fantastic and useful,
they'll just learn the basic grammar.
But in an introductory session
on Wikidata SPARQL you can make--
if you're interested
in German literature--
a map of the birthplace
of German poets, and so on.
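One way the poets map could be phrased, assuming Python for the wrapper and assuming these identifiers: P106 occupation, Q49757 poet, P27 country of citizenship, Q183 Germany, P19 place of birth, P625 coordinate location. Treating "German poets" as citizenship of Germany is a simplification; writing language would be another reading. The helper builds a link that opens the query in the Query Service editor.

```python
from urllib.parse import quote

# Assumed IDs: P106 occupation, Q49757 poet, P27 country of citizenship,
# Q183 Germany, P19 place of birth, P625 coordinate location.
POETS_MAP = """
#defaultView:Map
SELECT ?poet ?poetLabel ?coords WHERE {
  ?poet wdt:P106 wd:Q49757 ;
        wdt:P27 wd:Q183 ;
        wdt:P19 ?birthplace .
  ?birthplace wdt:P625 ?coords .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""


def share_link(query):
    """Put the percent-encoded query after '#' so the editor opens with it."""
    return "https://query.wikidata.org/#" + quote(query.strip())


link = share_link(POETS_MAP)
```

The `#defaultView:Map` comment on the first line is what asks the Query Service to show the results as a map rather than a table.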
And so we get feedback like this.
This is how great
the Wikidata Query Service is
as an educational tool.
"What is this sorcery?"
isn't even from someone in the room.
A trainee in the room made a map,
emailed it to her colleagues
and got back, "What is this sorcery!?
How have you made this?"
And was just not expecting this to happen.
People are not expecting to look at
the picture of the cute dog,
they're not expecting to do the role play
where they represent their family
and query each other.
They're not expecting
to actually make something concrete
which they take away as a link
and show to their colleagues.
And all of this, being unexpected,
makes it memorable
and makes them want to go away
and talk to other people about it.
It's not like your run-of-the-mill
IT training.
The lower quote is from a researcher
who saw how he could make a map
of famous people with his first name
and another one of famous people
with his wife's first name.
And then he just had more and more ideas
of things and charts, and so on,
he's going to create with Wikidata,
and so he's glad to say,
"You've destroyed my productivity
for the next month."
So that's my recommendation.
I think we can take it as a positive,
and we can take it beyond
training people about Wikidata,
to training people about data.
The stuff that came up
in the keynote this morning,
making people literate
about ideas of representation
and starting people off
and being involved in that discussion,
involves this [inaudible].
So this could be done--
doesn't have to be like
a workplace training thing,
it could be a public event,
to get people familiar
with these technologies.
But I will stop there for discussion.
And like I say, it's respectfully
submitted to people in the room
who do SPARQL training a different way,
but I hope this is useful to you.
(audience applause)
(Dan) Okay, are there any questions?
(man) Hi, it's [Mohammed Hijah]
from Palestine.
Thank you for the session.
I was wondering if there are resources
that we can get to learn
SPARQL language professionally?
(Martin) I've got the SPARQL book,
the O'Reilly book.
I find the Wikibook on SPARQL
is really, really useful.
That's like the most useful
and accessible reference.
The tutorials on Wikidata itself
are going to vary in quality.
(Mohammed) I think
that they are for beginners.
I can handle SPARQL,
but at the beginner level,
and I want to deal with it professionally.
(Martin) So my concern is to get
as many people as possible
across the threshold
into being aware of how this works,
and dabbling.
I'd like it to be a deeper course
by going into more of the...
how it works--
qualifiers and references, and so on.
Where in a professional context,
you're probably aiming towards
people using a particular SPARQL endpoint,
and Wikidata has some customizations.
We've discussed on Twitter
that there's some things we use
that actually aren't a SPARQL standard.
They're like an optimization.
So in the professional context,
I'd hope it would be tailored
to that particular data set and endpoint,
but there's not a demand for that yet,
because like I said, I deal with people
who are aware of linked open data,
and the word is out that it's a good thing,
but haven't seen an example yet,
haven't an example
they can apply to their work,
they're not enthusiastic about it yet.
So I think we want to
get my whole workplace
and other workplaces and developers
across that threshold
to where they're demanding
that kind of really in-depth,
like using an endpoint in a library,
kind of training.
(Mohammed) Thank you.
(woman) It's just a question.
I really liked that, thank you so much.
Is it documented step-by-step anywhere?
(Martin) I can share my succession of tasks.
That's very much tailored
to where I'm presenting it.
Like I said, with librarians,
I start with manuscripts and go on.
You want to end up
with people asking a question
which is the question they came,
in their heads, to the event with.
So there's an order
of querying with a triple,
and then with multiple triples,
and then with an optional triple,
and then with a measurement
in a filter, and so on.
And, yeah, I can share...
Yeah, I'll share a separate set of slides
for those exercises.
(woman) Thank you so much
because I will take that
and customize it for my own needs.
Thank you.
(Dan) Okay. No questions?
(man) What would you recommend
if you also want to teach editing,
apart from just querying?
(Martin) I'm pleased to report
that people find Wikidata editing,
when I demonstrate it, to be so simple,
that it just takes them by surprise.
It's Wikidata editing,
and I've got to add knowledge
to this huge knowledge base.
Sounds like something
that really technical people can do.
And then you show it,
and they go, "Oh, right.
Martin is instance of human."
So I haven't done that systematically yet.
I think a precondition would be
getting people thinking in triples,
and maybe underline that
triples need references,
and triples need qualifiers,
and that triples can have
multiple conflicting values.
So I'd still do the toy world,
maybe a more professionally relevant
toy world, and translation exercise,
but then go to, "So now the exercise
we're going to do with triples
is adding them."
There's a lot of work done,
and maybe Jason's done,
with guessing a table of identifiers.
So something I'd like to do,
there's an online database
of people who've won a Rhodes Scholarship.
There's a scholarship to Oxford University
from other countries.
But it's not in Wikidata yet.
So you can kind of divide up
the room and say,
"You're going to find
these people in Wikidata
and your task is to add
with the reference
to this online database."
And then you can do a query
to see how many have been added
in that session.
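A rough sketch of the counting query, assuming Python and assuming P166 is "award received". The talk does not give the Q number for the Rhodes Scholarship, so it is left as a parameter here, and this simple count covers all holders recorded so far rather than only those added during one session.

```python
def count_award_holders(award_qid):
    """Build a query counting items that state the given award (P166)."""
    if not (award_qid.startswith("Q") and award_qid[1:].isdigit()):
        raise ValueError("expected a Q number, got " + repr(award_qid))
    return (
        "SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {\n"
        f"  ?person wdt:P166 wd:{award_qid} .\n"
        "}"
    )
```

Running the query before and after the session and comparing the two counts would give the number added in between.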
So I think, with all the training I do,
I think the comprehension
is more important
than the taking action immediately.
So when I'm training people on Wikipedia,
I first show them article histories,
contribution records, talk page,
quality scale, so they're comprehending
the process before they edit,
and actually change something.
(man) Not really a question but a comment.
There is, for beginners,
a good tutorial on YouTube,
How to Query and Start with SPARQL,
and if you want to go deeper, also,
How to Add Data with OpenRefine.
And I've also made some videos
and uploaded them in German language.
(Martin) Oh, great! Thanks.
I should also mention Hilary Thorsen,
who's from Stanford Library,
did, last week,
a really good video capture
of adding a data set to Wikidata
with OpenRefine.
This is for the LD4P, the Linked Data
for Production project,
and that was a really good video tutorial
I'd recommend to anybody for--
That's the next couple of levels up
from what I'm doing.
(Dan) Is there a last question?
(man) So SPARQL's sort of SQL-ish.
If someone walked into your tutorial
with an SQL background,
is that a blessing or a curse?
(Martin) It's a bit of a curse
because I had to learn SQL,
so I did the...
generate the invoices
using SQL for your fictitious company,
and definitely had to unlearn
an SQL way of thinking about things
to get to SPARQL.
But it was freeing, it was freeing.
Databases without built-in schemas
are liberating.
When you think about
how many columns there are,
and it's this number
of columns for a book,
and it's this number of columns
for the address,
and it's just three columns.
Well, three and a bit more.
That's really liberating.
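The "just three columns" point can be sketched like this, assuming Python and a made-up book record: however many columns a table row has, it flattens into triples of subject, property and value, and an empty cell just produces no triple.

```python
def row_to_triples(row_id, row):
    """Spread one table row over three columns: subject, property, value."""
    return [
        (row_id, column, value)
        for column, value in row.items()
        if value is not None        # a missing cell becomes a missing triple
    ]


# Made-up record for illustration; "book1" is an invented identifier.
book = {"title": "Hamlet", "author": "William Shakespeare", "isbn": None}
triples = row_to_triples("book1", book)
```

An address record with a different set of columns would go through the same function unchanged, which is the sense in which no schema is built in.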
So that's my point, I kind of glanced at,
that people make different progress
in these workshops as in all training,
but it's not like intelligent versus dumb,
it's like the preconceptions
you're coming with,
are more the obstacle.
So it's actually more--
I'm more optimistic about training people
who have never encountered databases,
coding, or any of that before, than...
The worst people to try and train
are linked data experts
because they've used DBpedia a lot.
They used a particular approach
of querying
and expecting to get certain things,
and it looks odd when Wikidata
does things differently.
And they need to get with the program.
(Dan) Okay, let's thank Martin
for his insights.
Thanks very much.
(audience applause)