WEBVTT
00:00:05.882 --> 00:00:07.218
(Dan) Hello everyone.
00:00:07.218 --> 00:00:09.911
So this session is about teaching SPARQL.
00:00:09.911 --> 00:00:12.423
The presenter is Martin Poulter,
so I leave you the stage.
00:00:12.423 --> 00:00:13.668
Have fun.
00:00:13.668 --> 00:00:14.943
(Martin) Thank you very much.
00:00:16.501 --> 00:00:18.717
Hi, everybody.
00:00:18.717 --> 00:00:23.355
I trust you'll agree
that Wikidata is great,
00:00:23.355 --> 00:00:27.171
it has lots of interesting data
on different topics,
00:00:27.171 --> 00:00:31.225
the tools people make with it
are fun to use and fun to explore,
00:00:31.225 --> 00:00:33.412
and easy to use.
00:00:33.412 --> 00:00:38.578
And maybe you'll agree with the suggestion
that to get the best out of Wikidata
00:00:38.578 --> 00:00:40.142
you need to know SPARQL,
00:00:40.142 --> 00:00:42.040
you need to be able to phrase
your own queries.
00:00:42.040 --> 00:00:45.141
So you might see that
as a barrier, an obstacle,
00:00:45.141 --> 00:00:50.183
that we ideally need a big program
of training for developers,
00:00:50.183 --> 00:00:54.008
for librarians, for curators,
for ordinary people
00:00:54.008 --> 00:00:58.236
to get them literate in this language,
and that's a big effort,
00:01:01.036 --> 00:01:04.031
an aspect of Wikidata outreach.
00:01:04.031 --> 00:01:06.238
My suggestion is to kind of
turn that around,
00:01:06.238 --> 00:01:09.037
that Wikidata,
especially the Query Service,
00:01:09.037 --> 00:01:11.673
because it's so helpful,
because it's so full of good stuff,
00:01:11.673 --> 00:01:13.857
because it's so colorful,
00:01:13.857 --> 00:01:16.200
because it has so many
visualization abilities,
00:01:16.200 --> 00:01:20.173
is the ideal platform
for people to learn SPARQL,
00:01:20.173 --> 00:01:21.890
also to learn about databases,
00:01:21.890 --> 00:01:23.724
learn about knowledge representation,
00:01:23.724 --> 00:01:25.305
learn about data and computers.
00:01:25.305 --> 00:01:28.671
There's no necessity
that someone's first encounter
00:01:28.671 --> 00:01:32.106
with data and computers
has to be a relational database system.
00:01:32.106 --> 00:01:33.947
So I'm going to put forward,
00:01:33.947 --> 00:01:36.539
I'm going to report on
a training workshop
00:01:36.539 --> 00:01:40.330
I've delivered to library staff
at the University of Oxford,
00:01:40.330 --> 00:01:42.550
and I've also done as a public event,
00:01:42.550 --> 00:01:46.710
so just with members of the public
coming to an open data week
00:01:46.710 --> 00:01:47.875
that the university hosted.
00:01:47.875 --> 00:01:51.979
And I've also done some of this
with researchers as well.
00:01:51.979 --> 00:01:57.441
So I teach in a way
that is very particular to me,
00:01:57.441 --> 00:01:59.847
so it's not like
I hand over materials to you.
00:01:59.847 --> 00:02:03.164
I'll show you my approach
and then you'll take it up
00:02:03.164 --> 00:02:05.902
and improve on it,
and make it personal to you
00:02:05.902 --> 00:02:08.469
and the audiences you're dealing with.
00:02:08.469 --> 00:02:10.253
And I want to avoid this.
00:02:10.253 --> 00:02:16.256
So in my career, I had to learn
data technologies, and SQL, and XML,
00:02:16.256 --> 00:02:19.610
and the content of tutorials,
00:02:19.610 --> 00:02:23.400
or examples, is very much like this.
00:02:23.400 --> 00:02:26.330
I'm not objecting to the language--
because that's what you've got to learn--
00:02:26.330 --> 00:02:28.969
but employees, invoices.
00:02:28.969 --> 00:02:32.708
So your task might be
you have a sales force
00:02:32.708 --> 00:02:36.913
and you've got to identify
the person who sold the most items,
00:02:36.913 --> 00:02:38.369
and calculate their bonus
00:02:38.369 --> 00:02:41.541
and then issue the invoices
to the customers,
00:02:41.541 --> 00:02:44.707
and it's the most boring--
I can't get excited about that,
00:02:44.707 --> 00:02:48.195
or I don't feel like I'm learning a topic.
00:02:48.195 --> 00:02:51.662
With Wikidata, we have so many topics
we can engage people in,
00:02:51.665 --> 00:02:54.613
and it might be things
in the solar system,
00:02:54.613 --> 00:02:56.591
or characters in Shakespeare,
00:02:56.591 --> 00:02:59.765
or things in the solar system
named after characters in Shakespeare,
00:02:59.765 --> 00:03:01.897
which is what most of this is.
00:03:03.497 --> 00:03:05.739
So when you have a teaching approach,
00:03:05.739 --> 00:03:08.395
one question is
what things do you leave out.
00:03:09.295 --> 00:03:15.271
So in the workshop I run,
I don't explain what SPARQL stands for,
00:03:15.271 --> 00:03:18.193
that doesn't help you write SPARQL at all.
00:03:18.193 --> 00:03:20.591
It doesn't help to explain what RDF is.
00:03:20.591 --> 00:03:22.763
Obviously, it's historically
really important,
00:03:22.763 --> 00:03:25.713
but telling people there's a format
for describing resources
00:03:25.713 --> 00:03:27.630
that's called Resource Description Framework,
00:03:27.630 --> 00:03:30.966
and resource is whatever's described,
it's not really a format.
00:03:30.966 --> 00:03:32.226
That doesn't help people,
00:03:32.226 --> 00:03:36.650
that gets people no closer to actually,
practically, using this.
00:03:36.650 --> 00:03:40.639
Linked open data, LOD, I may mention.
00:03:40.639 --> 00:03:44.317
So the library and museum professionals
that come to my training
00:03:44.317 --> 00:03:46.830
have definitely heard about
linked open data,
00:03:46.830 --> 00:03:50.697
and know that it's the future
of their discipline,
00:03:50.697 --> 00:03:52.564
and it's going to
revolutionize their work.
00:03:52.564 --> 00:03:54.879
But at the moment,
they're not using that kind of system.
00:03:54.879 --> 00:03:58.404
So they've not seen a real
practical example of that technology.
00:03:58.404 --> 00:04:00.206
So that's what
they're going to get from this.
00:04:00.206 --> 00:04:01.895
So I might mention linked open data,
00:04:01.895 --> 00:04:03.971
but I don't get into the definition.
00:04:03.971 --> 00:04:06.404
I basically say, this is a service
you can use for free.
00:04:06.404 --> 00:04:08.113
It's been given to you to use for free,
00:04:08.113 --> 00:04:10.675
and that gets the point across.
00:04:10.675 --> 00:04:14.925
Semantic identifiers and namespaces,
00:04:14.925 --> 00:04:16.518
I want to get across implicitly,
00:04:16.518 --> 00:04:18.294
I don't want to teach people
these concepts,
00:04:18.294 --> 00:04:21.271
I want them to pick up the concepts
even if I don't use the terms.
00:04:21.271 --> 00:04:26.536
Reification, so people already
using an RDF database want to know
00:04:26.536 --> 00:04:31.432
does Wikidata have statement IDs,
and I try to avoid that.
00:04:31.432 --> 00:04:33.855
I hardly even mention Wikidata.
00:04:33.855 --> 00:04:39.048
So these workshops are advertised
as like Introduction to SPARQL,
00:04:39.048 --> 00:04:41.027
or for the public event one, it was
00:04:41.027 --> 00:04:45.097
Asking and Answering Questions
with Open Data.
00:04:45.097 --> 00:04:47.826
And then in the blurb, I'd say
we're going to be using this platform.
00:04:47.826 --> 00:04:50.268
And I'll introduce it and say,
well, this is the best platform
00:04:50.268 --> 00:04:52.815
on which to learn
this language, this skill.
00:04:52.815 --> 00:04:55.138
It's the most helpful,
it's got the most interesting stuff.
00:04:55.138 --> 00:04:57.265
And then in the course of the workshop,
00:04:57.265 --> 00:04:58.969
maybe we'll get into more about Wikidata,
00:04:58.969 --> 00:05:02.351
why this exists, who put this data here.
00:05:02.351 --> 00:05:04.501
So there's a whole lot of background
00:05:04.501 --> 00:05:08.347
that kind of professional RDF
or linked data people will have,
00:05:08.347 --> 00:05:09.942
but you don't need.
00:05:09.942 --> 00:05:13.737
I just want to get people thinking
about nodes and arcs,
00:05:13.737 --> 00:05:15.699
and thinking in triples,
00:05:15.699 --> 00:05:19.690
and imagining how a triple representation
can be created and queried.
00:05:19.690 --> 00:05:22.897
I want them to phrase questions
in their own language,
00:05:22.897 --> 00:05:27.252
and translate into SPARQL,
via a kind of a baby talk intermediary.
00:05:27.252 --> 00:05:28.984
But I want them to think in triples
00:05:28.984 --> 00:05:34.740
and get used to asking questions
in that way, and just to get to the point
00:05:34.740 --> 00:05:38.887
where they ask interesting questions
relevant to their work, or their hobbies,
00:05:38.887 --> 00:05:42.395
or whatever, and they come away
with something.
00:05:42.395 --> 00:05:44.107
So it's not the theoretical understanding
00:05:44.107 --> 00:05:46.835
that I'm getting
in these quite short sessions.
00:05:46.835 --> 00:05:50.285
And the first thing I present them with
is this, they've got to look at this.
00:05:50.285 --> 00:05:53.650
And there's a "what the hell?" reaction
00:05:53.650 --> 00:05:55.496
in the workshop
and probably in the room now,
00:05:55.496 --> 00:05:59.361
because, "I thought this was
about technology skills!
00:05:59.361 --> 00:06:01.512
Why have we got to look at a cute dog?"
00:06:01.512 --> 00:06:05.289
But this is to introduce my toy world.
00:06:05.289 --> 00:06:10.525
So there are three human beings.
Two of them are a married couple.
00:06:10.525 --> 00:06:13.054
One is the child from that couple.
00:06:13.054 --> 00:06:16.678
There are two beings
that are pets of this couple,
00:06:16.678 --> 00:06:19.119
and we've got the types of the pets.
00:06:19.119 --> 00:06:20.839
Clearly, this is not official data.
00:06:20.839 --> 00:06:23.922
This knowledge representation,
which it is,
00:06:23.922 --> 00:06:26.854
only exists in this slide,
it's not a database.
00:06:26.854 --> 00:06:28.780
So I'm getting people thinking
of a toy world.
00:06:28.780 --> 00:06:30.512
And there's loads that can be learnt
00:06:30.512 --> 00:06:33.491
with just discussing this,
and kind of role-playing about this.
00:06:33.491 --> 00:06:38.121
And you're going to
make your own toy world.
00:06:40.721 --> 00:06:43.701
So a point to come from this
is this isn't a representation
00:06:43.701 --> 00:06:47.102
of all of my family
or of all my parents' pets.
00:06:47.102 --> 00:06:49.311
It's a tiny fragment.
00:06:49.311 --> 00:06:50.787
When we query things,
00:06:50.787 --> 00:06:53.261
we're querying a representation
of the world, not the world.
00:06:53.261 --> 00:06:55.150
There's so much that's missed out.
00:06:56.150 --> 00:07:01.104
That's a really important first lesson
to get about any database, any querying.
00:07:01.104 --> 00:07:06.281
So everything's expressed
in triples, and nodes, and arcs.
00:07:06.281 --> 00:07:08.427
Arcs have a direction.
00:07:08.427 --> 00:07:09.529
How do the names work?
00:07:09.529 --> 00:07:12.507
So one of these nodes is marked Bob.
00:07:12.507 --> 00:07:17.207
Is that the name Bob,
does that stand for the name Bob?
00:07:17.207 --> 00:07:20.624
Well, not quite, because other people
use the name Bob.
00:07:20.624 --> 00:07:22.535
And Dan, you probably know a Bob.
00:07:22.535 --> 00:07:23.649
(Dan) Like Bob [inaudible].
00:07:23.649 --> 00:07:25.247
Yeah, you know a Bob.
00:07:25.247 --> 00:07:28.617
And that's the Bob I think--
no, that isn't this Bob.
00:07:28.617 --> 00:07:29.642
So we talk about that.
00:07:29.642 --> 00:07:32.359
So names are relative
to the system that they're in,
00:07:32.359 --> 00:07:36.327
and we could talk about Martin's Bob
and Dan's Bob not being the same person.
00:07:36.327 --> 00:07:37.696
So it's not the names.
00:07:37.696 --> 00:07:39.878
So we could think of them
as relative to a system.
00:07:39.878 --> 00:07:43.828
So we can even say Martin:Bob
is the name for one thing,
00:07:43.828 --> 00:07:47.775
and Dan:Bob identifies another thing
in another system.
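
The point about Martin:Bob and Dan:Bob can be sketched in a few lines of Python. This is only an illustration: the `Name` class and its fields are hypothetical, not something shown in the workshop. In the Wikidata Query Service, prefixes such as `wd:` play a comparable role of saying which system an identifier belongs to.

```python
from typing import NamedTuple


class Name(NamedTuple):
    """An identifier: the system it belongs to plus a local label (hypothetical)."""
    system: str
    local: str

    def __str__(self) -> str:
        return f"{self.system}:{self.local}"


martins_bob = Name("Martin", "Bob")
dans_bob = Name("Dan", "Bob")

print(martins_bob, dans_bob)

# Both carry the label "Bob" ...
same_label = martins_bob.local == dans_bob.local
# ... but they identify different things in different systems.
same_thing = martins_bob == dans_bob
```

The label alone is ambiguous; the pair of system and label is what identifies a node.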
00:07:49.375 --> 00:07:52.121
And I emphasize triples, so three things.
00:07:52.121 --> 00:07:57.754
You might be tempted to say,
"Cindy and Bob, together, have a pet dog,"
00:07:58.511 --> 00:08:03.995
but you can't do that in this system
unless you have a node for the couple.
00:08:03.995 --> 00:08:07.350
Things have to have a direction.
That may not make much sense.
00:08:07.350 --> 00:08:09.673
There's a married couple--
that doesn't have a direction,
00:08:09.673 --> 00:08:11.196
that's a relation between two people,
00:08:11.196 --> 00:08:14.014
but we are modeling it
with things that have a direction
00:08:14.014 --> 00:08:17.464
so we have to have the two directions.
00:08:17.464 --> 00:08:18.962
There are arbitrary choices.
00:08:18.962 --> 00:08:24.206
So why have "Cindy has child Martin,"
and not "Martin has parent Cindy"?
00:08:24.206 --> 00:08:25.598
It's an arbitrary choice.
00:08:25.598 --> 00:08:28.605
Arbitrary choices like that--
choices of name, choices of direction--
00:08:28.605 --> 00:08:31.140
are built into this system and intrinsic.
00:08:31.140 --> 00:08:32.871
So there are arbitrary choices to be made,
00:08:32.871 --> 00:08:34.656
how to represent this,
00:08:34.656 --> 00:08:37.794
even the same facts
could be represented in different ways.
00:08:37.794 --> 00:08:39.233
Who makes that decision?
00:08:39.233 --> 00:08:40.731
Well, whoever creates the system,
00:08:40.731 --> 00:08:45.069
whoever sets up
the knowledge base system.
00:08:45.069 --> 00:08:49.330
So people can see that this--
this is what's called serializable--
00:08:49.330 --> 00:08:52.459
this could be expressed
as triple statements.
00:08:52.459 --> 00:08:58.468
So, "Cindy has pet Tilly,"
"Martin is a human,"
00:08:58.468 --> 00:09:02.393
and getting to the core insight
00:09:02.393 --> 00:09:06.970
is comparing how do we make
a question in English?
00:09:06.970 --> 00:09:10.953
Well, we have a statement
and it's incomplete,
00:09:10.953 --> 00:09:16.762
like, "Who has pet, Tilly?"
00:09:16.762 --> 00:09:21.585
So we go from "Cindy has pet Tilly,"
to "Who has pet Tilly?"
00:09:21.585 --> 00:09:23.316
We've taken something out,
00:09:23.316 --> 00:09:27.522
we've put in a placeholder,
and we've introduced a question mark.
00:09:27.522 --> 00:09:30.080
I say that's just like
what we do with SPARQL.
00:09:30.080 --> 00:09:33.053
We take something out,
we have an incomplete statement,
00:09:33.053 --> 00:09:35.930
or incomplete statements,
00:09:35.930 --> 00:09:40.213
we put a placeholder in the missing place,
and we have a question mark
00:09:40.213 --> 00:09:42.645
to mark that that's a placeholder.
00:09:42.645 --> 00:09:47.164
So it can be a role play
where I'm the query service
00:09:47.164 --> 00:09:49.383
for this knowledge base.
00:09:49.383 --> 00:09:53.906
And so people can learn
what a query service does
00:09:53.906 --> 00:09:56.969
by seeing a query service and role-playing
00:09:56.969 --> 00:09:59.709
and being a query service,
which we'll get to.
00:10:00.909 --> 00:10:05.414
So people can see that
working on the level of triples.
00:10:07.214 --> 00:10:09.371
"Who has pet, Tilly?"
00:10:09.371 --> 00:10:14.480
If you say that to me, I can say,
"Results: Cindy, Bob."
00:10:14.480 --> 00:10:17.774
Then I put it to the trainees,
00:10:17.774 --> 00:10:19.534
how do you ask more complicated questions?
00:10:19.534 --> 00:10:22.436
So, "Who has a dog as a pet?"
00:10:23.646 --> 00:10:28.701
And some will get it straightaway,
some will say, "Oh, it's a triple--
00:10:28.701 --> 00:10:33.075
Who? has pet dog?"
00:10:33.075 --> 00:10:38.103
So my role as the query service
is to look at this and match your triple,
00:10:38.103 --> 00:10:39.385
"Who? has pet dog,"
00:10:39.385 --> 00:10:41.522
so I've got to find things that have pet dog,
00:10:41.522 --> 00:10:43.024
and the result is none.
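
The two role-play queries so far can be sketched in Python, with the program playing the query service. This is a minimal sketch under stated assumptions: the property spellings (`has_pet`, `is_a`, `spouse`, `has_child`) and the `match` function are made up for illustration, the second pet from the slide is left out because the talk does not name it, and nothing here is how the Wikidata Query Service is actually implemented.

```python
# Toy world from the talk as (subject, property, value) triples.
# Property names are illustrative spellings, not Wikidata identifiers.
TRIPLES = [
    ("Cindy", "is_a", "human"),
    ("Bob", "is_a", "human"),
    ("Martin", "is_a", "human"),
    ("Cindy", "spouse", "Bob"),
    ("Bob", "spouse", "Cindy"),
    ("Cindy", "has_child", "Martin"),
    ("Bob", "has_child", "Martin"),
    ("Cindy", "has_pet", "Tilly"),
    ("Bob", "has_pet", "Tilly"),
    ("Tilly", "is_a", "dog"),
]


def match(pattern, triples=TRIPLES):
    """Return one {placeholder: value} dict per triple that fits the pattern.

    A term starting with "?" is a placeholder; any other term must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated placeholder must take the same value each time.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# "Who? has pet Tilly"
who_has_tilly = [b["?who"] for b in match(("?who", "has_pet", "Tilly"))]
# "Who? has pet dog" finds nothing: "dog" is a class node, not anyone's pet.
who_has_dog = [b["?who"] for b in match(("?who", "has_pet", "dog"))]

print(who_has_tilly)  # ['Cindy', 'Bob']
print(who_has_dog)    # []
```

The empty second result is the teaching point: no triple links a person directly to the class node, so the question needs more than one triple.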
00:10:43.024 --> 00:10:48.082
So this is the discussion--
what is this node I've called dog?
00:10:48.082 --> 00:10:49.231
It's not a dog.
00:10:49.231 --> 00:10:53.250
Although it's called dog,
it's not a dog, it stands for a class.
00:10:53.250 --> 00:10:56.130
Obvious when you're a SPARQL user,
but this is getting people
00:10:56.130 --> 00:10:59.054
over the threshold
of thinking in this way.
00:10:59.054 --> 00:11:02.319
And you've got to do
"What kinds of things have pets?"
00:11:02.319 --> 00:11:05.258
People see that they can't do that
in one triple,
00:11:05.258 --> 00:11:06.572
you've got to do multiple triples,
00:11:06.572 --> 00:11:10.126
and those multiple triples
ask for multiple things.
00:11:12.726 --> 00:11:16.588
So if you've got,
"What kinds of things have pets?"
00:11:16.588 --> 00:11:18.861
then you're going to identify people,
00:11:18.861 --> 00:11:21.070
and then you've got to
identify those types,
00:11:21.070 --> 00:11:24.362
and it naturally comes up,
"How do I specify the columns I want?
00:11:24.362 --> 00:11:27.365
How do I specify that I want the types?"
That's the question.
00:11:27.365 --> 00:11:29.838
And then you say,
"You have these partial statements,
00:11:29.838 --> 00:11:34.643
and you enclose them
in curly brackets and put Select."
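
As a rough sketch of what "several partial statements plus Select" means, the toy world can be queried with two patterns that share a placeholder. The names `unify` and `select` and the property spellings are assumptions made for this illustration. Note that this `select` drops duplicate rows, so it behaves like SPARQL's `SELECT DISTINCT` rather than a plain `SELECT`.

```python
# Toy world triples (illustrative property names, not Wikidata identifiers).
TRIPLES = [
    ("Cindy", "is_a", "human"),
    ("Bob", "is_a", "human"),
    ("Martin", "is_a", "human"),
    ("Cindy", "has_pet", "Tilly"),
    ("Bob", "has_pet", "Tilly"),
    ("Tilly", "is_a", "dog"),
]


def unify(pattern, triple, binding):
    """Extend binding so that pattern fits triple, or return None if it cannot."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(variables, patterns, triples=TRIPLES):
    """Match every pattern in turn, then keep only the requested columns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    rows = []
    for binding in bindings:
        row = tuple(binding[v] for v in variables)
        if row not in rows:  # drop duplicates, like SELECT DISTINCT
            rows.append(row)
    return rows


# "What kinds of things have pets?"
# Roughly: SELECT ?kind WHERE { ?owner has_pet ?pet . ?owner is_a ?kind }
patterns = [("?owner", "has_pet", "?pet"), ("?owner", "is_a", "?kind")]
kinds = select(("?kind",), patterns)
owners_and_kinds = select(("?owner", "?kind"), patterns)

print(kinds)             # [('human',)]
print(owners_and_kinds)  # [('Cindy', 'human'), ('Bob', 'human')]
```

The shared `?owner` placeholder is what ties the two triples together; the first argument to `select` is the list of columns to show.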
00:11:37.943 --> 00:11:41.137
So this is kind of the first half hour
of the workshop,
00:11:41.137 --> 00:11:44.162
and it's not on computers,
it's all with role play
00:11:44.162 --> 00:11:45.743
and thinking about this.
00:11:45.743 --> 00:11:51.776
And I invite people in the workshop
to make their own toy world,
00:11:51.776 --> 00:11:54.506
and you'll be going toy world,
I hope, after this.
00:11:54.506 --> 00:11:59.702
So five minutes, eight to ten nodes
to represent your family, your workplace,
00:11:59.702 --> 00:12:02.351
the thing you're working on,
the TV you were watching last night,
00:12:02.351 --> 00:12:05.166
and to have some
meaningful links between them.
00:12:05.166 --> 00:12:08.688
And the lesson is that
you make arbitrary decisions,
00:12:08.688 --> 00:12:10.516
you name things, you create properties,
00:12:10.516 --> 00:12:17.228
but they're the creation of the person
who sets up the knowledge system.
00:12:17.558 --> 00:12:24.394
And then, in pairs, they explain
their graphs to each other, and query.
00:12:24.394 --> 00:12:28.166
So, "What's a query you could ask
about this little world,
00:12:28.166 --> 00:12:29.570
and then what would be the answer?"
00:12:29.570 --> 00:12:33.730
So, like I say, people mostly get it,
00:12:33.730 --> 00:12:36.451
but people want a four-
or five-part relation,
00:12:36.451 --> 00:12:38.088
so they might want to say,
00:12:38.088 --> 00:12:39.958
"This couple, together, have a pet."
00:12:39.958 --> 00:12:43.204
Or they might want to say,
"Tilly is a pet, is a dog."
00:12:43.204 --> 00:12:47.207
And you can reinforce: nodes, triples,
and triples have a direction.
00:12:48.307 --> 00:12:51.258
So I'll explain what a triple is
and say also, not in this example,
00:12:51.258 --> 00:12:54.639
but, "Triples, generally,
they have an item, they have a property,
00:12:54.639 --> 00:12:57.307
and then they have
a number of other things
00:12:57.307 --> 00:12:59.516
which could be values,
could be time periods,
00:12:59.516 --> 00:13:03.104
could be locations on a globe."
00:13:07.288 --> 00:13:11.235
So with that role-play exercise,
we're 40 minutes into a 2-hour workshop,
00:13:11.235 --> 00:13:14.270
and in a computer room,
and we haven't touched computers yet.
00:13:14.270 --> 00:13:17.387
But I think it's useful
to get people thinking in that way,
00:13:17.387 --> 00:13:19.535
and to think about
how they would make the model
00:13:19.535 --> 00:13:23.793
and what the query is,
and to actually translate,
00:13:23.793 --> 00:13:25.149
so your translation exercise.
00:13:26.339 --> 00:13:32.597
And then I'd direct people to
query.wikidata.org.
00:13:34.197 --> 00:13:36.240
So there's a bunch of things
they've got to take on.
00:13:36.240 --> 00:13:40.086
We've been doing--
I will have a flip chart, and we will--
00:13:40.086 --> 00:13:41.539
Is that six?
00:13:41.539 --> 00:13:43.290
Six minutes elapsed?
00:13:43.290 --> 00:13:45.278
(man) [inaudible]
00:13:45.278 --> 00:13:46.318
Right.
00:13:50.548 --> 00:13:52.485
So I'll give them a task.
00:13:52.485 --> 00:13:55.679
I don't want them to learn
Q numbers and P numbers.
00:13:55.679 --> 00:14:00.646
So I'll tell them what the names are
and show them the Ctrl+Space trick.
00:14:00.646 --> 00:14:01.894
But there's a lot to take on,
00:14:01.894 --> 00:14:04.210
so they're taking on
Q numbers and P numbers,
00:14:04.210 --> 00:14:08.240
they've seen the triple format,
and they've seen Select,
00:14:08.240 --> 00:14:11.338
but they've got to apply this
all in one go.
00:14:11.338 --> 00:14:14.538
So I'll give people a task.
00:14:14.538 --> 00:14:17.299
Some will get it immediately,
some will struggle
00:14:17.299 --> 00:14:18.896
because they missed a bit of discussion,
00:14:18.896 --> 00:14:22.866
or more often, because they're familiar
with another kind of database system,
00:14:22.866 --> 00:14:25.490
and they have
particular expectations from that.
00:14:26.890 --> 00:14:30.656
So I set bonus things
or more complicated things
00:14:30.656 --> 00:14:31.874
if people are getting bored.
00:14:31.874 --> 00:14:37.828
Or I say, "If you get bored and you work
on an entirely different question,
00:14:37.828 --> 00:14:40.058
that's fine, but show me."
00:14:40.058 --> 00:14:42.254
So I'll run through this in front of them,
00:14:42.254 --> 00:14:45.617
tell them to do it, just show the hints
of what properties they'll be using,
00:14:45.617 --> 00:14:46.979
and then run through it again.
00:14:46.979 --> 00:14:50.277
And then, go through the cycle
of adding on extra things
00:14:50.277 --> 00:14:51.280
to enhance the query.
00:14:51.280 --> 00:14:53.084
So we might have done a query
and I'll say,
00:14:53.084 --> 00:14:55.522
"Here's how you add on
an optional property."
00:14:57.822 --> 00:15:01.046
And then I give them a task
involving an optional property.
00:15:01.046 --> 00:15:04.518
In the Bodleian, I say,
"Find manuscripts in Latin."
00:15:04.518 --> 00:15:06.326
For a public event
at the University of Bristol,
00:15:06.326 --> 00:15:09.255
there are lots of celebrities
who studied at the University of Bristol,
00:15:09.255 --> 00:15:14.113
so they get that as an example.
00:15:14.113 --> 00:15:15.933
So going to the interface,
00:15:15.933 --> 00:15:20.949
there's still a hump in the learning curve
00:15:20.949 --> 00:15:24.199
because they've got
to put the query into action,
00:15:24.199 --> 00:15:25.752
they've got to think in this language,
00:15:25.752 --> 00:15:29.879
and they've got to look up
Q numbers and P numbers,
00:15:29.879 --> 00:15:32.246
and then there's all the things
they can do with the query,
00:15:32.246 --> 00:15:33.283
once they've done it.
00:15:33.283 --> 00:15:37.627
And the visualization options,
the bookmarking, getting the data.
00:15:43.881 --> 00:15:45.635
So I'll suggest refinements.
00:15:45.635 --> 00:15:50.264
So we can take a succession of steps
of getting people doing a query,
00:15:50.264 --> 00:15:53.215
and taking it up to the next level.
00:15:53.215 --> 00:15:56.069
Like, "Find landscape paintings
taller than they are wide."
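
A query like this adds a comparison on measurements to the triple matching. The sketch below uses entirely made-up data: the three paintings, their sizes, and the property names `genre`, `height_cm` and `width_cm` are hypothetical and are not taken from Wikidata. It only illustrates the shape of the refinement: match the triples first, then filter on the values.

```python
# Hypothetical paintings with made-up measurements, as triples.
TRIPLES = [
    ("PaintingA", "genre", "landscape"),
    ("PaintingA", "height_cm", 120),
    ("PaintingA", "width_cm", 80),
    ("PaintingB", "genre", "landscape"),
    ("PaintingB", "height_cm", 60),
    ("PaintingB", "width_cm", 90),
    ("PaintingC", "genre", "portrait"),
    ("PaintingC", "height_cm", 100),
    ("PaintingC", "width_cm", 70),
]


def values(subject, prop):
    """Return every value recorded for the given subject and property."""
    return [v for s, p, v in TRIPLES if s == subject and p == prop]


def tall_landscapes():
    """Landscape paintings whose height exceeds their width."""
    rows = []
    for subject, prop, value in TRIPLES:
        if prop == "genre" and value == "landscape":
            for height in values(subject, "height_cm"):
                for width in values(subject, "width_cm"):
                    if height > width:  # the filter step
                        rows.append((subject, height, width))
    return rows


result = tall_landscapes()
print(result)  # [('PaintingA', 120, 80)]
```

PaintingB is a landscape but wider than tall, and PaintingC is tall but not a landscape, so only PaintingA passes both the triple match and the filter.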
00:15:56.069 --> 00:16:02.658
So within the two-hour thing,
we get people doing basic queries,
00:16:02.658 --> 00:16:07.803
adding refinements onto them,
00:16:07.803 --> 00:16:11.164
not doing much filtering,
00:16:11.164 --> 00:16:13.893
but starting to introduce measurements,
00:16:13.893 --> 00:16:14.982
and so on.
00:16:14.982 --> 00:16:17.782
Not getting into qualifiers
or another level.
00:16:17.782 --> 00:16:20.816
If it's a whole day thing,
you probably could.
00:16:20.816 --> 00:16:25.526
It comes up, inevitably, "Where else
can I use the SPARQL language?"
00:16:25.526 --> 00:16:29.581
And I observe that that is a question,
and questions can be framed in SPARQL,
00:16:29.581 --> 00:16:31.671
and put to Wikidata,
and you'll get answers,
00:16:31.671 --> 00:16:34.444
and there is a Wikidata property
called SPARQL endpoint.
00:16:34.444 --> 00:16:36.888
So when they ask that,
that becomes their task.
00:16:36.888 --> 00:16:38.809
And then they get
that list of institutions
00:16:38.809 --> 00:16:40.369
that have SPARQL endpoints.
00:16:42.499 --> 00:16:43.877
And it's worth pointing out,
00:16:43.877 --> 00:16:48.647
so in an introductory session
on other computer languages,
00:16:48.647 --> 00:16:52.065
people will typically
learn how to do loops,
00:16:52.065 --> 00:16:55.477
how to do functions,
how to do conditionals.
00:16:55.477 --> 00:16:56.803
They'll learn the basic grammar
00:16:56.803 --> 00:16:59.735
but they won't make something
fantastic and useful,
00:16:59.735 --> 00:17:01.663
they'll just learn the basic grammar.
00:17:01.663 --> 00:17:06.458
But in an introductory session
on Wikidata SPARQL you can make--
00:17:06.458 --> 00:17:08.142
if you're interested
in German literature--
00:17:08.142 --> 00:17:10.333
a map of the birthplace
of German poets, and so on.
00:17:10.333 --> 00:17:12.097
And so we get feedback like this.
00:17:12.097 --> 00:17:14.196
This is how great
the Wikidata Query Service is
00:17:14.196 --> 00:17:16.266
as an educational tool.
00:17:16.266 --> 00:17:19.298
"What is this sorcery?"
isn't even from someone in the room.
00:17:19.298 --> 00:17:21.226
A trainee in the room made a map,
00:17:21.226 --> 00:17:24.702
emailed it to her colleagues
and got back, "What is this sorcery!?
00:17:24.702 --> 00:17:25.703
How have you made this?"
00:17:25.703 --> 00:17:29.428
And she was just not expecting this to happen.
00:17:29.428 --> 00:17:32.271
People are not expecting to look at
the picture of the cute dog,
00:17:32.271 --> 00:17:36.243
they're not expecting to do the role play
where they represent their family
00:17:36.243 --> 00:17:37.865
and query each other.
00:17:37.865 --> 00:17:40.210
They're not expecting
to actually make something concrete
00:17:40.210 --> 00:17:42.587
which they take away as a link
and show to their colleagues.
00:17:42.587 --> 00:17:45.010
And all of this, being unexpected,
00:17:45.010 --> 00:17:47.092
makes it memorable
and makes them want to go away
00:17:47.092 --> 00:17:48.527
and talk to other people about it.
00:17:48.527 --> 00:17:51.399
It's not like your run-of-the-mill
IT training.
00:17:52.699 --> 00:17:58.020
The lower quote is from a researcher
who saw how he could make a map
00:17:58.020 --> 00:18:00.761
of famous people with his first name
00:18:00.761 --> 00:18:04.421
and another one of famous people
with his wife's first name.
00:18:04.421 --> 00:18:07.819
And then he just had more and more ideas
of things and charts, and so on,
00:18:07.819 --> 00:18:09.469
he's going to create with Wikidata,
00:18:09.469 --> 00:18:10.967
and so he's glad to say,
00:18:10.967 --> 00:18:13.297
"You've destroyed my productivity
for the next month."
00:18:15.805 --> 00:18:17.601
So that's my recommendation.
00:18:17.601 --> 00:18:19.702
I think we can take it as a positive,
00:18:19.702 --> 00:18:22.985
and we can take it beyond
training people about Wikidata,
00:18:22.985 --> 00:18:24.671
to training people about data.
00:18:24.671 --> 00:18:26.716
The stuff that came up
in the keynote this morning,
00:18:26.716 --> 00:18:32.468
making people literate
about ideas of representation
00:18:32.468 --> 00:18:36.568
and starting people off
and being involved in that discussion,
00:18:36.568 --> 00:18:37.722
involves this [inaudible].
00:18:37.722 --> 00:18:38.816
So this could be done--
00:18:38.816 --> 00:18:40.822
doesn't have to be like
a workplace training thing,
00:18:40.822 --> 00:18:42.134
it could be a public event,
00:18:42.134 --> 00:18:45.250
to get people familiar
with these technologies.
00:18:46.150 --> 00:18:48.302
But I will stop there for discussion.
00:18:48.302 --> 00:18:51.150
And like I say, it's respectfully
submitted to people in the room
00:18:51.150 --> 00:18:55.280
who do SPARQL training a different way,
but I hope this is useful to you.
00:18:57.180 --> 00:19:00.184
(audience applause)
00:19:12.915 --> 00:19:15.721
(Dan) Okay, are there any questions?
00:19:23.511 --> 00:19:26.605
(man) Hi, it's [Mohammed Hijah]
from Palestine.
00:19:26.605 --> 00:19:28.420
Thank you for the session.
00:19:28.420 --> 00:19:30.921
I was wondering if there are resources
00:19:30.921 --> 00:19:35.131
that we can get to learn
SPARQL language professionally?
00:19:37.899 --> 00:19:40.213
(Martin) I've got the SPARQL book,
the O'Reilly book.
00:19:40.213 --> 00:19:43.413
I find the Wikibook on SPARQL
00:19:43.413 --> 00:19:44.987
is really, really useful.
00:19:44.987 --> 00:19:48.387
That's like the most useful
and accessible reference.
00:19:49.287 --> 00:19:54.570
The tutorials on Wikidata itself
are going to vary in quality.
00:19:55.170 --> 00:19:57.694
(Mohammed) I think
that they are for beginners.
00:19:57.694 --> 00:20:01.240
I can handle SPARQL,
but at the beginner level,
00:20:01.240 --> 00:20:04.343
but I want to deal with it professionally.
00:20:10.864 --> 00:20:13.609
(Martin) So my concern is to get
as many people as possible
00:20:13.609 --> 00:20:16.292
across the threshold
into being aware of how this works,
00:20:16.292 --> 00:20:17.925
and dabbling.
00:20:19.225 --> 00:20:24.920
I'd like it to be a deeper course
by going into more of the...
00:20:26.220 --> 00:20:29.120
how it works--
qualifiers and references, and so on.
00:20:29.120 --> 00:20:31.809
Where in a professional context,
you're probably aiming towards
00:20:31.809 --> 00:20:35.923
people using a particular SPARQL endpoint,
00:20:35.923 --> 00:20:39.123
and Wikidata has some customizations.
00:20:39.123 --> 00:20:41.636
We've discussed on Twitter
that there's some things we use
00:20:41.636 --> 00:20:43.548
that actually aren't part of the SPARQL standard.
00:20:43.548 --> 00:20:46.130
They're like an optimization.
00:20:46.130 --> 00:20:48.816
So in the professional context,
00:20:50.516 --> 00:20:56.190
I'd hope it would be tailored
to that particular data set and endpoint,
00:20:56.190 --> 00:20:59.575
but there's not a demand for that yet,
00:20:59.575 --> 00:21:03.459
because like I said, I deal with people
who are aware of linked open data,
00:21:03.459 --> 00:21:07.558
and the word is out that it's a good thing,
but haven't seen an example yet,
00:21:07.558 --> 00:21:09.446
haven't an example
they can apply to their work,
00:21:09.446 --> 00:21:11.693
they're not enthusiastic about it yet.
00:21:11.693 --> 00:21:13.843
So I think we want to
get my whole workplace
00:21:13.843 --> 00:21:17.726
and other workplaces and developers
across that threshold
00:21:17.726 --> 00:21:21.998
to where they're demanding
that kind of really in-depth,
00:21:21.998 --> 00:21:25.333
like using-an-endpoint-in-a-library
kind of training.
00:21:26.082 --> 00:21:27.376
(Mohammed) Thank you.
00:21:31.883 --> 00:21:34.892
(woman) It's just a question.
I really liked that, thank you so much.
00:21:34.892 --> 00:21:37.819
Is it documented step-by-step anywhere?
00:21:39.194 --> 00:21:43.043
(Martin) I can share my succession of tasks.
00:21:43.843 --> 00:21:47.100
That's very much tailored
to where I'm presenting it.
00:21:47.100 --> 00:21:50.697
Like I said, with librarians,
I start with manuscripts and go on.
00:21:53.697 --> 00:21:56.393
You want to end up
with people asking a question
00:21:56.393 --> 00:22:00.764
which is the question they came
to the event with, in their heads.
00:22:04.764 --> 00:22:10.283
So there's an order
of querying with a triple,
00:22:10.283 --> 00:22:13.006
and then with multiple triples,
and then with an optional triple,
00:22:13.006 --> 00:22:17.147
and then with a measurement
in a filter, and so on.
00:22:17.147 --> 00:22:20.618
And, yeah, I can share...
00:22:22.438 --> 00:22:24.338
Yeah, I'll share a separate set of slides
00:22:24.338 --> 00:22:25.421
for those exercises.
00:22:25.421 --> 00:22:27.379
(woman) Thank you so much
because I will take that
00:22:27.379 --> 00:22:29.783
and customize it for my own needs.
Thank you.
00:22:31.010 --> 00:22:33.095
(Dan) Okay. No questions?
00:22:34.953 --> 00:22:38.994
(man) What would you recommend
if you also want to teach editing,
00:22:38.994 --> 00:22:41.595
apart from just querying?
00:22:46.968 --> 00:22:53.476
(Martin) I'm pleased to report
that people find Wikidata editing,
00:22:53.476 --> 00:22:56.632
when I demonstrate it, to be so simple,
00:22:56.632 --> 00:22:58.943
that it just takes them by surprise.
00:22:58.943 --> 00:23:01.568
It's Wikidata editing,
and I've got to add knowledge
00:23:01.568 --> 00:23:03.018
to this huge knowledge base.
00:23:03.018 --> 00:23:05.435
Sounds like something
that really technical people can do.
00:23:05.435 --> 00:23:08.524
And then you show it,
and they go, "Oh, right.
00:23:08.524 --> 00:23:11.096
Martin is instance of human."
00:23:13.296 --> 00:23:18.851
So I haven't done that systematically yet.
00:23:21.498 --> 00:23:26.007
I think a precondition would be
getting people thinking in triples,
00:23:26.007 --> 00:23:29.675
and maybe underline that
triples need references,
00:23:29.675 --> 00:23:34.237
and triples need qualifiers,
and that
00:23:34.237 --> 00:23:37.442
triples can have multiple conflicting values.
00:23:37.442 --> 00:23:39.949
So I'd still do the toy world,
00:23:39.949 --> 00:23:45.149
maybe a more professionally relevant
toy world, and translation exercise,
00:23:45.149 --> 00:23:48.222
but then go to, "So now the exercise
we're going to do with triples
00:23:48.222 --> 00:23:49.661
is adding them."
00:23:51.561 --> 00:23:54.522
There's a lot of work done,
and maybe Jason's done,
00:23:54.522 --> 00:23:58.402
with guessing a table of identifiers.
00:23:58.402 --> 00:23:59.581
So something I'd like to do,
00:23:59.581 --> 00:24:03.710
there's an online database
00:24:03.710 --> 00:24:06.710
of people who've won a Rhodes Scholarship.
00:24:06.710 --> 00:24:10.616
That's a scholarship to Oxford University
for people from other countries.
00:24:10.616 --> 00:24:12.221
But it's not in Wikidata yet.
00:24:12.221 --> 00:24:14.381
So you can kind of divide up
the room and say,
00:24:14.381 --> 00:24:16.595
"You're going to find
these people in Wikidata
00:24:16.595 --> 00:24:18.874
and your task is to add
00:24:18.874 --> 00:24:21.106
with the reference
to this online database."
00:24:21.106 --> 00:24:23.449
And then you can do a query
to see how many have been added
00:24:23.449 --> 00:24:25.545
in that session.
00:24:25.545 --> 00:24:28.246
So I think, with all the training I do,
00:24:28.246 --> 00:24:31.582
I think the comprehension
is more important
00:24:31.582 --> 00:24:33.554
than taking action immediately.
00:24:33.554 --> 00:24:35.543
So when I'm training people on Wikipedia,
00:24:35.543 --> 00:24:39.514
I first show them article histories,
contribution records, talk page,
00:24:39.514 --> 00:24:44.800
quality scale, so they're comprehending
the process before they edit,
00:24:44.800 --> 00:24:47.439
and actually change something.
00:24:49.939 --> 00:24:52.636
(man) Not really a question but a comment.
00:24:52.636 --> 00:24:58.570
There is, for beginners,
a good tutorial on YouTube,
00:24:58.570 --> 00:25:01.423
How to Query and Start with SPARQL,
00:25:01.423 --> 00:25:04.421
and if you want to go deeper, also,
00:25:04.421 --> 00:25:08.521
How to Add Data with OpenRefine.
00:25:08.521 --> 00:25:12.621
And I've also made some videos
00:25:12.621 --> 00:25:15.121
and uploaded them in German language.
00:25:15.121 --> 00:25:16.916
(Martin) Oh, great! Thanks.
00:25:17.894 --> 00:25:21.823
I should also mention Hilary Thorsen,
who's from Stanford Library,
00:25:21.823 --> 00:25:25.076
did, last week,
a really good video capture
00:25:25.076 --> 00:25:28.857
of adding a data set to Wikidata
with OpenRefine.
00:25:28.857 --> 00:25:33.529
This is for the LD4P, the Linked Data
for Production project,
00:25:33.529 --> 00:25:35.932
and that was a really good video tutorial
00:25:35.932 --> 00:25:38.392
I'd recommend to anybody for--
00:25:38.392 --> 00:25:42.426
That's the next couple of levels up
from what I'm doing.
00:25:43.189 --> 00:25:45.029
(Dan) Is there a last question?
00:25:49.486 --> 00:25:52.203
(man) So SPARQL's sort of SQL-ish.
00:25:52.203 --> 00:25:54.856
If someone walked into your tutorial
with an SQL background,
00:25:54.856 --> 00:25:57.291
is that a blessing or a curse?
00:25:57.291 --> 00:26:00.164
(Martin) It's a bit of a curse
because I had to learn SQL,
00:26:00.164 --> 00:26:03.398
so I did the...
00:26:03.398 --> 00:26:09.498
generate the invoices
using SQL for your fictitious company,
00:26:09.498 --> 00:26:14.369
and definitely had to unlearn
an SQL way of thinking about things
00:26:14.369 --> 00:26:15.712
to get to SPARQL.
00:26:15.712 --> 00:26:17.638
But it was freeing, it was freeing.
00:26:17.638 --> 00:26:21.302
Databases without built-in schemas
are liberating.
00:26:22.102 --> 00:26:24.042
When you think about
how many columns there are,
00:26:24.042 --> 00:26:25.727
and it's this number
of columns for a book,
00:26:25.727 --> 00:26:27.638
and it's this number of columns
for the address,
00:26:27.638 --> 00:26:28.984
and it's just three columns.
00:26:28.984 --> 00:26:31.406
Well, three and a bit more.
00:26:31.406 --> 00:26:34.443
That's really liberating.
00:26:34.443 --> 00:26:36.814
So that's my point, I kind of glanced at,
00:26:36.814 --> 00:26:41.810
that people make different progress
in these workshops as in all training,
00:26:41.810 --> 00:26:43.869
but it's not like intelligent versus dumb,
00:26:43.869 --> 00:26:46.588
it's like the preconceptions
you're coming with,
00:26:46.588 --> 00:26:47.823
are more the obstacle.
00:26:47.823 --> 00:26:50.242
So it's actually more--
00:26:50.242 --> 00:26:55.655
I'm more optimistic about training people
who have never encountered databases,
00:26:55.655 --> 00:26:58.805
coding, or any of that before, than...
00:26:58.805 --> 00:27:02.232
The worst people to try and train
are linked data experts
00:27:02.232 --> 00:27:04.631
because they've used DBpedia a lot.
00:27:04.631 --> 00:27:07.180
They're used to a particular approach
of querying
00:27:07.180 --> 00:27:08.834
and expecting to get certain things,
00:27:08.834 --> 00:27:12.429
and it looks odd when Wikidata
does things differently.
00:27:12.429 --> 00:27:14.540
And they need to get with the program.
00:27:15.205 --> 00:27:17.867
(Dan) Okay, let's thank Martin
for his insights.
00:27:17.867 --> 00:27:18.884
Thanks very much.
00:27:18.884 --> 00:27:21.888
(audience applause)