1
00:00:05,882 --> 00:00:07,218
(Dan) Hello everyone.
2
00:00:07,218 --> 00:00:09,911
So this session is about teaching SPARQL.
3
00:00:09,911 --> 00:00:12,423
The presenter is Martin Poulter,
so I leave you the stage.
4
00:00:12,423 --> 00:00:13,668
Have fun.
5
00:00:13,668 --> 00:00:14,943
(Martin) Thank you very much.
6
00:00:16,501 --> 00:00:18,717
Hi, everybody.
7
00:00:18,717 --> 00:00:23,355
I trust you'll agree
that Wikidata is great,
8
00:00:23,355 --> 00:00:27,171
it has lots of interesting data
on different topics,
9
00:00:27,171 --> 00:00:31,225
the tools people make with it
are fun to use and fun to explore,
10
00:00:31,225 --> 00:00:33,412
and easy to use.
11
00:00:33,412 --> 00:00:38,578
And maybe you'll agree with the suggestion
that to get the best out of Wikidata
12
00:00:38,578 --> 00:00:40,142
you need to know SPARQL,
13
00:00:40,142 --> 00:00:42,040
you need to be able to phrase
your own queries.
14
00:00:42,040 --> 00:00:45,141
So you might see that
as a barrier, an obstacle,
15
00:00:45,141 --> 00:00:50,183
that we ideally need a big program
of training for developers,
16
00:00:50,183 --> 00:00:54,008
for librarians, for curators,
for ordinary people
17
00:00:54,008 --> 00:00:58,236
to get them literate in this language,
and that's a big effort,
18
00:01:01,036 --> 00:01:04,031
an aspect of Wikidata outreach.
19
00:01:04,031 --> 00:01:06,238
My suggestion is to kind of
turn that around,
20
00:01:06,238 --> 00:01:09,037
that Wikidata,
especially the Query Service,
21
00:01:09,037 --> 00:01:11,673
because it's so helpful,
because it's so full of good stuff,
22
00:01:11,673 --> 00:01:13,857
because it's so colorful,
23
00:01:13,857 --> 00:01:16,200
because it has so many
visualization abilities,
24
00:01:16,200 --> 00:01:20,173
is the ideal platform
for people to learn SPARQL,
25
00:01:20,173 --> 00:01:21,890
also to learn about databases,
26
00:01:21,890 --> 00:01:23,724
learn about knowledge representation,
27
00:01:23,724 --> 00:01:25,305
learn about data and computers.
28
00:01:25,305 --> 00:01:28,671
There's no necessity
that someone's first encounter
29
00:01:28,671 --> 00:01:32,106
with data and computers
has to be a relational database system.
30
00:01:32,106 --> 00:01:33,947
So I'm going to put forward,
31
00:01:33,947 --> 00:01:36,539
I'm going to report on
a training workshop
32
00:01:36,539 --> 00:01:40,330
I've delivered to library staff
at the University of Oxford,
33
00:01:40,330 --> 00:01:42,550
and I've also done as a public event,
34
00:01:42,550 --> 00:01:46,710
so just with members of the public
coming to an open data week
35
00:01:46,710 --> 00:01:47,875
that the university hosted.
36
00:01:47,875 --> 00:01:51,979
And I've also done some of this
with researchers as well.
37
00:01:51,979 --> 00:01:57,441
So I teach in a way
that is very particular to me,
38
00:01:57,441 --> 00:01:59,847
so it's not like
I hand over materials to you.
39
00:01:59,847 --> 00:02:03,164
I'll show you my approach
and then you'll take it up
40
00:02:03,164 --> 00:02:05,902
and improve on it,
and make it personal to you
41
00:02:05,902 --> 00:02:08,469
and the audiences you're dealing with.
42
00:02:08,469 --> 00:02:10,253
And I want to avoid this.
43
00:02:10,253 --> 00:02:16,256
So in my career, I had to learn
data technologies, and SQL, and XML,
44
00:02:16,256 --> 00:02:19,610
and the content of tutorials,
45
00:02:19,610 --> 00:02:23,400
or examples, is very much like this.
46
00:02:23,400 --> 00:02:26,330
I'm not objecting to the language--
because that's what you got to learn--
47
00:02:26,330 --> 00:02:28,969
but employees, invoices.
48
00:02:28,969 --> 00:02:32,708
So your task might be
you have a sales force
49
00:02:32,708 --> 00:02:36,913
and you've got to identify
the person who sold the most items,
50
00:02:36,913 --> 00:02:38,369
and calculate their bonus
51
00:02:38,369 --> 00:02:41,541
and then issue the invoices
to the customers,
52
00:02:41,541 --> 00:02:44,707
and it's the most boring--
I can't get excited about that,
53
00:02:44,707 --> 00:02:48,195
or I don't feel like I'm learning a topic.
54
00:02:48,195 --> 00:02:51,662
With Wikidata, we have so many topics
we can engage people in,
55
00:02:51,665 --> 00:02:54,613
and it might be things
in the solar system,
56
00:02:54,613 --> 00:02:56,591
or characters in Shakespeare,
57
00:02:56,591 --> 00:02:59,765
or things in the solar system
named after characters in Shakespeare,
58
00:02:59,765 --> 00:03:01,897
which is what most of this is.
59
00:03:03,497 --> 00:03:05,739
So when you have a teaching approach,
60
00:03:05,739 --> 00:03:08,395
one question is
what things do you leave out.
61
00:03:09,295 --> 00:03:15,271
So in the workshop I run,
I don't explain what SPARQL stands for,
62
00:03:15,271 --> 00:03:18,193
that doesn't help you write SPARQL at all.
63
00:03:18,193 --> 00:03:20,591
It doesn't help to explain what RDF is.
64
00:03:20,591 --> 00:03:22,763
Obviously, it's historically
really important,
65
00:03:22,763 --> 00:03:25,713
but telling people there's a format
for describing resources
66
00:03:25,713 --> 00:03:27,630
that's called Resource Description Framework,
67
00:03:27,630 --> 00:03:30,966
and resource is whatever's described,
it's not really a format.
68
00:03:30,966 --> 00:03:32,226
That doesn't help people,
69
00:03:32,226 --> 00:03:36,650
that gets people no closer to actually,
practically, using this.
70
00:03:36,650 --> 00:03:40,639
Linked open data, LOD, I may mention.
71
00:03:40,639 --> 00:03:44,317
So the library and museum professionals
that come to my training
72
00:03:44,317 --> 00:03:46,830
have definitely heard about
linked open data,
73
00:03:46,830 --> 00:03:50,697
and know that it's the future
of their discipline,
74
00:03:50,697 --> 00:03:52,564
and it's going to
revolutionize their work.
75
00:03:52,564 --> 00:03:54,879
But at the moment,
they're not using that kind of system.
76
00:03:54,879 --> 00:03:58,404
So they've not seen a real
practical example of that technology.
77
00:03:58,404 --> 00:04:00,206
So that's what
they're going to get from this.
78
00:04:00,206 --> 00:04:01,895
So I might mention linked open data,
79
00:04:01,895 --> 00:04:03,971
but I don't get into the definition.
80
00:04:03,971 --> 00:04:06,404
I basically say, this is a service
you can use for free.
81
00:04:06,404 --> 00:04:08,113
It's been given to you to use for free,
82
00:04:08,113 --> 00:04:10,675
and that gets the point across.
83
00:04:10,675 --> 00:04:14,925
Semantic identifiers and namespaces,
84
00:04:14,925 --> 00:04:16,518
I want to get across implicitly,
85
00:04:16,518 --> 00:04:18,294
I don't want to teach people
these concepts,
86
00:04:18,294 --> 00:04:21,271
I want them to pick up the concepts
even if I don't use the terms.
87
00:04:21,271 --> 00:04:26,536
Reification, so people already
using an RDF database want to know
88
00:04:26,536 --> 00:04:31,432
does Wikidata have statement IDs,
and I try to avoid that.
89
00:04:31,432 --> 00:04:33,855
I hardly even mention Wikidata.
90
00:04:33,855 --> 00:04:39,048
So these workshops are advertised
as like Introduction to SPARQL,
91
00:04:39,048 --> 00:04:41,027
or for the public event one, it was
92
00:04:41,027 --> 00:04:45,097
Asking and Answering Questions
with Open Data.
93
00:04:45,097 --> 00:04:47,826
And then in the blurb, I'd say
we're going to be using this platform,
94
00:04:47,826 --> 00:04:50,268
and I'll introduce it and say,
well, this is the best platform
95
00:04:50,268 --> 00:04:52,815
on which to learn
this language, this skill.
96
00:04:52,815 --> 00:04:55,138
It's the most helpful,
it's got the most interesting stuff.
97
00:04:55,138 --> 00:04:57,265
And then in the course of the workshop,
98
00:04:57,265 --> 00:04:58,969
maybe we'll get into more about Wikidata,
99
00:04:58,969 --> 00:05:02,351
why this exists, who put this data here.
100
00:05:02,351 --> 00:05:04,501
So there's a whole lot of background
101
00:05:04,501 --> 00:05:08,347
that kind of professional RDF
or linked data people will have,
102
00:05:08,347 --> 00:05:09,942
but you don't need.
103
00:05:09,942 --> 00:05:13,737
I just want to get people thinking
about nodes and arcs,
104
00:05:13,737 --> 00:05:15,699
and thinking in triples,
105
00:05:15,699 --> 00:05:19,690
and imagining how a triple representation
can be created and queried.
106
00:05:19,690 --> 00:05:22,897
I want them to phrase questions
in their own language,
107
00:05:22,897 --> 00:05:27,252
and translate into SPARQL,
via a kind of a baby talk intermediary.
108
00:05:27,252 --> 00:05:28,984
But I want them to think in triples
109
00:05:28,984 --> 00:05:34,740
and get used to asking questions
in that way, and just to get to the point
110
00:05:34,740 --> 00:05:38,887
where they ask interesting questions
relevant to their work, or their hobbies,
111
00:05:38,887 --> 00:05:42,395
or whatever, and they come away
with something.
112
00:05:42,395 --> 00:05:44,107
So it's not the theoretical understanding
113
00:05:44,107 --> 00:05:46,835
that I'm getting at
in these quite short sessions.
114
00:05:46,835 --> 00:05:50,285
And the first thing I present them with
is this, they've got to look at this.
115
00:05:50,285 --> 00:05:53,650
And there's a "what the hell?" reaction
116
00:05:53,650 --> 00:05:55,496
in the workshop
and probably in the room now,
117
00:05:55,496 --> 00:05:59,361
because, "I thought this was
about technology skills!
118
00:05:59,361 --> 00:06:01,512
Why have we got to look at a cute dog?"
119
00:06:01,512 --> 00:06:05,289
But this is to introduce my toy world.
120
00:06:05,289 --> 00:06:10,525
So there are three human beings.
Two of them are a married couple.
121
00:06:10,525 --> 00:06:13,054
One is the child from that couple.
122
00:06:13,054 --> 00:06:16,678
There are two beings
that are pets of this couple,
123
00:06:16,678 --> 00:06:19,119
and we've got the types of the pets.
124
00:06:19,119 --> 00:06:20,839
Clearly, this is not official data.
125
00:06:20,839 --> 00:06:23,922
This knowledge representation,
which it is,
126
00:06:23,922 --> 00:06:26,854
only exists in this slide,
it's not a database.
127
00:06:26,854 --> 00:06:28,780
So I'm getting people thinking
of a toy world.
128
00:06:28,780 --> 00:06:30,512
And there's loads that can be learnt
129
00:06:30,512 --> 00:06:33,491
with just discussing this,
and kind of role-playing about this.
130
00:06:33,491 --> 00:06:38,121
And you're going to
make your own toy world.
131
00:06:40,721 --> 00:06:43,701
So a point to come from this
is this isn't a representation
132
00:06:43,701 --> 00:06:47,102
of all of my family
or of all my parents' pets.
133
00:06:47,102 --> 00:06:49,311
It's a tiny fragment.
134
00:06:49,311 --> 00:06:50,787
When we query things,
135
00:06:50,787 --> 00:06:53,261
we're querying a representation
of the world, not the world.
136
00:06:53,261 --> 00:06:55,150
There's so much that's missed out.
137
00:06:56,150 --> 00:07:01,104
That's a really important first lesson
to get about any database, any querying.
138
00:07:01,104 --> 00:07:06,281
So everything's expressed
in triples, and nodes, and arcs.
139
00:07:06,281 --> 00:07:08,427
Arcs have a direction.
140
00:07:08,427 --> 00:07:09,529
How do the names work?
141
00:07:09,529 --> 00:07:12,507
So one of these nodes is marked Bob.
142
00:07:12,507 --> 00:07:17,207
Is that the name Bob,
does that stand for the name Bob?
143
00:07:17,207 --> 00:07:20,624
Well, not quite, because other people
use the name Bob.
144
00:07:20,624 --> 00:07:22,535
And Dan, you probably know a Bob.
145
00:07:22,535 --> 00:07:23,649
(Dan) Like Bob [inaudible].
146
00:07:23,649 --> 00:07:25,247
Yeah, you know a Bob.
147
00:07:25,247 --> 00:07:28,617
And that's the Bob I think--
no, that isn't this Bob.
148
00:07:28,617 --> 00:07:29,642
So we talk about that.
149
00:07:29,642 --> 00:07:32,359
So names are relative
to the system that they're in,
150
00:07:32,359 --> 00:07:36,327
and we could talk about Martin's Bob
and Dan's Bob not being the same person.
151
00:07:36,327 --> 00:07:37,696
So it's not the names.
152
00:07:37,696 --> 00:07:39,878
So we could think of them
as relative to a system.
153
00:07:39,878 --> 00:07:43,828
So we can even say Martin:Bob
is the name for one thing,
154
00:07:43,828 --> 00:07:47,775
and Dan:Bob identifies another thing
in another system.
155
00:07:49,375 --> 00:07:52,121
And I emphasize triples, so three things.
156
00:07:52,121 --> 00:07:57,754
You might be tempted to say,
"Cindy and Bob, together, have a pet dog,"
157
00:07:58,511 --> 00:08:03,995
but you can't do that in this system
unless you have a node for the couple.
158
00:08:03,995 --> 00:08:07,350
Things have to have a direction.
That may not make much sense.
159
00:08:07,350 --> 00:08:09,673
There's a married couple--
that doesn't have a direction,
160
00:08:09,673 --> 00:08:11,196
that's a relation between two people,
161
00:08:11,196 --> 00:08:14,014
but we are modeling it
with things that have a direction
162
00:08:14,014 --> 00:08:17,464
so we have to have the two directions.
163
00:08:17,464 --> 00:08:18,962
There are arbitrary choices.
164
00:08:18,962 --> 00:08:24,206
So why have "Cindy has child Martin,"
and not "Martin has parent Cindy"?
165
00:08:24,206 --> 00:08:25,598
It's an arbitrary choice.
166
00:08:25,598 --> 00:08:28,605
Arbitrary choices like that--
choices of name, choices of direction--
167
00:08:28,605 --> 00:08:31,140
are built into this system and intrinsic.
168
00:08:31,140 --> 00:08:32,871
So there are arbitrary choices to be made,
169
00:08:32,871 --> 00:08:34,656
how to represent this,
170
00:08:34,656 --> 00:08:37,794
even the same facts
could be represented in different ways.
171
00:08:37,794 --> 00:08:39,233
Who makes that decision?
172
00:08:39,233 --> 00:08:40,731
Well, whoever creates the system,
173
00:08:40,731 --> 00:08:45,069
whoever sets up
the knowledge base system.
174
00:08:45,069 --> 00:08:49,330
So people can see that this is--
it's called serializable--
175
00:08:49,330 --> 00:08:52,459
this could be expressed
as triple statements.
176
00:08:52,459 --> 00:08:58,468
So, "Cindy has pet Tilly,"
"Martin is a human,"
177
00:08:58,468 --> 00:09:02,393
and getting to the core insight
178
00:09:02,393 --> 00:09:06,970
is comparing how do we make
a question in English?
179
00:09:06,970 --> 00:09:10,953
Well, we have a statement
and it's incomplete,
180
00:09:10,953 --> 00:09:16,762
like, "Who has pet Tilly?"
181
00:09:16,762 --> 00:09:21,585
So we go from "Cindy has pet Tilly,"
to "Who has pet Tilly?"
182
00:09:21,585 --> 00:09:23,316
We've taken something out,
183
00:09:23,316 --> 00:09:27,522
we've put in a placeholder,
and we've introduced a question mark.
184
00:09:27,522 --> 00:09:30,080
I say that's just like
what we do with SPARQL.
185
00:09:30,080 --> 00:09:33,053
We take something out,
we have an incomplete statement,
186
00:09:33,053 --> 00:09:35,930
or incomplete statements,
187
00:09:35,930 --> 00:09:40,213
we put a placeholder in the missing place,
and we have a question mark
188
00:09:40,213 --> 00:09:42,645
to mark that that's a placeholder.
189
00:09:42,645 --> 00:09:47,164
So it can be a role play
where I'm the query service
190
00:09:47,164 --> 00:09:49,383
for this knowledge base.
191
00:09:49,383 --> 00:09:53,906
And so people can learn
what a query service does
192
00:09:53,906 --> 00:09:56,969
by seeing a query service and role-playing
193
00:09:56,969 --> 00:09:59,709
and being a query service,
which we'll get to.
194
00:10:00,909 --> 00:10:05,414
So people can see that
working on the level of triples.
195
00:10:07,214 --> 00:10:09,371
"Who has pet Tilly?"
196
00:10:09,371 --> 00:10:14,480
If you say that to me, I can say,
"Results: Cindy, Bob."
197
00:10:14,480 --> 00:10:17,774
Then I put it to the trainees,
198
00:10:17,774 --> 00:10:19,534
how do you ask more complicated questions?
199
00:10:19,534 --> 00:10:22,436
So, "Who has a dog as a pet?"
200
00:10:23,646 --> 00:10:28,701
And some will get it straightaway,
some will say, "Oh, it's a triple--
201
00:10:28,701 --> 00:10:33,075
Who? has pet dog?"
202
00:10:33,075 --> 00:10:38,103
So my role as the query service
is to look at this and match your triple,
203
00:10:38,103 --> 00:10:39,385
"Who? has pet dog,"
204
00:10:39,385 --> 00:10:41,522
so I got to find things that have pet dog,
205
00:10:41,522 --> 00:10:43,024
and "Results: none."
206
00:10:43,024 --> 00:10:48,082
So this is the discussion--
what is this node I've called dog?
207
00:10:48,082 --> 00:10:49,231
It's not a dog.
208
00:10:49,231 --> 00:10:53,250
Although it's called dog,
it's not a dog, it stands for a class.
209
00:10:53,250 --> 00:10:56,130
Obvious when you're a SPARQL user,
but this is getting people
210
00:10:56,130 --> 00:10:59,054
over the threshold
of thinking in this way.
211
00:10:59,054 --> 00:11:02,319
And you've got to do
"What kinds of things have pets?"
212
00:11:02,319 --> 00:11:05,258
People see that they can't do that
in one triple,
213
00:11:05,258 --> 00:11:06,572
you got to do multiple triples,
214
00:11:06,572 --> 00:11:10,126
and those multiple triples
ask for multiple things.
215
00:11:12,726 --> 00:11:16,588
So if you've got,
"What kinds of things have pets?"
216
00:11:16,588 --> 00:11:18,861
then you're going to identify people,
217
00:11:18,861 --> 00:11:21,070
and then you've got to
identify those types,
218
00:11:21,070 --> 00:11:24,362
and it naturally comes up,
"How do I specify the columns I want?
219
00:11:24,362 --> 00:11:27,365
How do I specify that I want the types?"
That's the question.
220
00:11:27,365 --> 00:11:29,838
And then you say,
"You have these partial statements,
221
00:11:29,838 --> 00:11:34,643
and you enclose them
in curly brackets and put Select."
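The role play above, where the presenter acts as the query service for the toy world, can be sketched in Python. This is only an illustration, not the workshop's material. The triple list is reconstructed from the spoken description of the slide, and the second pet is left out because its name is not given. The predicate wording and the helper names `match` and `select` are assumptions made for this sketch.

```python
# Toy world reconstructed from the spoken description of the slide.
# Assumption: the predicate wording ("married to", "is a") and the omission
# of the second, unnamed pet are choices made for this sketch.
TRIPLES = [
    ("Cindy", "is a", "human"),
    ("Bob", "is a", "human"),
    ("Martin", "is a", "human"),
    ("Cindy", "married to", "Bob"),   # an undirected relation needs both directions
    ("Bob", "married to", "Cindy"),
    ("Cindy", "has child", "Martin"),
    ("Cindy", "has pet", "Tilly"),
    ("Bob", "has pet", "Tilly"),
    ("Tilly", "is a", "dog"),
]


def match(pattern, triple, binding):
    """Return binding extended so that pattern fits triple, or None if it cannot."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A placeholder: bind it, or check it agrees with an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def select(variables, patterns, triples=TRIPLES):
    """Play the query service: match every pattern, return distinct rows."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    rows = []
    for binding in bindings:
        row = tuple(binding[v] for v in variables)
        if row not in rows:  # drop duplicates, like SELECT DISTINCT
            rows.append(row)
    return rows


# "Who has pet Tilly?"
print(select(["?who"], [("?who", "has pet", "Tilly")]))
# "Who has pet dog?" finds nothing: "dog" is a class, not a pet.
print(select(["?who"], [("?who", "has pet", "dog")]))
# "Who has a dog as a pet?" needs two triples sharing a placeholder.
print(select(["?who"], [("?who", "has pet", "?pet"), ("?pet", "is a", "dog")]))
# "What kinds of things have pets?"
print(select(["?type"], [("?owner", "has pet", "?pet"), ("?owner", "is a", "?type")]))
```

The list of patterns plays the part of the partial statements inside the curly brackets, and the list of variables plays the part of what follows Select.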
222
00:11:37,943 --> 00:11:41,137
So this is kind of the first half hour
of the workshop,
223
00:11:41,137 --> 00:11:44,162
and it's not on computers,
it's all with role play
224
00:11:44,162 --> 00:11:45,743
and thinking about this.
225
00:11:45,743 --> 00:11:51,776
And I invite people in the workshop
to make their own toy world,
226
00:11:51,776 --> 00:11:54,506
and you'll be doing a toy world,
I hope, after this.
227
00:11:54,506 --> 00:11:59,702
So five minutes, eight to ten nodes
to represent your family, your work place,
228
00:11:59,702 --> 00:12:02,351
the thing you're working on,
the TV you were watching last night,
229
00:12:02,351 --> 00:12:05,166
and to have some
meaningful links between them.
230
00:12:05,166 --> 00:12:08,688
And the lesson that--
you make arbitrary decisions,
231
00:12:08,688 --> 00:12:10,516
you name things, you create properties,
232
00:12:10,516 --> 00:12:17,228
but they're the creation of the person
who sets up the knowledge system.
233
00:12:17,558 --> 00:12:24,394
And then, in pairs, they explain
their graphs to each other, and query.
234
00:12:24,394 --> 00:12:28,166
So, "What's a query you could ask
about this little world,
235
00:12:28,166 --> 00:12:29,570
and then what would be the answer?"
236
00:12:29,570 --> 00:12:33,730
So, like I say, people mostly get it,
237
00:12:33,730 --> 00:12:36,451
but people want a four-
or five-part relation,
238
00:12:36,451 --> 00:12:38,088
so they might want to say,
239
00:12:38,088 --> 00:12:39,958
"This couple, together, have a pet."
240
00:12:39,958 --> 00:12:43,204
Or they might want to say,
"Tilly is a pet, is a dog."
241
00:12:43,204 --> 00:12:47,207
And you can enforce nodes, triples,
and triples have a direction.
242
00:12:48,307 --> 00:12:51,258
So I'll explain what a triple is
and say also, not in this example,
243
00:12:51,258 --> 00:12:54,639
but, "Triples, generally,
they have an item, they have a property,
244
00:12:54,639 --> 00:12:57,307
and then they have
a number of other things
245
00:12:57,307 --> 00:12:59,516
which could be values,
could be time periods,
246
00:12:59,516 --> 00:13:03,104
could be locations on a globe."
247
00:13:07,288 --> 00:13:11,235
So with that role-play exercise,
we're 40 minutes into a 2-hour workshop,
248
00:13:11,235 --> 00:13:14,270
and in a computer room,
and we haven't touched computers yet.
249
00:13:14,270 --> 00:13:17,387
But I think it's useful
to get people thinking in that way,
250
00:13:17,387 --> 00:13:19,535
and to think about
how they would make the model
251
00:13:19,535 --> 00:13:23,793
and what the query is,
and to actually translate,
252
00:13:23,793 --> 00:13:25,149
so it's a translation exercise.
253
00:13:26,339 --> 00:13:32,597
And then I'd direct people to
query.wikidata.org.
254
00:13:34,197 --> 00:13:36,240
So there's a bunch of things
they've got to take on.
255
00:13:36,240 --> 00:13:40,086
We've been doing--
I will have a flip chart, and we will--
256
00:13:40,086 --> 00:13:41,539
Is that six?
257
00:13:41,539 --> 00:13:43,290
Six minutes elapsed?
258
00:13:43,290 --> 00:13:45,278
(man) [inaudible]
259
00:13:45,278 --> 00:13:46,318
Right.
260
00:13:50,548 --> 00:13:52,485
So I'll give them a task.
261
00:13:52,485 --> 00:13:55,679
I don't want them to learn
Q numbers and P numbers.
262
00:13:55,679 --> 00:14:00,646
So I'll tell them what the names are
and show them the Ctrl+Space trick.
263
00:14:00,646 --> 00:14:01,894
But there's a lot to take on,
264
00:14:01,894 --> 00:14:04,210
so they're taking on
Q numbers and P numbers,
265
00:14:04,210 --> 00:14:08,240
they've seen the triple format,
and they've seen Select,
266
00:14:08,240 --> 00:14:11,338
but they've got to apply this
all in one go.
267
00:14:11,338 --> 00:14:14,538
So I'll give people a task.
268
00:14:14,538 --> 00:14:17,299
Some will get it immediately,
some will struggle
269
00:14:17,299 --> 00:14:18,896
because they missed a bit of discussion,
270
00:14:18,896 --> 00:14:22,866
or more often, because they're familiar
with another kind of database system,
271
00:14:22,866 --> 00:14:25,490
and they have
particular expectations from that.
272
00:14:26,890 --> 00:14:30,656
So I set bonus things
or more complicated things
273
00:14:30,656 --> 00:14:31,874
if people are getting bored.
274
00:14:31,874 --> 00:14:37,828
Or I say, "If you get bored and you work
on an entirely different question,
275
00:14:37,828 --> 00:14:40,058
that's fine, but show me."
276
00:14:40,058 --> 00:14:42,254
So I'll run through this in front of them,
277
00:14:42,254 --> 00:14:45,617
tell them to do it, just show the hints
of what properties they'll be using,
278
00:14:45,617 --> 00:14:46,979
and then run through it again.
279
00:14:46,979 --> 00:14:50,277
And then, go through the cycle
of adding on extra things
280
00:14:50,277 --> 00:14:51,280
to enhance the query.
281
00:14:51,280 --> 00:14:53,084
So we might have done a query
and I'll say,
282
00:14:53,084 --> 00:14:55,522
"Here's how you add on
an optional property."
283
00:14:57,822 --> 00:15:01,046
And then give them a task
involving an optional property.
284
00:15:01,046 --> 00:15:04,518
In the Bodleian, I say,
"Find manuscripts in Latin."
285
00:15:04,518 --> 00:15:06,326
For a public event
at University of Bristol,
286
00:15:06,326 --> 00:15:09,255
where there's lots of celebrities
who studied at the University of Bristol,
287
00:15:09,255 --> 00:15:14,113
I use that as an example.
288
00:15:14,113 --> 00:15:15,933
So going to the interface,
289
00:15:15,933 --> 00:15:20,949
there's still a hump in the learning curve
290
00:15:20,949 --> 00:15:24,199
because they've got
to put the query into action,
291
00:15:24,199 --> 00:15:25,752
they've got to think in this language,
292
00:15:25,752 --> 00:15:29,879
and they've got to look up
Q numbers and P numbers,
293
00:15:29,879 --> 00:15:32,246
and then there's all the things
they can do with the query,
294
00:15:32,246 --> 00:15:33,283
once they've done it.
295
00:15:33,283 --> 00:15:37,627
And the visualization options,
the bookmarking, getting the data.
296
00:15:43,881 --> 00:15:45,635
So I'll suggest refinements.
297
00:15:45,635 --> 00:15:50,264
So we can take a succession of steps
of getting people doing a query,
298
00:15:50,264 --> 00:15:53,215
and taking it up to the next level.
299
00:15:53,215 --> 00:15:56,069
Like, "Find landscape paintings
taller than they are wide."
300
00:15:56,069 --> 00:16:02,658
So within the two-hour thing,
we get people doing basic queries,
301
00:16:02,658 --> 00:16:07,803
adding refinements onto them,
302
00:16:07,803 --> 00:16:11,164
not doing much filtering,
303
00:16:11,164 --> 00:16:13,893
but starting to introduce measurements,
304
00:16:13,893 --> 00:16:14,982
and so on.
305
00:16:14,982 --> 00:16:17,782
Not getting into qualifiers
or another level.
306
00:16:17,782 --> 00:16:20,816
If it's a whole day thing,
you probably could.
307
00:16:20,816 --> 00:16:25,526
It comes up, inevitably, "Where else
can I use the SPARQL language?"
308
00:16:25,526 --> 00:16:29,581
And I observe that that is a question,
and questions can be framed in SPARQL,
309
00:16:29,581 --> 00:16:31,671
and put to Wikidata,
and you'll get answers,
310
00:16:31,671 --> 00:16:34,444
and there is a Wikidata property
called SPARQL endpoint.
311
00:16:34,444 --> 00:16:36,888
So when they ask that,
that becomes their task.
312
00:16:36,888 --> 00:16:38,809
And then they get
that list of institutions
313
00:16:38,809 --> 00:16:40,369
that have SPARQL endpoints.
314
00:16:42,499 --> 00:16:43,877
And it's worth pointing out,
315
00:16:43,877 --> 00:16:48,647
so in an introductory session
on other computer languages,
316
00:16:48,647 --> 00:16:52,065
people will typically
learn how to do loops,
317
00:16:52,065 --> 00:16:55,477
how to do functions,
how to do conditionals.
318
00:16:55,477 --> 00:16:56,803
They'll learn the basic grammar
319
00:16:56,803 --> 00:16:59,735
but they won't make something
fantastic and useful,
320
00:16:59,735 --> 00:17:01,663
they'll just learn the basic grammar.
321
00:17:01,663 --> 00:17:06,458
But in an introductory session
on Wikidata SPARQL you can make--
322
00:17:06,458 --> 00:17:08,142
if you're interested
in German literature--
323
00:17:08,142 --> 00:17:10,333
a map of the birthplace
of German poets, and so on.
324
00:17:10,333 --> 00:17:12,097
And so we get feedback like this.
325
00:17:12,097 --> 00:17:14,196
This is how great
the Wikidata Query Service is
326
00:17:14,196 --> 00:17:16,266
as an educational tool.
327
00:17:16,266 --> 00:17:19,298
"What is this sorcery?"
isn't even from someone in the room.
328
00:17:19,298 --> 00:17:21,226
A trainee in the room made a map,
329
00:17:21,226 --> 00:17:24,702
emailed it to her colleagues
and got back, "What is this sorcery!?
330
00:17:24,702 --> 00:17:25,703
How have you made this?"
331
00:17:25,703 --> 00:17:29,428
And she was just not expecting this to happen.
332
00:17:29,428 --> 00:17:32,271
People are not expecting to look at
the picture of the cute dog,
333
00:17:32,271 --> 00:17:36,243
they're not expecting to do the role play
where they represent their family
334
00:17:36,243 --> 00:17:37,865
and query each other.
335
00:17:37,865 --> 00:17:40,210
They're not expecting
to actually make something concrete
336
00:17:40,210 --> 00:17:42,587
which they take away as a link
and show to their colleagues.
337
00:17:42,587 --> 00:17:45,010
And all of this, being unexpected,
338
00:17:45,010 --> 00:17:47,092
makes it memorable
and makes them want to go away
339
00:17:47,092 --> 00:17:48,527
and talk to other people about it.
340
00:17:48,527 --> 00:17:51,399
It's not like your run-of-the-mill
IT training.
341
00:17:52,699 --> 00:17:58,020
The lower quote is from a researcher
who saw how he could make a map
342
00:17:58,020 --> 00:18:00,761
of famous people with his first name
343
00:18:00,761 --> 00:18:04,421
and another one of famous people
with his wife's first name.
344
00:18:04,421 --> 00:18:07,819
And then he just had more and more ideas
of things and charts, and so on,
345
00:18:07,819 --> 00:18:09,469
he's going to create with Wikidata,
346
00:18:09,469 --> 00:18:10,967
and so he's glad to say,
347
00:18:10,967 --> 00:18:13,297
"You've destroyed my productivity
for the next month."
348
00:18:15,805 --> 00:18:17,601
So that's my recommendation.
349
00:18:17,601 --> 00:18:19,702
I think we can take it as a positive,
350
00:18:19,702 --> 00:18:22,985
and we take it beyond
training people about Wikidata,
351
00:18:22,985 --> 00:18:24,671
to training people about data.
352
00:18:24,671 --> 00:18:26,716
The stuff that came up
in the keynote this morning,
353
00:18:26,716 --> 00:18:32,468
making people literate
about ideas of representation
354
00:18:32,468 --> 00:18:36,568
and starting people off
and being involved in that discussion,
355
00:18:36,568 --> 00:18:37,722
involves this [inaudible].
356
00:18:37,722 --> 00:18:38,816
So this could be done--
357
00:18:38,816 --> 00:18:40,822
doesn't have to be like
a workplace training thing,
358
00:18:40,822 --> 00:18:42,134
it could be a public event,
359
00:18:42,134 --> 00:18:45,250
to get people familiar
with these technologies.
360
00:18:46,150 --> 00:18:48,302
But I will stop there for discussion.
361
00:18:48,302 --> 00:18:51,150
And like I say, it's respectfully
submitted to people in the room
362
00:18:51,150 --> 00:18:55,280
who do SPARQL training a different way,
but I hope this is useful to you.
363
00:18:57,180 --> 00:19:00,184
(audience applause)
364
00:19:12,915 --> 00:19:15,721
(Dan) Okay, are there any questions?
365
00:19:23,511 --> 00:19:26,605
(man) Hi, it's [Mohammed Hijah]
from Palestine.
366
00:19:26,605 --> 00:19:28,420
Thank you for the session.
367
00:19:28,420 --> 00:19:30,921
I was wondering if there are resources
368
00:19:30,921 --> 00:19:35,131
that we can get to learn
SPARQL language professionally?
369
00:19:37,899 --> 00:19:40,213
I've got the SPARQL book,
the O'Reilly book.
370
00:19:40,213 --> 00:19:43,413
I find the Wikibook on SPARQL
371
00:19:43,413 --> 00:19:44,987
is really, really useful.
372
00:19:44,987 --> 00:19:48,387
That's like the most useful
and accessible reference.
373
00:19:49,287 --> 00:19:54,570
The tutorials on Wikidata itself
are going to vary in quality.
374
00:19:55,170 --> 00:19:57,694
(Mohammed) I think
that they are for beginners.
375
00:19:57,694 --> 00:20:01,240
I can handle SPARQL,
but at the beginner level,
376
00:20:01,240 --> 00:20:04,343
but I want to deal with it professionally.
377
00:20:10,864 --> 00:20:13,609
So my concern is to get
as many people as possible
378
00:20:13,609 --> 00:20:16,292
across the threshold
into being aware of how this works,
379
00:20:16,292 --> 00:20:17,925
and dabbling.
380
00:20:19,225 --> 00:20:24,920
I'd like it to be a deeper course
by going into more of the...
381
00:20:26,220 --> 00:20:29,120
how it works--
qualifiers and references, and so on.
382
00:20:29,120 --> 00:20:31,809
Where in a professional context,
you're probably aiming towards
383
00:20:31,809 --> 00:20:35,923
people using a particular SPARQL endpoint,
384
00:20:35,923 --> 00:20:39,123
and Wikidata has some customizations.
385
00:20:39,123 --> 00:20:41,636
We've discussed on Twitter
that there's some things we use
386
00:20:41,636 --> 00:20:43,548
that actually aren't a SPARQL standard.
387
00:20:43,548 --> 00:20:46,130
They're like an optimization.
388
00:20:46,130 --> 00:20:48,816
So in the professional context,
389
00:20:50,516 --> 00:20:56,190
I'd hope it would be tailored
to that particular data set and endpoint,
390
00:20:56,190 --> 00:20:59,575
but there's not a demand for that yet,
391
00:20:59,575 --> 00:21:03,459
because like I said, I deal with people
who are aware of linked open data,
392
00:21:03,459 --> 00:21:07,558
and the word is out that it's a good thing,
but haven't seen an example yet,
393
00:21:07,558 --> 00:21:09,446
haven't an example
they can apply to their work,
394
00:21:09,446 --> 00:21:11,693
they're not enthusiastic about it yet.
395
00:21:11,693 --> 00:21:13,843
So I think we want to
get my whole workplace
396
00:21:13,843 --> 00:21:17,726
and other workplaces and developers
across that threshold
397
00:21:17,726 --> 00:21:21,998
to where they're demanding
that kind of really in-depth,
398
00:21:21,998 --> 00:21:25,333
like using an endpoint in a library
kind of training.
399
00:21:26,082 --> 00:21:27,376
(Mohammed) Thank you.
400
00:21:31,883 --> 00:21:34,892
(woman) It's just a question.
I really liked that, thank you so much.
401
00:21:34,892 --> 00:21:37,819
Is it documented step-by-step anywhere?
402
00:21:39,194 --> 00:21:43,043
I can share my succession of tasks.
403
00:21:43,843 --> 00:21:47,100
That's very much tailored
to where I'm presenting it.
404
00:21:47,100 --> 00:21:50,697
Like I said, with librarians,
I start with manuscripts and go on.
405
00:21:53,697 --> 00:21:56,393
You want to end up
with people asking a question
406
00:21:56,393 --> 00:22:00,764
which is the question they came,
in their heads, to the event with.
407
00:22:04,764 --> 00:22:10,283
So there's an order
of querying with a triple,
408
00:22:10,283 --> 00:22:13,006
and then with multiple triples,
and then with an optional triple,
409
00:22:13,006 --> 00:22:17,147
and then with a measurement
in a filter, and so on.
410
00:22:17,147 --> 00:22:20,618
And, yeah, I can share...
411
00:22:22,438 --> 00:22:24,338
Yeah, I'll share a separate set of slides
412
00:22:24,338 --> 00:22:25,421
for those exercises.
413
00:22:25,421 --> 00:22:27,379
(woman) Thank you so much
because I will take that
414
00:22:27,379 --> 00:22:29,783
and customize it for my own needs.
Thank you.
415
00:22:31,010 --> 00:22:33,095
(Dan) Okay. No questions?
416
00:22:34,953 --> 00:22:38,994
(man) What would you recommend
if you also want to teach editing,
417
00:22:38,994 --> 00:22:41,595
apart from just querying?
418
00:22:46,968 --> 00:22:53,476
I'm pleased to report
that people find Wikidata editing,
419
00:22:53,476 --> 00:22:56,632
when I demonstrate it, to be so simple,
420
00:22:56,632 --> 00:22:58,943
that it just takes them by surprise.
421
00:22:58,943 --> 00:23:01,568
It's Wikidata editing,
and I've got to add knowledge
422
00:23:01,568 --> 00:23:03,018
to this huge knowledge base.
423
00:23:03,018 --> 00:23:05,435
Sounds like something
that really technical people can do.
424
00:23:05,435 --> 00:23:08,524
And then you show it,
and they go, "Oh, right.
425
00:23:08,524 --> 00:23:11,096
Martin is instance of human."
426
00:23:13,296 --> 00:23:18,851
So I haven't done that systematically yet.
427
00:23:21,498 --> 00:23:26,007
I think a precondition would be
getting people thinking in triples,
428
00:23:26,007 --> 00:23:29,675
and maybe underline that
triples need references,
429
00:23:29,675 --> 00:23:34,237
and triples need qualifiers,
and that multiple triples--
430
00:23:34,237 --> 00:23:37,442
triples can have multiple conflicting values.
431
00:23:37,442 --> 00:23:39,949
So I'd still do the toy world,
432
00:23:39,949 --> 00:23:45,149
maybe a more professionally relevant
toy world, and translation exercise,
433
00:23:45,149 --> 00:23:48,222
but then go to, "So now the exercise
we're going to do with triples
434
00:23:48,222 --> 00:23:49,661
is adding them."
435
00:23:51,561 --> 00:23:54,522
There's a lot of work done,
and maybe Jason's done,
436
00:23:54,522 --> 00:23:58,402
with guessing a table of identifiers.
437
00:23:58,402 --> 00:23:59,581
So something I'd like to do,
438
00:23:59,581 --> 00:24:03,710
there's an online database
439
00:24:03,710 --> 00:24:06,710
of people who've won a Rhodes Scholarship.
440
00:24:06,710 --> 00:24:10,616
That's a scholarship to Oxford University
from other countries.
441
00:24:10,616 --> 00:24:12,221
But it's not in Wikidata yet.
442
00:24:12,221 --> 00:24:14,381
So you can kind of divide up
the room and say,
443
00:24:14,381 --> 00:24:16,595
"You're going to find
these people in Wikidata
444
00:24:16,595 --> 00:24:18,874
and your task is to add
445
00:24:18,874 --> 00:24:21,106
with the reference
to this online database."
446
00:24:21,106 --> 00:24:23,449
And then you can do a query
to see how many have been added
447
00:24:23,449 --> 00:24:25,545
in that session.
448
00:24:25,545 --> 00:24:28,246
So I think, with all the training I do,
449
00:24:28,246 --> 00:24:31,582
I think the comprehension
is more important
450
00:24:31,582 --> 00:24:33,554
than the taking action immediately.
451
00:24:33,554 --> 00:24:35,543
So when I'm training people on Wikipedia,
452
00:24:35,543 --> 00:24:39,514
I first show them article histories,
contribution records, talk page,
453
00:24:39,514 --> 00:24:44,800
quality scale, so they're comprehending
the process before they edit,
454
00:24:44,800 --> 00:24:47,439
and actually change something.
455
00:24:49,939 --> 00:24:52,636
(man) Not really a question but a comment.
456
00:24:52,636 --> 00:24:58,570
There is, for beginners,
a good tutorial on YouTube,
457
00:24:58,570 --> 00:25:01,423
How to Query and Start with SPARQL,
458
00:25:01,423 --> 00:25:04,421
and if you want to go deeper, also,
459
00:25:04,421 --> 00:25:08,521
How to Add Data with OpenRefine.
460
00:25:08,521 --> 00:25:12,621
And I've also made some videos
461
00:25:12,621 --> 00:25:15,121
and uploaded them in German.
462
00:25:15,121 --> 00:25:16,916
Oh, great! Thanks.
463
00:25:17,894 --> 00:25:21,823
I should also mention Hilary Thorsen,
who's from Stanford Library,
464
00:25:21,823 --> 00:25:25,076
did, last week,
a really good video capture
465
00:25:25,076 --> 00:25:28,857
of adding a data set to Wikidata
with OpenRefine.
466
00:25:28,857 --> 00:25:33,529
This is for the LD4P, the Linked Data
for Production project,
467
00:25:33,529 --> 00:25:35,932
and that was a really good video tutorial
468
00:25:35,932 --> 00:25:38,392
I'd recommend to anybody for--
469
00:25:38,392 --> 00:25:42,426
That's the next couple of levels up
from what I'm doing.
470
00:25:43,189 --> 00:25:45,029
(Dan) Is there a last question?
471
00:25:49,486 --> 00:25:52,203
(man) So SPARQL's sort of SQL-ish.
472
00:25:52,203 --> 00:25:54,856
If someone walked into your tutorial
with an SQL background,
473
00:25:54,856 --> 00:25:57,291
is that a blessing or a curse?
474
00:25:57,291 --> 00:26:00,164
It's a bit of a curse
because I had to learn SQL,
475
00:26:00,164 --> 00:26:03,398
so I did the...
476
00:26:03,398 --> 00:26:09,498
generate the invoices
using SQL for your fictitious company,
477
00:26:09,498 --> 00:26:14,369
and definitely had to unlearn
an SQL way of thinking about things
478
00:26:14,369 --> 00:26:15,712
to get to SPARQL.
479
00:26:15,712 --> 00:26:17,638
But it was freeing, it was freeing.
480
00:26:17,638 --> 00:26:21,302
Databases without built-in schemas
are liberating.
481
00:26:22,102 --> 00:26:24,042
When you think about
how many columns there are,
482
00:26:24,042 --> 00:26:25,727
and it's this number
of columns for a book,
483
00:26:25,727 --> 00:26:27,638
and it's this number of columns
for the address,
484
00:26:27,638 --> 00:26:28,984
and it's just three columns.
485
00:26:28,984 --> 00:26:31,406
Well, three and a bit more.
486
00:26:31,406 --> 00:26:34,443
That's really liberating.
487
00:26:34,443 --> 00:26:36,814
So that's a point I kind of glanced at,
488
00:26:36,814 --> 00:26:41,810
that people make different progress
in these workshops as in all training,
489
00:26:41,810 --> 00:26:43,869
but it's not like intelligent versus dumb,
490
00:26:43,869 --> 00:26:46,588
it's like the preconceptions
you're coming with
491
00:26:46,588 --> 00:26:47,823
are more the obstacle.
492
00:26:47,823 --> 00:26:50,242
So it's actually more--
493
00:26:50,242 --> 00:26:55,655
I'm more optimistic about training people
who have never encountered databases,
494
00:26:55,655 --> 00:26:58,805
coding, or any of that before, than...
495
00:26:58,805 --> 00:27:02,232
The worst people to try and train
are linked data experts
496
00:27:02,232 --> 00:27:04,631
because they've used DBpedia a lot.
497
00:27:04,631 --> 00:27:07,180
They're used to a particular approach
of querying
498
00:27:07,180 --> 00:27:08,834
and expecting to get certain things,
499
00:27:08,834 --> 00:27:12,429
and it looks odd when Wikidata
does things differently.
500
00:27:12,429 --> 00:27:14,540
And they need to get with the program.
501
00:27:15,205 --> 00:27:17,867
(Dan) Okay, let's thank Martin
for his insights.
502
00:27:17,867 --> 00:27:18,884
Thanks very much.
503
00:27:18,884 --> 00:27:21,888
(audience applause)