WEBVTT

00:00:05.882 --> 00:00:07.218
(Dan) Hello everyone.

00:00:07.218 --> 00:00:09.911
So this session is about teaching SPARQL.

00:00:09.911 --> 00:00:12.423
The presenter is Martin Poulter, so I leave you the stage.

00:00:12.423 --> 00:00:13.668
Have fun.

00:00:13.668 --> 00:00:14.943
(Martin) Thank you very much.

00:00:16.501 --> 00:00:18.717
Hi, everybody.

00:00:18.717 --> 00:00:23.355
I trust you'll agree that Wikidata is great,

00:00:23.355 --> 00:00:27.171
it has lots of interesting data on different topics,

00:00:27.171 --> 00:00:31.225
the tools people make with it are fun to use and fun to explore,

00:00:31.225 --> 00:00:33.412
and easy to use.

00:00:33.412 --> 00:00:38.578
And maybe you'll agree with the suggestion that to get the best out of Wikidata

00:00:38.578 --> 00:00:40.142
you need to know SPARQL,

00:00:40.142 --> 00:00:42.040
you need to be able to phrase your own queries.

00:00:42.040 --> 00:00:45.141
So you might see that as a barrier, an obstacle,

00:00:45.141 --> 00:00:50.183
that we ideally need a big program of training for developers,

00:00:50.183 --> 00:00:54.008
for librarians, for curators, for ordinary people

00:00:54.008 --> 00:00:58.236
to get them literate in this language, and that's a big effort,

00:01:01.036 --> 00:01:04.031
an aspect of Wikidata outreach.

00:01:04.031 --> 00:01:06.238
My suggestion is to kind of turn that around,

00:01:06.238 --> 00:01:09.037
that Wikidata, especially the Query Service,

00:01:09.037 --> 00:01:11.673
because it's so helpful, because it's so full of good stuff,

00:01:11.673 --> 00:01:13.857
because it's so colorful,

00:01:13.857 --> 00:01:16.200
because it has so many visualization abilities,

00:01:16.200 --> 00:01:20.173
is the ideal platform for people to learn SPARQL,

00:01:20.173 --> 00:01:21.890
also to learn about databases,

00:01:21.890 --> 00:01:23.724
learn about knowledge representation,

00:01:23.724 --> 00:01:25.305
learn about data and computers.

00:01:25.305 --> 00:01:28.671
There's no necessity that someone's first encounter

00:01:28.671 --> 00:01:32.106
with data and computers has to be a relational database system.

00:01:32.106 --> 00:01:33.947
So I'm going to put forward,

00:01:33.947 --> 00:01:36.539
I'm going to report on a training workshop

00:01:36.539 --> 00:01:40.330
I've delivered to library staff at the University of Oxford,

00:01:40.330 --> 00:01:42.550
and I've also done it as a public event,

00:01:42.550 --> 00:01:46.710
so just with members of the public coming to an open data week

00:01:46.710 --> 00:01:47.875
that the university hosted.

00:01:47.875 --> 00:01:51.979
And I've also done some of this with researchers as well.

00:01:51.979 --> 00:01:57.441
So I teach in a way that is very particular to me,

00:01:57.441 --> 00:01:59.847
so it's not like I hand over materials to you.

00:01:59.847 --> 00:02:03.164
I'll show you my approach and then you'll take it up

00:02:03.164 --> 00:02:05.902
and improve on it, and make it personal to you

00:02:05.902 --> 00:02:08.469
and the audiences you're dealing with.

00:02:08.469 --> 00:02:10.253
And I want to avoid this.

00:02:10.253 --> 00:02:16.256
So in my career, I had to learn data technologies, and SQL, and XML,

00:02:16.256 --> 00:02:19.610
and the content of tutorials,

00:02:19.610 --> 00:02:23.400
or examples, is very much like this.

00:02:23.400 --> 00:02:26.330
I'm not objecting to the language-- because that's what you've got to learn--

00:02:26.330 --> 00:02:28.969
but employees, invoices.

00:02:28.969 --> 00:02:32.708
So your task might be you have a sales force

00:02:32.708 --> 00:02:36.913
and you've got to identify the person who sold the most items,

00:02:36.913 --> 00:02:38.369
and calculate their bonus

00:02:38.369 --> 00:02:41.541
and then issue the invoices to the customers,

00:02:41.541 --> 00:02:44.707
and it's the most boring-- I can't get excited about that,

00:02:44.707 --> 00:02:48.195
or I don't feel like I'm learning a topic.

00:02:48.195 --> 00:02:51.662
With Wikidata, we have so many topics we can engage people in,

00:02:51.665 --> 00:02:54.613
and it might be things in the solar system,

00:02:54.613 --> 00:02:56.591
or characters in Shakespeare,

00:02:56.591 --> 00:02:59.765
or things in the solar system named after characters in Shakespeare,

00:02:59.765 --> 00:03:01.897
which is what most of this is.

00:03:03.497 --> 00:03:05.739
So when you have a teaching approach,

00:03:05.739 --> 00:03:08.395
one question is what things you leave out.

00:03:09.295 --> 00:03:15.271
So in the workshop I run, I don't explain what SPARQL stands for,

00:03:15.271 --> 00:03:18.193
that doesn't help you write SPARQL at all.

00:03:18.193 --> 00:03:20.591
It doesn't help to explain what RDF is.

00:03:20.591 --> 00:03:22.763
Obviously, it's historically really important,

00:03:22.763 --> 00:03:25.713
but telling people there's a format for describing resources

00:03:25.713 --> 00:03:27.630
that's called resource description format,

00:03:27.630 --> 00:03:30.966
and resource is whatever's described, it's not really a format.

00:03:30.966 --> 00:03:32.226
That doesn't help people,

00:03:32.226 --> 00:03:36.650
that gets people no closer to actually, practically, using this.

00:03:36.650 --> 00:03:40.639
Linked open data, LOD, I may mention.

00:03:40.639 --> 00:03:44.317
So the library and museum professionals that come to my training

00:03:44.317 --> 00:03:46.830
have definitely heard about linked open data,

00:03:46.830 --> 00:03:50.697
and know that it's the future of their discipline,

00:03:50.697 --> 00:03:52.564
and it's going to revolutionize their work.

00:03:52.564 --> 00:03:54.879
But at the moment, they're not using that kind of system.

00:03:54.879 --> 00:03:58.404
So they've not seen a real practical example of that technology.

00:03:58.404 --> 00:04:00.206
So that's what they're going to get from this.

00:04:00.206 --> 00:04:01.895
So I might mention linked open data,

00:04:01.895 --> 00:04:03.971
but I don't get into the definition.

00:04:03.971 --> 00:04:06.404
I basically say, this is a service you can use for free.

00:04:06.404 --> 00:04:08.113
It's been given to you to use for free,

00:04:08.113 --> 00:04:10.675
and that gets the point across.

00:04:10.675 --> 00:04:14.925
Semantic identifiers and namespaces,

00:04:14.925 --> 00:04:16.518
I want to get across implicitly,

00:04:16.518 --> 00:04:18.294
I don't want to teach people these concepts,

00:04:18.294 --> 00:04:21.271
I want them to pick up the concepts even if I don't use the terms.

00:04:21.271 --> 00:04:26.536
Reification, so people already using an RDF database want to know

00:04:26.536 --> 00:04:31.432
does Wikidata have statement IDs, and I try to avoid that.

00:04:31.432 --> 00:04:33.855
I hardly even mention Wikidata.

00:04:33.855 --> 00:04:39.048
So these workshops are advertised as like Introduction to SPARQL,

00:04:39.048 --> 00:04:41.027
or for the public event one, it was

00:04:41.027 --> 00:04:45.097
Asking and Answering Questions with Open Data.

00:04:45.097 --> 00:04:47.826
And then in the blurb, I'd say we're going to be using this platform,

00:04:47.826 --> 00:04:50.268
and I'll introduce it and say, well, this is the best platform

00:04:50.268 --> 00:04:52.815
on which to learn this language, this skill.

00:04:52.815 --> 00:04:55.138
It's the most helpful, it's got the most interesting stuff.

00:04:55.138 --> 00:04:57.265
And then in the course of the workshop,

00:04:57.265 --> 00:04:58.969
maybe we'll get into more about Wikidata,

00:04:58.969 --> 00:05:02.351
why this exists, who put this data here.

00:05:02.351 --> 00:05:04.501
So there's a whole lot of background

00:05:04.501 --> 00:05:08.347
that kind of professional RDF or linked data people will have,

00:05:08.347 --> 00:05:09.942
but you don't need.

00:05:09.942 --> 00:05:13.737
I just want to get people thinking about nodes and arcs,

00:05:13.737 --> 00:05:15.699
and thinking in triples,

00:05:15.699 --> 00:05:19.690
and imagining how a triple representation can be created and queried.

00:05:19.690 --> 00:05:22.897
I want them to phrase questions in their own language,

00:05:22.897 --> 00:05:27.252
and translate into SPARQL, via a kind of a baby talk intermediary.

00:05:27.252 --> 00:05:28.984
But I want them to think in triples

00:05:28.984 --> 00:05:34.740
and get used to asking questions in that way, and just to get to the point

00:05:34.740 --> 00:05:38.887
where they ask interesting questions relevant to their work, or their hobbies,

00:05:38.887 --> 00:05:42.395
or whatever, and they come away with something.

00:05:42.395 --> 00:05:44.107
So it's not the theoretical understanding

00:05:44.107 --> 00:05:46.835
that I'm going for in these quite short sessions.

00:05:46.835 --> 00:05:50.285
And the first thing I present them with is this, they've got to look at this.

00:05:50.285 --> 00:05:53.650
And there's a "what the hell?" reaction

00:05:53.650 --> 00:05:55.496
in the workshop and probably in the room now,

00:05:55.496 --> 00:05:59.361
because, "I thought this was about technology skills!

00:05:59.361 --> 00:06:01.512
Why have we got to look at a cute dog?"

00:06:01.512 --> 00:06:05.289
But this is to introduce my toy world.

00:06:05.289 --> 00:06:10.525
So there are three human beings. Two of them are a married couple.

00:06:10.525 --> 00:06:13.054
One is the child from that couple.

00:06:13.054 --> 00:06:16.678
There are two beings that are pets of this couple,

00:06:16.678 --> 00:06:19.119
and we've got the types of the pets.

00:06:19.119 --> 00:06:20.839
Clearly, this is not official data.

00:06:20.839 --> 00:06:23.922
This knowledge representation, which it is,

00:06:23.922 --> 00:06:26.854
only exists in this slide, it's not a database.

00:06:26.854 --> 00:06:28.780
So I'm getting people thinking of a toy world.

00:06:28.780 --> 00:06:30.512
And there's loads that can be learnt

00:06:30.512 --> 00:06:33.491
with just discussing this, and kind of role-playing about this.

00:06:33.491 --> 00:06:38.121
And you're going to make your own toy world.

00:06:40.721 --> 00:06:43.701
So a point to come from this is this isn't a representation

00:06:43.701 --> 00:06:47.102
of all of my family or of all my parents' pets.

00:06:47.102 --> 00:06:49.311
It's a tiny fragment.

00:06:49.311 --> 00:06:50.787
When we query things,

00:06:50.787 --> 00:06:53.261
we're querying a representation of the world, not the world.

00:06:53.261 --> 00:06:55.150
There's so much that's missed out.

00:06:56.150 --> 00:07:01.104
That's a really important first lesson to get about any database, any querying.

00:07:01.104 --> 00:07:06.281
So everything's expressed in triples, and nodes, and arcs.

00:07:06.281 --> 00:07:08.427
Arcs have a direction.

00:07:08.427 --> 00:07:09.529
How do the names work?

00:07:09.529 --> 00:07:12.507
So one of these nodes is marked Bob.

00:07:12.507 --> 00:07:17.207
Is that the name Bob, does that stand for the name Bob?

00:07:17.207 --> 00:07:20.624
Well, not quite, because other people use the name Bob.

00:07:20.624 --> 00:07:22.535
And Dan, you probably know a Bob.

00:07:22.535 --> 00:07:23.649
(Dan) Like Bob [inaudible].

00:07:23.649 --> 00:07:25.247
(Martin) Yeah, you know a Bob.

00:07:25.247 --> 00:07:28.617
And that's the Bob I think-- no, that isn't this Bob.

00:07:28.617 --> 00:07:29.642
So we talk about that.

00:07:29.642 --> 00:07:32.359
So names are relative to the system that they're in,

00:07:32.359 --> 00:07:36.327
and we could talk about Martin's Bob and Dan's Bob not being the same person.

00:07:36.327 --> 00:07:37.696
So it's not the names.

00:07:37.696 --> 00:07:39.878
So we could think of them as relative to a system.

00:07:39.878 --> 00:07:43.828
So we can even say Martin:Bob is the name for one thing,

00:07:43.828 --> 00:07:47.775
and Dan:Bob identifies another thing in another system.

00:07:49.375 --> 00:07:52.121
And I emphasize triples, so three things.

00:07:52.121 --> 00:07:57.754
You might be tempted to say, "Cindy and Bob, together, have a pet dog,"

00:07:58.511 --> 00:08:03.995
but you can't do that in this system unless you have a node for the couple.

00:08:03.995 --> 00:08:07.350
Things have to have a direction. That may not make much sense.

00:08:07.350 --> 00:08:09.673
There's a married couple-- that doesn't have a direction,

00:08:09.673 --> 00:08:11.196
that's a relation between two people,

00:08:11.196 --> 00:08:14.014
but we are modeling it with things that have a direction

00:08:14.014 --> 00:08:17.464
so we have to have the two directions.

00:08:17.464 --> 00:08:18.962
There are arbitrary choices.

00:08:18.962 --> 00:08:24.206
So why have "Cindy has child Martin," and not "Martin has parent Cindy"?

00:08:24.206 --> 00:08:25.598
It's an arbitrary choice.

00:08:25.598 --> 00:08:28.605
Arbitrary choices like that-- choices of name, choices of direction--

00:08:28.605 --> 00:08:31.140
are built into this system and intrinsic.

00:08:31.140 --> 00:08:32.871
So there are arbitrary choices to be made,

00:08:32.871 --> 00:08:34.656
how to represent this,

00:08:34.656 --> 00:08:37.794
even the same facts could be represented in different ways.

00:08:37.794 --> 00:08:39.233
Who makes that decision?

00:08:39.233 --> 00:08:40.731
Well, whoever creates the system,

00:08:40.731 --> 00:08:45.069
whoever sets up the knowledge base system.

00:08:45.069 --> 00:08:49.330
So people can see that this-- it's called serializable--

00:08:49.330 --> 00:08:52.459
this could be expressed as triple statements.

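The idea that names are relative to a system, so that Martin:Bob and Dan:Bob are different things, corresponds to prefixed names in RDF. Below is a minimal Python sketch under stated assumptions. It is not part of the workshop materials, and the namespace URLs under example.org are hypothetical. It also shows what "expressed as triple statements" can look like: one N-Triples-style line per directed triple.

```python
# Hypothetical namespaces: each "system" gets its own base identifier.
# example.org is a domain reserved for illustrations.
NAMESPACES = {
    "martin": "https://example.org/martin/",
    "dan": "https://example.org/dan/",
}


def expand(prefixed: str) -> str:
    """Turn a prefixed name like 'martin:Bob' into a full identifier."""
    prefix, local_name = prefixed.split(":", 1)
    return NAMESPACES[prefix] + local_name


def to_ntriples(subject: str, prop: str, obj: str) -> str:
    """Serialize one directed triple as a line: <subject> <property> <object> ."""
    return f"<{expand(subject)}> <{expand(prop)}> <{expand(obj)}> ."


# Same local name, different systems: two different identifiers.
print(expand("martin:Bob") == expand("dan:Bob"))  # False

# An undirected fact ("Cindy and Bob are married") needs two directed triples.
for s, o in [("martin:Cindy", "martin:Bob"), ("martin:Bob", "martin:Cindy")]:
    print(to_ntriples(s, "martin:marriedTo", o))
```

The property name `marriedTo` is itself one of the arbitrary choices discussed in the talk. Another system could just as well call it something else.
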
00:08:52.459 --> 00:08:58.468
So, "Cindy has pet Tilly," "Martin is a human,"

00:08:58.468 --> 00:09:02.393
and getting to the core insight

00:09:02.393 --> 00:09:06.970
is comparing how do we make a question in English?

00:09:06.970 --> 00:09:10.953
Well, we have a statement and it's incomplete,

00:09:10.953 --> 00:09:16.762
like, "Who has pet Tilly?"

00:09:16.762 --> 00:09:21.585
So we go from "Cindy has pet Tilly," to "Who has pet Tilly?"

00:09:21.585 --> 00:09:23.316
We've taken something out,

00:09:23.316 --> 00:09:27.522
we've put in a placeholder, and we've introduced a question mark.

00:09:27.522 --> 00:09:30.080
I say that's just like what we do with SPARQL.

00:09:30.080 --> 00:09:33.053
We take something out, we have an incomplete statement,

00:09:33.053 --> 00:09:35.930
or incomplete statements,

00:09:35.930 --> 00:09:40.213
we put a placeholder in the missing place, and we have a question mark

00:09:40.213 --> 00:09:42.645
to mark that that's a placeholder.

00:09:42.645 --> 00:09:47.164
So it can be a role play where I'm the query service

00:09:47.164 --> 00:09:49.383
for this knowledge base.

00:09:49.383 --> 00:09:53.906
And so people can learn what a query service does

00:09:53.906 --> 00:09:56.969
by seeing a query service and role-playing

00:09:56.969 --> 00:09:59.709
and being a query service, which we'll get to.

00:10:00.909 --> 00:10:05.414
So people can see that working on the level of triples.

00:10:07.214 --> 00:10:09.371
"Who has pet Tilly?"

00:10:09.371 --> 00:10:14.480
If you say that to me, I can say, "Results: Cindy, Bob."

00:10:14.480 --> 00:10:17.774
Then I put it to the trainees,

00:10:17.774 --> 00:10:19.534
how do you ask more complicated questions?

00:10:19.534 --> 00:10:22.436
So, "Who has a dog as a pet?"

00:10:23.646 --> 00:10:28.701
And some will get it straightaway, some will say, "Oh, it's a triple--

00:10:28.701 --> 00:10:33.075
Who? has pet dog?"

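The move of taking something out, putting in a placeholder, and adding a question mark can be imitated in a few lines of code, with the program playing the query service from the role play. This is a minimal Python sketch. It assumes the toy world is a list of (subject, property, object) strings and that placeholders start with `?`, as in SPARQL. It only illustrates the idea and says nothing about how a real SPARQL engine works. The triples are loosely reconstructed from the talk, not copied from the actual slide.

```python
# Toy world as (subject, property, object) triples, reconstructed loosely
# from the talk; the real slide may differ.
TOY_WORLD = [
    ("Cindy", "married to", "Bob"),
    ("Bob", "married to", "Cindy"),
    ("Cindy", "has child", "Martin"),
    ("Cindy", "has pet", "Tilly"),
    ("Bob", "has pet", "Tilly"),
    ("Tilly", "is a", "dog"),
    ("Martin", "is a", "human"),
]


def match(pattern, triples):
    """Play the query service: return one {placeholder: value} dict for
    every triple that fits the incomplete statement `pattern`."""
    results = []
    for triple in triples:
        bindings = {}
        for wanted, actual in zip(pattern, triple):
            if wanted.startswith("?"):
                # A placeholder matches anything, but must stay consistent
                # if the same placeholder appears twice in one pattern.
                if bindings.setdefault(wanted, actual) != actual:
                    break
            elif wanted != actual:
                break  # a fixed term that doesn't agree with this triple
        else:
            results.append(bindings)
    return results


# "Cindy has pet Tilly" becomes "Who has pet Tilly?"
print(match(("?who", "has pet", "Tilly"), TOY_WORLD))
# → [{'?who': 'Cindy'}, {'?who': 'Bob'}]
```

Asking `("?who", "has pet", "dog")` in this sketch returns an empty list. That is the "results: none" outcome from the role play: the node called dog is a class, not anyone's pet.
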
00:10:33.075 --> 00:10:38.103
So my role as the query service is to look at this and match your triple,

00:10:38.103 --> 00:10:39.385
"Who? has pet dog,"

00:10:39.385 --> 00:10:41.522
so I've got to find things that have pet dog,

00:10:41.522 --> 00:10:43.024
and results: none.

00:10:43.024 --> 00:10:48.082
So this is the discussion-- what is this node I've called dog?

00:10:48.082 --> 00:10:49.231
It's not a dog.

00:10:49.231 --> 00:10:53.250
Although it's called dog, it's not a dog, it stands for a class.

00:10:53.250 --> 00:10:56.130
Obvious when you're a SPARQL user, but this is getting people

00:10:56.130 --> 00:10:59.054
over the threshold of thinking in this way.

00:10:59.054 --> 00:11:02.319
And then you've got to do "What kinds of things have pets?"

00:11:02.319 --> 00:11:05.258
People see that they can't do that in one triple,

00:11:05.258 --> 00:11:06.572
you've got to do multiple triples,

00:11:06.572 --> 00:11:10.126
and those multiple triples ask for multiple things.

00:11:12.726 --> 00:11:16.588
So if you've got, "What kinds of things have pets?"

00:11:16.588 --> 00:11:18.861
then you're going to identify people,

00:11:18.861 --> 00:11:21.070
and then you've got to identify those types,

00:11:21.070 --> 00:11:24.362
and it naturally comes up, "How do I specify the columns I want?

00:11:24.362 --> 00:11:27.365
How do I specify that I want the types?" That's the question.

00:11:27.365 --> 00:11:29.838
And then you say, "You have these partial statements,

00:11:29.838 --> 00:11:34.643
and you enclose them in curly brackets and put SELECT in front."

00:11:37.943 --> 00:11:41.137
So this is kind of the first half hour of the workshop,

00:11:41.137 --> 00:11:44.162
and it's not on computers, it's all with role play

00:11:44.162 --> 00:11:45.743
and thinking about this.

00:11:45.743 --> 00:11:51.776
And I invite people in the workshop to make their own toy world,

00:11:51.776 --> 00:11:54.506
and you'll be doing toy worlds, I hope, after this.

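Asking for several things at once means matching several incomplete triples that share placeholders, then choosing which columns to return. The Python below is a minimal sketch of that join-then-project idea on a hypothetical toy world. The `select` function is an illustrative stand-in for `SELECT ... WHERE { ... }`, not a real query engine.

```python
# Hypothetical toy world, loosely based on the talk.
TOY_WORLD = [
    ("Cindy", "is a", "human"),
    ("Bob", "is a", "human"),
    ("Martin", "is a", "human"),
    ("Cindy", "has pet", "Tilly"),
    ("Bob", "has pet", "Tilly"),
    ("Tilly", "is a", "dog"),
]


def extend(bindings, pattern, triple):
    """Return bindings extended so `pattern` matches `triple`, or None."""
    new = dict(bindings)
    for wanted, actual in zip(pattern, triple):
        if wanted.startswith("?"):
            if new.setdefault(wanted, actual) != actual:
                return None
        elif wanted != actual:
            return None
    return new


def select(columns, patterns, triples):
    """Rough analogue of SELECT columns WHERE { pattern . pattern . }:
    keep only combinations where every pattern matches with the same
    value for each shared placeholder, then project the requested columns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = extend(bindings, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return [tuple(s[c] for c in columns) for s in solutions]


# "Who? has pet dog" finds nothing: "dog" is a class, not anyone's pet.
print(select(["?who"], [("?who", "has pet", "dog")], TOY_WORLD))  # []

# "Who has a dog as a pet?" needs two triples sharing the placeholder ?pet.
print(select(["?who"],
             [("?who", "has pet", "?pet"), ("?pet", "is a", "dog")],
             TOY_WORLD))
# → [('Cindy',), ('Bob',)]

# "What kinds of things have pets?" asks only for the ?kind column.
print(select(["?kind"],
             [("?owner", "has pet", "?pet"), ("?owner", "is a", "?kind")],
             TOY_WORLD))
# → [('human',), ('human',)]
```

The last query returns "human" twice, once per pet owner. Plain SELECT in SPARQL also keeps duplicate rows like this, and SELECT DISTINCT is the standard way to collapse them.
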
00:11:54.506 --> 00:11:59.702
So five minutes, eight to ten nodes to represent your family, your workplace,

00:11:59.702 --> 00:12:02.351
the thing you're working on, the TV you were watching last night,

00:12:02.351 --> 00:12:05.166
and to have some meaningful links between them.

00:12:05.166 --> 00:12:08.688
And the lesson is that you make arbitrary decisions,

00:12:08.688 --> 00:12:10.516
you name things, you create properties,

00:12:10.516 --> 00:12:17.228
but they're the creation of the person who sets up the knowledge system.

00:12:17.558 --> 00:12:24.394
And then, in pairs, they explain their graphs to each other, and query.

00:12:24.394 --> 00:12:28.166
So, "What's a query you could ask about this little world,

00:12:28.166 --> 00:12:29.570
and then what would be the answer?"

00:12:29.570 --> 00:12:33.730
So, like I say, people mostly get it,

00:12:33.730 --> 00:12:36.451
but people want a four- or five-part relation,

00:12:36.451 --> 00:12:38.088
so they might want to say,

00:12:38.088 --> 00:12:39.958
"This couple, together, have a pet."

00:12:39.958 --> 00:12:43.204
Or they might want to say, "Tilly is a pet, is a dog."

00:12:43.204 --> 00:12:47.207
And you can enforce it: nodes, triples, and triples have a direction.

00:12:48.307 --> 00:12:51.258
So I'll explain what a triple is and say also, not in this example,

00:12:51.258 --> 00:12:54.639
but, "Triples, generally, they have an item, they have a property,

00:12:54.639 --> 00:12:57.307
and then they have a number of other things

00:12:57.307 --> 00:12:59.516
which could be values, could be time periods,

00:12:59.516 --> 00:13:03.104
could be locations on a globe."

00:13:07.288 --> 00:13:11.235
So with that role-play exercise, we're 40 minutes into a 2-hour workshop,

00:13:11.235 --> 00:13:14.270
and in a computer room, and we haven't touched computers yet.

00:13:14.270 --> 00:13:17.387
But I think it's useful to get people thinking in that way,

00:13:17.387 --> 00:13:19.535
and to think about how they would make the model

00:13:19.535 --> 00:13:23.793
and what the query is, and to actually translate,

00:13:23.793 --> 00:13:25.149
so there's a translation exercise.

00:13:26.339 --> 00:13:32.597
And then I'd direct people to query.wikidata.org.

00:13:34.197 --> 00:13:36.240
So there's a bunch of things they've got to take on.

00:13:36.240 --> 00:13:40.086
We've been doing-- I will have a flip chart, and we will--

00:13:40.086 --> 00:13:41.539
Is that six?

00:13:41.539 --> 00:13:43.290
Six minutes elapsed?

00:13:43.290 --> 00:13:45.278
(man) [inaudible]

00:13:45.278 --> 00:13:46.318
(Martin) Right.

00:13:50.548 --> 00:13:52.485
So I'll give them a task.

00:13:52.485 --> 00:13:55.679
I don't want them to learn Q numbers and P numbers.

00:13:55.679 --> 00:14:00.646
So I'll tell them what the names are and show them the Ctrl+Space trick.

00:14:00.646 --> 00:14:01.894
But there's a lot to take on,

00:14:01.894 --> 00:14:04.210
so they're taking on Q numbers and P numbers,

00:14:04.210 --> 00:14:08.240
they've seen the triple format, and they've seen SELECT,

00:14:08.240 --> 00:14:11.338
but they've got to apply this all in one go.

00:14:11.338 --> 00:14:14.538
So I'll give people a task.

00:14:14.538 --> 00:14:17.299
Some will get it immediately, some will struggle

00:14:17.299 --> 00:14:18.896
because they missed a bit of discussion,

00:14:18.896 --> 00:14:22.866
or more often, because they're familiar with another kind of database system,

00:14:22.866 --> 00:14:25.490
and they have particular expectations from that.

00:14:26.890 --> 00:14:30.656
So I set bonus things or more complicated things

00:14:30.656 --> 00:14:31.874
if people are getting bored.

00:14:31.874 --> 00:14:37.828
Or I say, "If you get bored and you work on an entirely different question,

00:14:37.828 --> 00:14:40.058
that's fine, but show me."

00:14:40.058 --> 00:14:42.254
So I'll run through this in front of them,

00:14:42.254 --> 00:14:45.617
tell them to do it, just show the hints of what properties they'll be using,

00:14:45.617 --> 00:14:46.979
and then run through it again.

00:14:46.979 --> 00:14:50.277
And then, go through the cycle of adding on extra things

00:14:50.277 --> 00:14:51.280
to enhance the query.

00:14:51.280 --> 00:14:53.084
So we might have done a query and I'll say,

00:14:53.084 --> 00:14:55.522
"Here's how you add on an optional property."

00:14:57.822 --> 00:15:01.046
And then give them a task involving an optional property.

00:15:01.046 --> 00:15:04.518
In the Bodleian, I say, "Find manuscripts in Latin."

00:15:04.518 --> 00:15:06.326
For a public event at the University of Bristol,

00:15:06.326 --> 00:15:09.255
where there are lots of celebrities who studied at the University of Bristol,

00:15:09.255 --> 00:15:14.113
I used that as an example.

00:15:14.113 --> 00:15:15.933
So going to the interface,

00:15:15.933 --> 00:15:20.949
there's still a hump in the learning curve

00:15:20.949 --> 00:15:24.199
because they've got to put the query into action,

00:15:24.199 --> 00:15:25.752
they've got to think in this language,

00:15:25.752 --> 00:15:29.879
and they've got to look up Q numbers and P numbers,

00:15:29.879 --> 00:15:32.246
and then there's all the things they can do with the query,

00:15:32.246 --> 00:15:33.283
once they've done it.

00:15:33.283 --> 00:15:37.627
And the visualization options, the bookmarking, getting the data.

00:15:43.881 --> 00:15:45.635
So I'll suggest refinements.

00:15:45.635 --> 00:15:50.264
So we can take a succession of steps of getting people doing a query,

00:15:50.264 --> 00:15:53.215
and taking it up to the next level.

00:15:53.215 --> 00:15:56.069
Like, "Find landscape paintings taller than they are wide."

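An optional property works like a left join. Rows that lack the property are kept, and the optional column is simply left empty. The Python below is a minimal sketch of that behaviour. It uses hypothetical toy data (people and a "has child" property) rather than the manuscript task, and `optional` is an illustrative helper, not a real library function.

```python
# Hypothetical toy data: Martin has no "has child" triple.
TOY_WORLD = [
    ("Cindy", "is a", "human"),
    ("Bob", "is a", "human"),
    ("Martin", "is a", "human"),
    ("Cindy", "has child", "Martin"),
    ("Bob", "has child", "Martin"),
]


def extend(bindings, pattern, triples):
    """All ways to extend `bindings` so that `pattern` matches a triple."""
    results = []
    for triple in triples:
        new = dict(bindings)
        for wanted, actual in zip(pattern, triple):
            if wanted.startswith("?"):
                if new.setdefault(wanted, actual) != actual:
                    break
            elif wanted != actual:
                break
        else:
            results.append(new)
    return results


def optional(solutions, pattern, triples):
    """Like OPTIONAL { pattern }: add values where they exist, but never
    drop a solution just because the optional pattern found nothing."""
    results = []
    for bindings in solutions:
        matches = extend(bindings, pattern, triples)
        results.extend(matches if matches else [bindings])
    return results


people = extend({}, ("?person", "is a", "human"), TOY_WORLD)
rows = optional(people, ("?person", "has child", "?child"), TOY_WORLD)
for row in rows:
    print(row["?person"], row.get("?child", "(no value)"))
# Cindy Martin
# Bob Martin
# Martin (no value)
```

Without the `optional` step, requiring `?person has child ?child` would silently drop Martin. That is the same surprise trainees meet when a required property is missing from some items.
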
00:15:56.069 --> 00:16:02.658
So within the two-hour thing, we get people doing basic queries,

00:16:02.658 --> 00:16:07.803
adding refinements onto them,

00:16:07.803 --> 00:16:11.164
not doing much filtering,

00:16:11.164 --> 00:16:13.893
but starting to introduce measurements,

00:16:13.893 --> 00:16:14.982
and so on.

00:16:14.982 --> 00:16:17.782
Not getting into qualifiers or another level.

00:16:17.782 --> 00:16:20.816
If it's a whole day thing, you probably could.

00:16:20.816 --> 00:16:25.526
It comes up, inevitably, "Where else can I use the SPARQL language?"

00:16:25.526 --> 00:16:29.581
And I observe that that is a question, and questions can be framed in SPARQL,

00:16:29.581 --> 00:16:31.671
and put to Wikidata, and you'll get answers,

00:16:31.671 --> 00:16:34.444
and there is a Wikidata property called SPARQL endpoint.

00:16:34.444 --> 00:16:36.888
So when they ask that, that becomes their task.

00:16:36.888 --> 00:16:38.809
And then they get that list of institutions

00:16:38.809 --> 00:16:40.369
that have SPARQL endpoints.

00:16:42.499 --> 00:16:43.877
And it's worth pointing out,

00:16:43.877 --> 00:16:48.647
so in an introductory session on other computer languages,

00:16:48.647 --> 00:16:52.065
people will typically learn how to do loops,

00:16:52.065 --> 00:16:55.477
how to do functions, how to do conditionals.

00:16:55.477 --> 00:16:56.803
They'll learn the basic grammar

00:16:56.803 --> 00:16:59.735
but they won't make something fantastic and useful,

00:16:59.735 --> 00:17:01.663
they'll just learn the basic grammar.

00:17:01.663 --> 00:17:06.458
But in an introductory session on Wikidata SPARQL you can make--

00:17:06.458 --> 00:17:08.142
if you're interested in German literature--

00:17:08.142 --> 00:17:10.333
a map of the birthplaces of German poets, and so on.

00:17:10.333 --> 00:17:12.097
And so we get feedback like this.

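The refinement "find landscape paintings taller than they are wide" joins several triples and then compares two measured values, which is what a FILTER does. The Python below sketches that step under stated assumptions. The painting names, the property names genre, height and width, and all the numbers are invented toy values, not Wikidata items or properties. The comment shows only the rough shape of the corresponding query.

```python
# Invented toy data: painting names and sizes (in cm) are made up.
TOY_WORLD = [
    ("Painting A", "genre", "landscape"),
    ("Painting A", "height", 120),
    ("Painting A", "width", 80),
    ("Painting B", "genre", "landscape"),
    ("Painting B", "height", 60),
    ("Painting B", "width", 90),
    ("Painting C", "genre", "portrait"),
    ("Painting C", "height", 100),
    ("Painting C", "width", 70),
]


def extend(bindings, pattern, triples):
    """All ways to extend `bindings` so that `pattern` matches a triple."""
    results = []
    for triple in triples:
        new = dict(bindings)
        for wanted, actual in zip(pattern, triple):
            if wanted.startswith("?"):
                if new.setdefault(wanted, actual) != actual:
                    break
            elif wanted != actual:
                break
        else:
            results.append(new)
    return results


def where(patterns, triples):
    """Join the patterns: every one must match with consistent placeholders."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [new for b in solutions for new in extend(b, pattern, triples)]
    return solutions


# Roughly, in SPARQL-like pseudocode:
#   SELECT ?p WHERE { ?p genre landscape . ?p height ?h . ?p width ?w .
#                     FILTER(?h > ?w) }
solutions = where(
    [("?p", "genre", "landscape"), ("?p", "height", "?h"), ("?p", "width", "?w")],
    TOY_WORLD,
)
taller = [s["?p"] for s in solutions if s["?h"] > s["?w"]]  # the FILTER step
print(taller)  # ['Painting A']
```

The filter runs only after the triples have matched. It discards rows rather than finding new ones, so a painting with no recorded height never reaches the comparison.
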
00:17:12.097 --> 00:17:14.196
This is how great the Wikidata Query Service is

00:17:14.196 --> 00:17:16.266
as an educational tool.

00:17:16.266 --> 00:17:19.298
"What is this sorcery?" isn't even from someone in the room.

00:17:19.298 --> 00:17:21.226
A trainee in the room made a map,

00:17:21.226 --> 00:17:24.702
emailed it to her colleagues and got back, "What is this sorcery!?

00:17:24.702 --> 00:17:25.703
How have you made this?"

00:17:25.703 --> 00:17:29.428
And was just not expecting this to happen.

00:17:29.428 --> 00:17:32.271
People are not expecting to look at the picture of the cute dog,

00:17:32.271 --> 00:17:36.243
they're not expecting to do the role play where they represent their family

00:17:36.243 --> 00:17:37.865
and query each other.

00:17:37.865 --> 00:17:40.210
They're not expecting to actually make something concrete

00:17:40.210 --> 00:17:42.587
which they take away as a link and show to their colleagues.

00:17:42.587 --> 00:17:45.010
And all of this, being unexpected,

00:17:45.010 --> 00:17:47.092
makes it memorable and makes them want to go away

00:17:47.092 --> 00:17:48.527
and talk to other people about it.

00:17:48.527 --> 00:17:51.399
It's not like your run-of-the-mill IT training.

00:17:52.699 --> 00:17:58.020
The lower quote is from a researcher who saw how he could make a map

00:17:58.020 --> 00:18:00.761
of famous people with his first name

00:18:00.761 --> 00:18:04.421
and another one of famous people with his wife's first name.

00:18:04.421 --> 00:18:07.819
And then he just had more and more ideas of things, charts, and so on,

00:18:07.819 --> 00:18:09.469
that he's going to create with Wikidata,

00:18:09.469 --> 00:18:10.967
and so he's glad to say,

00:18:10.967 --> 00:18:13.297
"You've destroyed my productivity for the next month."

00:18:15.805 --> 00:18:17.601
So that's my recommendation.

00:18:17.601 --> 00:18:19.702
I think we can take it as a positive,

00:18:19.702 --> 00:18:22.985
and take it beyond training people about Wikidata,

00:18:22.985 --> 00:18:24.671
to training people about data.

00:18:24.671 --> 00:18:26.716
The stuff that came up in the keynote this morning,

00:18:26.716 --> 00:18:32.468
making people literate about ideas of representation

00:18:32.468 --> 00:18:36.568
and starting people off and being involved in that discussion,

00:18:36.568 --> 00:18:37.722
involves this [inaudible].

00:18:37.722 --> 00:18:38.816
So this could be done--

00:18:38.816 --> 00:18:40.822
doesn't have to be like a workplace training thing,

00:18:40.822 --> 00:18:42.134
it could be a public event,

00:18:42.134 --> 00:18:45.250
to get people familiar with these technologies.

00:18:46.150 --> 00:18:48.302
But I will stop there for discussion.

00:18:48.302 --> 00:18:51.150
And like I say, it's respectfully submitted to people in the room

00:18:51.150 --> 00:18:55.280
who do SPARQL training a different way, but I hope this is useful to you.

00:18:57.180 --> 00:19:00.184
(audience applause)

00:19:12.915 --> 00:19:15.721
(Dan) Okay, are there any questions?

00:19:23.511 --> 00:19:26.605
(man) Hi, it's [Mohammed Hijah] from Palestine.

00:19:26.605 --> 00:19:28.420
Thank you for the session.

00:19:28.420 --> 00:19:30.921
I was wondering if there are resources

00:19:30.921 --> 00:19:35.131
that we can get to learn the SPARQL language professionally?

00:19:37.899 --> 00:19:40.213
(Martin) I've got the SPARQL book, the O'Reilly book.

00:19:40.213 --> 00:19:43.413
I find the Wikibook on SPARQL

00:19:43.413 --> 00:19:44.987
is really, really useful.

00:19:44.987 --> 00:19:48.387
That's like the most useful and accessible reference.

00:19:49.287 --> 00:19:54.570
The tutorials on Wikidata itself are going to vary in quality.

00:19:55.170 --> 00:19:57.694
(Mohammed) I think that they are for beginners.

00:19:57.694 --> 00:20:01.240
I can handle SPARQL at the beginner level,

00:20:01.240 --> 00:20:04.343
but I want to deal with it professionally.

00:20:10.864 --> 00:20:13.609
(Martin) So my concern is to get as many people as possible

00:20:13.609 --> 00:20:16.292
across the threshold into being aware of how this works,

00:20:16.292 --> 00:20:17.925
and dabbling.

00:20:19.225 --> 00:20:24.920
I'd like it to be a deeper course by going into more of the...

00:20:26.220 --> 00:20:29.120
how it works-- qualifiers and references, and so on.

00:20:29.120 --> 00:20:31.809
Whereas in a professional context, you're probably aiming towards

00:20:31.809 --> 00:20:35.923
people using a particular SPARQL endpoint,

00:20:35.923 --> 00:20:39.123
and Wikidata has some customizations.

00:20:39.123 --> 00:20:41.636
We've discussed on Twitter that there are some things we use

00:20:41.636 --> 00:20:43.548
that actually aren't standard SPARQL.

00:20:43.548 --> 00:20:46.130
They're like an optimization.

00:20:46.130 --> 00:20:48.816
So in the professional context,

00:20:50.516 --> 00:20:56.190
I'd hope it would be tailored to that particular data set and endpoint,

00:20:56.190 --> 00:20:59.575
but there's not a demand for that yet,

00:20:59.575 --> 00:21:03.459
because like I said, I deal with people who are aware of linked open data,

00:21:03.459 --> 00:21:07.558
and the word is out that it's a good thing, but they haven't seen an example yet,

00:21:07.558 --> 00:21:09.446
haven't seen an example they can apply to their work,

00:21:09.446 --> 00:21:11.693
so they're not enthusiastic about it yet.

00:21:11.693 --> 00:21:13.843
So I think we want to get my whole workplace

00:21:13.843 --> 00:21:17.726
and other workplaces and developers across that threshold

00:21:17.726 --> 00:21:21.998
to where they're demanding that kind of really in-depth,

00:21:21.998 --> 00:21:25.333
like using an endpoint in a library, kind of training.

00:21:26.082 --> 00:21:27.376
(Mohammed) Thank you.

00:21:31.883 --> 00:21:34.892
(woman) It's just a question. I really liked that, thank you so much.

00:21:34.892 --> 00:21:37.819
Is it documented step-by-step anywhere?

00:21:39.194 --> 00:21:43.043
(Martin) I can share my succession of tasks.

00:21:43.843 --> 00:21:47.100
That's very much tailored to where I'm presenting it.

00:21:47.100 --> 00:21:50.697
Like I said, with librarians, I start with manuscripts and go on.

00:21:53.697 --> 00:21:56.393
You want to end up with people asking a question

00:21:56.393 --> 00:22:00.764
which is the question they came, in their heads, to the event with.

00:22:04.764 --> 00:22:10.283
So there's an order of querying with a triple,

00:22:10.283 --> 00:22:13.006
and then with multiple triples, and then with an optional triple,

00:22:13.006 --> 00:22:17.147
and then with a measurement in a filter, and so on.

00:22:17.147 --> 00:22:20.618
And, yeah, I can share...

00:22:22.438 --> 00:22:24.338
Yeah, I'll share a separate set of slides

00:22:24.338 --> 00:22:25.421
for those exercises.

00:22:25.421 --> 00:22:27.379
(woman) Thank you so much because I will take that

00:22:27.379 --> 00:22:29.783
and customize it for my own needs. Thank you.

00:22:31.010 --> 00:22:33.095
(Dan) Okay. No questions?

00:22:34.953 --> 00:22:38.994
(man) What would you recommend if you also want to teach editing,

00:22:38.994 --> 00:22:41.595
apart from just querying?

00:22:46.968 --> 00:22:53.476
(Martin) I'm pleased to report that people find Wikidata editing,

00:22:53.476 --> 00:22:56.632
when I demonstrate it, to be so simple,

00:22:56.632 --> 00:22:58.943
that it just takes them by surprise.

00:22:58.943 --> 00:23:01.568
"It's Wikidata editing, and I've got to add knowledge

00:23:01.568 --> 00:23:03.018
to this huge knowledge base."

00:23:03.018 --> 00:23:05.435
Sounds like something that really technical people can do.

00:23:05.435 --> 00:23:08.524
And then you show it, and they go, "Oh, right.

00:23:08.524 --> 00:23:11.096
Martin is instance of human."

00:23:13.296 --> 00:23:18.851 So I haven't done that systematically yet. 00:23:21.498 --> 00:23:26.007 I think a precondition would be getting people thinking in triples, 00:23:26.007 --> 00:23:29.675 and maybe underlining that triples need references, 00:23:29.675 --> 00:23:34.237 and triples need qualifiers, and that triples 00:23:34.237 --> 00:23:37.442 can have multiple conflicting values. 00:23:37.442 --> 00:23:39.949 So I'd still do the toy world, 00:23:39.949 --> 00:23:45.149 maybe a more professionally relevant toy world, and the translation exercise, 00:23:45.149 --> 00:23:48.222 but then go to, "So now the exercise we're going to do with triples 00:23:48.222 --> 00:23:49.661 is adding them." 00:23:51.561 --> 00:23:54.522 There's been a lot of work done, and maybe Jason's done some, 00:23:54.522 --> 00:23:58.402 with guessing a table of identifiers. 00:23:58.402 --> 00:23:59.581 So something I'd like to do: 00:23:59.581 --> 00:24:03.710 there's an online database 00:24:03.710 --> 00:24:06.710 of people who've won a Rhodes Scholarship. 00:24:06.710 --> 00:24:10.616 That's a scholarship to Oxford University for students from other countries. 00:24:10.616 --> 00:24:12.221 But it's not in Wikidata yet. 00:24:12.221 --> 00:24:14.381 So you can kind of divide up the room and say, 00:24:14.381 --> 00:24:16.595 "You're going to find these people in Wikidata 00:24:16.595 --> 00:24:18.874 and your task is to add that fact, 00:24:18.874 --> 00:24:21.106 with a reference to this online database." 00:24:21.106 --> 00:24:23.449 And then you can do a query to see how many have been added 00:24:23.449 --> 00:24:25.545 in that session. 00:24:25.545 --> 00:24:28.246 So with all the training I do, 00:24:28.246 --> 00:24:31.582 I think comprehension is more important 00:24:31.582 --> 00:24:33.554 than taking action immediately.
00:24:33.554 --> 00:24:35.543 So when I'm training people on Wikipedia, 00:24:35.543 --> 00:24:39.514 I first show them article histories, contribution records, talk pages, 00:24:39.514 --> 00:24:44.800 the quality scale, so they're comprehending the process before they edit 00:24:44.800 --> 00:24:47.439 and actually change something. 00:24:49.939 --> 00:24:52.636 (man) Not really a question but a comment. 00:24:52.636 --> 00:24:58.570 There is, for beginners, a good tutorial on YouTube, 00:24:58.570 --> 00:25:01.423 How to Query and Start with SPARQL, 00:25:01.423 --> 00:25:04.421 and if you want to go deeper, also 00:25:04.421 --> 00:25:08.521 How to Add Data with OpenRefine. 00:25:08.521 --> 00:25:12.621 And I've also made some videos 00:25:12.621 --> 00:25:15.121 and uploaded them in German. 00:25:15.121 --> 00:25:16.916 Oh, great! Thanks. 00:25:17.894 --> 00:25:21.823 I should also mention that Hilary Thorsen, from Stanford Library, 00:25:21.823 --> 00:25:25.076 did, last week, a really good video capture 00:25:25.076 --> 00:25:28.857 of adding a data set to Wikidata with OpenRefine. 00:25:28.857 --> 00:25:33.529 This is for LD4P, the Linked Data for Production project, 00:25:33.529 --> 00:25:35.932 and that was a really good video tutorial 00:25:35.932 --> 00:25:38.392 I'd recommend to anybody for-- 00:25:38.392 --> 00:25:42.426 That's the next couple of levels up from what I'm doing. 00:25:43.189 --> 00:25:45.029 (Dan) Is there a last question? 00:25:49.486 --> 00:25:52.203 (man) So SPARQL's sort of SQL-ish. 00:25:52.203 --> 00:25:54.856 If someone walked into your tutorial with an SQL background, 00:25:54.856 --> 00:25:57.291 is that a blessing or a curse? 00:25:57.291 --> 00:26:00.164 It's a bit of a curse, because I had to learn SQL, 00:26:00.164 --> 00:26:03.398 so I did the...
00:26:03.398 --> 00:26:09.498 "generate the invoices using SQL for your fictitious company", 00:26:09.498 --> 00:26:14.369 and definitely had to unlearn an SQL way of thinking about things 00:26:14.369 --> 00:26:15.712 to get to SPARQL. 00:26:15.712 --> 00:26:17.638 But it was freeing, it was freeing. 00:26:17.638 --> 00:26:21.302 Databases without built-in schemas are liberating. 00:26:22.102 --> 00:26:24.042 When you think about how many columns there are, 00:26:24.042 --> 00:26:25.727 and it's this number of columns for a book, 00:26:25.727 --> 00:26:27.638 and it's this number of columns for the address, 00:26:27.638 --> 00:26:28.984 and here it's just three columns. 00:26:28.984 --> 00:26:31.406 Well, three and a bit more. 00:26:31.406 --> 00:26:34.443 That's really liberating. 00:26:34.443 --> 00:26:36.814 So that's the point I kind of touched on, 00:26:36.814 --> 00:26:41.810 that people make different progress in these workshops, as in all training, 00:26:41.810 --> 00:26:43.869 but it's not intelligent versus dumb; 00:26:43.869 --> 00:26:46.588 it's that the preconceptions you're coming with 00:26:46.588 --> 00:26:47.823 are more the obstacle. 00:26:47.823 --> 00:26:50.242 So it's actually more-- 00:26:50.242 --> 00:26:55.655 I'm more optimistic about training people who have never encountered databases, 00:26:55.655 --> 00:26:58.805 coding, or any of that before, than... 00:26:58.805 --> 00:27:02.232 The worst people to try and train are linked data experts 00:27:02.232 --> 00:27:04.631 because they've used DBpedia a lot. 00:27:04.631 --> 00:27:07.180 They've used a particular approach to querying 00:27:07.180 --> 00:27:08.834 and expect to get certain things, 00:27:08.834 --> 00:27:12.429 and it looks odd when Wikidata does things differently. 00:27:12.429 --> 00:27:14.540 And they need to get with the program. 00:27:15.205 --> 00:27:17.867 (Dan) Okay, let's thank Martin for his insights. 00:27:17.867 --> 00:27:18.884 Thanks very much.
00:27:18.884 --> 00:27:21.888 (audience applause)