WEBVTT 00:00:26.335 --> 00:00:27.831 I'm here to tell you 00:00:28.021 --> 00:00:31.441 why I don't tell the truth about castles. 00:00:33.211 --> 00:00:34.913 You might think it's my job. 00:00:34.913 --> 00:00:36.664 After all, we expect professionals 00:00:36.664 --> 00:00:39.890 to speak with authority and give us clear-cut solutions, 00:00:40.110 --> 00:00:42.110 and that makes us very, very nervous 00:00:42.110 --> 00:00:45.476 because there's so much we simply don't know about history. 00:00:45.926 --> 00:00:46.932 And as a result, 00:00:46.932 --> 00:00:51.055 a lot of things have become established in our collective memory as the truth 00:00:51.055 --> 00:00:53.342 simply because someone said it once, 00:00:53.342 --> 00:00:55.101 it sounded convincing, 00:00:55.101 --> 00:00:56.866 and nobody since has stood up to say, 00:00:56.866 --> 00:00:59.627 "Well, we don't know exactly what it was like, 00:00:59.627 --> 00:01:01.404 but it wasn't like that." 00:01:03.124 --> 00:01:05.169 Take Greek temples. 00:01:05.169 --> 00:01:08.664 Everyone knows they're made of beautiful shining white marble. 00:01:08.664 --> 00:01:12.383 We've seen them that way for centuries, from postcards to museums, 00:01:12.863 --> 00:01:16.060 and that establishes certain seeing habits in our heads, 00:01:16.650 --> 00:01:21.724 where we've seen them this way so much that anything different just looks wrong. 00:01:22.094 --> 00:01:24.982 And yet today we know for a fact 00:01:24.982 --> 00:01:27.342 that they were painted in bright garish colors; 00:01:27.342 --> 00:01:30.049 we're just a little unclear on some of the details. 00:01:32.059 --> 00:01:35.856 I've colored this one in myself in about five minutes of research, 00:01:35.856 --> 00:01:38.771 so it's likely to be wrong in all the relevant places, 00:01:38.791 --> 00:01:41.772 and it's still more correct than the white one. 00:01:42.722 --> 00:01:45.821 So why do we continue to show them in white? 
00:01:45.821 --> 00:01:47.497 Well, there's two reasons for that: 00:01:47.497 --> 00:01:50.907 One is that we as humans like certainty. 00:01:51.187 --> 00:01:55.070 And so we would prefer to be absolutely certain 00:01:55.070 --> 00:01:58.069 even if it's the absolute certainty that we are absolutely wrong 00:01:58.069 --> 00:01:59.069 (Laughter) 00:01:59.069 --> 00:02:01.118 than to say, "Well, 00:02:01.118 --> 00:02:04.033 maybe it could have been approximately, 00:02:04.033 --> 00:02:05.974 I think, something like …" 00:02:05.974 --> 00:02:07.170 And then there's the fact 00:02:07.170 --> 00:02:10.466 that when we're trying to establish a new truth in people's heads, 00:02:10.466 --> 00:02:13.228 we want it to be the correct truth this time. 00:02:13.988 --> 00:02:16.607 But even if we're not entirely clear on all the details, 00:02:16.607 --> 00:02:18.754 that doesn't mean we can't make a statement. 00:02:19.384 --> 00:02:21.416 If you ask me right now what time it is, 00:02:21.416 --> 00:02:22.878 I can't tell you, 00:02:23.248 --> 00:02:27.525 but I don't have to shrug my shoulders and just say, "I have no idea." 00:02:28.045 --> 00:02:31.078 I know this event is on from 12:00 till 6:00, 00:02:31.078 --> 00:02:33.394 so that eliminates half the clock right there. 00:02:34.394 --> 00:02:37.368 We've had our first coffee break, we've not had the second, 00:02:37.368 --> 00:02:39.260 so it's between 2:00 and 5:00. 00:02:40.070 --> 00:02:44.710 I know there were people ahead of me, and I'm not being told I'm out of time, 00:02:45.190 --> 00:02:47.762 so it must be around 4:30. 00:02:49.362 --> 00:02:51.039 Is that correct? 00:02:51.039 --> 00:02:51.988 I don't know. 00:02:51.988 --> 00:02:55.420 It might not be the truth, but I don't have to tell you the truth. 00:02:55.420 --> 00:02:58.317 I just have to know how correct I'm likely to be 00:02:58.587 --> 00:03:01.941 because how correct I am can be very, very important. 
00:03:02.210 --> 00:03:04.587 Me telling you it's about 4:30 is pretty useless 00:03:04.587 --> 00:03:07.249 if you want to know whether you can still catch your bus; 00:03:07.249 --> 00:03:10.018 and in that case, we might have to ask more people, 00:03:10.018 --> 00:03:12.541 we might have to fill it in with more clues and so on; 00:03:12.541 --> 00:03:13.767 and that's science. 00:03:13.767 --> 00:03:15.217 We ask a question, 00:03:15.217 --> 00:03:18.744 and then we fill in the unknown and get more and more precise. 00:03:19.134 --> 00:03:21.940 So the scientific method is pretty well-established: 00:03:22.090 --> 00:03:24.563 You ask a question about the world around you, 00:03:24.563 --> 00:03:27.147 you research what you already know about it, 00:03:27.147 --> 00:03:31.228 you design an experiment to test what you don't know about it, 00:03:31.438 --> 00:03:34.571 you gather the data, you analyze them, and you reach a conclusion; 00:03:34.761 --> 00:03:37.241 and that conclusion could be, "I need more data." 00:03:37.411 --> 00:03:39.694 Then you go back, design another experiment, 00:03:39.694 --> 00:03:41.459 run it again, gather more data, 00:03:41.459 --> 00:03:43.810 and you get more data and more data and more data, 00:03:43.810 --> 00:03:45.482 and suddenly you're buried in data, 00:03:45.482 --> 00:03:47.224 and you're dealing with big data, 00:03:47.224 --> 00:03:49.423 where scientists now have this problem 00:03:49.423 --> 00:03:51.541 that there's so many data 00:03:51.541 --> 00:03:54.056 they can never read them all in one lifetime. 00:03:54.056 --> 00:03:56.230 They have to find new ways to deal with that. 00:03:56.890 --> 00:03:58.241 And then there's me. 00:03:58.971 --> 00:04:00.290 This is me. 00:04:00.290 --> 00:04:03.353 You can tell I'm not the kind of scientist with a lab coat, 00:04:03.353 --> 00:04:06.337 and my data problem is slightly different. 
00:04:06.557 --> 00:04:10.120 Basically I'm dealing with one student's lab report 00:04:10.120 --> 00:04:11.940 that they dropped on the floor, 00:04:11.940 --> 00:04:14.163 lost half the pages and then shuffled the rest, 00:04:14.163 --> 00:04:16.854 and there's probably a coffee stain on the relevant bit. 00:04:16.934 --> 00:04:19.900 So what I've got is I've got half a broken castle, 00:04:20.483 --> 00:04:22.143 slightly burned, 00:04:23.066 --> 00:04:26.968 I've got a legal contract from 1388 that was written by a guy 00:04:26.968 --> 00:04:30.872 who managed to spell the name "Arnold" four different ways in three pages, 00:04:31.672 --> 00:04:33.387 I've got some rocks from the village 00:04:33.387 --> 00:04:36.433 that may or may not have belonged to this castle, 00:04:37.413 --> 00:04:39.278 I have got a map that was done by a guy 00:04:39.278 --> 00:04:42.775 for whom this was a 10-minute squiggle in an eighth-year campaign, 00:04:43.345 --> 00:04:45.689 a painting that was drawn 00:04:45.689 --> 00:04:49.026 about 300 years after the castle burnt down, 00:04:49.356 --> 00:04:51.822 and a book that was probably propaganda. 00:04:52.252 --> 00:04:54.402 And of course I could go look in the archives, 00:04:54.402 --> 00:04:57.261 I can get another archaeological excavation going, and so on, 00:04:57.261 --> 00:05:00.213 but at some point, there's simply no way to gather more data. 00:05:00.213 --> 00:05:02.507 And then you expect me to take that 00:05:02.507 --> 00:05:05.864 and mash it all up into the truth about castles? 00:05:09.304 --> 00:05:11.607 You want a reconstruction that's so realistic 00:05:11.607 --> 00:05:13.232 it feels like you're really there, 00:05:13.232 --> 00:05:16.826 like every little pebble in the courtyard is just right. 
00:05:17.686 --> 00:05:19.969 There's a reason that a lot of sites and museums 00:05:19.969 --> 00:05:22.214 don't use the word "reconstruction"; 00:05:22.214 --> 00:05:23.764 instead, you find a picture, 00:05:23.764 --> 00:05:27.534 and next to it, it has the disclaimer "Artist's impression." 00:05:28.324 --> 00:05:30.657 And that doesn't mean they didn't do any research; 00:05:30.657 --> 00:05:33.276 it just means they didn't document what they researched. 00:05:33.276 --> 00:05:35.895 We don't know who they talked to, which books they read, 00:05:35.895 --> 00:05:37.648 which conclusions they drew, 00:05:37.648 --> 00:05:40.330 and which other theories they discarded. 00:05:40.330 --> 00:05:42.046 Now, imagine for a moment 00:05:42.046 --> 00:05:44.990 that we would treat a text the same way. 00:05:44.990 --> 00:05:46.092 You go into the museum. 00:05:46.092 --> 00:05:49.602 There's a plaque, and it says, "Author's impression." 00:05:49.602 --> 00:05:52.528 The author thinks there might have been a castle here. 00:05:52.738 --> 00:05:55.161 You wouldn't take that very seriously. 00:05:55.161 --> 00:05:59.127 So why do we treat text so differently from models? 00:05:59.127 --> 00:06:03.742 It's because we've come to a consensus on what makes a scientific text, 00:06:03.742 --> 00:06:05.636 and it's quite simply this. 00:06:05.956 --> 00:06:08.110 When you're writing a scientific document, 00:06:08.110 --> 00:06:09.937 you put in footnotes, 00:06:09.937 --> 00:06:12.482 you cite works by previous scholars, 00:06:12.507 --> 00:06:14.405 you show your argumentation - 00:06:15.675 --> 00:06:18.329 you simply give your document provenance - 00:06:19.139 --> 00:06:21.783 because showing you a picture of the truth 00:06:22.453 --> 00:06:25.773 isn't going to help you without me explaining why it's true. 00:06:26.423 --> 00:06:30.088 The truth is, all of these are correct at the same time. 
00:06:30.668 --> 00:06:33.295 That's the truth, but it's not a very helpful truth, 00:06:34.585 --> 00:06:38.560 because without context, data are not information. 00:06:38.560 --> 00:06:40.669 So I'll give you a little context. 00:06:43.707 --> 00:06:45.305 So for a little context, 00:06:45.305 --> 00:06:48.497 this first clock shows the time in Luxembourg, 00:06:48.497 --> 00:06:50.766 and the second one has the time in Tokyo, 00:06:51.116 --> 00:06:53.215 the third one is one of those annoying clocks 00:06:53.215 --> 00:06:55.501 everyone had in their kitchens about 10 years ago 00:06:55.501 --> 00:06:57.164 that actually run counterclockwise, 00:06:57.164 --> 00:06:59.644 and the fourth one is not a clock, it's a barometer - 00:06:59.644 --> 00:07:01.777 you just wouldn't know that by looking at it. 00:07:03.577 --> 00:07:04.776 So in historic research, 00:07:04.776 --> 00:07:07.424 when we deal with images, we know what to do: 00:07:07.424 --> 00:07:10.977 We give those provenance through metadata and paradata. 00:07:11.457 --> 00:07:13.487 Metadata you've probably heard. 00:07:13.507 --> 00:07:16.044 Metadata are data about the data. 00:07:16.044 --> 00:07:18.523 You can see those when you're browsing your computer, 00:07:18.523 --> 00:07:20.859 and you can see who made a file, when it was made, 00:07:20.859 --> 00:07:22.704 when it was last opened, and so on. 00:07:23.104 --> 00:07:25.070 Paradata are slightly more complex. 00:07:25.070 --> 00:07:28.137 Paradata are data that give context for the data, 00:07:28.327 --> 00:07:30.943 so like how they were gathered, how they were processed, 00:07:30.943 --> 00:07:33.306 which decisions were made about them, and so on. 00:07:34.406 --> 00:07:35.953 The metadata for this image 00:07:35.953 --> 00:07:40.300 would be that it was taken by me on the first of June, 2017 00:07:40.300 --> 00:07:42.568 on a Sony compact camera. 
00:07:42.568 --> 00:07:47.362 The paradata are that it was picture 111 in a series of 128 00:07:47.362 --> 00:07:50.564 and I took it on my first research trip to this castle. 00:07:51.314 --> 00:07:53.232 And I love to show this picture 00:07:53.232 --> 00:07:57.826 because this picture has everything in it that is wrong with models. 00:07:58.921 --> 00:08:01.786 You walk up the stairs in this castle, you come to the attic, 00:08:01.786 --> 00:08:04.502 and there's a big glass box with this model sitting in it. 00:08:04.502 --> 00:08:05.645 And what I love about it 00:08:05.645 --> 00:08:09.262 is that there are no data attached to it whatsoever. 00:08:09.262 --> 00:08:10.918 You don't have a scale bar. 00:08:11.958 --> 00:08:15.025 You don't have a date it was made or who made it. 00:08:15.055 --> 00:08:17.355 You don't have a date it's supposed to represent. 00:08:17.355 --> 00:08:18.647 There's nothing even to say 00:08:18.647 --> 00:08:21.603 that it's supposed to be this castle that you're standing in. 00:08:22.483 --> 00:08:24.959 And if you're talking about decision-making processes 00:08:24.959 --> 00:08:26.018 in the reconstruction, 00:08:26.018 --> 00:08:28.565 if you take a closer look at that center tower there, 00:08:28.565 --> 00:08:30.326 it becomes very, very obvious 00:08:30.326 --> 00:08:33.896 the size of that tower was not based on an archaeological excavation 00:08:33.896 --> 00:08:36.386 or because there was a foundation there or something. 00:08:36.386 --> 00:08:38.923 No, that's the size of the toilet paper roll they had. 00:08:38.923 --> 00:08:41.138 (Laughter) 00:08:41.138 --> 00:08:43.393 And so this model makes me happy 00:08:43.393 --> 00:08:45.851 because it's everything I'm trying to avoid. 00:08:48.151 --> 00:08:51.113 And I'm not the only person trying to avoid this kind of thing. 00:08:51.113 --> 00:08:54.283 A lot of intelligent people are working on avoiding this.
00:08:54.593 --> 00:08:57.540 There are some hugely complex systems these days 00:08:57.540 --> 00:08:59.891 that go into great detail on data, 00:08:59.891 --> 00:09:02.775 metadata, paradata, how they all relate, and so forth; 00:09:02.775 --> 00:09:05.954 and my favorite one takes about six months to learn. 00:09:06.514 --> 00:09:08.724 Now that's bad enough for me as a researcher, 00:09:08.724 --> 00:09:10.790 but imagine that you, as a museum visitor, 00:09:10.790 --> 00:09:12.839 have to go on a six-month training course 00:09:12.839 --> 00:09:14.852 to understand what you're seeing. 00:09:15.522 --> 00:09:19.772 So, instead, I have a system that's just good enough for me. 00:09:19.772 --> 00:09:21.686 I simply take my model, 00:09:21.686 --> 00:09:26.792 and I tell you which parts are true and which ones are not. 00:09:26.792 --> 00:09:28.893 So probably true is the easiest. 00:09:28.893 --> 00:09:31.826 That's the category of things that I think are true 00:09:31.826 --> 00:09:33.858 because they're still there, 00:09:33.858 --> 00:09:36.567 so that could be things like the castle ruins. 00:09:37.757 --> 00:09:39.696 Next, pretty close to true, 00:09:40.981 --> 00:09:42.696 we have a lot of evidence for those. 00:09:42.696 --> 00:09:43.699 So for example, 00:09:43.699 --> 00:09:46.621 I was saying foundations, towers on foundations - 00:09:46.621 --> 00:09:49.844 we fill in the gaps where we have good evidence. 00:09:50.254 --> 00:09:53.962 Third stage, extrapolation, could be true - maybe not. 00:09:53.962 --> 00:09:56.583 That's where I'm working on secondary and tertiary data, 00:09:56.583 --> 00:09:58.253 like the maps and images. 00:09:58.253 --> 00:10:00.360 And then there's my favorite category - 00:10:01.015 --> 00:10:03.495 the stuff that's not really true. 00:10:04.900 --> 00:10:07.399 Now, these things I need to put in my model 00:10:07.709 --> 00:10:10.328 because the model would be missing something without it.
00:10:10.328 --> 00:10:13.094 If I didn't put these in, I would be telling you a lie, 00:10:14.014 --> 00:10:16.142 but I have no idea what to really put in. 00:10:16.482 --> 00:10:17.816 It's an interesting problem. 00:10:17.816 --> 00:10:21.882 So that's things like I know the great hall had paintings on the walls, 00:10:21.882 --> 00:10:24.259 I will never know what exactly was painted on them, 00:10:24.259 --> 00:10:25.829 so I have to make something up, 00:10:25.829 --> 00:10:28.628 but if I left them as a blank stone the way they are now, 00:10:28.628 --> 00:10:30.668 that would be making a statement. 00:10:31.398 --> 00:10:34.773 And then, of course, I need to attach my metadata and my paradata, 00:10:34.773 --> 00:10:37.738 and tell you why it's in that category. 00:10:38.058 --> 00:10:41.174 And finally, I need to make very, very sure 00:10:41.174 --> 00:10:43.531 that you don't only know why it's in that category 00:10:43.531 --> 00:10:45.892 but which part exactly I'm talking about. 00:10:46.492 --> 00:10:49.142 If you remember that clock from earlier, 00:10:49.472 --> 00:10:52.769 well, I can tell you for a fact that it's Friday afternoon. 00:10:53.089 --> 00:10:55.788 I can also tell you with absolute certainty 00:10:55.788 --> 00:10:59.842 that sometime in the last two millennia, we had a castle on this hill. 00:11:00.542 --> 00:11:02.537 What I cannot tell you 00:11:02.537 --> 00:11:06.119 is whether in that window, in 1548, 00:11:06.119 --> 00:11:09.719 we had an archway and that archway had a stone 00:11:09.719 --> 00:11:12.864 and that stone was exactly 312 millimeters wide. 00:11:13.234 --> 00:11:15.101 It could have been 317, 00:11:15.101 --> 00:11:17.800 but my drawing is going to say one way or the other. 
00:11:19.600 --> 00:11:23.669 And that is the really, really interesting point for future researchers 00:11:23.949 --> 00:11:26.531 because if I've told you I have no idea what was here, 00:11:26.531 --> 00:11:29.530 they can use that point to research, and then they can say, 00:11:29.530 --> 00:11:32.540 "Look, we found more data, and actually you're completely wrong. 00:11:32.540 --> 00:11:35.025 It was 483 millimeters." 00:11:35.025 --> 00:11:36.569 And I can say, "Hooray!" 00:11:36.569 --> 00:11:39.615 because that advances our state of collective knowledge. 00:11:40.025 --> 00:11:42.127 So if I'm doing science properly, 00:11:42.127 --> 00:11:44.751 I want people to be able to prove me wrong. 00:11:46.390 --> 00:11:49.887 So that's why I'm not going to tell you the truth about castles, 00:11:51.454 --> 00:11:54.344 and why I make it very, very clear to you 00:11:54.804 --> 00:11:56.336 when I'm just making it up. 00:11:56.336 --> 00:11:57.707 (Laughter) 00:11:58.487 --> 00:11:59.740 Thank you. 00:11:59.740 --> 00:12:02.454 (Applause)