WEBVTT 00:00:01.014 --> 00:00:04.185 (lift) 00:00:04.185 --> 00:00:07.244 (lift 12 - Feb 24 2012 - Geneva) 00:00:07.244 --> 00:00:10.044 (Rufus Pollock - Stories) 00:00:10.044 --> 00:00:11.788 [Rufus Pollock] Just to say for those of you who don't know: 00:00:11.788 --> 00:00:13.666 the Open Knowledge Foundation is a not-for-profit 00:00:13.666 --> 00:00:15.611 founded in 2004 00:00:15.611 --> 00:00:17.865 that builds tools and communities 00:00:17.865 --> 00:00:20.934 to create, use and share open information, 00:00:20.934 --> 00:00:24.585 and that's information that anyone can use, reuse and redistribute. 00:00:24.585 --> 00:00:28.321 And as such, we've been working on open data for quite a long time, 00:00:28.321 --> 00:00:30.011 since we started in 2004. 00:00:30.011 --> 00:00:34.817 And today, I want to start the story by going back in time 5000 years, 00:00:34.817 --> 00:00:37.610 to ancient Mesopotamia. 00:00:37.610 --> 00:00:41.393 There, between the Tigris and the Euphrates rivers, 00:00:42.069 --> 00:00:44.390 flourished the Sumerian civilization. 00:00:44.390 --> 00:00:47.298 And they were confronted by a problem. 00:00:47.298 --> 00:00:50.269 They were confronted by the limitations of human memory 00:00:50.899 --> 00:00:54.338 in the recording of taxes, food and other goods. 00:00:54.338 --> 00:00:59.642 And those ancient civil servants and businessmen hit on a novel solution: 00:01:00.380 --> 00:01:04.666 what they decided to do was start counting things with small clay chits, 00:01:04.666 --> 00:01:09.234 which they would bake inside a little clay box, 00:01:09.234 --> 00:01:12.617 and then mark, on the outside of that box, what they were counting. 00:01:12.617 --> 00:01:15.303 You know, was it grain, was it tax payments, whatever.
00:01:16.150 --> 00:01:19.786 And so, born out of necessity for a state and a society, 00:01:20.632 --> 00:01:25.773 came one of the great information technology revolutions of all time: writing. 00:01:25.773 --> 00:01:28.172 The Sumerians invented writing via cuneiform. 00:01:28.910 --> 00:01:34.039 And if we fast-forward a few thousand years, we come to the UK census. 00:01:34.039 --> 00:01:37.577 Again, it's interesting that states and governments are often at the forefront, 00:01:37.577 --> 00:01:42.681 at least of driving information technology and information systems innovation. 00:01:42.681 --> 00:01:44.654 The UK census: again, the state, 00:01:44.654 --> 00:01:46.565 this is during the Napoleonic Wars, 00:01:46.565 --> 00:01:48.601 desired to count the population more accurately, 00:01:48.601 --> 00:01:51.995 and we have the first UK census in 1801. 00:01:51.995 --> 00:01:56.189 And in the US, they also had censuses, in fact starting in 1790. 00:01:56.819 --> 00:01:59.383 And one of the problems encountered in the 1880 census 00:01:59.383 --> 00:02:01.592 was that they tabulated the census by hand. 00:02:02.345 --> 00:02:05.699 And by the 1880 census, it was taking seven years 00:02:05.699 --> 00:02:06.822 to tabulate the census. 00:02:06.822 --> 00:02:10.241 So after it was taken in 1880, it wasn't until 1887 00:02:10.241 --> 00:02:12.892 that they actually had any data they could use. 00:02:12.892 --> 00:02:16.004 And they calculated that for the next census, in 1890, 00:02:16.004 --> 00:02:18.164 they wouldn't be finished by 1900. 00:02:18.164 --> 00:02:21.936 They still wouldn't have the results of the census by the time they started the next one. 00:02:21.936 --> 00:02:24.233 They had a crisis of information technology. 00:02:24.233 --> 00:02:26.979 And what they did was commission Herman Hollerith 00:02:26.979 --> 00:02:29.747 to build the first automatic tabulator.
00:02:29.747 --> 00:02:32.835 And for those of you who know your company history, of course, 00:02:32.835 --> 00:02:34.513 Herman Hollerith's company went on 00:02:34.513 --> 00:02:35.899 to be one of the founders, if you like, 00:02:35.899 --> 00:02:38.808 one of the companies that came together to create IBM. 00:02:38.808 --> 00:02:42.258 And IBM, by the sixties, was building this. 00:02:42.258 --> 00:02:44.374 They replaced those 00:02:44.374 --> 00:02:45.905 kinds of wooden, mechanical tabulators 00:02:45.905 --> 00:02:48.524 with this stuff: digital tabulators, 00:02:48.524 --> 00:02:50.375 the modern computer of this age. 00:02:50.375 --> 00:02:52.610 And again, much of this -- I don't know if you guys know -- 00:02:52.610 --> 00:02:53.705 IBM would have gone bankrupt 00:02:53.705 --> 00:02:58.477 if it hadn't been for Franklin Roosevelt passing the Social Security Act in the States, 00:02:58.477 --> 00:03:01.132 which necessitated a huge amount of new tabulation. 00:03:01.132 --> 00:03:04.629 So, again, a lot of innovation in this space came out of government need, 00:03:04.629 --> 00:03:06.370 and also, of course, the nuclear program, 00:03:06.370 --> 00:03:08.641 the other great consumer of computational power. 00:03:09.317 --> 00:03:11.899 And today, today, 00:03:12.623 --> 00:03:15.485 we find ourselves again in the midst of a revolution. 00:03:16.438 --> 00:03:19.331 It's a revolution driven by two needs 00:03:19.331 --> 00:03:22.027 that have been the same throughout history, as I've just shown: 00:03:22.027 --> 00:03:23.886 information complexity, which is the necessity, 00:03:24.456 --> 00:03:27.575 and information technology, which is the opportunity. 00:03:28.544 --> 00:03:32.702 And what we're doing in this case is a policy innovation, if you like. 00:03:32.702 --> 00:03:36.468 We are innovating by opening up information.
00:03:37.052 --> 00:03:39.436 So just take the obvious example, government, 00:03:39.436 --> 00:03:41.097 as I said, often the innovator. 00:03:41.097 --> 00:03:43.308 If you go back just 3 years, 00:03:43.308 --> 00:03:45.829 there were almost no open government data initiatives 00:03:45.829 --> 00:03:46.688 in the world. 00:03:46.688 --> 00:03:48.442 Today there are dozens. 00:03:48.442 --> 00:03:51.162 The UK, the US, Finland, Kenya, the Netherlands, 00:03:51.162 --> 00:03:53.049 and there are new ones almost every week. 00:03:53.049 --> 00:03:57.407 There's been the launch, at the UN, of an official kind of movement 00:03:57.407 --> 00:04:00.097 called the Open Government Partnership, which countries sign up to, 00:04:00.097 --> 00:04:02.433 and among other things, they open up their data. 00:04:03.002 --> 00:04:05.325 And of course, in the UK and other countries, 00:04:05.325 --> 00:04:06.562 Tim Berners-Lee has been involved. 00:04:06.562 --> 00:04:09.106 I've helped advise the government on this in the UK. 00:04:09.106 --> 00:04:11.221 But it's not just government, it's also companies. 00:04:11.651 --> 00:04:13.982 Companies are opening up data. 00:04:13.982 --> 00:04:15.690 Very interestingly, last year, 00:04:15.690 --> 00:04:19.092 Nike started an open data initiative 00:04:19.092 --> 00:04:21.372 to open up supply chain and sustainability data, 00:04:21.372 --> 00:04:23.931 for themselves and also for their suppliers, 00:04:23.931 --> 00:04:26.800 which I think is a very interesting change. 00:04:26.800 --> 00:04:28.004 And it's also communities. 00:04:28.004 --> 00:04:29.715 In fact, back there at the beginning, 00:04:29.715 --> 00:04:31.927 that incredible map you saw on an earlier slide 00:04:31.927 --> 00:04:35.002 is OpenStreetMap activity around the world. 00:04:35.002 --> 00:04:38.073 People adding to this crowd-built map of the world.
00:04:38.073 --> 00:04:41.074 And in the last 6 years, OpenStreetMap, 00:04:41.074 --> 00:04:42.445 a bottom-up community, 00:04:42.445 --> 00:04:44.435 has built a complete, comprehensive 00:04:44.435 --> 00:04:47.918 map of the world, of fully open data. 00:04:48.872 --> 00:04:50.898 So I've just gone on about Open Data, 00:04:50.898 --> 00:04:52.766 and one thing I'm aware of with this audience 00:04:52.766 --> 00:04:54.035 is that you might not all know what it is. 00:04:54.035 --> 00:04:59.152 So I'm going to take a brief moment to say what it is. 00:04:59.152 --> 00:05:01.493 What does it mean when I say 'open'? 00:05:01.493 --> 00:05:05.557 What's different from anything else? What's different from simply public data? 00:05:05.557 --> 00:05:07.083 So there's actually a definition, 00:05:07.083 --> 00:05:10.177 a definition that we at the Open Knowledge Foundation helped write, and it's very simple. 00:05:10.177 --> 00:05:13.671 In a nutshell, a piece of information, a piece of data, 00:05:13.671 --> 00:05:18.384 is open if anyone is free to use, reuse, 00:05:18.384 --> 00:05:20.797 and redistribute it, subject only, at most, 00:05:20.797 --> 00:05:22.891 to a requirement to attribute and share alike. 00:05:23.214 --> 00:05:25.784 And anyone means anyone! 00:05:25.784 --> 00:05:28.055 That means there can't be any commercial restrictions. 00:05:28.055 --> 00:05:32.262 You can't say: hey, here's this data, but only for people using it for non-commercial purposes. 00:05:32.262 --> 00:05:34.849 Or only for people working in education. 00:05:34.849 --> 00:05:38.051 Or only for people living in the developing world, or the developed world. 00:05:38.051 --> 00:05:40.743 There can't be any restrictions like that. 00:05:41.343 --> 00:05:43.189 And there's a reason for this, by the way, 00:05:43.209 --> 00:05:48.615 and it isn't just because we're obsessed with, if you like, trademarking an attractive term.
00:05:49.315 --> 00:05:51.081 It's because it's about interoperability. 00:05:51.291 --> 00:05:54.617 One of my experiences at this conference, which I remember from previous trips to Geneva, 00:05:54.627 --> 00:05:56.974 is that I've been unable to plug in my laptop! 00:05:56.974 --> 00:06:02.048 Even though I have a French adaptor, these wonderful Swiss sockets here are, you know, 00:06:02.048 --> 00:06:03.582 this wonderful, small octagonal shape. 00:06:03.582 --> 00:06:05.379 And even with my adaptor I can't plug in. 00:06:05.379 --> 00:06:07.347 Right? That's a problem of interoperability. 00:06:07.347 --> 00:06:10.929 When we travel around to different countries, our power plugs don't actually fit. 00:06:10.929 --> 00:06:12.581 We have to buy something. 00:06:12.581 --> 00:06:16.755 And the point about this definition, and the point about caring about Open Data, 00:06:16.755 --> 00:06:18.317 is that it's about interoperability. 00:06:18.317 --> 00:06:22.112 The dream of Open Data is interoperability: 00:06:22.112 --> 00:06:26.058 being able to seamlessly share and interweave information. 00:06:27.898 --> 00:06:31.704 And if every time I get information from two different people I have to consult a lawyer, 00:06:31.704 --> 00:06:35.300 and work out whether I'm allowed to do it, whether I'm allowed to put these things together, 00:06:35.300 --> 00:06:37.634 we lose that dream; that dream is shattered. 00:06:37.634 --> 00:06:42.166 And the key point is that this definition, and those conditions, ensure interoperability. 00:06:42.166 --> 00:06:45.744 If you comply with them, we know that any piece of Open Data 00:06:45.744 --> 00:06:47.880 will work with any other piece of Open Data. 00:06:48.681 --> 00:06:52.932 It's also worth saying, for a quick moment, what kind of data we mean, to emphasize a point 00:06:52.932 --> 00:06:55.985 and head off the question I otherwise always get asked.
00:06:55.985 --> 00:06:58.809 When we talk about opening up data, in general, 00:06:58.809 --> 00:07:01.026 we're not talking about personal data. 00:07:01.026 --> 00:07:04.161 We're not talking about opening up your private health records 00:07:04.161 --> 00:07:08.302 or opening up your personal tax information. 00:07:08.302 --> 00:07:11.267 We're talking about information that is non-personal in nature. 00:07:11.267 --> 00:07:15.667 For government, for example: transport, geodata, statistics, electoral, legal. 00:07:15.667 --> 00:07:19.510 Stuff that the UK, for example, has in fact been opening up over the last few years. 00:07:19.510 --> 00:07:23.381 This financial information on government spending, this information on health outcomes, 00:07:23.381 --> 00:07:28.625 on prescriptions, this information on educational outcomes, this information on the law. 00:07:28.625 --> 00:07:30.765 This statistical information. 00:07:30.785 --> 00:07:32.691 That's the kind of thing that we're talking about. 00:07:34.186 --> 00:07:37.393 Now, we have this story over time. 00:07:37.393 --> 00:07:38.996 But why are governments doing it now? 00:07:39.596 --> 00:07:40.598 And why Open Data? 00:07:41.268 --> 00:07:43.930 So, okay, for thousands of years, governments have innovated, 00:07:43.930 --> 00:07:47.274 but why do they innovate at this particular moment and in this way? 00:07:47.274 --> 00:07:51.976 So I want to start here with a quick story, a story of medicine gone wrong. 00:07:52.006 --> 00:07:54.484 It is from a great book by a guy called Stephen Klaidman. 00:07:54.484 --> 00:07:55.918 It's in fact one of the things 00:07:55.918 --> 00:07:57.781 that made me think quite deeply about this: 00:07:57.781 --> 00:07:59.852 why I was interested in Open Data. 00:08:01.172 --> 00:08:02.917 In that picture there, you can see 00:08:02.917 --> 00:08:05.726 what was the Redding Medical Centre in Northern California.
00:08:05.726 --> 00:08:10.471 There, in the summer of 2002, John Corapi, 00:08:11.231 --> 00:08:12.401 in typical American style, 00:08:12.401 --> 00:08:15.374 an ex-accountant from Vegas turned Catholic priest, 00:08:15.374 --> 00:08:17.243 [scattered laughter] 00:08:17.783 --> 00:08:22.274 ...arrived at the Redding Medical Centre, having been referred by his doctor for chest pains. 00:08:22.784 --> 00:08:28.419 He was given a cardiogram by the local cardiologist and told that he needed an immediate heart bypass, 00:08:28.419 --> 00:08:31.484 that he was at serious risk, and that he should come back later that day, 00:08:31.484 --> 00:08:34.514 or at the latest, tomorrow, to have open heart surgery. 00:08:35.764 --> 00:08:37.985 Shocked and dazed by this news, 00:08:37.985 --> 00:08:41.225 he returned home to pack his bags in order to return to hospital. 00:08:41.225 --> 00:08:45.102 He called up his best friend, who was still an accountant in Vegas, 00:08:46.032 --> 00:08:52.568 whose partner was a hospital nurse, and who advised him that he should get a second opinion: 00:08:52.568 --> 00:08:55.904 that, according to his partner, it was, you know, 00:08:55.904 --> 00:08:58.981 very unusual that you would need to have immediate open heart surgery, 00:08:58.981 --> 00:09:00.235 and that he should get a second opinion. 00:09:00.975 --> 00:09:04.507 Though doubtful about this, because he was extremely worried, he did get on a plane. 00:09:04.507 --> 00:09:07.919 He went to Vegas, he was seen by another specialist... 00:09:07.919 --> 00:09:11.785 who, to his complete surprise, told him there was nothing wrong with his heart. 00:09:12.805 --> 00:09:15.289 He saw another specialist, just to make sure. 00:09:15.289 --> 00:09:18.563 They also told him there was nothing wrong with his heart. 00:09:19.343 --> 00:09:25.067 Relieved and, you know, rather happy, he returned home and just wanted to forget about it.
00:09:25.067 --> 00:09:27.389 But his friend said: "No, what's going on here? Something's wrong." 00:09:27.389 --> 00:09:32.613 And they went in to see the CEO of Tenet Healthcare, the people running this hospital 00:09:32.613 --> 00:09:35.654 (which, by the way, was a private hospital), and said: 00:09:35.654 --> 00:09:38.614 "Look, something's wrong, what's going on, what are you going to do about this?" 00:09:38.614 --> 00:09:40.256 And basically they were told: not very much. 00:09:40.256 --> 00:09:44.581 You know, mistakes get made, it's bad luck, don't worry about it, 00:09:44.581 --> 00:09:46.233 we'll look into it, but thank you very much. 00:09:46.763 --> 00:09:51.631 They weren't convinced by this, and eventually they decided to contact the FBI. 00:09:51.631 --> 00:09:53.826 The reason they contacted the FBI, by the way, 00:09:53.826 --> 00:09:56.401 is that it was a private healthcare provider in the United States 00:09:56.401 --> 00:10:00.476 providing healthcare under Medicare, which is paid for by the Federal Government. 00:10:00.476 --> 00:10:04.202 So, if the Federal Government is getting defrauded, the FBI can get involved. 00:10:04.982 --> 00:10:06.850 The FBI started investigating. 00:10:08.281 --> 00:10:12.081 Eventually it turned out that hundreds, probably thousands, of people 00:10:12.081 --> 00:10:15.854 over a period of ten years or more had been operated on unnecessarily. 00:10:16.704 --> 00:10:19.561 Most of them had had serious procedures performed on them, 00:10:19.561 --> 00:10:22.189 including open heart surgery, and some had died as a result. 00:10:22.189 --> 00:10:24.325 Obviously it's quite a serious operation. 00:10:24.325 --> 00:10:27.437 Some people had basically been condemned to a lifetime of pain.
00:10:27.437 --> 00:10:31.437 One of the most traumatic examples was a 36-year-old who had been cut open, 00:10:31.437 --> 00:10:33.000 which is obviously what happens in open heart surgery, 00:10:33.000 --> 00:10:35.369 and whose chest had never knitted back together correctly. 00:10:35.999 --> 00:10:38.125 Basically, he would be in pain for the rest of his life. 00:10:39.395 --> 00:10:43.000 So, hundreds, thousands of people had been harmed. 00:10:43.610 --> 00:10:45.968 One of the interesting things was that in this community 00:10:45.968 --> 00:10:48.159 there was already some suspicion; there were anecdotes. 00:10:48.159 --> 00:10:50.853 I mean, one of the ones I really liked from this book was the story that went: 00:10:50.853 --> 00:10:56.021 'Don't get a flat tyre outside of Redding Medical Centre because you'll end up with a heart bypass.' 00:10:56.021 --> 00:10:57.303 [scattered laughter] 00:10:57.303 --> 00:11:00.258 You know, but the thing was, there was no data. 00:11:00.728 --> 00:11:04.563 People were, you know, a bit suspicious, but these were doctors they knew, 00:11:04.563 --> 00:11:06.867 you know, in the community, and who wants to doubt them? 00:11:06.867 --> 00:11:12.171 And guess what? Redding Medical Centre also had one of the best mortality rates 00:11:13.001 --> 00:11:15.350 for cardiac procedures in the United States, 00:11:15.350 --> 00:11:19.609 because if you operate on healthy people, you have a good mortality rate!
00:11:19.619 --> 00:11:21.129 [scattered laughter] 00:11:21.129 --> 00:11:23.390 So, the other thing, though, 00:11:23.390 --> 00:11:25.452 and this is where Open Data comes in for me, 00:11:25.452 --> 00:11:28.722 the other red flag, if you had been looking at the data, 00:11:28.741 --> 00:11:31.927 was these two things: one, an incredibly low mortality rate, 00:11:31.927 --> 00:11:35.351 and two, that it had almost the highest number of procedures 00:11:35.351 --> 00:11:37.464 for the population that it covered in the United States, 00:11:38.144 --> 00:11:39.634 which should be a red flag, right? 00:11:39.634 --> 00:11:42.618 Because, one, it's just a massive outlier on that basis, and also, 00:11:42.618 --> 00:11:45.815 the more people you operate on, the more marginal cases you're doing, 00:11:45.815 --> 00:11:49.450 and the higher your mortality rate should be, unless something very odd is going on. 00:11:50.030 --> 00:11:53.015 The thought was: what if people had been looking at this data? 00:11:53.015 --> 00:11:56.045 What if this data had been open and public, 00:11:56.045 --> 00:11:59.517 and not just available to particular researchers or the government? 00:11:59.927 --> 00:12:04.129 And it kind of reminded me of a phrase that's very famous in Open Source software, which is: 00:12:04.129 --> 00:12:05.965 "Given enough eyeballs, all bugs are shallow". 00:12:05.965 --> 00:12:10.504 What's great about Open Source software is that lots of people can look at it, lots of people can fix it. 00:12:10.504 --> 00:12:14.730 And for me, what this was saying was: with enough eyes, all anomalies are noticeable. 00:12:14.730 --> 00:12:16.679 It's somewhat of an exaggeration, 00:12:16.679 --> 00:12:18.908 but what happens if, rather than ten or twenty people 00:12:18.908 --> 00:12:22.077 who worked on monitoring Medicare provision in the US government, 00:12:22.077 --> 00:12:23.877 we'd had thousands or millions of people?
00:12:23.877 --> 00:12:26.919 If the local journalists or citizens who had suspicions 00:12:26.919 --> 00:12:28.747 had been able to go and look at that data and say: 00:12:28.747 --> 00:12:32.485 "Whoa! What's going on here? This isn't just anecdotes, there's some data." 00:12:34.205 --> 00:12:40.225 And it's not just about spotting healthcare errors, or issues, or risks, 00:12:40.225 --> 00:12:42.415 it's also about things like apps and services 00:12:42.415 --> 00:12:43.857 that you can build with Open Data. 00:12:43.867 --> 00:12:46.667 This is a great app built by mySociety in the UK, 00:12:46.667 --> 00:12:47.640 called Mapumental. 00:12:47.640 --> 00:12:48.974 And the question is -- I don't know if people know, 00:12:48.974 --> 00:12:50.646 London houses are very expensive, 00:12:50.646 --> 00:12:52.510 I don't know whether they rival Geneva's, 00:12:52.510 --> 00:12:55.238 but it's a pretty difficult thing. 00:12:55.238 --> 00:12:57.978 And one of the questions was, if I have to work somewhere, 00:12:57.978 --> 00:13:01.752 I want to know where I can afford to live, 00:13:01.752 --> 00:13:05.757 where I can commute to work in a certain time, and where it's not too ugly. 00:13:05.757 --> 00:13:07.583 This is what this app does. 00:13:07.583 --> 00:13:11.200 You can choose the price, you can say where you're going to work, 00:13:11.200 --> 00:13:14.195 you can choose the commute time, and you can choose the scenicness. 00:13:14.195 --> 00:13:17.167 And it will show you, on this map, where you can live.
00:13:23.976 --> 00:13:25.406 It's an interactive version, 00:13:25.406 --> 00:13:26.211 you can kind of draw it out, 00:13:26.211 --> 00:13:29.114 so what it starts with, is one, is it tells you what your tax is, 00:13:29.114 --> 00:13:30.821 something that most people often don't know, 00:13:30.821 --> 00:13:33.668 and it will tell you how much you're paying each day 00:13:33.668 --> 00:13:36.254 to a particular area of society. 00:13:36.254 --> 00:13:37.328 And the dream for me, 00:13:37.328 --> 00:13:39.127 a dream that we're on the way to realising, 00:13:39.127 --> 00:13:42.817 is in this visualisation, you can drill down into areas. 00:13:42.817 --> 00:13:45.092 And my dream is to keep drilling down. 00:13:45.472 --> 00:13:47.633 So depending on what day we have, I want to go down, 00:13:47.633 --> 00:13:49.628 right down through those bubbles, step by step, 00:13:49.628 --> 00:13:52.403 until I see the money spent on street lights on my street, 00:13:52.403 --> 00:13:55.270 on filling in potholes, on collecting my rubbish. 00:13:56.190 --> 00:13:57.138 And for two reasons: 00:13:57.138 --> 00:13:59.704 One, obviously there's a question, particularly in some countries, 00:13:59.704 --> 00:14:01.016 of inefficiency or corruption, 00:14:01.436 --> 00:14:05.176 but also, just because most of us don't feel very happy about paying tax. 00:14:06.066 --> 00:14:08.157 It's not one of those things people welcome! 00:14:08.157 --> 00:14:09.817 But it's something that we should. 00:14:09.817 --> 00:14:11.960 Government does an awful lot for us, 00:14:11.960 --> 00:14:14.287 and having a better sense of where it's going 00:14:14.287 --> 00:14:17.120 could make us feel an awful lot better about paying that tax. 00:14:17.120 --> 00:14:18.657 In the way that when we go to a restaurant, 00:14:18.657 --> 00:14:21.397 we don't, when we get the bill, we don't necessarily feel bad. 00:14:21.397 --> 00:14:24.334 We feel "Wow, I had a great meal. That was worth it." 
00:14:25.274 --> 00:14:26.366 But why Open? 00:14:26.366 --> 00:14:29.079 I've given you examples, and, you know, we see a lot of apps and services. 00:14:29.079 --> 00:14:30.948 Why is Open relevant here? 00:14:31.598 --> 00:14:36.350 This goes back to what I said about the information technology revolution. 00:14:36.350 --> 00:14:37.813 So it's the challenge and the opportunity. 00:14:37.813 --> 00:14:42.387 The challenge that we see today is exploding informational complexity. 00:14:42.797 --> 00:14:43.924 I mean, another great story: 00:14:43.924 --> 00:14:47.728 in the 1820s, all bank clearing in the largest financial centre in the world 00:14:47.728 --> 00:14:51.848 was done in a single room, where one person from each bank gathered, 00:14:51.848 --> 00:14:56.402 and they'd go round the room pulling out gold and swapping it around between different banks. 00:14:56.402 --> 00:14:58.434 And that's how they did bank clearing. 00:14:59.074 --> 00:15:01.991 Today we have billions of transactions a minute. 00:15:01.991 --> 00:15:07.945 And the way we as humans deal with complexity is by dividing and conquering it. 00:15:07.945 --> 00:15:10.683 We split it up into manageable chunks that we deal with. 00:15:11.013 --> 00:15:12.381 The other answer, 00:15:12.381 --> 00:15:14.883 and this answer's particularly relevant to Open Data, 00:15:14.883 --> 00:15:16.219 is information technology. 00:15:16.219 --> 00:15:18.951 Today, a smartphone has as much computing power 00:15:18.951 --> 00:15:22.260 as the system that ran the Apollo moon landings. 00:15:22.260 --> 00:15:24.027 And an even better example is storage: 00:15:24.027 --> 00:15:26.930 one terabyte of storage today costs a hundred dollars. 00:15:26.930 --> 00:15:30.297 In 1994, it would have cost 400,000 dollars.
00:15:30.297 --> 00:15:33.977 I can have every financial transaction 00:15:33.977 --> 00:15:38.376 that the UK government or the US government made last year, or even over the last decade, 00:15:38.376 --> 00:15:39.665 on my laptop. 00:15:39.665 --> 00:15:43.543 That was not possible for an average citizen a decade ago. 00:15:44.283 --> 00:15:48.187 So it's mass participation: information access, processing, and production. 00:15:48.187 --> 00:15:49.557 It's decentralisation. 00:15:49.557 --> 00:15:51.728 And the claim here is that openness is key, 00:15:51.728 --> 00:15:53.957 because it's about scaling. 00:15:54.547 --> 00:15:57.399 What we are doing is weaving data together. 00:15:57.399 --> 00:15:59.615 As I said, we deal with complexity by splitting it up. 00:15:59.615 --> 00:16:02.928 We componentise; we split data up into blocks 00:16:02.928 --> 00:16:04.670 that we recombine. 00:16:04.670 --> 00:16:07.201 But if we are going to recombine information, 00:16:07.961 --> 00:16:10.076 we need to put Humpty Dumpty back together again, 00:16:10.076 --> 00:16:12.909 and that won't work most of the time if the data is closed. 00:16:13.449 --> 00:16:17.039 We need Open Data to scale and to componentise. 00:16:17.679 --> 00:16:21.518 And there's a point to make here, in this respect, because you might think: 00:16:21.518 --> 00:16:23.351 "Well, you know, you're talking about Open Data, 00:16:23.351 --> 00:16:24.721 but this could be true of anything! 00:16:24.721 --> 00:16:25.789 Why don't we have, like, 00:16:25.789 --> 00:16:28.232 Open Cars, and Open Shoes, and, you know, 00:16:28.232 --> 00:16:29.578 why don't we just share everything, man! 00:16:29.578 --> 00:16:31.026 It would be so beautiful!" 00:16:31.706 --> 00:16:33.393 Right? And the sad thing 00:16:33.393 --> 00:16:39.070 is that that hasn't generally worked as a way of organising most production in our society.
00:16:39.070 --> 00:16:44.074 Instead, we have private property, and so, relatively, we don't have that much openness. 00:16:44.074 --> 00:16:45.848 But there's something different about digital information. 00:16:45.848 --> 00:16:48.944 We all know it, but it's worth emphasising: it's very cheaply copied. 00:16:49.344 --> 00:16:52.782 I mean, "give me a copy of your data" isn't a problem if you're the government. 00:16:52.782 --> 00:16:56.393 "Give me a copy of your car, or your house," or whatever, is. 00:16:56.803 --> 00:16:58.584 And it's also about innovation here. 00:16:58.584 --> 00:17:01.219 I mean, in a way it's almost the purest aspect of markets. 00:17:01.219 --> 00:17:05.619 Markets are about moving things to the person who can use them best. 00:17:06.389 --> 00:17:07.200 And that's true of data. 00:17:07.200 --> 00:17:10.660 The best thing to do with your data will likely be thought of by someone else. 00:17:11.340 --> 00:17:14.973 And vice versa! You will think of the best thing to do with someone else's data. 00:17:15.783 --> 00:17:20.103 And Open Data allows us, in the most frictionless, easiest way, 00:17:20.103 --> 00:17:22.708 to move data to where it can be used best, 00:17:22.728 --> 00:17:23.905 particularly if you're government. 00:17:24.275 --> 00:17:26.843 So in short, it's about better understanding, it's about better government, 00:17:26.843 --> 00:17:29.115 it's about better research, it's about a better economy. 00:17:29.115 --> 00:17:31.124 And something also for companies and governments: 00:17:31.124 --> 00:17:32.707 I think it's about better engagement. 00:17:33.137 --> 00:17:34.986 It's about a closer relationship, sometimes, 00:17:34.986 --> 00:17:37.331 between your citizens and you as the government. 00:17:37.331 --> 00:17:40.960 Or even, possibly, between you as a company and your users. 00:17:41.690 --> 00:17:43.972 So I wanted to finish here by saying where we're going.
00:17:43.972 --> 00:17:46.429 The story of this talk was, you know: where are we? 00:17:46.869 --> 00:17:49.654 How did we get here? And where are we going? 00:17:50.734 --> 00:17:52.248 So one answer is just more use. 00:17:52.248 --> 00:17:55.313 As I said at the beginning, Open Data is relatively young. 00:17:55.313 --> 00:17:57.844 This vast outpouring of government data, for example, 00:17:57.844 --> 00:18:02.058 that anyone can freely use, reuse, and redistribute, is really new, 00:18:02.058 --> 00:18:03.382 even if it started three years ago. 00:18:03.382 --> 00:18:06.479 For example, in the UK, much of the most useful data that could be released 00:18:06.479 --> 00:18:09.091 has only been released in the last six months or a year. 00:18:09.091 --> 00:18:10.478 You want prescription data? 00:18:10.478 --> 00:18:11.894 Are you a pharmaceutical company, 00:18:11.894 --> 00:18:15.261 and you want to know what kind of prescription habits are going on in the UK? 00:18:15.261 --> 00:18:18.837 I would emphasise: at an anonymised or somewhat aggregate level. 00:18:18.837 --> 00:18:20.707 Do you want to know what crime is going on? 00:18:20.707 --> 00:18:24.241 Are you building a real estate website and you want data on environment, 00:18:24.241 --> 00:18:25.742 or you want data on unemployment, 00:18:25.742 --> 00:18:28.742 or other information about where properties are situated? 00:18:28.742 --> 00:18:30.077 You can now get that. 00:18:30.687 --> 00:18:32.783 So I think there's going to be a lot more use from business. 00:18:33.743 --> 00:18:35.398 There'll be a lot more use from everyone. 00:18:35.398 --> 00:18:38.686 But I think business in particular is going to wake up to the opportunities here. 00:18:38.686 --> 00:18:40.192 I think it's also going to lead to more data. 00:18:40.192 --> 00:18:41.789 One is, government is going to release more data.
00:18:41.789 --> 00:18:45.597 I think businesses and communities are also going to realise 00:18:45.597 --> 00:18:47.719 that they want to share back some of that data, 00:18:47.719 --> 00:18:48.764 some of the data they have. 00:18:48.764 --> 00:18:50.863 It's not going to be their crown jewels, 00:18:50.863 --> 00:18:53.707 and they'll often start out with data that's not core to their business. 00:18:53.707 --> 00:18:58.108 It's like Nike: they realised that by opening and sharing data, 00:18:58.108 --> 00:19:00.569 they can scale in a way they can't on their own. 00:19:01.029 --> 00:19:03.069 And it means richer data. Going back 00:19:03.069 --> 00:19:05.809 -- how could I leave out Hegel and Marx in a talk like this -- 00:19:05.809 --> 00:19:08.869 "Quantity changes quality", as Hegel told us. 00:19:09.319 --> 00:19:14.156 And more data, going back to that woven ball, more data actually means better data. 00:19:14.156 --> 00:19:17.634 It means richer data; it's a qualitative difference in what we can do. 00:19:17.634 --> 00:19:19.781 Geodata on its own isn't that useful. 00:19:19.781 --> 00:19:21.593 Transport data on its own isn't useful. 00:19:21.593 --> 00:19:24.023 Geodata plus transport data is useful! 00:19:24.703 --> 00:19:26.187 And we're going to be seeing data refining. 00:19:26.187 --> 00:19:27.441 Data is the new oil, right? 00:19:27.441 --> 00:19:28.928 So, we're going to refine it. 00:19:28.928 --> 00:19:32.037 And that's going to be a big business: higher quality data. 00:19:32.487 --> 00:19:34.149 I want to leave you with a couple of thoughts. 00:19:34.149 --> 00:19:35.646 So, one is, some people say: 00:19:35.646 --> 00:19:37.527 "Well, okay, but, you know, selling data is big business". 00:19:37.527 --> 00:19:40.987 And it is, but going forward, as with software, 00:19:40.987 --> 00:19:42.485 data is going to be a platform. 00:19:42.485 --> 00:19:43.720 It's not a commodity.
00:19:43.720 --> 00:19:46.084 Businesses built purely on selling data, 00:19:46.084 --> 00:19:47.558 I just don't think are going to make it. 00:19:47.558 --> 00:19:51.668 You need to be building on your data, not attempting to purely sell it. 00:19:52.588 --> 00:19:54.638 And the other answer is to be modest. 00:19:55.278 --> 00:19:56.797 So I said: where are we going? 00:19:56.797 --> 00:19:57.917 I don't know if people know 00:19:57.917 --> 00:20:02.124 -- and this takes us back to an earlier age, an age of electricity and steam -- of Faraday. 00:20:02.124 --> 00:20:04.678 So he's demonstrating electricity at the Royal Society, 00:20:04.678 --> 00:20:08.316 and Gladstone, the future Prime Minister of England, sees him do this stuff, you know, 00:20:08.316 --> 00:20:10.445 the frog legs move, and Gladstone's like: 00:20:10.445 --> 00:20:12.392 "Well, I mean, this is a party trick, Faraday. 00:20:12.392 --> 00:20:15.971 It's great, but, what's really, you know, what's electricity going to amount to?" 00:20:15.971 --> 00:20:20.832 And Faraday says to him: "Well, what's the use of a baby?" 00:20:20.832 --> 00:20:24.299 You know, a baby when it's young is not very useful. 00:20:24.299 --> 00:20:25.718 [scattered laughter] 00:20:25.718 --> 00:20:27.426 But it grows up into something! 00:20:27.916 --> 00:20:29.725 And that is where we are going today. 00:20:29.725 --> 00:20:32.488 We are at the beginning of the Open Data journey. 00:20:33.088 --> 00:20:36.157 And partly, we don't know what it's going to grow up into. 00:20:36.157 --> 00:20:37.004 Thank you very much! 00:20:37.004 --> 00:20:40.558 [Applause] 00:20:40.558 --> 00:20:44.708 [Questioner] Um, citizens and I guess patients at hospitals, 00:20:44.708 --> 00:20:48.683 assume that the institutions have all this data and it's very well organised, 00:20:48.683 --> 00:20:49.925 and it's a question of will.
00:20:50.555 --> 00:20:53.889 Have you encountered cases in which they simply don't have it, 00:20:53.889 --> 00:20:57.268 or they have it, and it's just such a mess that they're too embarrassed to give it out? 00:20:57.268 --> 00:20:58.661 [Rufus Pollock] Absolutely. 00:20:58.661 --> 00:21:00.959 I mean, one story that kind of intrigues me, 00:21:00.959 --> 00:21:03.914 is we've been building this "Where Does My Money Go?" open spending project. 00:21:04.094 --> 00:21:06.948 And one of the things the government mandated was giving out, 00:21:06.948 --> 00:21:09.065 rather than just high-level financial information, 00:21:09.065 --> 00:21:11.445 giving out information at a detailed level, you know, 00:21:11.445 --> 00:21:12.838 so they now publish, for example, 00:21:12.838 --> 00:21:15.601 spending data from each government department monthly, 00:21:15.601 --> 00:21:17.713 every transaction within 5,000 pounds. 00:21:17.713 --> 00:21:22.119 Every purchase they make, every mobile phone provider they contract with, we get that data. 00:21:22.119 --> 00:21:25.590 And one of the intriguing things about mandating this was that it turned out, 00:21:25.590 --> 00:21:30.022 before they did this, they had no way of actually seeing, on any regular basis, 00:21:30.022 --> 00:21:31.232 what their department spent money on. 00:21:31.232 --> 00:21:35.029 Because in fact, the only thing they reported up to central government, to the Treasury, 00:21:35.029 --> 00:21:38.871 was kind of like, how much did you spend against Project X that you were allocated budget for? 00:21:38.871 --> 00:21:41.156 You know, departments were actually really intrigued, they [say]: 00:21:41.156 --> 00:21:43.435 "Oh, that other department's going with Vodafone, 00:21:43.435 --> 00:21:46.349 and we're with Orange, and look how much they're paying per month!"
00:21:46.349 --> 00:21:49.310 So I think in essence, it is really driving changes in government, 00:21:49.310 --> 00:21:52.700 and yeah, there are people who have been worried about the quality of the data they give out. 00:21:52.700 --> 00:21:54.841 I was just talking to the Department for Education last week and they said 00:21:54.841 --> 00:21:57.650 -- you know, one of the things -- they had financial information from schools, 00:21:57.650 --> 00:21:59.567 which they were slowly being mandated to publish. 00:21:59.567 --> 00:22:01.220 And schools are suddenly all ringing up, saying: 00:22:01.220 --> 00:22:04.290 "Well, we never really bothered to really update that information to be accurate! 00:22:04.290 --> 00:22:05.900 Uh, we really want to do it right now". 00:22:06.210 --> 00:22:08.164 So I think that definitely does happen, yep. 00:22:08.164 --> 00:22:12.460 [Questioner] Are you seeing now new roles in government, to help facilitate this? 00:22:13.099 --> 00:22:16.009 [Rufus Pollock] Yeah. I mean, to take another example, I, sorry. 00:22:16.009 --> 00:22:19.492 Both in government, so the UK government has a transparency kind of 'czar' if you like. 00:22:19.492 --> 00:22:22.812 Also, I learnt that Nike hired an Open Data evangelist. 00:22:22.812 --> 00:22:24.611 One of the things they, while they were implementing this programme, 00:22:24.611 --> 00:22:27.990 they actually explicitly hired an Open Data evangelist. 00:22:27.990 --> 00:22:30.023 So yeah, I think we are, we're definitely seeing this in government. 00:22:30.023 --> 00:22:32.357 Both at the tech level, but also at the policy level. 00:22:32.357 --> 00:22:34.729 And I think it's not just government, 00:22:34.729 --> 00:22:37.957 it will also be companies doing this, and so on, who will be saying: 00:22:37.957 --> 00:22:39.192 "We need an Open Data expert. 00:22:39.192 --> 00:22:43.662 We need to be aware of what's going on here and be able to plan it as part of our strategy."
00:22:44.472 --> 00:22:45.489 [Questioner] A final question. 00:22:45.489 --> 00:22:48.972 You mentioned that kind of outsourcing, almost, some of this data refining, 00:22:48.972 --> 00:22:51.306 outside government or the big institutions, has helped them. 00:22:51.306 --> 00:22:54.446 Can you tell us any stories of kind of gratitude being expressed by the government? I mean... 00:22:55.146 --> 00:22:57.247 [Rufus Pollock] Well, I mean, to kind of, yeah. 00:22:57.247 --> 00:22:58.773 I mean there was an interesting example actually 00:22:58.773 --> 00:23:01.792 where we had some complaints because the open spending data I told you about, 00:23:01.792 --> 00:23:05.323 where we're aggregating the government spending and financial data 00:23:05.323 --> 00:23:10.651 -- you know, the site had a few performance issues, occasionally, as we loaded more data in. 00:23:10.651 --> 00:23:12.755 I remember kind of getting this call kind of going: 00:23:12.755 --> 00:23:16.266 "Well, you know, we're a little bit upset, you know, data.gov.uk," 00:23:16.266 --> 00:23:19.133 and it turned out the reason was, the Treasury kept looking at this data, 00:23:19.133 --> 00:23:21.002 and they were annoyed when the site was going down. 00:23:21.002 --> 00:23:22.563 So that was really intriguing to me 00:23:22.563 --> 00:23:26.139 that we were kind of one of the best, at least, up-to-date aggregators out there. 00:23:26.139 --> 00:23:29.484 Um, I think you are already seeing people doing stuff with the data 00:23:29.484 --> 00:23:30.979 and kind of doing stuff, sometimes for free, you know. 00:23:31.289 --> 00:23:33.131 You don't have to have the shiny front-end. 00:23:33.131 --> 00:23:35.019 I mean, one of the things we went on about, 00:23:35.019 --> 00:23:36.493 I know Tim Berners-Lee went on about -- 00:23:36.493 --> 00:23:41.393 "raw data now", you know, you can build fewer shiny front-ends, and just release raw data.
00:23:41.393 --> 00:23:46.455 And you know, someone else will help you build the app, the front-end, the interface, 00:23:46.455 --> 00:23:47.762 and help you innovate around it. 00:23:47.762 --> 00:23:50.917 What is the best way to provide healthcare data to citizens, 00:23:50.917 --> 00:23:52.254 or education data to citizens, 00:23:52.254 --> 00:23:54.367 so they make better and more informed choices? 00:23:54.367 --> 00:23:56.608 I don't know, and the government probably doesn't know. 00:23:56.608 --> 00:23:59.155 But somewhere out there, someone is going to innovate 00:23:59.155 --> 00:24:02.672 and really provide the best way for us to deliver that kind of information to citizens. 00:24:02.672 --> 00:24:04.250 [Questioner] Thank you very much. 00:24:04.250 --> 00:24:05.296 [Rufus Pollock] Thank you. 00:24:05.296 --> 00:24:06.901 [Applause] 00:24:06.901 --> 00:24:09.423 lift - Video Production ACTUA 00:24:09.423 --> 00:24:11.311 Copyright (c) 2012 Lift conference