1
00:00:01,014 --> 00:00:04,185
(lift)

2
00:00:04,185 --> 00:00:07,244
(lift 12 - Feb 24 2012 - Geneva)

3
00:00:07,244 --> 00:00:10,044
(Rufus Pollock - Stories)

4
00:00:10,044 --> 00:00:11,788
[Rufus Pollock] Just to say for those of you who don't know:

5
00:00:11,788 --> 00:00:13,666
the Open Knowledge Foundation is a not-for-profit

6
00:00:13,666 --> 00:00:15,611
founded in 2004

7
00:00:15,611 --> 00:00:17,865
that builds tools and communities

8
00:00:17,865 --> 00:00:20,934
to create, use and share open information,

9
00:00:20,934 --> 00:00:24,585
and that's information that anyone can use, reuse and redistribute.

10
00:00:24,585 --> 00:00:28,321
And as such, we've been working on open data for quite a long time,

11
00:00:28,321 --> 00:00:30,011
since we started in 2004.

12
00:00:30,011 --> 00:00:34,817
And today, I want to start the story by going back in time 5000 years,

13
00:00:34,817 --> 00:00:37,610
to ancient Mesopotamia.

14
00:00:37,610 --> 00:00:41,393
There, between the Tigris and the Euphrates rivers,

15
00:00:42,069 --> 00:00:44,390
flourished the Sumerian civilization.

16
00:00:44,390 --> 00:00:47,298
And they were confronted by a problem.

17
00:00:47,298 --> 00:00:50,269
They were confronted by the limitations of human memory

18
00:00:50,899 --> 00:00:54,338
in the recording of taxes, food and other goods.

19
00:00:54,338 --> 00:00:59,642
And those ancient civil servants and businessmen hit on a novel solution:

20
00:01:00,380 --> 00:01:04,666
what they decided to do was start counting things with small clay chits,

21
00:01:04,666 --> 00:01:09,234
which they would bake inside a little clay box

22
00:01:09,234 --> 00:01:12,617
and then mark, on the outside of that box, what they were counting.

23
00:01:12,617 --> 00:01:15,303
You know, was it grain, was it tax payments, whatever.

24
00:01:16,150 --> 00:01:19,786
And so, born out of necessity for a state and a society,

25
00:01:20,632 --> 00:01:25,773
came one of the great information technology revolutions of all time: writing.

26
00:01:25,773 --> 00:01:28,172
The Sumerians invented writing via cuneiform.

27
00:01:28,910 --> 00:01:34,039
And if we fast-forward from that a few thousand years, we come to the UK census.

28
00:01:34,039 --> 00:01:37,577
Again, it's always interesting that states, governments, are often at the forefront

29
00:01:37,577 --> 00:01:42,681
of at least driving information technology and information systems innovations.

30
00:01:42,681 --> 00:01:44,654
The UK census: again, the state,

31
00:01:44,654 --> 00:01:46,565
this is during the Napoleonic Wars,

32
00:01:46,565 --> 00:01:48,601
wanted to count the population more accurately,

33
00:01:48,601 --> 00:01:51,995
and we have the first UK census in 1801.

34
00:01:51,995 --> 00:01:56,189
And in the US, they also had censuses, in fact starting in 1790.

35
00:01:56,819 --> 00:01:59,383
And one of the problems encountered in the 1880 census

36
00:01:59,383 --> 00:02:01,592
was that they tabulated the census by hand.

37
00:02:02,345 --> 00:02:05,699
And by the 1880 census, it was taking seven years

38
00:02:05,699 --> 00:02:06,822
to tabulate the census.

39
00:02:06,822 --> 00:02:10,241
So after it got taken in 1880, it wasn't until 1887

40
00:02:10,241 --> 00:02:12,892
that they actually had any data they could use.

41
00:02:12,892 --> 00:02:16,004
And they calculated that for the next census in 1890,

42
00:02:16,004 --> 00:02:18,164
they wouldn't be finished by 1900.

43
00:02:18,164 --> 00:02:21,936
They still wouldn't have the results of the census by the time they started the next one.

44
00:02:21,936 --> 00:02:24,233
They had a crisis of information technology.

45
00:02:24,233 --> 00:02:26,979
And what they did was commission Herman Hollerith

46
00:02:26,979 --> 00:02:29,747
to build the first automatic tabulator.

47
00:02:29,747 --> 00:02:32,835
And for those of you who know your company history, of course,

48
00:02:32,835 --> 00:02:34,513
Herman Hollerith's company went on

49
00:02:34,513 --> 00:02:35,899
to be one of the founders, if you like,

50
00:02:35,899 --> 00:02:38,808
one of the companies that came together to create IBM.

51
00:02:38,808 --> 00:02:42,258
And IBM, by the sixties, was building this.

52
00:02:42,258 --> 00:02:44,374
They replaced those

53
00:02:44,374 --> 00:02:45,905
wooden, mechanical tabulators

54
00:02:45,905 --> 00:02:48,524
with this stuff: digital tabulators,

55
00:02:48,524 --> 00:02:50,375
the modern computers of that age.

56
00:02:50,375 --> 00:02:52,610
And again, much of this -- I don't know if you guys know --

57
00:02:52,610 --> 00:02:53,705
IBM would have gone bankrupt

58
00:02:53,705 --> 00:02:58,477
if it hadn't been for Franklin Roosevelt passing the Social Security Act in the States,

59
00:02:58,477 --> 00:03:01,132
which necessitated a huge amount of new tabulation.

60
00:03:01,132 --> 00:03:04,629
So, again, a lot of innovation in this space came out of government need,

61
00:03:04,629 --> 00:03:06,370
and also, of course, the nuclear program,

62
00:03:06,370 --> 00:03:08,641
the other great consumer of computational power.

63
00:03:09,317 --> 00:03:11,899
And today, today,

64
00:03:12,623 --> 00:03:15,485
we find ourselves again in the midst of a revolution.

65
00:03:16,438 --> 00:03:19,331
It's a revolution driven by two forces,

66
00:03:19,331 --> 00:03:22,027
the same ones as throughout history, as I've just shown:

67
00:03:22,027 --> 00:03:23,886
information complexity, which is the necessity,

68
00:03:24,456 --> 00:03:27,575
and information technology, which is the opportunity.

69
00:03:28,544 --> 00:03:32,702
And what we're doing in this case is a policy innovation, if you like.

70
00:03:32,702 --> 00:03:36,468
We are innovating by opening up information.

71
00:03:37,052 --> 00:03:39,436
So just take the obvious example, government,

72
00:03:39,436 --> 00:03:41,097
as I said, often the innovator.

73
00:03:41,097 --> 00:03:43,308
If you go back three years,

74
00:03:43,308 --> 00:03:45,829
there were almost no open government data initiatives

75
00:03:45,829 --> 00:03:46,688
in the world.

76
00:03:46,688 --> 00:03:48,442
Today there are dozens.

77
00:03:48,442 --> 00:03:51,162
The UK, the US, Finland, Kenya, the Netherlands,

78
00:03:51,162 --> 00:03:53,049
and there are new ones almost every week.

79
00:03:53,049 --> 00:03:57,407
There's been the launch, at the UN, of an official kind of movement

80
00:03:57,407 --> 00:04:00,097
called the Open Government Partnership, in which countries sign up,

81
00:04:00,097 --> 00:04:02,433
and among other things, they open up their data.

82
00:04:03,002 --> 00:04:05,325
And of course, in the UK and other countries,

83
00:04:05,325 --> 00:04:06,562
Tim Berners-Lee has been involved.

84
00:04:06,562 --> 00:04:09,106
I've helped advise the government on this in the UK.

85
00:04:09,106 --> 00:04:11,221
But it's not just government, it's also companies.

86
00:04:11,651 --> 00:04:13,982
Companies are opening up data.

87
00:04:13,982 --> 00:04:15,690
Very interestingly, last year,

88
00:04:15,690 --> 00:04:19,092
Nike started an open data initiative

89
00:04:19,092 --> 00:04:21,372
to open up supply chain and sustainability data,

90
00:04:21,372 --> 00:04:23,931
for themselves and also for their suppliers,

91
00:04:23,931 --> 00:04:26,800
which I think is a very interesting change.

92
00:04:26,800 --> 00:04:28,004
And it's also communities.

93
00:04:28,004 --> 00:04:29,715
In fact, back there at the beginning,

94
00:04:29,715 --> 00:04:31,927
this incredible map that you saw in an earlier slide

95
00:04:31,927 --> 00:04:35,002
is OpenStreetMap activity around the world.

96
00:04:35,002 --> 00:04:38,073
People adding to this crowd-built map of the world.

97
00:04:38,073 --> 00:04:41,074
And in the last 6 years, OpenStreetMap,

98
00:04:41,074 --> 00:04:42,445
a bottom-up community,

99
00:04:42,445 --> 00:04:44,435
has built a complete, comprehensive

100
00:04:44,435 --> 00:04:47,918
map of the world from fully open data.

101
00:04:48,872 --> 00:04:50,898
So I've just gone on about Open Data,

102
00:04:50,898 --> 00:04:52,766
and one thing I'm aware of with this audience

103
00:04:52,766 --> 00:04:54,035
is that you might not all know what it is.

104
00:04:54,035 --> 00:04:59,152
So I'm going to take a brief moment to say what it is.

105
00:04:59,152 --> 00:05:01,493
What does it mean when I say 'open'?

106
00:05:01,493 --> 00:05:05,557
And, you know, what's different about it? What's different from simply public data?

107
00:05:05,557 --> 00:05:07,083
So there's actually a definition,

108
00:05:07,083 --> 00:05:10,177
a definition we at the Open Knowledge Foundation helped write, and it's very simple.

109
00:05:10,177 --> 00:05:13,671
In a nutshell, a piece of information, a piece of data,

110
00:05:13,671 --> 00:05:18,384
is open if anyone is free to use, reuse,

111
00:05:18,384 --> 00:05:20,797
and redistribute it, subject only, at most,

112
00:05:20,797 --> 00:05:22,891
to the requirement to attribute and/or share-alike.

113
00:05:23,214 --> 00:05:25,784
And anyone means anyone!

114
00:05:25,784 --> 00:05:28,055
There can't be any commercial restrictions.

115
00:05:28,055 --> 00:05:32,262
You can't say: "Hey, here's this data, but only for people using it for non-commercial purposes."

116
00:05:32,262 --> 00:05:34,849
Or only for people working in education.

117
00:05:34,849 --> 00:05:38,051
Or only for people living in the developing world, or the developed world.

118
00:05:38,051 --> 00:05:40,743
There can't be any restrictions like that.

119
00:05:41,343 --> 00:05:43,189
And there's a reason for this, by the way,

120
00:05:43,209 --> 00:05:48,615
and it isn't just because one's obsessed with, if you like, trademarking an attractive term.

121
00:05:49,315 --> 00:05:51,081
It's because it's about interoperability.

122
00:05:51,291 --> 00:05:54,617
One of my experiences at this conference, which I remember from previous trips to Geneva,

123
00:05:54,627 --> 00:05:56,974
is that I've been unable to plug in my laptop!

124
00:05:56,974 --> 00:06:02,048
Even though I have a French adaptor, these wonderful Swiss plugs here are, you know,

125
00:06:02,048 --> 00:06:03,582
this wonderful, small octagonal shape.

126
00:06:03,582 --> 00:06:05,379
And even with my adaptor I can't plug in.

127
00:06:05,379 --> 00:06:07,347
Right? It's about interoperability.

128
00:06:07,347 --> 00:06:10,929
When we travel around to different countries, our power adaptors don't actually fit.

129
00:06:10,929 --> 00:06:12,581
We have to buy something.

130
00:06:12,581 --> 00:06:16,755
And the point about this definition, and the point about caring about Open Data,

131
00:06:16,755 --> 00:06:18,317
is that it's about interoperability.

132
00:06:18,317 --> 00:06:22,112
The dream of Open Data is interoperability.

133
00:06:22,112 --> 00:06:26,058
Of being able to seamlessly share and interweave information.

134
00:06:27,898 --> 00:06:31,704
And if every time I get information from two different people I have to consult a lawyer,

135
00:06:31,704 --> 00:06:35,300
I have to work out whether I'm allowed to do it, whether I'm allowed to put these things together,

136
00:06:35,300 --> 00:06:37,634
we lose that dream, that dream is shattered.

137
00:06:37,634 --> 00:06:42,166
And the key point is that this definition, and its conditions, ensure interoperability.

138
00:06:42,166 --> 00:06:45,744
If you comply with them, we know that any piece of Open Data

139
00:06:45,744 --> 00:06:47,880
will work with any other piece of Open Data.

140
00:06:48,681 --> 00:06:52,932
It's also worth saying, for a quick moment, what kind of data we mean, to emphasize a point,

141
00:06:52,932 --> 00:06:55,985
just to foreclose those kinds of questions, because otherwise I always get asked.

142
00:06:55,985 --> 00:06:58,809
When we talk about opening up data, in general,

143
00:06:58,809 --> 00:07:01,026
we're not talking about personal data.

144
00:07:01,026 --> 00:07:04,161
We're not talking about opening up your private health records

145
00:07:04,161 --> 00:07:08,302
or opening up your personal tax information.

146
00:07:08,302 --> 00:07:11,267
We're talking about information that is non-personal in nature.

147
00:07:11,267 --> 00:07:15,667
For government, for example: transport, geodata, statistics, electoral, legal.

148
00:07:15,667 --> 00:07:19,510
Stuff that the UK, for example, has in fact been opening up over the last few years.

149
00:07:19,510 --> 00:07:23,381
Financial information on government spending, information on health outcomes,

150
00:07:23,381 --> 00:07:28,625
on prescriptions, information on educational outcomes, information on the law.

151
00:07:28,625 --> 00:07:30,765
Statistical information.

152
00:07:30,785 --> 00:07:32,691
That's the kind of thing that we're talking about.

153
00:07:34,186 --> 00:07:37,393
Now, I want to say: we have this story over time.

154
00:07:37,393 --> 00:07:38,996
But why are governments doing it now?

155
00:07:39,596 --> 00:07:40,598
And why Open Data?

156
00:07:41,268 --> 00:07:43,930
So, okay, for thousands of years, governments have innovated,

157
00:07:43,930 --> 00:07:47,274
but why do they innovate at this particular moment and in this way?

158
00:07:47,274 --> 00:07:51,976
So I want to start here with a quick story, a story of medicine gone wrong.

159
00:07:52,006 --> 00:07:54,484
It is from a great book by a guy called Stephen Klaidman.

160
00:07:54,484 --> 00:07:55,918
It's in fact one of the things

161
00:07:55,918 --> 00:07:57,781
that made me think quite deeply about this:

162
00:07:57,781 --> 00:07:59,852
why I was interested in Open Data.

163
00:08:01,172 --> 00:08:02,917
In that picture there, you can see

164
00:08:02,917 --> 00:08:05,726
what was the Redding Medical Centre in Northern California.

165
00:08:05,726 --> 00:08:10,471
There, in the summer of 2002, John Corapi,

166
00:08:11,231 --> 00:08:12,401
in typical American style,

167
00:08:12,401 --> 00:08:15,374
an ex-accountant from Vegas turned Catholic priest,

168
00:08:15,374 --> 00:08:17,243
[scattered laughter]

169
00:08:17,783 --> 00:08:22,274
...arrived at the Redding Medical Centre, having been referred by his doctor for chest pains.

170
00:08:22,784 --> 00:08:28,419
He was given a cardiogram by the local cardiologist and told that he needed an immediate heart bypass,

171
00:08:28,419 --> 00:08:31,484
that he was at serious risk, and that he should come back later that day,

172
00:08:31,484 --> 00:08:34,514
or at the latest the next day, to have open heart surgery.

173
00:08:35,764 --> 00:08:37,985
Shocked and dazed by this news,

174
00:08:37,985 --> 00:08:41,225
he went home to pack his bags in order to return to the hospital.

175
00:08:41,225 --> 00:08:45,102
He called up his best friend, who was still an accountant in Vegas,

176
00:08:46,032 --> 00:08:52,568
whose partner was a hospital nurse, and who advised him to get a second opinion:

177
00:08:52,568 --> 00:08:55,904
according to his partner, it was, you know,

178
00:08:55,904 --> 00:08:58,981
very unusual to need immediate open heart surgery,

179
00:08:58,981 --> 00:09:00,235
and he really should get a second opinion.

180
00:09:00,975 --> 00:09:04,507
Though doubtful about this, because he was extremely worried, he did get on a plane.

181
00:09:04,507 --> 00:09:07,919
He went to Vegas and was seen by another specialist,

182
00:09:07,919 --> 00:09:11,785
who, to his complete surprise, told him there was nothing wrong with his heart.

183
00:09:12,805 --> 00:09:15,289
He saw another specialist, just to make sure.

184
00:09:15,289 --> 00:09:18,563
They also told him there was nothing wrong with his heart.

185
00:09:19,343 --> 00:09:25,067
Relieved and, you know, rather happy, he returned home and really just wanted to forget about it.

186
00:09:25,067 --> 00:09:27,389
But his friend said: "No, what's going on here? Something's wrong."

187
00:09:27,389 --> 00:09:32,613
And they went in to see the CEO of Tenet Healthcare, the company running this hospital

188
00:09:32,613 --> 00:09:35,654
(which, by the way, was a private hospital), and said:

189
00:09:35,654 --> 00:09:38,614
"Look, something's wrong. What's going on? What are you going to do about this?"

190
00:09:38,614 --> 00:09:40,256
And basically they were told: not very much.

191
00:09:40,256 --> 00:09:44,581
You know, mistakes get made, it's bad luck, don't worry about it,

192
00:09:44,581 --> 00:09:46,233
we'll look into it, but thank you very much.

193
00:09:46,763 --> 00:09:51,631
They weren't convinced by this, and eventually they decided to contact the FBI.

194
00:09:51,631 --> 00:09:53,826
The reason they contacted the FBI, by the way,

195
00:09:53,826 --> 00:09:56,401
is that, as a private healthcare provider in the United States,

196
00:09:56,401 --> 00:10:00,476
the hospital provided healthcare under Medicare, paid for by the Federal Government.

197
00:10:00,476 --> 00:10:04,202
So, if the Federal Government is getting defrauded, the FBI can get involved.

198
00:10:04,982 --> 00:10:06,850
The FBI started investigating.

199
00:10:08,281 --> 00:10:12,081
Eventually it turned out that hundreds, probably thousands, of people

200
00:10:12,081 --> 00:10:15,854
over a period of ten years or more had been operated on unnecessarily.

201
00:10:16,704 --> 00:10:19,561
Most of them had had serious procedures performed on them,

202
00:10:19,561 --> 00:10:22,189
open heart surgery; some had died as a result.

203
00:10:22,189 --> 00:10:24,325
Obviously it's quite a serious operation.

204
00:10:24,325 --> 00:10:27,437
Some people had basically been condemned to a lifetime of pain.

205
00:10:27,437 --> 00:10:31,437
One of the most traumatic examples was a 36-year-old who had been cut open,

206
00:10:31,437 --> 00:10:33,000
which is obviously what happens in open heart surgery,

207
00:10:33,000 --> 00:10:35,369
and whose chest had never knitted back together correctly.

208
00:10:35,999 --> 00:10:38,125
Basically, he would be in pain for the rest of his life.

209
00:10:39,395 --> 00:10:43,000
So hundreds, maybe thousands, of people had been harmed.

210
00:10:43,610 --> 00:10:45,968
One of the interesting things was that in this community

211
00:10:45,968 --> 00:10:48,159
there was already some suspicion; there were anecdotes.

212
00:10:48,159 --> 00:10:50,853
I mean, one of the ones I really liked from this book was the saying that went:

213
00:10:50,853 --> 00:10:56,021
"Don't get a flat tyre outside Redding Medical Centre, because you'll end up with a heart bypass."

214
00:10:56,021 --> 00:10:57,303
[scattered laughter]

215
00:10:57,303 --> 00:11:00,258
You know, but the thing was, there was no data.

216
00:11:00,728 --> 00:11:04,563
People were, you know, a bit suspicious, but it was among doctors who knew,

217
00:11:04,563 --> 00:11:06,867
you know, in the community, and who wants to doubt it?

218
00:11:06,867 --> 00:11:12,171
And guess what? Redding Medical Centre also had one of the best mortality rates

219
00:11:13,001 --> 00:11:15,350
for cardiac procedures in the United States,

220
00:11:15,350 --> 00:11:19,609
because if you operate on healthy people, you have a good mortality rate!

221
00:11:19,619 --> 00:11:21,129
[scattered laughter]

222
00:11:21,129 --> 00:11:23,390
So, the other thing, though,

223
00:11:23,390 --> 00:11:25,452
and this is where Open Data comes in for me,

224
00:11:25,452 --> 00:11:28,722
the other red flag, if you had been looking at the data,

225
00:11:28,741 --> 00:11:31,927
was these two things: (A) an incredibly low mortality rate,

226
00:11:31,927 --> 00:11:35,351
and (B) almost the highest number of procedures

227
00:11:35,351 --> 00:11:37,464
for the population it covered in the United States,

228
00:11:38,144 --> 00:11:39,634
which should be a red flag, right?

229
00:11:39,634 --> 00:11:42,618
Because, one, it's just a massive outlier on that basis, and also,

230
00:11:42,618 --> 00:11:45,815
the more people you operate on, the more marginal cases you're doing,

231
00:11:45,815 --> 00:11:49,450
and the higher your mortality rate should be, unless something very odd is going on.

232
00:11:50,030 --> 00:11:53,015
The thought was: what if people had been looking at this data?

233
00:11:53,015 --> 00:11:56,045
What if this data had been open and public,

234
00:11:56,045 --> 00:11:59,517
and not just for particular researchers or the government to look at?

235
00:11:59,927 --> 00:12:04,129
And it kind of reminded me of a phrase that's very famous in Open Source software, which is:

236
00:12:04,129 --> 00:12:05,965
"To many eyes, all bugs are shallow."

237
00:12:05,965 --> 00:12:10,504
What's great about Open Source software is that lots of people can look at it and lots of people can fix it.

238
00:12:10,504 --> 00:12:14,730
And for me, what this was saying was: to many eyes, all anomalies are noticeable.

239
00:12:14,730 --> 00:12:16,679
It's something of an exaggeration,

240
00:12:16,679 --> 00:12:18,908
but what happens if, rather than ten or twenty people

241
00:12:18,908 --> 00:12:22,077
working on monitoring Medicare provision in the US government,

242
00:12:22,077 --> 00:12:23,877
we'd had thousands or millions of people?

243
00:12:23,877 --> 00:12:26,919
What if local journalists or citizens who had suspicions

244
00:12:26,919 --> 00:12:28,747
had been able to go and look at that data and say:

245
00:12:28,747 --> 00:12:32,485
"Whoa! What's going on here? This isn't just anecdotes; there's some data."

246
00:12:34,205 --> 00:12:40,225
And it's not just about spotting healthcare errors, or issues, or risks;

247
00:12:40,225 --> 00:12:42,415
it's also about things like apps and services

248
00:12:42,415 --> 00:12:43,857
that you can build with Open Data.

249
00:12:43,867 --> 00:12:46,667
This is a great app built by mySociety in the UK,

250
00:12:46,667 --> 00:12:47,640
called Mapumental.

251
00:12:47,640 --> 00:12:48,974
And the question is... I don't know if people know,

252
00:12:48,974 --> 00:12:50,646
London house prices are very high,

253
00:12:50,646 --> 00:12:52,510
I don't know whether they rival Geneva's,

254
00:12:52,510 --> 00:12:55,238
but it's a pretty difficult thing.

255
00:12:55,238 --> 00:12:57,978
And the question was: if I have to work somewhere,

256
00:12:57,978 --> 00:13:01,752
and I want to know where I can afford to live,

257
00:13:01,752 --> 00:13:05,757
where I can commute to work in a certain time, and where it's not too ugly,

258
00:13:05,757 --> 00:13:07,583
this is what this app does.

259
00:13:07,583 --> 00:13:11,200
You can choose the price, you can say where you're going to work,

260
00:13:11,200 --> 00:13:14,195
you can choose the commute time, and you can choose the scenicness.

261
00:13:14,195 --> 00:13:17,167
And it will show you, on this map, where you can live.

262
00:13:18,427 --> 00:13:20,796
Another example, more about transparency,

263
00:13:20,796 --> 00:13:22,746
is a project we did called "Where Does My Money Go?"

264
00:13:23,976 --> 00:13:25,406
It's interactive,

265
00:13:25,406 --> 00:13:26,211
you can kind of draw it out,

266
00:13:26,211 --> 00:13:29,114
so it starts by telling you what your tax is,

267
00:13:29,114 --> 00:13:30,821
something most people often don't know,

268
00:13:30,821 --> 00:13:33,668
and it will tell you how much you're paying each day

269
00:13:33,668 --> 00:13:36,254
to a particular area of society.

270
00:13:36,254 --> 00:13:37,328
And the dream for me,

271
00:13:37,328 --> 00:13:39,127
a dream that we're on the way to realising,

272
00:13:39,127 --> 00:13:42,817
is that in this visualisation, you can drill down into areas.

273
00:13:42,817 --> 00:13:45,092
And my dream is to keep drilling down.

274
00:13:45,472 --> 00:13:47,633
So, depending on what data we have, I want to go down,

275
00:13:47,633 --> 00:13:49,628
right down through those bubbles, step by step,

276
00:13:49,628 --> 00:13:52,403
until I see the money spent on street lights on my street,

277
00:13:52,403 --> 00:13:55,270
on filling in potholes, on collecting my rubbish.

278
00:13:56,190 --> 00:13:57,138
And for two reasons.

279
00:13:57,138 --> 00:13:59,704
One, obviously there's a question, particularly in some countries,

280
00:13:59,704 --> 00:14:01,016
of inefficiency or corruption,

281
00:14:01,436 --> 00:14:05,176
but also because most of us don't feel very happy about paying tax.

282
00:14:06,066 --> 00:14:08,157
It's not one of those things people welcome!

283
00:14:08,157 --> 00:14:09,817
But it's something that we should welcome.

284
00:14:09,817 --> 00:14:11,960
Government does an awful lot for us,

285
00:14:11,960 --> 00:14:14,287
and having a better sense of where the money is going

286
00:14:14,287 --> 00:14:17,120
could make us feel an awful lot better about paying that tax.

287
00:14:17,120 --> 00:14:18,657
In the same way, when we go to a restaurant

288
00:14:18,657 --> 00:14:21,397
and get the bill, we don't necessarily feel bad.

289
00:14:21,397 --> 00:14:24,334
We feel: "Wow, I had a great meal. That was worth it."

290
00:14:25,274 --> 00:14:26,366
But why Open?

291
00:14:26,366 --> 00:14:29,079
I've given you examples, and, you know, we see a lot of apps and services.

292
00:14:29,079 --> 00:14:30,948
Why is Open relevant here?

293
00:14:31,598 --> 00:14:36,350
This goes back to what I said about the information technology revolution.

294
00:14:36,350 --> 00:14:37,813
So it's the challenge and the opportunity.

295
00:14:37,813 --> 00:14:42,387
The challenge that we see today is exploding informational complexity.

296
00:14:42,797 --> 00:14:43,924
I mean, another great story:

297
00:14:43,924 --> 00:14:47,728
in the 1820s, all bank clearing in the largest financial centre in the world

298
00:14:47,728 --> 00:14:51,848
was done in a single room, where one person from each bank gathered,

299
00:14:51,848 --> 00:14:56,402
and they'd go round the room pulling out gold and swapping it between the different banks.

300
00:14:56,402 --> 00:14:58,434
And that's how they did bank clearing.

301
00:14:59,074 --> 00:15:01,991
Today we have billions of transactions a minute.

302
00:15:01,991 --> 00:15:07,945
And the way we as humans deal with complexity is by dividing and conquering it.

303
00:15:07,945 --> 00:15:10,683
We split it up into manageable chunks that we deal with.

304
00:15:11,013 --> 00:15:12,381
The other answer,

305
00:15:12,381 --> 00:15:14,883
and this answer's particularly relevant to Open Data,

306
00:15:14,883 --> 00:15:16,219
is information technology.

307
00:15:16,219 --> 00:15:18,951
Today, a smartphone has as much computing power

308
00:15:18,951 --> 00:15:22,260
as the system that ran the Apollo moon landings.

309
00:15:22,260 --> 00:15:24,027
And an even better example is storage:

310
00:15:24,027 --> 00:15:26,930
one terabyte of storage today costs a hundred dollars.

311
00:15:26,930 --> 00:15:30,297
In 1994, this would have cost 400,000 dollars.

312
00:15:30,297 --> 00:15:33,977
I can have every financial transaction

313
00:15:33,977 --> 00:15:38,376
that the UK government or the US government made last year, or even over the last decade,

314
00:15:38,376 --> 00:15:39,665
on my laptop.

315
00:15:39,665 --> 00:15:43,543
That was not possible for an average citizen a decade ago.

316
00:15:44,283 --> 00:15:48,187
So it's mass participation: information access, processing, and production.

317
00:15:48,187 --> 00:15:49,557
It's decentralisation.

318
00:15:49,557 --> 00:15:51,728
And the claim here is that openness is key.

319
00:15:51,728 --> 00:15:53,957
That's because it's about scaling.

320
00:15:54,547 --> 00:15:57,399
What we are doing is weaving data together.

321
00:15:57,399 --> 00:15:59,615
As I said, we deal with complexity by splitting it up.

322
00:15:59,615 --> 00:16:02,928
We componentise, we split data up into blocks

323
00:16:02,928 --> 00:16:04,670
that we recombine.

324
00:16:04,670 --> 00:16:07,201
But if we are going to recombine information,

325
00:16:07,961 --> 00:16:10,076
we need to put Humpty Dumpty back together again,

326
00:16:10,076 --> 00:16:12,909
and that won't work most of the time if the data is closed.

327
00:16:13,449 --> 00:16:17,039
We need Open Data to scale and to componentise.

328
00:16:17,679 --> 00:16:21,518
And there's a point to make here in this respect, because you might think:

329
00:16:21,518 --> 00:16:23,351
"Well, you know, you're talking about Open Data,

330
00:16:23,351 --> 00:16:24,721
but, you know, this could be true of anything!

331
00:16:24,721 --> 00:16:25,789
Why don't we have, like,

332
00:16:25,789 --> 00:16:28,232
Open Cars, and Open Shoes, and, you know,

333
00:16:28,232 --> 00:16:29,578
why don't we just share everything, man!

334
00:16:29,578 --> 00:16:31,026
It would be so beautiful!"

335
00:16:31,706 --> 00:16:33,393
Right? And the sad thing is

336
00:16:33,393 --> 00:16:39,070
that that hasn't generally worked as a way of organising most production in our society.

337
00:16:39,070 --> 00:16:44,074
Instead, we have private property, and so, relatively speaking, we don't do that much openness.

338
00:16:44,074 --> 00:16:45,848
But there's something different about digital information.

339
00:16:45,848 --> 00:16:48,944
We all know it, but it's worth emphasising: it's very cheap to copy.

340
00:16:49,344 --> 00:16:52,782
I mean, "Give me a copy of your data" isn't a problem if you're the government.

341
00:16:52,782 --> 00:16:56,393
"Give me a copy of your car, or your house, or whatever" is.

342
00:16:56,803 --> 00:16:58,584
And it's also about innovation here.

343
00:16:58,584 --> 00:17:01,219
I mean, in a way it's almost the purest aspect of markets.

344
00:17:01,219 --> 00:17:05,619
Markets are about moving things to the person who can use them best.

345
00:17:06,389 --> 00:17:07,200
And that's true of data.

346
00:17:07,200 --> 00:17:10,660
The best thing to do with your data will likely be thought of by someone else.

347
00:17:11,340 --> 00:17:14,973
And vice versa! You will think of the best thing to do with someone else's data.

348
00:17:15,783 --> 00:17:20,103
And Open Data allows us, in the most frictionless, easiest way,

349
00:17:20,103 --> 00:17:22,708
to move data to where it can be used best,

350
00:17:22,728 --> 00:17:23,905
particularly if you're government.

351
00:17:24,275 --> 00:17:26,843
So in short, it's about better understanding, it's about better government,

352
00:17:26,843 --> 00:17:29,115
it's about better research, it's about a better economy.

353
00:17:29,115 --> 00:17:31,124
And, also for companies and governments,

354
00:17:31,124 --> 00:17:32,707
I think it's about better engagement.

355
00:17:33,137 --> 00:17:34,986
It's about a closer relationship, sometimes,

356
00:17:34,986 --> 00:17:37,331
between your citizens and you as the government.

357
00:17:37,331 --> 00:17:40,960
Or even, possibly, between you as a company and your users.

358
00:17:41,690 --> 00:17:43,972
So I wanted to kind of finish here by saying where we're going.

359
00:17:43,972 --> 00:17:46,429
The story of this talk was, you know: where are we?

360
00:17:46,869 --> 00:17:49,654
How did we get here? And where are we going?

361
00:17:50,734 --> 00:17:52,248
So one answer is just more use.

362
00:17:52,248 --> 00:17:55,313
Right now, as I said at the beginning, Open Data is relatively young.

363
00:17:55,313 --> 00:17:57,844
This vast outpouring, for example, of government data

364
00:17:57,844 --> 00:18:02,058
that anyone can freely use, reuse, and redistribute is really new,

365
00:18:02,058 --> 00:18:03,382
even if it started three years ago.

366
00:18:03,382 --> 00:18:06,479
For example, in the UK, much of the most useful data that could be released

367
00:18:06,479 --> 00:18:09,091
has only been released in the last six months or a year.
368 00:18:09,091 --> 00:18:10,478 You want prescription data? 369 00:18:10,478 --> 00:18:11,894 Are you a pharmaceutical company, 370 00:18:11,894 --> 00:18:15,261 and you want to know what kind of prescription habits are going on in the UK? 371 00:18:15,261 --> 00:18:18,837 I would emphasise: at an anonymised or somewhat aggregate level. 372 00:18:18,837 --> 00:18:20,707 Do you want to know about what crime is going on? 373 00:18:20,707 --> 00:18:24,241 Are you building a real estate website and you want data on environment, 374 00:18:24,241 --> 00:18:25,742 or you want data on unemployment, 375 00:18:25,742 --> 00:18:28,742 or other information about where properties are situated? 376 00:18:28,742 --> 00:18:30,077 You can now get that. 377 00:18:30,687 --> 00:18:32,783 So I think there's going to be a lot more use from business. 378 00:18:33,743 --> 00:18:35,398 There'll be a lot more use from everyone. 379 00:18:35,398 --> 00:18:38,686 But I think particularly business is going to wake up to the opportunities here. 380 00:18:38,686 --> 00:18:40,192 I think it's also going to lead to more data. 381 00:18:40,192 --> 00:18:41,789 One is, government is going to release more data. 382 00:18:41,789 --> 00:18:45,597 I think also businesses are going to realise, and communities, 383 00:18:45,597 --> 00:18:47,719 that they want to share back some of that data, 384 00:18:47,719 --> 00:18:48,764 some of the data they have. 385 00:18:48,764 --> 00:18:50,863 It's not going to be their kind of crown jewels, 386 00:18:50,863 --> 00:18:53,707 and it's not going -- they'll often start out with data that's not core to their business. 387 00:18:53,707 --> 00:18:58,108 It's kind of like Nike: they realised that by opening and sharing data, 388 00:18:58,108 --> 00:19:00,569 they can scale in a way they can't on their own.
389 00:19:01,029 --> 00:19:03,069 And does it mean that richer data, going back 390 00:19:03,069 --> 00:19:05,809 -- how could I leave out Hegel and Marx in a talk like this -- 391 00:19:05,809 --> 00:19:08,869 "Quantity changes quality" as Hegel told us. 392 00:19:09,319 --> 00:19:14,156 And more data, going back to that woven ball, more data actually means better data. 393 00:19:14,156 --> 00:19:17,634 It means richer data, it's a qualitative difference in what we can do. 394 00:19:17,634 --> 00:19:19,781 Geodata on its own isn't that useful. 395 00:19:19,781 --> 00:19:21,593 Transport data on its own isn't useful. 396 00:19:21,593 --> 00:19:24,023 Geodata plus transport data is useful! 397 00:19:24,703 --> 00:19:26,187 And we're going to be seeing data refining. 398 00:19:26,187 --> 00:19:27,441 Data is the new oil, right? 399 00:19:27,441 --> 00:19:28,928 So, we're going to refine it. 400 00:19:28,928 --> 00:19:32,037 And that's going to be a big business: higher-quality data. 401 00:19:32,487 --> 00:19:34,149 I want to leave you with a couple of thoughts. 402 00:19:34,149 --> 00:19:35,646 So, one is, some people say: 403 00:19:35,646 --> 00:19:37,527 "Well, okay, but, you know, selling data is big business". 404 00:19:37,527 --> 00:19:40,987 And it is, but going forward in some of these things like software, 405 00:19:40,987 --> 00:19:42,485 data is going to be a platform. 406 00:19:42,485 --> 00:19:43,720 It's not a commodity. 407 00:19:43,720 --> 00:19:46,084 Businesses built purely on selling data, 408 00:19:46,084 --> 00:19:47,558 I just don't think are going to make it. 409 00:19:47,558 --> 00:19:51,668 You need to be building on your data, not attempting to purely sell it. 410 00:19:52,588 --> 00:19:54,638 And the other answer is to be modest. 411 00:19:55,278 --> 00:19:56,797 So I said: where are we going?
412 00:19:56,797 --> 00:19:57,917 I don't know if people know 413 00:19:57,917 --> 00:20:02,124 -- and this takes us back to an earlier age, an age of electricity and steam -- of Faraday. 414 00:20:02,124 --> 00:20:04,678 So he's demonstrating electricity at the Royal Society, 415 00:20:04,678 --> 00:20:08,316 and Gladstone, the future Prime Minister of England, sees him do this stuff, you know, 416 00:20:08,316 --> 00:20:10,445 the frog legs move, and Gladstone's like: 417 00:20:10,445 --> 00:20:12,392 "Well, I mean, this is a party trick, Faraday. 418 00:20:12,392 --> 00:20:15,971 It's great, but, what's really, you know, what's electricity going to amount to?" 419 00:20:15,971 --> 00:20:20,832 And Faraday says to him: "Well, what's the use of a baby?" 420 00:20:20,832 --> 00:20:24,299 You know, a baby when it's young is not very useful. 421 00:20:24,299 --> 00:20:25,718 [scattered laughter] 422 00:20:25,718 --> 00:20:27,426 But it grows up into something! 423 00:20:27,916 --> 00:20:29,725 And that is where we are going today. 424 00:20:29,725 --> 00:20:32,488 We are at the beginning of the Open Data journey. 425 00:20:33,088 --> 00:20:36,157 And part of it is, we don't know what it's going to grow up into. 426 00:20:36,157 --> 00:20:37,004 Thank you very much! 427 00:20:37,004 --> 00:20:40,558 [Applause] 428 00:20:40,558 --> 00:20:44,708 [Questioner] Um, citizens and I guess patients at hospitals, 429 00:20:44,708 --> 00:20:48,683 assume that the institutions have all this data and it's very well organised, 430 00:20:48,683 --> 00:20:49,925 and it's a question of will. 431 00:20:50,555 --> 00:20:53,889 Have you encountered cases in which they simply don't have it, 432 00:20:53,889 --> 00:20:57,268 or they have it, and it's just such a mess that they're too embarrassed to give it out? 433 00:20:57,268 --> 00:20:58,661 [Rufus Pollock] Absolutely.
434 00:20:58,661 --> 00:21:00,959 I mean, one story that kind of intrigues me, 435 00:21:00,959 --> 00:21:03,914 is we've been building this "Where Does My Money Go?" open spending project. 436 00:21:04,094 --> 00:21:06,948 And one of the things the government mandated was giving out, 437 00:21:06,948 --> 00:21:09,065 rather than just high-level financial information, 438 00:21:09,065 --> 00:21:11,445 giving out information at a detailed level, you know, 439 00:21:11,445 --> 00:21:12,838 so they now publish, for example, 440 00:21:12,838 --> 00:21:15,601 spending data from each government department monthly, 441 00:21:15,601 --> 00:21:17,713 every transaction within 5,000 pounds (check). 442 00:21:17,713 --> 00:21:22,119 Every purchase they make, every mobile phone provider they contract with, we get that data. 443 00:21:22,119 --> 00:21:25,590 And one of the intriguing things, of their mandating this, was it turned out, 444 00:21:25,590 --> 00:21:30,022 before, they had no way, before they did this, of actually seeing, on any regular basis, 445 00:21:30,022 --> 00:21:31,232 what their department spent money on. 446 00:21:31,232 --> 00:21:35,029 Because in fact, the only thing they reported up on to, in central government to Treasury, 447 00:21:35,029 --> 00:21:38,871 was kind of like, how much did you spend against Project X that you were allocated budget for? 448 00:21:38,871 --> 00:21:41,156 You know, departments, were actually really intrigued, they [say]: 449 00:21:41,156 --> 00:21:43,435 "Oh, that other department's going with Vodafone, 450 00:21:43,435 --> 00:21:46,349 and we're with Orange, and look how much they're paying per month!" 451 00:21:46,349 --> 00:21:49,310 So I think in essence, it is really driving changes in government, 452 00:21:49,310 --> 00:21:52,700 and yeah, there are people, I think you'd been worried about giving out data quality. 
453 00:21:52,700 --> 00:21:54,841 I was just talking to the Department of Education last week and they said 454 00:21:54,841 --> 00:21:57,650 -- you know, one of the things -- they had financial information from schools, 455 00:21:57,650 --> 00:21:59,567 and which they were slowly being mandated to publish. 456 00:21:59,567 --> 00:22:01,220 And schools are suddenly all ringing up, saying: 457 00:22:01,220 --> 00:22:04,290 "Well we never really bothered to really update that information to be accurate! 458 00:22:04,290 --> 00:22:05,900 Uh, we really want to do it right now". 459 00:22:06,210 --> 00:22:08,164 So I think that definitely does happen, yep. 460 00:22:08,164 --> 00:22:12,460 [Questioner] Are you seeing now new roles in government, to help facilitate this? 461 00:22:13,099 --> 00:22:16,009 [Rufus Pollock] Yeah. I mean, to take another example, I, sorry. 462 00:22:16,009 --> 00:22:19,492 Both in government, so the UK government has a transparency kind of 'czar' if you like. 463 00:22:19,492 --> 00:22:22,812 Also I learnt, is Nike hired an Open Data evangelist. 464 00:22:22,812 --> 00:22:24,611 One of the things they, while they were implementing this programme, 465 00:22:24,611 --> 00:22:27,990 they actually hired explicitly, an Open Data evangelist. 466 00:22:27,990 --> 00:22:30,023 So yeah, I think we are, we're definitely seeing this in government. 467 00:22:30,023 --> 00:22:32,357 Both in the tech level, but also at the policy level. 468 00:22:32,357 --> 00:22:34,729 And I think it's not just government, 469 00:22:34,729 --> 00:22:37,957 it will also be companies doing this, and so on, who will be saying: 470 00:22:37,957 --> 00:22:39,192 "We need an Open Data expert. 471 00:22:39,192 --> 00:22:43,662 We need to be aware of what's going on here and be able to plan it as part of our strategy." 472 00:22:44,472 --> 00:22:45,489 [Questioner] A final question. 
473 00:22:45,489 --> 00:22:48,972 You mentioned that, kind of outsourcing, almost, some of this data refining, 474 00:22:48,972 --> 00:22:51,306 outside government or the big institutions, has helped them. 475 00:22:51,306 --> 00:22:54,446 Can you tell us any stories of kind of gratitude being expressed by the government? I mean... 476 00:22:55,146 --> 00:22:57,247 [Rufus Pollock] Well, I mean, to kind of, yeah. 477 00:22:57,247 --> 00:22:58,773 I mean there was an interesting example actually 478 00:22:58,773 --> 00:23:01,792 where we had some complaint because the open spending data I told you about 479 00:23:01,792 --> 00:23:05,323 where we're aggregating the government spending and financial data 480 00:23:05,323 --> 00:23:10,651 -- you know, the site had a few performance issues, occasionally, as we loaded more data in. 481 00:23:10,651 --> 00:23:12,755 I remember kind of getting this call kind of going: 482 00:23:12,755 --> 00:23:16,266 "Well, you know, we're a little bit upset, you know, data.gov.uk," 483 00:23:16,266 --> 00:23:19,133 and it turned out the reason was, the Treasury kept looking at this data, 484 00:23:19,133 --> 00:23:21,002 and they were annoyed when the site was going down. 485 00:23:21,002 --> 00:23:22,563 So that was really intriguing to me 486 00:23:22,563 --> 00:23:26,139 that we were kind of one of the best, at least, up-to-date aggregators out there. 487 00:23:26,139 --> 00:23:29,484 Um, I think you are already seeing people doing stuff with the data 488 00:23:29,484 --> 00:23:30,979 and kind of doing stuff, sometimes for free, you know. 489 00:23:31,289 --> 00:23:33,131 You don't have to have the shiny front-end. 490 00:23:33,131 --> 00:23:35,019 I mean, one of the things we went about, on about, 491 00:23:35,019 --> 00:23:36,493 I know Tim Berners-Lee went on about -- 492 00:23:36,493 --> 00:23:41,393 raw data now, you know, you can build fewer shiny front-ends, and just release raw data.
493 00:23:41,393 --> 00:23:46,455 And you know, someone else will help you build the app, the front-end, the interface, 494 00:23:46,455 --> 00:23:47,762 and help you innovate about it. 495 00:23:47,762 --> 00:23:50,917 What is the best way to provide healthcare data to citizens, 496 00:23:50,917 --> 00:23:52,254 or education data to citizens, 497 00:23:52,254 --> 00:23:54,367 so they make better and more informed choices? 498 00:23:54,367 --> 00:23:56,608 I don't know, and the government probably doesn't know. 499 00:23:56,608 --> 00:23:59,155 But somewhere out there, someone is going to innovate 500 00:23:59,155 --> 00:24:02,672 and really provide the best way for us to deliver that kind of information to citizens. 501 00:24:02,672 --> 00:24:04,250 [Questioner] Thank you very much. 502 00:24:04,250 --> 00:24:05,296 [Rufus Pollock] Thank you. 503 00:24:05,296 --> 00:24:06,901 [Applause] 504 00:24:06,901 --> 00:24:09,423 lift _ Video Production ACTUA 505 00:24:09,423 --> 00:24:11,311 Copyright (c) 2012 Lift conference