1 00:00:01,014 --> 00:00:04,185 (lift) 2 00:00:04,185 --> 00:00:07,244 (lift 12 - Feb 24 2012 - Geneva) 3 00:00:07,244 --> 00:00:10,044 (Rufus Pollock - Stories) 4 00:00:10,044 --> 00:00:11,788 [Rufus Pollock] Just to say for those of you who don't know: 5 00:00:11,788 --> 00:00:13,666 the Open Knowledge Foundation is a not-profit -- not for profit 6 00:00:13,666 --> 00:00:15,611 founded in 2004 7 00:00:15,611 --> 00:00:17,865 and which builds tools and communities 8 00:00:17,865 --> 00:00:20,934 to create, use and share open information 9 00:00:20,934 --> 00:00:24,585 and that's information that anyone can use, reuse and redistribute. 10 00:00:24,585 --> 00:00:28,321 And as such, we've been working on open data for quite a long time 11 00:00:28,321 --> 00:00:30,011 since we started in 2004. 12 00:00:30,011 --> 00:00:34,817 And today, I want to start the story by going back in time 5000 years, 13 00:00:34,817 --> 00:00:37,610 to ancient Mesopotamia. 14 00:00:37,610 --> 00:00:41,393 There, between the Tigris and the Euphrates rivers, 15 00:00:42,069 --> 00:00:44,390 flourished the Sumerian civilization. 16 00:00:44,390 --> 00:00:47,298 And they were confronted by a problem. 17 00:00:47,298 --> 00:00:50,269 They were confronted by the limitations of human memory 18 00:00:50,899 --> 00:00:54,338 in the recording of taxes, food and other goods. 19 00:00:54,338 --> 00:00:59,642 And those ancient civil servants and businessmen hit on a novel solution: 20 00:01:00,380 --> 00:01:04,666 What they decided to do was they would start counting things with small clay chits, 21 00:01:04,666 --> 00:01:09,234 which they would bake inside of a clay -- a little clay box 22 00:01:09,234 --> 00:01:12,617 and then mark, on the outside of that box, what they were counting. 23 00:01:12,617 --> 00:01:15,303 You know, was it grain, was it tax payments, whatever. 
24 00:01:16,150 --> 00:01:19,786 And so, born out of necessity for a state and a society, 25 00:01:20,632 --> 00:01:25,773 came one of the great information technology revolutions of all time: writing. 26 00:01:25,773 --> 00:01:28,172 The Sumerians invented writing via cuneiform. 27 00:01:28,910 --> 00:01:34,039 And if we fast-forward from that a few thousand years, we come to the UK census. 28 00:01:34,039 --> 00:01:37,577 Again, it's always interesting that states, governments are often at the forefront 29 00:01:37,577 --> 00:01:42,681 of at least driving information technology and information systems innovations. 30 00:01:42,681 --> 00:01:44,654 The UK census: again, the state, 31 00:01:44,654 --> 00:01:46,565 this is during the Napoleonic Wars, 32 00:01:46,565 --> 00:01:48,601 desired to count the population more accurately: 33 00:01:48,601 --> 00:01:51,995 and we have the first UK census in 1801. 34 00:01:51,995 --> 00:01:56,189 And in the US, they also had censuses, in fact starting in 1790. 35 00:01:56,819 --> 00:01:59,383 And one of the problems encountered in the 1880 census 36 00:01:59,383 --> 00:02:01,592 was they tabulated the census by hand. 37 00:02:02,345 --> 00:02:05,699 And by the 1880 census, it was taking seven years 38 00:02:05,699 --> 00:02:06,822 to tabulate the census. 39 00:02:06,822 --> 00:02:10,241 So after it got taken in 1880, it wasn't until 1887 40 00:02:10,241 --> 00:02:12,892 they actually had any data they could use. 41 00:02:12,892 --> 00:02:16,004 And they calculated that for the next census in 1890, 42 00:02:16,004 --> 00:02:18,164 they wouldn't be finished by 1900. 43 00:02:18,164 --> 00:02:21,936 They still wouldn't have the results of the census by the time they started the next one. 44 00:02:21,936 --> 00:02:24,233 They had a crisis of information technology. 45 00:02:24,233 --> 00:02:26,979 And what they went and did is they commissioned Herman Hollerith 46 00:02:26,979 --> 00:02:29,747 to build the first automatic tabulator.
47 00:02:29,747 --> 00:02:32,835 And for those of you who know your company history, of course, 48 00:02:32,835 --> 00:02:34,513 Herman Hollerith's company went on 49 00:02:34,513 --> 00:02:35,899 to be one of the founders, if you like, 50 00:02:35,899 --> 00:02:38,808 one of the companies that came and created IBM. 51 00:02:38,808 --> 00:02:42,258 And IBM, by the sixties, were building this 52 00:02:42,258 --> 00:02:44,374 -- they replaced those hand -- 53 00:02:44,374 --> 00:02:45,905 those kind of wooden, mechanical tabulators 54 00:02:45,905 --> 00:02:48,524 with this stuff: digital tabulators, 55 00:02:48,524 --> 00:02:50,375 the modern computer of this age. 56 00:02:50,375 --> 00:02:52,610 And again, much of this -- I don't know if you guys know -- 57 00:02:52,610 --> 00:02:53,705 IBM would have gone bankrupt 58 00:02:53,705 --> 00:02:58,477 if it hadn't been for Franklin Roosevelt passing the Social Security Act in the States, 59 00:02:58,477 --> 00:03:01,132 which necessitated a huge amount of new tabulation. 60 00:03:01,132 --> 00:03:04,629 So, again, a lot of innovation in this space came out of government need 61 00:03:04,629 --> 00:03:06,370 and also, of course, the nuclear program, 62 00:03:06,370 --> 00:03:08,641 the other great needer of computational power. 63 00:03:09,317 --> 00:03:11,899 And today, today, 64 00:03:12,623 --> 00:03:15,485 we find ourselves again in the midst of a revolution. 65 00:03:16,438 --> 00:03:19,331 It's a revolution driven by two needs: 66 00:03:19,331 --> 00:03:22,027 ones that have been the same throughout history as I've just shown, 67 00:03:22,027 --> 00:03:23,886 information complexity, which is the necessity, 68 00:03:24,456 --> 00:03:27,575 and information technology, which is the opportunity. 69 00:03:28,544 --> 00:03:32,702 And what we're doing in this case is a policy innovation, if you like. 70 00:03:32,702 --> 00:03:36,468 We are innovating by opening up information. 
71 00:03:37,052 --> 00:03:39,436 So just take the obvious example, government, 72 00:03:39,436 --> 00:03:41,097 as I said, often the innovator. 73 00:03:41,097 --> 00:03:43,308 In the last -- 3 years ago, you go back 3 years, 74 00:03:43,308 --> 00:03:45,829 there's almost no open government data initiatives 75 00:03:45,829 --> 00:03:46,688 in the world. 76 00:03:46,688 --> 00:03:48,442 Today there are dozens. 77 00:03:48,442 --> 00:03:51,162 The UK, the US, Finland, Kenya, The Netherlands, 78 00:03:51,162 --> 00:03:53,049 and there's new ones almost every week. 79 00:03:53,049 --> 00:03:57,407 There's been a launch of an official kind of movement as a part of the UN 80 00:03:57,407 --> 00:04:00,097 called the Open Government Partnership in which countries sign up, 81 00:04:00,097 --> 00:04:02,433 and among other things, they open up their data. 82 00:04:03,002 --> 00:04:05,325 And of course, it's been, in the UK and other countries, 83 00:04:05,325 --> 00:04:06,562 Tim Berners-Lee has been involved. 84 00:04:06,562 --> 00:04:09,106 I've helped advise the government around this in the UK. 85 00:04:09,106 --> 00:04:11,221 But it's not just government, it's also companies. 86 00:04:11,651 --> 00:04:13,982 Companies are opening up data. 87 00:04:13,982 --> 00:04:15,690 Very interestingly, last year, 88 00:04:15,690 --> 00:04:19,092 Nike started an open data initiative there 89 00:04:19,092 --> 00:04:21,372 to open up supply chain and sustainability data, 90 00:04:21,372 --> 00:04:23,931 for themselves and also for their suppliers, 91 00:04:23,931 --> 00:04:26,800 which I think is a very interesting change. 92 00:04:26,800 --> 00:04:28,004 And it's also communities. 93 00:04:28,004 --> 00:04:29,715 Often, in fact, back there in the beginning, 94 00:04:29,715 --> 00:04:31,927 this incredible map that you saw in an earlier slide, 95 00:04:31,927 --> 00:04:35,002 is an OpenStreetMap activity, around the world.
96 00:04:35,002 --> 00:04:38,073 People adding to this crowd-built map of the world. 97 00:04:38,073 --> 00:04:41,074 And in the last 6 years, OpenStreetMap, 98 00:04:41,074 --> 00:04:42,445 from a bottom-up community, 99 00:04:42,445 --> 00:04:44,435 have built a complete, comprehensive, 100 00:04:44,435 --> 00:04:47,918 map of the world, of fully open data. 101 00:04:48,872 --> 00:04:50,898 So I've just gone on about Open Data, 102 00:04:50,898 --> 00:04:52,766 and one thing I'm aware of, of this audience, 103 00:04:52,766 --> 00:04:54,035 is you might not all know what it is. 104 00:04:54,035 --> 00:04:59,152 So I'm going to take a brief moment, a brief moment, to say what it is. 105 00:04:59,152 --> 00:05:01,493 What does it mean when I say 'open'? 106 00:05:01,493 --> 00:05:05,557 And was it, you know, what's different from anything else? What's different from simply public data? 107 00:05:05,557 --> 00:05:07,083 So there's actually a definition, 108 00:05:07,083 --> 00:05:10,177 a definition we the Open Knowledge Foundation helped write, it's very simple. 109 00:05:10,177 --> 00:05:13,671 In a nutshell, a piece of information, a piece of data, 110 00:05:13,671 --> 00:05:18,384 is open if anyone is free to use, reuse, 111 00:05:18,384 --> 00:05:20,797 and redistribute it, subject only at most 112 00:05:20,797 --> 00:05:22,891 to a requirement to attribute and share alike. 113 00:05:23,214 --> 00:05:25,784 And anyone means anyone! 114 00:05:25,784 --> 00:05:28,055 It doesn't mean -- there can't be any commercial restrictions. 115 00:05:28,055 --> 00:05:32,262 You can't say: hey, here's this data, but only people using it for non-commercial purposes. 116 00:05:32,262 --> 00:05:34,849 Or only people working in education. 117 00:05:34,849 --> 00:05:38,051 Or only people living in the developing world, or the developed world. 118 00:05:38,051 --> 00:05:40,743 There can't be any restrictions like that. 
119 00:05:41,343 --> 00:05:43,189 And there's a reason for this, by the way, 120 00:05:43,209 --> 00:05:48,615 and it isn't just because one's obsessed about if you like, trademarking an attractive term. 121 00:05:49,315 --> 00:05:51,081 It's because it's about interoperability. 122 00:05:51,291 --> 00:05:54,617 One of my experiences at this conference, which I remember from previous trips to Geneva, 123 00:05:54,627 --> 00:05:56,974 is I've been unable to plug in my laptop! 124 00:05:56,974 --> 00:06:02,048 Even though I have a French adaptor, in fact, these wonderful Swiss plugs here, are, you know, 125 00:06:02,048 --> 00:06:03,582 these wonderful, small octagonal shape. 126 00:06:03,582 --> 00:06:05,379 And even with my adaptor I can't plug in. 127 00:06:05,379 --> 00:06:07,347 Right? And it's called interoperability. 128 00:06:07,347 --> 00:06:10,929 When we travel around to different countries, our power adaptors don't actually fit in. 129 00:06:10,929 --> 00:06:12,581 We have to buy something. 130 00:06:12,581 --> 00:06:16,755 And the point about this definition, and the point about caring about Open Data, 131 00:06:16,755 --> 00:06:18,317 is, it's about interoperability. 132 00:06:18,317 --> 00:06:22,112 The dream of Open Data is interoperability. 133 00:06:22,112 --> 00:06:26,058 Of seamlessly being able to share and interweave information. 134 00:06:27,898 --> 00:06:31,704 And if every time I get information from two different people I have to consult a lawyer, 135 00:06:31,704 --> 00:06:35,300 I have to work out whether I'm allowed to do it, whether I'm allowed to put these things together, 136 00:06:35,300 --> 00:06:37,634 we lose that dream, that dream is shattered. 137 00:06:37,634 --> 00:06:42,166 And the key point is, this definition, and those conditions, ensure interoperability. 
138 00:06:42,166 --> 00:06:45,744 If you comply with them, we know that any piece of info, of Open Data, 139 00:06:45,744 --> 00:06:47,880 will work with any other piece of Open Data. 140 00:06:48,681 --> 00:06:52,932 And also, it's worth saying for a quick moment, what kind of data, and to emphasize a point. 141 00:06:52,932 --> 00:06:55,985 Just to foreclose those kinds of questions, otherwise I always get asked. 142 00:06:55,985 --> 00:06:58,809 When we talk about opening up data, in general, 143 00:06:58,809 --> 00:07:01,026 we're not talking about personal data. 144 00:07:01,026 --> 00:07:04,161 We're not talking about opening up your private health records 145 00:07:04,161 --> 00:07:08,302 or opening up your personal tax information. 146 00:07:08,302 --> 00:07:11,267 We're talking about information that is non-personal in nature. 147 00:07:11,267 --> 00:07:15,667 And for the government for example: transport, geodata, statistics, electoral, legal. 148 00:07:15,667 --> 00:07:19,510 Stuff that the UK has, in fact, for example been opening up over the last few years. 149 00:07:19,510 --> 00:07:23,381 This financial information, on government spending, this information on health outcomes, 150 00:07:23,381 --> 00:07:28,625 on prescriptions, this information on educational outcomes, this information on the law. 151 00:07:28,625 --> 00:07:30,765 This information -- statistical information. 152 00:07:30,785 --> 00:07:32,691 That's the kind of thing that we're talking about. 153 00:07:34,186 --> 00:07:37,393 Now, I want to say, it's in this story, we have this story of over time. 154 00:07:37,393 --> 00:07:38,996 But why governments are doing it now? 155 00:07:39,596 --> 00:07:40,598 And why Open Data? 156 00:07:41,268 --> 00:07:43,930 So, okay, for thousands of years, governments innovate, 157 00:07:43,930 --> 00:07:47,274 but why do they innovate at this particular moment and in this way? 
158 00:07:47,274 --> 00:07:51,976 So I want to start here with a quick story, a story of medicine gone wrong. 159 00:07:52,006 --> 00:07:54,484 It is from a great book by a guy called Stephen Klaidman. 160 00:07:54,484 --> 00:07:55,918 It's in fact one of the things 161 00:07:55,918 --> 00:07:57,781 that made me think quite deeply about this: 162 00:07:57,781 --> 00:07:59,852 why I was interested in Open Data. 163 00:08:01,172 --> 00:08:02,917 In that picture there, you can see 164 00:08:02,917 --> 00:08:05,726 what was the Redding Medical Centre in Northern California. 165 00:08:05,726 --> 00:08:10,471 There, in 2002, in the Summer of 2002, John Corapi, 166 00:08:11,231 --> 00:08:12,401 in typical American style, 167 00:08:12,401 --> 00:08:15,374 an ex-accountant from Vegas turned Catholic priest, 168 00:08:15,374 --> 00:08:17,243 [scattered laughter] 169 00:08:17,783 --> 00:08:22,274 ...arrived at the Redding Medical Centre having been referred by his doctor for having chest pains. 170 00:08:22,784 --> 00:08:28,419 He had a cardiogram by the local cardiologist and was told that he needed an immediate heart bypass, 171 00:08:28,419 --> 00:08:31,484 that he was at serious risk, and that he should come back later that day, 172 00:08:31,484 --> 00:08:34,514 or at the latest, tomorrow, to have open heart surgery. 173 00:08:35,764 --> 00:08:37,985 Rather shocked, and dazed by this news, 174 00:08:37,985 --> 00:08:41,225 he returned home to pack his bags in order to return to hospital. 
175 00:08:41,225 --> 00:08:45,102 He called up his best friend, who was still an accountant in Vegas, 176 00:08:46,032 --> 00:08:52,568 whose partner was a hospital nurse, and who advised him that he should get a second opinion, 177 00:08:52,568 --> 00:08:55,904 that, according to his partner, it was not, you know, 178 00:08:55,904 --> 00:08:58,981 it was very unusual that you would need to have immediate open heart surgery, 179 00:08:58,981 --> 00:09:00,235 and that he should get a second opinion. 180 00:09:00,975 --> 00:09:04,507 Rather doubtful about this, because he was extremely worried, he did get on a plane. 181 00:09:04,507 --> 00:09:07,919 He went to Vegas, he got seen by another specialist... 182 00:09:07,919 --> 00:09:11,785 who, to his complete surprise, told him there was nothing wrong with his heart. 183 00:09:12,805 --> 00:09:15,289 He saw another specialist, just to make sure. 184 00:09:15,289 --> 00:09:18,563 They told him also, there was nothing wrong with his heart. 185 00:09:19,343 --> 00:09:25,067 Relieved, and rather, you know, happy, he returned home and just wanted to really forget about it. 186 00:09:25,067 --> 00:09:27,389 But his friend said: "No, what's going on here? Something's wrong". 187 00:09:27,389 --> 00:09:32,613 And they went in to see the CEO of the Tenet Healthcare, the people running this hospital 188 00:09:32,613 --> 00:09:35,654 (which, by the way, was a private hospital), and said: 189 00:09:35,654 --> 00:09:38,614 "Look, something's wrong, what's going on, what are you going to do about this?" 190 00:09:38,614 --> 00:09:40,256 And basically they were told: not very much. 191 00:09:40,256 --> 00:09:44,581 You know, mistakes get made, it's bad luck, don't worry about it, 192 00:09:44,581 --> 00:09:46,233 we'll look into it, but thank you very much. 193 00:09:46,763 --> 00:09:51,631 They weren't convinced by this, and eventually they decided to contact the FBI. 
194 00:09:51,631 --> 00:09:53,826 The reason they contacted the FBI, by the way, 195 00:09:53,826 --> 00:09:56,401 is it's a private healthcare provider in the United States, 196 00:09:56,401 --> 00:10:00,476 they provide Medicare provision of healthcare to the Federal Government. 197 00:10:00,476 --> 00:10:04,202 So, if the Federal Government is getting defrauded, the FBI can get involved. 198 00:10:04,982 --> 00:10:06,850 The FBI started investigating. 199 00:10:08,281 --> 00:10:12,081 Eventually it turned out, that hundreds, probably thousands of people 200 00:10:12,081 --> 00:10:15,854 over a ten or longer year period, had been operated on unnecessarily. 201 00:10:16,704 --> 00:10:19,561 Most of them had had serious procedures performed on them, 202 00:10:19,561 --> 00:10:22,189 open heart surgery, some had died as a result. 203 00:10:22,189 --> 00:10:24,325 Obviously it's quite a serious operation. 204 00:10:24,325 --> 00:10:27,437 Some people had basically been condemned to a lifetime of pain. 205 00:10:27,437 --> 00:10:31,437 One of the most traumatic examples was a 36-year-old, he had been cut open, 206 00:10:31,437 --> 00:10:33,000 which is obviously what happens in open heart surgery, 207 00:10:33,000 --> 00:10:35,369 and his chest had never knitted back together correctly. 208 00:10:35,999 --> 00:10:38,125 Basically, he would be in pain for the rest of his life. 209 00:10:39,395 --> 00:10:43,000 So, hundreds, thousands of people had been harmed. 210 00:10:43,610 --> 00:10:45,968 One of the interesting things was that in this community 211 00:10:45,968 --> 00:10:48,159 there was already some suspicion, there were anecdotes. 212 00:10:48,159 --> 00:10:50,853 I mean, one of the ones I really liked from this book was the story that went: 213 00:10:50,853 --> 00:10:56,021 'Don't get a flat tyre outside of Redding Medical Centre because you'll end up with a heart bypass.' 
214 00:10:56,021 --> 00:10:57,303 [scattered laughter] 215 00:10:57,303 --> 00:11:00,258 You know, but the thing was, there was no data. 216 00:11:00,728 --> 00:11:04,563 People were you know, a bit suspicious, but it was among doctors who knew, 217 00:11:04,563 --> 00:11:06,867 you know, in the community, and who wants to doubt it. 218 00:11:06,867 --> 00:11:12,171 And guess what? Also, Redding Medical Centre had one of the best mortality rates, 219 00:11:13,001 --> 00:11:15,350 for cardiac procedures in the United States, 220 00:11:15,350 --> 00:11:19,609 because if you operate on healthy people, you have a good mortality rate! 221 00:11:19,619 --> 00:11:21,129 [scattered laughter] 222 00:11:21,129 --> 00:11:23,390 So, the other thing, though, 223 00:11:23,390 --> 00:11:25,452 and this is the point that comes to Open Data for me 224 00:11:25,452 --> 00:11:28,722 the other red flag if you had been looking at the data, 225 00:11:28,741 --> 00:11:31,927 was these two things: one is incredibly low mortality rate, 226 00:11:31,927 --> 00:11:35,351 and (B) that it had almost the highest number of procedures 227 00:11:35,351 --> 00:11:37,464 for the population that it covered in the United States, 228 00:11:38,144 --> 00:11:39,634 which should be a red flag, right? 229 00:11:39,634 --> 00:11:42,618 Because, one, it's just a massive outlier on that basis, and also, 230 00:11:42,618 --> 00:11:45,815 the more people you should be operating on, the more you're doing marginal cases, 231 00:11:45,815 --> 00:11:49,450 the higher should be your mortality rate unless something very odd is going on. 232 00:11:50,030 --> 00:11:53,015 The thought was: what if people had been looking at this data? 233 00:11:53,015 --> 00:11:56,045 What if we'd - if this data had been open and public, 234 00:11:56,045 --> 00:11:59,517 and not maybe just for particular researchers to look at or the government? 
235 00:11:59,927 --> 00:12:04,129 And it kind of reminded me of a phrase that's very famous in Open Source software, which is: 236 00:12:04,129 --> 00:12:05,965 "To many eyes, all bugs are shallow". 237 00:12:05,965 --> 00:12:10,504 What's great about Open Source software is lots of people can look at it, lots of people can fix it. 238 00:12:10,504 --> 00:12:14,730 And for me, what this was saying was: to many eyes, all anomalies are noticeable. 239 00:12:14,730 --> 00:12:16,679 It's somewhat of an exaggeration, 240 00:12:16,679 --> 00:12:18,908 but what happens if rather than ten or twenty people 241 00:12:18,908 --> 00:12:22,077 who worked in monitoring Medicare provision in the US government, 242 00:12:22,077 --> 00:12:23,877 we'd had thousands or millions of people? 243 00:12:23,877 --> 00:12:26,919 If the local journalists or citizens, who had suspicions, 244 00:12:26,919 --> 00:12:28,747 had been able to go and look at that data and say: 245 00:12:28,747 --> 00:12:32,485 "Whoa! What's going on here? This isn't just anecdotes, there's some data". 246 00:12:34,205 --> 00:12:40,225 And so, and it's not just then, about kind of spotting healthcare errors, or issues, or risks, 247 00:12:40,225 --> 00:12:42,415 it's also about things like apps and services 248 00:12:42,415 --> 00:12:43,857 that you can build with Open Data. 249 00:12:43,867 --> 00:12:46,667 This is a great app built by mySociety in the UK, 250 00:12:46,667 --> 00:12:47,640 called Mapumental. 251 00:12:47,640 --> 00:12:48,974 And the question is, I don't know if people know, 252 00:12:48,974 --> 00:12:50,646 London house prices are very expensive, 253 00:12:50,646 --> 00:12:52,510 I don't know whether they rival Geneva's, 254 00:12:52,510 --> 00:12:55,238 but they're, it's a pretty difficult thing. 
255 00:12:55,238 --> 00:12:57,978 And one of the questions was, if I have to work somewhere, 256 00:12:57,978 --> 00:13:01,752 and I want to know where I can live, and afford, 257 00:13:01,752 --> 00:13:05,757 and I can commute to work in a certain time, and it's not too ugly, 258 00:13:05,757 --> 00:13:07,583 this is what this app does. 259 00:13:07,583 --> 00:13:11,200 You can choose the price, you can say where you're going to work, 260 00:13:11,200 --> 00:13:14,195 you can choose the commute time, and you can choose the scenicness. 261 00:13:14,195 --> 00:13:17,167 And it will show you, on this map, where you can live. 262 00:13:18,427 --> 00:13:20,796 Another example, more about transparency, 263 00:13:20,796 --> 00:13:22,746 is a project we did called "Where Does My Money Go?". 264 00:13:23,976 --> 00:13:25,406 It's an interactive version, 265 00:13:25,406 --> 00:13:26,211 you can kind of draw it out, 266 00:13:26,211 --> 00:13:29,114 so what it starts with, is one, is it tells you what your tax is, 267 00:13:29,114 --> 00:13:30,821 something that most people often don't know, 268 00:13:30,821 --> 00:13:33,668 and it will tell you how much you're paying each day 269 00:13:33,668 --> 00:13:36,254 to a particular area of society. 270 00:13:36,254 --> 00:13:37,328 And the dream for me, 271 00:13:37,328 --> 00:13:39,127 a dream that we're on the way to realising, 272 00:13:39,127 --> 00:13:42,817 is in this visualisation, you can drill down into areas. 273 00:13:42,817 --> 00:13:45,092 And my dream is to keep drilling down. 274 00:13:45,472 --> 00:13:47,633 So depending on what day we have, I want to go down, 275 00:13:47,633 --> 00:13:49,628 right down through those bubbles, step by step, 276 00:13:49,628 --> 00:13:52,403 until I see the money spent on street lights on my street, 277 00:13:52,403 --> 00:13:55,270 on filling in potholes, on collecting my rubbish. 
278 00:13:56,190 --> 00:13:57,138 And for two reasons: 279 00:13:57,138 --> 00:13:59,704 One, obviously there's a question, particularly in some countries, 280 00:13:59,704 --> 00:14:01,016 of inefficiency or corruption, 281 00:14:01,436 --> 00:14:05,176 but also, just because most of us don't feel very happy about paying tax. 282 00:14:06,066 --> 00:14:08,157 It's not one of those things people welcome! 283 00:14:08,157 --> 00:14:09,817 But it's something that we should. 284 00:14:09,817 --> 00:14:11,960 Government does an awful lot for us, 285 00:14:11,960 --> 00:14:14,287 and having a better sense of where it's going 286 00:14:14,287 --> 00:14:17,120 could make us feel an awful lot better about paying that tax. 287 00:14:17,120 --> 00:14:18,657 In the way that when we go to a restaurant, 288 00:14:18,657 --> 00:14:21,397 we don't, when we get the bill, we don't necessarily feel bad. 289 00:14:21,397 --> 00:14:24,334 We feel "Wow, I had a great meal. That was worth it." 290 00:14:25,274 --> 00:14:26,366 But why Open? 291 00:14:26,366 --> 00:14:29,079 I've given you examples, and you know, we see a lot of apps and services. 292 00:14:29,079 --> 00:14:30,948 Why is Open relevant here? 293 00:14:31,598 --> 00:14:36,350 This goes back to what I said about the information technology, the revolution. 294 00:14:36,350 --> 00:14:37,813 So it's the challenge and the opportunity. 295 00:14:37,813 --> 00:14:42,387 It's the challenge that we see today, is exploding informational complexity. 296 00:14:42,797 --> 00:14:43,924 I mean, another great story: 297 00:14:43,924 --> 00:14:47,728 in the 1820s, all bank clearing in the largest financial centre in the world 298 00:14:47,728 --> 00:14:51,848 was done in a single room, where people -- one person from each bank gathered 299 00:14:51,848 --> 00:14:56,402 and they'd go round the room pulling out gold, and swapping it around, between different banks. 300 00:14:56,402 --> 00:14:58,434 And that's how they did bank clearing. 
301 00:14:59,074 --> 00:15:01,991 Today we have billions of transactions a minute. 302 00:15:01,991 --> 00:15:07,945 And the way we as humans deal with complexity is by dividing and conquering it. 303 00:15:07,945 --> 00:15:10,683 We split it up into manageable chunks that we deal with. 304 00:15:11,013 --> 00:15:12,381 The other answer, 305 00:15:12,381 --> 00:15:14,883 and this answer's particularly relevant about Open Data, 306 00:15:14,883 --> 00:15:16,219 is information technology. 307 00:15:16,219 --> 00:15:18,951 Today, a smartphone has as much computing power 308 00:15:18,951 --> 00:15:22,260 as the system that ran the Apollo moon landings. 309 00:15:22,260 --> 00:15:24,027 And an even better example is storage: 310 00:15:24,027 --> 00:15:26,930 one terabyte of storage today is a hundred dollars. 311 00:15:26,930 --> 00:15:30,297 In 1994, this would have cost 400,000 dollars. 312 00:15:30,297 --> 00:15:33,977 I can have every financial transaction 313 00:15:33,977 --> 00:15:38,376 the UK government, or the US government made last year, or even for the last decade, 314 00:15:38,376 --> 00:15:39,665 on my laptop. 315 00:15:39,665 --> 00:15:43,543 That was not possible for an average citizen a decade ago. 316 00:15:44,283 --> 00:15:48,187 So it's mass participation, information access, processing, and production. 317 00:15:48,187 --> 00:15:49,557 It's decentralisation. 318 00:15:49,557 --> 00:15:51,728 And the claim here is that openness is key. 319 00:15:51,728 --> 00:15:53,957 It's because it's about scaling. 320 00:15:54,547 --> 00:15:57,399 What we are doing is weaving data together. 321 00:15:57,399 --> 00:15:59,615 As I said, we deal with complexity by splitting it up. 322 00:15:59,615 --> 00:16:02,928 We componentise, we split data up into blocks 323 00:16:02,928 --> 00:16:04,670 that we recombine. 
324 00:16:04,670 --> 00:16:07,201 But if we are going to recombine information, 325 00:16:07,961 --> 00:16:10,076 we need to put Humpty Dumpty back together again, 326 00:16:10,076 --> 00:16:12,909 it won't work most of the time if it is closed. 327 00:16:13,449 --> 00:16:17,039 We need Open Data to scale and to componentise. 328 00:16:17,679 --> 00:16:21,518 And it's a point just to make here in this respect, that you might think: 329 00:16:21,518 --> 00:16:23,351 "Well you know, you're talking about Open Data, 330 00:16:23,351 --> 00:16:24,721 you know, this could be true of anything! 331 00:16:24,721 --> 00:16:25,789 Why don't we have like, 332 00:16:25,789 --> 00:16:28,232 Open Cars, and Open Shoes, and you know, 333 00:16:28,232 --> 00:16:29,578 why don't we just share everything, man! 334 00:16:29,578 --> 00:16:31,026 It would be so beautiful!". 335 00:16:31,706 --> 00:16:33,393 Right? And the sad thing is, 336 00:16:33,393 --> 00:16:39,070 is that that hasn't generally worked as a way of organising most production in our society. 337 00:16:39,070 --> 00:16:44,074 Instead, we have private property, and so we don't do that much openness relatively. 338 00:16:44,074 --> 00:16:45,848 But there's something different about digital information. 339 00:16:45,848 --> 00:16:48,944 We all know it, but it's worth emphasising, which is, it's very cheaply copied. 340 00:16:49,344 --> 00:16:52,782 I mean, give me a copy of your data isn't a problem if you're the government. 341 00:16:52,782 --> 00:16:56,393 Give me a copy of your car, or your house, or whatever, is. 342 00:16:56,803 --> 00:16:58,584 And it's also about innovation here. 343 00:16:58,584 --> 00:17:01,219 I mean, in a way it's almost the purest aspect of markets. 344 00:17:01,219 --> 00:17:05,619 Markets are about moving things to the person who could use them most best. 345 00:17:06,389 --> 00:17:07,200 And that's true of data. 
346 00:17:07,200 --> 00:17:10,660 The best thing to do with your data will likely be thought of by someone else. 347 00:17:11,340 --> 00:17:14,973 And vice versa! You will think of the best thing to do with someone else's data. 348 00:17:15,783 --> 00:17:20,103 And Open Data allows us, in the most frictionless, easiest way, 349 00:17:20,103 --> 00:17:22,708 to move data to where it can be most optimally used, 350 00:17:22,728 --> 00:17:23,905 particularly if you're government. 351 00:17:24,275 --> 00:17:26,843 So in short, it's about better understanding, it's about better government, 352 00:17:26,843 --> 00:17:29,115 it's about better research, it's about better economy. 353 00:17:29,115 --> 00:17:31,124 And something also for companies and governments: 354 00:17:31,124 --> 00:17:32,707 I think it's about better engagement. 355 00:17:33,137 --> 00:17:34,986 It's about a closer relationship, sometimes, 356 00:17:34,986 --> 00:17:37,331 between your citizens and you as the government. 357 00:17:37,331 --> 00:17:40,960 Between you, even possibly, as a company, and your users. 358 00:17:41,690 --> 00:17:43,972 So I wanted to kind of finish here by saying where we're going. 359 00:17:43,972 --> 00:17:46,429 The story was, of this talk, was, you know, where are we? 360 00:17:46,869 --> 00:17:49,654 Why have we got here? And where are we going? 361 00:17:50,734 --> 00:17:52,248 So one answer is just more use. 362 00:17:52,248 --> 00:17:55,313 So right now, I just said at the beginning, Open Data is relatively young. 363 00:17:55,313 --> 00:17:57,844 This vast outpouring, for example, of government data, 364 00:17:57,844 --> 00:18:02,058 that anyone can freely use, reuse, and redistribute, is really new, 365 00:18:02,058 --> 00:18:03,382 even if it's done three years ago. 366 00:18:03,382 --> 00:18:06,479 For example, in the UK, much of the most useful data that could be released 367 00:18:06,479 --> 00:18:09,091 has only been released in the last six months or a year. 
368 00:18:09,091 --> 00:18:10,478 You want prescription data? 369 00:18:10,478 --> 00:18:11,894 Are you a pharmaceutical company, 370 00:18:11,894 --> 00:18:15,261 and you want to know what kind of prescription habits are going on in the UK? 371 00:18:15,261 --> 00:18:18,837 I would emphasise: at an anonymised or somewhat aggregate level. 372 00:18:18,837 --> 00:18:20,707 Do you want to know about what crime is going on? 373 00:18:20,707 --> 00:18:24,241 Are you building a real estate website and you want data on environment, 374 00:18:24,241 --> 00:18:25,742 or you want data on unemployment, 375 00:18:25,742 --> 00:18:28,742 or other information about where properties are situated? 376 00:18:28,742 --> 00:18:30,077 You can now get that. 377 00:18:30,687 --> 00:18:32,783 So I think there's going to be a lot more use from business. 378 00:18:33,743 --> 00:18:35,398 There'll be a lot more use from everyone. 379 00:18:35,398 --> 00:18:38,686 But I think particularly business is going to wake up to the opportunities here. 380 00:18:38,686 --> 00:18:40,192 I think it's also going to lead to more data. 381 00:18:40,192 --> 00:18:41,789 One is, government is going to be more data. 382 00:18:41,789 --> 00:18:45,597 I think also businesses are going to realise, and communities, 383 00:18:45,597 --> 00:18:47,719 that they want to share back some of that data, 384 00:18:47,719 --> 00:18:48,764 some of the data they have. 385 00:18:48,764 --> 00:18:50,863 It's not going to be their kind of crown jewels, 386 00:18:50,863 --> 00:18:53,707 and it's not going -- often start out with data that's not core to their business. 387 00:18:53,707 --> 00:18:58,108 It's like, kind of, Nike: they realised that by opening and sharing data, 388 00:18:58,108 --> 00:19:00,569 they can scale in a way they can't on their own.
389 00:19:01,029 --> 00:19:03,069 And does it mean that richer data, going back 390 00:19:03,069 --> 00:19:05,809 -- how could I leave out Hegel and Marx in a talk like this -- 391 00:19:05,809 --> 00:19:17,221 "Quantity changes quality" as Hegel told us. 392 00:19:17,221 --> 00:19:20,630 And more data, going back to that woven ball, more data actually means better data. 393 00:19:20,630 --> 00:19:24,040 It means richer data, it's a qualitative difference in what we can do. Geodata on its own isn't that useful. Transport data on its own isn't useful. Geodata plus transport data is useful! 394 00:19:24,040 --> 00:19:31,912 And we're going to be seeing data refining. Data is the new oil, right? So, we're going to refine it. And that's going to be a big business. Higher quality data. 395 00:19:31,912 --> 00:19:42,440 And I want to leave you with a couple of thoughts. So, one is: Some people say, "well, okay, but, you know, selling data is big business". And it is, but going forward in some of these things like software, data is going to be a platform. 396 00:19:42,440 --> 00:19:47,548 It's not a commodity. Businesses built purely on selling data, I just don't think are going to make it. 397 00:19:47,548 --> 00:19:52,067 You need to be building on your data, not attempting to purely sell it. 398 00:19:52,067 --> 00:20:02,223 And the other answer is to be modest. So I said: where are we going? I don't know if people know, and this takes us back to an earlier age, an age of electricity and steam, of Faraday. 399 00:20:02,223 --> 00:20:16,694 So he's demonstrating electricity at the Royal Society, and Gladstone, the future Prime Minister of England, sees him do this stuff, you know, the frog legs move, and Gladstone's like: "well, I mean, this is a party trick, Faraday, it's great, but, what's really, you know, what's electricity going to amount to?" 400 00:20:16,694 --> 00:20:20,832 And Faraday says to him: "Well, what's the use of a baby?"
401 00:20:20,832 --> 00:20:23,739 You know, a baby when it's young is not very useful. 402 00:20:23,739 --> 00:20:25,718 [scattered laughter] 403 00:20:25,718 --> 00:20:36,165 But it grows up into something! And that is where we are going today. We are at the beginning of the Open Data journey. And partly is, we don't know what it's going to grow up into. 404 00:20:36,165 --> 00:20:37,774 Thank you very much! 405 00:20:37,774 --> 00:20:40,558 [Applause] 406 00:20:40,558 --> 00:20:57,332 QUESTIONER: Um, citizens and um, I guess patients in hospitals, assume that the institutions have all this data and it's very well organised, and it's a question of will. Have you encountered cases in which they simply don't have it, or they have it, and it's just such a mess that they're too embarrassed to give it out? 407 00:20:57,332 --> 00:21:17,527 RUFUS: Absolutely. I mean, one story that kind of intrigues me, is we've been building this "Where Does My Money Go?" open spending project. And one of the things the government mandated was giving out, rather than just high-level financial information, giving out information at a detailed level. You know, so they now publish, for example, spending data from each government department monthly, every transaction over 25,000 pounds. 408 00:21:17,527 --> 00:21:31,130 Every purchase they make, every mobile phone provider they contract with, we get that data. And one of the intriguing things, of their mandating this, was it turned out, before, they had no way, before they did this, of actually seeing, on any regular basis, what their department spent money on. 409 00:21:31,130 --> 00:21:38,839 Because in fact, the only thing they reported up onto, in central government to Treasury, was kind of like, how much did you spend against Project X that you were allocated budget for?
410 00:21:38,839 --> 00:21:46,454 You know, departments were actually really intrigued; they'd say "Oh, well that other department's going with Vodafone, and we're with Orange, and look how much they're paying per month!" 411 00:21:46,454 --> 00:21:59,519 So I think in essence it is really driving changes in government, and yeah, there are people, I think, who'd been worried about giving out data because of its quality. I was just talking to the Department of Education last week and they said -- you know, one of the things -- they had financial information from schools, which they were slowly being mandated to publish. 412 00:21:59,519 --> 00:22:05,974 And schools are suddenly all ringing up, saying: "Well we never really bothered to really update that information to be accurate! Uh, we really want to do it right now". 413 00:22:05,974 --> 00:22:08,659 So I think that definitely does happen, yep. 414 00:22:08,659 --> 00:22:12,617 QUESTIONER: Are you seeing now new roles in government, to help facilitate this? 415 00:22:12,617 --> 00:22:27,580 RUFUS: Yeah. I mean, to take another example, I, sorry. Both in government, so the UK government has a transparency kind of 'czar' if you like. Also, I learnt, Nike hired an Open Data evangelist. One of the things they, while they were implementing this programme, they actually hired explicitly, an Open Data evangelist. 416 00:22:27,580 --> 00:22:32,406 So yeah, I think we are, we're definitely seeing this in government. Both in the tech level, but also at the policy level. 417 00:22:32,406 --> 00:22:43,837 And I think it's not just government, it will also be companies doing this, and so on, who will be saying: "We need an Open Data expert. We need to be aware of what's going on here and be able to plan it as part of our strategy." 418 00:22:43,837 --> 00:22:55,085 QUESTIONER: A final question. You mentioned that, kind of outsourcing, almost, some of this data refining, outside government or the big institutions, has helped them.
Can you tell us any stories of kind of gratitude being expressed by the government? 419 00:22:55,085 --> 00:23:10,447 RUFUS: Well, I mean, to kind of, yeah. I mean there was an interesting example actually where we had some complaint because the open spending data I told you about, where we're aggregating the government spending and financial data -- you know, the site had a few performance issues, occasionally, as we loaded more data in. 420 00:23:10,447 --> 00:23:20,864 I remember kind of getting this call kind of going: "Well, you know, we're a little bit upset". You know, data.gov.uk, and it turned out the reason was, the Treasury kept looking at this data, and they were annoyed when the site was going down. 421 00:23:20,864 --> 00:23:25,878 So that was really intriguing to me, that we were kind of one of the best, at least, up-to-date aggregators out there. 422 00:23:25,878 --> 00:23:33,184 Um, I think you are already seeing people doing stuff with the data and kind of doing stuff, sometimes for free. And you don't have to have the shiny front-end. 423 00:23:33,184 --> 00:23:46,246 I mean, one of the things, we went about, we went on about, I know Tim Berners-Lee went on about -- raw data now, you know, you can build fewer shiny front-ends, and just release raw data. And you know, someone else will help you build the app, the front-end, the interface. 424 00:23:46,246 --> 00:23:54,122 And help you innovate about it. What is the best way to provide healthcare data to citizens, or education data to citizens, so they make better and more informed choices? 425 00:23:54,122 --> 00:23:56,767 I don't know, and the government probably doesn't know. 426 00:23:56,767 --> 00:24:02,799 But somewhere out there, someone is going to innovate and really provide the best way for us to deliver that kind of information to citizens. 427 00:24:02,799 --> 00:24:04,250 QUESTIONER: Thank you very much. 428 00:24:04,250 --> 00:24:05,496 RUFUS: Thank you.
429 00:24:05,496 --> 00:24:06,901 [Applause] 430 00:24:06,901 --> 00:24:09,423 lift _ Video Production ACTUA 431 00:24:09,423 --> 00:24:11,751 Copyright (c) 2012 Lift conference