1 99:59:59,999 --> 99:59:59,999 The world we live in is awash with data 2 99:59:59,999 --> 99:59:59,999 that comes pouring in from everywhere around us. 3 99:59:59,999 --> 99:59:59,999 On its own, this data is just noise and confusion. 4 99:59:59,999 --> 99:59:59,999 To make sense of data, to find the meaning in it, 5 99:59:59,999 --> 99:59:59,999 we need a powerful branch of science: statistics. 6 99:59:59,999 --> 99:59:59,999 Believe me, there's nothing boring about statistics 7 99:59:59,999 --> 99:59:59,999 especially not today, when we can make the data sing. 8 99:59:59,999 --> 99:59:59,999 With statistics we can really make sense of the world. 9 99:59:59,999 --> 99:59:59,999 Are statistics, the data deluge as it's been called, 10 99:59:59,999 --> 99:59:59,999 leading us to a greater understanding 11 99:59:59,999 --> 99:59:59,999 of life on Earth and the world beyond? 12 99:59:59,999 --> 99:59:59,999 Thanks to the incredible power of today's computers 13 99:59:59,999 --> 99:59:59,999 it may fundamentally transform the process of scientific discovery. 14 99:59:59,999 --> 99:59:59,999 I kid you not, statistics is now the sexiest subject around. 15 99:59:59,999 --> 99:59:59,999 Did you know that there are one million boats in Sweden? 16 99:59:59,999 --> 99:59:59,999 That's one boat per nine people. 17 99:59:59,999 --> 99:59:59,999 It's the highest number of boats per person in Europe. 18 99:59:59,999 --> 99:59:59,999 Being a statistician, you don't like telling your profession at dinner parties, 19 99:59:59,999 --> 99:59:59,999 but really, statisticians shouldn't be shy 20 99:59:59,999 --> 99:59:59,999 because they always want to understand what's going on. 21 99:59:59,999 --> 99:59:59,999 Statistics gives us a perspective on the world we live in 22 99:59:59,999 --> 99:59:59,999 that we can't get in any other way. 23 99:59:59,999 --> 99:59:59,999 Statistics tells us whether the things we think and believe are actually true.
24 99:59:59,999 --> 99:59:59,999 Statistics are far more useful than we usually like to admit. 25 99:59:59,999 --> 99:59:59,999 In the last recession, there was this famous call-in to a talk radio station. 26 99:59:59,999 --> 99:59:59,999 The man complained: "In times like this, when unemployment rates are up to 13%, 27 99:59:59,999 --> 99:59:59,999 and income has fallen by 5%, and suicide rates are climbing, 28 99:59:59,999 --> 99:59:59,999 I get so angry that the government is wasting money on things like collection of statistics." 29 99:59:59,999 --> 99:59:59,999 I'm not officially a statistician; strictly speaking, my field is global health. 30 99:59:59,999 --> 99:59:59,999 But I got really obsessed with stats when I realised how many people in Sweden 31 99:59:59,999 --> 99:59:59,999 don't know anything about the rest of the world. 32 99:59:59,999 --> 99:59:59,999 I started, in our medical university, Karolinska Institute, 33 99:59:59,999 --> 99:59:59,999 an undergraduate course called Global Health. 34 99:59:59,999 --> 99:59:59,999 These students coming to us actually have the highest grades you can get in the Swedish college system. 35 99:59:59,999 --> 99:59:59,999 So I thought maybe they know everything I'm going to teach them. 36 99:59:59,999 --> 99:59:59,999 So I did a pre-test when they came. 37 99:59:59,999 --> 99:59:59,999 One of the questions, from which I learnt a lot, was: 38 99:59:59,999 --> 99:59:59,999 Which country has the highest child mortality of these five pairs? 39 99:59:59,999 --> 99:59:59,999 I won't put you to the test here, but it's Turkey which is higher there, 40 99:59:59,999 --> 99:59:59,999 Poland, Russia, Pakistan and South Africa. 41 99:59:59,999 --> 99:59:59,999 And these were the results of the Swedish students. 42 99:59:59,999 --> 99:59:59,999 1.8 answers right out of 5 possible, 43 99:59:59,999 --> 99:59:59,999 that means that there was a place for a professor in International Health 44 99:59:59,999 --> 99:59:59,999 and for my course.
45 99:59:59,999 --> 99:59:59,999 But one late night when I was compiling my report 46 99:59:59,999 --> 99:59:59,999 I really realised my discovery. 47 99:59:59,999 --> 99:59:59,999 I have shown that Swedish top students know 48 99:59:59,999 --> 99:59:59,999 statistically significantly less about the world than the chimpanzees. 49 99:59:59,999 --> 99:59:59,999 Because the chimpanzee would score half right. 50 99:59:59,999 --> 99:59:59,999 If I gave them two bananas with Sri Lanka and Turkey 51 99:59:59,999 --> 99:59:59,999 they would be right in half of the cases. 52 99:59:59,999 --> 99:59:59,999 But the students are not there. 53 99:59:59,999 --> 99:59:59,999 I did also an unethical study of the professors of the Karolinska Institute 54 99:59:59,999 --> 99:59:59,999 that hands out the Nobel Prize in Medicine, and they aren't on par with the chimpanzee. 55 99:59:59,999 --> 99:59:59,999 Today, there's more information accessible than ever before, 56 99:59:59,999 --> 99:59:59,999 and I work with my team at the Gapminder Foundation 57 99:59:59,999 --> 99:59:59,999 using new tools that help everyone make sense of the changing world. 58 99:59:59,999 --> 99:59:59,999 We draw on the masses of data that are now freely available 59 99:59:59,999 --> 99:59:59,999 from international institutions like the UN and the World Bank. 60 99:59:59,999 --> 99:59:59,999 It's become my mission to share my insights from this data 61 99:59:59,999 --> 99:59:59,999 with anyone who will listen, and to reveal how statistics is nothing to be frightened of. 62 99:59:59,999 --> 99:59:59,999 I'm going to provide you with a view of the global health situation across mankind, 63 99:59:59,999 --> 99:59:59,999 and I'm going to do that in a hopefully enjoyable way. So relax. 64 99:59:59,999 --> 99:59:59,999 We did this software which displays it like this, 65 99:59:59,999 --> 99:59:59,999 every bubble here is a country, this is China, this is India.
66 99:59:59,999 --> 99:59:59,999 The size of the bubble is the population. 67 99:59:59,999 --> 99:59:59,999 And I'm going to stage a race here 68 99:59:59,999 --> 99:59:59,999 between this sort of yellow Ford here, and the red Toyota down there, 69 99:59:59,999 --> 99:59:59,999 and the brownish Volvo. 70 99:59:59,999 --> 99:59:59,999 The Toyota has a very bad start down here, 71 99:59:59,999 --> 99:59:59,999 and United States' Ford is going off road there, 72 99:59:59,999 --> 99:59:59,999 and the Volvo is doing quite fine, this is the war, 73 99:59:59,999 --> 99:59:59,999 the Toyota got off track, now Toyota is coming on the healthier side of Sweden. 74 99:59:59,999 --> 99:59:59,999 That's the point when I sold the Volvo and bought the Toyota. 75 99:59:59,999 --> 99:59:59,999 This is the Great Leap Forward when China fell down, 76 99:59:59,999 --> 99:59:59,999 it was central planning by Mao Tse-tung, 77 99:59:59,999 --> 99:59:59,999 China recovered and said "never more stupid central planning", but they went up here. 78 99:59:59,999 --> 99:59:59,999 No, there was one more inequity, look there! United States! 79 99:59:59,999 --> 99:59:59,999 Oh, they broke my frame! 80 99:59:59,999 --> 99:59:59,999 Washington D.C. is so rich over there, but it's not as healthy as Kerala, India. 81 99:59:59,999 --> 99:59:59,999 It's quite interesting, isn't it? 82 99:59:59,999 --> 99:59:59,999 Welcome to the USA, world leaders in big cars 83 99:59:59,999 --> 99:59:59,999 and free data. 84 99:59:59,999 --> 99:59:59,999 There are many here who share my vision 85 99:59:59,999 --> 99:59:59,999 of making public data accessible and useful for everyone. 86 99:59:59,999 --> 99:59:59,999 The city of San Francisco is in the lead, opening up its data on everything. 87 99:59:59,999 --> 99:59:59,999 Even the Police Dept. is releasing all its crime reports.
88 99:59:59,999 --> 99:59:59,999 This official crime data has been turned into a wonderful interactive map 89 99:59:59,999 --> 99:59:59,999 by two of the city's computer whizzes. 90 99:59:59,999 --> 99:59:59,999 It's community statistics in action. 91 99:59:59,999 --> 99:59:59,999 Crimespotting is a map of crime reports 92 99:59:59,999 --> 99:59:59,999 from the San Francisco Police Dept. 93 99:59:59,999 --> 99:59:59,999 showing dots on maps for citizens to be able to see patterns of crime 94 99:59:59,999 --> 99:59:59,999 in their neighbourhoods in San Francisco. 95 99:59:59,999 --> 99:59:59,999 The map is not just about individual crimes 96 99:59:59,999 --> 99:59:59,999 but about broader patterns that show you where crime is clustered around the city, 97 99:59:59,999 --> 99:59:59,999 which areas have high crime, which areas have relatively low crime. 98 99:59:59,999 --> 99:59:59,999 We're here at the top of Jones Street, up on the hill, quite a nice neighbourhood. 99 99:59:59,999 --> 99:59:59,999 What the crime maps show us is the relationship between topography and crime. 100 99:59:59,999 --> 99:59:59,999 The higher up the hill, the less crime there is. 101 99:59:59,999 --> 99:59:59,999 We crossed over the border into the flats. 102 99:59:59,999 --> 99:59:59,999 Essentially, as soon as you get into the kind of lower-lying areas of Jones Street, 103 99:59:59,999 --> 99:59:59,999 the crime just skyrockets. 104 99:59:59,999 --> 99:59:59,999 So we're in the uptown Tenderloin District, 105 99:59:59,999 --> 99:59:59,999 it's one of the oldest and most dangerous neighbourhoods in San Francisco. 106 99:59:59,999 --> 99:59:59,999 This is where you go to buy drugs, right around here. 107 99:59:59,999 --> 99:59:59,999 You see lots of aggravated assault, lots of thefts. 108 99:59:59,999 --> 99:59:59,999 Basically, a huge part of the crime in the city happens right in these four or six block areas.
109 99:59:59,999 --> 99:59:59,999 If you've been hearing police sirens in your neighbourhood, 110 99:59:59,999 --> 99:59:59,999 you can use the map to find out why. 111 99:59:59,999 --> 99:59:59,999 If you are out at night in an unfamiliar part of town 112 99:59:59,999 --> 99:59:59,999 you can check the map for streets to avoid. 113 99:59:59,999 --> 99:59:59,999 If a neighbour gets burgled, you can see, 114 99:59:59,999 --> 99:59:59,999 is it a one-off or has there been a spike in local crime? 115 99:59:59,999 --> 99:59:59,999 If you commute through a neighbourhood and you're worried about its safety 116 99:59:59,999 --> 99:59:59,999 the fact that we have the ability to turn off all the night-time and middle-of-the-day crimes 117 99:59:59,999 --> 99:59:59,999 and show you just the things that are happening during your commute, 118 99:59:59,999 --> 99:59:59,999 is a statistical operation, but I think to the people that are interacting with the thing 119 99:59:59,999 --> 99:59:59,999 it feels very much more like they just are sort of browsing a website 120 99:59:59,999 --> 99:59:59,999 or shopping on Amazon. They're looking at data, 121 99:59:59,999 --> 99:59:59,999 and they don't realise that they're doing statistics. 122 99:59:59,999 --> 99:59:59,999 What's most exciting for me is that public statistics 123 99:59:59,999 --> 99:59:59,999 is making citizens more powerful and the authorities more accountable.
124 99:59:59,999 --> 99:59:59,999 We have community meetings that the police attend 125 99:59:59,999 --> 99:59:59,999 and what citizens are now doing, they're bringing printouts of the maps 126 99:59:59,999 --> 99:59:59,999 to show where crimes are taking place, 127 99:59:59,999 --> 99:59:59,999 and they're demanding services from the police department, 128 99:59:59,999 --> 99:59:59,999 which is now having to change how they police, 129 99:59:59,999 --> 99:59:59,999 how they provide policing services, 130 99:59:59,999 --> 99:59:59,999 because the data is showing what is working and what is not. 131 99:59:59,999 --> 99:59:59,999 People in San Francisco are also using public data 132 99:59:59,999 --> 99:59:59,999 to map social inequalities, and see how to improve society 133 99:59:59,999 --> 99:59:59,999 and the possibilities are endless. 134 99:59:59,999 --> 99:59:59,999 Our dream would be that the government announced that 135 99:59:59,999 --> 99:59:59,999 this data project would really focus on live information 136 99:59:59,999 --> 99:59:59,999 on stuff that was being reported and pushed out into the world as it was happening. 137 99:59:59,999 --> 99:59:59,999 Trash pickup, traffic accidents, buses, 138 99:59:59,999 --> 99:59:59,999 and through the kind of stats-gathering power of the internet 139 99:59:59,999 --> 99:59:59,999 it's possible to really see the workings of the city 140 99:59:59,999 --> 99:59:59,999 displayed as a unified interface. 141 99:59:59,999 --> 99:59:59,999 That's where we are heading, 142 99:59:59,999 --> 99:59:59,999 towards a world of free data with all the statistical insights that come from it 143 99:59:59,999 --> 99:59:59,999 accessible to everyone, empowering us as citizens 144 99:59:59,999 --> 99:59:59,999 and letting us hold our rulers to account. 145 99:59:59,999 --> 99:59:59,999 It's a long way from where statistics began. 146 99:59:59,999 --> 99:59:59,999 Statistics are essential to monitor our government in our societies.
147 99:59:59,999 --> 99:59:59,999 But it was our rulers out there who started the collection of statistics 148 99:59:59,999 --> 99:59:59,999 in the first place in order to monitor us. 149 99:59:59,999 --> 99:59:59,999 In fact the word statistics comes from state. 150 99:59:59,999 --> 99:59:59,999 Modern statistics began two centuries ago. 151 99:59:59,999 --> 99:59:59,999 Once it got going it spread and never stopped. 152 99:59:59,999 --> 99:59:59,999 And guess who was first. 153 99:59:59,999 --> 99:59:59,999 The Chinese have Confucius, the Italians have Da Vinci, 154 99:59:59,999 --> 99:59:59,999 and the British have Shakespeare, and we have the Tabellverket 155 99:59:59,999 --> 99:59:59,999 the first ever systematic collection of statistics. 156 99:59:59,999 --> 99:59:59,999 Since the year 1749 we have collected data on every birth, marriage and death 157 99:59:59,999 --> 99:59:59,999 and we are proud of it. 158 99:59:59,999 --> 99:59:59,999 The Tabellverket recorded information from every parish in Sweden. 159 99:59:59,999 --> 99:59:59,999 It was a huge quantity of data and it was the first time any government 160 99:59:59,999 --> 99:59:59,999 could get an accurate picture of its people. 161 99:59:59,999 --> 99:59:59,999 Sweden had been the greatest military power in Northern Europe 162 99:59:59,999 --> 99:59:59,999 but by 1749 our star was really fading and other countries were growing stronger. 163 99:59:59,999 --> 99:59:59,999 At least though, we were a large power, thought to have 20 million people 164 99:59:59,999 --> 99:59:59,999 enough to rival Britain and France. 165 99:59:59,999 --> 99:59:59,999 But we were in for a nasty surprise. 166 99:59:59,999 --> 99:59:59,999 The first analysis of Tabellverket revealed that Sweden only had 2 million inhabitants. 167 99:59:59,999 --> 99:59:59,999 Sweden was not only a power in decline, it also had a very small population. 168 99:59:59,999 --> 99:59:59,999 The government was horrified by this finding.
169 99:59:59,999 --> 99:59:59,999 What if the enemy found out? 170 99:59:59,999 --> 99:59:59,999 But the Tabellverket also showed that many women died in childbirth 171 99:59:59,999 --> 99:59:59,999 and many children died young, and the government took action to improve the health of the people. 172 99:59:59,999 --> 99:59:59,999 That was the beginning of modern Sweden. 173 99:59:59,999 --> 99:59:59,999 It took more than 50 years before the Austrians, Belgians, Danes, Dutch, 174 99:59:59,999 --> 99:59:59,999 Germans, Italians and finally the British caught up with Sweden 175 99:59:59,999 --> 99:59:59,999 in collecting and using statistics. 176 99:59:59,999 --> 99:59:59,999 It was called political arithmetic, a lovely phrase they used for statistics. 177 99:59:59,999 --> 99:59:59,999 Governments could have much more control and understanding of the society 178 99:59:59,999 --> 99:59:59,999 how it's working, how it's developing, 179 99:59:59,999 --> 99:59:59,999 and essentially, so they could control it better. 180 99:59:59,999 --> 99:59:59,999 It wasn't just governments who woke up to the power of statistics. 181 99:59:59,999 --> 99:59:59,999 Right across Europe, 19th century society went mad for facts. 182 99:59:59,999 --> 99:59:59,999 And despite its late start, Britain with its Royal Statistical Society in London 183 99:59:59,999 --> 99:59:59,999 was soon a statisticians' nirvana. 184 99:59:59,999 --> 99:59:59,999 I love looking at old copies of the Royal Statistical Society journal, 185 99:59:59,999 --> 99:59:59,999 because it's full of this stuff. 186 99:59:59,999 --> 99:59:59,999 There's a wonderful paper from the 1840s 187 99:59:59,999 --> 99:59:59,999 which shows a map of England and the rates of bastardy of each county 188 99:59:59,999 --> 99:59:59,999 189 99:59:59,999 --> 99:59:59,999 so you can identify very quickly the areas with high rates of bastardy.
190 99:59:59,999 --> 99:59:59,999 Being in East Anglia makes me slightly laugh 191 99:59:59,999 --> 99:59:59,999 that Norfolk was on top of the bastardy league in the 1840s. 192 99:59:59,999 --> 99:59:59,999 One of the founders of the Royal Statistical Society 193 99:59:59,999 --> 99:59:59,999 was the great Victorian mathematician and inventor Charles Babbage. 194 99:59:59,999 --> 99:59:59,999 In 1842 he read the latest poem by an equally great Victorian, 195 99:59:59,999 --> 99:59:59,999 Alfred Tennyson. 196 99:59:59,999 --> 99:59:59,999 "The Vision of Sin" contained the lines: 197 99:59:59,999 --> 99:59:59,999 "Fill the cup and fill the can, Have a rouse before the morn. 198 99:59:59,999 --> 99:59:59,999 Every moment dies a man, Every moment one is born." 199 99:59:59,999 --> 99:59:59,999 So keen a statistician was Babbage that he could not contain himself. 200 99:59:59,999 --> 99:59:59,999 He dashed off a letter to Tennyson explaining that because of population growth 201 99:59:59,999 --> 99:59:59,999 the line should read: 202 99:59:59,999 --> 99:59:59,999 "Every moment dies a man, And 1 1/16 is born." 203 99:59:59,999 --> 99:59:59,999 "I may add that the exact figure is 1.167 204 99:59:59,999 --> 99:59:59,999 but something must be conceded to the laws of metre." 205 99:59:59,999 --> 99:59:59,999 In the 19th century scholars all over Europe 206 99:59:59,999 --> 99:59:59,999 did amazing work in measuring their societies. 207 99:59:59,999 --> 99:59:59,999 They hoovered up data on almost everything, 208 99:59:59,999 --> 99:59:59,999 but numbers alone don't tell you anything; 209 99:59:59,999 --> 99:59:59,999 you have to analyse them, and that's what makes statistics. 210 99:59:59,999 --> 99:59:59,999 When the first statisticians began to get to grips with analysing their data 211 99:59:59,999 --> 99:59:59,999 they seized upon the average, and they took the average of everything.
212 99:59:59,999 --> 99:59:59,999 What's so great about an average 213 99:59:59,999 --> 99:59:59,999 is that you can take a whole mass of data and reduce it to a single number. 214 99:59:59,999 --> 99:59:59,999 Though each of us is unique, our collective lives produce averages 215 99:59:59,999 --> 99:59:59,999 that characterise whole populations. 216 99:59:59,999 --> 99:59:59,999 I looked at my local newspaper one week 217 99:59:59,999 --> 99:59:59,999 and saw that a pensioner had accidentally put a foot on the accelerator 218 99:59:59,999 --> 99:59:59,999 and crashed her friend against the wall. 219 99:59:59,999 --> 99:59:59,999 Devastating, hideous, horrible thing to happen. 220 99:59:59,999 --> 99:59:59,999 And there was a second one about a young man who didn't have a driving licence 221 99:59:59,999 --> 99:59:59,999 who was driving a car under the influence of drugs and alcohol 222 99:59:59,999 --> 99:59:59,999 and crashed into a pedestrian and killed him. 223 99:59:59,999 --> 99:59:59,999 What is remarkable, absolutely remarkable, 224 99:59:59,999 --> 99:59:59,999 if you look at the number of people who die each year 225 99:59:59,999 --> 99:59:59,999 in traffic accidents, it's nearly a constant. 226 99:59:59,999 --> 99:59:59,999 What? 227 99:59:59,999 --> 99:59:59,999 All these individual events, somehow when you sum them all up 228 99:59:59,999 --> 99:59:59,999 it's the same number every year, 229 99:59:59,999 --> 99:59:59,999 and every year two and a half times as many men die 230 99:59:59,999 --> 99:59:59,999 in traffic accidents as women, and it's a constant. 231 99:59:59,999 --> 99:59:59,999 And every year the rate in Belgium is double 232 99:59:59,999 --> 99:59:59,999 the rate in England, there are these remarkable regularities 233 99:59:59,999 --> 99:59:59,999 so that these individual particular events sum up into a social phenomenon.
234 99:59:59,999 --> 99:59:59,999 (Lecture) Let's see what Sweden has done 235 99:59:59,999 --> 99:59:59,999 we used to boast of fast social progress. 236 99:59:59,999 --> 99:59:59,999 (Narration) In my lectures, to tell stories about the changing world 237 99:59:59,999 --> 99:59:59,999 I use averages for entire countries, whether the average for income, 238 99:59:59,999 --> 99:59:59,999 child mortality, family size or carbon output. 239 99:59:59,999 --> 99:59:59,999 (Lecture) OK, I give you Singapore, the year I was born. 240 99:59:59,999 --> 99:59:59,999 Singapore had twice the child mortality of Sweden. 241 99:59:59,999 --> 99:59:59,999 The most tropical country in the world. A marshland on the Equator. 242 99:59:59,999 --> 99:59:59,999 And here we go. It took a little time for them to get independence 243 99:59:59,999 --> 99:59:59,999 but they started to grow their economy, and they made the social investments, 244 99:59:59,999 --> 99:59:59,999 they got rid of malaria, they got a magnificent health system 245 99:59:59,999 --> 99:59:59,999 that beats both the UK's and Sweden's. 246 99:59:59,999 --> 99:59:59,999 We never thought it would happen that they would win over Sweden! 247 99:59:59,999 --> 99:59:59,999 But useful as averages are, they don't tell you the whole story. 248 99:59:59,999 --> 99:59:59,999 On average, Swedish people have slightly less than two legs. 249 99:59:59,999 --> 99:59:59,999 That is because a few people have one leg or no legs, and no one has three legs 250 99:59:59,999 --> 99:59:59,999 so almost everybody in Sweden has more than the average number of legs. 251 99:59:59,999 --> 99:59:59,999 The variation in data is just as important as the average. 252 99:59:59,999 --> 99:59:59,999 But how do you get a handle on variation? 253 99:59:59,999 --> 99:59:59,999 For this you transform numbers into shapes. 254 99:59:59,999 --> 99:59:59,999 Let's look again at the number of adult women in Sweden for different heights.
255 99:59:59,999 --> 99:59:59,999 Plotting the data as a shape shows us how much their heights vary from the average 256 99:59:59,999 --> 99:59:59,999 and how wide that variation is. 257 99:59:59,999 --> 99:59:59,999 The shape a set of data makes is called its distribution. 258 99:59:59,999 --> 99:59:59,999 (Lecture) This is the income distribution of China 1970. 259 99:59:59,999 --> 99:59:59,999 This is the income distribution of the United States 1970. 260 99:59:59,999 --> 99:59:59,999 Almost no overlap. And what has happened? 261 99:59:59,999 --> 99:59:59,999 China is growing. It's not so equal any longer. 262 99:59:59,999 --> 99:59:59,999 And it's appearing here, overlooking the United States 263 99:59:59,999 --> 99:59:59,999 almost like a ghost, isn't it? It's scary! 264 99:59:59,999 --> 99:59:59,999 The statisticians who first explored distributions 265 99:59:59,999 --> 99:59:59,999 discovered one shape that turned up again and again. 266 99:59:59,999 --> 99:59:59,999 The Victorian scholar Francis Galton was so fascinated 267 99:59:59,999 --> 99:59:59,999 he built a machine that could reproduce it 268 99:59:59,999 --> 99:59:59,999 and he found it fitted so many different sets of measurements 269 99:59:59,999 --> 99:59:59,999 that he named it the Normal Distribution. 270 99:59:59,999 --> 99:59:59,999 Whether it was people's arm spans, lung capacity or even their exam results 271 99:59:59,999 --> 99:59:59,999 the Normal Distribution shape recurred time and time again. 272 99:59:59,999 --> 99:59:59,999 And the statisticians soon found many other regular shapes 273 99:59:59,999 --> 99:59:59,999 each produced by a certain kind of natural or social process. 274 99:59:59,999 --> 99:59:59,999 And every statistician has their favourite. 275 99:59:59,999 --> 99:59:59,999 The Poisson distribution, I think it's my favourite, it's an absolute cracker. 276 99:59:59,999 --> 99:59:59,999 The Poisson shape describes how likely it is that out-of-the-ordinary things will happen.
277 99:59:59,999 --> 99:59:59,999 Imagine a London bus stop where we know that on average we'll get three buses an hour. 278 99:59:59,999 --> 99:59:59,999 We won't always get three buses of course. 279 99:59:59,999 --> 99:59:59,999 Amazingly the Poisson shape will show us the probability that in any given hour 280 99:59:59,999 --> 99:59:59,999 we'll get 4, 5 or 6 buses or no buses at all. 281 99:59:59,999 --> 99:59:59,999 The exact shape changes with the average 282 99:59:59,999 --> 99:59:59,999 but whether it is how many people will win the lottery jackpot each week 283 99:59:59,999 --> 99:59:59,999 or how many people will phone a call centre each minute 284 99:59:59,999 --> 99:59:59,999 the Poisson shape will give the probabilities. 285 99:59:59,999 --> 99:59:59,999 A wonderful example where this was applied, in the late 19th century, 286 99:59:59,999 --> 99:59:59,999 was to count each year the number of Prussian officers, 287 99:59:59,999 --> 99:59:59,999 cavalry officers, that had been kicked to death by their horses. 288 99:59:59,999 --> 99:59:59,999 Some years there were none, some years one, some years two,... up to seven. 289 99:59:59,999 --> 99:59:59,999 One particularly bad year. 290 99:59:59,999 --> 99:59:59,999 But the distribution of how many years had one, two, three, four 291 99:59:59,999 --> 99:59:59,999 Prussian cavalry officers kicked to death by their horses 292 99:59:59,999 --> 99:59:59,999 beautifully obeys the Poisson distribution. 293 99:59:59,999 --> 99:59:59,999 So statisticians use shapes to reveal the patterns in the data 294 99:59:59,999 --> 99:59:59,999 but we also use images of all kinds to communicate statistics to a wider public 295 99:59:59,999 --> 99:59:59,999 because if the story in the numbers is told by a beautiful and clever image 296 99:59:59,999 --> 99:59:59,999 then everyone understands. 297 99:59:59,999 --> 99:59:59,999 Of the pioneers of statistical graphics, my favourite is Florence Nightingale.
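The bus-stop example from the Poisson passage can be worked through numerically. What follows is a minimal sketch, not something taken from the programme: it assumes the standard Poisson formula P(k) = e^(-m) * m^k / k!, where m is the average, and the names poisson_pmf and BUSES_PER_HOUR are made up for illustration.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of seeing exactly k events when the long-run average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Average from the narration: three buses an hour (hypothetical constant name).
BUSES_PER_HOUR = 3.0

# Print the chance of 0, 1, ..., 6 buses arriving in any given hour.
for k in range(7):
    print(f"{k} buses in an hour: {poisson_pmf(k, BUSES_PER_HOUR):.3f}")
```

Changing BUSES_PER_HOUR changes the shape, which is what the narration means by the exact shape changing with the average.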
298 99:59:59,999 --> 99:59:59,999 There are not many people who realise that actually she was known as a passionate statistician 299 99:59:59,999 --> 99:59:59,999 and not just the Lady of the Lamp. 300 99:59:59,999 --> 99:59:59,999 She said that to understand God's thoughts we must study statistics 301 99:59:59,999 --> 99:59:59,999 for these are the measure of His purpose. 302 99:59:59,999 --> 99:59:59,999 Statistics was for her a religious duty and moral imperative. 303 99:59:59,999 --> 99:59:59,999 When Florence was nine years old, she started collecting data. 304 99:59:59,999 --> 99:59:59,999 Her data was different fruits and vegetables she found. 305 99:59:59,999 --> 99:59:59,999 She put them into different tables, trying to organise them in some standard form, 306 99:59:59,999 --> 99:59:59,999 so we have one of Nightingale's first statistical tables at the age of nine. 307 99:59:59,999 --> 99:59:59,999 In the mid-1850s, Florence Nightingale went to the Crimea 308 99:59:59,999 --> 99:59:59,999 to care for British casualties of war. 309 99:59:59,999 --> 99:59:59,999 She was horrified by what she discovered. 310 99:59:59,999 --> 99:59:59,999 For all the soldiers being blown to bits on the battlefield 311 99:59:59,999 --> 99:59:59,999 there were many many more soldiers dying from diseases 312 99:59:59,999 --> 99:59:59,999 caught in the army's filthy hospitals. 313 99:59:59,999 --> 99:59:59,999 So Florence Nightingale began counting the dead. 314 99:59:59,999 --> 99:59:59,999 For two years she recorded mortality data in meticulous detail. 315 99:59:59,999 --> 99:59:59,999 When the war was over, she persuaded the government 316 99:59:59,999 --> 99:59:59,999 to set up a Royal Commission of Enquiry. 317 99:59:59,999 --> 99:59:59,999 And gathered her data in a devastating report. 318 99:59:59,999 --> 99:59:59,999 What has cemented her place in the statistical history books are the graphics she used. 319 99:59:59,999 --> 99:59:59,999 And one in particular, the Polar Area Graph.
320 99:59:59,999 --> 99:59:59,999 For each month of the war, a huge blue wedge represented the soldiers 321 99:59:59,999 --> 99:59:59,999 who had died of preventable diseases. 322 99:59:59,999 --> 99:59:59,999 The much smaller red wedges were deaths from wounds, 323 99:59:59,999 --> 99:59:59,999 and the black wedges deaths from accidents and other causes. 324 99:59:59,999 --> 99:59:59,999 Nightingale's graphics were so clear, they were impossible to ignore. 325 99:59:59,999 --> 99:59:59,999 The usual thing around Florence Nightingale's time 326 99:59:59,999 --> 99:59:59,999 was just to produce tables and tables of figures. Absolutely tedious stuff. 327 99:59:59,999 --> 99:59:59,999 Unless you are a dedicated statistician, it's quite difficult to spot the patterns naturally. 328 99:59:59,999 --> 99:59:59,999 But visualisations tell a story. They tell a story immediately. 329 99:59:59,999 --> 99:59:59,999 The use of colour, the use of shape, can really tell a powerful story. 330 99:59:59,999 --> 99:59:59,999 And these days, we can make things move as well. 331 99:59:59,999 --> 99:59:59,999 Florence Nightingale would've loved to play with it, 332 99:59:59,999 --> 99:59:59,999 she would've produced wonderful animations, I'm absolutely certain about it. 333 99:59:59,999 --> 99:59:59,999 Today, a hundred and fifty years on, 334 99:59:59,999 --> 99:59:59,999 Nightingale's graphics are rightly regarded as a classic. 335 99:59:59,999 --> 99:59:59,999 They led to a revolution in nursing and health care, in hygiene in hospitals worldwide. 336 99:59:59,999 --> 99:59:59,999 They've saved innumerable lives. 337 99:59:59,999 --> 99:59:59,999 Statistical graphics has become an art of its very own. 338 99:59:59,999 --> 99:59:59,999 Led by designers who are passionate about visualising data. 339 99:59:59,999 --> 99:59:59,999 This is the Billion Pound-O-Gram.
340 99:59:59,999 --> 99:59:59,999 This image arose out of frustration with the reporting 341 99:59:59,999 --> 99:59:59,999 of billion-pound amounts in the media. 342 99:59:59,999 --> 99:59:59,999 500 trillion pounds for this war, 50 million pounds for this hospital, 343 99:59:59,999 --> 99:59:59,999 this does not make sense, these figures are too enormous to get your mind around. 344 99:59:59,999 --> 99:59:59,999 So I scraped this data from various news sources and created this diagram 345 99:59:59,999 --> 99:59:59,999 so the squares here are scaled according to the billion-pound amounts. 346 99:59:59,999 --> 99:59:59,999 When you see numbers visualised like this, 347 99:59:59,999 --> 99:59:59,999 you start to have a different kind of relationship with them. 348 99:59:59,999 --> 99:59:59,999 You can see patterns, see the scale of them. 349 99:59:59,999 --> 99:59:59,999 Here, this little square, 37 billion, this was the predicted cost of the Iraq war in 2003. 350 99:59:59,999 --> 99:59:59,999 As you can see it has grown exponentially over the last few years 351 99:59:59,999 --> 99:59:59,999 to the total cost of about 2,500 billion. 352 99:59:59,999 --> 99:59:59,999 It's funny because when you visualise statistics like this, you understand them. 353 99:59:59,999 --> 99:59:59,999 And when you understand them, you can put things into perspective. 354 99:59:59,999 --> 99:59:59,999 Visualisation is right at the heart of my own work too. 355 99:59:59,999 --> 99:59:59,999 I teach Global Health. 356 99:59:59,999 --> 99:59:59,999 I know that having the data is not enough, 357 99:59:59,999 --> 99:59:59,999 I have to show it in ways people both enjoy and understand. 358 99:59:59,999 --> 99:59:59,999 Now I'm going to try something I've never done before. 359 99:59:59,999 --> 99:59:59,999 Animating the data in real space. 360 99:59:59,999 --> 99:59:59,999 With a bit of technical assistance from the crew. 361 99:59:59,999 --> 99:59:59,999 So here we go!
362 99:59:59,999 --> 99:59:59,999 First an axis for health, life expectancy from 25 years to 75 years. 363 99:59:59,999 --> 99:59:59,999 Down here an axis for wealth, income per person, $400, $4,000 and $40,000. 364 99:59:59,999 --> 99:59:59,999 So down here is poor and sick. And up here is rich and healthy. 365 99:59:59,999 --> 99:59:59,999 Now I'm going to show you the world 200 years ago, in 1810. 366 99:59:59,999 --> 99:59:59,999 Here come all the countries: Europe brown, Asia red, 367 99:59:59,999 --> 99:59:59,999 Middle East green, Africa South-of-Sahara blue, and America is yellow. 368 99:59:59,999 --> 99:59:59,999 And the size of the country bubble shows the size of the population. 369 99:59:59,999 --> 99:59:59,999 And in 1810 it was pretty crowded down there, wasn't it? 370 99:59:59,999 --> 99:59:59,999 All countries were sick and poor, life expectancy was below 40 in all countries. 371 99:59:59,999 --> 99:59:59,999 Only the UK and the Netherlands were slightly better off, but not much. 372 99:59:59,999 --> 99:59:59,999 And now, I'll start the world! 373 99:59:59,999 --> 99:59:59,999 The Industrial Revolution makes countries in Europe and elsewhere move away from the rest. 374 99:59:59,999 --> 99:59:59,999 But the colonised countries in Asia and Africa are stuck down there. 375 99:59:59,999 --> 99:59:59,999 Eventually the Western countries get healthier and healthier. 376 99:59:59,999 --> 99:59:59,999 Now we slow down to see the impact of the First World War and the Spanish flu epidemic. 377 99:59:59,999 --> 99:59:59,999 What a catastrophe! 378 99:59:59,999 --> 99:59:59,999 Now I'll speed up through the 1920s and 1930s 379 99:59:59,999 --> 99:59:59,999 and in spite of the Great Depression, Western countries forge on towards greater wealth and health. 380 99:59:59,999 --> 99:59:59,999 Japan and some others try to follow but most countries stay down here.
381 99:59:59,999 --> 99:59:59,999 After the tragedies of the Second World War 382 99:59:59,999 --> 99:59:59,999 we stop a bit to look at the world in 1948. 383 99:59:59,999 --> 99:59:59,999 1948 was a great year, the war was over, Sweden topped the medal table at the Winter Olympics, 384 99:59:59,999 --> 99:59:59,999 and I was born, but the differences between the countries of the world were wider than ever. 385 99:59:59,999 --> 99:59:59,999 The United States was in the front, Japan was catching up, Brazil was way behind, 386 99:59:59,999 --> 99:59:59,999 Iran was getting a little richer from oil, but still had short lives. 387 99:59:59,999 --> 99:59:59,999 The Asian giants, China, India, Pakistan, Bangladesh and Indonesia, 388 99:59:59,999 --> 99:59:59,999 they were still poor and sick down here. 389 99:59:59,999 --> 99:59:59,999 But look what is about to happen. In my lifetime, former colonies gained independence 390 99:59:59,999 --> 99:59:59,999 and finally they started to get healthier, and healthier, and healthier. 391 99:59:59,999 --> 99:59:59,999 And in the 1970s, countries in Asia and Latin America 392 99:59:59,999 --> 99:59:59,999 started to catch up with the Western countries. 393 99:59:59,999 --> 99:59:59,999 They became the emerging economies. 394 99:59:59,999 --> 99:59:59,999 Some in Africa follow, some in Africa are stuck in civil wars, and others are hit by HIV. 395 99:59:59,999 --> 99:59:59,999 And now we can see the world today, in the most up-to-date statistics. 396 99:59:59,999 --> 99:59:59,999 Most people today live in the middle, 397 99:59:59,999 --> 99:59:59,999 but there are huge differences at the same time 398 99:59:59,999 --> 99:59:59,999 between the best of countries and the worst of countries 399 99:59:59,999 --> 99:59:59,999 and there are also huge inequalities within countries. 400 99:59:59,999 --> 99:59:59,999 These bubbles show country averages, but I can split them.
401 99:59:59,999 --> 99:59:59,999 Take China, I can split it into provinces. 402 99:59:59,999 --> 99:59:59,999 There goes Shanghai, it has the same health and wealth as Italy today. 403 99:59:59,999 --> 99:59:59,999 And then there's the poor inland province of Guizhou. It's like Pakistan. 404 99:59:59,999 --> 99:59:59,999 And if I split it further, the rural parts are like Ghana in Africa. 405 99:59:59,999 --> 99:59:59,999 And yet, despite the enormous disparities today, we have seen 200 years of remarkable progress. 406 99:59:59,999 --> 99:59:59,999 That huge historical gap between the West and the rest is now closing. 407 99:59:59,999 --> 99:59:59,999 We have become an entirely new converging world. 408 99:59:59,999 --> 99:59:59,999 And I see a clear trend into the future, with aid, trade, green technology and peace. 409 99:59:59,999 --> 99:59:59,999 It's fully possible that everyone can make it to the healthy-wealthy corner. 410 99:59:59,999 --> 99:59:59,999 What you've just seen in the last few minutes is a story of 200 countries 411 99:59:59,999 --> 99:59:59,999 shown over 200 years and beyond. It involved plotting 120,000 numbers. 412 99:59:59,999 --> 99:59:59,999 Pretty neat, eh? 413 99:59:59,999 --> 99:59:59,999 With statistics we can start to see things as they really are. 414 99:59:59,999 --> 99:59:59,999 From tables of data, to averages, distributions and visualisations, 415 99:59:59,999 --> 99:59:59,999 statistics gives us a clear description of the world. 416 99:59:59,999 --> 99:59:59,999 But with statistics we can not only discover what is happening 417 99:59:59,999 --> 99:59:59,999 but also explore why, by using the powerful analytical method of correlation. 418 99:59:59,999 --> 99:59:59,999 Just looking at one thing at a time doesn't tell you very much. 419 99:59:59,999 --> 99:59:59,999 You have to look at the relationships between things. 420 99:59:59,999 --> 99:59:59,999 How they change. How they vary together. That's what correlation is about. 
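A minimal sketch of what "varying together" can mean in numbers, assuming Pearson's correlation coefficient as the measure; the programme itself does not name a specific statistic. The function name `pearson_r` and the sample figures are hypothetical and only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns a value between -1 and 1: close to 1 when the two
    quantities rise together, close to -1 when one falls as the
    other rises, and near 0 when there is no linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # How the two quantities vary together, and how each varies alone.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)

    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return co_variation / math.sqrt(variation_x * variation_y)


# Made-up figures, not taken from any real data set.
income_per_person = [400, 1000, 4000, 10000, 40000]
life_expectancy = [35, 48, 60, 68, 79]
print(pearson_r(income_per_person, life_expectancy))
```

A value near 1 or -1 only says that two things move together. As the smoking example later in the programme shows, working out why they do takes further study.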
421 99:59:59,999 --> 99:59:59,999 That's how we start to understand the processes that are really going on 422 99:59:59,999 --> 99:59:59,999 in the world and in society. 423 99:59:59,999 --> 99:59:59,999 Most of us would recognise today that crime correlates to poverty, 424 99:59:59,999 --> 99:59:59,999 that infection correlates to poor sanitation, 425 99:59:59,999 --> 99:59:59,999 and that knowledge of statistics correlates to being great at dancing. 426 99:59:59,999 --> 99:59:59,999 Correlations can be very tricky. 427 99:59:59,999 --> 99:59:59,999 I've got a joke about silly correlations. 428 99:59:59,999 --> 99:59:59,999 There was this American who was afraid of a heart attack. 429 99:59:59,999 --> 99:59:59,999 He found out that the Japanese ate very little fat, and almost didn't drink wine, 430 99:59:59,999 --> 99:59:59,999 and have far fewer heart attacks than the Americans. 431 99:59:59,999 --> 99:59:59,999 But on the other hand, he found out that the French eat as much fat as the Americans 432 99:59:59,999 --> 99:59:59,999 and they drink much more wine, but they also have fewer heart attacks. 433 99:59:59,999 --> 99:59:59,999 So he concluded that what kills you is speaking English. 434 99:59:59,999 --> 99:59:59,999 The best example of a really ground-breaking correlation 435 99:59:59,999 --> 99:59:59,999 was the link that was established in the 1950s between smoking and lung cancer. 436 99:59:59,999 --> 99:59:59,999 Not long after the Second World War, a British doctor, Richard Doll, 437 99:59:59,999 --> 99:59:59,999 investigated lung cancer patients in twenty London hospitals, 438 99:59:59,999 --> 99:59:59,999 and he became certain that the only thing they had in common was smoking, 439 99:59:59,999 --> 99:59:59,999 so certain that he stopped smoking himself. 440 99:59:59,999 --> 99:59:59,999 But other people weren't so sure.
441 99:59:59,999 --> 99:59:59,999 Lots of the discussion of early data linking smoking and lung cancer was: 442 99:59:59,999 --> 99:59:59,999 it can't be smoking, surely, that thing we've done all our lives, that can't be bad for you. 443 99:59:59,999 --> 99:59:59,999 Maybe it's genes, maybe people who are genetically predisposed to get lung cancer 444 99:59:59,999 --> 99:59:59,999 are also genetically predisposed to smoke. 445 99:59:59,999 --> 99:59:59,999 Maybe it's not the smoking, maybe it's air pollution, 446 99:59:59,999 --> 99:59:59,999 that smokers are somehow more exposed to air pollution than non-smokers. 447 99:59:59,999 --> 99:59:59,999 Maybe it's not smoking, maybe it's poverty. 448 99:59:59,999 --> 99:59:59,999 So now we have three possible explanations apart from chance. 449 99:59:59,999 --> 99:59:59,999 To verify that his correlation did imply cause and effect, 450 99:59:59,999 --> 99:59:59,999 Richard Doll created the biggest statistical study of smoking yet. 451 99:59:59,999 --> 99:59:59,999 He began tracking the lives of 40,000 British doctors, 452 99:59:59,999 --> 99:59:59,999 some of whom smoked, some of whom didn't, 453 99:59:59,999 --> 99:59:59,999 and gathered enough data to correlate the amount doctors smoked 454 99:59:59,999 --> 99:59:59,999 with their likelihood of getting cancer. 455 99:59:59,999 --> 99:59:59,999 Eventually, he not only showed a correlation between smoking and lung cancer 456 99:59:59,999 --> 99:59:59,999 but also a correlation between stopping smoking and reducing the risk. 457 99:59:59,999 --> 99:59:59,999 This was science at its best. 458 99:59:59,999 --> 99:59:59,999 What correlations do not replace is human thought. 459 99:59:59,999 --> 99:59:59,999 We could think about what it means.
460 99:59:59,999 --> 99:59:59,999 What a good scientist does if he or she comes up with a correlation 461 99:59:59,999 --> 99:59:59,999 is try as hard as he or she possibly can to disprove it, 462 99:59:59,999 --> 99:59:59,999 to break it down, to get rid of it, to try to refute it, 463 99:59:59,999 --> 99:59:59,999 and if it withstands all those efforts at demolishing it, and it's still standing up, 464 99:59:59,999 --> 99:59:59,999 then we might really have something here. 465 99:59:59,999 --> 99:59:59,999 However brilliant the scientists, data is still the oxygen of science. 466 99:59:59,999 --> 99:59:59,999 The good news is that the more we have, the more correlations we'll find, 467 99:59:59,999 --> 99:59:59,999 the more theories we'll test, and the more discoveries we are likely to make. 468 99:59:59,999 --> 99:59:59,999 And history shows how our total sum of information grows in huge leaps 469 99:59:59,999 --> 99:59:59,999 as we develop new technologies. 470 99:59:59,999 --> 99:59:59,999 The invention of the printing press kicked off the first data and information explosion. 471 99:59:59,999 --> 99:59:59,999 If you piled up all the books that had been printed by the year 1700 472 99:59:59,999 --> 99:59:59,999 they would make sixty stacks, each as high as Mount Everest. 473 99:59:59,999 --> 99:59:59,999 Then, starting in the 19th century, there came a second information revolution. 474 99:59:59,999 --> 99:59:59,999 With the telegraph, gramophone, camera, and later radio and TV. 475 99:59:59,999 --> 99:59:59,999 The total amount of information exploded. 476 99:59:59,999 --> 99:59:59,999 And by the 1950s the information available to us all had multiplied six thousand times. 477 99:59:59,999 --> 99:59:59,999 Then, thanks to the computer, and later the Internet, we went digital, 478 99:59:59,999 --> 99:59:59,999 and the amount of data we have now is unimaginably vast.
479 99:59:59,999 --> 99:59:59,999 A single letter printed in a book is equivalent to a byte of data. 480 99:59:59,999 --> 99:59:59,999 A single page equals a kilobyte or two. 481 99:59:59,999 --> 99:59:59,999 Five megabytes is enough for the complete works of Shakespeare. 482 99:59:59,999 --> 99:59:59,999 10 gigabytes, that's a DVD movie. 483 99:59:59,999 --> 99:59:59,999 2 terabytes is the tens of millions of photos added to Facebook every day. 484 99:59:59,999 --> 99:59:59,999 10 petabytes is the data recorded every second by the world's largest particle accelerator, 485 99:59:59,999 --> 99:59:59,999 so much only a tiny fraction is kept. 486 99:59:59,999 --> 99:59:59,999 6 exabytes is what you'd have if you sequenced the genomes of every single person on Earth. 487 99:59:59,999 --> 99:59:59,999 But really, that's nothing. In 2009, the Internet added up to 600 exabytes, 488 99:59:59,999 --> 99:59:59,999 and in 2010, in just one year, that will double to more than one zettabyte. 489 99:59:59,999 --> 99:59:59,999 But in the real world, if we turned all this data into print 490 99:59:59,999 --> 99:59:59,999 it would make ninety stacks of books, each reaching from here all the way to the Sun. 491 99:59:59,999 --> 99:59:59,999 The data deluge is staggering. But with today's computers and statistics, 492 99:59:59,999 --> 99:59:59,999 I'm confident we can handle it. 493 99:59:59,999 --> 99:59:59,999 When it comes to all the data on the Internet, 494 99:59:59,999 --> 99:59:59,999 the powerhouse of statistical analysis is the Silicon Valley giant Google. 495 99:59:59,999 --> 99:59:59,999 The average person over their lifetime 496 99:59:59,999 --> 99:59:59,999 is exposed to about a hundred million words of conversation.
497 99:59:59,999 --> 99:59:59,999 So if you multiply that by the six billion people on the planet 498 99:59:59,999 --> 99:59:59,999 that amount of words is equal to the amount of words 499 99:59:59,999 --> 99:59:59,999 that Google has available at any one instant of time. 500 99:59:59,999 --> 99:59:59,999 Google's computers hoover up and file away 501 99:59:59,999 --> 99:59:59,999 every document, web page and image they can find. 502 99:59:59,999 --> 99:59:59,999 Then they hunt for patterns and correlations in all this data 503 99:59:59,999 --> 99:59:59,999 doing statistics on a massive scale. 504 99:59:59,999 --> 99:59:59,999 And for me, Google has one project that is particularly exciting: 505 99:59:59,999 --> 99:59:59,999 statistical language translation. 506 99:59:59,999 --> 99:59:59,999 If you do want to provide access to all the web's information 507 99:59:59,999 --> 99:59:59,999 no matter what language is spoken. 508 99:59:59,999 --> 99:59:59,999 There's so much information on the Internet, you cannot hope to translate it all by hand 509 99:59:59,999 --> 99:59:59,999 into every possible language, we figured we have to be able to do machine translation. 510 99:59:59,999 --> 99:59:59,999 In the past, programmers tried to teach their computers to see each language as a set of grammatical rules. 511 99:59:59,999 --> 99:59:59,999 Much like languages are taught at school. 512 99:59:59,999 --> 99:59:59,999 But this didn't work, because no set of rules could capture language in all its subtlety and ambiguity. 513 99:59:59,999 --> 99:59:59,999 Having eaten our lunch, the coach departed. 514 99:59:59,999 --> 99:59:59,999 That's obviously incorrect. Written like that, it would imply that the coach has eaten the lunch. 515 99:59:59,999 --> 99:59:59,999 It would be far better to say: Having eaten our lunch, we departed in the coach.
516 99:59:59,999 --> 99:59:59,999 Those rules are helpful, they are useful most of the time, 517 99:59:59,999 --> 99:59:59,999 but they don't turn out to be true all the time. 518 99:59:59,999 --> 99:59:59,999 And the insight of using statistical machine translation 519 99:59:59,999 --> 99:59:59,999 is saying: if we have all these exceptions anyways, maybe you can get by without having any rules, 520 99:59:59,999 --> 99:59:59,999 maybe we can treat everything as an exception, and that's essentially what we've done. 521 99:59:59,999 --> 99:59:59,999 What the computer is doing when it's learning how to translate 522 99:59:59,999 --> 99:59:59,999 is to learn correlations between words and between phrases 523 99:59:59,999 --> 99:59:59,999 so we feed the system very large amounts of data 524 99:59:59,999 --> 99:59:59,999 and the system sees if a certain word or phrase correlates very often to the other language. 525 99:59:59,999 --> 99:59:59,999 Google's website currently offers translation between any of 57 different languages. 526 99:59:59,999 --> 99:59:59,999 It does this purely statistically, having correlated a huge collection of multilingual texts. 527 99:59:59,999 --> 99:59:59,999 The people who built the system don't need to know Chinese 528 99:59:59,999 --> 99:59:59,999 in order to build the Chinese system. They don't need to know Arabic. 529 99:59:59,999 --> 99:59:59,999 The expertise that is needed is basically knowledge of statistics, of computer science, 530 99:59:59,999 --> 99:59:59,999 of infrastructure, 531 99:59:59,999 --> 99:59:59,999 to build these very large computer systems we are building for doing that. 532 99:59:59,999 --> 99:59:59,999 I hooked up with Google from my office in Stockholm, to try the translator for myself. 533 99:59:59,999 --> 99:59:59,999 I will type some Swedish sentences.
534 99:59:59,999 --> 99:59:59,999 (Types in Swedish) 535 99:59:59,999 --> 99:59:59,999 (Reads on the screen) Sweden's finance minister has a ponytail and a gold ring in your ear. 536 99:59:59,999 --> 99:59:59,999 It's almost exactly correct, it's amazing. 537 99:59:59,999 --> 99:59:59,999 He comes from the conservative party, that's the kind of Sweden we have today. 538 99:59:59,999 --> 99:59:59,999 I will type one more sentence. 539 99:59:59,999 --> 99:59:59,999 In his same-sex partnerships has Stockholm's new bishop and his partners a three-year son. 540 99:59:59,999 --> 99:59:59,999 It's almost perfect, there's one important thing, it's "her". 541 99:59:59,999 --> 99:59:59,999 It's a lesbian partnership. 542 99:59:59,999 --> 99:59:59,999 OK, those kinds of words like "her" are one of the challenges in translation, 543 99:59:59,999 --> 99:59:59,999 to get those right. 544 99:59:59,999 --> 99:59:59,999 When it comes to bishops, one can excuse it. 545 99:59:59,999 --> 99:59:59,999 Right, I think that more often than not it would be probably a "his". 546 99:59:59,999 --> 99:59:59,999 I will write one more sentence. (Reads aloud in Swedish) 547 99:59:59,999 --> 99:59:59,999 When Sweden is taking part in Olympic gold, is not to win but to beat Norway. 548 99:59:59,999 --> 99:59:59,999 But they are very good in Winter Olympics, so we can't make it, but we are trying. 549 99:59:59,999 --> 99:59:59,999 Very good, very good. 550 99:59:59,999 --> 99:59:59,999 This is absolutely amazing, 551 99:59:59,999 --> 99:59:59,999 and I'm impressed that it picked up words like "same-sex partnerships" 552 99:59:59,999 --> 99:59:59,999 which are very new to the language. 553 99:59:59,999 --> 99:59:59,999 The translator is good, but if it succeeds, what will be next, that'll be remarkable. 554 99:59:59,999 --> 99:59:59,999 One of the exciting possibilities is combining the machine translation technology 555 99:59:59,999 --> 99:59:59,999 with the speech recognition technology.
556 99:59:59,999 --> 99:59:59,999 Both of these are statistical in nature. 557 99:59:59,999 --> 99:59:59,999 The machine translation relies on the statistics of mapping from one language to another, 558 99:59:59,999 --> 99:59:59,999 and similarly speech recognition relies on the statistics of mapping from a sound form to the words. 559 99:59:59,999 --> 99:59:59,999 When we put them together, now we have the capability 560 99:59:59,999 --> 99:59:59,999 of having instant conversations between two people who don't speak a common language. 561 99:59:59,999 --> 99:59:59,999 I can talk to you in my language, you hear me in your language, 562 99:59:59,999 --> 99:59:59,999 and you can answer back in real time, we can make that translation, 563 99:59:59,999 --> 99:59:59,999 we can bring people together and allow them to speak. 564 99:59:59,999 --> 99:59:59,999 The Internet is just one of many technologies created to gather massive amounts of data. 565 99:59:59,999 --> 99:59:59,999 Scientists studying our Earth and our environment 566 99:59:59,999 --> 99:59:59,999 now use an incredible range of instruments to measure the processes of our planet. 567 99:59:59,999 --> 99:59:59,999 All around us are sensors continuously measuring temperature, water flow and ocean currents. 568 99:59:59,999 --> 99:59:59,999 High in orbit are satellites busy imaging cloud formations, forest growth and snow cover. 569 99:59:59,999 --> 99:59:59,999 Scientists speak of instrumenting the Earth. 570 99:59:59,999 --> 99:59:59,999 And pointing up to the skies above, 571 99:59:59,999 --> 99:59:59,999 our powerful new telescopes are mapping the Universe. 572 99:59:59,999 --> 99:59:59,999 What's happening in astronomy is typical of how profoundly this torrent of data 573 99:59:59,999 --> 99:59:59,999 is transforming science.
574 99:59:59,999 --> 99:59:59,999 Astronomers are now addressing many enduring mysteries of the cosmos 575 99:59:59,999 --> 99:59:59,999 by applying statistical methods to all this new data. 576 99:59:59,999 --> 99:59:59,999 The galaxy is a very big place and it has billions of stars in it 577 99:59:59,999 --> 99:59:59,999 so to put together a coherent picture of the whole galaxy requires 578 99:59:59,999 --> 99:59:59,999 having enormous amounts of data, and before you could do a large sky survey 579 99:59:59,999 --> 99:59:59,999 with sensitive digital detectors, so that you can map many stars at once, 580 99:59:59,999 --> 99:59:59,999 it was very difficult to gather enough data on enough of the galaxy. 581 99:59:59,999 --> 99:59:59,999 In the past, large surveys of the night sky had to be done 582 99:59:59,999 --> 99:59:59,999 by exposing thousands of large photographic plates, 583 99:59:59,999 --> 99:59:59,999 but these surveys could take 25 years or more to complete. 584 99:59:59,999 --> 99:59:59,999 Then, in the 1990s, came digital astronomy, 585 99:59:59,999 --> 99:59:59,999 and a huge increase in both the amount and the accessibility of data. 586 99:59:59,999 --> 99:59:59,999 The Sloan Sky Survey is the world's biggest yet 587 99:59:59,999 --> 99:59:59,999 using a massive digital sensor mounted 588 99:59:59,999 --> 99:59:59,999 on the back of a custom-built telescope in New Mexico. 589 99:59:59,999 --> 99:59:59,999 It's scanned the sky night after night for eight years 590 99:59:59,999 --> 99:59:59,999 building up a composite picture in unprecedented resolution. 591 99:59:59,999 --> 99:59:59,999 The Sloan is some of the best, deepest survey data 592 99:59:59,999 --> 99:59:59,999 we have in astronomy, 593 99:59:59,999 --> 99:59:59,999 both in our galaxy and galaxies away from ours.
594 99:59:59,999 --> 99:59:59,999 All the Sloan data is on the Internet 595 99:59:59,999 --> 99:59:59,999 and with it astronomers have identified 596 99:59:59,999 --> 99:59:59,999 millions of hitherto unknown stars and galaxies. 597 99:59:59,999 --> 99:59:59,999 They also comb the database for statistical patterns 598 99:59:59,999 --> 99:59:59,999 which will prove, disprove or suggest new theories. 599 99:59:59,999 --> 99:59:59,999 So we have this idea that galaxies grow, 600 99:59:59,999 --> 99:59:59,999 they become large galaxies 601 99:59:59,999 --> 99:59:59,999 like the one we live in, the Milky Way, 602 99:59:59,999 --> 99:59:59,999 not all at once, not smoothly, 603 99:59:59,999 --> 99:59:59,999 but by continuously incorporating, 604 99:59:59,999 --> 99:59:59,999 cannibalising smaller galaxies: 605 99:59:59,999 --> 99:59:59,999 they dissolve them and they become part of the bigger galaxy. 606 99:59:59,999 --> 99:59:59,999 It's a startling idea 607 99:59:59,999 --> 99:59:59,999 and in the Sloan data there's the evidence to support it. 608 99:59:59,999 --> 99:59:59,999 Groups of stars that came from cannibalised galaxies 609 99:59:59,999 --> 99:59:59,999 stand out in the Sloan data as statistically different from other stars 610 99:59:59,999 --> 99:59:59,999 because they move at a different velocity. 611 99:59:59,999 --> 99:59:59,999 Each big spike in one of these distribution graphs 612 99:59:59,999 --> 99:59:59,999 means Professor Rockosi has found a group of stars 613 99:59:59,999 --> 99:59:59,999 all travelling in a different way to the rest. 614 99:59:59,999 --> 99:59:59,999 They are the telltale patterns she's looking for.
615 99:59:59,999 --> 99:59:59,999 The evidence is accumulating that in fact 616 99:59:59,999 --> 99:59:59,999 this really is how galaxies grow, 617 99:59:59,999 --> 99:59:59,999 or an important way of how galaxies grow. 618 99:59:59,999 --> 99:59:59,999 This is important to understand how galaxies form, 619 99:59:59,999 --> 99:59:59,999 not only ours but every galaxy. 620 99:59:59,999 --> 99:59:59,999 The more data there is the more discoveries can be made 621 99:59:59,999 --> 99:59:59,999 and the technology is getting better all the time. 622 99:59:59,999 --> 99:59:59,999 The next big survey telescope starts its work in 2015. 623 99:59:59,999 --> 99:59:59,999 It will leave Sloan in the dust. 624 99:59:59,999 --> 99:59:59,999 Sloan has taken eight years to cover one quarter of the night sky. 625 99:59:59,999 --> 99:59:59,999 The new telescope will scan the entire sky in even greater resolution 626 99:59:59,999 --> 99:59:59,999 every three days. 627 99:59:59,999 --> 99:59:59,999 The vast amounts of data we have today 628 99:59:59,999 --> 99:59:59,999 allow researchers in all sorts of fields 629 99:59:59,999 --> 99:59:59,999 to test their theories on a previously unimaginable scale 630 99:59:59,999 --> 99:59:59,999 but it may even change the fundamental way science is done. 631 99:59:59,999 --> 99:59:59,999 With the power of today's computers applied to all this data 632 99:59:59,999 --> 99:59:59,999 the machines might be able to guide the researchers. 633 99:59:59,999 --> 99:59:59,999 This is a profoundly important point, one of the most significant points in science, 634 99:59:59,999 --> 99:59:59,999 certainly one of the most exciting: 635 99:59:59,999 --> 99:59:59,999 the potential to transform not only how scientists do science 636 99:59:59,999 --> 99:59:59,999 but what science is possible. 637 99:59:59,999 --> 99:59:59,999 What will power that transformation of how science is done 638 99:59:59,999 --> 99:59:59,999 is going to be computation.
639 99:59:59,999 --> 99:59:59,999 Many of the dynamics of the natural world 640 99:59:59,999 --> 99:59:59,999 like the interplay between the rainforest and the atmosphere 641 99:59:59,999 --> 99:59:59,999 are so complex that we don't yet really understand them. 642 99:59:59,999 --> 99:59:59,999 But now computers are generating tens of thousands of simulations 643 99:59:59,999 --> 99:59:59,999 of how these biological systems might work. 644 99:59:59,999 --> 99:59:59,999 It's like creating thousands of hypothetical parallel worlds. 645 99:59:59,999 --> 99:59:59,999 Each of these simulations is analysed with statistics 646 99:59:59,999 --> 99:59:59,999 to see if any are a good match of what is observed in each. 647 99:59:59,999 --> 99:59:59,999 The computers can now automatically generate, 648 99:59:59,999 --> 99:59:59,999 test and discard hypotheses with scarcely any human insight. 649 99:59:59,999 --> 99:59:59,999 This new application of statistics will become 650 99:59:59,999 --> 99:59:59,999 absolutely vital for the future of science. 651 99:59:59,999 --> 99:59:59,999 It's creating a new paradigm in the way we do science 652 99:59:59,999 --> 99:59:59,999 which is characterised as data-centric or data-driven 653 99:59:59,999 --> 99:59:59,999 rather than hypothesis- or experiment-driven. 654 99:59:59,999 --> 99:59:59,999 It's an exciting time in terms of science, computation and statistics. 655 99:59:59,999 --> 99:59:59,999 If all this sounds a bit abstract to you 656 99:59:59,999 --> 99:59:59,999 how about one final frontier? 657 99:59:59,999 --> 99:59:59,999 Could statistics make sense of your feelings? 658 99:59:59,999 --> 99:59:59,999 In California, (where else!), one computer scientist 659 99:59:59,999 --> 99:59:59,999 is harvesting the Internet to try to define the patterns 660 99:59:59,999 --> 99:59:59,999 of our innermost thoughts and emotions.
661 99:59:59,999 --> 99:59:59,999 This is the Madness Movement; 662 99:59:59,999 --> 99:59:59,999 it represents a skyscraper's view of the world. 663 99:59:59,999 --> 99:59:59,999 Each brightly coloured dot is an individual feeling 664 99:59:59,999 --> 99:59:59,999 expressed by someone out there in a blog or a tweet 665 99:59:59,999 --> 99:59:59,999 and when you click on the dot 666 99:59:59,999 --> 99:59:59,999 it explodes to reveal the underlying feeling of that person. 667 99:59:59,999 --> 99:59:59,999 This is what people say they're feeling today: 668 99:59:59,999 --> 99:59:59,999 better 669 99:59:59,999 --> 99:59:59,999 safe 670 99:59:59,999 --> 99:59:59,999 crappy 671 99:59:59,999 --> 99:59:59,999 well 672 99:59:59,999 --> 99:59:59,999 pretty 673 99:59:59,999 --> 99:59:59,999 special 674 99:59:59,999 --> 99:59:59,999 sorry 675 99:59:59,999 --> 99:59:59,999 alone 676 99:59:59,999 --> 99:59:59,999 Every minute WeFeelFine crawls the world's blogs, 677 99:59:59,999 --> 99:59:59,999 takes all the sentences that start with the words "I feel" or "I'm feeling" 678 99:59:59,999 --> 99:59:59,999 and pushes them into a database. 679 99:59:59,999 --> 99:59:59,999 We collect all the feelings and we count the most common: 680 99:59:59,999 --> 99:59:59,999 better 681 99:59:59,999 --> 99:59:59,999 bad 682 99:59:59,999 --> 99:59:59,999 good 683 99:59:59,999 --> 99:59:59,999 right 684 99:59:59,999 --> 99:59:59,999 guilty 685 99:59:59,999 --> 99:59:59,999 sick 686 99:59:59,999 --> 99:59:59,999 the same 687 99:59:59,999 --> 99:59:59,999 like shit 688 99:59:59,999 --> 99:59:59,999 sorry 689 99:59:59,999 --> 99:59:59,999 well 690 99:59:59,999 --> 99:59:59,999 We can take a look at any one feeling and analyse it. 691 99:59:59,999 --> 99:59:59,999 Right now a lot of people are feeling happy. 692 99:59:59,999 --> 99:59:59,999 We can take a look at these people, and break them down by age, gender or location.
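The collection step described above (take sentences that start with "I feel" or "I'm feeling", store them, count the most common feelings) can be sketched roughly as follows. This is an illustrative guess, not the project's actual code: the names `FEELING_PATTERN`, `extract_feeling` and `count_feelings` are hypothetical, the sample sentences are invented, and only the first word after the opening phrase is kept, so two-word feelings such as "the same" would need extra handling.

```python
import re
from collections import Counter

# Matches a sentence that opens with "I feel" or "I'm feeling" and
# captures the single word that follows (a deliberate simplification).
FEELING_PATTERN = re.compile(r"^\s*I(?: feel|'m feeling)\s+(\w+)", re.IGNORECASE)


def extract_feeling(sentence):
    """Return the lower-cased feeling word, or None if the sentence does not qualify."""
    match = FEELING_PATTERN.match(sentence)
    return match.group(1).lower() if match else None


def count_feelings(sentences):
    """Count how often each feeling word appears across the sentences."""
    counts = Counter()
    for sentence in sentences:
        feeling = extract_feeling(sentence)
        if feeling is not None:
            counts[feeling] += 1
    return counts


# Invented sample sentences standing in for crawled blog posts.
sample_sentences = [
    "I feel better today.",
    "I'm feeling better after the holiday.",
    "We went out for dinner.",
    "I feel sorry about yesterday.",
]
print(count_feelings(sample_sentences).most_common(3))
```

Breaking the counts down by age, gender or location would then be a matter of keeping those profile fields next to each stored sentence and counting within each group.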
693 99:59:59,999 --> 99:59:59,999 Since bloggers have public profiles, we have that information 694 99:59:59,999 --> 99:59:59,999 and we can ask questions like, "Are women happier than men?" 695 99:59:59,999 --> 99:59:59,999 or "Is England happier than the United States?" 696 99:59:59,999 --> 99:59:59,999 We find that as people get older, they get happier. 697 99:59:59,999 --> 99:59:59,999 For younger people, happiness associates with excitement 698 99:59:59,999 --> 99:59:59,999 whereas older people associate happiness more with peacefulness. 699 99:59:59,999 --> 99:59:59,999 We also find that women feel loved more often than men, 700 99:59:59,999 --> 99:59:59,999 but also more guilty. 701 99:59:59,999 --> 99:59:59,999 While men feel good more often than women, but also more alone. 702 99:59:59,999 --> 99:59:59,999 As people live more and more of their lives online 703 99:59:59,999 --> 99:59:59,999 they leave behind digital traces 704 99:59:59,999 --> 99:59:59,999 with which we can statistically analyse 705 99:59:59,999 --> 99:59:59,999 what it means to be human. 706 99:59:59,999 --> 99:59:59,999 Where does all this leave us? 707 99:59:59,999 --> 99:59:59,999 We generate unimaginable quantities of data 708 99:59:59,999 --> 99:59:59,999 about everything you can think of 709 99:59:59,999 --> 99:59:59,999 and we analyse it to reveal the patterns. 710 99:59:59,999 --> 99:59:59,999 Now not only experts but all of us can understand 711 99:59:59,999 --> 99:59:59,999 the stories in the numbers. 712 99:59:59,999 --> 99:59:59,999 Instead of being led astray by prejudice 713 99:59:59,999 --> 99:59:59,999 with statistics at our fingertips, our eyes can be open 714 99:59:59,999 --> 99:59:59,999 for a fact-based view of the world. 715 99:59:59,999 --> 99:59:59,999 More than ever before we can become authors of our own destiny. 716 99:59:59,999 --> 99:59:59,999 And that's pretty exciting, isn't it? 717 99:59:59,999 --> 99:59:59,999 (Music)