0:00:03.200,0:00:10.040 The world we live in is awash with[br]data that comes pouring in[br]from everywhere around us. 0:00:10.040,0:00:14.520 On its own this data[br]is just noise and confusion. 0:00:14.520,0:00:22.520 To make sense of data, to find the[br]meaning in it, we need the powerful[br]branch of science - statistics. 0:00:22.520,0:00:26.040 Believe me there's nothing[br]boring about statistics. 0:00:26.040,0:00:29.400 Especially not today[br]when we can make the data sing. 0:00:29.400,0:00:33.400 With statistics we can[br]really make sense of the world. 0:00:33.400,0:00:35.040 And there's more. 0:00:35.040,0:00:40.440 With statistics, the data deluge, as[br]it's being called, is leading us 0:00:40.440,0:00:46.240 to an ever greater understanding[br]of life on Earth[br]and the universe beyond. 0:00:46.240,0:00:50.760 And thanks to the incredible[br]power of today's computers, 0:00:50.760,0:00:57.040 it may fundamentally transform the[br]process of scientific discovery. 0:00:57.040,0:01:02.560 I kid you not, statistics is[br]now the sexiest subject around. 0:01:23.000,0:01:25.600 Did you know that there is[br]one million boats in Sweden? 0:01:25.600,0:01:27.960 That's one boat per nine people! 0:01:27.960,0:01:31.080 It's the highest number of[br]boats per person in Europe! 0:01:41.080,0:01:45.760 Being a statistician,[br]you don't like telling[br]your profession at dinner parties. 0:01:45.760,0:01:48.440 But really,[br]statisticians shouldn't be shy 0:01:48.440,0:01:51.320 because everyone wants to[br]understand what's going on. 0:01:51.320,0:01:56.480 And statistics gives us a[br]perspective on the world we live in 0:01:56.480,0:01:59.320 that we can't get in any other way. 0:02:03.520,0:02:09.000 Statistics tells us whether[br]the things we think[br]and believe are actually true. 0:02:19.960,0:02:25.440 And statistics are far more useful[br]than we usually like to admit. 
0:02:25.440,0:02:29.600 In the last recession there[br]was this famous call-in[br]to a talk radio station. 0:02:29.600,0:02:37.280 The man complained, "In times like[br]this when unemployment rates are up[br]to 13%, income has fallen by 5%, 0:02:37.280,0:02:41.360 "and suicide rates are climbing, and[br]I get so angry that the government 0:02:41.360,0:02:45.520 "is wasting money on things like[br]collection of statistics." 0:02:48.240,0:02:50.360 I'm not officially a statistician. 0:02:50.360,0:02:55.280 Strictly speaking,[br]my field is global health. 0:02:58.120,0:03:03.280 But I got really obsessed with stats[br]when I realised how much people 0:03:03.280,0:03:06.240 in Sweden just don't know[br]about the rest of the world. 0:03:06.240,0:03:10.800 I started in our medical[br]university, Karolinska Institutet, 0:03:10.800,0:03:13.960 an undergraduate course[br]called Global Health. 0:03:13.960,0:03:17.360 These students coming to us actually[br]have the highest grade you can get 0:03:17.360,0:03:18.840 in the Swedish college system, 0:03:18.840,0:03:22.040 so I thought, "Maybe they know[br]everything I'm going to teach them." 0:03:22.040,0:03:25.680 So I did a pre-test when they came,[br]and one of the questions 0:03:25.680,0:03:28.160 from which I learned a lot[br]was this one - 0:03:28.160,0:03:32.360 which country has the highest[br]child mortality of these five pairs? 0:03:32.360,0:03:34.920 I won't put you at test here,[br]but it is Turkey 0:03:34.920,0:03:37.000 which is highest there, Poland, 0:03:37.000,0:03:40.760 Russia, Pakistan, and South Africa. 0:03:40.760,0:03:43.080 And these were the result of[br]the Swedish students. 0:03:43.080,0:03:44.760 A 1.8 right answer[br]out of five possible. 0:03:44.760,0:03:49.920 And that means there was a place for[br]a professor of International Health[br]and for my course. 0:03:49.920,0:03:56.360 But one late night[br]when I was compiling the report,[br]I really realised my discovery. 
0:03:56.360,0:04:01.160 I had shown that Swedish[br]top students know statistically 0:04:01.160,0:04:04.480 significantly less about[br]the world than the chimpanzees. 0:04:06.000,0:04:09.840 Because the chimpanzees[br]would score half right. 0:04:09.840,0:04:12.320 If I gave them two bananas[br]with Sri Lanka and Turkey, 0:04:12.320,0:04:15.600 they would be right[br]half of the cases,[br]but the students are not there. 0:04:15.600,0:04:20.200 I did also an unethical study[br]of the professors of[br]the Karolinska Institutet, 0:04:20.200,0:04:25.520 that hands out the Nobel Prize[br]for medicine, and they are on par[br]with the chimpanzees there. 0:04:28.160,0:04:32.680 Today there's more information[br]accessible than ever before. 0:04:32.680,0:04:35.760 'And I work with my team at[br]the Gapminder Foundation 0:04:35.760,0:04:41.600 'using new tools that help everyone[br]make sense of the changing world. 0:04:41.600,0:04:45.320 'We draw on the masses of data[br]that's now freely available 0:04:45.320,0:04:49.720 'from international institutions[br]like the UN and the World Bank. 0:04:49.720,0:04:53.640 'And it's become my mission to[br]share the insights 0:04:53.640,0:05:00.200 'from this data with anyone who'll[br]listen, and to reveal how statistics[br]is nothing to be frightened of.' 0:05:02.440,0:05:05.040 I'm going to provide you a view of 0:05:05.040,0:05:09.000 the global health situation[br]across mankind. 0:05:09.000,0:05:14.160 And I'm going to do that in[br]hopefully an enjoyable way,[br]so relax. 0:05:14.160,0:05:17.120 So we did this software[br]which displays it like this. 0:05:17.120,0:05:19.320 Every bubble here is a country - 0:05:19.320,0:05:21.320 this is China, this is India. 0:05:21.320,0:05:23.560 The size of the bubble[br]is the population. 0:05:23.560,0:05:27.600 I'm going to stage a race between[br]this sort of yellowish Ford here 0:05:27.600,0:05:32.760 and the red Toyota down there[br]and the brownish Volvo. 
0:05:32.760,0:05:36.440 The Toyota has a very bad start[br]down here, and United States, 0:05:36.440,0:05:38.280 Ford is going off-road there, 0:05:38.280,0:05:40.480 and the Volvo is doing quite fine,[br]this is the war. 0:05:40.480,0:05:43.680 The Toyota got off track, now Toyota[br]is on the healthier side of Sweden. 0:05:43.680,0:05:46.800 That's about where I sold[br]the Volvo and bought the Toyota. 0:05:46.800,0:05:47.960 AUDIENCE LAUGH 0:05:47.960,0:05:50.840 This is the Great Leap Forward,[br]when China fell down. 0:05:50.840,0:05:53.080 It was the central planning[br]by Mao Zedong. 0:05:53.080,0:05:56.680 China recovered and said, "Never[br]more stupid central planning," 0:05:56.680,0:05:57.800 but they went up here. 0:05:57.800,0:06:02.560 No, there is one more inequity,[br]look there - United States. 0:06:02.560,0:06:07.480 They broke my frame. Washington DC[br]is so rich over there, 0:06:07.480,0:06:13.040 but they are[br]not as healthy as Kerala in India.[br]It's quite interesting, isn't it? 0:06:13.040,0:06:14.600 LAUGHTER AND APPLAUSE 0:06:20.360,0:06:25.520 Welcome to the USA,[br]world leaders in big cars 0:06:25.520,0:06:28.480 and free data. 0:06:28.480,0:06:35.880 There are many here who share[br]my vision of making public data[br]accessible and useful for everyone. 0:06:35.880,0:06:43.440 The city of San Francisco[br]is in the lead, opening up[br]its data on everything. 0:06:43.440,0:06:47.480 Even the police department is[br]releasing all its crime reports. 0:06:47.480,0:06:50.840 This official[br]crime data has been turned 0:06:50.840,0:06:55.960 into a wonderful interactive map by[br]two of the city's computer whizzes. 0:06:55.960,0:06:58.920 It's community statistics in action.
0:07:09.400,0:07:13.320 Crimespotting is[br]a map of crime reports from the[br]San Francisco Police Department 0:07:13.320,0:07:16.120 showing dots on maps[br]for citizens to be able to see 0:07:16.120,0:07:19.320 patterns of crime around their[br]neighbourhoods in San Francisco. 0:07:19.320,0:07:25.080 The map is not just about individual[br]crimes but about broader patterns[br]that show you where crime is 0:07:25.080,0:07:27.760 clustered around the city, which[br]areas have high crime, 0:07:27.760,0:07:30.320 and which areas have[br]relatively low crime. 0:07:36.840,0:07:41.440 We're here at the top of[br]Jones Street on Nob Hill... 0:07:42.960,0:07:45.280 ..quite a nice neighbourhood. 0:07:45.280,0:07:49.600 What the crime maps show us[br]is the relationship between 0:07:49.600,0:07:51.360 topography and crime. 0:07:51.360,0:07:54.520 Basically the higher up the hill,[br]the less crime there is. 0:07:56.200,0:07:58.640 You cross over the border 0:07:58.640,0:08:00.240 into the flats... 0:08:02.800,0:08:09.240 Essentially as soon as you get[br]into the lower lying areas of Jones[br]Street the crime just skyrockets. 0:08:20.240,0:08:24.160 We're here in[br]the uptown Tenderloin district. 0:08:26.040,0:08:30.320 It's one of the oldest and densest[br]neighbourhoods in San Francisco. 0:08:30.320,0:08:32.400 This is where you go to buy drugs. 0:08:32.400,0:08:33.919 Right around here. 0:08:37.200,0:08:41.640 We see lots of aggravated assaults,[br]lots of auto thefts. 0:08:41.640,0:08:48.520 Basically a huge part of the crime[br]that happens in the city happens[br]in this five or six block radius. 0:08:55.640,0:08:58.920 If you've been hearing police sirens[br]in your neighbourhood, 0:08:58.920,0:09:02.000 you can use the map to find out why. 0:09:02.000,0:09:05.680 If you're out at night in[br]an unfamiliar part of town, 0:09:05.680,0:09:09.240 you can check the map[br]for streets to avoid. 
0:09:09.240,0:09:12.400 If a neighbour gets burgled,[br]you can see - 0:09:12.400,0:09:16.520 is it a one-off or has there been[br]a spike in local crime? 0:09:16.520,0:09:19.480 If you commute through a[br]neighbourhood and you're worried 0:09:19.480,0:09:23.080 about its safety, the fact that we[br]have the ability to turn off all 0:09:23.080,0:09:25.360 the night-time[br]and middle-of-the-day crimes 0:09:25.360,0:09:28.280 and show you just the things that are[br]happening during the commute, 0:09:28.280,0:09:32.880 it is a statistical operation.[br]But I think to people that are[br]interacting with the thing 0:09:32.880,0:09:38.000 it feels very much more like they're[br]just sort of browsing a website[br]or shopping on Amazon. 0:09:38.000,0:09:43.520 They're looking at data[br]and they don't realise[br]they're doing statistics. 0:09:43.520,0:09:47.840 What's most exciting for me[br]is that public statistics 0:09:47.840,0:09:52.640 is making citizens more powerful and[br]the authorities more accountable. 0:10:02.360,0:10:04.760 We have community meetings that[br]the police attend 0:10:04.760,0:10:08.880 and what citizens are[br]now doing are bringing printouts 0:10:08.880,0:10:12.240 of the maps that show where crimes[br]are taking place, 0:10:12.240,0:10:16.120 and they're demanding services[br]from the police department 0:10:16.120,0:10:20.520 and the police department is now[br]having to change how they police, 0:10:20.520,0:10:22.960 how they provide policing services, 0:10:22.960,0:10:27.040 because the data is showing[br]what is working and what is not. 0:10:28.560,0:10:31.960 People in San Francisco[br]are also using public data 0:10:31.960,0:10:35.800 to map social inequalities[br]and see how to improve society. 0:10:35.800,0:10:39.720 And the possibilities are endless. 
0:10:39.720,0:10:43.160 I think our dream[br]government data analysis project 0:10:43.160,0:10:46.240 would really be focused on[br]live information, 0:10:46.240,0:10:51.240 on stuff that was being reported[br]and pushed out to the world over[br]the internet as it was happening. 0:10:51.240,0:10:55.040 You know, trash pickups,[br]traffic accidents, buses, 0:10:55.040,0:10:57.680 and I think through the kind of[br]stats-gathering power 0:10:57.680,0:11:02.520 of the internet[br]it's possible to really begin[br]to see the workings of the city 0:11:02.520,0:11:04.760 displayed as a unified interface. 0:11:07.320,0:11:09.960 So that's where we are heading. 0:11:09.960,0:11:14.760 Towards a world of free data[br]with all the statistical[br]insights that come from it, 0:11:14.760,0:11:21.800 accessible to everyone, empowering[br]us as citizens and letting us[br]hold our rulers to account. 0:11:21.800,0:11:26.920 It's a long way from[br]where statistics began. 0:11:26.920,0:11:32.880 Statistics[br]are essential to us to monitor[br]our governments and our societies. 0:11:32.880,0:11:36.760 But it was our rulers up[br]there who started 0:11:36.760,0:11:40.840 the collection of statistics in the[br]first place in order to monitor us! 0:11:46.880,0:11:51.440 In fact the word 'statistics'[br]comes from 'the state'. 0:11:51.440,0:11:55.600 Modern statistics[br]began two centuries ago. 0:11:55.600,0:11:59.080 Once it got going,[br]it spread and never stopped. 0:11:59.080,0:12:01.640 And guess who was first! 0:12:03.280,0:12:07.560 The Chinese have Confucius,[br]the Italians have da Vinci, 0:12:07.560,0:12:10.240 and the British have Shakespeare. 0:12:10.240,0:12:12.440 And we have the Tabellverket - 0:12:12.440,0:12:16.400 the first ever systematic[br]collection of statistics! 0:12:16.400,0:12:21.640 Since the year 1749[br]we have collected data 0:12:21.640,0:12:26.920 on every birth, marriage and death,[br]and we are proud of it! 
0:12:29.120,0:12:32.000 The Tabellverket recorded[br]information 0:12:32.000,0:12:34.040 from every parish in Sweden. 0:12:34.040,0:12:39.080 It was a huge quantity of data and[br]it was the first time any government 0:12:39.080,0:12:41.800 could get an accurate[br]picture of its people. 0:12:49.360,0:12:53.360 Sweden had been the greatest[br]military power in Northern Europe, 0:12:53.360,0:12:58.200 but by 1749 our star[br]was really fading 0:12:58.200,0:13:00.920 and other countries[br]were growing stronger. 0:13:00.920,0:13:03.600 At least we were a large power, 0:13:03.600,0:13:09.960 thought to have 20 million people,[br]enough to rival Britain and France. 0:13:13.400,0:13:18.160 But we were in for a nasty surprise. 0:13:18.160,0:13:20.680 The first analysis[br]of the Tabellverket 0:13:20.680,0:13:24.080 revealed that Sweden[br]only had two million inhabitants. 0:13:24.080,0:13:30.680 Sweden was not just a power[br]in decline, it also had[br]a very small population. 0:13:30.680,0:13:36.080 The government was horrified[br]by this finding -[br]what if the enemy found out? 0:13:37.840,0:13:44.560 But the Tabellverket also showed[br]that many women died in childbirth[br]and many children died young. 0:13:44.560,0:13:48.640 So government took action[br]to improve the health of the people. 0:13:48.640,0:13:52.440 This was the beginning[br]of modern Sweden. 0:13:53.960,0:13:59.000 It took more than 50 years before[br]the Austrians, Belgians, Danes, 0:13:59.000,0:14:02.320 Dutch, French, Germans, Italians 0:14:02.320,0:14:08.600 and, finally, the British,[br]caught up with Sweden[br]in collecting and using statistics. 0:14:24.640,0:14:29.640 It was called political arithmetic.[br]It was a lovely phrase[br]that was used for statistics. 
0:14:29.640,0:14:33.160 Governments could have much more[br]control and understanding of 0:14:33.160,0:14:36.840 the society - how it was working,[br]how it was developing 0:14:36.840,0:14:40.240 and essentially[br]so they could control it better. 0:14:43.360,0:14:47.960 It wasn't just governments who[br]woke up to the power of statistics. 0:14:47.960,0:14:54.600 Right across Europe, 19th[br]century society went mad for facts. 0:14:54.600,0:14:57.600 And, despite its late start,[br]Britain, 0:14:57.600,0:15:01.400 with its Royal Statistical Society[br]in London, 0:15:01.400,0:15:04.000 was soon a statisticians' nirvana. 0:15:05.920,0:15:09.960 I love looking at old copies of[br]the Royal Statistical Society journal 0:15:09.960,0:15:11.760 because it's full of such odd stuff. 0:15:11.760,0:15:14.840 There's a wonderful paper[br]from the 1840s 0:15:14.840,0:15:19.200 which shows a map of England and[br]the rates of bastardy in each county. 0:15:19.200,0:15:23.560 So you can identify very quickly the[br]areas with high rates of bastardy. 0:15:23.560,0:15:27.240 Being in East Anglia it always[br]makes me slightly laugh that Norfolk 0:15:27.240,0:15:30.720 seems to top the "bastardy league"[br]in the 1840s. 0:15:30.720,0:15:36.800 One of the founders of[br]the Royal Statistical Society 0:15:36.800,0:15:42.120 was the great[br]Victorian mathematician[br]and inventor Charles Babbage. 0:15:42.120,0:15:50.120 In 1842 he read the latest[br]poem by an equally great Victorian,[br]Alfred Tennyson. 0:15:50.120,0:15:53.120 Vision of Sin contained the lines: 0:15:53.120,0:15:55.800 "Fill the cup, and fill the can 0:15:55.800,0:15:58.160 "Have a rouse before the morn 0:15:58.160,0:16:03.720 "Every moment dies a man[br]Every moment one is born." 0:16:03.720,0:16:07.360 So keen a statistician was Babbage[br]that he could not contain himself. 
0:16:07.360,0:16:09.360 He dashed off a letter to Tennyson 0:16:09.360,0:16:12.200 explaining that because of[br]population growth, 0:16:12.200,0:16:13.640 the line should read, 0:16:13.640,0:16:18.640 "Every moment dies a man[br]and one and a 16th is born." 0:16:18.640,0:16:22.480 I may add that[br]the exact figure is 1.067, 0:16:22.480,0:16:27.200 but something must be[br]conceded to the laws of metre. 0:16:31.840,0:16:36.640 In the 19th century, scholars all[br]over Europe did amazing work 0:16:36.640,0:16:39.000 in measuring their societies. 0:16:39.000,0:16:42.600 They were hoovering up[br]data on almost everything. 0:16:42.600,0:16:46.040 But numbers alone[br]don't tell you anything. 0:16:46.040,0:16:51.320 You have to analyse them,[br]and that's what makes statistics. 0:16:55.760,0:16:59.200 When the first statisticians[br]began to get to grips with 0:16:59.200,0:17:00.400 analysing their data 0:17:00.400,0:17:05.760 they seized upon the average, and[br]they took the average of everything. 0:17:09.720,0:17:13.760 What's so great[br]about an average is that 0:17:13.760,0:17:18.640 you can take a whole mass of data[br]and reduce it to a single number. 0:17:21.880,0:17:26.119 And though each of us is unique,[br]our collective lives produce 0:17:26.119,0:17:29.880 averages that can[br]characterise whole populations. 0:17:41.280,0:17:45.360 I looked in my local newspaper[br]one week and saw a pensioner 0:17:45.360,0:17:49.440 had accidentally put her foot on[br]the accelerator 0:17:49.440,0:17:52.560 and crushed her friend[br]against a wall. 0:17:52.560,0:17:56.360 Devastating, hideous,[br]horrible thing to happen. 0:17:56.360,0:18:01.400 And then there was a second one about[br]a young man who didn't have 0:18:01.400,0:18:07.040 a driving licence, was driving a car[br]under the influence of drugs[br]and alcohol 0:18:07.040,0:18:10.320 and he bashed into a pedestrian[br]and killed him. 
0:18:10.320,0:18:15.560 What's remarkable, absolutely[br]remarkable, if you look at the number 0:18:15.560,0:18:22.880 of people who die each year[br]in traffic crashes,[br]it's nearly a constant. 0:18:22.880,0:18:24.480 What? 0:18:24.480,0:18:31.680 All these individual events,[br]somehow when you sum them all up[br]there's the same number every year. 0:18:31.680,0:18:35.080 And every year, two and a half[br]times as many men 0:18:35.080,0:18:38.880 die in traffic crashes[br]as women, and it's a constant. 0:18:38.880,0:18:44.320 And every year the rate in Belgium[br]is double the rate in England. 0:18:44.320,0:18:47.160 There are these[br]remarkable regularities. 0:18:47.160,0:18:54.800 So that these individual[br]particular events sum up[br]into a social phenomenon. 0:18:56.560,0:18:58.120 Let's see what Sweden have done. 0:18:58.120,0:19:01.560 We used to boast about fast social[br]progress, that's where we were.... 0:19:01.560,0:19:05.240 'In my lectures, to tell stories[br]about the changing world, 0:19:05.240,0:19:08.120 'I use the averages[br]from entire countries, 0:19:08.120,0:19:12.160 'whether the average of income,[br]child mortality, family size 0:19:12.160,0:19:13.360 'or carbon output.' 0:19:13.360,0:19:16.200 OK, I give you Singapore.[br]The year I was born, 0:19:16.200,0:19:20.560 Singapore had twice the child[br]mortality of Sweden, the most[br]tropical country in the world, 0:19:20.560,0:19:22.920 a marshland on[br]the Equator, and here we go. 0:19:22.920,0:19:25.160 It took a little time for them[br]to get independent, 0:19:25.160,0:19:27.160 but then they started to grow[br]their economy, 0:19:27.160,0:19:29.840 and they made the social investment,[br]they got away malaria, 0:19:29.840,0:19:33.360 they got a magnificent health system[br]that beat both US and Sweden. 0:19:33.360,0:19:37.600 We never thought it would happen[br]that they would win over Sweden! 
0:19:37.600,0:19:40.520 LAUGHTER AND APPLAUSE 0:19:40.520,0:19:46.400 But useful as averages are,[br]they don't tell you the whole story. 0:19:48.800,0:19:53.040 On average, Swedish people have[br]slightly less than two legs. 0:19:53.040,0:19:57.560 This is because few people[br]only have one leg or no legs, 0:19:57.560,0:19:59.760 and no-one has three legs. 0:19:59.760,0:20:06.240 So almost everybody in Sweden[br]has more than[br]the average number of legs. 0:20:06.240,0:20:10.840 The variation in data is just[br]as important as the average. 0:20:16.800,0:20:19.400 But how do you get[br]a handle on variation? 0:20:19.400,0:20:23.000 For this, you transform[br]numbers into shapes. 0:20:23.000,0:20:26.320 Let's look again at the number of[br]adult women in Sweden 0:20:26.320,0:20:27.800 for different heights. 0:20:27.800,0:20:31.800 Plotting the data as a shape[br]shows how much their heights 0:20:31.800,0:20:36.400 vary from the average[br]and how wide that variation is. 0:20:36.400,0:20:41.520 The shape a set of data makes[br]is called its distribution. 0:20:41.520,0:20:46.080 This is the income distribution[br]of China, 1970. 0:20:46.080,0:20:51.000 This is the income distribution[br]of the United States, 1970. 0:20:51.000,0:20:54.080 Almost no overlap,[br]and what has happened? 0:20:54.080,0:20:56.880 China is growing,[br]it's not so equal any longer, 0:20:56.880,0:21:01.120 and it's appearing here[br]overlooking the United States. 0:21:01.120,0:21:03.480 Almost like a ghost, isn't it? 0:21:03.480,0:21:05.160 It's pretty scary. 0:21:05.160,0:21:06.680 Rrrr! 0:21:06.680,0:21:08.200 LAUGHTER 0:21:17.160,0:21:21.280 The statisticians[br]who first explored distribution 0:21:21.280,0:21:25.760 discovered one shape[br]that turned up again and again. 
0:21:25.760,0:21:28.120 The Victorian scholar[br]Francis Galton 0:21:28.120,0:21:32.400 was so fascinated he built[br]a machine that could reproduce it, 0:21:32.400,0:21:36.080 and he found it fitted so many[br]different sets of measurements 0:21:36.080,0:21:38.640 that he named it[br]the normal distribution. 0:21:38.640,0:21:45.600 Whether it was people's arm spans,[br]lung capacities, 0:21:45.600,0:21:47.400 or even their exam results, 0:21:47.400,0:21:51.360 the normal distribution shape[br]recurred time and time again. 0:21:51.360,0:21:56.360 Other statisticians soon found[br]many other regular shapes, 0:21:56.360,0:22:01.360 each produced by particular kinds[br]of natural or social processes. 0:22:01.360,0:22:05.400 And every statistician[br]has their favourite. 0:22:05.400,0:22:09.280 The Poisson distribution, the Poisson[br]shape is my favourite distribution. 0:22:09.280,0:22:11.120 I think it's an absolute cracker. 0:22:15.760,0:22:18.720 The Poisson shape[br]describes how likely it is 0:22:18.720,0:22:21.680 that out-of-the-ordinary things[br]will happen. 0:22:21.680,0:22:24.520 Imagine a London bus stop where[br]we know that on average 0:22:24.520,0:22:26.280 we'll get three buses in an hour. 0:22:26.280,0:22:29.280 We won't always get[br]three buses, of course. 0:22:29.280,0:22:33.480 Amazingly, the Poisson shape will[br]show us the probability 0:22:33.480,0:22:37.200 that in any given hour we will get[br]four, five, or six buses, 0:22:37.200,0:22:39.440 or no buses at all. 0:22:40.720,0:22:43.480 The exact shape changes[br]with the average. 0:22:43.480,0:22:46.920 But whether it's how many people[br]will win the lottery jackpot 0:22:46.920,0:22:48.000 each week, 0:22:48.000,0:22:51.200 or how many people will phone[br]a call centre each minute, 0:22:51.200,0:22:54.120 the Poisson shape[br]will give the probabilities. 
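The bus-stop example above can be worked through numerically. What follows is only a rough sketch, not anything shown in the programme: it assumes the standard Poisson formula P(k) = e^(-m) * m^k / k!, where m is the average (three buses an hour here), and the names `poisson_pmf` and `MEAN_BUSES_PER_HOUR` are made up for illustration.

```python
import math

# Assumed average from the narration: three buses per hour.
MEAN_BUSES_PER_HOUR = 3.0


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of seeing exactly k events when the average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


if __name__ == "__main__":
    # Print the probability of 0, 1, ..., 6 buses arriving in a given hour.
    for buses in range(7):
        probability = poisson_pmf(buses, MEAN_BUSES_PER_HOUR)
        print(f"{buses} buses in an hour: {probability:.3f}")
```

Changing `MEAN_BUSES_PER_HOUR` shifts and reshapes the whole set of probabilities, which is what is meant by the exact shape changing with the average.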
0:22:57.240,0:23:01.240 The wonderful example where this was[br]applied to in the late 19th century 0:23:01.240,0:23:04.400 was to count each year the number of[br]Prussian officers, 0:23:04.400,0:23:07.520 cavalry officers, who were kicked[br]to death by their horses. 0:23:07.520,0:23:10.240 Now, some years there were none,[br]some years there were one, 0:23:10.240,0:23:13.880 some years there were two,[br]up to seven, I think,[br]one particularly bad year. 0:23:13.880,0:23:16.680 But with this distribution,[br]however many years there were 0:23:16.680,0:23:19.640 with nought, one, two, three,[br]four Prussian cavalry officers 0:23:19.640,0:23:23.880 kicked to death by their horses,[br]beautifully obeyed[br]the Poisson distribution. 0:23:42.800,0:23:48.520 So statisticians use shapes to[br]reveal the patterns in the data. 0:23:48.520,0:23:51.000 But we also use images of all kinds 0:23:51.000,0:23:54.480 to communicate statistics[br]to a wider public. 0:23:54.480,0:23:57.320 Because if the story in the numbers 0:23:57.320,0:24:02.920 is told by a beautiful and clever[br]image, then everyone understands. 0:24:02.920,0:24:09.640 Of the pioneers[br]of statistical graphics,[br]my favourite is Florence Nightingale. 0:24:24.280,0:24:27.120 There are not many people who realise[br]that she was known 0:24:27.120,0:24:30.520 as a passionate statistician[br]and not just the Lady of the Lamp. 0:24:30.520,0:24:34.720 She said that "to understand God's[br]thoughts, we must study statistics, 0:24:34.720,0:24:37.080 "for these are[br]the measure of His purpose." 0:24:37.080,0:24:40.520 Statistics was for her a religious[br]duty and moral imperative. 0:24:42.080,0:24:45.360 When Florence was nine years old[br]she started collecting data. 0:24:45.360,0:24:48.320 Her data was different[br]fruits and vegetables she found. 0:24:48.320,0:24:50.080 Put them into different tables. 0:24:50.080,0:24:52.640 Trying to organise them[br]in some standard form. 
0:24:52.640,0:24:55.640 And so we have one of Nightingale's[br]first statistical tables 0:24:55.640,0:24:57.440 at the age of nine. 0:25:04.360,0:25:11.440 In the mid 1850s Florence[br]Nightingale went to the Crimea to[br]care for British casualties of war. 0:25:11.440,0:25:14.400 She was horrified by[br]what she discovered. 0:25:14.400,0:25:19.920 For all the soldiers being blown[br]to bits on the battlefield,[br]there were many, many more soldiers 0:25:19.920,0:25:25.200 dying from diseases they caught[br]in the army's filthy hospitals. 0:25:25.200,0:25:29.120 So Florence Nightingale[br]began counting the dead. 0:25:29.120,0:25:34.920 For two years she recorded[br]mortality data in meticulous detail. 0:25:34.920,0:25:39.120 When the war was over she persuaded[br]the government to set up 0:25:39.120,0:25:41.360 a Royal Commission of Inquiry, 0:25:41.360,0:25:44.680 and gathered her data[br]in a devastating report. 0:25:44.680,0:25:48.480 What has cemented her place in[br]the statistical history books 0:25:48.480,0:25:50.120 are the graphics she used. 0:25:50.120,0:25:53.960 And one in particular,[br]the polar area graph. 0:25:53.960,0:25:58.680 For each month of the war,[br]a huge blue wedge represented 0:25:58.680,0:26:02.200 the soldiers who had died[br]from preventable diseases. 0:26:02.200,0:26:05.560 The much smaller red wedges were[br]deaths from wounds, 0:26:05.560,0:26:10.600 and the black wedges were deaths[br]from accidents and other causes. 0:26:10.600,0:26:17.040 Nightingale's graphics were so clear[br]they were impossible to ignore. 0:26:17.040,0:26:19.360 The usual thing around[br]Florence Nightingale's time 0:26:19.360,0:26:23.920 was just to produce tables and[br]tables of figures - absolutely[br]really tedious stuff that, 0:26:23.920,0:26:26.320 unless you're an absolutely dedicated[br]statistician, 0:26:26.320,0:26:29.240 it's really quite difficult to spot[br]the patterns quite naturally. 
0:26:29.240,0:26:33.480 But visualisations, they tell a[br]story, they tell a story immediately. 0:26:33.480,0:26:38.480 And the use of colour[br]and the use of shape can[br]really tell a powerful story. 0:26:38.480,0:26:41.280 And nowadays of course[br]we can make things move as well. 0:26:41.280,0:26:44.320 Florence Nightingale would have[br]loved to have played with... 0:26:44.320,0:26:48.800 She would have[br]produced wonderful animations,[br]I'm absolutely certain of it. 0:26:50.880,0:26:54.800 Today, 150 years on,[br]Nightingale's graphics 0:26:54.800,0:26:57.800 are rightly regarded as a classic. 0:26:57.800,0:27:00.600 They led to a revolution[br]in nursing, health care 0:27:00.600,0:27:05.880 and hygiene in hospitals worldwide,[br]which saved innumerable lives. 0:27:07.400,0:27:11.040 And statistical graphics has[br]become an art form of its very own, 0:27:11.040,0:27:16.280 led by designers who are[br]passionate about visualising data. 0:27:24.640,0:27:27.120 This is the Billion Pound-O-Gram. 0:27:27.120,0:27:29.120 This image arose out of frustration 0:27:29.120,0:27:32.280 with the reporting of billion pound[br]amounts in the media. 0:27:32.280,0:27:34.400 £500 billion for this war. 0:27:34.400,0:27:36.000 £50 billion for this oil spill. 0:27:36.000,0:27:39.440 It doesn't make sense -[br]the numbers are too enormous[br]to get your mind round. 0:27:39.440,0:27:43.520 So I scraped all this data[br]from various news sources[br]and created this diagram. 0:27:43.520,0:27:48.680 So the[br]squares here are scaled according[br]to the billion pound amounts. 0:27:48.680,0:27:51.840 When you see numbers visualised[br]like this 0:27:51.840,0:27:54.240 you start to have a different[br]relationship with them. 0:27:54.240,0:27:56.840 You can start to see the patterns,[br]and the scale of them. 0:27:56.840,0:27:59.600 Here in the corner,[br]this little square - £37 billion. 0:27:59.600,0:28:02.800 This was the predicted cost[br]of the Iraq war in 2003.
0:28:02.800,0:28:06.480 As you can see it's grown[br]exponentially over the last few years 0:28:06.480,0:28:10.560 and the total cost now is[br]around about £2,500 billion. 0:28:10.560,0:28:13.000 It's funny because when[br]you visualise statistics 0:28:13.000,0:28:15.360 you understand them,[br]and when you understand them 0:28:15.360,0:28:18.400 you can really start to put things[br]in perspective. 0:28:23.960,0:28:27.880 Visualisation is right at[br]the heart of my own work too. 0:28:27.880,0:28:30.160 I teach global health. 0:28:30.160,0:28:33.840 And I know having the data[br]is not enough - 0:28:33.840,0:28:39.160 I have to show it in ways people[br]both enjoy and understand. 0:28:39.160,0:28:42.960 Now I'm going to try something[br]I've never done before. 0:28:42.960,0:28:45.960 Animating the data in real space, 0:28:45.960,0:28:50.480 with a bit of technical[br]assistance from the crew. 0:28:50.480,0:28:52.240 So here we go. 0:28:52.240,0:28:54.200 First, an axis for health. 0:28:54.200,0:28:58.920 Life expectancy[br]from 25 years to 75 years. 0:28:58.920,0:29:01.440 And down here an axis for wealth. 0:29:01.440,0:29:06.720 Income per person -[br]400, 4,000, 40,000. 0:29:06.720,0:29:10.480 So down here is poor and sick. 0:29:10.480,0:29:14.280 And up here is rich and healthy. 0:29:14.280,0:29:18.320 Now I'm going to show you the world 0:29:18.320,0:29:21.080 200 years ago, in 1810. 0:29:21.080,0:29:22.880 Here come all the countries. 0:29:22.880,0:29:26.200 Europe, brown;[br]Asia, red; Middle East, green; 0:29:26.200,0:29:29.440 Africa south of the Sahara,[br]blue; and the Americas, yellow. 0:29:29.440,0:29:33.760 And the size of the country bubble[br]shows the size of the population. 0:29:33.760,0:29:37.560 In 1810, it was pretty crowded[br]down there, wasn't it? 0:29:37.560,0:29:39.760 All countries were sick and poor. 0:29:39.760,0:29:43.360 Life expectancy[br]was below 40 in all countries. 
0:29:43.360,0:29:48.680 And only UK and the Netherlands were[br]slightly better off. But not much. 0:29:48.680,0:29:52.520 And now I start the world. 0:29:52.520,0:29:56.840 The industrial revolution makes[br]countries in Europe and elsewhere 0:29:56.840,0:29:59.040 move away from the rest. 0:29:59.040,0:30:02.280 But the colonized countries[br]in Asia and Africa, 0:30:02.280,0:30:04.040 they are stuck down there. 0:30:04.040,0:30:08.200 And eventually the Western countries[br]get healthier and healthier. 0:30:08.200,0:30:13.320 And now we slow down to show[br]the impact of the First World War 0:30:13.320,0:30:15.880 and the Spanish flu epidemic. 0:30:15.880,0:30:18.320 What a catastrophe! 0:30:18.320,0:30:22.640 And now I speed up through[br]the 1920s and the 1930s and, 0:30:22.640,0:30:24.400 in spite of the Great Depression, 0:30:24.400,0:30:27.800 Western countries forge on towards[br]greater wealth and health. 0:30:27.800,0:30:29.880 Japan and some others try to follow. 0:30:29.880,0:30:32.560 But most countries stay down here. 0:30:32.560,0:30:35.640 And after the tragedies[br]of the Second World War, 0:30:35.640,0:30:39.400 we stop a bit to look[br]at the world in 1948. 0:30:39.400,0:30:42.080 1948 was a great year. 0:30:42.080,0:30:43.280 The war was over, 0:30:43.280,0:30:48.000 Sweden topped the medal table at[br]the Winter Olympics and I was born. 0:30:48.000,0:30:51.280 But the differences between[br]the countries of the world 0:30:51.280,0:30:52.680 was wider than ever. 0:30:52.680,0:30:54.960 United States was in the front. 0:30:54.960,0:30:56.840 Japan was catching up. 0:30:56.840,0:30:58.400 Brazil was way behind, 0:30:58.400,0:31:03.040 Iran was getting a little richer[br]from oil but still had short lives. 0:31:03.040,0:31:05.160 And the Asian giants... 0:31:05.160,0:31:08.720 China, India, Pakistan, Bangladesh,[br]and Indonesia, 0:31:08.720,0:31:11.360 they were still[br]poor and sick down here. 
0:31:11.360,0:31:14.360 But look what was about to happen![br]Here we go again. 0:31:14.360,0:31:18.640 In my lifetime, former colonies[br]gained independence and then finally 0:31:18.640,0:31:22.640 they started to get healthier[br]and healthier and healthier. 0:31:22.640,0:31:26.080 And in the 1970s, then countries[br]in Asia and Latin America 0:31:26.080,0:31:28.960 started to catch up[br]with the Western countries. 0:31:28.960,0:31:31.240 They became the emerging economies. 0:31:31.240,0:31:32.640 Some in Africa follows, 0:31:32.640,0:31:36.440 some Africans were stuck in civil[br]war, and others were hit by HIV. 0:31:36.440,0:31:41.840 And now we can see the world[br]in the most up-to-date statistics. 0:31:42.840,0:31:45.480 Most people today[br]live in the middle. 0:31:45.480,0:31:48.080 But there is huge difference[br]at the same time 0:31:48.080,0:31:51.520 between the best-off countries[br]and the worst-off countries. 0:31:51.520,0:31:54.520 And there are also huge[br]inequalities within countries. 0:31:54.520,0:31:59.000 These bubbles show country averages[br]but I can split them. 0:31:59.000,0:32:02.120 Take China. I can split it[br]into provinces. 0:32:02.120,0:32:05.120 There goes Shanghai... 0:32:05.120,0:32:08.000 It has the same health[br]and wealth as Italy today. 0:32:08.000,0:32:11.240 And there[br]is the poor inland province Guizhou, 0:32:11.240,0:32:12.680 it is like Pakistan. 0:32:12.680,0:32:18.800 And if I split it further, the rural[br]parts are like Ghana in Africa. 0:32:19.800,0:32:23.160 And yet, despite the enormous[br]disparities today, 0:32:23.160,0:32:27.240 we have seen 200 years[br]of remarkable progress! 0:32:27.240,0:32:31.720 That huge historical gap between[br]the west and the rest is now closing. 0:32:31.720,0:32:35.640 We have become an entirely[br]new, converging world. 0:32:35.640,0:32:37.960 And I see a clear trend[br]into the future. 
0:32:37.960,0:32:40.840 With aid, trade, green[br]technology and peace, 0:32:40.840,0:32:43.720 it's fully possible[br]that everyone can make it 0:32:43.720,0:32:45.640 to the healthy, wealthy corner. 0:32:48.000,0:32:51.360 Well, what you've just seen[br]in the last few minutes 0:32:51.360,0:32:56.520 is a story of 200 countries[br]shown over 200 years and beyond. 0:32:56.520,0:33:00.960 It involved plotting[br]120,000 numbers. 0:33:00.960,0:33:02.560 Pretty neat, huh? 0:33:07.960,0:33:13.120 So, with statistics, we can begin[br]to see things as they really are. 0:33:13.120,0:33:18.200 From tables of data to averages,[br]distributions and visualisations, 0:33:18.200,0:33:22.640 statistics gives us a[br]clear description of the world. 0:33:22.640,0:33:28.200 But, with statistics, we can[br]not only discover WHAT is happening 0:33:28.200,0:33:30.520 but also explore WHY, 0:33:30.520,0:33:34.480 by using the powerful analytical[br]method - correlation. 0:33:35.480,0:33:38.400 Just looking at one thing at a[br]time doesn't tell you very much. 0:33:38.400,0:33:41.280 You've got to look at the[br]relationships between things, 0:33:41.280,0:33:43.360 how they change,[br]how they vary together. 0:33:43.360,0:33:45.360 That's what correlation is about. 0:33:45.360,0:33:48.320 That's how you start trying[br]to understand the processes 0:33:48.320,0:33:50.960 that are really going on[br]in the world and society. 0:33:52.480,0:33:57.000 Most of us today would recognise[br]that crime correlates to poverty, 0:33:57.000,0:34:00.200 that infection correlates[br]to poor sanitation, 0:34:00.200,0:34:02.600 and that knowledge of statistics[br]correlates 0:34:02.600,0:34:05.040 to being great at dancing! 0:34:06.560,0:34:10.199 Correlations can be very tricky. 0:34:10.199,0:34:12.960 I got a joke about[br]silly correlations. 0:34:12.960,0:34:15.840 There was this American who[br]was afraid of heart attack. 
0:34:15.840,0:34:19.920 He found out that[br]the Japanese ate very little fat 0:34:19.920,0:34:22.320 and almost didn't drink wine, 0:34:22.320,0:34:25.520 but they had much less[br]heart attacks than the Americans. 0:34:25.520,0:34:28.639 But, on the other hand,[br]he also found out that the French 0:34:28.639,0:34:35.080 eat as much fat as the Americans[br]and they drink much more wine but[br]they also have less heart attacks. 0:34:35.080,0:34:40.840 So he concluded that what kills you[br]is speaking English. 0:34:40.840,0:34:43.920 # Smoke, smoke,[br]smoke that cigarette 0:34:43.920,0:34:48.000 # Puff, puff, puff and if you[br]smoke yourself to death... # 0:34:48.000,0:34:51.920 The time, the pace,[br]the cigarette. Weights Tilt. 0:34:51.920,0:34:56.199 The best example of a really[br]ground-breaking correlation 0:34:56.199,0:35:01.640 is the link that was established[br]in the 1950s between[br]smoking and lung cancer. 0:35:01.640,0:35:07.040 Not long after the Second World War,[br]a British doctor, Richard Doll, 0:35:07.040,0:35:11.040 investigated lung cancer patients[br]in 20 London hospitals. 0:35:11.040,0:35:15.400 And he became certain[br]that the only thing they had[br]in common was smoking. 0:35:15.400,0:35:18.280 So certain,[br]that he stopped smoking himself. 0:35:18.280,0:35:22.160 But other people weren't so sure. 0:35:22.160,0:35:25.400 A lot of the discussion[br]of the early data, 0:35:25.400,0:35:29.120 linking smoking to lung cancer, said,[br]"It's not the smoking, surely, 0:35:29.120,0:35:32.600 "that thing we've done all our lives,[br]that can't be bad for you. 0:35:32.600,0:35:35.000 "Maybe it's genes. 0:35:35.000,0:35:39.080 "Maybe people who are genetically[br]predisposed to get lung cancer 0:35:39.080,0:35:43.840 "are also genetically[br]predisposed to smoke." 
0:35:43.840,0:35:47.360 "Maybe it's not the smoking,[br]maybe it's air pollution - 0:35:47.360,0:35:52.520 "that smokers are somehow[br]more exposed to air pollution[br]than non-smokers. 0:35:52.520,0:35:56.280 "Maybe it's not smoking,[br]maybe it's poverty." 0:35:56.280,0:36:00.720 So now we've got three alternative[br]explanations, apart from chance. 0:36:02.240,0:36:06.760 To verify his correlation[br]did imply cause and effect, 0:36:06.760,0:36:10.680 Richard Doll created the biggest[br]statistical study of smoking yet. 0:36:10.680,0:36:14.680 He began tracking the lives[br]of 40,000 British doctors, 0:36:14.680,0:36:17.000 some of whom smoked[br]and some of whom didn't, 0:36:17.000,0:36:19.440 and gathered enough data 0:36:19.440,0:36:22.000 to correlate the amount[br]the doctors smoked 0:36:22.000,0:36:24.920 with their likelihood[br]of getting cancer. 0:36:24.920,0:36:30.120 Eventually, he not only[br]showed a correlation between[br]smoking and lung cancer, 0:36:30.120,0:36:35.800 but also a correlation[br]between stopping smoking[br]and reducing the risk. 0:36:35.800,0:36:37.760 This was science at its best. 0:36:39.760,0:36:44.000 What correlations do not replace[br]is human thought. 0:36:44.000,0:36:46.760 You've got to think[br]about what it means. 0:36:46.760,0:36:50.480 What a good scientist does,[br]if he comes with a correlation, 0:36:50.480,0:36:55.960 is try as hard as she or he[br]possibly can to disprove it, 0:36:55.960,0:37:00.200 to break it down, to get rid of it,[br]to try and refute it. 0:37:00.200,0:37:05.440 And if it withstands[br]all those efforts at demolishing it 0:37:05.440,0:37:10.760 and it is still standing up then,[br]cautiously, you say, "We really[br]might have something here." 0:37:26.720,0:37:32.840 However brilliant the scientist,[br]data is still the oxygen of science.
0:37:32.840,0:37:39.320 The good news is that the more we[br]have, the more correlations we'll[br]find, the more theories we'll test, 0:37:39.320,0:37:42.240 and the more discoveries[br]we're likely to make. 0:37:46.160,0:37:53.440 And history shows how our total sum[br]of information grows in huge leaps[br]as we develop new technologies. 0:37:53.440,0:38:00.000 The invention of the[br]printing press kicked off the first[br]data and information explosion. 0:38:00.000,0:38:06.000 If you piled up all the books that[br]had been printed by the year 1700, 0:38:06.000,0:38:11.200 they would make 60 stacks[br]each as high as Mount Everest. 0:38:12.880,0:38:15.360 Then, starting in the 19th century, 0:38:15.360,0:38:19.880 there came a second information[br]revolution with the telegraph, 0:38:19.880,0:38:23.960 gramophone and camera.[br]And later radio and TV. 0:38:23.960,0:38:28.200 The total amount[br]of information exploded. 0:38:28.200,0:38:35.200 And by the 1950s[br]the information available to us all[br]had multiplied 6,000 times. 0:38:35.200,0:38:41.440 Then, thanks to the computer and[br]later the internet, we went digital. 0:38:41.440,0:38:47.200 And the amount of data we have now[br]is unimaginably vast. 0:38:49.920,0:38:55.080 A single letter printed in a book[br]is equivalent to a byte of data. 0:38:55.080,0:38:58.720 A printed page[br]equals a kilobyte or two. 0:39:01.960,0:39:06.240 Five megabytes is enough for[br]the complete works of Shakespeare. 0:39:08.000,0:39:11.680 10 gigabytes - that's a DVD movie. 0:39:16.840,0:39:23.360 Two terabytes[br]is the tens of millions of photos[br]added to Facebook every day. 0:39:24.880,0:39:32.200 Ten petabytes is the data recorded[br]every second by the world's[br]largest particle accelerator. 0:39:32.200,0:39:35.800 So much[br]only a tiny fraction is kept. 0:39:35.800,0:39:43.440 Six exabytes is what you'd have[br]if you sequenced the genomes[br]of every single person on Earth. 
0:39:48.680,0:39:50.520 But really, that's nothing. 0:39:50.520,0:39:55.080 In 2009, the internet[br]added up to 500 exabytes. 0:39:55.080,0:40:02.120 In 2010, in just one year, that will[br]double to more than one zettabyte! 0:40:06.360,0:40:14.000 Back in the real world, if we[br]turned all this data into print[br]it would make 90 stacks of books, 0:40:14.000,0:40:18.560 each reaching from here[br]all the way to the sun! 0:40:18.560,0:40:23.600 The data deluge is staggering,[br]but, with today's computers 0:40:23.600,0:40:28.200 and statistics,[br]I'm confident we can handle it. 0:40:28.200,0:40:31.400 When it comes to all the data[br]on the internet, 0:40:31.400,0:40:33.760 the powerhouse[br]of statistical analysis 0:40:33.760,0:40:37.560 is the Silicon Valley giant Google. 0:40:44.000,0:40:50.600 The average person over their[br]lifetime is exposed to about 100[br]million words of conversation. 0:40:50.600,0:40:54.840 And so if you multiply that by the[br]six billion people on the planet, 0:40:54.840,0:40:58.040 that amount of words is about[br]equal to the number of words 0:40:58.040,0:41:01.080 that Google has available[br]at any one instant in time. 0:41:03.480,0:41:08.680 Google's computers hoover up[br]and file away every document,[br]web page, and image they can find. 0:41:08.680,0:41:14.640 They then hunt for patterns and[br]correlations in all this data, 0:41:14.640,0:41:17.760 doing statistics on a massive scale. 0:41:17.760,0:41:25.560 And, for me, Google has one project[br]that's particularly exciting -[br]statistical language translation. 0:41:25.560,0:41:30.880 We wanted to provide access[br]to all the web's information,[br]no matter what language you spoke. 0:41:30.880,0:41:33.520 There's just so much information[br]on the internet, 0:41:33.520,0:41:37.880 you couldn't hope to translate it all[br]by hand into every possible language. 0:41:37.880,0:41:41.560 We figured we'd have to be able[br]to do machine translation.
0:41:44.280,0:41:47.360 In the past, programmers[br]tried to teach their computers 0:41:47.360,0:41:53.320 to see each language as a set of[br]grammatical rules - much like the[br]way languages are taught at school. 0:41:53.320,0:41:58.760 But this didn't work because no set[br]of rules could capture a language 0:41:58.760,0:42:01.480 in all its subtlety and ambiguity. 0:42:01.480,0:42:05.840 "Having eaten our lunch[br]the coach departed." 0:42:05.840,0:42:07.920 Well, that's obviously incorrect. 0:42:07.920,0:42:12.000 Written like that it would imply[br]that the coach has eaten the lunch. 0:42:12.000,0:42:15.160 It would be far better to say... 0:42:15.160,0:42:19.920 "having eaten our lunch[br]we departed in the coach." 0:42:19.920,0:42:26.320 Those rules are helpful and they are[br]useful most of time, but they don't[br]turn out to be true all the time. 0:42:26.320,0:42:30.320 And the insight of using statistical[br]machine translation is saying, 0:42:30.320,0:42:35.280 "If you've got to have all these[br]exceptions anyways, maybe you can get[br]by without having any of the rules. 0:42:35.280,0:42:39.480 "Maybe you can treat everything[br]as an exception." And that's[br]essentially what we've done. 0:42:48.840,0:42:52.640 What the computer is doing when[br]he's learning how to translate 0:42:52.640,0:42:55.160 is to learn correlations[br]between words 0:42:55.160,0:42:57.240 and correlations between phrases. 0:42:57.240,0:43:00.840 So we feed the system very large[br]amounts of data 0:43:00.840,0:43:04.720 and then the system is seeing that[br]a certain word or a certain phrase 0:43:04.720,0:43:07.600 correlates very often[br]to the other language. 0:43:09.800,0:43:15.800 Google's website currently[br]offers translation between[br]any of 57 different languages. 0:43:15.800,0:43:22.680 It does this purely statistically,[br]having correlated a huge collection[br]of multilingual texts. 
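The idea of learning correlations between words from paired texts can be sketched in a few lines of Python. This is a toy illustration only, not Google's system: the four Swedish-English sentence pairs are invented for the example, and the function names cooccurrence_scores and best_translation, as well as the choice of the Dice coefficient as the measure of association, are assumptions made here.

```python
from collections import Counter, defaultdict


def cooccurrence_scores(sentence_pairs):
    """Score every (source word, target word) pair by how often the two
    words turn up in the same aligned sentence pair.

    The score is the Dice coefficient, 2 * together / (count_a + count_b),
    chosen for this sketch only; real systems use far richer models.
    """
    src_counts = Counter()   # sentences containing each source word
    tgt_counts = Counter()   # sentences containing each target word
    pair_counts = Counter()  # sentence pairs containing both words

    for src, tgt in sentence_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        for s in src_words:
            for t in tgt_words:
                pair_counts[(s, t)] += 1

    scores = defaultdict(dict)
    for (s, t), together in pair_counts.items():
        scores[s][t] = 2 * together / (src_counts[s] + tgt_counts[t])
    return scores


def best_translation(scores, word):
    """Return the target word most strongly associated with `word`,
    or None if the word never appeared in the training pairs."""
    candidates = scores.get(word.lower(), {})
    return max(candidates, key=candidates.get) if candidates else None


# Invented toy corpus (hypothetical data, for illustration only).
pairs = [
    ("jag har en ring", "i have a ring"),
    ("jag har en bok", "i have a book"),
    ("du har en ring", "you have a ring"),
    ("du ser en bok", "you see a book"),
]

scores = cooccurrence_scores(pairs)
print(best_translation(scores, "bok"))  # book
```

No grammar rule appears anywhere in the sketch: "bok" is matched to "book" purely because the two words keep appearing in the same sentence pairs, and the more text such a system is fed, the more reliable those counts become.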
0:43:22.680,0:43:25.600 The people that built the system[br]don't need to know Chinese 0:43:25.600,0:43:29.800 in order to build the[br]Chinese-to-English system,[br]or they don't need to know Arabic. 0:43:29.800,0:43:33.040 But the expertise that's needed is[br]basically knowledge of statistics, 0:43:33.040,0:43:35.840 knowledge of computer science,[br]knowledge of infrastructure 0:43:35.840,0:43:40.880 to build those very large[br]computational systems[br]that we are building for doing that. 0:43:42.880,0:43:48.360 I hooked up with Google[br]from my office in Stockholm[br]to try the translator for myself. 0:43:48.360,0:43:51.760 'I will type...[br]some Swedish sentences.' 0:43:51.760,0:43:53.080 OK. 0:43:53.080,0:43:55.240 Sveriges... 0:43:55.240,0:43:59.280 ..guldring i örat. 0:44:00.920,0:44:07.400 OK. So it says, "Sweden's finance[br]minister has a ponytail[br]and a gold ring in your ear." 0:44:07.400,0:44:11.520 I guess it probably means[br]in his ear. 'That's exactly[br]correct, it's amazing! 0:44:11.520,0:44:15.400 'He comes from the Conservative[br]party, that's the kind[br]of Sweden we have today. 0:44:15.400,0:44:18.520 'I will type one more sentence.' 0:44:18.520,0:44:22.080 'I sitt samkönade...' 0:44:22.080,0:44:25.600 partnerskap... 0:44:25.600,0:44:28.280 nya biskop. 0:44:28.280,0:44:35.200 "In his same-sex partnership[br]has Stockholm's new bishop[br]and his partners a three-year son." 0:44:35.200,0:44:38.120 It's almost perfect,[br]there's one important thing - 0:44:38.120,0:44:41.800 it's HER,[br]it's a lesbian partnership. 0:44:41.800,0:44:46.760 OK, so those kinds of words his[br]and her are one of the challenges 0:44:46.760,0:44:49.080 in translation[br]to get really those right. 0:44:49.080,0:44:51.920 Especially when it comes[br]to bishops one can excuse it! 0:44:51.920,0:44:53.640 'Right, right.' 0:44:53.640,0:44:58.520 I guess more often than not[br]it would probably be a "his".[br]'I will write one more sentence.'
0:44:58.520,0:45:01.720 När Sverige deltar[br]i olympiader är målet 0:45:01.720,0:45:03.720 'inte att vinna[br]utan att slå Norge.' 0:45:06.400,0:45:11.960 OK. "When Sweden is taking part[br]in Olympic goal is not[br]to win but to beat Norway." 0:45:11.960,0:45:13.640 'Yes! This is what it is! 0:45:13.640,0:45:17.920 'But they are very good[br]in Winter Olympics, so we[br]can't make it, but we are trying.' 0:45:17.920,0:45:19.960 Ah, very good, very good. 0:45:19.960,0:45:24.960 'This is absolutely amazing, you[br]know, and I was especially impressed 0:45:24.960,0:45:30.520 'that it picks up words like[br]"same-sex partnership"[br]which are very new to the language.' 0:45:30.520,0:45:36.920 'The translator is good, but[br]if they succeed with what's next,[br]that'll be remarkable.' 0:45:36.920,0:45:38.440 One of the exciting possibilities 0:45:38.440,0:45:42.720 is combining the machine[br]translation technology with[br]the speech recognition technology. 0:45:42.720,0:45:45.480 Now, both of these[br]are statistical in nature. 0:45:45.480,0:45:51.360 The machine translation relies[br]on the statistics of mapping[br]from one language to another, 0:45:51.360,0:45:57.840 and similarly speech recognition[br]relies on the statistics of mapping[br]from a sound form to the words. 0:45:57.840,0:45:59.520 When we put them together, 0:45:59.520,0:46:03.200 now we have the capability[br]of having instant conversation 0:46:03.200,0:46:06.760 between two people[br]that don't speak a common language. 0:46:06.760,0:46:08.680 I can talk to you in my language, 0:46:08.680,0:46:11.880 you hear me in your language[br]and you can answer back. 0:46:11.880,0:46:15.000 And in real time we can[br]make that translation, 0:46:15.000,0:46:18.800 we can bring two people together[br]and allow them to speak. 0:46:31.400,0:46:39.040 The internet is just one[br]of many technologies created[br]to gather massive amounts of data.
0:46:39.040,0:46:43.640 Scientists studying[br]our earth and our environment 0:46:43.640,0:46:47.440 now use an incredible range[br]of instruments 0:46:47.440,0:46:50.920 to measure the processes[br]of our planet. 0:46:52.760,0:47:00.360 All around us are sensors[br]continuously measuring temperature,[br]water flow, and ocean currents. 0:47:00.360,0:47:06.800 And high in orbit are satellites[br]busy imaging cloud formations,[br]forest growth and snow cover. 0:47:06.800,0:47:11.360 Scientists speak[br]of "instrumenting the earth". 0:47:13.320,0:47:20.160 And pointing up to the skies[br]above are powerful new telescopes[br]mapping the universe. 0:47:30.280,0:47:34.760 What's happening in astronomy[br]is typical of how profoundly 0:47:34.760,0:47:39.760 this new torrent of data[br]is transforming science. 0:47:39.760,0:47:45.280 Astronomers are now addressing many[br]enduring mysteries of the cosmos 0:47:45.280,0:47:49.600 by applying statistical methods[br]to all this new data. 0:47:59.800,0:48:03.360 The galaxy is a very big place and[br]it's got billions of stars in it, 0:48:03.360,0:48:09.400 and so to put together a coherent[br]picture of the whole galaxy requires[br]having an enormous amount of data. 0:48:09.400,0:48:13.720 And before you could do[br]a large sky survey with[br]sensitive, digital detectors 0:48:13.720,0:48:16.880 that meant that you could map many,[br]many stars all at once, 0:48:16.880,0:48:20.680 it was very difficult to build up[br]enough data on enough of the galaxy. 0:48:24.600,0:48:28.560 In the past, large surveys[br]of the night sky had to be done 0:48:28.560,0:48:32.400 by exposing thousands[br]of large photographic plates. 0:48:32.400,0:48:37.200 But these surveys could take[br]25 years or more to complete. 0:48:39.040,0:48:44.680 Then, in the 1990s, came digital[br]astronomy and a huge increase 0:48:44.680,0:48:49.600 in both the amount[br]and the accessibility of data. 
0:48:49.600,0:48:55.960 The Sloan Sky Survey[br]is the world's biggest yet,[br]using a massive digital sensor 0:48:55.960,0:49:00.840 mounted on the back[br]of a custom-built telescope[br]in New Mexico. 0:49:00.840,0:49:05.240 It's scanned the sky night[br]after night for eight years, 0:49:05.240,0:49:09.800 building up a composite picture[br]in unprecedented resolution. 0:49:09.800,0:49:14.840 The Sloan is some of the best,[br]deepest survey data[br]that we have in astronomy. 0:49:14.840,0:49:18.760 Both on our own galaxy and[br]on galaxies further away from ours. 0:49:24.080,0:49:27.320 All the Sloan data[br]is on the internet, 0:49:27.320,0:49:34.120 and with it astronomers[br]have identified millions of hitherto[br]unknown stars and galaxies. 0:49:34.120,0:49:37.480 They also comb the database[br]for statistical patterns 0:49:37.480,0:49:42.800 which will prove, disprove,[br]or even suggest new theories. 0:49:42.800,0:49:49.160 So we have this idea that galaxies[br]grow, they become large galaxies like[br]the one we live in, the milky way, 0:49:49.160,0:49:55.880 not all at once, or not smoothly,[br]but by continuously incorporating, 0:49:55.880,0:49:59.160 basically cannibalising,[br]smaller galaxies. 0:49:59.160,0:50:04.000 They dissolve them[br]and they become part[br]of the bigger galaxy as it grows. 0:50:06.040,0:50:12.520 It's a startling idea,[br]and, in the Sloan data,[br]is the evidence to support it. 0:50:12.520,0:50:16.280 Groups of stars that came[br]from cannibalised galaxies 0:50:16.280,0:50:21.240 stand out in the Sloan data[br]as statistically different[br]from other stars 0:50:21.240,0:50:24.280 because they move[br]at a different velocity. 0:50:24.280,0:50:28.680 Each big spike[br]on one of these distribution graphs 0:50:28.680,0:50:35.120 means Professor Rockosi has found[br]a group of stars all travelling[br]in a different way to the rest. 0:50:35.120,0:50:38.360 They are the telltale[br]patterns she's looking for. 
0:50:40.240,0:50:44.960 The evidence is accumulating[br]that, in fact, this really is[br]how galaxies grow, 0:50:44.960,0:50:47.440 or an important way[br]in which how galaxies grow. 0:50:47.440,0:50:53.000 And so this is an important part[br]of understanding how galaxies form,[br]not only ours but every galaxy. 0:50:56.360,0:51:00.400 The more data there is,[br]the more discoveries can be made. 0:51:00.400,0:51:03.320 And the technology[br]is getting better all the time. 0:51:03.320,0:51:07.560 The next big survey telescope[br]starts its work in 2015. 0:51:07.560,0:51:10.760 It will leave Sloan in the dust! 0:51:10.760,0:51:16.160 Sloan has taken eight years to cover[br]one quarter of the night sky. 0:51:17.680,0:51:25.680 The new telescope will scan[br]the entire sky, in even greater[br]resolution, every three days! 0:51:34.120,0:51:41.000 The vast amounts of data[br]we have today allows researchers[br]in all sorts of fields 0:51:41.000,0:51:46.280 to test their theories[br]on a previously unimaginable scale. 0:51:46.280,0:51:53.600 But more than this,[br]it may even change[br]the fundamental way science is done. 0:51:53.600,0:51:58.560 With the power of today's computers[br]applied to all this data, 0:51:58.560,0:52:03.880 the machines might even be able[br]to guide the researchers. 0:52:14.600,0:52:17.920 We're at a potentially[br]profoundly important 0:52:17.920,0:52:22.560 and potentially one of the most[br]significant points in science, 0:52:22.560,0:52:24.680 and certainly one of[br]the most exciting, 0:52:24.680,0:52:32.080 where the potential to transform[br]not just how scientists do science[br]but even what science is possible. 0:52:32.080,0:52:34.680 And what will power[br]that transformation 0:52:34.680,0:52:38.400 of both how science is done[br]and even what science is possible 0:52:38.400,0:52:40.120 is going to be computation. 
0:52:41.800,0:52:49.440 Many of the dynamics of the natural[br]world, like the interplay between[br]the rainforests and the atmosphere, 0:52:49.440,0:52:53.560 are so complex that we don't[br]as yet really understand them. 0:52:53.560,0:52:59.280 But now computers are generating[br]literally tens of thousands[br]of different simulations 0:52:59.280,0:53:03.480 of how these[br]biological systems might work. 0:53:03.480,0:53:07.840 It's like creating thousands[br]of hypothetical parallel worlds. 0:53:07.840,0:53:10.640 Each and every one[br]of these simulations 0:53:10.640,0:53:18.360 is analysed with statistics[br]to see if any are a good match[br]for what is observed in nature. 0:53:18.360,0:53:21.840 The computers can now[br]automatically generate, 0:53:21.840,0:53:26.240 test and discard hypotheses[br]with scarcely a human in sight. 0:53:28.240,0:53:35.120 This new application of statistics[br]will become absolutely vital[br]for the future of science. 0:53:35.120,0:53:39.400 It's creating a new paradigm,[br]if you like, 0:53:39.400,0:53:42.640 in science, in the way[br]in which we can do science, 0:53:42.640,0:53:45.280 which is increasingly... 0:53:45.280,0:53:51.160 Which one might characterise as...[br]data-centric or data driven 0:53:51.160,0:53:55.000 rather than being hypothesis-driven[br]or experimentally-driven. 0:53:55.000,0:53:58.240 So, it's exciting times[br]in terms of the science, 0:53:58.240,0:54:02.200 in terms of the computation[br]and in terms of the statistics. 0:54:08.800,0:54:15.480 Now, if all that sounds a bit[br]abstract and theoretical to you,[br]how about one final frontier? 0:54:15.480,0:54:19.040 Could statistics even make[br]sense of your feelings? 0:54:21.200,0:54:25.800 In California - where else? -[br]one computer scientist 0:54:25.800,0:54:32.680 is harvesting the internet to try[br]to divine the patterns of our[br]innermost thoughts and emotions. 0:54:44.800,0:54:46.360 This is the madness movement. 
0:54:46.360,0:54:50.960 The madness movement represents[br]a skyscraper view of the world. 0:54:50.960,0:54:54.880 Each of these brightly coloured dots[br]is an individual feeling 0:54:54.880,0:54:58.720 expressed by someone out there[br]in a blog or a tweet. 0:54:58.720,0:55:04.480 And when you click on the dot[br]it explodes to reveal the[br]underlying feeling of that person. 0:55:04.480,0:55:07.080 This is what people say[br]they're feeling today. 0:55:07.720,0:55:10.160 Better...safe... 0:55:10.160,0:55:12.040 crappy... 0:55:12.040,0:55:14.560 well... 0:55:14.560,0:55:18.440 pretty...special... 0:55:18.440,0:55:20.800 sorry...alone... 0:55:25.560,0:55:29.040 So, every minute, We Feel Fine[br]crawls the world's blogs, 0:55:29.040,0:55:34.120 takes all the sentences[br]that start with the words[br]"I feel" or "I am feeling", 0:55:34.120,0:55:35.920 and puts them in a database. 0:55:35.920,0:55:40.080 We collect all the feelings[br]and we count the most common. 0:55:40.080,0:55:43.320 They are better...bad... 0:55:43.320,0:55:45.640 good...right... 0:55:45.640,0:55:48.520 guilty...sick... 0:55:48.520,0:55:51.680 the same...like shit... 0:55:51.680,0:55:54.720 sorry...well... 0:55:54.720,0:55:56.240 and so on. 0:55:58.320,0:56:01.760 And we can take a look at any[br]one feeling and analyse it. 0:56:01.760,0:56:04.800 Right now a lot of people[br]are feeling happy. 0:56:04.800,0:56:11.320 We can take a look at all the[br]people who are happy and break it[br]down by age, gender or location. 0:56:11.320,0:56:16.840 Since bloggers have public profiles[br]we have that information and[br]so we can ask questions like, 0:56:16.840,0:56:21.400 "Are women happier than men?"[br]or, "Is England happier[br]than the United States?" 0:56:30.240,0:56:33.120 We find that, as people get older,[br]they get happier. 
0:56:33.120,0:56:40.560 And, moreover, we find that[br]for younger people they associate[br]happiness more with excitement, 0:56:40.560,0:56:47.000 and, as people get older,[br]they associate happiness[br]more with peacefulness. 0:56:51.240,0:56:57.760 And we also find that women feel[br]loved more often than men,[br]but also more guilty. 0:56:57.760,0:57:02.480 While men feel good more often[br]than women, but also more alone. 0:57:06.640,0:57:12.480 As people lead more and[br]more of their lives online,[br]they leave behind digital traces, 0:57:12.480,0:57:19.840 and with these digital traces[br]we can begin to statistically analyse[br]what it means to be human. 0:57:51.280,0:57:54.480 So where does all of this leave us? 0:57:54.480,0:58:00.160 We generate unimaginable[br]quantities of data[br]about everything you can think of. 0:58:00.160,0:58:02.800 We analyse it to reveal[br]the patterns. 0:58:02.800,0:58:10.480 And now not only experts[br]but all of us can understand[br]the stories in the numbers. 0:58:18.160,0:58:21.080 Instead of being[br]led astray by prejudice, 0:58:21.080,0:58:28.160 with statistics at our fingertips,[br]our eyes can be open[br]for a fact-based view of the world. 0:58:28.160,0:58:33.760 So, more than ever before, we can[br]become authors of our own destiny. 0:58:33.760,0:58:36.800 And that's pretty[br]exciting isn't it?! 0:58:37.680,0:58:44.200 # 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,[br]12, 13, 14, 15, 16, 17, 18, 19, 20 0:58:44.200,0:58:50.800 # 1, 22, 3, 24, 25, 26, 27, 28, 9,[br]30, 31, 32, 3, 34, 35, 36, 7 0:58:50.800,0:58:54.440 # 38, 39, 40, 41, 42, 3,[br]44, 45, 46, 47 0:58:54.440,0:58:58.680 LYRICS DEGENERATE INTO GIBBERISH 0:59:08.680,0:59:13.400 GIBBERISH DEGENERATES INTO NOISE 0:59:13.400,0:59:14.440 # 100. #