The Joy of Stats

  • 0:03 - 0:10
    The world we live in is awash with
    data that comes pouring in
    from everywhere around us.
  • 0:10 - 0:15
    On its own this data
    is just noise and confusion.
  • 0:15 - 0:23
    To make sense of data, to find the
    meaning in it, we need the powerful
    branch of science - statistics.
  • 0:23 - 0:26
    Believe me, there's nothing
    boring about statistics.
  • 0:26 - 0:29
    Especially not today
    when we can make the data sing.
  • 0:29 - 0:33
    With statistics we can
    really make sense of the world.
  • 0:33 - 0:35
    And there's more.
  • 0:35 - 0:40
    With statistics, the data deluge, as
    it's being called, is leading us
  • 0:40 - 0:46
    to an ever greater understanding
    of life on Earth
    and the universe beyond.
  • 0:46 - 0:51
    And thanks to the incredible
    power of today's computers,
  • 0:51 - 0:57
    it may fundamentally transform the
    process of scientific discovery.
  • 0:57 - 1:03
    I kid you not, statistics is
    now the sexiest subject around.
  • 1:23 - 1:26
    Did you know that there are
    one million boats in Sweden?
  • 1:26 - 1:28
    That's one boat per nine people!
  • 1:28 - 1:31
    It's the highest number of
    boats per person in Europe!
  • 1:41 - 1:46
    As a statistician,
    you don't like revealing
    your profession at dinner parties.
  • 1:46 - 1:48
    But really,
    statisticians shouldn't be shy
  • 1:48 - 1:51
    because everyone wants to
    understand what's going on.
  • 1:51 - 1:56
    And statistics gives us a
    perspective on the world we live in
  • 1:56 - 1:59
    that we can't get in any other way.
  • 2:04 - 2:09
    Statistics tells us whether
    the things we think
    and believe are actually true.
  • 2:20 - 2:25
    And statistics are far more useful
    than we usually like to admit.
  • 2:25 - 2:30
    In the last recession there
    was this famous call-in
    to a talk radio station.
  • 2:30 - 2:37
    The man complained, "In times like
    this, when unemployment rates are up
    to 13%, income has fallen by 5%
  • 2:37 - 2:41
    "and suicide rates are climbing,
    I get so angry that the government
  • 2:41 - 2:46
    "is wasting money on things like
    the collection of statistics."
  • 2:48 - 2:50
    I'm not officially a statistician.
  • 2:50 - 2:55
    Strictly speaking,
    my field is global health.
  • 2:58 - 3:03
    But I got really obsessed with stats
    when I realised how much people
  • 3:03 - 3:06
    in Sweden just don't know
    about the rest of the world.
  • 3:06 - 3:11
    At our medical university,
    Karolinska Institutet, I started
  • 3:11 - 3:14
    an undergraduate course
    called Global Health.
  • 3:14 - 3:17
    The students who come to us
    have the highest grades you can get
  • 3:17 - 3:19
    in the Swedish college system,
  • 3:19 - 3:22
    so I thought, "Maybe they know
    everything I'm going to teach them."
  • 3:22 - 3:26
    So I did a pre-test when they came,
    and one of the questions
  • 3:26 - 3:28
    from which I learned a lot
    was this one -
  • 3:28 - 3:32
    which country in each of these
    five pairs has the higher child mortality?
  • 3:32 - 3:35
    I won't put you to the test here,
    but it is Turkey
  • 3:35 - 3:37
    which is highest there, then Poland,
  • 3:37 - 3:41
    Russia, Pakistan, and South Africa.
  • 3:41 - 3:43
    And these were the results of
    the Swedish students:
  • 3:43 - 3:45
    1.8 right answers
    out of five possible.
  • 3:45 - 3:50
    And that means there was a place for
    a professor of International Health
    and for my course.
  • 3:50 - 3:56
    But late one night,
    when I was compiling the report,
    I realised what I had really discovered.
  • 3:56 - 4:01
    I had shown that Swedish
    top students know statistically
  • 4:01 - 4:04
    significantly less about
    the world than the chimpanzees.
  • 4:06 - 4:10
    Because the chimpanzees
    would score half right.
  • 4:10 - 4:12
    If I gave them two bananas
    with Sri Lanka and Turkey,
  • 4:12 - 4:16
    they would be right
    in half of the cases,
    but the students are not there.
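The chimpanzee baseline is binomial arithmetic. A random pick in each of five two-way questions is right with probability 1/2, so the expected score is 5 × 0.5 = 2.5. The sketch below computes that baseline and, purely as an illustration, a one-sample z-test of the students' average of 1.8 against it. The class size of 30 is a hypothetical assumption, because the transcript does not state it, and whether the gap is "statistically significant" depends on that number.

```python
from math import comb, sqrt

N_QUESTIONS = 5   # five pairs of countries
P_GUESS = 0.5     # a random pick ("a banana") is right half the time

# Binomial distribution of a pure guesser's score: P(score = k)
guess_pmf = [comb(N_QUESTIONS, k) * P_GUESS**k * (1 - P_GUESS)**(N_QUESTIONS - k)
             for k in range(N_QUESTIONS + 1)]
chimp_mean = N_QUESTIONS * P_GUESS                  # expected score: 2.5
chimp_var = N_QUESTIONS * P_GUESS * (1 - P_GUESS)   # variance per guesser: 1.25

def z_versus_guessing(class_mean: float, n_students: int) -> float:
    """z-statistic of a class average against pure guessing (normal approximation)."""
    standard_error = sqrt(chimp_var / n_students)
    return (class_mean - chimp_mean) / standard_error

# n_students=30 is hypothetical; the real class size is not given.
z = z_versus_guessing(1.8, n_students=30)
print(f"chimp mean = {chimp_mean}, z = {z:.2f}")
```

A z below about -1.96 counts as significantly worse than guessing at the conventional 5% level. With only five students, the same 1.8 average would give z = -1.4 and would not reach that threshold.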
  • 4:16 - 4:20
    I also did an unethical study
    of the professors of
    the Karolinska Institutet,
  • 4:20 - 4:26
    which hands out the Nobel Prize
    for medicine, and they are on par
    with the chimpanzees there.
  • 4:28 - 4:33
    Today there's more information
    accessible than ever before.
  • 4:33 - 4:36
    'And I work with my team at
    the Gapminder Foundation
  • 4:36 - 4:42
    'using new tools that help everyone
    make sense of the changing world.
  • 4:42 - 4:45
    'We draw on the masses of data
    that's now freely available
  • 4:45 - 4:50
    'from international institutions
    like the UN and the World Bank.
  • 4:50 - 4:54
    'And it's become my mission to
    share the insights
  • 4:54 - 5:00
    'from this data with anyone who'll
    listen, and to reveal how statistics
    is nothing to be frightened of.'
  • 5:02 - 5:05
    I'm going to give you a view of
  • 5:05 - 5:09
    the global health situation
    across mankind.
  • 5:09 - 5:14
    And I'm going to do that in
    what I hope is an enjoyable way,
    so relax.
  • 5:14 - 5:17
    So we made this software,
    which displays it like this.
  • 5:17 - 5:19
    Every bubble here is a country -
  • 5:19 - 5:21
    this is China, this is India.
  • 5:21 - 5:24
    The size of the bubble
    is the population.
  • 5:24 - 5:28
    I'm going to stage a race between
    this sort of yellowish Ford here
  • 5:28 - 5:33
    and the red Toyota down there
    and the brownish Volvo.
  • 5:33 - 5:36
    The Toyota has a very bad start
    down here, and the United States,
  • 5:36 - 5:38
    the Ford, is going off-road there,
  • 5:38 - 5:40
    and the Volvo is doing quite fine.
    This is the war.
  • 5:40 - 5:44
    The Toyota got off track, but now the Toyota
    is on the healthier side of Sweden.
  • 5:44 - 5:47
    That's about where I sold
    the Volvo and bought the Toyota.
  • 5:47 - 5:48
    LAUGHTER
  • 5:48 - 5:51
    This is the Great Leap Forward,
    when China fell down.
  • 5:51 - 5:53
    It was the central planning
    of Mao Zedong.
  • 5:53 - 5:57
    China recovered and said, "Never
    more stupid central planning,"
  • 5:57 - 5:58
    but they went up here.
  • 5:58 - 6:03
    Now, there is one more inequity,
    look there - the United States.
  • 6:03 - 6:07
    They broke my frame. Washington, DC
    is so rich over there,
  • 6:07 - 6:13
    but they are
    not as healthy as Kerala in India.
    It's quite interesting, isn't it?
  • 6:13 - 6:15
    LAUGHTER AND APPLAUSE
  • 6:20 - 6:26
    Welcome to the USA,
    world leaders in big cars
  • 6:26 - 6:28
    and free data.
  • 6:28 - 6:36
    There are many here who share
    my vision of making public data
    accessible and useful for everyone.
  • 6:36 - 6:43
    The city of San Francisco
    is in the lead, opening up
    its data on everything.
  • 6:43 - 6:47
    Even the police department is
    releasing all its crime reports.
  • 6:47 - 6:51
    This official
    crime data has been turned
  • 6:51 - 6:56
    into a wonderful interactive map by
    two of the city's computer whizzes.
  • 6:56 - 6:59
    It's community statistics in action.
  • 7:09 - 7:13
    Crimespotting is
    a map of crime reports from the
    San Francisco Police Department
  • 7:13 - 7:16
    showing dots on maps
    so that citizens can see
  • 7:16 - 7:19
    patterns of crime around their
    neighbourhoods in San Francisco.
  • 7:19 - 7:25
    The map is not just about individual
    crimes but about broader patterns
    that show you where crime is
  • 7:25 - 7:28
    clustered around the city, which
    areas have high crime,
  • 7:28 - 7:30
    and which areas have
    relatively low crime.
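The broader patterns on such a map come from a simple aggregation: drop each report into a grid cell and count the reports per cell. Below is a minimal sketch with hypothetical (x, y) coordinates and a hypothetical cell size. It illustrates the idea only and is not how Crimespotting itself is built.

```python
from collections import Counter

def count_by_cell(reports, cell_size):
    """Bucket (x, y) report locations into square grid cells and count per cell."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in reports)

# Hypothetical report locations in arbitrary map units
reports = [(0.2, 0.3), (0.4, 0.1), (0.3, 0.8), (2.5, 2.7), (0.9, 0.5)]
cells = count_by_cell(reports, cell_size=1.0)
hotspot, n = cells.most_common(1)[0]
print(hotspot, n)
```

Cells with high counts are the clusters. A real map would also normalise by area or population before calling a cell "high crime".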
  • 7:37 - 7:41
    We're here at the top of
    Jones Street on Nob Hill...
  • 7:43 - 7:45
    ...quite a nice neighbourhood.
  • 7:45 - 7:50
    What the crime maps show us
    is the relationship between
  • 7:50 - 7:51
    topography and crime.
  • 7:51 - 7:55
    Basically the higher up the hill,
    the less crime there is.
  • 7:56 - 7:59
    You cross over the border
  • 7:59 - 8:00
    into the flats...
  • 8:03 - 8:09
    Essentially, as soon as you get
    into the lower-lying areas of Jones
    Street, the crime just skyrockets.
  • 8:20 - 8:24
    We're here in
    the uptown Tenderloin district.
  • 8:26 - 8:30
    It's one of the oldest and densest
    neighbourhoods in San Francisco.
  • 8:30 - 8:32
    This is where you go to buy drugs.
  • 8:32 - 8:34
    Right around here.
  • 8:37 - 8:42
    We see lots of aggravated assaults,
    lots of auto thefts.
  • 8:42 - 8:49
    Basically a huge part of the crime
    that happens in the city happens
    in this five- or six-block radius.
  • 8:56 - 8:59
    If you've been hearing police sirens
    in your neighbourhood,
  • 8:59 - 9:02
    you can use the map to find out why.
  • 9:02 - 9:06
    If you're out at night in
    an unfamiliar part of town,
  • 9:06 - 9:09
    you can check the map
    for streets to avoid.
  • 9:09 - 9:12
    If a neighbour gets burgled,
    you can see -
  • 9:12 - 9:17
    is it a one-off or has there been
    a spike in local crime?
  • 9:17 - 9:19
    If you commute through a
    neighbourhood and you're worried
  • 9:19 - 9:23
    about its safety, we have
    the ability to turn off all
  • 9:23 - 9:25
    the night-time
    and middle-of-the-day crimes
  • 9:25 - 9:28
    and show you just the things that are
    happening during the commute.
  • 9:28 - 9:33
    That is a statistical operation.
    But I think to the people who are
    interacting with the map
  • 9:33 - 9:38
    it feels much more like they're
    just browsing a website
    or shopping on Amazon.
  • 9:38 - 9:44
    They're looking at data
    and they don't realise
    they're doing statistics.
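Turning off the night-time and middle-of-the-day crimes amounts to filtering reports by time of day. The minimal sketch below assumes that each report carries an hour of day and that the commute windows are 07:00-09:00 and 17:00-19:00. Both the data layout and the windows are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Report:
    category: str
    hour: int  # hour of day, 0-23

# Hypothetical commute windows as half-open [start, end) hour ranges
COMMUTE_WINDOWS = [(7, 9), (17, 19)]

def during_commute(report: Report) -> bool:
    """True if the report's hour falls inside any commute window."""
    return any(start <= report.hour < end for start, end in COMMUTE_WINDOWS)

reports = [Report("auto theft", 8), Report("assault", 2),
           Report("burglary", 13), Report("robbery", 18)]
commute_reports = [r for r in reports if during_commute(r)]
print([r.category for r in commute_reports])  # ['auto theft', 'robbery']
```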
  • 9:44 - 9:48
    What's most exciting for me
    is that public statistics
  • 9:48 - 9:53
    is making citizens more powerful and
    the authorities more accountable.
  • 10:02 - 10:05
    We have community meetings that
    the police attend
  • 10:05 - 10:09
    and what citizens are
    now doing is bringing printouts
  • 10:09 - 10:12
    of the maps that show where crimes
    are taking place,
  • 10:12 - 10:16
    and they're demanding services
    from the police department
  • 10:16 - 10:21
    and the police department is now
    having to change how they police,
  • 10:21 - 10:23
    how they provide policing services,
  • 10:23 - 10:27
    because the data is showing
    what is working and what is not.
  • 10:29 - 10:32
    People in San Francisco
    are also using public data
  • 10:32 - 10:36
    to map social inequalities
    and see how to improve society.
  • 10:36 - 10:40
    And the possibilities are endless.
  • 10:40 - 10:43
    I think our dream
    government data analysis project
  • 10:43 - 10:46
    would really be focused on
    live information,
  • 10:46 - 10:51
    on stuff that was being reported
    and pushed out to the world over
    the internet as it was happening.
  • 10:51 - 10:55
    You know, trash pickups,
    traffic accidents, buses,
  • 10:55 - 10:58
    and I think through the kind of
    stats-gathering power
  • 10:58 - 11:03
    of the internet
    it's possible to really begin
    to see the workings of the city
  • 11:03 - 11:05
    displayed as a unified interface.
  • 11:07 - 11:10
    So that's where we are heading.
  • 11:10 - 11:15
    Towards a world of free data
    with all the statistical
    insights that come from it,
  • 11:15 - 11:22
    accessible to everyone, empowering
    us as citizens and letting us
    hold our rulers to account.
  • 11:22 - 11:27
    It's a long way from
    where statistics began.
  • 11:27 - 11:33
    Statistics
    are essential for us to monitor
    our governments and our societies.
  • 11:33 - 11:37
    But it was our rulers up
    there who started
  • 11:37 - 11:41
    the collection of statistics in the
    first place in order to monitor us!
  • 11:47 - 11:51
    In fact the word 'statistics'
    comes from 'the state'.
  • 11:51 - 11:56
    Modern statistics
    began more than two centuries ago.
  • 11:56 - 11:59
    Once it got going,
    it spread and never stopped.
  • 11:59 - 12:02
    And guess who was first!
  • 12:03 - 12:08
    The Chinese have Confucius,
    the Italians have da Vinci,
  • 12:08 - 12:10
    and the British have Shakespeare.
  • 12:10 - 12:12
    And we have the Tabellverket -
  • 12:12 - 12:16
    the first ever systematic
    collection of statistics!
  • 12:16 - 12:22
    Since the year 1749
    we have collected data
  • 12:22 - 12:27
    on every birth, marriage and death,
    and we are proud of it!
  • 12:29 - 12:32
    The Tabellverket recorded
    information
  • 12:32 - 12:34
    from every parish in Sweden.
  • 12:34 - 12:39
    It was a huge quantity of data and
    it was the first time any government
  • 12:39 - 12:42
    could get an accurate
    picture of its people.
  • 12:49 - 12:53
    Sweden had been the greatest
    military power in Northern Europe,
  • 12:53 - 12:58
    but by 1749 our star
    was really fading
  • 12:58 - 13:01
    and other countries
    were growing stronger.
  • 13:01 - 13:04
    At least we were a large power,
  • 13:04 - 13:10
    thought to have 20 million people,
    enough to rival Britain and France.
  • 13:13 - 13:18
    But we were in for a nasty surprise.
  • 13:18 - 13:21
    The first analysis
    of the Tabellverket
  • 13:21 - 13:24
    revealed that Sweden
    only had two million inhabitants.
  • 13:24 - 13:31
    Sweden was not just a power
    in decline, it also had
    a very small population.
  • 13:31 - 13:36
    The government was horrified
    by this finding -
    what if the enemy found out?
  • 13:38 - 13:45
    But the Tabellverket also showed
    that many women died in childbirth
    and many children died young.
  • 13:45 - 13:49
    So the government took action
    to improve the health of the people.
  • 13:49 - 13:52
    This was the beginning
    of modern Sweden.
  • 13:54 - 13:59
    It took more than 50 years before
    the Austrians, Belgians, Danes,
  • 13:59 - 14:02
    Dutch, French, Germans, Italians
  • 14:02 - 14:09
    and, finally, the British,
    caught up with Sweden
    in collecting and using statistics.
  • 14:25 - 14:30
    It was called political arithmetic.
    It was a lovely phrase
    that was used for statistics.
  • 14:30 - 14:33
    Governments could have much more
    control and understanding of
  • 14:33 - 14:37
    the society - how it was working,
    how it was developing
  • 14:37 - 14:40
    and essentially
    so they could control it better.
  • 14:43 - 14:48
    It wasn't just governments who
    woke up to the power of statistics.
  • 14:48 - 14:55
    Right across Europe, 19th-century
    society went mad for facts.
  • 14:55 - 14:58
    And, despite its late start,
    Britain,
  • 14:58 - 15:01
    with its Royal Statistical Society
    in London,
  • 15:01 - 15:04
    was soon a statisticians' nirvana.
  • 15:06 - 15:10
    I love looking at old copies of
    the Royal Statistical Society journal
  • 15:10 - 15:12
    because it's full of such odd stuff.
  • 15:12 - 15:15
    There's a wonderful paper
    from the 1840s
  • 15:15 - 15:19
    which shows a map of England and
    the rates of bastardy in each county.
  • 15:19 - 15:24
    So you can identify very quickly the
    areas with high rates of bastardy.
  • 15:24 - 15:27
    Being in East Anglia, I always find it
    slightly funny that Norfolk
  • 15:27 - 15:31
    seems to top the "bastardy league"
    in the 1840s.
  • 15:31 - 15:37
    One of the founders of
    the Royal Statistical Society
  • 15:37 - 15:42
    was the great
    Victorian mathematician
    and inventor Charles Babbage.
  • 15:42 - 15:50
    In 1842 he read the latest
    poem by an equally great Victorian,
    Alfred Tennyson.
  • 15:50 - 15:53
    His poem The Vision of Sin contained the lines:
  • 15:53 - 15:56
    "Fill the cup, and fill the can
  • 15:56 - 15:58
    "Have a rouse before the morn
  • 15:58 - 16:04
    "Every moment dies a man
    Every moment one is born."
  • 16:04 - 16:07
    So keen a statistician was Babbage
    that he could not contain himself.
  • 16:07 - 16:09
    He dashed off a letter to Tennyson
  • 16:09 - 16:12
    explaining that because of
    population growth,
  • 16:12 - 16:14
    the line should read,
  • 16:14 - 16:19
    "Every moment dies a man
    and one and a 16th is born."
  • 16:19 - 16:22
    I may add that
    the exact figure is 1.067,
  • 16:22 - 16:27
    but something must be
    conceded to the laws of metre.
  • 16:32 - 16:37
    In the 19th century, scholars all
    over Europe did amazing work
  • 16:37 - 16:39
    in measuring their societies.
  • 16:39 - 16:43
    They were hoovering up
    data on almost everything.
  • 16:43 - 16:46
    But numbers alone
    don't tell you anything.
  • 16:46 - 16:51
    You have to analyse them,
    and that's what makes statistics.
  • 16:56 - 16:59
    When the first statisticians
    began to get to grips with
  • 16:59 - 17:00
    analysing their data
  • 17:00 - 17:06
    they seized upon the average, and
    they took the average of everything.
  • 17:10 - 17:14
    What's so great
    about an average is that
  • 17:14 - 17:19
    you can take a whole mass of data
    and reduce it to a single number.
  • 17:22 - 17:26
    And though each of us is unique,
    our collective lives produce
  • 17:26 - 17:30
    averages that can
    characterise whole populations.
  • 17:41 - 17:45
    I looked in my local newspaper
    one week and saw a pensioner
  • 17:45 - 17:49
    had accidentally put her foot on
    the accelerator
  • 17:49 - 17:53
    and crushed her friend
    against a wall.
  • 17:53 - 17:56
    Devastating, hideous,
    horrible thing to happen.
  • 17:56 - 18:01
    And then there was a second one about
    a young man who didn't have
  • 18:01 - 18:07
    a driving licence, was driving a car
    under the influence of drugs
    and alcohol
  • 18:07 - 18:10
    and he bashed into a pedestrian
    and killed him.
  • 18:10 - 18:16
    What's remarkable, absolutely
    remarkable, is that if you look at the number
  • 18:16 - 18:23
    of people who die each year
    in traffic crashes,
    it's nearly constant.
  • 18:23 - 18:24
    What?
  • 18:24 - 18:32
    All these individual events,
    somehow when you sum them all up
    there's the same number every year.
  • 18:32 - 18:35
    And every year, two and a half
    times as many men
  • 18:35 - 18:39
    die in traffic crashes
    as women, and it's a constant.
  • 18:39 - 18:44
    And every year the rate in Belgium
    is double the rate in England.
  • 18:44 - 18:47
    There are these
    remarkable regularities.
  • 18:47 - 18:55
    So these individual,
    particular events sum up
    into a social phenomenon.
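This regularity is what many small, independent risks add up to, an instance of the law of large numbers. The simulation below rests on made-up assumptions: 100,000 people, each with a hypothetical 0.2% chance of the event per year. Each individual outcome is random, yet the yearly totals stay close to the expected 200.

```python
import random

def yearly_totals(population: int, p: float, years: int, seed: int = 1) -> list[int]:
    """Count how many of `population` independent people have the event each year."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(population)) for _ in range(years)]

totals = yearly_totals(population=100_000, p=0.002, years=10)
expected = 100_000 * 0.002  # 200 per year
print(totals)
```

The year-to-year spread is roughly the square root of the expected count, about 14 here. As the population grows, that spread shrinks relative to the total.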
  • 18:57 - 18:58
    Let's see what Sweden has done.
  • 18:58 - 19:02
    We used to boast about fast social
    progress, that's where we were...
  • 19:02 - 19:05
    'In my lectures, to tell stories
    about the changing world,
  • 19:05 - 19:08
    'I use the averages
    from entire countries,
  • 19:08 - 19:12
    'whether the average of income,
    child mortality, family size
  • 19:12 - 19:13
    'or carbon output.'
  • 19:13 - 19:16
    OK, I give you Singapore.
    The year I was born,
  • 19:16 - 19:21
    Singapore had twice the child
    mortality of Sweden. It was the most
    tropical country in the world,
  • 19:21 - 19:23
    a marshland on
    the Equator. And here we go.
  • 19:23 - 19:25
    It took a little time for them
    to get independent,
  • 19:25 - 19:27
    but then they started to grow
    their economy,
  • 19:27 - 19:30
    and they made the social investment,
    they got rid of malaria,
  • 19:30 - 19:33
    they got a magnificent health system
    that beat both the US and Sweden.
  • 19:33 - 19:38
    We never thought it would happen
    that they would win over Sweden!
  • 19:38 - 19:41
    LAUGHTER AND APPLAUSE
  • 19:41 - 19:46
    But useful as averages are,
    they don't tell you the whole story.
  • 19:49 - 19:53
    On average, Swedish people have
    slightly less than two legs.
  • 19:53 - 19:58
    This is because a few people
    have only one leg or no legs,
  • 19:58 - 20:00
    and no-one has three legs.
  • 20:00 - 20:06
    So almost everybody in Sweden
    has more than
    the average number of legs.
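The legs paradox is easy to check with made-up numbers. Take a hypothetical population of 1,000 in which 990 people have two legs, 8 have one and 2 have none. The mean falls just below 2, so 99% of people are above average, while the median stays at 2.

```python
from statistics import mean, median

# Hypothetical population: 990 people with two legs, 8 with one, 2 with none
legs = [2] * 990 + [1] * 8 + [0] * 2

avg = mean(legs)  # slightly below 2
share_above_avg = sum(n > avg for n in legs) / len(legs)
print(avg, median(legs), share_above_avg)
```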
  • 20:06 - 20:11
    The variation in data is just
    as important as the average.
  • 20:17 - 20:19
    But how do you get
    a handle on variation?
  • 20:19 - 20:23
    For this, you transform
    numbers into shapes.
  • 20:23 - 20:26
    Let's look again at the number of
    adult women in Sweden
  • 20:26 - 20:28
    for different heights.
  • 20:28 - 20:32
    Plotting the data as a shape
    shows how much their heights
  • 20:32 - 20:36
    vary from the average
    and how wide that variation is.
  • 20:36 - 20:42
    The shape a set of data makes
    is called its distribution.
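Turning numbers into a shape can be as simple as counting the values in each bin and drawing a bar for each count. The heights below are invented for illustration and are not Swedish survey data. The sketch prints a text histogram, then the mean and the standard deviation, the usual single-number summary of how wide the variation is.

```python
from collections import Counter
from statistics import mean, stdev

# Invented heights in centimetres, for illustration only
heights = [158, 161, 163, 164, 165, 166, 166, 167, 168, 168,
           169, 170, 171, 172, 174, 176, 179]

BIN_WIDTH = 5  # centimetres per bar
bins = Counter((h // BIN_WIDTH) * BIN_WIDTH for h in heights)
for start in sorted(bins):
    print(f"{start}-{start + BIN_WIDTH - 1} cm | {'#' * bins[start]}")

print(f"mean {mean(heights):.1f} cm, standard deviation {stdev(heights):.1f} cm")
```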
  • 20:42 - 20:46
    This is the income distribution
    of China, 1970.
  • 20:46 - 20:51
    This is the income distribution
    of the United States, 1970.
  • 20:51 - 20:54
    Almost no overlap,
    and what has happened?
  • 20:54 - 20:57
    China is growing,
    it's not so equal any longer,
  • 20:57 - 21:01
    and it's appearing here
    overlooking the United States.
  • 21:01 - 21:03
    Almost like a ghost, isn't it?
  • 21:03 - 21:05
    It's pretty scary.
  • 21:05 - 21:07
    Rrrr!
  • 21:07 - 21:08
    LAUGHTER
  • 21:17 - 21:21
    The statisticians
    who first explored distribution
  • 21:21 - 21:26
    discovered one shape
    that turned up again and again.
  • 21:26 - 21:28
    The Victorian scholar
    Francis Galton
  • 21:28 - 21:32
    was so fascinated he built
    a machine that could reproduce it,
  • 21:32 - 21:36
    and he found it fitted so many
    different sets of measurements
  • 21:36 - 21:39
    that he named it
    the normal distribution.
  • 21:39 - 21:46
    Whether it was people's arm spans,
    lung capacities,
  • 21:46 - 21:47
    or even their exam results,
  • 21:47 - 21:51
    the normal distribution shape
    recurred time and time again.
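Galton's machine, usually called a Galton board or quincunx, drops balls through rows of pins. Each pin sends a ball left or right at random, so a ball's final slot is a binomial count, and with enough rows the pile approaches the normal shape. Below is a minimal simulation; the number of rows, the number of balls and the random seed are arbitrary choices.

```python
import random
from collections import Counter

def galton_board(rows: int, balls: int, seed: int = 0) -> Counter:
    """Return how many balls land in each slot (0..rows)."""
    rng = random.Random(seed)
    # A ball's slot is its number of rightward bounces across `rows` pins.
    return Counter(sum(rng.random() < 0.5 for _ in range(rows)) for _ in range(balls))

slots = galton_board(rows=12, balls=5000)
for slot in range(13):
    print(f"{slot:2d} {'#' * (slots[slot] // 25)}")
```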
  • 21:51 - 21:56
    Other statisticians soon found
    many other regular shapes,
  • 21:56 - 22:01
    each produced by particular kinds
    of natural or social processes.
  • 22:01 - 22:05
    And every statistician
    has their favourite.
  • 22:05 - 22:09
    The Poisson distribution, the Poisson
    shape is my favourite distribution.
  • 22:09 - 22:11
    I think it's an absolute cracker.
  • 22:16 - 22:19
    The Poisson shape
    describes how likely it is
  • 22:19 - 22:22
    that out-of-the-ordinary things
    will happen.
  • 22:22 - 22:25
    Imagine a London bus stop where
    we know that on average
  • 22:25 - 22:26
    we'll get three buses in an hour.
  • 22:26 - 22:29
    We won't always get
    three buses, of course.
  • 22:29 - 22:33
    Amazingly, the Poisson shape will
    show us the probability
  • 22:33 - 22:37
    that in any given hour we will get
    four, five, or six buses,
  • 22:37 - 22:39
    or no buses at all.
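For the bus stop, the Poisson formula gives P(k) = λ^k e^(-λ) / k!, where λ is the average number of buses per hour (here 3). The short sketch below computes the probabilities mentioned above. It assumes that buses arrive independently at a steady average rate, the condition under which the Poisson shape applies.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the average is lam."""
    return lam**k * exp(-lam) / factorial(k)

AVERAGE_BUSES_PER_HOUR = 3
for k in (0, 3, 4, 5, 6):
    print(k, round(poisson_pmf(k, AVERAGE_BUSES_PER_HOUR), 3))
```

Even with an average of three buses, there is roughly a 5% chance of an hour with no bus at all. An hour with six buses is about equally likely.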
  • 22:41 - 22:43
    The exact shape changes
    with the average.
  • 22:43 - 22:47
    But whether it's how many people
    will win the lottery jackpot
  • 22:47 - 22:48
    each week,
  • 22:48 - 22:51
    or how many people will phone
    a call centre each minute,
  • 22:51 - 22:54
    the Poisson shape
    will give the probabilities.
  • 22:57 - 23:01
    The wonderful example where this was
    applied in the late 19th century
  • 23:01 - 23:04
    was to count each year the number of
    Prussian officers,
  • 23:04 - 23:08
    cavalry officers, who were kicked
    to death by their horses.
  • 23:08 - 23:10
    Now, some years there were none,
    some years there was one,
  • 23:10 - 23:14
    some years there were two,
    up to seven, I think,
    one particularly bad year.
  • 23:14 - 23:17
    But the count of years
    with each outcome,
  • 23:17 - 23:20
    nought, one, two, three or
    four Prussian cavalry officers
  • 23:20 - 23:24
    kicked to death by their horses,
    beautifully obeyed
    the Poisson distribution.
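To check a Poisson fit, compare the observed frequencies with the frequencies the formula predicts from the observed average. The counts below are the ones commonly reproduced from Ladislaus von Bortkiewicz's 1898 study. They record deaths per army corps per year over 200 corps-years, rather than the per-year totals the narrator describes. Treat them as an illustration of the method.

```python
from math import exp, factorial

# Frequencies commonly quoted from Bortkiewicz (1898):
# number of corps-years with k deaths from horse kicks
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n_obs = sum(observed.values())                         # 200 corps-years
lam = sum(k * f for k, f in observed.items()) / n_obs  # average deaths: 0.61

# Expected frequencies: n_obs * P(k) for a Poisson with the observed average
expected = {k: n_obs * lam**k * exp(-lam) / factorial(k) for k in observed}
for k in observed:
    print(f"{k} deaths: observed {observed[k]:3d}, Poisson {expected[k]:6.1f}")
```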
  • 23:43 - 23:49
    So statisticians use shapes to
    reveal the patterns in the data.
  • 23:49 - 23:51
    But we also use images of all kinds
  • 23:51 - 23:54
    to communicate statistics
    to a wider public.
  • 23:54 - 23:57
    Because if the story in the numbers
  • 23:57 - 24:03
    is told by a beautiful and clever
    image, then everyone understands.
  • 24:03 - 24:10
    Of the pioneers
    of statistical graphics,
    my favourite is Florence Nightingale.
  • 24:24 - 24:27
    There are not many people who realise
    that she was known
  • 24:27 - 24:31
    as a passionate statistician
    and not just the Lady of the Lamp.
  • 24:31 - 24:35
    She said that "to understand God's
    thoughts, we must study statistics,
  • 24:35 - 24:37
    "for these are
    the measure of His purpose."
  • 24:37 - 24:41
    Statistics was for her a religious
    duty and moral imperative.
  • 24:42 - 24:45
    When Florence was nine years old
    she started collecting data.
  • 24:45 - 24:48
    Her data was the different
    fruits and vegetables she found.
  • 24:48 - 24:50
    She put them into different tables,
  • 24:50 - 24:53
    trying to organise them
    in some standard form.
  • 24:53 - 24:56
    And so we have one of Nightingale's
    first statistical tables
  • 24:56 - 24:57
    at the age of nine.
  • 25:04 - 25:11
    In the mid 1850s Florence
    Nightingale went to the Crimea to
    care for British casualties of war.
  • 25:11 - 25:14
    She was horrified by
    what she discovered.
  • 25:14 - 25:20
    For all the soldiers being blown
    to bits on the battlefield,
    there were many, many more soldiers
  • 25:20 - 25:25
    dying from diseases they caught
    in the army's filthy hospitals.
  • 25:25 - 25:29
    So Florence Nightingale
    began counting the dead.
  • 25:29 - 25:35
    For two years she recorded
    mortality data in meticulous detail.
  • 25:35 - 25:39
    When the war was over she persuaded
    the government to set up
  • 25:39 - 25:41
    a Royal Commission of Inquiry,
  • 25:41 - 25:45
    and gathered her data
    in a devastating report.
  • 25:45 - 25:48
    What has cemented her place in
    the statistical history books
  • 25:48 - 25:50
    are the graphics she used.
  • 25:50 - 25:54
    And one in particular,
    the polar area graph.
  • 25:54 - 25:59
    For each month of the war,
    a huge blue wedge represented
  • 25:59 - 26:02
    the soldiers who had died
    from preventable diseases.
  • 26:02 - 26:06
    The much smaller red wedges were
    deaths from wounds,
  • 26:06 - 26:11
    and the black wedges were deaths
    from accidents and other causes.
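In a polar area graph as usually constructed, every wedge spans the same angle, so the wedge's area, not its radius, should track the count. A sector's area is θr²/2, so the radius must grow with the square root of the count. The minimal sketch below uses hypothetical monthly counts, not Nightingale's figures.

```python
from math import pi, sqrt

WEDGE_ANGLE = 2 * pi / 12  # twelve monthly wedges share the full circle

def wedge_radius(count: float, scale: float = 1.0) -> float:
    """Radius that makes the sector area (angle * r**2 / 2) equal to count * scale."""
    return sqrt(2 * count * scale / WEDGE_ANGLE)

# Hypothetical monthly deaths by cause, for illustration only
disease, wounds = 1000, 100
print(wedge_radius(disease) / wedge_radius(wounds))  # ~3.16: ten times the area
```

Scaling the radius directly would make a tenfold count look a hundredfold larger, which is why area is the honest choice.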
  • 26:11 - 26:17
    Nightingale's graphics were so clear
    they were impossible to ignore.
  • 26:17 - 26:19
    The usual thing around
    Florence Nightingale's time
  • 26:19 - 26:24
    was just to produce tables and
    tables of figures - really
    tedious stuff in which,
  • 26:24 - 26:26
    unless you're an absolutely dedicated
    statistician,
  • 26:26 - 26:29
    it's really quite difficult to spot
    the patterns naturally.
  • 26:29 - 26:33
    But visualisations, they tell a
    story, they tell a story immediately.
  • 26:33 - 26:38
    And the use of colour
    and the use of shape can
    really tell a powerful story.
  • 26:38 - 26:41
    And nowadays of course
    we can make things move as well.
  • 26:41 - 26:44
    Florence Nightingale would have
    loved to have played with...
  • 26:44 - 26:49
    She would have
    produced wonderful animations,
    I'm absolutely certain of it.
  • 26:51 - 26:55
    Today, 150 years on,
    Nightingale's graphics
  • 26:55 - 26:58
    are rightly regarded as a classic.
  • 26:58 - 27:01
    They led to a revolution
    in nursing, health care
  • 27:01 - 27:06
    and hygiene in hospitals worldwide,
    which saved innumerable lives.
  • 27:07 - 27:11
    And statistical graphics has
    become an art form of its very own,
  • 27:11 - 27:16
    led by designers who are
    passionate about visualising data.
  • 27:25 - 27:27
    This is the Billion Pound-O-Gram.
  • 27:27 - 27:29
    This image arose out of frustration
  • 27:29 - 27:32
    with the reporting of billion pound
    amounts in the media.
  • 27:32 - 27:34
    £500 billion for this war.
  • 27:34 - 27:36
    £50 billion for this oil spill.
  • 27:36 - 27:39
    It doesn't make sense -
    the numbers are too enormous
    to get your mind round.
  • 27:39 - 27:44
    So I scraped all this data
    from various news sources
    and created this diagram.
  • 27:44 - 27:49
    So the
    squares here are scaled according
    to the billion pound amounts.
  • 27:49 - 27:52
    When you see numbers visualised
    like this
  • 27:52 - 27:54
    you start to have a different
    relationship with them.
  • 27:54 - 27:57
    You can start to see the patterns,
    and the scale of them.
  • 27:57 - 28:00
    Here in the corner,
    this little square - £37 billion.
  • 28:00 - 28:03
    This was the predicted cost
    of the Iraq war in 2003.
  • 28:03 - 28:06
    As you can see it's grown
    exponentially over the last few years
  • 28:06 - 28:11
    and the total cost now is
    around about £2,500 billion.
  • 28:11 - 28:13
    It's funny because when
    you visualise statistics
  • 28:13 - 28:15
    you understand them,
    and when you understand them
  • 28:15 - 28:18
    you can really start to put things
    in perspective.
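Scaling squares so that each square's area is proportional to its amount means the side grows with the square root of the amount. Whether the Billion Pound-O-Gram scales exactly this way is an assumption here. The sketch below compares the two Iraq war figures quoted above.

```python
from math import sqrt

def side_for_amount(amount_bn: float, unit_side: float = 1.0) -> float:
    """Side of a square whose area is proportional to the amount (in £ billion)."""
    return unit_side * sqrt(amount_bn)

predicted_2003 = 37   # £ billion, predicted cost quoted in the narration
total_now = 2_500     # £ billion, total cost quoted in the narration

area_ratio = total_now / predicted_2003
side_ratio = side_for_amount(total_now) / side_for_amount(predicted_2003)
print(f"{area_ratio:.1f}x the area, {side_ratio:.2f}x the side")  # 67.6x the area, 8.22x the side
```

A square about eight times as wide holds about 68 times the amount. Area scaling keeps large amounts from looking even larger than they are.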
  • 28:24 - 28:28
    Visualisation is right at
    the heart of my own work too.
  • 28:28 - 28:30
    I teach global health.
  • 28:30 - 28:34
    And I know having the data
    is not enough -
  • 28:34 - 28:39
    I have to show it in ways people
    both enjoy and understand.
  • 28:39 - 28:43
    Now I'm going to try something
    I've never done before.
  • 28:43 - 28:46
    Animating the data in real space,
  • 28:46 - 28:50
    with a bit of technical
    assistance from the crew.
  • 28:50 - 28:52
    So here we go.
  • 28:52 - 28:54
    First, an axis for health.
  • 28:54 - 28:59
    Life expectancy
    from 25 years to 75 years.
  • 28:59 - 29:01
    And down here an axis for wealth.
  • 29:01 - 29:07
    Income per person -
    400, 4,000, 40,000.
  • 29:07 - 29:10
    So down here is poor and sick.
  • 29:10 - 29:14
    And up here is rich and healthy.
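The wealth axis is logarithmic: each tick (400, 4,000, 40,000) is ten times the one before, yet the ticks are equally spaced. The health axis is linear. The sketch below maps a bubble's income and life expectancy to positions between 0 and 1 on those axes. The axis ranges come from the narration; the function names are hypothetical.

```python
from math import log10

INCOME_MIN, INCOME_MAX = 400, 40_000  # income per person, log scale
LIFE_MIN, LIFE_MAX = 25, 75           # life expectancy in years, linear scale

def x_position(income: float) -> float:
    """0 at the left edge, 1 at the right; equal steps for each tenfold rise."""
    return (log10(income) - log10(INCOME_MIN)) / (log10(INCOME_MAX) - log10(INCOME_MIN))

def y_position(life_expectancy: float) -> float:
    """0 at the bottom, 1 at the top; equal steps for each extra year."""
    return (life_expectancy - LIFE_MIN) / (LIFE_MAX - LIFE_MIN)

print(round(x_position(4_000), 6), y_position(50))  # 0.5 0.5
```

On a log axis, a country moving from 400 to 4,000 travels as far as one moving from 4,000 to 40,000. That keeps both poor and rich countries visible on one chart.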
  • 29:14 - 29:18
    Now I'm going to show you the world
  • 29:18 - 29:21
    200 years ago, in 1810.
  • 29:21 - 29:23
    Here come all the countries.
  • 29:23 - 29:26
    Europe, brown;
    Asia, red; Middle East, green;
  • 29:26 - 29:29
    Africa south of the Sahara,
    blue; and the Americas, yellow.
  • 29:29 - 29:34
    And the size of the country bubble
    shows the size of the population.
  • 29:34 - 29:38
    In 1810, it was pretty crowded
    down there, wasn't it?
  • 29:38 - 29:40
    All countries were sick and poor.
  • 29:40 - 29:43
    Life expectancy
    was below 40 in all countries.
  • 29:43 - 29:49
    And only the UK and the Netherlands were
    slightly better off, but not much.
  • 29:49 - 29:53
    And now I start the world.
  • 29:53 - 29:57
    The industrial revolution makes
    countries in Europe and elsewhere
  • 29:57 - 29:59
    move away from the rest.
  • 29:59 - 30:02
    But the colonised countries
    in Asia and Africa,
  • 30:02 - 30:04
    they are stuck down there.
  • 30:04 - 30:08
    And eventually the Western countries
    get healthier and healthier.
  • 30:08 - 30:13
    And now we slow down to show
    the impact of the First World War
  • 30:13 - 30:16
    and the Spanish flu epidemic.
  • 30:16 - 30:18
    What a catastrophe!
  • 30:18 - 30:23
    And now I speed up through
    the 1920s and the 1930s and,
  • 30:23 - 30:24
    in spite of the Great Depression,
  • 30:24 - 30:28
    Western countries forge on towards
    greater wealth and health.
  • 30:28 - 30:30
    Japan and some others try to follow.
  • 30:30 - 30:33
    But most countries stay down here.
  • 30:33 - 30:36
    And after the tragedies
    of the Second World War,
  • 30:36 - 30:39
    we stop a bit to look
    at the world in 1948.
  • 30:39 - 30:42
    1948 was a great year.
  • 30:42 - 30:43
    The war was over,
  • 30:43 - 30:48
    Sweden topped the medal table at
    the Winter Olympics and I was born.
  • 30:48 - 30:51
    But the differences between
    the countries of the world
  • 30:51 - 30:53
    were wider than ever.
  • 30:53 - 30:55
    The United States was in front.
  • 30:55 - 30:57
    Japan was catching up.
  • 30:57 - 30:58
    Brazil was way behind,
  • 30:58 - 31:03
    Iran was getting a little richer
    from oil but still had short lives.
  • 31:03 - 31:05
    And the Asian giants...
  • 31:05 - 31:09
    China, India, Pakistan, Bangladesh,
    and Indonesia,
  • 31:09 - 31:11
    they were still
    poor and sick down here.
  • 31:11 - 31:14
    But look what was about to happen!
    Here we go again.
  • 31:14 - 31:19
    In my lifetime, former colonies
    gained independence and then finally
  • 31:19 - 31:23
    they started to get healthier
    and healthier and healthier.
  • 31:23 - 31:26
    And in the 1970s, countries
    in Asia and Latin America
  • 31:26 - 31:29
    started to catch up
    with the Western countries.
  • 31:29 - 31:31
    They became the emerging economies.
  • 31:31 - 31:33
    Some in Africa followed,
  • 31:33 - 31:36
    some Africans were stuck in civil
    war, and others were hit by HIV.
  • 31:36 - 31:42
    And now we can see the world
    in the most up-to-date statistics.
  • 31:43 - 31:45
    Most people today
    live in the middle.
  • 31:45 - 31:48
    But there is a huge difference
    at the same time
  • 31:48 - 31:52
    between the best-off countries
    and the worst-off countries.
  • 31:52 - 31:55
    And there are also huge
    inequalities within countries.
  • 31:55 - 31:59
    These bubbles show country averages
    but I can split them.
  • 31:59 - 32:02
    Take China. I can split it
    into provinces.
  • 32:02 - 32:05
    There goes Shanghai...
  • 32:05 - 32:08
    It has the same health
    and wealth as Italy today.
  • 32:08 - 32:11
    And there
    is the poor inland province Guizhou,
  • 32:11 - 32:13
    it is like Pakistan.
  • 32:13 - 32:19
    And if I split it further, the rural
    parts are like Ghana in Africa.
  • 32:20 - 32:23
    And yet, despite the enormous
    disparities today,
  • 32:23 - 32:27
    we have seen 200 years
    of remarkable progress!
  • 32:27 - 32:32
    That huge historical gap between
    the west and the rest is now closing.
  • 32:32 - 32:36
    We have become an entirely
    new, converging world.
  • 32:36 - 32:38
    And I see a clear trend
    into the future.
  • 32:38 - 32:41
    With aid, trade, green
    technology and peace,
  • 32:41 - 32:44
    it's fully possible
    that everyone can make it
  • 32:44 - 32:46
    to the healthy, wealthy corner.
  • 32:48 - 32:51
    Well, what you've just seen
    in the last few minutes
  • 32:51 - 32:57
    is a story of 200 countries
    shown over 200 years and beyond.
  • 32:57 - 33:01
    It involved plotting
    120,000 numbers.
  • 33:01 - 33:03
    Pretty neat, huh?
  • 33:08 - 33:13
    So, with statistics, we can begin
    to see things as they really are.
  • 33:13 - 33:18
    From tables of data to averages,
    distributions and visualisations,
  • 33:18 - 33:23
    statistics gives us a
    clear description of the world.
  • 33:23 - 33:28
    But, with statistics, we can
    not only discover WHAT is happening
  • 33:28 - 33:31
    but also explore WHY,
  • 33:31 - 33:34
    by using the powerful analytical
    method - correlation.
  • 33:35 - 33:38
    Just looking at one thing at a
    time doesn't tell you very much.
  • 33:38 - 33:41
    You've got to look at the
    relationships between things,
  • 33:41 - 33:43
    how they change,
    how they vary together.
  • 33:43 - 33:45
    That's what correlation is about.
  • 33:45 - 33:48
    That's how you start trying
    to understand the processes
  • 33:48 - 33:51
    that are really going on
    in the world and society.
  • 33:52 - 33:57
    Most of us today would recognise
    that crime correlates to poverty,
  • 33:57 - 34:00
    that infection correlates
    to poor sanitation,
  • 34:00 - 34:03
    and that knowledge of statistics
    correlates
  • 34:03 - 34:05
    to being great at dancing!
  • 34:07 - 34:10
    Correlations can be very tricky.
  • 34:10 - 34:13
    I've got a joke about
    silly correlations.
  • 34:13 - 34:16
    There was this American who
    was afraid of having a heart attack.
  • 34:16 - 34:20
    He found out that
    the Japanese ate very little fat
  • 34:20 - 34:22
    and hardly drank any wine,
  • 34:22 - 34:26
    but they had far fewer
    heart attacks than the Americans.
  • 34:26 - 34:29
    But, on the other hand,
    he also found out that the French
  • 34:29 - 34:35
    eat as much fat as the Americans
    and they drink much more wine but
    they also have fewer heart attacks.
  • 34:35 - 34:41
    So he concluded that what kills you
    is speaking English.
  • 34:41 - 34:44
    # Smoke, smoke,
    smoke that cigarette
  • 34:44 - 34:48
    # Puff, puff, puff and if you
    smoke yourself to death... #
  • 34:48 - 34:52
    The time, the pace,
    the cigarette. Weights Tilt.
  • 34:52 - 34:56
    The best example of a really
    ground-breaking correlation
  • 34:56 - 35:02
    is the link that was established
    in the 1950s between
    smoking and lung cancer.
  • 35:02 - 35:07
    Not long after the Second World War,
    a British doctor, Richard Doll,
  • 35:07 - 35:11
    investigated lung cancer patients
    in 20 London hospitals.
  • 35:11 - 35:15
    And he became certain
    that the only thing they had
    in common was smoking.
  • 35:15 - 35:18
    So certain,
    that he stopped smoking himself.
  • 35:18 - 35:22
    But other people weren't so sure.
  • 35:22 - 35:25
    A lot of the discussion
    of the early data,
  • 35:25 - 35:29
    linking smoking to lung cancer, said,
    "It's not the smoking, surely,
  • 35:29 - 35:33
    "that thing we've done all our lives,
    that can't be bad for you.
  • 35:33 - 35:35
    "Maybe it's genes.
  • 35:35 - 35:39
    "Maybe people who are genetically
    predisposed to get lung cancer
  • 35:39 - 35:44
    "are also genetically
    predisposed to smoke."
  • 35:44 - 35:47
    "Maybe it's not the smoking,
    maybe it's air pollution -
  • 35:47 - 35:53
    "that smokers are somehow
    more exposed to air pollution
    than non-smokers.
  • 35:53 - 35:56
    "Maybe it's not smoking,
    maybe it's poverty."
  • 35:56 - 36:01
    So now we've got three alternative
    explanations, apart from chance.
  • 36:02 - 36:07
    To verify that his correlation
    did imply cause and effect,
  • 36:07 - 36:11
    Richard Doll created the biggest
    statistical study of smoking yet.
  • 36:11 - 36:15
    He began tracking the lives
    of 40,000 British doctors,
  • 36:15 - 36:17
    some of whom smoked
    and some of whom didn't,
  • 36:17 - 36:19
    and gathered enough data
  • 36:19 - 36:22
    to correlate the amount
    the doctors smoked
  • 36:22 - 36:25
    with their likelihood
    of getting cancer.
  • 36:25 - 36:30
    Eventually, he not only
    showed a correlation between
    smoking and lung cancer,
  • 36:30 - 36:36
    but also a correlation
    between stopping smoking
    and reducing the risk.
  • 36:36 - 36:38
    This was science at its best.
  • 36:40 - 36:44
    What correlations do not replace
    is human thought.
  • 36:44 - 36:47
    You've got to think
    about what it means.
  • 36:47 - 36:50
    What a good scientist does,
    if he comes up with a correlation,
  • 36:50 - 36:56
    is try as hard as she or he
    possibly can to disprove it,
  • 36:56 - 37:00
    to break it down, to get rid of it,
    to try and refute it.
  • 37:00 - 37:05
    And if it withstands
    all those efforts at demolishing it
  • 37:05 - 37:11
    and it is still standing up then,
    cautiously, you say, "We really
    might have something here."
  • 37:27 - 37:33
    However brilliant the scientist,
    data is still the oxygen of science.
  • 37:33 - 37:39
    The good news is that the more we
    have, the more correlations we'll
    find, the more theories we'll test,
  • 37:39 - 37:42
    and the more discoveries
    we're likely to make.
  • 37:46 - 37:53
    And history shows how our total sum
    of information grows in huge leaps
    as we develop new technologies.
  • 37:53 - 38:00
    The invention of the
    printing press kicked off the first
    data and information explosion.
  • 38:00 - 38:06
    If you piled up all the books that
    had been printed by the year 1700,
  • 38:06 - 38:11
    they would make 60 stacks
    each as high as Mount Everest.
  • 38:13 - 38:15
    Then, starting in the 19th century,
  • 38:15 - 38:20
    there came a second information
    revolution with the telegraph,
  • 38:20 - 38:24
    gramophone and camera.
    And later radio and TV.
  • 38:24 - 38:28
    The total amount
    of information exploded.
  • 38:28 - 38:35
    And by the 1950s
    the information available to us all
    had multiplied 6,000 times.
  • 38:35 - 38:41
    Then, thanks to the computer and
    later the internet, we went digital.
  • 38:41 - 38:47
    And the amount of data we have now
    is unimaginably vast.
  • 38:50 - 38:55
    A single letter printed in a book
    is equivalent to a byte of data.
  • 38:55 - 38:59
    A printed page
    equals a kilobyte or two.
  • 39:02 - 39:06
    Five megabytes is enough for
    the complete works of Shakespeare.
  • 39:08 - 39:12
    10 gigabytes - that's a DVD movie.
  • 39:17 - 39:23
    Two terabytes
    is the tens of millions of photos
    added to Facebook every day.
  • 39:25 - 39:32
    Ten petabytes is the data recorded
    every second by the world's
    largest particle accelerator.
  • 39:32 - 39:36
    So much that
    only a tiny fraction is kept.
  • 39:36 - 39:43
    Six exabytes is what you'd have
    if you sequenced the genomes
    of every single person on Earth.
  • 39:49 - 39:51
    But really, that's nothing.
  • 39:51 - 39:55
    In 2009, the internet
    added up to 500 exabytes.
  • 39:55 - 40:02
    In 2010, in just one year, that will
    double to more than one zettabyte!
  • 40:06 - 40:14
    Back in the real world, if we
    turned all this data into print
    it would make 90 stacks of books,
  • 40:14 - 40:19
    each reaching from here
    all the way to the sun!
  • 40:19 - 40:24
    The data deluge is staggering,
    but, with today's computers
  • 40:24 - 40:28
    and statistics,
    I'm confident we can handle it.
  • 40:28 - 40:31
    When it comes to all the data
    on the internet,
  • 40:31 - 40:34
    the powerhouse
    of statistical analysis
  • 40:34 - 40:38
    is the Silicon Valley giant Google.
  • 40:44 - 40:51
    The average person over their
    lifetime is exposed to about 100
    million words of conversation.
  • 40:51 - 40:55
    And so if you multiply that by the
    six billion people on the planet,
  • 40:55 - 40:58
    that amount of words is about
    equal to the number of words
  • 40:58 - 41:01
    that Google has available
    at any one instant in time.
  • 41:03 - 41:09
    Google's computers hoover up
    and file away every document,
    web page, and image they can find.
  • 41:09 - 41:15
    They then hunt for patterns and
    correlations in all this data,
  • 41:15 - 41:18
    doing statistics on a massive scale.
  • 41:18 - 41:26
    And, for me, Google has one project
    that's particularly exciting -
    statistical language translation.
  • 41:26 - 41:31
    We wanted to provide access
    to all the web's information,
    no matter what language you spoke.
  • 41:31 - 41:34
    There's just so much information
    on the internet,
  • 41:34 - 41:38
    you couldn't hope to translate it all
    by hand into every possible language.
  • 41:38 - 41:42
    We figured we'd have to be able
    to do machine translation.
  • 41:44 - 41:47
    In the past, programmers
    tried to teach their computers
  • 41:47 - 41:53
    to see each language as a set of
    grammatical rules - much like the
    way languages are taught at school.
  • 41:53 - 41:59
    But this didn't work because no set
    of rules could capture a language
  • 41:59 - 42:01
    in all its subtlety and ambiguity.
  • 42:01 - 42:06
    "Having eaten our lunch
    the coach departed."
  • 42:06 - 42:08
    Well, that's obviously incorrect.
  • 42:08 - 42:12
    Written like that it would imply
    that the coach has eaten the lunch.
  • 42:12 - 42:15
    It would be far better to say...
  • 42:15 - 42:20
    "having eaten our lunch
    we departed in the coach."
  • 42:20 - 42:26
    Those rules are helpful and they are
    useful most of the time, but they don't
    turn out to be true all the time.
  • 42:26 - 42:30
    And the insight of using statistical
    machine translation is saying,
  • 42:30 - 42:35
    "If you've got to have all these
    exceptions anyway, maybe you can get
    by without having any of the rules.
  • 42:35 - 42:39
    "Maybe you can treat everything
    as an exception." And that's
    essentially what we've done.
  • 42:49 - 42:53
    What the computer is doing when
    it's learning how to translate
  • 42:53 - 42:55
    is to learn correlations
    between words
  • 42:55 - 42:57
    and correlations between phrases.
  • 42:57 - 43:01
    So we feed the system very large
    amounts of data
  • 43:01 - 43:05
    and then the system is seeing that
    a certain word or a certain phrase
  • 43:05 - 43:08
    correlates very often
    to the other language.
  • 43:10 - 43:16
    Google's website currently
    offers translation between
    any of 57 different languages.
  • 43:16 - 43:23
    It does this purely statistically,
    having correlated a huge collection
    of multilingual texts.
  • 43:23 - 43:26
    The people that built the system
    don't need to know Chinese
  • 43:26 - 43:30
    in order to build the
    Chinese-to-English system,
    or they don't need to know Arabic.
  • 43:30 - 43:33
    But the expertise that's needed is
    basically knowledge of statistics,
  • 43:33 - 43:36
    knowledge of computer science,
    knowledge of infrastructure
  • 43:36 - 43:41
    to build those very large
    computational systems
    that we are building for doing that.
  • 43:43 - 43:48
    I hooked up with Google
    from my office in Stockholm
    to try the translator for myself.
  • 43:48 - 43:52
    'I will type...
    some Swedish sentences.'
  • 43:52 - 43:53
    OK.
  • 43:53 - 43:55
    Sveriges...
  • 43:55 - 43:59
    ...guldring i örat.
  • 44:01 - 44:07
    OK. So it says, "Sweden's finance
    minister has a ponytail
    and a gold ring in your ear."
  • 44:07 - 44:12
    I guess it probably means
    in his ear. 'That's exactly
    correct, it's amazing!
  • 44:12 - 44:15
    'He comes from the Conservative
    party, that's the kind
    of Sweden we have today.
  • 44:15 - 44:19
    'I will type one more sentence.'
  • 44:19 - 44:22
    'I sitt samkönade...'
  • 44:22 - 44:26
    partnerskap...
  • 44:26 - 44:28
    nya biskop.
  • 44:28 - 44:35
    "In his same-sex partnership
    has Stockholm's new bishop
    and his partners a three-year son."
  • 44:35 - 44:38
    It's almost perfect,
    there's one important thing -
  • 44:38 - 44:42
    it's HER,
    it's a lesbian partnership.
  • 44:42 - 44:47
    OK, so words like "his" and "her"
    are one of the challenges
  • 44:47 - 44:49
    in translation,
    getting those really right.
  • 44:49 - 44:52
    Especially when it comes
    to bishops one can excuse it!
  • 44:52 - 44:54
    'Right, right.'
  • 44:54 - 44:59
    I guess more often than not
    it would probably be a "his".
    'I will write one more sentence.'
  • 44:59 - 45:02
    När Sverige deltar
    i olympiader är målet
  • 45:02 - 45:04
    'inte att vinna
    utan att slå Norge.'
  • 45:06 - 45:12
    OK. "When Sweden is taking part
    in Olympic goal is not
    to win but to beat Norway."
  • 45:12 - 45:14
    'Yes! This is what it is!
  • 45:14 - 45:18
    'But they are very good
    in Winter Olympics, so we
    can't make it, but we are trying.'
  • 45:18 - 45:20
    Ah, very good, very good.
  • 45:20 - 45:25
    'This is absolutely amazing, you
    know, and I was especially impressed
  • 45:25 - 45:31
    'that it picks up words like
    "same-sex partnership"
    which are very new to the language.'
  • 45:31 - 45:37
    'The translator is good, but
    if they succeed with what's next,
    that'll be remarkable.'
  • 45:37 - 45:38
    One of the exciting possibilities
  • 45:38 - 45:43
    is combining the machine
    translation technology with
    the speech recognition technology.
  • 45:43 - 45:45
    Now, both of these
    are statistical in nature.
  • 45:45 - 45:51
    The machine translation relies
    on the statistics of mapping
    from one language to another,
  • 45:51 - 45:58
    and similarly speech recognition
    relies on the statistics of mapping
    from a sound form to the words.
  • 45:58 - 46:00
    When we put them together,
  • 46:00 - 46:03
    now we have the capability
    of having instant conversation
  • 46:03 - 46:07
    between two people
    that don't speak a common language.
  • 46:07 - 46:09
    I can talk to you in my language,
  • 46:09 - 46:12
    you hear me in your language
    and you can answer back.
  • 46:12 - 46:15
    And in real time we can
    make that translation,
  • 46:15 - 46:19
    we can bring two people together
    and allow them to speak.
  • 46:31 - 46:39
    The internet is just one
    of many technologies created
    to gather massive amounts of data.
  • 46:39 - 46:44
    Scientists studying
    our earth and our environment
  • 46:44 - 46:47
    now use an incredible range
    of instruments
  • 46:47 - 46:51
    to measure the processes
    of our planet.
  • 46:53 - 47:00
    All around us are sensors
    continuously measuring temperature,
    water flow, and ocean currents.
  • 47:00 - 47:07
    And high in orbit are satellites
    busy imaging cloud formations,
    forest growth and snow cover.
  • 47:07 - 47:11
    Scientists speak
    of "instrumenting the earth".
  • 47:13 - 47:20
    And pointing up to the skies
    above are powerful new telescopes
    mapping the universe.
  • 47:30 - 47:35
    What's happening in astronomy
    is typical of how profoundly
  • 47:35 - 47:40
    this new torrent of data
    is transforming science.
  • 47:40 - 47:45
    Astronomers are now addressing many
    enduring mysteries of the cosmos
  • 47:45 - 47:50
    by applying statistical methods
    to all this new data.
  • 48:00 - 48:03
    The galaxy is a very big place and
    it's got billions of stars in it,
  • 48:03 - 48:09
    and so to put together a coherent
    picture of the whole galaxy requires
    having an enormous amount of data.
  • 48:09 - 48:14
    And before you could do
    a large sky survey with
    sensitive, digital detectors
  • 48:14 - 48:17
    that meant that you could map many,
    many stars all at once,
  • 48:17 - 48:21
    it was very difficult to build up
    enough data on enough of the galaxy.
  • 48:25 - 48:29
    In the past, large surveys
    of the night sky had to be done
  • 48:29 - 48:32
    by exposing thousands
    of large photographic plates.
  • 48:32 - 48:37
    But these surveys could take
    25 years or more to complete.
  • 48:39 - 48:45
    Then, in the 1990s, came digital
    astronomy and a huge increase
  • 48:45 - 48:50
    in both the amount
    and the accessibility of data.
  • 48:50 - 48:56
    The Sloan Sky Survey
    is the world's biggest yet,
    using a massive digital sensor
  • 48:56 - 49:01
    mounted on the back
    of a custom-built telescope
    in New Mexico.
  • 49:01 - 49:05
    It's scanned the sky night
    after night for eight years,
  • 49:05 - 49:10
    building up a composite picture
    in unprecedented resolution.
  • 49:10 - 49:15
    The Sloan is some of the best,
    deepest survey data
    that we have in astronomy.
  • 49:15 - 49:19
    Both on our own galaxy and
    on galaxies further away from ours.
  • 49:24 - 49:27
    All the Sloan data
    is on the internet,
  • 49:27 - 49:34
    and with it astronomers
    have identified millions of hitherto
    unknown stars and galaxies.
  • 49:34 - 49:37
    They also comb the database
    for statistical patterns
  • 49:37 - 49:43
    which will prove, disprove,
    or even suggest new theories.
  • 49:43 - 49:49
    So we have this idea that galaxies
    grow, they become large galaxies like
    the one we live in, the Milky Way,
  • 49:49 - 49:56
    not all at once, or not smoothly,
    but by continuously incorporating,
  • 49:56 - 49:59
    basically cannibalising,
    smaller galaxies.
  • 49:59 - 50:04
    They dissolve them
    and they become part
    of the bigger galaxy as it grows.
  • 50:06 - 50:13
    It's a startling idea,
    and, in the Sloan data,
    is the evidence to support it.
  • 50:13 - 50:16
    Groups of stars that came
    from cannibalised galaxies
  • 50:16 - 50:21
    stand out in the Sloan data
    as statistically different
    from other stars
  • 50:21 - 50:24
    because they move
    at a different velocity.
  • 50:24 - 50:29
    Each big spike
    on one of these distribution graphs
  • 50:29 - 50:35
    means Professor Rockosi has found
    a group of stars all travelling
    in a different way to the rest.
  • 50:35 - 50:38
    They are the telltale
    patterns she's looking for.
  • 50:40 - 50:45
    The evidence is accumulating
    that, in fact, this really is
    how galaxies grow,
  • 50:45 - 50:47
    or an important way
    in which galaxies grow.
  • 50:47 - 50:53
    And so this is an important part
    of understanding how galaxies form,
    not only ours but every galaxy.
  • 50:56 - 51:00
    The more data there is,
    the more discoveries can be made.
  • 51:00 - 51:03
    And the technology
    is getting better all the time.
  • 51:03 - 51:08
    The next big survey telescope
    starts its work in 2015.
  • 51:08 - 51:11
    It will leave Sloan in the dust!
  • 51:11 - 51:16
    Sloan has taken eight years to cover
    one quarter of the night sky.
  • 51:18 - 51:26
    The new telescope will scan
    the entire sky, in even greater
    resolution, every three days!
  • 51:34 - 51:41
    The vast amounts of data
    we have today allow researchers
    in all sorts of fields
  • 51:41 - 51:46
    to test their theories
    on a previously unimaginable scale.
  • 51:46 - 51:54
    But more than this,
    it may even change
    the fundamental way science is done.
  • 51:54 - 51:59
    With the power of today's computers
    applied to all this data,
  • 51:59 - 52:04
    the machines might even be able
    to guide the researchers.
  • 52:15 - 52:18
    We're at a potentially
    profoundly important
  • 52:18 - 52:23
    and potentially one of the most
    significant points in science,
  • 52:23 - 52:25
    and certainly one of
    the most exciting,
  • 52:25 - 52:32
    where there is the potential to transform
    not just how scientists do science
    but even what science is possible.
  • 52:32 - 52:35
    And what will power
    that transformation
  • 52:35 - 52:38
    of both how science is done
    and even what science is possible
  • 52:38 - 52:40
    is going to be computation.
  • 52:42 - 52:49
    Many of the dynamics of the natural
    world, like the interplay between
    the rainforests and the atmosphere,
  • 52:49 - 52:54
    are so complex that we don't
    as yet really understand them.
  • 52:54 - 52:59
    But now computers are generating
    literally tens of thousands
    of different simulations
  • 52:59 - 53:03
    of how these
    biological systems might work.
  • 53:03 - 53:08
    It's like creating thousands
    of hypothetical parallel worlds.
  • 53:08 - 53:11
    Each and every one
    of these simulations
  • 53:11 - 53:18
    is analysed with statistics
    to see if any are a good match
    for what is observed in nature.
  • 53:18 - 53:22
    The computers can now
    automatically generate,
  • 53:22 - 53:26
    test and discard hypotheses
    with scarcely a human in sight.
  • 53:28 - 53:35
    This new application of statistics
    will become absolutely vital
    for the future of science.
  • 53:35 - 53:39
    It's creating a new paradigm,
    if you like,
  • 53:39 - 53:43
    in science, in the way
    in which we can do science,
  • 53:43 - 53:45
    which is increasingly...
  • 53:45 - 53:51
    Which one might characterise as...
    data-centric or data-driven
  • 53:51 - 53:55
    rather than being hypothesis-driven
    or experimentally-driven.
  • 53:55 - 53:58
    So, it's exciting times
    in terms of the science,
  • 53:58 - 54:02
    in terms of the computation
    and in terms of the statistics.
  • 54:09 - 54:15
    Now, if all that sounds a bit
    abstract and theoretical to you,
    how about one final frontier?
  • 54:15 - 54:19
    Could statistics even make
    sense of your feelings?
  • 54:21 - 54:26
    In California - where else? -
    one computer scientist
  • 54:26 - 54:33
    is harvesting the internet to try
    to divine the patterns of our
    innermost thoughts and emotions.
  • 54:45 - 54:46
    This is the Madness movement.
  • 54:46 - 54:51
    The Madness movement represents
    a skyscraper view of the world.
  • 54:51 - 54:55
    Each of these brightly coloured dots
    is an individual feeling
  • 54:55 - 54:59
    expressed by someone out there
    in a blog or a tweet.
  • 54:59 - 55:04
    And when you click on the dot
    it explodes to reveal the
    underlying feeling of that person.
  • 55:04 - 55:07
    This is what people say
    they're feeling today.
  • 55:08 - 55:10
    Better...safe...
  • 55:10 - 55:12
    crappy...
  • 55:12 - 55:15
    well...
  • 55:15 - 55:18
    pretty...special...
  • 55:18 - 55:21
    sorry...alone...
  • 55:26 - 55:29
    So, every minute, We Feel Fine
    crawls the world's blogs,
  • 55:29 - 55:34
    takes all the sentences
    that start with the words
    "I feel" or "I am feeling",
  • 55:34 - 55:36
    and puts them in a database.
  • 55:36 - 55:40
    We collect all the feelings
    and we count the most common.
  • 55:40 - 55:43
    They are better...bad...
  • 55:43 - 55:46
    good...right...
  • 55:46 - 55:49
    guilty...sick...
  • 55:49 - 55:52
    the same...like shit...
  • 55:52 - 55:55
    sorry...well...
  • 55:55 - 55:56
    and so on.
  • 55:58 - 56:02
    And we can take a look at any
    one feeling and analyse it.
  • 56:02 - 56:05
    Right now a lot of people
    are feeling happy.
  • 56:05 - 56:11
    We can take a look at all the
    people who are happy and break it
    down by age, gender or location.
  • 56:11 - 56:17
    Since bloggers have public profiles
    we have that information and
    so we can ask questions like,
  • 56:17 - 56:21
    "Are women happier than men?"
    or, "Is England happier
    than the United States?"
  • 56:30 - 56:33
    We find that, as people get older,
    they get happier.
  • 56:33 - 56:41
    And, moreover, we find that
    for younger people they associate
    happiness more with excitement,
  • 56:41 - 56:47
    and, as people get older,
    they associate happiness
    more with peacefulness.
  • 56:51 - 56:58
    And we also find that women feel
    loved more often than men,
    but also more guilty.
  • 56:58 - 57:02
    While men feel good more often
    than women, but also more alone.
  • 57:07 - 57:12
    As people lead more and
    more of their lives online,
    they leave behind digital traces,
  • 57:12 - 57:20
    and with these digital traces
    we can begin to statistically analyse
    what it means to be human.
  • 57:51 - 57:54
    So where does all of this leave us?
  • 57:54 - 58:00
    We generate unimaginable
    quantities of data
    about everything you can think of.
  • 58:00 - 58:03
    We analyse it to reveal
    the patterns.
  • 58:03 - 58:10
    And now not only experts
    but all of us can understand
    the stories in the numbers.
  • 58:18 - 58:21
    Instead of being
    led astray by prejudice,
  • 58:21 - 58:28
    with statistics at our fingertips,
    our eyes can be open
    for a fact-based view of the world.
  • 58:28 - 58:34
    So, more than ever before, we can
    become authors of our own destiny.
  • 58:34 - 58:37
    And that's pretty
    exciting, isn't it?!
  • 58:38 - 58:44
    # 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
    12, 13, 14, 15, 16, 17, 18, 19, 20
  • 58:44 - 58:51
    # 1, 22, 3, 24, 25, 26, 27, 28, 9,
    30, 31, 32, 3, 34, 35, 36, 7
  • 58:51 - 58:54
    # 38, 39, 40, 41, 42, 3,
    44, 45, 46, 47
  • 58:54 - 58:59
    LYRICS DEGENERATE INTO GIBBERISH
  • 59:09 - 59:13
    GIBBERISH DEGENERATES INTO NOISE
  • 59:13 - 59:14
    # 100. #
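The idea of correlation introduced at 33:35 to 33:45, looking at how two things "vary together", is commonly measured with Pearson's correlation coefficient r. The Python sketch below assumes Pearson's r as the measure (the programme does not name a specific formula), and the paired numbers are invented purely for illustration; they are not data from Richard Doll's study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how far each pair deviates from the means *together*
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: the spread of each variable on its own
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return cov / (sx * sy)


# Invented example pairs: two quantities that rise together
amount = [0, 5, 10, 20, 30]
risk = [1.0, 1.8, 2.9, 5.1, 7.2]
print(round(pearson_r(amount, risk), 3))
```

A value of r near +1 or -1 means the two series move closely together, and a value near 0 means there is no linear relationship. As the heart-attack joke and the smoking story both show, r on its own says nothing about why the two series move together. Python 3.10 and later also ship `statistics.correlation`, which computes the same coefficient.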
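The We Feel Fine procedure described from 55:26 (collect sentences containing "I feel" or "I am feeling", store them and count the most common feelings) might be sketched as follows. This is a hypothetical illustration, not the project's code: the name `count_feelings`, the regular expression, and the rule of treating the single word after the phrase as the feeling are all assumptions. The pattern matches the phrase anywhere in a post, which is looser than the "start of sentence" rule described. Multi-word feelings such as "the same" are not captured, and the sample posts are invented.

```python
import re
from collections import Counter

# Assumed rule: the first word after "I feel" / "I am feeling" is the feeling.
FEELING_PATTERN = re.compile(r"\bI (?:feel|am feeling)\s+([a-z]+)", re.IGNORECASE)


def count_feelings(posts):
    """Count how often each feeling word follows 'I feel' or 'I am feeling'."""
    counts = Counter()
    for post in posts:
        for match in FEELING_PATTERN.finditer(post):
            counts[match.group(1).lower()] += 1
    return counts


# Invented sample posts standing in for crawled blog sentences
posts = [
    "Today I feel better than yesterday.",
    "I am feeling guilty about the cake.",
    "Honestly I feel better now.",
    "I feel sick.",
    "Nothing to report.",
]
print(count_feelings(posts).most_common(2))
```

Breaking the counts down by age, gender or location, as described at 56:05 to 56:21, would need profile fields attached to each post, which this sketch does not model.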
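The translation approach described at 42:49 to 43:08 learns which words and phrases in one language "correlate very often" with words in another from large amounts of parallel text. That idea can be illustrated on a tiny scale. The sketch below is not Google's method. It swaps in a simple co-occurrence score, the Dice coefficient, over a made-up three-sentence Swedish-English corpus, and the function names `dice_scores` and `best_translation` are hypothetical. Real statistical translation systems use far larger corpora and proper word-alignment and phrase models.

```python
from collections import Counter
from itertools import product


def dice_scores(parallel_pairs):
    """Score each (source word, target word) pair by how often they co-occur.

    Dice coefficient: 2 * joint / (source count + target count), where each
    count is the number of sentence pairs containing the word.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in parallel_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word in a pair co-occurs with every target word
        joint.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_translation(word, scores):
    """Return the target word most strongly associated with `word`, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Toy Swedish-English sentence pairs, invented for illustration
corpus = [
    ("jag vinner", "i win"),
    ("du vinner", "you win"),
    ("jag sover", "i sleep"),
]
scores = dice_scores(corpus)
for word in ["jag", "du", "vinner", "sover"]:
    print(word, "->", best_translation(word, scores))
```

On this toy corpus the sketch pairs "jag" with "i", "du" with "you", "vinner" with "win" and "sover" with "sleep". No grammar rules are involved, only counts of which words turn up together.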
Title:
The Joy of Stats
Description:

Documentary which takes viewers on a rollercoaster ride through the wonderful world of statistics to explore the remarkable power they have to change our understanding of the world, presented by superstar boffin Professor Hans Rosling, whose eye-opening, mind-expanding and funny online lectures have made him an international internet legend.

Video Language:
English
Duration:
59:13

English, British subtitles
