The world we live in is awash with
data that comes pouring in
from everywhere around us.
On its own this data
is just noise and confusion.
To make sense of data, to find the
meaning in it, we need the powerful
branch of science - statistics.
Believe me there's nothing
boring about statistics.
Especially not today
when we can make the data sing.
With statistics we can
really make sense of the world.
And there's more.
With statistics, the data deluge, as
it's being called, is leading us
to an ever greater understanding
of life on Earth
and the universe beyond.
And thanks to the incredible
power of today's computers,
it may fundamentally transform the
process of scientific discovery.
I kid you not, statistics is
now the sexiest subject around.
Did you know that there are
one million boats in Sweden?
That's one boat per nine people!
It's the highest number of
boats per person in Europe!
Being a statistician,
you don't like telling
your profession at dinner parties.
But really,
statisticians shouldn't be shy
because everyone wants to
understand what's going on.
And statistics gives us a
perspective on the world we live in
that we can't get in any other way.
Statistics tells us whether
the things we think
and believe are actually true.
And statistics are far more useful
than we usually like to admit.
In the last recession there
was this famous call-in
to a talk radio station.
The man complained, "In times like
this when unemployment rates are up
to 13%, income has fallen by 5%,
"and suicide rates are climbing, and
I get so angry that the government
"is wasting money on things like
collection of statistics."
I'm not officially a statistician.
Strictly speaking,
my field is global health.
But I got really obsessed with stats
when I realised how much people
in Sweden just don't know
about the rest of the world.
I started in our medical
university, Karolinska Institutet,
an undergraduate course
called Global Health.
These students coming to us actually
have the highest grade you can get
in the Swedish college system,
so I thought, "Maybe they know
everything I'm going to teach them."
So I did a pre-test when they came,
and one of the questions
from which I learned a lot
was this one -
which country has the highest
child mortality of these five pairs?
I won't put you at test here,
but it is Turkey
which is highest there, Poland,
Russia, Pakistan, and South Africa.
And these were the result of
the Swedish students.
An average of 1.8 right answers
out of five possible.
And that means there was a place for
a professor of International Health
and for my course.
But one late night
when I was compiling the report,
I really realised my discovery.
I had shown that Swedish
top students know statistically
significantly less about
the world than the chimpanzees.
Because the chimpanzees
would score half right.
If I gave them two bananas
with Sri Lanka and Turkey,
they would be right
half of the cases,
but the students are not there.
I did also an unethical study
of the professors of
the Karolinska Institutet,
that hands out the Nobel Prize
for medicine, and they are on par
with the chimpanzees there.
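
The chimpanzee comparison can be checked with a little arithmetic. The sketch below assumes five either/or questions and a guesser who is right half the time on each; the function name is made up for illustration, and the 1.8 is the student average quoted above. It does not test statistical significance, which would need the number of students.

```python
from math import comb

def chance_score_distribution(questions, p_correct):
    """Probability of exactly k right answers by pure guessing, for k = 0..questions."""
    return [
        comb(questions, k) * p_correct**k * (1 - p_correct) ** (questions - k)
        for k in range(questions + 1)
    ]

# Five either/or questions, so a random guesser is right half the time.
distribution = chance_score_distribution(questions=5, p_correct=0.5)
expected_by_chance = sum(k * p for k, p in enumerate(distribution))

student_average = 1.8  # figure quoted in the talk

print(f"Expected score by guessing: {expected_by_chance}")
print(f"Students scored below chance: {student_average < expected_by_chance}")
```

Guessing gives an expected 2.5 right out of five, so an average of 1.8 is below what blind chance would produce.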
Today there's more information
accessible than ever before.
'And I work with my team at
the Gapminder Foundation
'using new tools that help everyone
make sense of the changing world.
'We draw on the masses of data
that's now freely available
'from international institutions
like the UN and the World Bank.
'And it's become my mission to
share the insights
'from this data with anyone who'll
listen, and to reveal how statistics
is nothing to be frightened of.'
I'm going to provide you a view of
the global health situation
across mankind.
And I'm going to do that in
hopefully an enjoyable way,
so relax.
So we did this software
which displays it like this.
Every bubble here is a country -
this is China, this is India.
The size of the bubble
is the population.
I'm going to stage a race between
this sort of yellowish Ford here
and the red Toyota down there
and the brownish Volvo.
The Toyota has a very bad start
down here, and United States,
Ford is going off-road there,
and the Volvo is doing quite fine,
this is the war.
The Toyota got off track, now Toyota
is on the healthier side of Sweden.
That's about where I sold
the Volvo and bought the Toyota.
LAUGHTER
This is the great leap forward,
when China fell down.
It was the central planning
by Mao Zedong.
China recovered and said, "Never
more stupid central planning,"
but they went up here.
No, there is one more inequity,
look there - United States.
They broke my frame. Washington DC
is so rich over there,
but they are
not as healthy as Kerala in India.
It's quite interesting, isn't it?
LAUGHTER AND APPLAUSE
Welcome to the USA,
world leaders in big cars
and free data.
There are many here who share
my vision of making public data
accessible and useful for everyone.
The city of San Francisco
is in the lead, opening up
its data on everything.
Even the police department is
releasing all its crime reports.
This official
crime data has been turned
into a wonderful interactive map by
two of the city's computer whizzes.
It's community statistics in action.
Crimespotting is
a map of crime reports from the
San Francisco Police Department
showing dots on maps
for citizens to be able to see
patterns of crime around their
neighbourhoods in San Francisco.
The map is not just about individual
crimes but about broader patterns
that show you where crime is
clustered around the city, which
areas have high crime,
and which areas have
relatively low crime.
We're here at the top of
Jones Street on Nob Hill...
..quite a nice neighbourhood.
What the crime maps show us
is the relationship between
topography and crime.
Basically the higher up the hill,
the less crime there is.
You cross over the border
into the flats...
Essentially as soon as you get
into the lower lying areas of Jones
Street the crime just skyrockets.
We're here in
the uptown Tenderloin district.
It's one of the oldest and densest
neighbourhoods in San Francisco.
This is where you go to buy drugs.
Right around here.
We see lots of aggravated assaults,
lots of auto thefts.
Basically a huge part of the crime
that happens in the city happens
in this five or six block radius.
If you've been hearing police sirens
in your neighbourhood,
you can use the map to find out why.
If you're out at night in
an unfamiliar part of town,
you can check the map
for streets to avoid.
If a neighbour gets burgled,
you can see -
is it a one-off or has there been
a spike in local crime?
If you commute through a
neighbourhood and you're worried
about its safety, the fact that we
have the ability to turn off all
the night-time
and middle-of-the-day crimes
and show you just the things that are
happening during the commute,
it is a statistical operation.
But I think to people that are
interacting with the thing
it feels very much more like they're
just sort of browsing a website
or shopping on Amazon.
They're looking at data
and they don't realise
they're doing statistics.
What's most exciting for me
is that public statistics
is making citizens more powerful and
the authorities more accountable.
We have community meetings that
the police attend
and what citizens are
now doing are bringing printouts
of the maps that show where crimes
are taking place,
and they're demanding services
from the police department
and the police department is now
having to change how they police,
how they provide policing services,
because the data is showing
what is working and what is not.
People in San Francisco
are also using public data
to map social inequalities
and see how to improve society.
And the possibilities are endless.
I think our dream
government data analysis project
would really be focused on
live information,
on stuff that was being reported
and pushed out to the world over
the internet as it was happening.
You know, trash pickups,
traffic accidents, buses,
and I think through the kind of
stats-gathering power
of the internet
it's possible to really begin
to see the workings of the city
displayed as a unified interface.
So that's where we are heading.
Towards a world of free data
with all the statistical
insights that come from it,
accessible to everyone, empowering
us as citizens and letting us
hold our rulers to account.
It's a long way from
where statistics began.
Statistics
are essential to us to monitor
our governments and our societies.
But it was our rulers up
there who started
the collection of statistics in the
first place in order to monitor us!
In fact the word 'statistics'
comes from 'the state'.
Modern statistics
began two centuries ago.
Once it got going,
it spread and never stopped.
And guess who was first!
The Chinese have Confucius,
the Italians have da Vinci,
and the British have Shakespeare.
And we have the Tabellverket -
the first ever systematic
collection of statistics!
Since the year 1749
we have collected data
on every birth, marriage and death,
and we are proud of it!
The Tabellverket recorded
information
from every parish in Sweden.
It was a huge quantity of data and
it was the first time any government
could get an accurate
picture of its people.
Sweden had been the greatest
military power in Northern Europe,
but by 1749 our star
was really fading
and other countries
were growing stronger.
At least we were a large power,
thought to have 20 million people,
enough to rival Britain and France.
But we were in for a nasty surprise.
The first analysis
of the Tabellverket
revealed that Sweden
only had two million inhabitants.
Sweden was not just a power
in decline, it also had
a very small population.
The government was horrified
by this finding -
what if the enemy found out?
But the Tabellverket also showed
that many women died in childbirth
and many children died young.
So government took action
to improve the health of the people.
This was the beginning
of modern Sweden.
It took more than 50 years before
the Austrians, Belgians, Danes,
Dutch, French, Germans, Italians
and, finally, the British,
caught up with Sweden
in collecting and using statistics.
It was called political arithmetic.
It was a lovely phrase
that was used for statistics.
Governments could have much more
control and understanding of
the society - how it was working,
how it was developing
and essentially
so they could control it better.
It wasn't just governments who
woke up to the power of statistics.
Right across Europe, 19th
century society went mad for facts.
And, despite its late start,
Britain,
with its Royal Statistical Society
in London,
was soon a statisticians' nirvana.
I love looking at old copies of
the Royal Statistical Society journal
because it's full of such odd stuff.
There's a wonderful paper
from the 1840s
which shows a map of England and
the rates of bastardy in each county.
So you can identify very quickly the
areas with high rates of bastardy.
Being in East Anglia it always
makes me slightly laugh that Norfolk
seems to top the "bastardy league"
in the 1840s.
One of the founders of
the Royal Statistical Society
was the great
Victorian mathematician
and inventor Charles Babbage.
In 1842 he read the latest
poem by an equally great Victorian,
Alfred Tennyson.
Vision of Sin contained the lines:
"Fill the cup, and fill the can
"Have a rouse before the morn
"Every moment dies a man
Every moment one is born."
So keen a statistician was Babbage
that he could not contain himself.
He dashed off a letter to Tennyson
explaining that because of
population growth,
the line should read,
"Every moment dies a man
and one and a 16th is born."
I may add that
the exact figure is 1.067,
but something must be
conceded to the laws of metre.
In the 19th century, scholars all
over Europe did amazing work
in measuring their societies.
They were hoovering up
data on almost everything.
But numbers alone
don't tell you anything.
You have to analyse them,
and that's what makes statistics.
When the first statisticians
began to get to grips with
analysing their data
they seized upon the average, and
they took the average of everything.
What's so great
about an average is that
you can take a whole mass of data
and reduce it to a single number.
And though each of us is unique,
our collective lives produce
averages that can
characterise whole populations.
I looked in my local newspaper
one week and saw a pensioner
had accidentally put her foot on
the accelerator
and crushed her friend
against a wall.
Devastating, hideous,
horrible thing to happen.
And then there was a second one about
a young man who didn't have
a driving licence, was driving a car
under the influence of drugs
and alcohol
and he bashed into a pedestrian
and killed him.
What's remarkable, absolutely
remarkable, if you look at the number
of people who die each year
in traffic crashes,
it's nearly a constant.
What?
All these individual events,
somehow when you sum them all up
there's the same number every year.
And every year, two and a half
times as many men
die in traffic crashes
as women, and it's a constant.
And every year the rate in Belgium
is double the rate in England.
There are these
remarkable regularities.
So that these individual
particular events sum up
into a social phenomenon.
Let's see what Sweden have done.
We used to boast about fast social
progress, that's where we were....
'In my lectures, to tell stories
about the changing world,
'I use the averages
from entire countries,
'whether the average of income,
child mortality, family size
'or carbon output.'
OK, I give you Singapore.
The year I was born,
Singapore had twice the child
mortality of Sweden, the most
tropical country in the world,
a marshland on
the Equator, and here we go.
It took a little time for them
to get independent,
but then they started to grow
their economy,
and they made the social investment,
they got away malaria,
they got a magnificent health system
that beat both US and Sweden.
We never thought it would happen
that they would win over Sweden!
LAUGHTER AND APPLAUSE
But useful as averages are,
they don't tell you the whole story.
On average, Swedish people have
slightly less than two legs.
This is because few people
only have one leg or no legs,
and no-one has three legs.
So almost everybody in Sweden
has more than
the average number of legs.
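
The leg example is easy to reproduce. The sketch below uses an invented population of 1,000 people; the counts are assumptions chosen only to show the effect, not Swedish data.

```python
from statistics import mean

# Hypothetical population of 1,000 people: the counts are invented for illustration.
legs = [2] * 995 + [1] * 4 + [0] * 1

average_legs = mean(legs)
share_above_average = sum(1 for n in legs if n > average_legs) / len(legs)

print(f"Average number of legs: {average_legs}")
print(f"Share of people above the average: {share_above_average:.1%}")
```

A handful of values below two pulls the average just under two, so everyone with two legs sits above it.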
The variation in data is just
as important as the average.
But how do you get
a handle on variation?
For this, you transform
numbers into shapes.
Let's look again at the number of
adult women in Sweden
for different heights.
Plotting the data as a shape
shows how much their heights
vary from the average
and how wide that variation is.
The shape a set of data makes
is called its distribution.
This is the income distribution
of China, 1970.
This is the income distribution
of the United States, 1970.
Almost no overlap,
and what has happened?
China is growing,
it's not so equal any longer,
and it's appearing here
overlooking the United States.
Almost like a ghost, isn't it?
It's pretty scary.
Rrrr!
LAUGHTER
The statisticians
who first explored distribution
discovered one shape
that turned up again and again.
The Victorian scholar
Francis Galton
was so fascinated he built
a machine that could reproduce it,
and he found it fitted so many
different sets of measurements
that he named it
the normal distribution.
Whether it was people's arm spans,
lung capacities,
or even their exam results,
the normal distribution shape
recurred time and time again.
Other statisticians soon found
many other regular shapes,
each produced by particular kinds
of natural or social processes.
And every statistician
has their favourite.
The Poisson distribution, the Poisson
shape is my favourite distribution.
I think it's an absolute cracker.
The Poisson shape
describes how likely it is
that out-of-the-ordinary things
will happen.
Imagine a London bus stop where
we know that on average
we'll get three buses in an hour.
We won't always get
three buses, of course.
Amazingly, the Poisson shape will
show us the probability
that in any given hour we will get
four, five, or six buses,
or no buses at all.
The exact shape changes
with the average.
But whether it's how many people
will win the lottery jackpot
each week,
or how many people will phone
a call centre each minute,
the Poisson shape
will give the probabilities.
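
For the bus stop, the probabilities come from the standard Poisson formula, P(k) = m^k e^(-m) / k!, where m is the average. The sketch below assumes the average of three buses an hour quoted above; the function and constant names are made up for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, mean_rate):
    """Probability of exactly k events when mean_rate events are expected."""
    return mean_rate**k * exp(-mean_rate) / factorial(k)

BUSES_PER_HOUR = 3  # the average quoted for the bus stop

for buses in (0, 3, 4, 5, 6):
    print(f"P({buses} buses in an hour) = {poisson_pmf(buses, BUSES_PER_HOUR):.3f}")
```

With an average of three, an hour with no buses at all comes out at roughly a one-in-twenty chance.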
The wonderful example where this was
applied in the late 19th century
was to count each year the number of
Prussian officers,
cavalry officers, who were kicked
to death by their horses.
Now, some years there were none,
some years there were one,
some years there were two,
up to seven, I think,
one particularly bad year.
But with this distribution,
however many years there were
with nought, one, two, three,
four Prussian cavalry officers
kicked to death by their horses,
beautifully obeyed
the Poisson distribution.
So statisticians use shapes to
reveal the patterns in the data.
But we also use images of all kinds
to communicate statistics
to a wider public.
Because if the story in the numbers
is told by a beautiful and clever
image, then everyone understands.
Of the pioneers
of statistical graphics,
my favourite is Florence Nightingale.
There are not many people who realise
that she was known
as a passionate statistician
and not just the Lady of the Lamp.
She said that "to understand God's
thoughts, we must study statistics,
"for these are
the measure of His purpose."
Statistics was for her a religious
duty and moral imperative.
When Florence was nine years old
she started collecting data.
Her data was different
fruits and vegetables she found.
Put them into different tables.
Trying to organise them
in some standard form.
And so we have one of Nightingale's
first statistical tables
at the age of nine.
In the mid 1850s Florence
Nightingale went to the Crimea to
care for British casualties of war.
She was horrified by
what she discovered.
For all the soldiers being blown
to bits on the battlefield,
there were many, many more soldiers
dying from diseases they caught
in the army's filthy hospitals.
So Florence Nightingale
began counting the dead.
For two years she recorded
mortality data in meticulous detail.
When the war was over she persuaded
the government to set up
a Royal Commission of Inquiry,
and gathered her data
in a devastating report.
What has cemented her place in
the statistical history books
are the graphics she used.
And one in particular,
the polar area graph.
For each month of the war,
a huge blue wedge represented
the soldiers who had died
from preventable diseases.
The much smaller red wedges were
deaths from wounds,
and the black wedges were deaths
from accidents and other causes.
Nightingale's graphics were so clear
they were impossible to ignore.
The usual thing around
Florence Nightingale's time
was just to produce tables and
tables of figures - absolutely
really tedious stuff that,
unless you're an absolutely dedicated
statistician,
it's really quite difficult to spot
the patterns quite naturally.
But visualisations, they tell a
story, they tell a story immediately.
And the use of colour
and the use of shape can
really tell a powerful story.
And nowadays of course
we can make things move as well.
Florence Nightingale would have
loved to have played with...
She would have
produced wonderful animations,
I'm absolutely certain of it.
Today, 150 years on,
Nightingale's graphics
are rightly regarded as a classic.
They led to a revolution
in nursing, health care
and hygiene in hospitals worldwide,
which saved innumerable lives.
And statistical graphics has
become an art form of its very own,
led by designers who are
passionate about visualising data.
This is the Billion Pound-O-Gram.
This image arose out of frustration
with the reporting of billion pound
amounts in the media.
£500 billion for this war.
£50 billion for this oil spill.
It doesn't make sense -
the numbers are too enormous
to get your mind round.
So I scraped all this data
from various news sources
and created this diagram.
So the
squares here are scaled according
to the billion pound amounts.
When you see numbers visualised
like this
you start to have a different
relationship with them.
You can start to see the patterns,
and the scale of them.
Here in the corner,
this little square - £37 billion.
This was the predicted cost
of the Iraq war in 2003.
As you can see it's grown
exponentially over the last few years
and the total cost now is
around about £2,500 billion.
It's funny because when
you visualise statistics
you understand them,
and when you understand them
you can really start to put things
in perspective.
Visualisation is right at
the heart of my own work too.
I teach global health.
And I know having the data
is not enough -
I have to show it in ways people
both enjoy and understand.
Now I'm going to try something
I've never done before.
Animating the data in real space,
with a bit of technical
assistance from the crew.
So here we go.
First, an axis for health.
Life expectancy
from 25 years to 75 years.
And down here an axis for wealth.
Income per person -
400, 4,000, 40,000.
So down here is poor and sick.
And up here is rich and healthy.
Now I'm going to show you the world
200 years ago, in 1810.
Here come all the countries.
Europe, brown;
Asia, red; Middle East, green;
Africa south of the Sahara,
blue; and the Americas, yellow.
And the size of the country bubble
shows the size of the population.
In 1810, it was pretty crowded
down there, wasn't it?
All countries were sick and poor.
Life expectancy
was below 40 in all countries.
And only UK and the Netherlands were
slightly better off. But not much.
And now I start the world.
The industrial revolution makes
countries in Europe and elsewhere
move away from the rest.
But the colonized countries
in Asia and Africa,
they are stuck down there.
And eventually the Western countries
get healthier and healthier.
And now we slow down to show
the impact of the First World War
and the Spanish flu epidemic.
What a catastrophe!
And now I speed up through
the 1920s and the 1930s and,
in spite of the Great Depression,
Western countries forge on towards
greater wealth and health.
Japan and some others try to follow.
But most countries stay down here.
And after the tragedies
of the Second World War,
we stop a bit to look
at the world in 1948.
1948 was a great year.
The war was over,
Sweden topped the medal table at
the Winter Olympics and I was born.
But the differences between
the countries of the world
were wider than ever.
United States was in the front.
Japan was catching up.
Brazil was way behind,
Iran was getting a little richer
from oil but still had short lives.
And the Asian giants...
China, India, Pakistan, Bangladesh,
and Indonesia,
they were still
poor and sick down here.
But look what was about to happen!
Here we go again.
In my lifetime, former colonies
gained independence and then finally
they started to get healthier
and healthier and healthier.
And in the 1970s, then countries
in Asia and Latin America
started to catch up
with the Western countries.
They became the emerging economies.
Some in Africa followed,
some Africans were stuck in civil
war, and others were hit by HIV.
And now we can see the world
in the most up-to-date statistics.
Most people today
live in the middle.
But there is huge difference
at the same time
between the best-off countries
and the worst-off countries.
And there are also huge
inequalities within countries.
These bubbles show country averages
but I can split them.
Take China. I can split it
into provinces.
There goes Shanghai...
It has the same health
and wealth as Italy today.
And there
is the poor inland province Guizhou,
it is like Pakistan.
And if I split it further, the rural
parts are like Ghana in Africa.
And yet, despite the enormous
disparities today,
we have seen 200 years
of remarkable progress!
That huge historical gap between
the west and the rest is now closing.
We have become an entirely
new, converging world.
And I see a clear trend
into the future.
With aid, trade, green
technology and peace,
it's fully possible
that everyone can make it
to the healthy, wealthy corner.
Well, what you've just seen
in the last few minutes
is a story of 200 countries
shown over 200 years and beyond.
It involved plotting
120,000 numbers.
Pretty neat, huh?
So, with statistics, we can begin
to see things as they really are.
From tables of data to averages,
distributions and visualisations,
statistics gives us a
clear description of the world.
But, with statistics, we can
not only discover WHAT is happening
but also explore WHY,
by using the powerful analytical
method - correlation.
Just looking at one thing at a
time doesn't tell you very much.
You've got to look at the
relationships between things,
how they change,
how they vary together.
That's what correlation is about.
That's how you start trying
to understand the processes
that are really going on
in the world and society.
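
One common way to put a number on how two things vary together is the Pearson correlation coefficient, which runs from -1 to +1. The sketch below is a minimal version; the two lists are invented numbers, not real measurements, and a high value on its own says nothing about cause.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Invented numbers, purely illustrative: not real measurements.
income = [1, 2, 3, 4, 5]
life_expectancy = [50, 55, 62, 66, 72]

print(f"r = {pearson_r(income, life_expectancy):.3f}")
```

A value close to +1 means the two lists rise together; close to -1 means one falls as the other rises.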
Most of us today would recognise
that crime correlates to poverty,
that infection correlates
to poor sanitation,
and that knowledge of statistics
correlates
to being great at dancing!
Correlations can be very tricky.
I got a joke about
silly correlations.
There was this American who
was afraid of heart attack.
He found out that
the Japanese ate very little fat
and almost didn't drink wine,
but they had much less
heart attacks than the Americans.
But, on the other hand,
he also found out that the French
eat as much fat as the Americans
and they drink much more wine but
they also have less heart attacks.
So he concluded that what kills you
is speaking English.
# Smoke, smoke,
smoke that cigarette
# Puff, puff, puff and if you
smoke yourself to death... #
The time, the pace,
the cigarette. Weights Tilt.
The best example of a really
ground-breaking correlation
is the link that was established
in the 1950s between
smoking and lung cancer.
Not long after the Second World War,
a British doctor, Richard Doll,
investigated lung cancer patients
in 20 London hospitals.
And he became certain
that the only thing they had
in common was smoking.
So certain,
that he stopped smoking himself.
But other people weren't so sure.
A lot of the discussion
of the early data,
linking smoking to lung cancer, said,
"It's not the smoking, surely,
"that thing we've done all our lives,
that can't be bad for you.
"Maybe it's genes.
"Maybe people who are genetically
predisposed to get lung cancer
"are also genetically
predisposed to smoke."
"Maybe it's not the smoking,
maybe it's air pollution -
"that smokers are somehow
more exposed to air pollution
than non-smokers.
"Maybe it's not smoking,
maybe it's poverty."
So now we've got three alternative
explanations, apart from chance.
To verify that his correlation
did imply cause and effect,
Richard Doll created the biggest
statistical study of smoking yet.
He began tracking the lives
of 40,000 British doctors,
some of whom smoked
and some of whom didn't,
and gathered enough data
to correlate the amount
the doctors smoked
with their likelihood
of getting cancer.
Eventually, he not only
showed a correlation between
smoking and lung cancer,
but also a correlation
between stopping smoking
and reducing the risk.
This was science at its best.
What correlations do not replace
is human thought.
You've got to think
about what it means.
What a good scientist does,
if he comes with a correlation,
is try as hard as she or he
possibly can to disprove it,
to break it down, to get rid of it,
to try and refute it.
And if it withstands
all those efforts at demolishing it
and it is still standing up then,
cautiously, you say, "We really
might have something here."
However brilliant the scientist,
data is still the oxygen of science.
The good news is that the more we
have, the more correlations we'll
find, the more theories we'll test,
and the more discoveries
we're likely to make.
And history shows how our total sum
of information grows in huge leaps
as we develop new technologies.
The invention of the
printing press kicked off the first
data and information explosion.
If you piled up all the books that
had been printed by the year 1700,
they would make 60 stacks
each as high as Mount Everest.
Then, starting in the 19th century,
there came a second information
revolution with the telegraph,
gramophone and camera.
And later radio and TV.
The total amount
of information exploded.
And by the 1950s
the information available to us all
had multiplied 6,000 times.
Then, thanks to the computer and
later the internet, we went digital.
And the amount of data we have now
is unimaginably vast.
A single letter printed in a book
is equivalent to a byte of data.
A printed page
equals a kilobyte or two.
Five megabytes is enough for
the complete works of Shakespeare.
10 gigabytes - that's a DVD movie.
Two terabytes
is the tens of millions of photos
added to Facebook every day.
Ten petabytes is the data recorded
every second by the world's
largest particle accelerator.
So much
only a tiny fraction is kept.
Six exabytes is what you'd have
if you sequenced the genomes
of every single person on Earth.
But really, that's nothing.
In 2009, the internet
added up to 500 exabytes.
In 2010, in just one year, that will
double to more than one zettabyte!
Back in the real world, if we
turned all this data into print
it would make 90 stacks of books,
each reaching from here
all the way to the sun!
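
The ladder of units is easier to follow with the arithmetic written out. The sketch below assumes decimal units, where each step up is a factor of 1,000, and uses the 500 exabyte figure quoted above; the variable names are made up for illustration.

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte"]
BYTES_PER = {name: 1000**i for i, name in enumerate(UNITS)}

internet_2009 = 500 * BYTES_PER["exabyte"]  # figure quoted in the talk
doubled = 2 * internet_2009

print(doubled / BYTES_PER["zettabyte"], "zettabytes")
```

Under that assumption, doubling 500 exabytes lands on one zettabyte.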
The data deluge is staggering,
but, with today's computers
and statistics,
I'm confident we can handle it.
When it comes to all the data
on the internet,
the powerhouse
of statistical analysis
is the Silicon Valley giant Google.
The average person over their
lifetime is exposed to about 100
million words of conversation.
And so if you multiply that by the
six billion people on the planet,
that amount of words is about
equal to the number of words
that Google has available
at any one instant in time.
Google's computers hoover up
and file away every document,
web page, and image they can find.
They then hunt for patterns and
correlations in all this data,
doing statistics on a massive scale.
And, for me, Google has one project
that's particularly exciting -
statistical language translation.
We wanted to provide access
to all the web's information,
no matter what language you spoke.
There's just so much information
on the internet,
you couldn't hope to translate it all
by hand into every possible language.
We figured we'd have to be able
to do machine translation.
In the past, programmers
tried to teach their computers
to see each language as a set of
grammatical rules - much like the
way languages are taught at school.
But this didn't work because no set
of rules could capture a language
in all its subtlety and ambiguity.
"Having eaten our lunch
the coach departed."
Well, that's obviously incorrect.
Written like that it would imply
that the coach has eaten the lunch.
It would be far better to say...
"having eaten our lunch
we departed in the coach."
Those rules are helpful and they are
useful most of the time, but they don't
turn out to be true all the time.
And the insight of using statistical
machine translation is saying,
"If you've got to have all these
exceptions anyways, maybe you can get
by without having any of the rules.
"Maybe you can treat everything
as an exception." And that's
essentially what we've done.
What the computer is doing when
he's learning how to translate
is to learn correlations
between words
and correlations between phrases.
So we feed the system very large
amounts of data
and then the system is seeing that
a certain word or a certain phrase
correlates very often
to the other language.
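
The idea of learning word correlations from paired text can be shown in miniature. The sketch below is a toy, not the system described here: it assumes a tiny invented Swedish-English corpus and scores word pairs with a simple co-occurrence measure (the Dice coefficient). The corpus, the function name and the choice of score are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# A tiny invented parallel corpus (Swedish, English); real systems use vastly more text.
PARALLEL = [
    ("röd bil", "red car"),
    ("röd båt", "red boat"),
    ("blå bil", "blue car"),
    ("blå båt", "blue boat"),
]

def best_translation(word, pairs):
    """Pick the target word that co-occurs most consistently with the source word."""
    source_counts, target_counts, joint_counts = Counter(), Counter(), Counter()
    for source_sentence, target_sentence in pairs:
        source_words = set(source_sentence.split())
        target_words = set(target_sentence.split())
        source_counts.update(source_words)
        target_counts.update(target_words)
        joint_counts.update(product(source_words, target_words))

    def dice(target):
        # Dice coefficient: 1.0 when the two words always appear together.
        return 2 * joint_counts[(word, target)] / (source_counts[word] + target_counts[target])

    return max(target_counts, key=dice)

for swedish in ("bil", "båt", "röd", "blå"):
    print(swedish, "->", best_translation(swedish, PARALLEL))
```

No grammar rule appears anywhere in the code; the pairing falls out of which words keep turning up together.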
Google's website currently
offers translation between
any of 57 different languages.
It does this purely statistically,
having correlated a huge collection
of multilingual texts.
The people that built the system
don't need to know Chinese
in order to build the
Chinese-to-English system,
or they don't need to know Arabic.
But the expertise that's needed is
basically knowledge of statistics,
knowledge of computer science,
knowledge of infrastructure
to build those very large
computational systems
that we are building for doing that.
I hooked up with Google
from my office in Stockholm
to try the translator for myself.
'I will type...
some Swedish sentences.'
OK.
Sveriges...
..guldring i örat.
OK. So it says, "Sweden's finance
minister has a ponytail
and a gold ring in your ear."
I guess it probably means
in his ear. 'That's exactly
correct, it's amazing!
'He comes from the Conservative
party, that's the kind
of Sweden we have today.
'I will type one more sentence.'
'I sitt samkönade...'
partnerskap...
nya biskop.
"In his same-sex partnership
has Stockholm's new bishop
and his partners a three-year son."
It's almost perfect,
there's one important thing -
it's HER,
it's a lesbian partnership.
OK, so words like "his"
and "her" are one of the challenges
in translation,
to really get those right.
Especially when it comes
to bishops one can excuse it!
'Right, right.'
I guess more often than not
it would probably be a "his".
'I will write one more sentence.'
När Sverige deltar
i olympiader är målet
'inte att vinna
utan att slå Norge.'
OK. "When Sweden is taking part
in Olympic goal is not
to win but to beat Norway."
'Yes! This is what it is!
'But they are very good
in Winter Olympics, so we
can't make it, but we are trying.'
Ah, very good, very good.
'This is absolutely amazing, you
know, and I was especially impressed
'that it picks up words like
"same-sex partnership"
which are very new to the language.'
'The translator is good, but
if they succeed with what's next,
that'll be remarkable.'
One of the exciting possibilities
is combining the machine
translation technology with
the speech recognition technology.
Now, both of these
are statistical in nature.
The machine translation relies
on the statistics of mapping
from one language to another,
and similarly speech recognition
relies on the statistics of mapping
from a sound form to the words.
When we put them together,
now we have the capability
of having instant conversation
between two people
that don't speak a common language.
I can talk to you in my language,
you hear me in your language
and you can answer back.
And in real time we can
make that translation,
we can bring two people together
and allow them to speak.
The internet is just one
of many technologies created
to gather massive amounts of data.
Scientists studying
our earth and our environment
now use an incredible range
of instruments
to measure the processes
of our planet.
All around us are sensors
continuously measuring temperature,
water flow, and ocean currents.
And high in orbit are satellites
busy imaging cloud formations,
forest growth and snow cover.
Scientists speak
of "instrumenting the earth".
And pointing up to the skies
above are powerful new telescopes
mapping the universe.
What's happening in astronomy
is typical of how profoundly
this new torrent of data
is transforming science.
Astronomers are now addressing many
enduring mysteries of the cosmos
by applying statistical methods
to all this new data.
The galaxy is a very big place and
it's got billions of stars in it,
and so to put together a coherent
picture of the whole galaxy requires
having an enormous amount of data.
And before you could do
a large sky survey with
sensitive, digital detectors
that meant that you could map many,
many stars all at once,
it was very difficult to build up
enough data on enough of the galaxy.
In the past, large surveys
of the night sky had to be done
by exposing thousands
of large photographic plates.
But these surveys could take
25 years or more to complete.
Then, in the 1990s, came digital
astronomy and a huge increase
in both the amount
and the accessibility of data.
The Sloan Sky Survey
is the world's biggest yet,
using a massive digital sensor
mounted on the back
of a custom-built telescope
in New Mexico.
It's scanned the sky night
after night for eight years,
building up a composite picture
in unprecedented resolution.
The Sloan is some of the best,
deepest survey data
that we have in astronomy,
both on our own galaxy and
on galaxies further away from ours.
All the Sloan data
is on the internet,
and with it astronomers
have identified millions of hitherto
unknown stars and galaxies.
They also comb the database
for statistical patterns
which will prove, disprove,
or even suggest new theories.
So we have this idea that galaxies
grow, they become large galaxies like
the one we live in, the Milky Way,
not all at once, or not smoothly,
but by continuously incorporating,
basically cannibalising,
smaller galaxies.
They dissolve them
and they become part
of the bigger galaxy as it grows.
It's a startling idea,
and, in the Sloan data,
is the evidence to support it.
Groups of stars that came
from cannibalised galaxies
stand out in the Sloan data
as statistically different
from other stars
because they move
at a different velocity.
Each big spike
on one of these distribution graphs
means Professor Rockosi has found
a group of stars all travelling
in a different way to the rest.
They are the telltale
patterns she's looking for.
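A spike of this kind can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the velocities are invented and evenly spread rather than measured, the bin width and threshold are arbitrary, and the robust median-based check is a stand-in, not the method used on the Sloan data.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist, median

BIN = 20  # histogram bin width in km/s (arbitrary choice)

def evenly_spread(mu, sigma, n):
    """Invented sample: n values placed at evenly spaced normal quantiles."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

# 2000 ordinary stars plus a small group of 150 moving about 250 km/s faster.
velocities = evenly_spread(0, 100, 2000) + evenly_spread(250, 5, 150)

def find_spikes(values, threshold=5.0):
    """Return the lower edges of bins holding far more stars than expected."""
    centre = median(values)
    # Median absolute deviation, rescaled to match a normal standard deviation.
    spread = 1.4826 * median(abs(v - centre) for v in values)
    background = NormalDist(centre, spread)
    counts = Counter(int(v // BIN) * BIN for v in values)
    spikes = []
    for low, seen in sorted(counts.items()):
        expected = len(values) * (background.cdf(low + BIN) - background.cdf(low))
        # Excess over the smooth background, in rough standard-error units.
        if (seen - expected) / sqrt(expected + 1) > threshold:
            spikes.append(low)
    return spikes

SPIKES = find_spikes(velocities)
print(SPIKES)
```

The median and median absolute deviation are used because a small stray group barely shifts them, so the smooth background is estimated mostly from the ordinary stars and the odd group stands out against it.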
The evidence is accumulating
that, in fact, this really is
how galaxies grow,
or an important way
in which galaxies grow.
And so this is an important part
of understanding how galaxies form,
not only ours but every galaxy.
The more data there is,
the more discoveries can be made.
And the technology
is getting better all the time.
The next big survey telescope
starts its work in 2015.
It will leave Sloan in the dust!
Sloan has taken eight years to cover
one quarter of the night sky.
The new telescope will scan
the entire sky, in even greater
resolution, every three days!
The vast amounts of data
we have today allow researchers
in all sorts of fields
to test their theories
on a previously unimaginable scale.
But more than this,
it may even change
the fundamental way science is done.
With the power of today's computers
applied to all this data,
the machines might even be able
to guide the researchers.
We're at a potentially
profoundly important point,
potentially one of the most
significant points in science,
and certainly one of
the most exciting,
where there is the potential to transform
not just how scientists do science
but even what science is possible.
And what will power
that transformation
of both how science is done
and even what science is possible
is going to be computation.
Many of the dynamics of the natural
world, like the interplay between
the rainforests and the atmosphere,
are so complex that we don't
as yet really understand them.
But now computers are generating
literally tens of thousands
of different simulations
of how these
biological systems might work.
It's like creating thousands
of hypothetical parallel worlds.
Each and every one
of these simulations
is analysed with statistics
to see if any are a good match
for what is observed in nature.
The computers can now
automatically generate,
test and discard hypotheses
with scarcely a human in sight.
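The generate, test and discard loop can be sketched in Python as follows. This is a toy under stated assumptions: the logistic growth model, the observed values, the candidate growth rates and the tolerance are all invented here and do not come from any real rainforest study.

```python
# Toy "generate, test, discard" loop. The model, data and tolerance are
# invented for illustration only.

OBSERVED = [10, 15, 20, 30, 38, 52, 63, 76, 84, 92, 95]  # made-up measurements
CAPACITY = 100.0  # assumed upper limit of the quantity being modelled

def simulate(rate, start=10.0, steps=10):
    """Run one hypothetical world: logistic growth at the given rate."""
    values = [start]
    for _ in range(steps):
        x = values[-1]
        values.append(x + rate * x * (1 - x / CAPACITY))
    return values

def mismatch(simulated, observed):
    """Mean squared difference between a simulation and the observations."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)

def surviving_hypotheses(rates, tolerance=4.0):
    """Keep only the growth rates whose simulation stays close to the data."""
    return [r for r in rates if mismatch(simulate(r), OBSERVED) < tolerance]

CANDIDATES = [i / 10 for i in range(1, 11)]  # growth rates 0.1 to 1.0
SURVIVORS = surviving_hypotheses(CANDIDATES)
print(SURVIVORS)
```

Each candidate rate plays the part of one hypothetical world. Nine of the ten are thrown away because their simulated curves drift too far from the observations, and no person has to inspect any of them.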
This new application of statistics
will become absolutely vital
for the future of science.
It's creating a new paradigm,
if you like,
in science, in the way
in which we can do science,
which is increasingly...
which one might characterise as...
data-centric or data-driven
rather than being hypothesis-driven
or experimentally driven.
So, it's exciting times
in terms of the science,
in terms of the computation
and in terms of the statistics.
Now, if all that sounds a bit
abstract and theoretical to you,
how about one final frontier?
Could statistics even make
sense of your feelings?
In California - where else? -
one computer scientist
is harvesting the internet to try
to divine the patterns of our
innermost thoughts and emotions.
This is the Madness movement.
The Madness movement represents
a skyscraper view of the world.
Each of these brightly coloured dots
is an individual feeling
expressed by someone out there
in a blog or a tweet.
And when you click on the dot
it explodes to reveal the
underlying feeling of that person.
This is what people say
they're feeling today.
Better...safe...
crappy...
well...
pretty...special...
sorry...alone...
So, every minute, We Feel Fine
crawls the world's blogs,
takes all the sentences
that start with the words
"I feel" or "I am feeling",
and puts them in a database.
We collect all the feelings
and we count the most common.
They are better...bad...
good...right...
guilty...sick...
the same...like shit...
sorry...well...
and so on.
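The collect-and-count step can be sketched in Python like this. It is a simplified illustration, not the actual We Feel Fine code: the sentences are invented, and treating the first word after "I feel" or "I am feeling" as the feeling is an assumption made to keep the example short.

```python
import re
from collections import Counter

# Matches sentences that start with "I feel" or "I am feeling" and captures
# the next word, which this sketch treats as the feeling.
PATTERN = re.compile(r"^\s*I (?:feel|am feeling) (\w+)", re.IGNORECASE)

# Invented example sentences standing in for crawled blog posts.
POSTS = [
    "I feel better today.",
    "I am feeling better now that it is over.",
    "Today was a long day.",
    "I feel sorry for him.",
    "i feel alone",
    "I feel better",
]

def count_feelings(sentences):
    """Tally the feeling word in every sentence that matches the pattern."""
    counts = Counter()
    for sentence in sentences:
        match = PATTERN.match(sentence)
        if match:
            counts[match.group(1).lower()] += 1
    return counts

FEELINGS = count_feelings(POSTS)
print(FEELINGS.most_common(3))
```

Sentences that do not open with one of the two phrases are simply skipped, so the tally reflects only what people explicitly say they feel.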
And we can take a look at any
one feeling and analyse it.
Right now a lot of people
are feeling happy.
We can take a look at all the
people who are happy and break it
down by age, gender or location.
Since bloggers have public profiles
we have that information and
so we can ask questions like,
"Are women happier than men?"
or, "Is England happier
than the United States?"
We find that, as people get older,
they get happier.
And, moreover, we find that
younger people associate
happiness more with excitement,
and, as people get older,
they associate happiness
more with peacefulness.
And we also find that women feel
loved more often than men,
but also more guilty.
While men feel good more often
than women, but also more alone.
As people lead more and
more of their lives online,
they leave behind digital traces,
and with these digital traces
we can begin to statistically analyse
what it means to be human.
So where does all of this leave us?
We generate unimaginable
quantities of data
about everything you can think of.
We analyse it to reveal
the patterns.
And now not only experts
but all of us can understand
the stories in the numbers.
Instead of being
led astray by prejudice,
with statistics at our fingertips,
our eyes can be open
for a fact-based view of the world.
So, more than ever before, we can
become authors of our own destiny.
And that's pretty
exciting, isn't it?!
# 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20
# 1, 22, 3, 24, 25, 26, 27, 28, 9,
30, 31, 32, 3, 34, 35, 36, 7
# 38, 39, 40, 41, 42, 3,
44, 45, 46, 47
LYRICS DEGENERATE INTO GIBBERISH
GIBBERISH DEGENERATES INTO NOISE
# 100. #