The world we live in is awash with data
that comes pouring in from everywhere around us.
On its own, this data is just noise and confusion.
To make sense of data, to find the meaning in it,
we need a powerful branch of science: statistics.
Believe me, there's nothing boring about statistics
especially not today, when we can make the data sing.
With statistics we can really make sense of the world.
Are statistics, the data deluge as it's been called,
leading us to a greater understanding
of the life on Earth and the world beyond?
Thanks to the incredible power of today's computers
it may fundamentally transform the process of scientific discovery.
I kid you not, statistics is now the sexiest subject around.
Did you know that there are one million boats in Sweden?
That's one boat per nine people.
It's the highest number of boats per person in Europe.
Being a statistician, you don't like telling people your profession at dinner parties,
but really, statisticians shouldn't be shy
because they always want to understand what's going on.
Statistics gives us a perspective on the world we live in
that we can't get in any other way.
Statistics tells us whether the things we think and believe are actually true.
Statistics are far more useful than we usually like to admit.
In the last recession, there was this famous call to a talk radio station.
The man complained: "In times like this, when unemployment rates are up to 13%,
and income has fallen by 5%, and suicide rates are climbing,
I get so angry that the government is wasting money on things like collecting statistics."
I'm not officially a statistician; strictly speaking, my field is global health.
But I got really obsessed with stats, when I realised how many people in Sweden
don't know anything about the rest of the world.
I started, at our medical university, Karolinska Institute,
an undergraduate course called Global Health.
These students coming to us actually have the highest grades you can get in the Swedish college system.
So I thought maybe they know everything I'm going to teach them.
So I did a pre-test when they came.
One of the questions, from which I learnt a lot, was:
Which country has the highest child mortality of these five pairs?
I won't put you to the test here, but it's Turkey which is higher there,
Poland, Russia, Pakistan and South Africa.
And these were the results of the Swedish students.
1.8 answers right out of 5 possible,
that means that there was a place for a professor in International Health
and for my course.
But one late night when I was compiling my report
I really realised my discovery.
I have shown that Swedish top students know
statistically significantly less about the world than the chimpanzees.
Because the chimpanzee would score half right.
If I gave them two bananas with Sri Lanka and Turkey
they would be right in half of the cases.
But the students are not there.
I did also an unethical study of the professors of the Karolinska Institute
that hands out the Nobel Prize in Medicine, and they aren't on par with the chimpanzee.
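
The chimpanzee benchmark is simple arithmetic: guessing at random between two options gets half the answers right on average. A minimal sketch in Python, assuming each of the five questions is an independent 50/50 guess (the function and variable names are illustrative, not from the lecture):

```python
from math import comb

def prob_at_most(k, n=5, p=0.5):
    """Probability that a random guesser gets at most k of n questions right."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n_questions = 5
expected_random = n_questions * 0.5   # what a coin-flipping chimpanzee scores on average
student_mean = 1.8                    # the students' average quoted in the lecture

print("random guesser, expected score:", expected_random)
print("students, average score:", student_mean)
print("chance a single random guesser scores 1 or fewer:", prob_at_most(1))
```

Whether the gap is statistically significant depends on how many students sat the test, which the transcript does not give, so the sketch stops at the averages.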
Today, there's more information accessible than ever before,
and I work with my team at the Gapminder Foundation
using new tools that help everyone make sense of the changing world.
We draw on the masses of data that are now freely available
from international institutions like the UN and the World Bank.
It's become my mission to share my insights from this data
with anyone who will listen, and to reveal how statistics
is nothing to be frightened of.
I'm going to provide you with a view
of the global health situation across mankind,
and I'm going to do that in a hopefully enjoyable way.
So relax.
We did this software which displays it like this,
every bubble here is a country, this is China, this is India.
The size of the bubble is the population.
And I'm going to stage a race here
between this sort of yellow Ford here, and the red Toyota down there,
and the brownish Volvo.
The Toyota has a very bad start down here,
and United States' Ford is going off road there,
and the Volvo is doing quite fine, this is the war,
the Toyota got off track, now Toyota is coming on the healthier side of Sweden.
That's the point when I sold the Volvo and bought the Toyota.
This is the Great Leap Forward when China fell down,
it was central planning by Mao Tse Tung,
China recovered and said "never more stupid central planning", but they went up here.
No, there was one more inequity, look there! United States!
Oh, they broke my frame!
Washington D.C. is so rich over there, but it's not as healthy as Kerala, India.
It's quite interesting, isn't it?
Welcome to the USA, world leaders in big cars
and free data.
There are many here who share my vision
of making public data accessible and useful for everyone.
The city of San Francisco is in the lead, opening up its data on everything.
Even the Police Dept. is releasing all its crime reports.
This official crime data has been turned into a wonderful interactive map
by two of the city's computer whizzes.
It's community statistics in action.
Crimespotting is a map of crime reports
from the San Francisco Police Dept.
showing dots on maps for citizens to be able to see patterns of crime
in their neighbourhoods in San Francisco.
The map is not just about individual crimes
but about broader patterns that show
you where crime is clustered around the city,
which areas have high crime,
which areas have relatively low crime.
We're here at the top of Jones Street, up on the hill,
quite a nice neighbourhood.
What the crime maps show us is the relationship between topography and crime.
The higher up the hill, the less crime there is.
We crossed over the border into the flats.
Essentially, as soon as you get into the kind of low-lying areas of Jones Street,
the crime just skyrockets.
So we're in the uptown Tenderloin District,
it's one of the oldest and most dangerous
neighbourhoods in San Francisco.
This is where you go to buy drugs,
right around here.
You see lots of aggravated assault,
lots of thefts.
Basically, a huge part of the crime in the city
happens right in these four or six block areas.
If you've been hearing police sirens
in your neighbourhood,
you can use the map to find out why.
If you are out at night in an unfamiliar part of town
you can check the map for streets to avoid.
If a neighbour gets burgled, you can see:
is it a one-off, or has there been a spike in local crime?
If you commute through a neighbourhood and you're worried about its safety
the fact that we have the ability to turn off all the night time and middle-of-the-day crimes
and show you just the things that are happening during your commute,
is a statistical operation but I think to the people
that are interacting with the thing
it feels very much more like
they just are sort of browsing a website
or shopping on Amazon.
They're looking at data,
and they don't realise that they're doing statistics.
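
The commute filter described here is, underneath, a selection on the time of each report. A minimal sketch, assuming each report carries a category and a timestamp; the sample records, the commute windows and the function name are all invented for illustration and are not Crimespotting's actual data or code:

```python
from datetime import datetime

# Hypothetical crime reports: (category, timestamp). Not real police data.
reports = [
    ("theft", datetime(2010, 3, 1, 8, 15)),
    ("assault", datetime(2010, 3, 1, 23, 40)),
    ("burglary", datetime(2010, 3, 2, 13, 5)),
    ("theft", datetime(2010, 3, 2, 17, 50)),
]

def during_commute(ts, windows=((7, 9), (17, 19))):
    """True if the timestamp falls inside one of the assumed commute windows (hours)."""
    return any(start <= ts.hour < end for start, end in windows)

# Keep only the reports that happened while you would be passing through.
commute_reports = [r for r in reports if during_commute(r[1])]

for category, ts in commute_reports:
    print(category, ts.strftime("%H:%M"))
```

Turning off the night-time and midday reports is just this filter applied before the dots are drawn.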
What's most exciting for me is that public statistics
is making citizens more powerful
and the authorities more accountable.
We have community meetings that the police attend
and what citizens are now doing,
they're bringing printouts of the maps
to show where crimes are taking place,
and they're demanding services
from the police department,
which is now having to change how they police,
how they provide policing services,
because the data is showing
what is working and what is not.
People in San Francisco are also using public data
to map social inequalities,
and see how to improve society
and the possibilities are endless.
Our dream would be that the government announced that
this data project would really focus on live information
on stuff that was being reported
and pushed out into the world as it was happening.
Trash pickup, traffic accidents, buses,
and through the kind of stats-gathering power of the internet
it's possible to really see the workings of the city
displayed as a unified interface.
That's where we are heading,
towards a world of free data
with all the statistical insights that come from it
accessible to everyone, empowering us as citizens
and letting us hold our rulers to account.
It's a long way from where statistics began.
Statistics are essential to monitor our government in our societies.
But, it was our rulers out there who started the collection of statistics
in the first place in order to monitor us.
In fact the word statistics comes from state.
Modern statistics began two centuries ago.
Once it got going it spread and never stopped.
And guess who was first.
The Chinese have Confucius,
the Italians have Da Vinci,
and the British have Shakespeare. And we have the Tabellverket,
the first ever systematic collection of statistics.
Since the year 1749 we have collected data on every birth, marriage and death
and we are proud of it.
The Tabellverket recorded information from every parish in Sweden.
It was a huge quantity of data
and it was the first time any government
could get an accurate picture of its people.
Sweden had been the greatest military power in Northern Europe
but by 1749 our star was really fading and other countries were growing stronger.
At least though, we were a large power, thought to have 20 million people
enough to rival Britain and France.
But we were in for a nasty surprise.
The first analysis of Tabellverket revealed that Sweden only had 2 million inhabitants.
Sweden was not only a power in decline, it also had a very small population.
The government was horrified by this finding.
What if the enemy found out?
But the Tabellverket also showed that many women died in childbirth
and many children died young, so the government took action to improve the health of the people.
That was the beginning of modern Sweden.
It took more than 50 years before the Austrians, Belgians, Danes, Dutch,
Germans, Italians and finally the British caught up with Sweden
in collecting and using statistics.
It was called political arithmetic,
and it was a lovely phrase to use for statistics.
Governments could have much more control
and understanding of the society
how it's working, how it's developing,
and essentially, so they could control it better.
It wasn't just governments
who woke up to the power of statistics.
Right across Europe, 19th century society went mad for facts.
And despite its late start, Britain with its Royal Statistical Society in London
was soon a statisticians' nirvana.
I love looking at old copies of the Royal Statistical Society's journal,
because it's full of this stuff.
There's a wonderful paper from the 1840s
which shows a map of England
and the rates of bastardy of each county
so you can identify very quickly the areas
with high rates of bastardy.
Being in East Anglia, it makes me laugh slightly
that Norfolk was on top of the bastardy league in the 1840s.
One of the founders of the Royal Statistical Society
was the great Victorian mathematician and inventor Charles Babbage.
In 1842 he read the latest poem
by an equally great Victorian,
Alfred Tennyson.
"Vision of Sin" contained the lines:
"Fill the cup and fill the can,
Have a rouse before the morn.
Every moment dies a man,
Every moment one is born."
So keen a statistician was Babbage
that he could not contain himself.
He dashed off a letter to Tennyson
explaining that because of population growth
the line should read:
"Every moment dies a man,
And 1 1/16 is born."
"I may add that the exact figure is 1.167
but something must be conceded
to the laws of metre."
In the 19th century scholars all over Europe
did amazing work in measuring their societies.
They hoovered up data on almost everything,
but numbers alone don't tell you anything.
You have to analyse them, and that's what makes statistics.
When the first statisticians began
to get to grips with analysing their data
they seized upon the average,
and they took the average of everything.
What's so great about an average
is that you can take a whole mass of data and reduce it to a single number.
Though each of us is unique,
our collective lives produce averages
that characterise whole populations.
I looked at my local newspaper one week
and saw that a pensioner had accidentally
put a foot on the accelerator
and crushed her friend against the wall.
Devastating, hideous, horrible thing to happen.
And there was a second one about a young man
who didn't have a driving licence
who was driving a car under the influence
of drugs and alcohol
and crashed into a pedestrian and killed him.
What is remarkable,
absolutely remarkable,
if you look at the number of people
who die each year
in traffic accidents,
it's nearly a constant.
What?
All these individual events,
somehow when you sum them all up
it's the same number every year,
and every year two and a half times
as many men die
in traffic accidents as women,
and it's a constant.
And every year the rate in Belgium is double
the rate in England,
there are these remarkable regularities
so that these individual particular events
sum up into a social phenomenon.
(Lecture) Let's see what Sweden has done
we used to boast of fast social progress.
(Narration) In my lectures,
to tell stories about the changing world
I use averages for entire countries,
whether the average for income,
child mortality, family size or carbon output.
(Lecture) OK, I give you Singapore,
the year I was born.
Singapore had twice the child mortality of Sweden.
The most tropical country in the world.
A marshland on the Equator.
And here we go. It took a little time for them to get independence
but they started to grow their economy,
and they made the social investments,
they got rid of malaria,
they got a magnificent health system
that beats both the UK's and Sweden's.
We never thought it would happen,
that they would win over Sweden!
But useful as averages are
they don't tell you the whole story.
On average, Swedish people have slightly
less than two legs.
That is because a few people have one leg
or no legs, and no one has three legs
so almost everybody in Sweden
has more than the average number of legs.
The variation in data is just
as important as the average.
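
The leg example is easy to check with a toy population. A minimal sketch, assuming 1,000 people with invented counts (these are not Swedish figures); it compares the mean with the median and counts how many people sit above the mean:

```python
from statistics import mean, median

# Hypothetical population of 1,000 people: 996 with two legs,
# 3 with one leg, 1 with none. Invented numbers, for illustration only.
legs = [2] * 996 + [1] * 3 + [0] * 1

average = mean(legs)      # pulled just below 2 by the few low values
typical = median(legs)    # the middle person still has 2 legs
share_above_average = sum(1 for x in legs if x > average) / len(legs)

print("mean:", average)
print("median:", typical)
print("share with more legs than the mean:", share_above_average)
```

Because nobody has three legs to balance the few low values, the mean falls slightly below two and almost everyone ends up above it.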
But how do you get a handle on variation?
For this you transform numbers into shapes.
Let's look again at the number of adult women
in Sweden for different heights.
Plotting the data as a shape shows us
how much their heights vary from the average
and how wide that variation is.
The shape a set of data makes
is called its distribution.
(Lecture) This is the income distribution
of China 1970.
This is the income distribution
of the United States 1970.
Almost no overlap. And what has happened?
China is growing. It's not so equal any longer.
And it's appearing here,
overlooking the United States
almost like a ghost, isn't it? It's scary!
The statisticians who first explored distributions
discovered one shape that turned up
again and again.
The Victorian scholar Francis Galton
was so fascinated
he built a machine that could reproduce it
and he found it fitted so many different
sets of measurements
that he named it the Normal Distribution.
Whether it was people's arm spans, lung capacity or even their exam results,
the Normal Distribution shape recurred
time and time again.
And the statisticians soon found
many other regular shapes
each produced by a certain kind of natural or social process.
And every statistician has their favourite.
The Poisson distribution, I think it's my favourite,
it's an absolute cracker.
The Poisson shape describes how likely it is
that out-of-the-ordinary things will happen.
Imagine a London bus stop where we know
that on average we'll get three buses an hour.
We won't always get three buses of course.
Amazingly the Poisson shape will show us
the probability that in any given hour
we'll get 4, 5 or 6 buses or no buses at all.
The exact shape changes with the average
but whether it is how many people will
win the lottery jackpot each week
or how many people will phone
a call centre each minute
the Poisson shape will give the probabilities.
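
For the bus stop, the Poisson probabilities can be written down directly: with an average of λ events per interval, the chance of seeing exactly k is λ^k · e^(−λ) / k!. A minimal sketch using the three-buses-an-hour average from the example (the function name is just an illustrative choice):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average is lam per interval."""
    return lam**k * exp(-lam) / factorial(k)

lam = 3  # average buses per hour, as in the example above

# Probability of no buses, the average number, and the unusually busy hours.
for k in (0, 3, 4, 5, 6):
    print(f"P({k} buses in an hour) = {poisson_pmf(k, lam):.3f}")
```

Changing `lam` changes the shape, which is why the same formula can be tried on lottery winners per week or calls per minute.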
A wonderful example where this does apply,
from the late 19th century,
was to count each year the number
of Prussian cavalry officers
that had been kicked
to death by their horses.
Some years there were none, some years one,
some years two... up to seven
in one particularly bad year.
But the distribution of how many years had none, one, two, three, four
Prussian cavalry officers kicked to death
by their horses
beautifully obeys the Poisson distribution.
So statisticians use shapes to reveal the patterns in the data,
but we also use images of all kinds to communicate statistics to a wider public
because if the story in the numbers is told by a beautiful and clever image
then everyone understands.
Of the pioneers of statistical graphics, my favourite is Florence Nightingale.
There are not many people who realise that actually she was known as a passionate statistician
and not just the Lady of the Lamp.
She said that to understand God's thoughts we must study statistics
for these are the measure of His purpose.
Statistics was for her a religious duty and moral imperative.
When Florence was nine years old,
she started collecting data.
Her data was different fruits and vegetables she found.
Put them into different tables,
trying to organise them in some standard form,
so we have one of Nightingale's first
statistical tables at the age of nine.
In the mid-1850s, Florence Nightingale went to the Crimea
to care for British casualties of war.
She was horrified by what she discovered.
For all the soldiers being blown to bits on the battlefield
there were many many more soldiers
dying from diseases
caught in the army's filthy hospitals.
So Florence Nightingale began counting the dead.
For two years she recorded mortality data
in meticulous detail.
When the war was over,
she persuaded the government
to set up a Royal Commission of Enquiry,
and gathered her data in a devastating report.
What has cemented her place in the statistical
history books is the graphics she used.
And one in particular, the Polar Area Graph.
For each month of the war,
a huge blue wedge represented the soldiers
who had died of preventable diseases.
The much smaller red wedges
were deaths from wounds,
and the black wedges deaths
from accidents and other causes.
Nightingale's graphics were so clear,
they were impossible to ignore.
The usual thing around Florence Nightingale's time
was just to produce tables and tables of figures.
Absolutely tedious stuff.
Unless you are a dedicated statistician,
it's quite difficult to spot the patterns naturally.
But visualisations tell a story.
They tell a story immediately.
The use of colour, the use of shape,
can really tell a powerful story.
And these days, we can make things move as well.
Florence Nightingale would've loved to play with it,
she would've produced wonderful animations,
I'm absolutely certain about it.
Today, a hundred and fifty years on,
Nightingale's graphics are rightly
regarded as a classic.
They led to a revolution in nursing and health care,
and in hygiene in hospitals worldwide,
which saved innumerable lives.
Statistical graphics has become
an art of its very own.
Led by designers who are passionate
about visualising data.
This is the Billion Pound-O-Gram.
This image arose out of frustration
with the reporting
of billion-pound amounts in the media.
500 trillion pounds for this war,
50 million pounds for this hospital,
this does not make sense,
these figures are too enormous to get your mind around.
So I scraped this data from various news sources
and created this diagram,
so the squares here are scaled
according to the billion-pound amounts.
When you see numbers visualised like this,
you start to have a different
kind of relationship with them.
You can see patterns, see the scale of them.
Here, this little square, 37 billion,
this was the predicted cost of the Iraq war in 2003.
As you can see it has grown exponentially
over the last few years
to the total cost of about 2,500 billion.
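
When squares are scaled by amount, it is the area that carries the number, so the side of each square grows with the square root. A minimal sketch using the two Iraq war figures quoted above; that the diagram scales by area is an assumption here, as are the variable names:

```python
from math import sqrt

# Figures quoted in the talk, in billions of pounds.
predicted_cost = 37
total_cost = 2500

# Assumption: each square's AREA is proportional to the amount it shows.
area_ratio = total_cost / predicted_cost
side_ratio = sqrt(area_ratio)

print(f"the larger square has about {area_ratio:.0f} times the area")
print(f"but its side is only about {side_ratio:.1f} times longer")
```

This is why a figure dozens of times bigger looks only a handful of times wider on the page.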
It's funny because when you visualise statistics
like this, you understand them.
And when you understand them,
you can put things into perspective.
Visualisation is right at the heart of my own work too.
I teach Global Health.
I know that having the data is not enough,
I have to show it in ways people
both enjoy and understand.
Now I'm going to try something
I've never done before.
Animating the data in real space.
With a bit of technical assistance
from the crew.
So here we go!
First an axis for health,
life expectancy from 25 years to 75 years.
Down here an axis for wealth,
income per person, $400, $4,000 and $40,000.
So down here is poor and sick.
And up here is rich and healthy.
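
The income marks of $400, $4,000 and $40,000 each step up tenfold, which suggests a logarithmic axis. A minimal sketch of how an income would be placed on such an axis, assuming a base-10 log scale running from $400 to $40,000 (the function name and the 0-to-1 position are illustrative choices, not Gapminder's code):

```python
from math import log10

def axis_position(income, low=400, high=40_000):
    """Map an income to a 0..1 position along an assumed log-scaled axis."""
    return (log10(income) - log10(low)) / (log10(high) - log10(low))

for income in (400, 4_000, 40_000):
    print(f"${income:>6,} -> {axis_position(income):.2f}")
```

On this scale every tenfold rise in income moves a bubble the same distance to the right, whether it starts from $400 or from $4,000.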
Now I'm going to show you the world
200 years ago, in 1810.
Here come all the countries:
Europe brown, Asia red,
Middle East green, Africa South-of-Sahara blue,
and America is yellow.
And the size of the country bubble
shows the size of the population.
And in 1810 it was pretty crowded down there, wasn't it?
All countries were sick and poor,
life expectancy would be below 40 in all countries.
Only the UK and the Netherlands
were slightly better off, but not much.
And now, I'll start the world!
The Industrial Revolution makes countries in Europe and elsewhere move away from the rest.
But the colonised countries in Asia and Africa
are stuck down there.
Eventually the Western countries
get healthier and healthier.
Now we slow down to see the impact
of the First World War and the Spanish flu epidemic.
What a catastrophe!
Now I'll speed up through the 1920s and 1930s
and in spite of the Great Depression, Western countries forged on towards greater wealth and health.
Japan and some others try to follow
but most countries stay down here.
After the tragedies of the Second World War
we stop a bit to look at the world in 1948.
1948 was a great year, the war was over,
Sweden topped the medal table at the Winter Olympics,
and I was born, but the differences between
the countries of the world were wider than ever.
The United States was in the front,
Japan was catching up, Brazil was way behind,
Iran was getting a little richer from oil,
but still had short lives.
The Asian giants, China, India,
Pakistan, Bangladesh and Indonesia,
they were still poor and sick down here.
But look what is about to happen. In my lifetime,
former colonies gained independence
and finally they started to get healthier,
and healthier, and healthier.
And in the 1970s,
countries in Asia and Latin America
started to catch up with the Western countries.
They became the emerging economies.
Some in Africa follow, some in Africa
are stuck in civil wars, and others are hit by HIV.
And now we can see the world today,
in the most up-to-date statistics.
Most people today live in the middle,
but there are huge differences at the same time
between the best of countries
and the worst of countries
and there are also huge inequalities within countries.
These bubbles show country averages,
but I can split them.
Take China, I can split it into provinces.
There goes Shanghai,
it has the same health and wealth as Italy today.
And then there's the poor inland province of Guizhou. It's like Pakistan.
And if I split it further,
the rural parts are like Ghana in Africa.
And yet, despite the enormous disparities today,
we have seen 200 years of remarkable progress.
That huge historical gap between
the West and the rest is now closing.
We have become an entirely new converging world.
And I see a clear trend into the future,
with aid, trade, green technology and peace.
It's fully possible that everyone
can make it to the healthy-wealthy corner.
What you've just seen in the last few minutes
is a story of 200 countries
shown over 200 years and beyond.
It involved plotting 120,000 numbers.
Pretty neat, eh?
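
The 120,000 figure is consistent with a simple count. A minimal sketch, assuming (the programme does not spell this out) that every country contributes three plotted values per year: income, life expectancy and population:

```python
countries = 200
years = 200
values_per_point = 3  # assumption: income, life expectancy, population

total_numbers = countries * years * values_per_point
print(total_numbers)
```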
With statistics we can start to see things
as they really are.
From tables of data, to averages,
distributions and visualisations,
statistics gives us a clear description of the world.
But with statistics we can not only
discover what is happening
but also explore why, by using
the powerful analytical method of correlation.
Just looking at one thing at a time
doesn't tell you very much.
You have to look at the relationships between things.
How they change. How they vary together.
That's what correlation is about.
That's how we start to understand
the processes that are really going on
in the world and in society.
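
How two things vary together can be put into a single number between -1 and +1. A minimal sketch of the standard Pearson correlation coefficient, with made-up series purely for illustration (the programme itself does not name a particular formula):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data: one series rising with x, one falling.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

print("rising together:", pearson(xs, rising))
print("moving oppositely:", pearson(xs, falling))
```

A value near +1 or -1 says the two quantities move together or in opposition; it says nothing by itself about which one causes the other.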
Most of us would recognise today that crime
correlates to poverty,
that infection correlates to poor sanitation,
and that knowledge of statistics correlates
to being great at dancing.
Correlations can be very tricky.
I've got a joke about silly correlations.
There was this American
who was afraid of a heart attack.
He found out that the Japanese ate very little fat,
and almost didn't drink wine,
and had far fewer heart attacks than the Americans.
But on the other hand, he found out that the French
eat as much fat as the Americans
and they drink much more wine, but they also have fewer heart attacks.
So he concluded that what kills you
is speaking English.
The best example of
a really ground-breaking correlation
was the link that was established in the 1950s
between smoking and lung cancer.
Not long after the Second World War,
a British doctor, Richard Doll,
investigated lung cancer patients
in twenty London hospitals,
and he became certain that
the only thing they had in common was smoking
so certain that he stopped smoking himself.
But other people weren't so sure.
Lots of the discussion of the early data
linking smoking and lung cancer was:
it can't be smoking, surely, that thing
we've done all our lives, that can't be bad for you.
Maybe it's genes, maybe people
who are genetically predisposed to get lung cancer
are also genetically predisposed to smoke.
Maybe it's not the smoking,
maybe it's air pollution,
that smokers are somehow more exposed to air pollution than non-smokers.
Maybe it's not smoking, maybe it's poverty.
So now we have three possible explanations
apart from chance.
To verify that his correlation did imply cause and effect,
Richard Doll created
the biggest statistical study of smoking yet.
He began tracking the lives of 40,000 British doctors
some of whom smoked, some of whom didn't.
And gathered enough data to correlate
the amount the doctors smoked
with their likelihood of getting cancer.
Eventually, he not only showed a correlation
between smoking and lung cancer
but also a correlation between stopping smoking
and reducing the risk.
This was science at its best.
What correlations do not replace
is human thought.
We could think about what it means.
What a good scientist does
if he comes up with a correlation
is try as hard as he or she possibly can
to disprove it
to break it down, to get rid of it,
to try to refute it,
and if it withstands all those efforts
at demolishing it, and it's still standing,
then we might really have something here.
However brilliant the scientist,
data is still the oxygen of science.
The good news is that the more we have,
the more correlations we'll find,
the more theories we'll test,
and the more discoveries we are likely to make.
And history shows how our total sum of information
grows in huge leaps
as we develop new technologies.
The invention of the printing press kicked off
the first data and information explosion.
If you piled up all the books that have been printed
by the year 1700
they would make sixty stacks,
each as high as Mount Everest.
Then, starting in the 19th century,
there came a second information revolution.
With the telegraph, gramophone, camera,
and later radio and TV.
The total amount of information exploded.
And by the 1950s the information available to us all
had multiplied six thousand times.
Then, thanks to the computer,
and later the Internet, we went digital,
and the amount of data we have now,
is unimaginably vast.
A single letter printed in a book
is equivalent to a byte of data.
A single page
equals a kilobyte or two.
Five megabytes is enough
for the complete works of Shakespeare.
10 gigabytes, that's a DVD movie.
2 terabytes is the tens of millions of photos
added to Facebook every day.
10 petabytes is the data recorded every second
by the world's largest particle accelerator,
so much only a tiny fraction is kept.
6 exabytes is what you'd have if you sequenced
the genomes of every single person on Earth.
But really, that's nothing.
In 2009, the Internet added up to 600 exabytes,
and in 2010, in just one year, that will double to more than one zettabyte.
But in the real world,
if we turned all this data into print
it would make ninety stacks of books,
each reaching from here all the way to the Sun.
The data deluge is staggering.
But with today's computers and statistics,
I'm confident we can handle it.
When it comes to all the data on the Internet,
the powerhouse of statistical analysis
is the Silicon Valley giant Google.
The average person over their lifetime
is exposed to about a hundred million words
of conversation.
So if you multiply that
by the six billion people on the planet
that amount of words is equal to the amount of words
that Google has available at any one instant of time.
Google's computers hoover up and file away
every document, web page
and image they can find.
Then they hunt for patterns and correlations
in all this data
doing statistics on a massive scale.
And for me, Google has one project
that is particularly exciting:
statistical language translation.
If you do want to provide access
to all the web's information
no matter what language is spoken.
There's so much information on the Internet,
you cannot hope to translate it all by hand
into every possible language, we figured
we have to be able to do machine translation.
In the past, programmers tried to teach their computers to see each language as a set of grammatical rules.
Much like languages are taught at school.
But this didn't work, because no set of rules
could capture language in all its subtlety and ambiguity.
Having eaten our lunch,
the coach departed.
That's obviously incorrect. Written like that,
it would imply that the coach has eaten the lunch.
It would be far better to say: Having eaten our lunch,
we departed in the coach.
Those rules are helpful,
they are useful most of the time,
but they don't turn out to be true
all the time.
And the insight of using
statistical machine translation
is saying: if we have all these exceptions anyways, maybe you can get by without having any rules,
maybe we can treat everything as an exception,
and that's essentially what we've done.
What the computer is doing
when it's learning how to translate
is to learn correlations between words
and between phrases
so we feed the system
very large amounts of data
and the system sees if a certain word or phrase
correlates very often to the other language.
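
The idea of learning which words go together across languages can be shown on a toy scale. A minimal sketch, assuming a four-sentence English and Swedish corpus invented for illustration, and using a simple co-occurrence score (the Dice coefficient) that stands in for, and is far cruder than, whatever Google's system actually computes:

```python
from collections import Counter
from itertools import product

# Hypothetical toy parallel corpus (English, Swedish). A real system
# would be fed vastly more text than this.
corpus = [
    ("a boat", "en båt"),
    ("a car", "en bil"),
    ("my boat", "min båt"),
    ("my car", "min bil"),
]

pair_counts = Counter()   # how often a source word and a target word share a sentence pair
src_counts = Counter()    # how often each source word appears
tgt_counts = Counter()    # how often each target word appears

for src, tgt in corpus:
    src_words, tgt_words = set(src.split()), set(tgt.split())
    src_counts.update(src_words)
    tgt_counts.update(tgt_words)
    pair_counts.update(product(src_words, tgt_words))

def best_match(word):
    """Target word that co-occurs most consistently with the given source word."""
    def dice(target):
        return 2 * pair_counts[(word, target)] / (src_counts[word] + tgt_counts[target])
    return max(tgt_counts, key=dice)

for word in ("boat", "car", "a", "my"):
    print(word, "->", best_match(word))
```

No grammar rule appears anywhere in the sketch; the pairing falls out of the counts alone, which is the point being made about the statistical approach.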
Google's website currently offers translation between any of 57 different languages.
It does this purely statistically, having correlated
the huge collection of multilingual texts.
The people who built the system
don't need to know Chinese
in order to build the Chinese system.
They don't need to know Arabic.
The expertise that is needed is basically knowledge of statistics, of computer science,
of infrastructure,
to build these very large computer systems we are building for doing that.
I hooked up with Google from my office in Stockholm,
to try the translator myself.
I will type some Swedish sentences.
(Types in Swedish)
(Reads on the screen) Sweden's finance minister
has a ponytail and a gold ring in your ear.
It's almost exactly correct, it's amazing.
He comes from the conservative party,
that's the kind of Sweden we have today.
I will type one more sentence.
In his same-sex partnerships has Stockholm's
new bishop and his partners a three-year son.
It's almost perfect,
there's one important thing, it's "her".
It's a lesbian partnership.
OK, those kinds of words like "her"
are one of the challenges in translation,
to get those right.
When it comes to bishops,
one can excuse it.
Right, I think that more often than not
it would be probably a "his".
I will write one more sentence.
(Reads aloud in Swedish)
When Sweden is taking part in Olympic gold,
is not to win but to beat Norway.
But they are very good at the Winter Olympics,
so we can't make it, but we are trying.
Very good, very good.
This is absolutely amazing,
and I'm impressed that it picked up
words like "same-sex partnerships"
which are very new to the language.
The translator is good, but if it succeeds,
what comes next will be remarkable.
One of the exciting possibilities is combining
the machine translation technology
with the speech recognition technology.
Both of these are statistical in nature.
The machine translation relies on the statistics
of mapping from one language to another,
and similarly speech recognition relies on the statistics
of mapping from a sound form to the words.
When we put them together,
now we have the capability
of having instant conversations between two people who don't speak a common language.
I can talk to you in my language,
you hear me in your language,
and you can answer back in real time,
we can make that translation,
we can bring people together
and allow them to speak.
The Internet is just one of many technologies
created to gather massive amounts of data.
Scientists studying our Earth
and our environment
now use an incredible range of instruments
to measure the processes of our planet.
All around us are sensors continuously measuring
temperature, water flow and ocean currents.
High in orbit are satellites busy imaging cloud formations, forest growth and snow cover.
Scientists speak of instrumenting the Earth.
And pointing up to the skies above,
our powerful new telescopes are mapping the Universe.
What's happening in astronomy,
is typical of how profoundly this torrent of data
is transforming science.
Astronomers are now addressing
many enduring mysteries of the cosmos
by applying statistical methods
to all this new data.
The galaxy is a very big place
and it has billions of stars in it,
so to put together a coherent picture
of the whole galaxy requires
having enormous amounts of data.
Before you could do a large sky survey
with sensitive digital detectors,
so that you could map many stars at once,
it was very difficult to gather enough data
on enough of the galaxy.
In the past, large surveys of the night sky
had to be done
by exposing thousands
of large photographic plates,
but these surveys could take 25 years
or more to complete.
Then, in the 1990s, came digital astronomy,
and a huge increase in both the amount
and the accessibility of data.
The Sloan Sky Survey is the world's biggest yet
using a massive digital sensor mounted
on the back of a custom-built telescope in New Mexico.
It's scanned the sky night after night for eight years
building up a composite picture
in unprecedented resolution.
The Sloan is some of the best, deepest survey data
we have in astronomy,
both in our galaxy and galaxies away from ours.
All the Sloan data is on the Internet
and with it astronomers have identified
millions of hitherto unknown stars and galaxies.
They also comb the database for statistical patterns
which will prove, disprove or suggest new theories.
So we have this idea that galaxies grow,
they become large galaxies
like the one we live in, the Milky Way,
not all at once, not smoothly,
but by continuously incorporating,
cannibalising smaller galaxies:
they dissolve them and they become
part of the bigger galaxy.
It's a startling idea
and in the Sloan data there's the evidence to support it.
Groups of stars that came from cannibalised galaxies
stand out in the Sloan data as statistically
different from other stars
because they move at a different velocity.
Each big spike in one of these distribution graphs
means Professor Rockosi has found a group of stars
all travelling in a different way to the rest.
They are the telltale patterns she's looking for.
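To make the idea of a "spike" in a velocity distribution concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the velocities are made up, and the bin width and the "three times the typical bin" threshold are arbitrary choices for the example, not the method used on the Sloan data.

```python
from collections import Counter
from statistics import median

def velocity_spikes(velocities, bin_width=20, factor=3):
    """Return the lower edges of velocity bins that are far fuller than a typical bin."""
    # Assign each star to a velocity bin and count the stars in each bin.
    counts = Counter((v // bin_width) * bin_width for v in velocities)
    typical = median(counts.values())
    return sorted(edge for edge, n in counts.items() if n > factor * typical)

# Made-up velocities in km/s: an evenly spread background of stars,
# plus a tight clump of stars all moving at about 150 km/s.
background = list(range(-200, 200, 10))             # 2 stars per 20 km/s bin
stream = [150, 151, 152, 153, 154, 155, 156, 157]   # 8 stars moving together

print(velocity_spikes(background + stream))  # prints [140]
```

The clump lands in the bin that starts at 140 km/s, which holds ten stars where every other bin holds two, so it is the only bin reported.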
The evidence is accumulating that in fact
this really is how galaxies grow,
or an important way in which galaxies grow.
This is important for understanding how galaxies form,
not only ours but every galaxy.
The more data there is
the more discoveries can be made
and the technology is getting better all the time.
The next big survey telescope starts its work in 2015.
It will leave Sloan in the dust.
Sloan has taken eight years
to cover one quarter of the night sky.
The new telescope will scan the entire sky
in even greater resolution
every three days.
The vast amounts of data we have today
allow researchers in all sorts of fields
to test their theories on a previously unimaginable scale,
but they may even change the fundamental way
science is done.
With the power of today's computers
applied to all this data
the machines might be able to guide the researchers.
This is a profoundly important point,
one of the most significant points in science,
certainly one of the most exciting:
the potential to transform not only
how scientists do science
but what science is possible.
What will power that transformation
of how science is done
is going to be computation.
Many of the dynamics of the natural world,
like the interplay
between the rainforest and the atmosphere,
are so complex that we don't yet
really understand them.
But now computers are generating
tens of thousands of simulations
of how these biological systems might work.
It's like creating thousands of
hypothetical parallel worlds.
Each of these simulations is analysed with statistics
to see if any are a good match
for what is actually observed.
The computers can now automatically generate,
test and discard hypotheses
with scarcely any human insight.
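The generate, test and discard loop can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the "simulation" is a toy compound-growth model rather than a rainforest, the observations are invented, and sum of squared differences is just one simple way to score a match.

```python
def simulate(growth_rate, steps, start=1.0):
    """Toy stand-in for a simulation: a quantity growing by a fixed rate each step."""
    values = [start]
    for _ in range(steps):
        values.append(values[-1] * (1 + growth_rate))
    return values

def mismatch(simulated, observed):
    """Sum of squared differences between a simulated run and the observations."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))

def screen_hypotheses(observed, candidate_rates, tolerance):
    """Simulate every candidate, discard poor matches, return survivors best-first."""
    scored = [
        (mismatch(simulate(rate, steps=len(observed) - 1), observed), rate)
        for rate in candidate_rates
    ]
    return [rate for error, rate in sorted(scored) if error <= tolerance]

# Invented observations: roughly 10% growth per step, with a little wobble.
observed = [1.0, 1.11, 1.20, 1.34, 1.46, 1.61]

# Thirty-one hypotheses: growth rates from 0% to 30% in 1% steps.
candidates = [i / 100 for i in range(31)]

survivors = screen_hypotheses(observed, candidates, tolerance=0.005)
print(survivors)
```

Each candidate growth rate plays the part of one hypothetical parallel world. Nobody picks the answer by hand: the statistics decide which worlds are kept and which are thrown away.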
This new application of statistics will become
absolutely vital for the future of science.
It's creating a new paradigm in the way we do science
which is characterised as data-centric or data-driven
rather than hypothesis- or experiment-driven.
It's an exciting time in terms of science,
computation and statistics.
If all this sounds a bit abstract to you
how about one final frontier?
Could statistics make sense of your feelings?
In California (where else!), one computer scientist
is harvesting the Internet to try to define the patterns
of our innermost thoughts and emotions.
This is the Madness movement.
It represents a skyscraper's view of the world.
Each brightly coloured dot is an individual feeling
expressed by someone out there in a blog or a tweet
and when you click on the dot
it explodes to reveal
the underlying feeling of that person.
This is what people say they're feeling today:
better
safe
crappy
well
pretty
special
sorry
alone
Every minute WeFeelFine crawls the world's blogs,
takes all the sentences that start
with the words "I feel" or "I'm feeling"
and pushes them into a database.
We collect all the feelings
and we count the most common
better
bad
good
right
guilty
sick
the same
like shit
sorry
well
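The collect-and-count step described above can be sketched in Python. This is a guess at the general idea, not WeFeelFine's actual code: the blog snippets are invented, and the sketch keeps only the first word after "I feel", so a phrase like "the same" would be cut short.

```python
import re
from collections import Counter

# Matches "I feel X" or "I'm feeling X" at the start of a sentence and captures X.
FEELING = re.compile(r"^\s*I(?: feel|'m feeling)\s+(\w+)", re.IGNORECASE)

def extract_feelings(text):
    """Split text into sentences and return the feeling word from each matching one."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    found = []
    for sentence in sentences:
        match = FEELING.match(sentence)
        if match:
            found.append(match.group(1).lower())
    return found

# Invented blog snippets, standing in for crawled posts.
posts = [
    "I feel better today. The weather helped.",
    "Long week. I'm feeling better now! I feel sorry for my cat.",
    "I feel alone. Nobody called.",
]

# Tally every extracted feeling across all posts.
counts = Counter(word for post in posts for word in extract_feelings(post))
print(counts.most_common(1))  # prints [('better', 2)]
```

Run over millions of posts instead of three, the same tally gives a ranked list of the most common feelings.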
We can take a look at any one feeling and analyse it.
Right now a lot of people are feeling happy.
We can take a look at these people,
and break them down by age, gender or location.
Since bloggers have public profiles,
we have that information
and we can ask questions like,
"Are women happier than men?"
or "Is England happier
than the United States?"
We find that as people get older, they get happier.
For younger people,
happiness is associated with excitement
whereas older people associate happiness
more with peacefulness.
We also find that women feel loved
more often than men,
but also more guilty,
while men feel good more often than women,
but also more alone.
As people live more and more of their lives online
they leave behind digital traces
with which we can statistically analyse
what it means to be human.
Where does all this leave us?
We generate unimaginable quantities of data
about everything you can think of
and we analyse it to reveal the patterns.
Now not only experts but all of us can understand
the stories in the numbers.
Instead of being led astray by prejudice
with statistics at our fingertips, our eyes can be opened
to a fact-based view of the world.
More than ever before we can become
authors of our own destiny.
And that's pretty exciting, isn't it?
(Music)