
How we're building the world's largest family tree

  • 0:01 - 0:04
    People use the internet
    for various reasons.
  • 0:06 - 0:10
    It turns out that one of the most
    popular categories of website
  • 0:10 - 0:12
    is something that people
    typically consume in private.
  • 0:14 - 0:16
    It involves curiosity,
  • 0:16 - 0:20
    non-insignificant levels
    of self-indulgence
  • 0:20 - 0:23
    and is centered around recording
    the reproductive activities
  • 0:23 - 0:25
    of other people.
  • 0:25 - 0:26
    (Laughter)
  • 0:26 - 0:28
    Of course, I'm talking about genealogy --
  • 0:28 - 0:29
    (Laughter)
  • 0:29 - 0:31
    the study of family history.
  • 0:31 - 0:33
    When it comes to detailing family history,
  • 0:33 - 0:37
    in every family, we have this person
    that is obsessed with genealogy.
  • 0:37 - 0:39
    Let's call him Uncle Bernie.
  • 0:39 - 0:43
    Uncle Bernie is exactly the last person
    you want to sit next to
  • 0:43 - 0:45
    at Thanksgiving dinner,
  • 0:45 - 0:47
    because he will bore you to death
    with peculiar details
  • 0:47 - 0:49
    about some ancient relatives.
  • 0:50 - 0:52
    But as you know,
  • 0:52 - 0:55
    there is a scientific side to everything,
  • 0:55 - 0:58
    and we found that Uncle Bernie's stories
  • 0:58 - 1:01
    have immense potential
    for biomedical research.
  • 1:01 - 1:04
    We let Uncle Bernie
    and his fellow genealogists
  • 1:04 - 1:09
    document their family trees through
    a genealogy website called geni.com.
  • 1:09 - 1:11
    When users upload
    their trees to the website,
  • 1:11 - 1:13
    it scans their relatives,
  • 1:13 - 1:15
    and if it finds matches to existing trees,
  • 1:15 - 1:19
    it merges the existing
    and the new tree together.
  • 1:20 - 1:23
    The result is that large
    family trees are created,
  • 1:23 - 1:26
    beyond the individual level
    of each genealogist.
  • 1:27 - 1:31
    Now, by repeating this process
    with millions of people
  • 1:31 - 1:33
    all over the world,
  • 1:33 - 1:38
    we can crowdsource the construction
    of a family tree of all humankind.
  • 1:39 - 1:41
    Using this website,
  • 1:41 - 1:46
    we were able to connect 125 million people
  • 1:46 - 1:48
    into a single family tree.
  • 1:49 - 1:52
    I cannot draw the tree
    on the screens over here
  • 1:52 - 1:54
    because they have fewer pixels
  • 1:54 - 1:56
    than the number of people in this tree.
  • 1:57 - 2:02
    But here is an example of a subset
    of 6,000 individuals.
  • 2:02 - 2:05
    Each green node is a person.
  • 2:05 - 2:08
    The red nodes represent marriages,
  • 2:08 - 2:10
    and the connections represent parenthood.
  • 2:11 - 2:13
    In the middle of this tree,
    you see the ancestors.
  • 2:13 - 2:16
    And as we go to the periphery,
    you see the descendants.
  • 2:16 - 2:19
    This tree has seven
    generations, approximately.
  • 2:20 - 2:23
    Now, this is what happens
    when we increase the number of individuals
  • 2:23 - 2:25
    to 70,000 people --
  • 2:25 - 2:29
    still a tiny subset
    of all the data that we have.
  • 2:30 - 2:34
    Despite that, you can already see
    the formation of gigantic family trees
  • 2:34 - 2:37
    with many very distant relatives.
  • 2:38 - 2:41
    Thanks to the hard work
    of our genealogists,
  • 2:41 - 2:44
    we can go back in time
    hundreds of years.
  • 2:44 - 2:48
    For example, here is Alexander Hamilton,
  • 2:48 - 2:50
    who was born in 1755.
  • 2:51 - 2:55
    Alexander was the first
    US Secretary of the Treasury,
  • 2:55 - 2:58
    but is mostly known today
    thanks to a popular Broadway musical.
  • 2:59 - 3:04
    We found that Alexander has deeper
    connections in the showbiz industry.
  • 3:04 - 3:06
    In fact, he's a blood relative of ...
  • 3:07 - 3:08
    Kevin Bacon!
  • 3:08 - 3:10
    (Laughter)
  • 3:10 - 3:13
    Both of them are descendants
    of a lady from Scotland
  • 3:13 - 3:15
    who lived in the 13th century.
  • 3:15 - 3:18
    So you can say that Alexander Hamilton
  • 3:18 - 3:21
    is 35 degrees of Kevin Bacon genealogy.
  • 3:21 - 3:23
    (Laughter)
  • 3:23 - 3:26
    And our tree has millions
    of stories like that.
  • 3:28 - 3:33
    We invested significant efforts
    to validate the quality of our data.
  • 3:33 - 3:38
    Using DNA, we found that 0.3 percent of
    the mother-child connections in our data
  • 3:38 - 3:40
    are wrong,
  • 3:40 - 3:43
    which could match the adoption rate
    in the US before the Second World War.
  • 3:45 - 3:47
    For the father's side,
  • 3:47 - 3:49
    the news is not as good:
  • 3:50 - 3:56
    1.9 percent of the father-child
    connections in our data are wrong.
  • 3:56 - 3:58
    And I see some people smirk over here.
  • 3:58 - 4:00
    It is what you think --
  • 4:00 - 4:02
    there are many milkmen out there.
  • 4:02 - 4:03
    (Laughter)
  • 4:03 - 4:07
    However, this 1.9 percent error rate
    in patrilineal connections
  • 4:07 - 4:09
    is not unique to our data.
  • 4:09 - 4:12
    Previous studies found
    a similar error rate
  • 4:12 - 4:14
    using clinical-grade pedigrees.
  • 4:14 - 4:17
    So the quality of our data is good,
  • 4:17 - 4:19
    and that should not be a surprise.
  • 4:19 - 4:23
    Our genealogists have
    a profound, vested interest
  • 4:23 - 4:26
    in correctly documenting
    their family history.
  • 4:29 - 4:33
    We can leverage this data to learn
    quantitative information about humanity,
  • 4:33 - 4:36
    for example, questions about demography.
  • 4:36 - 4:40
    Here is a look at all our profiles
    on the map of the world.
  • 4:40 - 4:45
    Each pixel is a person
    that lived at some point.
  • 4:45 - 4:46
    And since we have so much data,
  • 4:46 - 4:49
    you can see the contours
    of many countries,
  • 4:49 - 4:51
    especially in the Western world.
  • 4:51 - 4:55
    In this clip, we stratified
    the map that I showed you
  • 4:55 - 5:00
    based on the year of birth of individuals
    from 1400 to 1900,
  • 5:00 - 5:03
    and we compared it
    to known migration events.
  • 5:03 - 5:07
    The clip is going to show you
    that the deepest lineages in our data
  • 5:07 - 5:08
    go all the way back to the UK,
  • 5:08 - 5:10
    where they had better record keeping,
  • 5:10 - 5:13
    and then they spread along
    the routes of Western colonialism.
  • 5:13 - 5:15
    Let's watch this.
  • 5:15 - 5:17
    (Music)
  • 5:17 - 5:19
    [Year of birth: ]
  • 5:20 - 5:22
    [1492 - Columbus sails the ocean blue]
  • 5:24 - 5:26
    [1620 - Mayflower lands in Massachusetts]
  • 5:27 - 5:29
    [1652 - Dutch settle in South Africa]
  • 5:32 - 5:36
    [1788 - Great Britain penal
    transportation to Australia starts]
  • 5:36 - 5:37
    [1836 - First migrants use Oregon Trail]
  • 5:38 - 5:41
    [all activity]
  • 5:44 - 5:45
    I love this movie.
  • 5:45 - 5:51
    Now, since these migration events
    are giving the context of families,
  • 5:51 - 5:53
    we can ask questions such as:
  • 5:53 - 5:56
    What is the typical distance
    between the birth locations
  • 5:56 - 5:59
    of husbands and wives?
  • 5:59 - 6:03
    This distance plays
    a pivotal role in demography,
  • 6:03 - 6:06
    because the patterns in which
    people migrate to form families
  • 6:06 - 6:10
    determine how genes spread
    in geographical areas.
  • 6:11 - 6:13
    We analyzed this distance using our data,
  • 6:13 - 6:15
    and we found that in the old days,
  • 6:15 - 6:17
    people had it easy.
  • 6:17 - 6:19
    They just married someone
    in the village nearby.
  • 6:20 - 6:24
    But the Industrial Revolution
    really complicated our love life.
  • 6:24 - 6:28
    And today, with affordable flights
    and online social media,
  • 6:28 - 6:33
    people typically migrate more than
    100 kilometers from their place of birth
  • 6:33 - 6:35
    to find their soul mate.
  • 6:37 - 6:38
    So now you might ask:
  • 6:38 - 6:42
    OK, but who does the hard work
    of migrating from place to place
  • 6:42 - 6:44
    to form families?
  • 6:44 - 6:47
    Are these the males or the females?
  • 6:48 - 6:50
    We used our data to address this question,
  • 6:50 - 6:53
    and at least in the last 300 years,
  • 6:53 - 6:56
    we found that the ladies do the hard work
  • 6:56 - 6:59
    of migrating from place
    to place to form families.
  • 6:59 - 7:03
    Now, these results
    are statistically significant,
  • 7:03 - 7:06
    so you can take it as scientific fact
    that males are lazy.
  • 7:06 - 7:09
    (Laughter)
  • 7:09 - 7:12
    We can move from questions
    about demography
  • 7:12 - 7:15
    and ask questions about human health.
  • 7:15 - 7:16
    For example, we can ask
  • 7:16 - 7:21
    to what extent genetic variations
    account for differences in life span
  • 7:21 - 7:22
    between individuals.
  • 7:23 - 7:28
    Previous studies analyzed the correlation
    of longevity between twins
  • 7:28 - 7:29
    to address this question.
  • 7:29 - 7:32
    They estimated that genetic
    variations account for
  • 7:32 - 7:36
    about a quarter of the differences
    in life span between individuals.
  • 7:37 - 7:39
    But twins can be correlated
    due to so many reasons,
  • 7:39 - 7:42
    including various environmental effects
  • 7:42 - 7:43
    or a shared household.
  • 7:44 - 7:48
    Large family trees give us the opportunity
    to analyze both close relatives,
  • 7:48 - 7:49
    such as twins,
  • 7:49 - 7:52
    all the way to distant relatives,
    even fourth cousins.
  • 7:53 - 7:55
    This way we can build robust models
  • 7:55 - 7:59
    that can tease apart the contribution
    of genetic variations
  • 7:59 - 8:01
    from environmental factors.
  • 8:01 - 8:04
    We conducted this analysis using our data,
  • 8:04 - 8:10
    and we found that genetic variations
    explain only 15 percent
  • 8:10 - 8:13
    of the differences in life span
    between individuals.
  • 8:15 - 8:18
    That is five years, on average.
  • 8:18 - 8:23
    So genes matter less to life span
    than we previously thought.
  • 8:24 - 8:26
    And I find it great news,
  • 8:26 - 8:30
    because it means that
    our actions can matter more.
  • 8:31 - 8:35
    Smoking, for example, determines
    10 years of our life expectancy --
  • 8:35 - 8:37
    twice as much as what genetics determines.
  • 8:38 - 8:41
    We can have even more surprising findings
  • 8:41 - 8:42
    as we move beyond family trees
  • 8:42 - 8:47
    and let our genealogists
    document and crowdsource DNA information.
  • 8:47 - 8:49
    And the results can be amazing.
  • 8:49 - 8:53
    It might be hard to imagine,
    but Uncle Bernie and his friends
  • 8:53 - 8:56
    can create DNA forensic capabilities
  • 8:56 - 8:59
    that even exceed
    what the FBI currently has.
  • 9:01 - 9:03
    When you place the DNA
    on a large family tree,
  • 9:03 - 9:05
    you effectively create a beacon
  • 9:05 - 9:08
    that illuminates the hundreds
    of distant relatives
  • 9:08 - 9:12
    that are all connected to the person
    that originated the DNA.
  • 9:13 - 9:15
    By placing multiple beacons
    on a large family tree,
  • 9:15 - 9:19
    you can now triangulate the DNA
    of an unknown person,
  • 9:19 - 9:23
    the same way that the GPS system
    uses multiple satellites
  • 9:23 - 9:24
    to find a location.
  • 9:25 - 9:29
    The prime example
    of the power of this technique
  • 9:29 - 9:32
    is capturing the Golden State Killer,
  • 9:33 - 9:37
    one of the most notorious criminals
    in the history of the US.
  • 9:37 - 9:43
    The FBI had been searching
    for this person for over 40 years.
  • 9:44 - 9:45
    They had his DNA,
  • 9:45 - 9:49
    but he never showed up
    in any police database.
  • 9:49 - 9:54
    About a year ago, the FBI
    consulted a genetic genealogist,
  • 9:54 - 9:58
    and she suggested that they submit
    his DNA to a genealogy service
  • 9:58 - 10:01
    that can locate distant relatives.
  • 10:01 - 10:02
    They did that,
  • 10:02 - 10:06
    and they found a third cousin
    of the Golden State Killer.
  • 10:06 - 10:08
    They built a large family tree,
  • 10:08 - 10:10
    scanned the different
    branches of that tree,
  • 10:11 - 10:13
    until they found a profile
    that exactly matched
  • 10:13 - 10:16
    what they knew about
    the Golden State Killer.
  • 10:16 - 10:19
    They obtained DNA from this person
    and found a perfect match
  • 10:19 - 10:21
    to the DNA they had in hand.
  • 10:21 - 10:24
    They arrested him
    and brought him to justice
  • 10:24 - 10:25
    after all these years.
  • 10:26 - 10:29
    Since then, genetic genealogists
    have started working with
  • 10:29 - 10:32
    local US law enforcement agencies
  • 10:32 - 10:35
    to use this technique
    in order to capture criminals.
  • 10:36 - 10:38
    And in the past six months alone,
  • 10:38 - 10:43
    they were able to solve
    over 20 cold cases with this technique.
  • 10:44 - 10:49
    Luckily, we have people like Uncle
    Bernie and his fellow genealogists.
  • 10:49 - 10:52
    These are not amateurs
    with a self-serving hobby.
  • 10:53 - 10:59
    These are citizen scientists
    with a deep passion to tell us who we are.
  • 10:59 - 11:04
    And they know that the past
    can hold a key to the future.
  • 11:04 - 11:05
    Thank you very much.
  • 11:05 - 11:09
    (Applause)
Speaker:
Yaniv Erlich
Description:

Computational geneticist Yaniv Erlich helped build the world's largest family tree -- comprising 13 million people and going back more than 500 years. He shares fascinating patterns that emerged from the work -- about our love lives, our health, even decades-old criminal cases -- and shows how crowdsourced genealogy databases can shed light not only on the past but also on the future.

Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
11:45
