
How we're building the world's largest family tree

  • 0:01 - 0:06
    People use the internet
    for various reasons.
  • 0:06 - 0:10
    It turns out that one of the most
    popular categories of website
  • 0:10 - 0:13
    is something that people
    typically consume in private.
  • 0:13 - 0:16
    It involves curiosity,
  • 0:16 - 0:20
    non-insignificant levels
    of self-indulgence,
  • 0:20 - 0:23
    and centered around recording
    the reproductive activities
  • 0:23 - 0:25
    of other people.
  • 0:26 - 0:28
    Of course I'm talking
    about genealogy.
  • 0:28 - 0:29
    (Laughter)
  • 0:29 - 0:31
    The study of family history.
  • 0:31 - 0:33
    When it comes to detailing family history,
  • 0:33 - 0:37
    in every family we have this person
    that is obsessed with genealogy.
  • 0:37 - 0:39
    Let's call him Uncle Bernie.
  • 0:39 - 0:43
    Uncle Bernie is exactly the last person
    you want to sit next to
  • 0:43 - 0:45
    in Thanksgiving dinner
  • 0:45 - 0:48
    because he will bore you to death
    with peculiar details
  • 0:48 - 0:50
    about some ancient relatives.
  • 0:50 - 0:52
    But as you know,
  • 0:52 - 0:55
    there is a scientific side for everything,
  • 0:55 - 0:57
    and we found that Uncle Bernie's stories
  • 0:57 - 1:01
    have immense potential
    for biomedical research.
  • 1:01 - 1:02
    We let Uncle Bernie
    and his fellow genealogists
  • 1:02 - 1:09
    document their family trees through
    a genealogy website called geni.com.
  • 1:09 - 1:12
    When users upload
    their trees to the website,
  • 1:12 - 1:13
    it scans their relatives,
  • 1:13 - 1:15
    and if it finds matches to existing trees,
  • 1:15 - 1:20
    it merges the existing
    and the new tree together.
  • 1:20 - 1:26
    The result is that large family trees
    are created beyond the individual level
  • 1:26 - 1:27
    of each genealogist.
  • 1:27 - 1:30
    Now, by repeating this process with
    millions of people all over the world,
  • 1:30 - 1:38
    we can crowdsource the construction
    of a family tree of all humankind.
  • 1:40 - 1:42
    Using this website,
  • 1:42 - 1:46
    we were able to connect 125 million people
  • 1:46 - 1:49
    into a single family tree.
  • 1:49 - 1:52
    I cannot draw the tree
    on the screens over here
  • 1:52 - 1:55
    because they have less pixels
  • 1:55 - 1:57
    than the number of people in this tree,
  • 1:57 - 2:02
    but here is an example of a subset
    of 6,000 individuals.
  • 2:02 - 2:04
    Each green node is a person.
  • 2:04 - 2:05
    The red nodes represent marriages,
  • 2:05 - 2:09
    and the connections represent parenthood.
  • 2:09 - 2:11
    In the middle of this tree,
    you see the ancestors,
  • 2:11 - 2:15
    and as we go to the periphery,
    you see the descendants,
  • 2:15 - 2:19
    and this tree has seven
    generations approximately.
  • 2:19 - 2:23
    Now, this is what happens
    when we increase the number of individuals
  • 2:23 - 2:25
    to 70,000 people,
  • 2:25 - 2:30
    still a tiny subset
    of all the data that we have.
  • 2:30 - 2:35
    Despite that, you can already see
    the formation of gigantic family trees
  • 2:35 - 2:38
    with very many distant relatives.
  • 2:38 - 2:41
    Thanks to the hard work
    of our genealogists,
  • 2:41 - 2:44
    we can go back in time
    hundreds of years ago.
  • 2:44 - 2:48
    For example, here is Alexander Hamilton
  • 2:48 - 2:51
    that was born in 1755.
  • 2:51 - 2:55
    Alexander was the first
    US Secretary of the Treasury,
  • 2:55 - 2:58
    but mostly known today
    due to a popular Broadway musical.
  • 2:59 - 3:04
    We found that Alexander has deeper
    connections in the showbiz industry.
  • 3:04 - 3:07
    In fact, he's a blood relative
    of Kevin Bacon.
  • 3:08 - 3:09
    (Laughter)
  • 3:10 - 3:13
    Both of them are descendants
    of a lady from Scotland
  • 3:13 - 3:15
    who lived in the 13th century.
  • 3:15 - 3:18
    So you can say that Alexander Hamilton
  • 3:18 - 3:22
    is 35 degrees of Kevin Bacon genealogy.
  • 3:22 - 3:23
    (Laughter)
  • 3:23 - 3:27
    And our tree has millions
    of stories like that.
  • 3:27 - 3:33
    We invested significant effort
    to validate the quality of our data.
  • 3:33 - 3:38
    Using DNA, we found that .3 percent of
    the mother-child connections in our data
  • 3:38 - 3:40
    are wrong,
  • 3:40 - 3:44
    which could match the adoption rate
    in the US pre-Second World War.
  • 3:44 - 3:47
    For the father's side,
  • 3:47 - 3:50
    the news is not as good.
  • 3:50 - 3:56
    1.9 percent of the father-child
    connections in our data are wrong.
  • 3:56 - 3:58
    And I see some people smirk over here.
  • 3:58 - 4:00
    It is what you think.
  • 4:00 - 4:02
    There are many milkmen out there.
  • 4:02 - 4:03
    (Laughter)
  • 4:03 - 4:07
    However, this 1.9 percent error rate
    in patrilineal connections
  • 4:07 - 4:09
    is not unique to our data.
  • 4:09 - 4:12
    Previous studies found
    a similar error rate
  • 4:12 - 4:14
    using clinical-grade pedigrees.
  • 4:14 - 4:17
    So the quality of our data is good,
  • 4:17 - 4:19
    and that should not be a surprise.
  • 4:19 - 4:25
    Our genealogists have a profound,
    vested interest in correctly documenting
  • 4:25 - 4:27
    the family history.
  • 4:27 - 4:33
    We can leverage this data to learn
    quantitative information about humanity,
  • 4:33 - 4:36
    for example questions about demography.
  • 4:36 - 4:40
    Here is a look of all our profiles
    on the map of the world.
  • 4:40 - 4:45
    Each pixel is a person
    that lived at some point,
  • 4:45 - 4:47
    and since we have so much data,
  • 4:47 - 4:50
    you can see the contours
    of many countries,
  • 4:50 - 4:52
    especially in the Western world.
  • 4:52 - 4:55
    In this clip, we stratified
    the map that I've showed you
  • 4:55 - 4:58
    based on the year of birth of individuals
  • 4:58 - 5:00
    from 1400 to 1900
  • 5:00 - 5:04
    and we compared it
    to known migration events.
  • 5:04 - 5:07
    The clip is going to show you
    that the deepest lineages in our data
  • 5:07 - 5:09
    go all the way back to the UK,
  • 5:09 - 5:11
    where they had better record-keeping,
  • 5:11 - 5:14
    and then they spread along
    the routes of Western colonialism.
  • 5:14 - 5:15
    Let's watch this.
  • 5:16 - 5:18
    (Music)
  • 5:44 - 5:46
    I love this movie.
  • 5:46 - 5:50
    Now, since these migration events
    are given in the context of families,
  • 5:50 - 5:53
    we can ask questions
  • 5:53 - 5:58
    such as what is the typical distance
    between the birth locations
  • 5:58 - 5:59
    of husbands and wives?
  • 5:59 - 6:03
    This distance plays
    a pivotal role in demography,
  • 6:03 - 6:06
    because the patterns on which
    people migrate to form families
  • 6:06 - 6:11
    determine how genes spread
    in geographical areas.
  • 6:11 - 6:14
    We analyzed this distance using our data,
  • 6:14 - 6:16
    and we found that in the old days,
  • 6:16 - 6:17
    people had it easy.
  • 6:17 - 6:19
    They just married someone
    in the village nearby.
  • 6:19 - 6:24
    But the Industrial Revolution
    really complicated our love life,
  • 6:24 - 6:28
    and today with affordable flights
    and online social media,
  • 6:28 - 6:32
    people typically migrate
    more than 100 kilometers
  • 6:32 - 6:35
    from their place of birth
    to find their soulmate.
  • 6:35 - 6:40
    So now you might ask, OK,
  • 6:40 - 6:43
    but who does the hard work
    of migrating from places to places
  • 6:43 - 6:44
    to form families?
  • 6:44 - 6:48
    Are these the males or the females?
  • 6:48 - 6:50
    We used our data to address this question,
  • 6:50 - 6:53
    and at least in the last 300 years,
  • 6:53 - 6:56
    we found that the ladies
  • 6:56 - 6:58
    do the hard work of migrating
    from places to places to form families.
  • 6:58 - 7:00
    Now these results
    are statistically significant,
  • 7:00 - 7:06
    so you can take it as scientific fact
    that males are lazy.
  • 7:06 - 7:09
    (Laughter)
  • 7:09 - 7:13
    We can move from questions
    about demography
  • 7:13 - 7:15
    and ask questions about human health.
  • 7:15 - 7:20
    For example, we can ask to what extent
    genetic variations account for differences
  • 7:20 - 7:23
    in lifespan between individuals.
  • 7:23 - 7:27
    Previous studies analyzed
    the correlation of longevity
  • 7:27 - 7:29
    between twins to address this question.
  • 7:30 - 7:34
    They estimated that the genetic variations
    account for about a quarter
  • 7:34 - 7:37
    of the differences in lifespan
    between individuals.
  • 7:37 - 7:40
    But twins can be correlated
    due to so many reasons,
  • 7:40 - 7:42
    including various environmental effects
  • 7:42 - 7:44
    or a shared household.
  • 7:44 - 7:48
    Large family trees give us the opportunity
    to analyze both close relatives,
  • 7:48 - 7:53
    such as twins, all the way
    to distant relatives, even fourth cousins.
  • 7:53 - 7:56
    This way we can build robust models
  • 7:56 - 7:59
    that can tease apart the contribution
    of genetic variations
  • 7:59 - 8:01
    from environmental factors.
  • 8:01 - 8:04
    We conducted this analysis using our data,
  • 8:04 - 8:11
    and we found that genetic variations
    explain only 15 percent
  • 8:11 - 8:14
    of the differences in lifespan
    between individuals.
  • 8:14 - 8:18
    That is five years, on average.
  • 8:18 - 8:24
    So genes matter less than
    what we thought before to lifespan,
  • 8:24 - 8:26
    and I find it as great news,
  • 8:26 - 8:30
    because it means that
    our actions can matter more.
  • 8:30 - 8:35
    Smoking, for example, determines
    10 years of our life expectancy,
  • 8:35 - 8:38
    twice as much as what genetics determines.
  • 8:38 - 8:41
    We can even have more surprising findings
  • 8:41 - 8:43
    as we move from family trees
  • 8:43 - 8:44
    and we let our genealogists
  • 8:44 - 8:47
    to document and crowdsource
    DNA information.
  • 8:47 - 8:50
    And the results can be amazing.
  • 8:50 - 8:53
    It might be hard to imagine,
    but Uncle Bernie and his friends
  • 8:53 - 8:56
    can create DNA forensic capabilities
  • 8:56 - 9:00
    that even exceed
    what the FBI currently has.
  • 9:01 - 9:04
    When you place the DNA
    on a large family tree,
  • 9:04 - 9:06
    you effectively create a beacon
  • 9:06 - 9:08
    that illuminates the hundreds
    of distant relatives
  • 9:08 - 9:13
    that are connected to the person
    that originated the DNA.
  • 9:13 - 9:16
    By placing multiple beacons
    on a large family tree,
  • 9:16 - 9:19
    you can now triangulate the DNA
    of an unknown person,
  • 9:19 - 9:22
    the same way that the GPS system
  • 9:22 - 9:24
    uses multiple satellites
    to find a location.
  • 9:25 - 9:29
    The prime example
    of the power of this technique
  • 9:29 - 9:32
    is capturing the Golden State Killer,
  • 9:32 - 9:37
    one of the most notorious criminals
    in the history of the US.
  • 9:37 - 9:44
    The FBI has been searching
    for this person for over 40 years.
  • 9:44 - 9:46
    They had his DNA,
  • 9:46 - 9:49
    but he never showed up
    in any police database.
  • 9:49 - 9:54
    About a year ago, the FBI
    consulted a genetic genealogist,
  • 9:54 - 9:58
    and she suggested that they submit
    his DNA to a genealogy service
  • 9:58 - 10:01
    that can locate distant relatives.
  • 10:01 - 10:04
    They did that,
  • 10:04 - 10:06
    and they found a third cousin
    of the Golden State Killer.
  • 10:06 - 10:09
    They built a large family tree,
  • 10:09 - 10:11
    scanned the different
    branches of that tree
  • 10:11 - 10:13
    until they found a profile
    that exactly matched
  • 10:13 - 10:15
    what they knew about
    the Golden State Killer.
  • 10:16 - 10:20
    They obtained DNA from this person
    and found a perfect match
  • 10:20 - 10:21
    to the DNA they had in hand.
  • 10:21 - 10:24
    They arrested him
    and brought him to justice
  • 10:24 - 10:26
    after all these years.
  • 10:26 - 10:29
    Since then, genetic genealogists
  • 10:29 - 10:33
    have started working with
    local US law enforcement agencies
  • 10:33 - 10:36
    to use this technique
    in order to capture criminals,
  • 10:36 - 10:38
    and only in the past six months,
  • 10:38 - 10:43
    they were able to solve
    over 20 cold cases with this technique.
  • 10:43 - 10:52
    The French Nobel Laureate André Gide
    once wrote, "Families, I hate you!"
  • 10:52 - 10:53
    (Laughter)
  • 10:53 - 10:56
    And I think most of us
    can relate to his words.
  • 10:56 - 11:00
    Why dig around in the past
    doing family history
  • 11:00 - 11:02
    when the future is so bright and open?
  • 11:02 - 11:06
    But luckily, we have people
    like Uncle Bernie
  • 11:06 - 11:08
    and his fellow genealogists
    who love families
  • 11:08 - 11:10
    and tirelessly study them.
  • 11:10 - 11:15
    These are not amateurs
    with a self-serving hobby,
  • 11:15 - 11:22
    these are citizen scientists
    with a deep passion to tell us who we are,
  • 11:22 - 11:26
    and they know that the past
    can hold a key to the future.
  • 11:27 - 11:29
    Thank you very much.
  • 11:29 - 11:32
    (Applause)
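
Editorial sketch (not part of the talk): the tree-merging step described at 1:09 - 1:20, where an uploaded tree is scanned for relatives that match existing trees and then merged with them, could be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not how geni.com actually works: the function name merge_trees is hypothetical, each tree is reduced to a set of profiles, and two profiles are assumed to "match" when their (name, birth year) pair is identical. A real service would need far more careful record matching.

```python
from collections import defaultdict


def merge_trees(trees):
    """Group uploaded trees that share at least one matching profile.

    trees: list of sets of (name, birth_year) tuples. Treating an identical
    tuple as a match is an illustrative assumption only.
    Returns a list of sets of tree indices; each set is one merged tree.
    """
    parent = list(range(len(trees)))  # union-find forest over tree indices

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # profile -> index of the first tree that contained it
    for idx, tree in enumerate(trees):
        for profile in tree:
            if profile in first_seen:
                union(first_seen[profile], idx)  # shared relative: merge
            else:
                first_seen[profile] = idx

    groups = defaultdict(set)
    for idx in range(len(trees)):
        groups[find(idx)].add(idx)
    return sorted(groups.values(), key=min)


if __name__ == "__main__":
    uploads = [
        {("Ann Lee", 1850), ("Bob Lee", 1875)},
        {("Bob Lee", 1875), ("Cara Lee", 1900)},
        {("Dan Roe", 1860)},
    ]
    print(merge_trees(uploads))
```

In this made-up example the first two uploads share one profile and end up in the same merged tree, while the third stays separate. Merging is transitive, which is why trees from many genealogists can grow into one very large connected tree.
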
Title:
How we're building the world's largest family tree
Speaker:
Yaniv Erlich
Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
11:45

English subtitles
