WEBVTT
00:00:00.000 --> 00:00:20.130
36C3 preroll music
00:00:20.130 --> 00:00:25.169
Angel: Right now I'd like to welcome our
first speaker on stage. The talk will be
00:00:25.169 --> 00:00:30.800
about protecting the wild and I'll hand
over to her. Please give her a warm round
00:00:30.800 --> 00:00:32.870
of applause.
00:00:32.870 --> 00:00:34.860
Applause
00:00:34.860 --> 00:00:43.920
Jutta Buschbom: Thank you very much for
the introduction. My name is Jutta
00:00:43.920 --> 00:00:52.110
Buschbom, I'm an evolutionary biologist.
That is my background. I did my PhD at
00:00:52.110 --> 00:00:57.290
the University of Chicago working on
little fungi that live in symbiosis with
00:00:57.290 --> 00:01:05.979
algae and form colorful crusts on
rocks. I then did a postdoc in
00:01:05.979 --> 00:01:12.240
bioinformatics and after that moved back
into organismal biology, working in forest
00:01:12.240 --> 00:01:19.560
genetics. In the ten years I worked in
forest genetics, for the first time I
00:01:19.560 --> 00:01:26.049
encountered questions with
regard to application, and I found out
00:01:26.049 --> 00:01:37.359
that actually moving from research to
application is not trivial. So what I'm
00:01:37.359 --> 00:01:45.869
going to present is a high-tech way of using
genomic data to protect biodiversity in a
00:01:45.869 --> 00:01:51.939
way that actually reaches
application, using conservation genomic
00:01:51.939 --> 00:02:02.600
tools. So this summer the draft of the
report of the Intergovernmental
00:02:02.600 --> 00:02:12.319
Science-Policy Platform on Biodiversity and
Ecosystem Services came out and its
00:02:12.319 --> 00:02:19.930
results were quite alarming. It stated that
around a million animal and plant species
00:02:19.930 --> 00:02:27.330
are currently threatened with extinction, and
half of those species are already dead species
00:02:27.330 --> 00:02:33.450
walking. Due to the destruction
of their habitats or habitat deterioration,
00:02:33.450 --> 00:02:42.950
they are not able to reproduce in a
sustainable way anymore. A third of the
00:02:42.950 --> 00:02:51.170
total species extinction risk to date
has arisen in the last 25 years. And just
00:02:51.170 --> 00:03:01.450
to give you an idea of the scale we
are talking about: currently the rate of
00:03:01.450 --> 00:03:07.680
extinction is already at least tens to
hundreds of times higher than it has averaged
00:03:07.680 --> 00:03:13.130
over the past 10 million years. And within
these 10 million years there were the Ice
00:03:13.130 --> 00:03:23.260
Ages, for example. And most of the
extinction risk is due to changes in land
00:03:23.260 --> 00:03:36.190
and sea use. The report even states
that we already seem to
00:03:36.190 --> 00:03:42.420
have transgressed a proposed precautionary
planetary boundary. Within that
00:03:42.420 --> 00:03:48.370
boundary we have a stable biological
system. But having transgressed it, we
00:03:48.370 --> 00:03:55.430
might already be in transition to a new
state, and we have no way to find out what
00:03:55.430 --> 00:04:05.240
this state is going to look like. So all
of these facts that the report is stating
00:04:05.240 --> 00:04:14.730
are actually pretty negative. And I was
quite happy to read that they also report
00:04:14.730 --> 00:04:20.699
that there are actually people who do
better than most of us. And they point out
00:04:20.699 --> 00:04:27.810
that many practices of indigenous people
and local communities actually conserve
00:04:27.810 --> 00:04:38.350
and sustain wild and domesticated
biodiversity quite well. Today, a higher
00:04:38.350 --> 00:04:44.600
proportion of the remaining terrestrial
biodiversity lies in areas managed and
00:04:44.600 --> 00:04:52.890
held by indigenous people. And these
ecosystems are more intact and
00:04:52.890 --> 00:05:01.770
declining less rapidly. So we
have examples of lifestyles that actually
00:05:01.770 --> 00:05:10.530
do better than most of us. And I know the
solutions won't be simple and it won't be
00:05:10.530 --> 00:05:22.330
easy to get there but we can look to what
these people do better than we do. All of
00:05:22.330 --> 00:05:27.930
this comes from a global report and it
sounds kind of far away, like
00:05:27.930 --> 00:05:35.990
probably somewhere in the tropics, but
threats to biodiversity actually also happen
00:05:35.990 --> 00:05:45.400
directly in front of our own front
doors. This summer a paper came out from
00:05:45.400 --> 00:05:52.800
two colleagues from the University of
Greifswald, who had analyzed a long-term
00:05:52.800 --> 00:05:58.490
data set on leaf beetles. They were
asking whether we already see a decline of
00:05:58.490 --> 00:06:08.240
leaf beetles in Central Europe. So they
compiled long-term data sets of leaf
00:06:08.240 --> 00:06:19.140
beetle observations for Central Europe,
from 1900 to 2017, so
00:06:19.140 --> 00:06:27.010
spanning almost a hundred and twenty years. And
what they found is that systematic reports
00:06:27.010 --> 00:06:36.270
on leaf beetles and leaf beetle
observations increased during this
00:06:36.270 --> 00:06:45.310
time span. But despite the
fact that in the last two
00:06:45.310 --> 00:06:53.270
decades we had very high numbers of
reports and observations of leaf beetles,
00:06:53.270 --> 00:07:00.100
the number of species, the orange line, is
declining slightly. But
00:07:00.100 --> 00:07:06.010
the question is, is this real or not? And
what was most worrisome to the authors is
00:07:06.010 --> 00:07:15.110
that in the data set, the number of
species, here in orange, that had
00:07:15.110 --> 00:07:21.930
more reports than before was declining, while the
number of species with fewer reports
00:07:21.930 --> 00:07:33.930
than before was growing. These kinds of
long-term datasets are very hard to
00:07:33.930 --> 00:07:41.310
interpret and many factors can contribute
to those patterns. And it's not clear if
00:07:41.310 --> 00:07:48.310
this pattern is statistically significant.
But if you take a step back and consider
00:07:48.310 --> 00:07:54.470
your background knowledge, your prior
knowledge about the state of the world, and
00:07:54.470 --> 00:08:02.760
ask yourself: what does the current state
look like? Does it look good or rather
00:08:02.760 --> 00:08:16.910
worrisome? And then with that knowledge,
tell me that these results are an
00:08:16.910 --> 00:08:30.150
artifact or a bias. I'm worried that once
we have a statistically significant signal in
00:08:30.150 --> 00:08:41.789
this dataset, it will already be too late.
So far, I've been talking about leaf
00:08:41.789 --> 00:08:49.639
beetles. Beetles are the largest group
within insects, with about 400,000 species.
00:08:49.639 --> 00:08:56.200
Leaf beetles are a large family of about
50,000 species which are distributed
00:08:56.200 --> 00:09:05.080
worldwide. And here in Germany, we have
over 470 leaf beetle species. So how do we
00:09:05.080 --> 00:09:09.740
actually know how many species there are
and who actually counted all these
00:09:09.740 --> 00:09:15.960
species? And is that just a task for
taxonomists? Taxonomy is the science of
00:09:15.960 --> 00:09:21.600
naming and defining, including
circumscribing and classifying groups of
00:09:21.600 --> 00:09:32.020
biological organisms on the basis of
shared characters. So one might have the
00:09:32.020 --> 00:09:37.560
picture of some woman in a funny hat
running across a meadow catching
00:09:37.560 --> 00:09:44.480
butterflies, or some mushroom hunter
crawling through the forest trying to find
00:09:44.480 --> 00:09:52.380
mushrooms. And it's true, as biodiversity
scientists we spend a lot of time outdoors.
00:09:52.380 --> 00:10:02.290
But on the other hand, biotaxonomy
is a high-tech science today. So
00:10:02.290 --> 00:10:11.050
taxonomists actually take up new
technological tools and developments to
00:10:11.050 --> 00:10:17.270
help them identify, describe and
understand species. So taxonomists
00:10:17.270 --> 00:10:25.110
actually are often experts in, for
example, microscopy, mathematics,
00:10:25.110 --> 00:10:36.850
biochemistry, even proteomics and
genomics. So throughout the talk, I'm
00:10:36.850 --> 00:10:41.520
going to compile this list of people and
experts we're going to need to protect
00:10:41.520 --> 00:10:49.360
biodiversity if we want to do this on the
basis of genetic data. Right now, the list
00:10:49.360 --> 00:10:56.430
is quite empty. The first entry is
taxonomists, but that will change quickly.
00:10:56.430 --> 00:11:06.260
Taxonomists are mostly a subgroup of
evolutionary biologists. So I told
00:11:06.260 --> 00:11:15.560
you that taxonomists and biodiversity
scientists take up technology. So as
00:11:15.560 --> 00:11:23.610
soon as computers came about and the
internet started, people started to use
00:11:23.610 --> 00:11:32.420
them to compile information about species,
and today we have several global resources
00:11:32.420 --> 00:11:40.640
available at the species level and above
the species level. So we biodiversity
00:11:40.640 --> 00:11:45.720
scientists were among the first to
define biodiversity information
00:11:45.720 --> 00:11:56.690
standards. We have a global Catalogue of
Life, a list of all named species. The
00:11:56.690 --> 00:12:01.810
Global Biodiversity Information Facility
aims to bring together information
00:12:01.810 --> 00:12:08.630
from different sources, and it is
compiling and producing this wonderful map.
00:12:08.630 --> 00:12:13.940
This is for leaf beetles: all the records
about leaf beetles that we have in the
00:12:13.940 --> 00:12:22.200
world. And it looks as if leaf
beetles are highly associated with third
00:12:22.200 --> 00:12:29.580
world economies. However, that clearly is
an artifact and it just shows that we need
00:12:29.580 --> 00:12:34.560
many more taxonomists and biodiversity
scientists all over the world to find and
00:12:34.560 --> 00:12:45.300
identify leaf beetles. So we also need
biodiversity informaticians to help us
00:12:45.300 --> 00:12:52.050
compile global lists and distribute
knowledge. So far I have been talking
00:12:52.050 --> 00:12:57.890
about species, which is a simplification.
The question is: what actually are
00:12:57.890 --> 00:13:03.400
species? And so we need to talk about
genetic diversity within and between
00:13:03.400 --> 00:13:16.519
species. And I'm going to do so using
gulls, which most of us might know. Here
00:13:16.519 --> 00:13:21.670
in Europe, we have two large gulls of the
genus Larus. The one in the front, the
00:13:21.670 --> 00:13:31.070
lighter gray one, is our Silbermöwe. And in the
back is our Heringsmöwe, the dark one. And
00:13:31.070 --> 00:13:35.740
I'm going to use German names because the
English names are crossed over, and that's
00:13:35.740 --> 00:13:43.160
completely confusing. So I will stick with
the German names. Here in Europe these two
00:13:43.160 --> 00:13:48.450
species seem to be really good species
because they barely interbreed, so they
00:13:48.450 --> 00:13:55.680
hardly ever hybridize. However, if you take a
step back and look at the genus in
00:13:55.680 --> 00:14:03.120
general, you see that the species of the
genus are distributed in a kind of ring
00:14:03.120 --> 00:14:14.510
around the Arctic. And so the idea is
that, say during the Ice Age, all of this
00:14:14.510 --> 00:14:22.959
area was glaciated and the gulls retreated
to a refuge here near the Caspian Sea. And
00:14:22.959 --> 00:14:28.110
then after the ice retreated, the gulls
moved back north. One branch moved into
00:14:28.110 --> 00:14:34.350
Europe, forming our Heringsmöwe, and
another branch then moved counterclockwise
00:14:34.350 --> 00:14:41.019
around the Arctic, producing different
morphotypes, different species across the
00:14:41.019 --> 00:14:49.450
Bering Strait and then into North America.
There, the dark blue one is, and I'm
00:14:49.450 --> 00:14:58.730
simplifying, the equivalent of our
European Silbermöwe: the American
00:14:58.730 --> 00:15:03.830
Silbermöwe. Then the idea is that some
individuals crossed back to Europe and
00:15:03.830 --> 00:15:14.800
formed our European Silbermöwe. And
all of these species here are
00:15:14.800 --> 00:15:21.769
interbreeding, so they hybridize. Only
where this ring closes do those two species
00:15:21.769 --> 00:15:26.720
no longer interbreed. And the big
question is, are we actually dealing with
00:15:26.720 --> 00:15:34.230
one single species or are we dealing with
different species that just happen to
00:15:34.230 --> 00:15:41.079
hybridize to a greater or lesser degree? The question is
not trivial because it has consequences
00:15:41.079 --> 00:15:48.740
for protection. If we are dealing with one
single species, all the gulls in Eurasia
00:15:48.740 --> 00:15:53.010
could go extinct and it wouldn't matter
because we would still have the gulls in
00:15:53.010 --> 00:15:58.540
North America. However, if we have
different species in all of these areas,
00:15:58.540 --> 00:16:04.709
we would need to protect individuals or
the species on a regional level and
00:16:04.709 --> 00:16:17.279
protect all of these different species. So
to investigate these questions: do we
00:16:17.279 --> 00:16:23.589
have different species? And what were the
evolutionary processes and histories that
00:16:23.589 --> 00:16:31.100
brought about the species? A group of
scientists investigated that using DNA
00:16:31.100 --> 00:16:39.930
sequences. And on the left, you have the
model, the theoretical model of the ring
00:16:39.930 --> 00:16:46.380
species. And here on the right you have
reality. And the scientists found that the
00:16:46.380 --> 00:16:51.630
reality is, as always, much more complex. So,
for example, they found two refuges or
00:16:51.630 --> 00:16:58.430
they proposed two refuges. But what they
found was that genetic diversity was
00:16:58.430 --> 00:17:07.351
correlated with those species or
morphotypes. So what that also means is
00:17:07.351 --> 00:17:15.730
that genetic diversity is correlated with
geographic origin. From this
00:17:15.730 --> 00:17:24.360
type of analysis we learn about
evolutionary processes and history, about
00:17:24.360 --> 00:17:30.170
variability and differentiation, about
gene flow and migration, and about speciation
00:17:30.170 --> 00:17:37.590
processes. We need all of that to understand
our species, which will allow us to
00:17:37.590 --> 00:17:43.440
protect them. So we need evolutionary
biologists who do phylogenetics and
00:17:43.440 --> 00:17:59.030
population genetics. So once we found out
that one can use genetic diversity to
00:17:59.030 --> 00:18:07.130
infer geographic origin, because genetic
diversity is correlated with geography,
00:18:07.130 --> 00:18:18.500
people immediately said: 'Okay, we can use
it for conservation applications.' And
00:18:18.500 --> 00:18:24.049
we also learned that it is often
unclear what a species is: species
00:18:24.049 --> 00:18:32.559
boundaries are unclear, and some species
have huge distribution ranges with
00:18:32.559 --> 00:18:37.340
different clusters of variability within
this huge range. So we know that we need
00:18:37.340 --> 00:18:42.941
to protect within-species genetic
diversity, which means that we need to
00:18:42.941 --> 00:18:50.650
understand within-species population
structure and we need to build useful and
00:18:50.650 --> 00:18:58.919
reliable models of population structure.
These models are actually required for all
00:18:58.919 --> 00:19:03.740
of our applications. They are required for
monitoring, for example, for conservation
00:19:03.740 --> 00:19:11.890
strategies, for functional adaptation and
adaptability, questions of the productivity
00:19:11.890 --> 00:19:19.190
of different provenances and their impact on
management regimes, breeding strategies,
00:19:19.190 --> 00:19:27.610
and also for enforcement applications.
From the study I showed you before with
00:19:27.610 --> 00:19:34.110
the gulls, we also know that we need to
approach the question of population
00:19:34.110 --> 00:19:47.070
structure on a range-wide
scale. So here's the map produced by
00:19:47.070 --> 00:19:53.630
EUFORGEN, the European Forest Genetic
Resources Programme, for one of our
00:19:53.630 --> 00:20:02.000
native oaks, the sessile oak. And the dots
are the sites of genetic conservation
00:20:02.000 --> 00:20:12.120
units. And so that is one strategy for how to
represent within-species genetic diversity
00:20:12.120 --> 00:20:22.020
and how to sample it. And this
is a hypothetical example, but we likely
00:20:22.020 --> 00:20:32.460
will see a gradient from west to east or
might see one at this scale. Then once we
00:20:32.460 --> 00:20:37.800
have these kinds of global data sets, we
can go to a finer scale and, for
00:20:37.800 --> 00:20:44.100
example, do national genetic monitoring.
And we will find much finer scale
00:20:44.100 --> 00:20:51.210
gradients. We will also find, especially
for forest trees, outliers, that is, stands
00:20:51.210 --> 00:20:59.150
that don't fit the usual pattern. And that
is because forest reproductive material
00:20:59.150 --> 00:21:07.660
has been moved around a lot. And so these
lighter or darker dots are material that
00:21:07.660 --> 00:21:16.150
was moved to Germany from outside. And
we will only identify these outliers if we
00:21:16.150 --> 00:21:21.380
have the whole reference dataset. If we
don't have the whole reference dataset, we
00:21:21.380 --> 00:21:28.799
might not identify these outliers, stands
with a different history. Or, in the worst
00:21:28.799 --> 00:21:34.280
case, these outliers might actually bias
our gradients. And we are always talking
00:21:34.280 --> 00:21:42.770
about very slight gradients. So it's easy
to bias or dilute these gradients, and then
00:21:42.770 --> 00:21:50.710
we won't get the results we need.
Compiling these kinds of reference
00:21:50.710 --> 00:21:57.850
datasets takes huge collaborative efforts,
because people need to go out into the
00:21:57.850 --> 00:22:04.500
field and collect the reference samples.
That might be scientists, it might
00:22:04.500 --> 00:22:13.669
be people from local communities, citizen
scientists, managers, owners, or government
00:22:13.669 --> 00:22:20.179
officials who provide background
information, maps and distribution
00:22:20.179 --> 00:22:27.929
information, and who in many parts of the
world might also protect the people who are
00:22:27.929 --> 00:22:34.510
actually collecting the samples. And it
might be conservation activists and NGOs.
00:22:34.510 --> 00:22:41.150
So once the samples have been collected
they need to be stored somewhere for the
00:22:41.150 --> 00:22:51.150
long term and the information needs to be
databased. And that is the work of
00:22:51.150 --> 00:22:57.430
scientific collections, which are mostly
at natural history museums, and there the
00:22:57.430 --> 00:23:04.460
samples are processed. They're organized
so that you can find them again. All
00:23:04.460 --> 00:23:09.680
the metadata is entered by curators,
collection managers, preparators and
00:23:09.680 --> 00:23:17.030
technical staff at the scientific
collections. So once we have these kinds of
00:23:17.030 --> 00:23:24.910
large-scale data sets, what are
we actually doing with them? So the
00:23:24.910 --> 00:23:32.514
foundation for all of our applications is
population structure, and
00:23:32.514 --> 00:23:42.370
specifically population assignment. The
process is as follows. First, we decide on a
00:23:42.370 --> 00:23:46.660
question and design our project
so that we can answer the
00:23:46.660 --> 00:23:51.940
question. Then we need to infer the
population structure model and optimize
00:23:51.940 --> 00:23:57.480
it. In the next step we need to check if the
model is actually good enough for
00:23:57.480 --> 00:24:03.040
application because we might have found
the best model, but it might still not be
00:24:03.040 --> 00:24:07.480
good enough for application. So we need to
test that. And that is the step of
00:24:07.480 --> 00:24:12.831
population assignment or predictive
assignment. And then in the end, we want
00:24:12.831 --> 00:24:19.330
to test our hypothesis. Are the two stands
different or does an individual come from
00:24:19.330 --> 00:24:31.059
stand A or from stand B? And here we
identify error rates and accuracy. So this
00:24:31.059 --> 00:24:38.890
whole process is very statistical. And so
the analysis of these reference data needs
00:24:38.890 --> 00:24:48.240
to be accompanied by biostatisticians
who can tell us how to analyze our data.
00:24:48.240 --> 00:24:55.289
So what is the state-of-the-art right now?
What kind of geographic resolution do we
00:24:55.289 --> 00:25:02.990
actually get for non-model species
currently? And I'm going to present the
00:25:02.990 --> 00:25:09.600
example of an African timber tree
species, which yields a very valuable timber.
00:25:09.600 --> 00:25:18.110
It's one example, but basically the results
for all species that have large distribution
00:25:18.110 --> 00:25:26.059
ranges, are continuously distributed
and are also long-lived are very similar.
00:25:26.059 --> 00:25:33.460
These kinds of results seem to be
species-independent. The species are Milicia
00:25:33.460 --> 00:25:40.370
regia and Milicia excelsa, African teak, which
cannot be grown in plantations to timber
00:25:40.370 --> 00:25:51.159
quality. So it is harvested unsustainably
from natural forests. It's distributed in
00:25:51.159 --> 00:26:00.580
West, Central and East Africa, here in the
black rectangle. And a group of a dozen
00:26:00.580 --> 00:26:06.289
scientists got together and they actually
sampled a reference dataset for these two
00:26:06.289 --> 00:26:18.659
species. It's over 400 samples; they
analyzed four marker systems, resulting in
00:26:18.659 --> 00:26:24.570
a total of something like 100 genetic
markers, and then they optimized
00:26:24.570 --> 00:26:32.660
the population model and used different
parameter settings. And we're going to
00:26:32.660 --> 00:26:40.080
concentrate here on the best solution that
they found. And basically this rectangle
00:26:40.080 --> 00:26:47.870
here is the black one over here. So the
resolution is... they found population
00:26:47.870 --> 00:26:54.690
structure with clear clusters. So the
populations and the species from West
00:26:54.690 --> 00:27:01.490
Africa can be distinguished from those
populations in Central Africa. And the
00:27:01.490 --> 00:27:08.460
ones in East Africa can be differentiated as well.
So that is really good. So we have
00:27:08.460 --> 00:27:13.480
population structure. We have a
signal. The problem is still that our
00:27:13.480 --> 00:27:21.510
resolution is much lower than we
need, because we basically need
00:27:21.510 --> 00:27:32.090
resolution at least at the country level,
because most laws are national. So
00:27:32.090 --> 00:27:41.770
it might be legal to harvest a tree in one
country, but not in another country. So we
00:27:41.770 --> 00:27:49.319
need to get our resolution down to country
level, or even to regional level if you
00:27:49.319 --> 00:27:52.361
want to distinguish whether the tree was
harvested in a national park, in a
00:27:52.361 --> 00:28:02.289
protected area, or outside in a managed
forest. And when we as biodiversity
00:28:02.289 --> 00:28:10.740
scientists don't know how to continue,
one thing to do is to look at what people do
00:28:10.740 --> 00:28:17.179
with model organisms and specifically what
people do in human population genomics,
00:28:17.179 --> 00:28:24.179
because thousands of population
geneticists are working there and there is a
00:28:24.179 --> 00:28:28.210
completely different funding background
due to the interest of the medical and
00:28:28.210 --> 00:28:39.119
pharma industries. So they are always
ahead. What we can learn from there,
00:28:39.119 --> 00:28:46.660
from human population genomics, is
that we need two features. The first
00:28:46.660 --> 00:28:53.570
we already know: we need range-wide
sampling, which provides a spatial
00:28:53.570 --> 00:28:59.950
context. The second feature is that we
need genome-wide sequencing, preferably
00:28:59.950 --> 00:29:09.210
whole-genome sequencing, which provides us steps
in time, because our genomes are archives
00:29:09.210 --> 00:29:14.710
of our evolutionary history. They are
records of all the processes and events,
00:29:14.710 --> 00:29:21.429
and these steps in time then also translate
into resolution. Once we have these
00:29:21.429 --> 00:29:30.150
two features, these reference
datasets open Pandora's box. Suddenly we
00:29:30.150 --> 00:29:36.390
can ask all kinds of questions,
even those that we don't yet
00:29:36.390 --> 00:29:47.010
know. We can develop all kinds of
applications, as is done for humans.
00:29:47.010 --> 00:29:59.400
Currently, there are at least four global
datasets on human diversity. These are
00:29:59.400 --> 00:30:08.860
very widely reused, and they are big
data, both with regard to the
00:30:08.860 --> 00:30:18.850
number of samples and with regard to the genomes or
the genome representations. This
00:30:18.850 --> 00:30:26.470
results in very information-rich data,
which drives analytical development, so
00:30:26.470 --> 00:30:33.799
people are continuously developing new
statistical methods. And right now, a new
00:30:33.799 --> 00:30:42.330
wave of these methods is coming in.
Once these global datasets existed,
00:30:42.330 --> 00:30:47.500
people in human population
genomics started to do
00:30:47.500 --> 00:30:56.299
intense regional sampling. This is the
example of the UK Biobank.
00:30:56.299 --> 00:31:02.789
It's a project with 500,000 volunteers,
all UK citizens from all over the
00:31:02.789 --> 00:31:13.982
islands. Each individual was genotyped
in a wet lab for 820,000 markers. That's
00:31:13.982 --> 00:31:19.620
a completely different
number from the 100 or 1,000... in
00:31:19.620 --> 00:31:26.409
biodiversity science we normally
analyze at most a few tens of thousands of
00:31:26.409 --> 00:31:36.220
markers. So that's a completely different
number. But then statistical geneticists
00:31:36.220 --> 00:31:47.140
come. They do some weird and wonderful
voodoo and they derive 96 million markers
00:31:47.140 --> 00:31:53.460
per genome, that is per individual, from
these 820,000 markers that were produced
00:31:53.460 --> 00:32:00.630
in the lab. So that's more than a hundredfold
increase. And once you have this kind of
00:32:00.630 --> 00:32:07.510
dataset for a genome, you suddenly, or
finally, get country-level and
00:32:07.510 --> 00:32:18.970
within-country resolution. So these panels
are examples. So the first panel shows
00:32:18.970 --> 00:32:25.980
individuals who were born in Edinburgh and
the question was "Where were people born
00:32:25.980 --> 00:32:32.419
who had a similar ancestral background,
genetic background?" And what they found
00:32:32.419 --> 00:32:41.980
was that they were born all over Scotland and
Northern Ireland. For North Yorkshire it was
00:32:41.980 --> 00:32:50.250
even more local. So people from Yorkshire
don't seem to get around a lot. For London
00:32:50.250 --> 00:32:54.090
the situation is completely different.
That is what we would expect because
00:32:54.090 --> 00:32:59.580
London is a people magnet. People move
there all the time. They meet there, they
00:32:59.580 --> 00:33:05.700
have children, and for the kids born in London,
their genetic ancestry has nothing to do
00:33:05.700 --> 00:33:12.760
with London. It's from all over the place,
from the British Isles and the world. So
00:33:12.760 --> 00:33:21.600
that's why the colors are so strongly
blended. This study also came out
00:33:21.600 --> 00:33:26.100
this summer. And it's the first time that
I have seen that we actually really can
00:33:26.100 --> 00:33:36.580
achieve regional resolution. And I find
this possibility for biodiversity science
00:33:36.580 --> 00:33:46.820
really exciting. So it was made possible
by very sophisticated statistical
00:33:46.820 --> 00:33:51.890
approaches which are able to analyze
genetic data from highly complex
00:33:51.890 --> 00:33:59.450
evolutionary and ecological systems. And
at the same time these analyses are able
00:33:59.450 --> 00:34:04.910
to handle big data. We're talking about
gigabytes and terabytes of data and
00:34:04.910 --> 00:34:13.810
results. So statistical geneticists are
developing new methods of data
00:34:13.810 --> 00:34:20.309
representation to handle this amount of
data. And then we are able to sufficiently
00:34:20.309 --> 00:34:25.520
extract the signal for a very specific
question from data with a very low
00:34:25.520 --> 00:34:36.919
signal-to-noise ratio. So to get there, we
need many experts and specialists. So we
00:34:36.919 --> 00:34:41.659
need statistical geneticists, big data
experts who also might contribute machine
00:34:41.659 --> 00:34:49.299
learning expertise. We need molecular
biologists who know how to sequence
00:34:49.299 --> 00:34:54.259
complex genomes. We now need
bioinformaticians with expertise in
00:34:54.259 --> 00:35:05.010
genomics for assembly, annotation and
alignment of genomic sequences. The result
00:35:05.010 --> 00:35:12.569
is actually this: This is the author list
for the 1000 Genomes Project
00:35:12.569 --> 00:35:20.380
reference data set, and I don't expect you
to be able to read it, but the bold type
00:35:20.380 --> 00:35:25.539
is of interest because it shows all the
different tasks that are necessary to
00:35:25.539 --> 00:35:36.140
produce a standardized and highly cleaned
reference dataset. So the whole author
00:35:36.140 --> 00:35:41.880
list is something like 1.5 pages long,
and that is even considering that some authors
00:35:41.880 --> 00:35:51.130
contributed to several tasks. The
publications of reference datasets mostly
00:35:51.130 --> 00:35:57.079
have author lists of well over 50
people. So they are huge collaborative
00:35:57.079 --> 00:36:05.219
efforts. Now we take the step into
biodiversity science. These are eight
00:36:05.219 --> 00:36:13.440
gastrotrichs; they are little worm-like
organisms that live in the sediments of
00:36:13.440 --> 00:36:23.069
freshwater lakes and in marine sediments. They
are in general a couple of hundred
00:36:23.069 --> 00:36:29.569
micrometers in size. And I don't have any
numbers, but my guess would be that maybe
00:36:29.569 --> 00:36:38.640
worldwide, a hundred to a thousand people
actually work on these species. There are
00:36:38.640 --> 00:36:44.829
800 species of gastrotrichs. So let's say
there's one, two, maybe three experts per
00:36:44.829 --> 00:36:52.240
species for these organisms. So how are
these three people going to manage all
00:36:52.240 --> 00:37:01.420
these tasks to produce a reference
dataset? You might say, well,
00:37:01.420 --> 00:37:05.209
gastrotrichs, I've never heard
of them. Maybe they are not so
00:37:05.209 --> 00:37:08.349
important. Maybe they don't need
reference data sets, but actually some of
00:37:08.349 --> 00:37:17.579
those species are bioindicators for water
quality. So what we observe right now is a
00:37:17.579 --> 00:37:27.510
gap for biodiversity conservation. In
model organisms, we have Pandora's box
00:37:27.510 --> 00:37:34.630
open. We have all the statistical analyses
at hand to analyze our data sets.
00:37:34.630 --> 00:37:39.709
However, in non-model organisms, we are
still stuck with summary statistics that
00:37:39.709 --> 00:37:46.839
don't provide us the resolution that we
need. And we know that to close this gap,
00:37:46.839 --> 00:37:52.599
even for a single species, it's a huge
effort. But at the same time, we have over
00:37:52.599 --> 00:38:03.560
35,000 species listed by scientists that
already need effective protection now. So
00:38:03.560 --> 00:38:10.008
we need to find a way to close this gap
and actually move in this direction. And
00:38:10.008 --> 00:38:19.940
the good thing is... so all of this is in
biodiversity science, in academia, and we
00:38:19.940 --> 00:38:24.890
need to make the transition across the
conservation genomic gap into the big
00:38:24.890 --> 00:38:32.130
loop of real-world conservation tasks. And
the good thing is we already know what we
00:38:32.130 --> 00:38:37.940
have to do. So we need to have reference
data sets covering the whole distribution range. We
00:38:37.940 --> 00:38:43.959
need to have statistics. And it's going to
be big data. So we need collection
00:38:43.959 --> 00:38:54.140
management, data management and an
analysis environment. So looking at
00:38:54.140 --> 00:38:59.880
different ingredients or different steps,
the first thing we need is a general data
00:38:59.880 --> 00:39:05.269
infrastructure for a global diversity of
reference data sets that can actually be
00:39:05.269 --> 00:39:11.779
used across species, preferably for as many
species as possible and provide a working
00:39:11.779 --> 00:39:19.749
environment for biodiversity scientists
and experts. It should be user-friendly so
00:39:19.749 --> 00:39:25.759
it can be used by scientists, but also so
that people from local communities and
00:39:25.759 --> 00:39:33.489
citizen scientists can add their
observation data and their data into this
00:39:33.489 --> 00:39:41.339
data infrastructure. I have listed quite a
lot of features that these kinds of
00:39:41.339 --> 00:39:48.400
infrastructures should have. And I'm going
to argue that these features are not
00:39:48.400 --> 00:40:02.609
nice-to-haves, but actually must-haves.
Because our goal is always application. So
00:40:02.609 --> 00:40:13.279
we need developers, managers and curators
for data infrastructures. Since our goal
00:40:13.279 --> 00:40:30.900
is application, our main features are
quality control and error reduction. These
00:40:30.900 --> 00:40:38.880
are the basis. So that our conservation
tools can be robustly and reliably applied
00:40:38.880 --> 00:40:46.459
under real-world operating conditions. And
the way to achieve quality and error
00:40:46.459 --> 00:40:52.759
reduction is through chains of custody. So
it means that from project design, from
00:40:52.759 --> 00:40:58.299
the questions through all the steps that
are necessary to produce a reference data
00:40:58.299 --> 00:41:08.219
set and then...so from sample collection,
genomic statistical analysis down to
00:41:08.219 --> 00:41:15.599
application. These steps need to be
documented and standardized. Each one of
00:41:15.599 --> 00:41:22.239
them needs to be validated
and reproducible. They should be modular
00:41:22.239 --> 00:41:28.999
so they can be user friendly. And the
whole chain of custody needs to be
00:41:28.999 --> 00:41:40.690
scalable. So if our chains of custody have
these characteristics, we actually will
00:41:40.690 --> 00:41:51.390
have tools that will work in everyday
life. So we need professional developers
00:41:51.390 --> 00:41:59.519
and programmers who are able to produce
this highly collaborative software. We
00:41:59.519 --> 00:42:06.130
need free and open source experts, so we
can always ensure that our code and
00:42:06.130 --> 00:42:13.859
our infrastructures keep their integrity and
we can check them. And I'm a biologist, I
00:42:13.859 --> 00:42:19.390
don't have any background in hardware, but
I've heard a couple of talks here in the
00:42:19.390 --> 00:42:26.099
conference about Green IT. And I have
the feeling we should have people who know
00:42:26.099 --> 00:42:33.849
hardware and software and know how to
develop these high-tech tools in a
00:42:33.849 --> 00:42:38.450
sustainable way so that by developing these
tools, we don't use more resources than we
00:42:38.450 --> 00:42:48.940
are trying to protect. So I've shown all
these features and characteristics that
00:42:48.940 --> 00:42:57.459
the software should have. And I'm arguing
that these features are necessary because
00:42:57.459 --> 00:43:04.819
of the reality we find ourselves in. It is one of
rising over-exploitation and destruction
00:43:04.819 --> 00:43:19.799
of nature. So the extent of environmental
crimes is up in the billions. All
00:43:19.799 --> 00:43:29.029
environmental crimes together, the green
bubbles, are second only to drug-associated
00:43:29.029 --> 00:43:35.489
crimes. They are up there with
counterfeiting or human trafficking. So
00:43:35.489 --> 00:43:45.479
these are multi-billion enterprises. They
are often transnational industries
00:43:45.479 --> 00:44:02.019
with huge profits. So if there's some
crime, some mafia boss, some criminal
00:44:02.019 --> 00:44:09.539
manager who just bribed a government
official somewhere in some neck of the
00:44:09.539 --> 00:44:17.859
woods, it would just make sense that that
person would not want to take the
00:44:17.859 --> 00:44:23.809
risk of being discovered just because some
customs officer pulls out a container
00:44:23.809 --> 00:44:29.170
somewhere in the harbor, for example,
opens it and says "This looks kind of
00:44:29.170 --> 00:44:37.380
weird. Let's take a sample, send it to a
lab." and then a population geneticist
00:44:37.380 --> 00:44:44.171
comes back and says "Oh, yes, this sample
is not from area A as documented, but
00:44:44.171 --> 00:44:52.449
actually it's from area B and it was
illegally logged." If we have reference
00:44:52.449 --> 00:44:58.660
data sets, information-rich reference data
sets, they become highly valuable and they
00:44:58.660 --> 00:45:08.430
need protection themselves against
manipulation and destruction. So we will
00:45:08.430 --> 00:45:14.739
need to think about IT security from the
beginning. Also, these data sets are often
00:45:14.739 --> 00:45:20.069
very politically sensitive because if it
is shown that in a certain country there
00:45:20.069 --> 00:45:25.680
is repeated illegal logging, that
country might not be too excited about
00:45:25.680 --> 00:45:41.380
this information. So we need to think
about IT security experts. So my hope is
00:45:41.380 --> 00:45:48.599
that these kinds of very high-tech digital
conservation tools can actually contribute
00:45:48.599 --> 00:45:55.690
to the U.N. Sustainable Development Goals
by empowering indigenous people, local
00:45:55.690 --> 00:46:02.810
communities and also us to protect and
force and sustainably use our lands and
00:46:02.810 --> 00:46:10.139
our biodiversity by providing some
management and law enforcement tools. So
00:46:10.139 --> 00:46:14.059
we need people from around the world,
users from around the world who use these
00:46:14.059 --> 00:46:25.789
tools and help to develop them further and
to maintain them. And finally, these
00:46:25.789 --> 00:46:33.910
high-tech tools will just be another
technological fix if we don't manage to
00:46:33.910 --> 00:46:45.770
get our way of life down to
sustainable levels. So what we need is...
00:46:45.770 --> 00:46:53.759
today, this year, Earth Overshoot Day
was at the end of July. So at the end of
00:46:53.759 --> 00:47:01.639
July, we had used all the resources that
we had available for the whole year. And
00:47:01.639 --> 00:47:09.400
we need to get this back to the end of the
year so that our resources actually
00:47:09.400 --> 00:47:22.910
sustain us for the whole year. The graphic
here for Germany suggests that we are on the
00:47:22.910 --> 00:47:29.819
right track. We are reducing our resource
consumption and maybe even our biocapacity
00:47:29.819 --> 00:47:38.099
moves up a little bit. So actually it
seems that our personal lifestyles and
00:47:38.099 --> 00:47:46.329
choices make a difference and we just need
to close this gap here much quicker. So
00:47:46.329 --> 00:47:53.689
protecting biodiversity needs all of us to
achieve that. And with that, thank you
00:47:53.689 --> 00:47:57.770
very much.
00:47:57.770 --> 00:48:08.020
Applause
00:48:08.020 --> 00:48:12.680
Angel: So thank you Jutta for this very
interesting talk and the very valuable
00:48:12.680 --> 00:48:16.609
work you're doing. We have three mics
here. Please line up at the microphones if
00:48:16.609 --> 00:48:22.809
you have any questions or suggestions or
want to participate and work together with
00:48:22.809 --> 00:48:29.660
Jutta. We have one question from the
Internet, so please start, Signal-Angel.
00:48:29.660 --> 00:48:34.749
Signal-Angel: Why are wild plant species
within a genus further apart than wild
00:48:34.749 --> 00:48:42.509
animal species within a genus?
Angel: Could you repeat it, please?
00:48:42.509 --> 00:48:49.069
Signal-Angel: Why are wild plant species
within a genus further apart than wild
00:48:49.069 --> 00:48:55.910
animal species within a genus?
Jutta: I'm not sure I understand the
00:48:55.910 --> 00:49:01.180
background for the question.
Mic 1: Because animals move and plants
00:49:01.180 --> 00:49:06.449
don't move.
Jutta: Oh, okay. If that is the idea
00:49:06.449 --> 00:49:12.299
behind the question. Plants actually move,
too. They don't move as individuals, but
00:49:12.299 --> 00:49:24.289
they move their genetic material through
pollen or fragments. So actually diversity
00:49:24.289 --> 00:49:30.760
in plants and in animals can be quite
similar. So the idea that plants are
00:49:30.760 --> 00:49:36.459
just stuck and should have a completely
different population structure does not
00:49:36.459 --> 00:49:43.130
hold because plants move around their
genetic material through seeds, through
00:49:43.130 --> 00:49:49.610
pollen, through vegetative propagules.
Angel: So thank you microphone 1 for
00:49:49.610 --> 00:49:55.999
helping out. Please ask your question.
Mic 1: So my question is about the success
00:49:55.999 --> 00:50:00.939
factors. If you think of this,
whatever database being set up there and I
00:50:00.939 --> 00:50:07.430
think it's gonna be a huge database...I
downloaded my own genome on the Internet.
00:50:07.430 --> 00:50:12.989
It was about 150 megabytes. And if we
multiply that, I think the genetic
00:50:12.989 --> 00:50:17.539
variation from one person to another is
only about 1 percent. So we can compress
00:50:17.539 --> 00:50:25.009
that to 4 megabytes per person. If we
sequence all the humans in the world, that
00:50:25.009 --> 00:50:32.689
would be 32 petabytes, that would cost
approximately 15 billion dollars. And
00:50:32.689 --> 00:50:36.890
that's only for the storage. Now comes the
entire management. Of course, we don't
00:50:36.890 --> 00:50:41.470
want to digitize all human genomes, but
rather the genomes of plant and animal
00:50:41.470 --> 00:50:46.309
species. So it's a huge data program. And
what would be for you the success factors
00:50:46.309 --> 00:50:51.229
for this thing to really fly? And did you
talk to organizations like Wikidata or
00:50:51.229 --> 00:50:56.469
others or where would it ideally be
hosted? At a university or an
00:50:56.469 --> 00:51:02.170
international nonprofit or who would be
running the thing?
00:51:02.170 --> 00:51:14.519
Jutta: Yeah, I mean, it's just really big
data. I think our first goal is not to
00:51:14.519 --> 00:51:23.670
think about having all of the predicted 5 to 10
million species sequenced on a
00:51:23.670 --> 00:51:30.239
population level. I think we need to think
about the next step. And there it would
00:51:30.239 --> 00:51:35.530
make sense to start with species that are
actually highly exploited, like many
00:51:35.530 --> 00:51:40.579
timber species and also many marine
fishes. I think that's where we should
00:51:40.579 --> 00:51:48.039
start. And to host this kind of data I
think it should be in politically
00:51:48.039 --> 00:51:56.410
independent hands. So it should be with an
NGO or with the U.N., some organization
00:51:56.410 --> 00:52:02.449
that is independent.
Mic 1: Are you the first to think about
00:52:02.449 --> 00:52:06.509
this or are there existing initiatives?
Jutta: There are actually existing
00:52:06.509 --> 00:52:14.219
initiatives. I have been in contact with
the Forest Stewardship Council and they
00:52:14.219 --> 00:52:23.219
are actually starting to sample their
concessions and, to build up the
00:52:23.219 --> 00:52:28.730
samples, they work together with Kew
Botanical Gardens and the U.S. Forest
00:52:28.730 --> 00:52:37.589
Service. And right now they're analyzing
the samples using isotopes, which is
00:52:37.589 --> 00:52:45.579
another method that is very powerful and
can also produce geographic information.
00:52:45.579 --> 00:53:00.710
And so, yeah, so people are moving in this
way. So, yeah, I think the idea is out
00:53:00.710 --> 00:53:05.839
there, just we have to start and we have
to really do it and provide one
00:53:05.839 --> 00:53:13.210
infrastructure so that we can combine, for
example, morphological data, isotope data
00:53:13.210 --> 00:53:18.329
and genomic data into one dataset, which
will increase our resolution and our
00:53:18.329 --> 00:53:23.980
reliability.
Angel: Okay. Microphone number two,
00:53:23.980 --> 00:53:27.069
please.
Mic 2: Thank you for your valuable talk.
00:53:27.069 --> 00:53:32.660
My question would be: you started your talk
with the possible decrease of leaf beetles.
00:53:32.660 --> 00:53:37.100
In the data set you showed on slide number
six, there was an increase in the leaf beetle
00:53:37.100 --> 00:53:41.930
population until the 70s or so.
Is there a possible explanation for
00:53:41.930 --> 00:53:49.869
that?
Jutta: Yeah, I believe it is, because
00:53:49.869 --> 00:53:55.359
people started to much more systematically
observe leaf beetles. So it's a sampling
00:53:55.359 --> 00:54:05.869
effort. And also, at that time, the people -
it's a multi-people collaboration that
00:54:05.869 --> 00:54:12.369
actually assembled this dataset, so the
people who are part of this collaboration
00:54:12.369 --> 00:54:16.949
added their own private data sets. And
that's why you have an increase, I think.
00:54:16.949 --> 00:54:23.509
Whereas for the nineteen
hundreds, nineteen-tens, you can only
00:54:23.509 --> 00:54:29.009
use the data that is available in
publications and samples in museums or in
00:54:29.009 --> 00:54:33.289
scientific collections. I think that is
the reason why you have the sharp
00:54:33.289 --> 00:54:35.589
increase.
Mic 2: Thank you.
00:54:35.589 --> 00:54:38.750
Angel: So we have another question from
microphone number two.
00:54:38.750 --> 00:54:44.459
Mic 2: Thank you for your fine talk.
Excuse me. Maybe my question is a bit off
00:54:44.459 --> 00:54:51.730
topic. Do you think the methods and roles
that you identified in your talk could be
00:54:51.730 --> 00:54:59.880
transferred to the assessment of raw
materials? I'm thinking about metals.
00:54:59.880 --> 00:55:09.349
Jutta: Maybe the data infrastructure, like
if you wanted to collect raw metals or
00:55:09.349 --> 00:55:16.471
materials from all over the world and...a
sampleized scientific collection and to
00:55:16.471 --> 00:55:22.390
have kind of a reference dataset that
might work, actually. But the genomics
00:55:22.390 --> 00:55:29.170
obviously won't. So for that part you
would need to use different methods, from
00:55:29.170 --> 00:55:36.010
physics, obviously. But certain parts of the
infrastructure will actually be
00:55:36.010 --> 00:55:40.249
quite similar. I think so, yes.
Angel: So we have one more question from
00:55:40.249 --> 00:55:43.420
the Internet.
Signal-Angel: Who contracts a
00:55:43.420 --> 00:55:51.619
freelance evolutionary biologist? Can you
give an example of the kind of work you
00:55:51.619 --> 00:56:01.429
proposed?
Jutta: So I see this gap between science
00:56:01.429 --> 00:56:07.739
and applications, that we need these
applications and there's a huge potential
00:56:07.739 --> 00:56:18.150
for these applications. We know about
illegal logging, and that is my background,
00:56:18.150 --> 00:56:23.769
but it doesn't seem to be much different, for
example, in marine fisheries. We know that
00:56:23.769 --> 00:56:29.730
there is this huge amount of illegal
logging and timber trade going on. And we
00:56:29.730 --> 00:56:39.670
need to have some assays, actually, that
have the power to detect illegally traded
00:56:39.670 --> 00:56:49.789
timber. So I think there is a huge need
for these kinds of methods, and
00:56:49.789 --> 00:57:00.869
organizations who are interested in these
kinds of methods: there are governments,
00:57:00.869 --> 00:57:12.719
companies, NGOs, customs, Interpol. So,
yeah.
00:57:12.719 --> 00:57:19.700
Angel: Do we have any other questions? So
thank you again Jutta for your talk and
00:57:19.700 --> 00:57:23.739
the valuable work you're doing. Please
give a warm round of applause to Jutta.
00:57:23.739 --> 00:57:29.009
Applause
00:57:29.009 --> 00:57:33.599
36C3 postroll music
00:57:33.599 --> 00:57:56.000
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!