WEBVTT 00:00:00.000 --> 00:00:20.130 36C3 preroll music 00:00:20.130 --> 00:00:25.169 Angel: Right now I'd like to welcome our first speaker on stage. The talk will be 00:00:25.169 --> 00:00:30.800 about protecting the wild and I'll hand over to her. Please give her a warm round 00:00:30.800 --> 00:00:32.870 of applause. 00:00:32.870 --> 00:00:34.860 Applause 00:00:34.860 --> 00:00:43.920 Jutta Buschbom: Thank you very much for the introduction. My name is Jutta 00:00:43.920 --> 00:00:52.110 Buschbom, I'm an evolutionary biologist. That is my background. I did my PhD at 00:00:52.110 --> 00:00:57.290 the University of Chicago working on little fungi that live in symbiosis with 00:00:57.290 --> 00:01:05.979 algae and form colorful rocks, colorful crusts on rocks. I then did a postdoc in 00:01:05.979 --> 00:01:12.240 bioinformatics and after that moved back into organismal biology, working in forest 00:01:12.240 --> 00:01:19.560 genetics. And in the ten years I worked in forest genetics, for the first time I 00:01:19.560 --> 00:01:26.049 encountered questions that were with regard to application, and I found out 00:01:26.049 --> 00:01:37.359 that actually moving from research to application is not trivial. So what I'm 00:01:37.359 --> 00:01:45.869 going to present is a high-tech way of using genomic data to protect biodiversity in a 00:01:45.869 --> 00:01:51.939 way that you can actually reach application and use conservation genomic 00:01:51.939 --> 00:02:02.600 tools. So this summer the draft of the report of the Intergovernmental 00:02:02.600 --> 00:02:12.319 Science-Policy Platform on Biodiversity and Ecosystem Services came out and its 00:02:12.319 --> 00:02:19.930 results were quite alarming. It stated that around a million animal and plant species 00:02:19.930 --> 00:02:27.330 are currently threatened and of those...half of those species are already dead species 00:02:27.330 --> 00:02:33.450 walking. 
So because due to the destruction of the habitats or habitat deterioration, 00:02:33.450 --> 00:02:42.950 they are not able to reproduce in a sustainable way anymore. A third of the 00:02:42.950 --> 00:02:51.170 total species extinction rate risk to date has arisen in the last 25 years. And just 00:02:51.170 --> 00:03:01.450 to give you an idea about the relation we are talking about...currently the rate of 00:03:01.450 --> 00:03:07.680 extinction risk is already at least ten to hundreds times higher than it has averaged 00:03:07.680 --> 00:03:13.130 over the past 10 million years. And within these 10 million years there were the Ice 00:03:13.130 --> 00:03:23.260 Ages, for example. And most of the extinction risk is due to the fact of land 00:03:23.260 --> 00:03:36.190 and sea use change. The report also talks, even talks about that we already seem to 00:03:36.190 --> 00:03:42.420 have transgressed a proposed precautionary planetary boundary, which means within the 00:03:42.420 --> 00:03:48.370 boundary we have a stable biological system. But having transgressed it, we 00:03:48.370 --> 00:03:55.430 might already be in a transition to a new state that we have no way to find out how 00:03:55.430 --> 00:04:05.240 this state is going to look like. So all of these facts that the report is stating 00:04:05.240 --> 00:04:14.730 are actually pretty negative. And I was quite happy to read that they also present 00:04:14.730 --> 00:04:20.699 that there are actually people who do better than most of us. And they point out 00:04:20.699 --> 00:04:27.810 that many practices of indigenous people and local communities actually conserve 00:04:27.810 --> 00:04:38.350 and sustain wild and domesticated biodiversity quite well. Today, a higher 00:04:38.350 --> 00:04:44.600 proportion of the remaining terrestrial biodiversity lies in areas managed and 00:04:44.600 --> 00:04:52.890 held by indigenous people. 
And these ecosystems are more intact and less 00:04:52.890 --> 00:05:01.770 declining, less rapidly declining. So we have examples of lifestyles that actually 00:05:01.770 --> 00:05:10.530 do better than most of us. And I know the solutions won't be simple and it won't be 00:05:10.530 --> 00:05:22.330 easy to get there but we can look to what these people do better than we do. All of 00:05:22.330 --> 00:05:27.930 this sounds...it's a global report and it sounds kind of like far away, like 00:05:27.930 --> 00:05:35.990 probably somewhere in the tropics, but actually threats to biodiversity happen 00:05:35.990 --> 00:05:45.400 also directly in front of our own front doors. This summer a paper came out from 00:05:45.400 --> 00:05:52.800 two colleagues from the University of Greifswald, who had analyzed the long term 00:05:52.800 --> 00:05:58.490 data set about leaf beetles. And they were asking if we already have a decline of 00:05:58.490 --> 00:06:08.240 leaf beetles in Central Europe. So they compiled long term data sets of leaf 00:06:08.240 --> 00:06:19.140 beetle observations for Central Europe, starting from 1900 now to 2017, so 00:06:19.140 --> 00:06:27.010 spanning a hundred and twenty years. And what they find is that systematic reports 00:06:27.010 --> 00:06:36.270 on leaf beetles and leaf beetle observations are increasing during this 00:06:36.270 --> 00:06:45.310 time interval, time span. But despite the fact that we have...like in the last two 00:06:45.310 --> 00:06:53.270 decades, we had very high numbers of reports and observations for leaf beetles, 00:06:53.270 --> 00:07:00.100 the number of species, the orange line, is declining. It's slightly declining. But 00:07:00.100 --> 00:07:06.010 the question is, is this real or not? 
And what was most worrisome to the authors is 00:07:06.010 --> 00:07:15.110 that in the data set, the number of species here in orange that were having 00:07:15.110 --> 00:07:21.930 more reports was declining, while the number of species that showed less reports 00:07:21.930 --> 00:07:33.930 than before is expanding. So this kind of long term datasets are very hard to 00:07:33.930 --> 00:07:41.310 interpret and many factors can contribute to those patterns. And it's not clear if 00:07:41.310 --> 00:07:48.310 this pattern is statistically significant. But if you take a step back and consider 00:07:48.310 --> 00:07:54.470 your background knowledge, your prior knowledge about the state of the world, do 00:07:54.470 --> 00:08:02.760 you say, like, how does the current state look like? Does it look good or rather 00:08:02.760 --> 00:08:16.910 worrisome? And then with that knowledge, tell me that these results are an 00:08:16.910 --> 00:08:30.150 artifact or a bias. I'm worried that once we have statistical significant signal in 00:08:30.150 --> 00:08:41.789 this dataset, it will be already too late. So right now, I've been talking about leaf 00:08:41.789 --> 00:08:49.639 beetles and beetles are the largest group within insects with about 400.000 species. 00:08:49.639 --> 00:08:56.200 Leaf beetles are a large family of about 50.000 species which are worldwide 00:08:56.200 --> 00:09:05.080 distributed. And here in Germany, we have over 470 leaf beetle species. So how do we 00:09:05.080 --> 00:09:09.740 actually know how many species there are and who actually counted all these 00:09:09.740 --> 00:09:15.960 species? And is that just a task of taxonomists. Taxonomy is the science of 00:09:15.960 --> 00:09:21.600 naming and defining, including circumscribing and classifying groups of 00:09:21.600 --> 00:09:32.020 biological organisms on the basis of shared characters. 
So one could have the 00:09:32.020 --> 00:09:37.560 picture of some woman with a funny hat running over a meadow catching like 00:09:37.560 --> 00:09:44.480 butterflies or some mushroom hunter crawling through the forest trying to find 00:09:44.480 --> 00:09:52.380 mushrooms. And it's true, as biodiversity scientists we spend a lot of time outdoors 00:09:52.380 --> 00:10:02.290 and yeah...on the other hand, biotaxonomy is a high-tech science today. So 00:10:02.290 --> 00:10:11.050 taxonomists actually take up new technological tools and developments to 00:10:11.050 --> 00:10:17.270 help them identify, describe and understand the species. So taxonomists 00:10:17.270 --> 00:10:25.110 actually are often experts in, for example, microscopy, mathematics, 00:10:25.110 --> 00:10:36.850 biochemistry, even proteomics and genomics. So throughout the talk, I'm 00:10:36.850 --> 00:10:41.520 going to compile this list of people and experts we're going to need to protect 00:10:41.520 --> 00:10:49.360 biodiversity if we want to do this on the basis of genetic data. Right now, the list 00:10:49.360 --> 00:10:56.430 is quite empty. The first entry is taxonomists, but that will change quickly 00:10:56.430 --> 00:11:06.260 and taxonomists are a subgroup of evolutionary biologists mostly. So I told 00:11:06.260 --> 00:11:15.560 you that taxonomists and biodiversity scientists take up technology and...so as 00:11:15.560 --> 00:11:23.610 soon as computers came about and the internet started, people started to use 00:11:23.610 --> 00:11:32.420 that to compile information about species, and today we have several global resources 00:11:32.420 --> 00:11:40.640 available at the species level and above the species level. So we biodiversity 00:11:40.640 --> 00:11:45.720 scientists were among the first who defined biodiversity information 00:11:45.720 --> 00:11:56.690 standards. We have a global Catalogue of Life, a list of all named species. 
The 00:11:56.690 --> 00:12:01.810 Global Biodiversity Information Facility has an aim to bring together information 00:12:01.810 --> 00:12:08.630 from different sources and they are compiling, producing this wonderful map. 00:12:08.630 --> 00:12:13.940 This is leaf beetles, all the records about leaf beetles that we have in the 00:12:13.940 --> 00:12:22.200 world. And it looks like as if leaf beetles are highly associated with third 00:12:22.200 --> 00:12:29.580 world economics. However that clearly is an artifact and it just shows that we need 00:12:29.580 --> 00:12:34.560 many more taxonomists and biodiversity scientists all over the world to find and 00:12:34.560 --> 00:12:45.300 identify leaf beetles. So we also need biodiversity informaticians to help us 00:12:45.300 --> 00:12:52.050 compile global lists and distribute knowledge. So far I have been talking 00:12:52.050 --> 00:12:57.890 about species which is a simplification. The question is what is...what are species 00:12:57.890 --> 00:13:03.400 actually? And so we need to talk about genetic diversity within and between 00:13:03.400 --> 00:13:16.519 species. And I'm going to do so using gulls, which most of us might know. Here 00:13:16.519 --> 00:13:21.670 in Europe, we have two large gulls of the genus Larus. One is in the front, the 00:13:21.670 --> 00:13:31.070 lighter gray is our Silbermöwe. And in the back is our Heringsmöwe, the dark one. And 00:13:31.070 --> 00:13:35.740 I'm going to use German names because the English names go crosswise and that's 00:13:35.740 --> 00:13:43.160 completely confusing. So I will stick with the German names. Here in Europe these two 00:13:43.160 --> 00:13:48.450 species seem to be really fine species because they barely interbreed, so they 00:13:48.450 --> 00:13:55.680 don't hybridize. 
However, if you take a step back and look at the genus in 00:13:55.680 --> 00:14:03.120 general, you see that the species of the genus are distributed kind of ringwise 00:14:03.120 --> 00:14:14.510 around the Arctic. And so the idea is that, say during the Ice Age, all of this 00:14:14.510 --> 00:14:22.959 area was glaciated and the gulls retreated to a refuge here near the Caspian Sea. And 00:14:22.959 --> 00:14:28.110 then after the ice retreated, the gulls moved back north. One branch moved into 00:14:28.110 --> 00:14:34.350 Europe forming our Heringsmöwe and another branch then moved counterclockwise 00:14:34.350 --> 00:14:41.019 around the Arctic, producing different morphotypes, different species across the 00:14:41.019 --> 00:14:49.450 Bering Strait and then into North America. There the dark blue one is...I'm 00:14:49.450 --> 00:14:58.730 simplifying, the equivalent of our European Silbermöwe, the American 00:14:58.730 --> 00:15:03.830 Silbermöwe. Then the idea is that some individuals crossed back to Europe and 00:15:03.830 --> 00:15:14.800 formed our European Silbermöwe. And while all of these species here are 00:15:14.800 --> 00:15:21.769 interbreeding, so they hybridize. Only when this ring is closed those two species 00:15:21.769 --> 00:15:26.720 don't interbreed anymore. And the big question is, are we actually dealing with 00:15:26.720 --> 00:15:34.230 one single species or are we dealing with different species that just happened to 00:15:34.230 --> 00:15:41.079 hybridize more or less? The question is not trivial because it has consequences 00:15:41.079 --> 00:15:48.740 for protection. If we are dealing with one single species, all the gulls in Eurasia 00:15:48.740 --> 00:15:53.010 could go extinct and it wouldn't matter because we still would have the gulls in 00:15:53.010 --> 00:15:58.540 North America. 
However, if we have different species in all of these areas, 00:15:58.540 --> 00:16:04.709 we would need to protect individuals or the species on a regional level and 00:16:04.709 --> 00:16:17.279 protect all of these different species. So to investigate this question about: Do we 00:16:17.279 --> 00:16:23.589 have different species? And what were the evolutionary processes and histories that 00:16:23.589 --> 00:16:31.100 brought about the species? A group of scientists investigated that using DNA 00:16:31.100 --> 00:16:39.930 sequences. And on the left, you have the model, the theoretical model of the ring 00:16:39.930 --> 00:16:46.380 species. And here on the right you have reality. And the scientists found that the 00:16:46.380 --> 00:16:51.630 reality is always much more complex. So, for example, they found two refuges or 00:16:51.630 --> 00:16:58.430 they proposed two refuges. But what they found was that genetic diversity was 00:16:58.430 --> 00:17:07.351 correlated with those species or morphotypes. So what that also means is 00:17:07.351 --> 00:17:15.730 that genetic diversity is correlated with geographic origin. What we learn from this 00:17:15.730 --> 00:17:24.360 type of analysis is we learn about evolutionary processes and history, about 00:17:24.360 --> 00:17:30.170 variability and differentiation, about gene flow and migration, about speciation 00:17:30.170 --> 00:17:37.590 processes. All of that we need to understand our species, which will allow us to 00:17:37.590 --> 00:17:43.440 protect them. So we need evolutionary biologists who do phylogenetics and 00:17:43.440 --> 00:17:59.030 population genetics. So once we found out that one can use genetic diversity to 00:17:59.030 --> 00:18:07.130 infer geographic origin because genetic diversity is correlated with geography, 00:18:07.130 --> 00:18:18.500 people immediately said: 'Okay, we can use it for conservation applications.' 
And 00:18:18.500 --> 00:18:24.049 it's also...we learned that often it is unclear what is a species, species 00:18:24.049 --> 00:18:32.559 boundaries are unclear and some species have huge distribution ranges with 00:18:32.559 --> 00:18:37.340 different clusters of variability within this huge range. So we know that we need 00:18:37.340 --> 00:18:42.941 to protect within-species genetic diversity, which means that we need to 00:18:42.941 --> 00:18:50.650 understand within-species population structure and we need to build useful and 00:18:50.650 --> 00:18:58.919 reliable models of population structure. These models are actually required for all 00:18:58.919 --> 00:19:03.740 of our applications. They are required for monitoring, for example, for conservation 00:19:03.740 --> 00:19:11.890 strategies, for functional adaptation and adaptability, questions of productivity 00:19:11.890 --> 00:19:19.190 of different provenances, its impact on management regimes, breeding strategies, 00:19:19.190 --> 00:19:27.610 and also for enforcement applications. From the studies I showed you before with 00:19:27.610 --> 00:19:34.110 the gulls we also know that we need to approach the question of population 00:19:34.110 --> 00:19:47.070 structure on a distribution-range-wide scale. So here's the map produced by 00:19:47.070 --> 00:19:53.630 EUFORGEN, the European network for forest reproductive material, for one of our 00:19:53.630 --> 00:20:02.000 native oaks, the sessile oak. And the dots are the sites for genetic conservation 00:20:02.000 --> 00:20:12.120 units. And so that is one strategy how to represent within-species genetic diversity 00:20:12.120 --> 00:20:22.020 and how to sample it. And you can see this is a hypothetical example, but we likely 00:20:22.020 --> 00:20:32.460 will see a gradient from west to east or might see one at this scale. 
Then once we 00:20:32.460 --> 00:20:37.800 have these kinds of global data sets, we can go to the fine scale and maybe, for 00:20:37.800 --> 00:20:44.100 example, do a national genetic monitoring. And we will find much finer scale 00:20:44.100 --> 00:20:51.210 gradients. We also will find, especially for forest trees, outliers, so stands 00:20:51.210 --> 00:20:59.150 that don't fit the usual pattern. And that is because forest reproductive material 00:20:59.150 --> 00:21:07.660 has been moved around a lot. And so these lighter or darker dots are material that 00:21:07.660 --> 00:21:16.150 was moved to Germany from the outside. And we only will identify these outliers if we 00:21:16.150 --> 00:21:21.380 have the whole reference dataset. If we don't have the whole reference dataset, we 00:21:21.380 --> 00:21:28.799 might not identify these outliers - stands with a different history. Or in a worst 00:21:28.799 --> 00:21:34.280 case, these outliers might actually bias our gradients. And we are always talking 00:21:34.280 --> 00:21:42.770 about very slight gradients. So it's easy to bias these gradients, dilute them, so 00:21:42.770 --> 00:21:50.710 we actually won't get the results we need. Compiling these kinds of reference 00:21:50.710 --> 00:21:57.850 datasets takes huge collaborative efforts because people need to go out into the 00:21:57.850 --> 00:22:04.500 field and collect the reference samples and that might be scientists, that might 00:22:04.500 --> 00:22:13.669 be people from local communities, citizen scientists, managers, owners, government 00:22:13.669 --> 00:22:20.179 officials who provide background information, maps, distribution 00:22:20.179 --> 00:22:27.929 information and also in many parts of the world might protect the people who are 00:22:27.929 --> 00:22:34.510 actually collecting the samples. And it might be conservation activists and NGOs. 
00:22:34.510 --> 00:22:41.150 So once the samples have been collected they need to be stored somewhere for the 00:22:41.150 --> 00:22:51.150 long term and the information needs to be databased. And that is the work of 00:22:51.150 --> 00:22:57.430 scientific collections, which are mostly at natural history museums and there the 00:22:57.430 --> 00:23:04.460 samples are processed. They're organized in ways that you can find them again. All 00:23:04.460 --> 00:23:09.680 the metadata is entered, which curators do, collection managers, preparators, 00:23:09.680 --> 00:23:17.030 technical staff at the scientific collections. So once we have these kinds of 00:23:17.030 --> 00:23:24.910 data sets, large scale data sets, what are we actually doing with them? So the 00:23:24.910 --> 00:23:32.514 foundation for all of our applications is population structure and there 00:23:32.514 --> 00:23:42.370 specifically population assignment. So the 00:23:42.370 --> 00:23:46.660 process is this: first, we decide on a question and design our project accordingly so that we can answer the 00:23:46.660 --> 00:23:51.940 question. Then we need to infer the population structure model and optimize 00:23:51.940 --> 00:23:57.480 it. In the next step we need to check if the model actually is good enough for 00:23:57.480 --> 00:24:03.040 application because we might have found the best model, but it might still not be 00:24:03.040 --> 00:24:07.480 good enough for application. So we need to test that. And that is the step of 00:24:07.480 --> 00:24:12.831 population assignment or predictive assignment. And then in the end, we want 00:24:12.831 --> 00:24:19.330 to test our hypothesis. Are the two stands different or does an individual come from 00:24:19.330 --> 00:24:31.059 stand A or from stand B? And here we identify error rates and accuracy. So this 00:24:31.059 --> 00:24:38.890 whole process is very statistical. 
And so the analysis of these reference data 00:24:38.890 --> 00:24:48.240 needs to be accompanied by biostatisticians who can tell us how to analyze our data. 00:24:48.240 --> 00:24:55.289 So what is the state-of-the-art right now? What kind of geographic resolution do we 00:24:55.289 --> 00:25:02.990 actually get for non-model species currently? And I'm going to present the 00:25:02.990 --> 00:25:09.600 example of an African timber tree species, which is a very valuable timber. 00:25:09.600 --> 00:25:18.110 It's one example but basically all results for species that have large distribution 00:25:18.110 --> 00:25:26.059 ranges and are continuously distributed and are also long-lived, are very similar. 00:25:26.059 --> 00:25:33.460 So these kinds of results seem to be species independent. So the species are Milicia 00:25:33.460 --> 00:25:40.370 regia and Milicia excelsa, African teak, which cannot be grown in plantations for timber 00:25:40.370 --> 00:25:51.159 quality. So it is harvested unsustainably from natural forests. It's distributed in 00:25:51.159 --> 00:26:00.580 West, Central and East Africa. Here's a black rectangle. And a group of a dozen 00:26:00.580 --> 00:26:06.289 scientists got together and they actually sampled a reference dataset for these two 00:26:06.289 --> 00:26:18.659 species. It's over 400 samples, they analyzed four marker systems, resulting in 00:26:18.659 --> 00:26:24.570 a total of something like 100 markers, genetic markers, and then they optimized 00:26:24.570 --> 00:26:32.660 the population model and used different parameter settings. And we're going to 00:26:32.660 --> 00:26:40.080 concentrate here on the best solution that they found. And basically this rectangle 00:26:40.080 --> 00:26:47.870 here is the black one over here. So the resolution is... they found population 00:26:47.870 --> 00:26:54.690 structure with clear clusters. 
So the populations and the species from West 00:26:54.690 --> 00:27:01.490 Africa can be distinguished from those populations in Central Africa. And the 00:27:01.490 --> 00:27:08.460 ones in East Africa can be differentiated. So that is really good. So we have 00:27:08.460 --> 00:27:13.480 population structure. We know there is signal. The problem is still that our 00:27:13.480 --> 00:27:21.510 resolution is much lower than we would need to have it because we basically need 00:27:21.510 --> 00:27:32.090 resolution at least on a country level, because most of the laws are national. So 00:27:32.090 --> 00:27:41.770 it might be legal to harvest a tree in one country, but not in another country. So we 00:27:41.770 --> 00:27:49.319 need to get our resolution down to country level or even to regional level, if you 00:27:49.319 --> 00:27:52.361 want to distinguish: was the tree harvested in a national park, in a 00:27:52.361 --> 00:28:02.289 protected area, or outside in a managed forest? And when, as biodiversity 00:28:02.289 --> 00:28:10.740 scientists, we don't know how to continue, one thing is to look at what people do 00:28:10.740 --> 00:28:17.179 with model organisms and specifically what people do in human population genomics 00:28:17.179 --> 00:28:24.179 because there thousands of population geneticists are working and there is a 00:28:24.179 --> 00:28:28.210 completely different funding background due to the interest of the medical and the 00:28:28.210 --> 00:28:39.119 pharma industry. So they are always ahead. What we can learn from there, 00:28:39.119 --> 00:28:46.660 from human population genomics, is that we need two features. One is, we 00:28:46.660 --> 00:28:53.570 already know that we need distribution-wide sampling, which provides a spatial 00:28:53.570 --> 00:28:59.950 context. 
The second feature is that we need genome-wide sequencing, preferably 00:28:59.950 --> 00:29:09.210 genome sequencing, which provides us steps in time because our genomes are archives 00:29:09.210 --> 00:29:14.710 of our evolutionary history. They are records of all the processes and events 00:29:14.710 --> 00:29:21.429 and these steps in time then translate also into resolution. Once we have these 00:29:21.429 --> 00:29:30.150 two features, actually these reference datasets open Pandora's box. Suddenly we 00:29:30.150 --> 00:29:36.390 can ask all kinds of questions and objectives, even those that we still don't 00:29:36.390 --> 00:29:47.010 know. We can develop all kinds of applications, which is done for humans. 00:29:47.010 --> 00:29:59.400 Currently, there are at least four global datasets on human diversity. These are 00:29:59.400 --> 00:30:08.860 very widely reused and these big datasets - so they are big data with regard to the 00:30:08.860 --> 00:30:18.850 number of samples and also the genomes or the genome representations and this 00:30:18.850 --> 00:30:26.470 results in very information-rich data which initiates analytical development so 00:30:26.470 --> 00:30:33.799 people continuously are developing new statistical methods. And right now, a new 00:30:33.799 --> 00:30:42.330 wave of these methods is coming in. So once you have these global datasets, 00:30:42.330 --> 00:30:47.500 people in human population genomics started to do these intense 00:30:47.500 --> 00:30:56.299 regional samplings. And this is the example of the United Kingdom Biobank. 00:30:56.299 --> 00:31:02.789 It's a project with 500.000 volunteers, they are all UK citizens from all over the 00:31:02.789 --> 00:31:13.982 islands. And each individual was genotyped in a wet lab for 820.000 markers. 
That's 00:31:13.982 --> 00:31:19.620 completely, I mean, that's a different number than the 100 or 1000...as 00:31:19.620 --> 00:31:26.409 biodiversity scientists we normally analyse a maximum of a couple of 10.000 00:31:26.409 --> 00:31:36.220 markers. So that's a completely different number. But then statistical geneticists 00:31:36.220 --> 00:31:47.140 come. They do some weird and wonderful voodoo and they derive 96 million markers 00:31:47.140 --> 00:31:53.460 per genome, that is per individual, from these 820.000 markers that were produced 00:31:53.460 --> 00:32:00.630 in the lab. So that's a hundredfold increase. And once you have this kind of 00:32:00.630 --> 00:32:07.510 dataset for a genome, you suddenly, or you finally, get country level and within 00:32:07.510 --> 00:32:18.970 country level resolution. So these panels are examples. So the first panel shows 00:32:18.970 --> 00:32:25.980 individuals who were born in Edinburgh and the question was "Where were people born 00:32:25.980 --> 00:32:32.419 who had a similar ancestral background, genetic background?". And what they found 00:32:32.419 --> 00:32:41.980 was that it was all over Scotland and Northern Ireland. Northern Yorkshire was 00:32:41.980 --> 00:32:50.250 even more local. So people from Yorkshire don't seem to get around a lot. For London 00:32:50.250 --> 00:32:54.090 the situation is completely different. That is what we would expect because 00:32:54.090 --> 00:32:59.580 London is a people magnet. People move there all the time. They meet there, they 00:32:59.580 --> 00:33:05.700 have children and the kids born in London, their genetic ancestry has nothing to do 00:33:05.700 --> 00:33:12.760 with London. It's from all over the place, from the British Isles and the world. So 00:33:12.760 --> 00:33:21.600 that's why the colors are strongly dissolved. So this study came out also 00:33:21.600 --> 00:33:26.100 this summer. 
And it's the first time that I have seen that we actually really can 00:33:26.100 --> 00:33:36.580 achieve regional resolution. And I find this possibility for biodiversity science 00:33:36.580 --> 00:33:46.820 really exciting. So it was made possible by very sophisticated statistical 00:33:46.820 --> 00:33:51.890 approaches which are able to analyze genetic data from highly complex 00:33:51.890 --> 00:33:59.450 evolutionary and ecological systems. And at the same time these analyses are able 00:33:59.450 --> 00:34:04.910 to handle big data. We're talking about gigabytes and terabytes of data and 00:34:04.910 --> 00:34:13.810 results. So statistical geneticists are developing new methods of data 00:34:13.810 --> 00:34:20.309 representation to handle this amount of data. And then we are able to sufficiently 00:34:20.309 --> 00:34:25.520 extract the signal for a very specific question from data which have a very low 00:34:25.520 --> 00:34:36.919 signal-to-noise ratio. So to get there, we need many experts and specialists. So we 00:34:36.919 --> 00:34:41.659 need statistical geneticists, big data experts who also might contribute machine 00:34:41.659 --> 00:34:49.299 learning expertise. We need molecular biologists who know how to sequence 00:34:49.299 --> 00:34:54.259 complex genomes. We need bioinformaticians with expertise in 00:34:54.259 --> 00:35:05.010 genomics for assembly, annotation and alignment of genomic sequences. The result 00:35:05.010 --> 00:35:12.569 is actually this: This is the author list for the 1000 Genomes Project 00:35:12.569 --> 00:35:20.380 reference data set, and I don't expect you to be able to read it, but the bold type 00:35:20.380 --> 00:35:25.539 is of interest because it shows all the different tasks that are necessary to 00:35:25.539 --> 00:35:36.140 produce a standardized and highly cleaned reference dataset. 
So the whole author 00:35:36.140 --> 00:35:41.880 list is something like 1.5 pages long, and even considering that some authors will 00:35:41.880 --> 00:35:51.130 have contributed to several tasks, the publications for reference datasets mostly 00:35:51.130 --> 00:35:57.079 have author lists that are far over 50 people. So they are huge collaborative 00:35:57.079 --> 00:36:05.219 efforts. Now we take the step into biodiversity science. Here these are eight 00:36:05.219 --> 00:36:13.440 gastrotrichs, they are little worm-like... organisms that live in the sediments of 00:36:13.440 --> 00:36:23.069 freshwater lakes and marine sediment. They are in general a couple of hundred 00:36:23.069 --> 00:36:29.569 micrometers large. And I don't have any numbers, but my guess would be that maybe 00:36:29.569 --> 00:36:38.640 worldwide, a hundred to a thousand people actually work on these species. There are 00:36:38.640 --> 00:36:44.829 800 species of gastrotrichs. So let's say there's one, two, maybe three experts per 00:36:44.829 --> 00:36:52.240 species for these organisms. So how are these three people going to manage all 00:36:52.240 --> 00:37:01.420 these tasks to produce a reference dataset? You might say, well, it's 00:37:01.420 --> 00:37:05.209 gastrotrichs, I mean, I have never heard about them. Maybe they are not so 00:37:05.209 --> 00:37:08.349 important. Maybe you don't need reference data sets, but actually some of 00:37:08.349 --> 00:37:17.579 those species are bioindicators for water quality. So what we observe right now is a 00:37:17.579 --> 00:37:27.510 gap for biodiversity conservation. In model organisms, we have Pandora's box 00:37:27.510 --> 00:37:34.630 open. We have all the statistical analyses at our hands to analyze our data sets. 00:37:34.630 --> 00:37:39.709 However, in non-model organisms, we are still stuck with summary statistics that 00:37:39.709 --> 00:37:46.839 don't provide us the resolution that we need. 
And we know that to close this gap, 00:37:46.839 --> 00:37:52.599 even for a single species, it's a huge effort. But at the same time, we have over 00:37:52.599 --> 00:38:03.560 35,000 species listed by scientists which already need effective protection now. So 00:38:03.560 --> 00:38:10.008 we need to find a way to close this gap and actually move in this direction. And 00:38:10.008 --> 00:38:19.940 the good thing is, so all of this... in biodiversity science, in academia, and we 00:38:19.940 --> 00:38:24.890 need to make the transition over the conservation genomics gap into the big 00:38:24.890 --> 00:38:32.130 loop of real world conservation tasks. And the good thing is we already know what we 00:38:32.130 --> 00:38:37.940 have to do. So we need to have reference data sets, distribution range wide. We 00:38:37.940 --> 00:38:43.959 need to have statistics. And it's going to be big data. So we need collection 00:38:43.959 --> 00:38:54.140 management, data management and an analysis environment. So looking at 00:38:54.140 --> 00:38:59.880 different ingredients or different steps, the first thing we need is a general data 00:38:59.880 --> 00:39:05.269 infrastructure for global diversity of reference data sets that actually can be 00:39:05.269 --> 00:39:11.779 used across species, for preferably as many species as possible, and provide a working 00:39:11.779 --> 00:39:19.749 environment for biodiversity scientists and experts. It should be user friendly so 00:39:19.749 --> 00:39:25.759 it can be used by scientists, but also so that people from local communities and 00:39:25.759 --> 00:39:33.489 citizen scientists can add their observation data and their data into this 00:39:33.489 --> 00:39:41.339 data infrastructure. I have listed quite a lot of features that this kind of 00:39:41.339 --> 00:39:48.400 infrastructure should have. And I'm going to argue that these features are not some 00:39:48.400 --> 00:40:02.609 nice-to-haves, but actually must-haves. 
Because our goal is always application. So 00:40:02.609 --> 00:40:13.279 we need developers, managers and curators for data infrastructures. Since our goal 00:40:13.279 --> 00:40:30.900 is application, our main features are quality control and error reduction. These 00:40:30.900 --> 00:40:38.880 are the basis. So that our conservation tools can be robustly and reliably applied 00:40:38.880 --> 00:40:46.459 under real world operating conditions. And the way to achieve quality and error 00:40:46.459 --> 00:40:52.759 reduction is through chains of custody. So it means that from project design, from 00:40:52.759 --> 00:40:58.299 the questions through all the steps that are necessary to produce a reference data 00:40:58.299 --> 00:41:08.219 set and then...so from sample collection, genomic statistical analysis down to 00:41:08.219 --> 00:41:15.599 application. These steps need to be documented and standardized. They need to 00:41:15.599 --> 00:41:22.239 be, each one of them needs to be validated and reproducible. They should be modular 00:41:22.239 --> 00:41:28.999 so they can be user friendly. And the whole chain of custody needs to be 00:41:28.999 --> 00:41:40.690 scalable. So if our chains of custody have these characteristics, we actually will 00:41:40.690 --> 00:41:51.390 have tools that will work in everyday life. So we need professional developers 00:41:51.390 --> 00:41:59.519 and programmers who are able to produce this very collaborative software. We 00:41:59.519 --> 00:42:06.130 need free and open source experts. So that we can always ensure that our code and 00:42:06.130 --> 00:42:13.859 our infrastructures are still intact and that we can check them. And I'm a biologist, I 00:42:13.859 --> 00:42:19.390 don't have any background in hardware, but I've heard a couple of talks here at the 00:42:19.390 --> 00:42:26.099 conference about Green IT. 
And I have the feeling we should have people who know 00:42:26.099 --> 00:42:33.849 hardware and software and know how to develop these high tech tools in a sustainable 00:42:33.849 --> 00:42:38.450 way, so that by developing these tools, we don't use more resources than we 00:42:38.450 --> 00:42:48.940 are trying to protect. So I've shown all these features and characteristics that 00:42:48.940 --> 00:42:57.459 the software should have. And I'm arguing that these features are necessary because 00:42:57.459 --> 00:43:04.819 of the reality we find ourselves in. It is one of rising over-exploitation and destruction 00:43:04.819 --> 00:43:19.799 of nature. So the extent of environmental crimes is up in the billions. All 00:43:19.799 --> 00:43:29.029 environmental crime together, the green bubbles, is second only to drug-associated 00:43:29.029 --> 00:43:35.489 crimes. They are up there with counterfeiting or human trafficking. So 00:43:35.489 --> 00:43:45.479 these are multi-billion enterprises. They are often transnational and industries 00:43:45.479 --> 00:44:02.019 with huge profits. So if there's some crime, some mafia boss, some criminal 00:44:02.019 --> 00:44:09.539 manager who just bribed a government official somewhere in some neck of the 00:44:09.539 --> 00:44:17.859 woods, it just would make sense that that person would not wait or not take the 00:44:17.859 --> 00:44:23.809 risk of being discovered just because some customs officer pulls out a container 00:44:23.809 --> 00:44:29.170 somewhere in the harbor, for example, opens it and says "This looks kind of 00:44:29.170 --> 00:44:37.380 weird. Let's take a sample, send it to a lab." and then a population geneticist 00:44:37.380 --> 00:44:44.171 comes back and says "Oh, yes, this sample is not from area A as documented, but 00:44:44.171 --> 00:44:52.449 actually it's from area B and it was illegally logged." 
If we have reference 00:44:52.449 --> 00:44:58.660 data sets, information rich reference data sets, they become highly valuable and they 00:44:58.660 --> 00:45:08.430 need protection themselves against manipulation and destruction. So we will 00:45:08.430 --> 00:45:14.739 need to think about IT security from the beginning. Also, these data sets are often 00:45:14.739 --> 00:45:20.069 very politically sensitive because if it is shown that in a certain country there 00:45:20.069 --> 00:45:25.680 is repeated illegal logging, that country might not be too excited about 00:45:25.680 --> 00:45:41.380 this information. So we need to think about IT security experts. So my hope is 00:45:41.380 --> 00:45:48.599 that these kinds of very high tech digital conservation tools can actually contribute 00:45:48.599 --> 00:45:55.690 to the U.N. Sustainable Development Goals by empowering indigenous people, local 00:45:55.690 --> 00:46:02.810 communities and also us to protect, enforce and sustainably use our lands and 00:46:02.810 --> 00:46:10.139 our biodiversity by providing some management and law enforcement tools. So 00:46:10.139 --> 00:46:14.059 we need people from around the world, users from around the world who use these 00:46:14.059 --> 00:46:25.789 tools and help to develop them further and to maintain them. And finally here, these 00:46:25.789 --> 00:46:33.910 high tech tools will just be another technological fix if we don't manage to 00:46:33.910 --> 00:46:45.770 get our... our way of life down to sustainable levels. So what we need is to 00:46:45.770 --> 00:46:53.759 today...this year, the Earth Overshoot Day was at the end of July. So at the end of 00:46:53.759 --> 00:47:01.639 July, we had used all the resources that we had available for the whole year. And 00:47:01.639 --> 00:47:09.400 we need to get this back to the end of the year so that our resources actually 00:47:09.400 --> 00:47:22.910 sustain us for the whole year. 
The graphic here for Germany suggests that we are on a 00:47:22.910 --> 00:47:29.819 good path. We are reducing our resource consumption and maybe even our biocapacity 00:47:29.819 --> 00:47:38.099 moves up a little bit. So actually it seems that our personal lifestyles and 00:47:38.099 --> 00:47:46.329 choices make a difference and we just need to close this gap here much quicker. So 00:47:46.329 --> 00:47:53.689 protecting biodiversity needs all of us to achieve that. And with that, thank you 00:47:53.689 --> 00:47:57.770 very much. 00:47:57.770 --> 00:48:08.020 Applause 00:48:08.020 --> 00:48:12.680 Angel: So thank you Jutta for this very interesting talk and the very valuable 00:48:12.680 --> 00:48:16.609 work you're doing. We have three mics here. Please line up at the microphones if 00:48:16.609 --> 00:48:22.809 you have any questions or suggestions or want to participate and work together with 00:48:22.809 --> 00:48:29.660 Jutta. We have one question from the Internet, so please, Signal-Angel, start. 00:48:29.660 --> 00:48:34.749 Signal-Angel: Why are wild plant species within a genus further apart than wild 00:48:34.749 --> 00:48:42.509 animal species within a genus? Angel: Could you repeat it, please? 00:48:42.509 --> 00:48:49.069 Signal-Angel: Why are wild plant species within a genus further apart than wild 00:48:49.069 --> 00:48:55.910 animal species within a genus? Jutta: I'm not sure I understand the 00:48:55.910 --> 00:49:01.180 background for the question. Mic 1: Because animals move and plants 00:49:01.180 --> 00:49:06.449 don't move. Jutta: Oh, okay. If that is the idea 00:49:06.449 --> 00:49:12.299 behind the question. Plants actually move, too. They don't move as individuals, but 00:49:12.299 --> 00:49:24.289 they move their genetic material through pollen or fragments. So actually diversity 00:49:24.289 --> 00:49:30.760 in plants and in animals can be quite similar. 
So the idea that plants are 00:49:30.760 --> 00:49:36.459 just stuck and should have a completely different population structure does not 00:49:36.459 --> 00:49:43.130 hold, because plants move their genetic material around through seeds, through 00:49:43.130 --> 00:49:49.610 pollen, through vegetative propagules. Angel: So thank you microphone 1 for 00:49:49.610 --> 00:49:55.999 helping out. Please ask your question. Mic 1: So my question is about the success 00:49:55.999 --> 00:50:00.939 factor of it. If you think of this, whatever database being set up there and I 00:50:00.939 --> 00:50:07.430 think it's gonna be a huge database...I downloaded my own genome on the Internet. 00:50:07.430 --> 00:50:12.989 It was about 150 megabytes. And if we multiply that, I think the genetic 00:50:12.989 --> 00:50:17.539 variation from one person to another is about 1 percent only. So we can compress 00:50:17.539 --> 00:50:25.009 that to 4 megabytes per person. If we sequence all the humans in the world, that 00:50:25.009 --> 00:50:32.689 would be 32 petabytes, that would cost approximately 15 billion dollars. And 00:50:32.689 --> 00:50:36.890 that's only for the storage. Now comes the entire management. Of course, we don't 00:50:36.890 --> 00:50:41.470 want to digitize all the human genomes, but rather the plant and animal species' 00:50:41.470 --> 00:50:46.309 genomes. So it's a huge data program. And what would be for you the success factors 00:50:46.309 --> 00:50:51.229 for this thing to really fly? And did you talk to organizations like Wikidata or 00:50:51.229 --> 00:50:56.469 others or where would it ideally be hosted? At a university or an 00:50:56.469 --> 00:51:02.170 international nonprofit or who would be running the thing? 00:51:02.170 --> 00:51:14.519 Jutta: Yeah, I mean, it's just really big data. 
I think our first goal is not to 00:51:14.519 --> 00:51:23.670 think about having all predicted 5 to 10 million species be sequenced on a 00:51:23.670 --> 00:51:30.239 population level. I think we need to think about the next step. And there it would 00:51:30.239 --> 00:51:35.530 make sense to start with species that are actually highly exploited, like many 00:51:35.530 --> 00:51:40.579 timber species and also many marine fishes. I think that's where we should 00:51:40.579 --> 00:51:48.039 start. And to host this kind of data I think it should be in politically 00:51:48.039 --> 00:51:56.410 independent hands. So it should be with an NGO or with the U.N., some organization 00:51:56.410 --> 00:52:02.449 that is independent. Mic 1: Are you the first to think about 00:52:02.449 --> 00:52:06.509 this or are there existing initiatives? Jutta: There are actually existing 00:52:06.509 --> 00:52:14.219 initiatives. I have been in contact with the Forest Stewardship Council and they 00:52:14.219 --> 00:52:23.219 are actually starting to sample their concessions and have started to build up the 00:52:23.219 --> 00:52:28.730 samples. They work together with Kew Botanical Gardens and the U.S. Forest 00:52:28.730 --> 00:52:37.589 Service. And right now they're analyzing the samples, using isotopes, which is 00:52:37.589 --> 00:52:45.579 another method which is very powerful and can also produce geographic information. 00:52:45.579 --> 00:53:00.710 And so, yeah, so people are moving in this way. So, yeah, I think the idea is out 00:53:00.710 --> 00:53:05.839 there, we just have to start and we have to really do it and provide one 00:53:05.839 --> 00:53:13.210 infrastructure so that we can combine, for example, morphological data, isotope data 00:53:13.210 --> 00:53:18.329 and genomic data into one dataset, which will increase our resolution and our 00:53:18.329 --> 00:53:23.980 reliability. Angel: Okay. Microphone number two, 00:53:23.980 --> 00:53:27.069 please. 
Mic 2: Thank you for your valuable talk. 00:53:27.069 --> 00:53:32.660 My question would be: you started your talk with the possible decrease of leaf beetles. 00:53:32.660 --> 00:53:37.100 In the data set you showed on slide number six there was an increase in leaf beetle 00:53:37.100 --> 00:53:41.930 population until the 70s, something like that. Is there a possible explanation for 00:53:41.930 --> 00:53:49.869 that? Jutta: Yeah, I believe it is because 00:53:49.869 --> 00:53:55.359 people started to observe leaf beetles much more systematically. So it's a sampling 00:53:55.359 --> 00:54:05.869 effort. And also at that time the people - so it's a multi-people collaboration that 00:54:05.869 --> 00:54:12.369 actually assembled this dataset, so the people who are part of this collaboration 00:54:12.369 --> 00:54:16.949 added their own private data sets. And that's why you have an increase, I think. 00:54:16.949 --> 00:54:23.509 While for the people from the nineteen hundreds, nineteen ten, you can 00:54:23.509 --> 00:54:29.009 only use the data that is available in publications and samples in museums or in 00:54:29.009 --> 00:54:33.289 scientific collections. I think that is the reason why you have the sharp 00:54:33.289 --> 00:54:35.589 increase. Mic 2: Thank you. 00:54:35.589 --> 00:54:38.750 Angel: So we have another question from microphone number two. 00:54:38.750 --> 00:54:44.459 Mic 2: Thank you for your fine talk. Excuse me. Maybe my question is a bit off 00:54:44.459 --> 00:54:51.730 topic. Do you think the methods and roles that you identified in your talk could be 00:54:51.730 --> 00:54:59.880 transferred to the assessment of raw materials? I'm thinking about metals? 
00:54:59.880 --> 00:55:09.349 Jutta: Maybe the data infrastructure, like if you wanted to collect raw metals or 00:55:09.349 --> 00:55:16.471 materials from all over the world and...a sampleized scientific collection and to 00:55:16.471 --> 00:55:22.390 have kind of a reference dataset, that might work, actually. But the genomics 00:55:22.390 --> 00:55:29.170 obviously won't. So for that part you would need to use different methods from 00:55:29.170 --> 00:55:36.010 physics, obviously. But actually the infrastructure, certain parts will be 00:55:36.010 --> 00:55:40.249 quite similar. I think so, yes. Angel: So we have one more question from 00:55:40.249 --> 00:55:43.420 the Internet. Signal-Angel: Who contracts a 00:55:43.420 --> 00:55:51.619 freelance evolutionary biologist? Can you give an example of the kind of work you 00:55:51.619 --> 00:56:01.429 proposed? Jutta: So I see this gap between science 00:56:01.429 --> 00:56:07.739 and applications, that we need these applications and there's a huge potential 00:56:07.739 --> 00:56:18.150 for these applications. We know that illegal logging, and that is my background, 00:56:18.150 --> 00:56:23.769 but it doesn't seem to be much different, for example, in marine fisheries. We know that 00:56:23.769 --> 00:56:29.730 there is this huge amount of illegal logging and timber trade going on. And we 00:56:29.730 --> 00:56:39.670 need to have some assets actually that have the power to detect illegally traded 00:56:39.670 --> 00:56:49.789 timber. So I think there is a huge need for these kinds of methods and 00:56:49.789 --> 00:57:00.869 organizations who are interested in these kinds of methods. There are governments, there are 00:57:00.869 --> 00:57:12.719 companies, NGOs, customs, Interpol. So, yeah. 00:57:12.719 --> 00:57:19.700 Angel: Do we have any other questions? So thank you again Jutta for your talk and 00:57:19.700 --> 00:57:23.739 the valuable work you're doing. Please give a warm round of applause to Jutta. 
00:57:23.739 --> 00:57:29.009 Applause 00:57:29.009 --> 00:57:33.599 36C3 postroll music 00:57:33.599 --> 00:57:56.000 Subtitles created by c3subtitles.de in the year 2020. Join, and help us!