36C3 preroll music Angel: Right now I'd like to welcome our first speaker on stage. The talk will be about protecting the wild and I'll hand over to her. Please give her a warm round of applause. Applause Jutta Buschbom: Thank you very much for the introduction. My name is Jutta Buschbom, I'm an evolutionary biologist. That is my background. I did my PhD at the University of Chicago working on little fungi that live in symbiosis with algae and form colorful crusts on rocks. I then did a postdoc in bioinformatics and after that moved back into organismal biology, working in forest genetics. In the ten years I worked in forest genetics, I encountered for the first time questions that concerned application, and I found out that actually moving from research to application is not trivial. So what I'm going to present is a high tech way of using genomic data to protect biodiversity, in a way that you can actually reach application and use conservation genomic tools. So this summer the draft of the report of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services came out and its results were quite alarming. It stated that around a million animal and plant species are currently threatened and of those...half of those species are already dead species walking. So due to the destruction of their habitats or habitat deterioration, they are not able to reproduce in a sustainable way anymore. A third of the total species extinction risk to date has arisen in the last 25 years. And just to give you an idea about the relation we are talking about...currently the rate of extinction is already at least tens to hundreds of times higher than it has averaged over the past 10 million years. And within these 10 million years there were the Ice Ages, for example. And most of the extinction risk is due to land and sea use change.
The report also talks, even talks about that we already seem to have transgressed a proposed precautionary planetary boundary, which means within the boundary we have a stable biological system. But having transgressed it, we might already be in a transition to a new state that we have no way to find out how this state is going to look like. So all of these facts that the report is stating are actually pretty negative. And I was quite happy to read that they also present that there are actually people who do better than most of us. And they point out that many practices of indigenous people and local communities actually conserve and sustain wild and domesticated biodiversity quite well. Today, a higher proportion of the remaining terrestrial biodiversity lies in areas managed and held by indigenous people. And these ecosystems are more intact and less declining, less rapidly declining. So we have examples of lifestyles that actually do better than most of us. And I know the solutions won't be simple and it won't be easy to get there but we can look to what these people do better than we do. All of this sounds...it's a global report and it sounds kind of like far away, like probably somewhere in the tropics, but actually threats to biodiversity happen also directly in front of our own front doors. This summer a paper came out from two colleagues from the University of Greifswald, who had analyzed the long term data set about leaf beetles. And they were asking if we already have a decline of leaf beetles in Central Europe. So they compiled long term data sets of leaf beetle observations for Central Europe, starting from 1900 now to 2017, so spanning a hundred and twenty years. And what they find is that systematic reports on leaf beetles and leaf beetle observations are increasing during this time interval, time span. 
But despite the fact that we have...like in the last two decades, we had very high numbers of reports and observations for leaf beetles, the number of species, the orange line, is declining. It's slightly declining. But the question is, is this real or not? And what was most worrisome to the authors is that in the data set, the number of species, here in orange, that had more reports than before was declining, while the number of species that showed fewer reports than before was expanding. So these kinds of long term datasets are very hard to interpret and many factors can contribute to those patterns. And it's not clear if this pattern is statistically significant. But take a step back and consider your background knowledge, your prior knowledge about the state of the world, and ask yourself: what does the current state look like? Does it look good or rather worrisome? And then with that knowledge, tell me that these results are an artifact or a bias. I'm worried that once we have a statistically significant signal in this dataset, it will already be too late. So right now, I've been talking about leaf beetles, and beetles are the largest group within insects with about 400,000 species. Leaf beetles are a large family of about 50,000 species which are distributed worldwide. And here in Germany, we have over 470 leaf beetle species. So how do we actually know how many species there are and who actually counted all these species? And is that just a task for taxonomists? Taxonomy is the science of naming and defining, including circumscribing and classifying, groups of biological organisms on the basis of shared characters. So one could have the picture of some woman with a funny hat running over a meadow catching butterflies or some mushroom hunter crawling through the forest trying to find mushrooms. And it's true, as biodiversity scientists we spend a lot of time outdoors and yeah...on the other hand, taxonomy is a high-tech science today.
So taxonomists actually take up new technological tools and developments to help them identify, describe and understand the species. So taxonomists actually are often experts in, for example, microscopy, mathematics, biochemistry, even proteomics and genomics. So throughout the talk, I'm going to compile this list of people and experts we're going to need to protect biodiversity if we want to do this on the basis of genetic data. Right now, the list is quite empty. The first entry is taxonomists, but that will change quickly, and taxonomists are mostly a subgroup of evolutionary biologists. So I told you that taxonomists and biodiversity scientists take up technology and...so as soon as computers came about and the internet started, people started to use that to compile information about species, and today we have several global resources available at the species level and above the species level. So we biodiversity scientists were among the first who defined biodiversity information standards. We have a global Catalogue of Life, a list of all named species. The Global Biodiversity Information Facility has the aim to bring together information from different sources and they are compiling, producing this wonderful map. This is leaf beetles, all the records about leaf beetles that we have in the world. And it looks as if leaf beetles are highly associated with third world economics. However that clearly is an artifact and it just shows that we need many more taxonomists and biodiversity scientists all over the world to find and identify leaf beetles. So we also need biodiversity informaticians to help us compile global lists and distribute knowledge. So far I have been talking about species, which is a simplification. The question is what is...what are species actually? And so we need to talk about genetic diversity within and between species. And I'm going to do so using gulls, which most of us might know. Here in Europe, we have two large gulls of the genus Larus.
One is in the front, the lighter gray is our Silbermöwe. And in the back is our Heringsmöwe, the dark one. And I'm going to use German names because the English names go crosswise and that's completely confusing. So I will stick with the German names. Here in Europe these two species seem to be really fine species because they barely interbreed, so they don't hybridize. However, if you take a step back and look at the genus in general, you see that the species of the genus are distributed kind of ringwise around the Arctic. And so the idea is that, say during the Ice Age, all of this area was glaciated and the gulls retreated to a refuge here near the Caspian Sea. And then after the ice retreated, the gulls moved back north. One branch moved into Europe forming our Heringsmöwe and another branch then moved counterclockwise around the Arctic, producing different morphotypes, different species across the Bering Strait and then into North America. There the dark blue one is...I'm simplifying, the equivalent of our European Silbermöwe, the American Silbermöwe. Then the idea is that some individuals crossed back to Europe and formed our European Silbermöwe. And while all of these species here are interbreeding, so they hybridize. Only when this ring is closed those two species don't interbreed anymore. And the big question is, are we actually dealing with one single species or are we dealing with different species that just happened to hybridize more or less? The question is not trivial because it has consequences for protection. If we are dealing with one single species, all the gulls in Eurasia could go extinct and it wouldn't matter because we still would have the gulls in North America. However, if we have different species in all of these areas, we would need to protect individuals or the species on a regional level and protect all of these different species. So to investigate this question about: Do we have different species? 
And what were the evolutionary processes and histories that brought about the species? A group of scientists investigated that using DNA sequences. And on the left, you have the model, the theoretical model of the ring species. And here on the right you have reality. And the scientists found that the reality is always much more complex. So, for example, they found two refuges, or they proposed two refuges. But what they found was that genetic diversity was correlated with those species or morphotypes. So what that also means is that genetic diversity is correlated with geographic origin. What we learn from this type of analysis is we learn about evolutionary processes and history, about variability and differentiation, about gene flow and migration, about speciation processes. All of that we need in order to understand our species, which will allow us to protect them. So we need evolutionary biologists who do phylogenetics and population genetics. So once we found out that one can use genetic diversity to infer geographic origin, because genetic diversity is correlated with geography, people immediately said: 'Okay, we can use it for conservation applications.' And we also learned that often it is unclear what a species is, species boundaries are unclear, and some species have huge distribution ranges with different clusters of variability within this huge range. So we know that we need to protect within-species genetic diversity, which means that we need to understand within-species population structure and we need to build useful and reliable models of population structure. These models are actually required for all of our applications. They are required for monitoring, for example, for conservation strategies, for functional adaptation and adaptability, questions of productivity of different provenances, their impact on management regimes, breeding strategies, and also for enforcement applications.
From the studies I showed you before with the gulls we also know that we need to approach the question of population structure on a distribution range wide scale. So here's the map produced by EUFORGEN, the European Forest Genetic Resources Programme, for one of our native oaks, the sessile oak. And the dots are the sites of genetic conservation units. And so that is one strategy for how to represent within-species genetic diversity and how to sample it. And you can see, this is a hypothetical example, but we likely will see a gradient from west to east, or might see one at this scale. Then once we have these kinds of global data sets, we can go to the fine scale and maybe, for example, do a national genetic monitoring. And we will find much finer scale gradients. We also will find, especially for forest trees, outliers, so stands that don't fit the usual pattern. And that is because forest reproductive material has been moved around a lot. And so these lighter or darker dots are material that was moved to Germany from the outside. And we only will identify these outliers if we have the whole reference dataset. If we don't have the whole reference dataset, we might not identify these outliers - stands with a different history. Or in the worst case, these outliers might actually bias our gradients. And we are always talking about very slight gradients. So it's easy to bias these gradients, dilute them, so we actually won't get the results we need. To compile these kinds of reference datasets takes huge collaborative efforts because people need to go out into the field and collect the reference samples, and that might be scientists, that might be people from local communities, citizen scientists, managers, owners, government officials who provide background information, maps, distribution information and also in many parts of the world might protect the people who are actually collecting the samples. And it might be conservation activists and NGOs.
So once the samples have been collected they need to be stored somewhere for the long term and the information needs to be databased. And that is the work of scientific collections, which are mostly at natural history museums, and there the samples are processed. They're organized in ways that you can find them again. All the metadata is entered, which is what curators, collection managers, preparators and technical staff at the scientific collections do. So once we have these kinds of data sets, large scale data sets, what are we actually doing with them? So the foundation for all of our applications is population structure and there specifically population assignment. So the process is as follows. First, we decide on a question and design our project accordingly so that we can answer the question. Then we need to infer the population structure model and optimize it. In the next step we need to check if the model actually is good enough for application, because we might have found the best model, but it might still not be good enough for application. So we need to test that. And that is the step of population assignment or predictive assignment. And then in the end, we want to test our hypothesis. Are the two stands different, or does an individual come from stand A or from stand B? And here we identify error rates and accuracy. So this whole process is very statistical. And so the analysis of these reference data needs to be accompanied by biostatisticians who can tell us how to analyze our data. So what is the state of the art right now? What kind of geographic resolution do we actually get for non-model species currently? And I'm going to present the example of an African timber tree species, which is a very valuable timber. It's one example, but basically all results for species that have large distribution ranges, are continuously distributed and are also long-lived are very similar. So these kinds of results seem to be species independent.
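The assignment step described above can be illustrated with a minimal sketch. It assumes a simple frequency-based assignment test (genotype likelihoods under Hardy-Weinberg proportions, with leave-one-out validation to estimate accuracy), which is one classic approach and not necessarily the method used in the studies discussed in the talk. The stand names, the toy genotypes and all function names are hypothetical.

```python
import math
from collections import Counter

def allele_frequencies(genotypes, n_loci, n_alleles=2, pseudocount=1.0):
    """Per-locus allele frequencies; the pseudocount keeps every frequency above zero."""
    freqs = []
    for locus in range(n_loci):
        counts = Counter()
        for g in genotypes:
            counts.update(g[locus])  # g[locus] is a pair of alleles, e.g. (0, 1)
        total = sum(counts.values()) + pseudocount * n_alleles
        freqs.append({a: (counts[a] + pseudocount) / total for a in range(n_alleles)})
    return freqs

def log_likelihood(genotype, freqs):
    """Log-probability of a diploid genotype under Hardy-Weinberg proportions."""
    ll = 0.0
    for (a, b), f in zip(genotype, freqs):
        ll += math.log(f[a] * f[b] * (1 if a == b else 2))
    return ll

def assign(genotype, reference):
    """Return the reference population under which the genotype is most likely."""
    n_loci = len(genotype)
    scores = {pop: log_likelihood(genotype, allele_frequencies(gs, n_loci))
              for pop, gs in reference.items()}
    return max(scores, key=scores.get)

def leave_one_out_accuracy(reference):
    """Share of reference individuals assigned to their own population when held out."""
    correct = total = 0
    for pop, gs in reference.items():
        for i, g in enumerate(gs):
            held_out = dict(reference)
            held_out[pop] = gs[:i] + gs[i + 1:]
            correct += assign(g, held_out) == pop
            total += 1
    return correct / total

# Hypothetical toy reference data: two stands, three biallelic loci each.
reference = {
    "stand_A": [((0, 0), (0, 0), (0, 1)), ((0, 0), (0, 1), (0, 0)),
                ((0, 1), (0, 0), (0, 0)), ((0, 0), (0, 0), (0, 0))],
    "stand_B": [((1, 1), (1, 1), (0, 1)), ((1, 1), (0, 1), (1, 1)),
                ((0, 1), (1, 1), (1, 1)), ((1, 1), (1, 1), (1, 1))],
}
unknown = ((0, 0), (0, 1), (0, 0))

print(assign(unknown, reference))
print(leave_one_out_accuracy(reference))
```

With real data the stands would differ far less clearly than in this toy example, which is why the error rates from the validation step matter before any application.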
So the species are Milicia regia and Milicia excelsa, African teak, which cannot be grown in plantations for quality timber. So it is harvested unsustainably from natural forests. It's distributed in West, Central and East Africa. Here's a black rectangle. And a group of a dozen scientists got together and they actually sampled a reference dataset for these two species. It's over 400 samples, they analyzed four marker systems, resulting in a total of something like 100 genetic markers, and then they optimized the population model and used different parameter settings. And we're going to concentrate here on the best solution that they found. And basically this rectangle here is the black one over here. So the resolution is... they found population structure with clear clusters. So the populations and the species from West Africa can be distinguished from those populations in Central Africa. And the ones in East Africa can be differentiated. So that is really good. So we have population structure. We know there is signal. The problem is still that our resolution is much lower than we would need it to be, because we basically need resolution at least on a country level, because most of the laws are national. So it might be legal to harvest a tree in one country, but not in another country. So we need to get our resolution down to country level or even to regional level, if you want to distinguish whether the tree was harvested in a national park, in a protected area, or outside in a managed forest. And when we as biodiversity scientists don't know how to continue, one thing is to look at what people do with model organisms and specifically what people do in human population genomics, because there thousands of population geneticists are working and there is a completely different funding background due to the interest of the medical and the pharma industry. So they are always further advanced.
What we can learn from there, from human population genomics, is that we need two features. One is, we already know that we need distribution wide sampling, which provides a spatial context. The second feature is that we need genome wide sequencing, preferably genome sequencing, which provides us steps in time, because our genomes are archives of our evolutionary history. They are records of all the processes and events, and these steps in time then translate also into resolution. Once we have these two features, these reference datasets actually open Pandora's box. Suddenly we can ask all kinds of questions and pursue all kinds of objectives, even those that we still don't know. We can develop all kinds of applications, which is done for humans. Currently, there are at least four global datasets on human diversity. These are very widely reused and these big datasets - so they are big data with regard to the number of samples and also the genomes or the genome representations - result in very information rich data, which initiates analytical development, so people continuously are developing new statistical methods. And right now, a new wave of these methods is coming in. So once you have these global datasets, people in human population genomics started to do these intense regional samplings. And this is the example of the United Kingdom Biobank. It's a project with 500,000 volunteers, they are all UK citizens from all over the islands. And each individual was genotyped in a wet lab for 820,000 markers. That's completely, I mean, that's a different number than the 100 or 1000...in biodiversity science we normally analyze a maximum of a couple of 10,000 markers. So that's a completely different number. But then statistical geneticists come. They do some weird and wonderful voodoo and they derive 96 million markers per genome, that is per individual, from these 820,000 markers that were produced in the lab. So that's a hundred fold increase.
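As a quick check of the numbers just quoted, and assuming that the step from genotyped to derived markers is what statistical geneticists call genotype imputation (the talk does not name the method), the ratio works out to slightly more than a hundredfold:

$$\frac{96 \times 10^{6}\ \text{derived markers}}{8.2 \times 10^{5}\ \text{genotyped markers}} \approx 117$$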
And once you have this kind of dataset for a genome, you suddenly, or you finally, get country level and within country level resolution. So these panels are examples. So the first panel shows individuals who were born in Edinburgh and the question was "Where were people born who had a similar ancestral background, genetic background?". And what they found was that it was all over Scotland and Northern Ireland. For Northern Yorkshire it was even more local. So people from Yorkshire don't seem to get around a lot. For London the situation is completely different. That is what we would expect because London is a people magnet. People move there all the time. They meet there, they have children, and for the kids born in London, their genetic ancestry has nothing to do with London. It's from all over the place, from the British Isles and the world. So that's why the colors are strongly dissolved. So this study also came out this summer. And it's the first time that I have seen that we actually really can achieve regional resolution. And I find this possibility for biodiversity science really exciting. So it was made possible by very sophisticated statistical approaches which are able to analyze genetic data from highly complex evolutionary and ecological systems. And at the same time these analyses are able to handle big data. We're talking about gigabytes and terabytes of data and results. So statistical geneticists are developing new methods of data representation to handle this amount of data. And then we are able to sufficiently extract the signal for a very specific question from data which have a very low signal to noise ratio. So to get there, we need many experts and specialists. So we need statistical geneticists and big data experts, who also might contribute machine learning expertise. We need molecular biologists who know how to sequence complex genomes. We now need bioinformaticians with an expertise in genomics for assembly, annotation and alignment of genomic sequences.
The result is actually this: This is the author list for the 1000 Genomes Project reference dataset, and I don't expect you to be able to read it, but the bold type is of interest because it shows all the different tasks that are necessary to produce a standardized and highly cleaned reference dataset. So the whole author list is something like 1.5 pages long, even considering that some authors will have contributed to several tasks. The publications for reference datasets mostly have author lists that are far over 50 people. So they are huge collaborative efforts. Now we take the step into biodiversity science. Here these are eight gastrotrichs, they are little worm like... organisms that live in the sediments of freshwater lakes and in marine sediment. They are in general a couple of hundred micrometers large. And I don't have any numbers, but my guess would be that maybe worldwide, a hundred to a thousand people actually work on these species. There are 800 species of gastrotrichs. So let's say there's one, two, maybe three experts per species for these organisms. So how are these three people going to manage all these tasks to produce a reference dataset? You might say, well, it's gastrotrichs, I mean, I have never heard about them. Maybe they are not so important. Maybe you don't need reference data sets, but actually some of those species are bioindicators for water quality. So what we observe right now is a gap for biodiversity conservation. In model organisms, we have Pandora's box open. We have all the statistical analyses at our hands to analyze our data sets. However, in non-model organisms, we are still stuck with summary statistics that don't provide us with the resolution that we need. And we know that to close this gap, even for a single species, is a huge effort. But at the same time, we have over 35,000 species listed by CITES which already now need effective protection.
So we need to find a way to close this gap and actually move in this direction. So all of this is in biodiversity science, in academia, and we need to make the transition across the conservation genomics gap into the big loop of real world conservation tasks. And the good thing is we already know what we have to do. So we need to have reference data sets, distribution range wide. We need to have statistics. And it's going to be big data. So we need collection management, data management and an analysis environment. So looking at the different ingredients or different steps, the first thing we need is a general data infrastructure for global diversity reference data sets that actually can be used across species, for preferably as many species as possible, and that provides a working environment for biodiversity scientists and experts. It should be user friendly so it can be used by scientists, but also so that people from local communities and citizen scientists can add their observation data and their data into this data infrastructure. I have listed quite a lot of features that these kinds of infrastructures should have. And I'm going to argue that these features are not nice-to-haves, but actually must-haves. Because our goal is always application. So we need developers, managers and curators for data infrastructures. Since our goal is application, our main features are quality control and error reduction. These are the basis, so that our conservation tools can be robustly and reliably applied under real world operating conditions. And the way to achieve quality and error reduction is through chains of custody. So it means that from project design, from the questions, through all the steps that are necessary to produce a reference data set and then...so from sample collection, genomic and statistical analysis down to application, these steps need to be documented and standardized. Each one of them needs to be validated and reproducible.
They should be modular so they can be user friendly. And the whole chain of custody needs to be scalable. So if our chains of custody have these characteristics, we actually will have tools that will work in everyday life. So we need professional developers and programmers who are able to produce this very collaborative software. We need free and open source experts, so we always can ensure that our code and our infrastructures still have their integrity and we can check them. And I'm a biologist, I don't have any background in hardware, but I've heard a couple of talks here at the conference about Green IT. And I have the feeling we should have people who know hardware and software and know how to develop these high tech tools in a sustainable way, so that by developing these tools, we don't use more resources than we are trying to protect. So I've shown all these features and characteristics that the software should have. And I'm arguing that these features are necessary because of the reality we find ourselves in. It is one of rising over-exploitation and destruction of nature. So the extent of environmental crimes is up in the billions. All environmental crime together, the green bubbles, is second only to drug associated crimes. It is up there with counterfeiting or human trafficking. So these are multi-billion enterprises. They are often transnational, and industries with huge profits. So if there's some mafia boss, some criminal manager who just bribed a government official somewhere in their neck of the woods, it just would make sense that that person would not wait or not take the risk of being discovered just because some customs officer pulls out a container somewhere in the harbor, for example, opens it and says "This looks kind of weird. Let's take a sample, send it to a lab." and then a population geneticist comes back and says "Oh, yes, this sample is not from area A as documented, but actually it's from area B and it was illegally logged."
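One way to make a documented chain of custody tamper-evident can be sketched as follows. This is only an illustration under stated assumptions: a hash-chained log is a common general technique, not something proposed in the talk, and the step names, sample identifiers and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry

def _digest(step, details, prev_hash):
    """SHA-256 over a canonical JSON encoding of one custody step."""
    payload = json.dumps({"step": step, "details": details, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def record_step(chain, step, details):
    """Append a custody step whose hash also covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"step": step, "details": details, "prev": prev_hash,
             "hash": _digest(step, details, prev_hash)}
    chain.append(entry)
    return entry

def verify(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if _digest(entry["step"], entry["details"], prev_hash) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

# Hypothetical custody record for one reference sample.
chain = []
record_step(chain, "sample_collection", {"sample_id": "S-001", "site": "stand_A"})
record_step(chain, "sequencing", {"sample_id": "S-001", "lab": "lab-1"})
record_step(chain, "analysis", {"sample_id": "S-001", "assigned_to": "stand_A"})
print(verify(chain))
```

Such a log only reveals manipulation if the most recent hash is also kept somewhere the manipulator cannot reach, for example with an independent organization.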
If we have reference data sets, information rich reference data sets, they become highly valuable and they need protection themselves against manipulation and destruction. So we will need to think about IT security from the beginning. Also, these data sets are often very politically sensitive, because if it is shown that in a certain country illegal logging happens repeatedly, that country might not be too excited about this information. So we need IT security experts. So my hope is that these kinds of very high tech digital conservation tools can actually contribute to the U.N. Sustainable Development Goals by empowering indigenous people, local communities and also us to protect and sustainably use our lands and our biodiversity, by providing management and law enforcement tools. So we need people from around the world, users from around the world who use these tools and help to develop them further and to maintain them. And finally, these high tech tools will be just another technological fix if we don't manage to get our way of life down to sustainable levels. So this year, the Earth Overshoot Day was at the end of July. So at the end of July, we had used all the resources that we had available for the whole year. And we need to get this back to the end of the year so that our resources actually sustain us for the whole year. The graphic here for Germany suggests that we are on a good way. We are reducing our resource consumption and maybe even our biocapacity moves up a little bit. So actually it seems that our personal lifestyles and choices make a difference and we just need to close this gap here much quicker. So protecting biodiversity needs all of us to achieve that. And with that, thank you very much. Applause Angel: So thank you Jutta for this very interesting talk and the very valuable work you're doing. We have three mics here.
Please line up at the microphones if you have any questions or suggestions or want to participate and work together with Jutta. We have one question from the Internet, so please Signal-Angel start. Signal-Angel: Why do wild plant species within a genus are further apart than wild animal species within a genus? Angel: Could you repeat it, please? Signal-Angel: Why do wild plant species within a genus are further apart than wild animal species within a genus? Jutta: I'm not sure I understand the background for the question. Mic 1: Because animals move and plants don't move. Jutta: Oh, okay. If that is the idea behind the question. Plants actually move, too. They don't move as individuals, but they move their genetic material through pollen or fragments. So actually diversity in plants and in animals can be quite similar. So the idea is that plants are just stuck and should have a completely different population structure does not hold because plants move around their genetic material through seeds, through pollen, through vegetative propagules. Angel: So thank you microphone 1 for helping out. Please ask your question. Mic 1: So my question is about the success factor of it. If you think of this, whatever database being set up there and I think it's gonna be a huge database...I downloaded my own genome on the Internet. It was about 150 megabytes. And if we multiply that, I think the genetic variation from one person to another is about 1 percent only. So we can compress that to 4 megabytes per person. If we sequence all the humans in the world, that would be 32 petabytes, that would cost approximately 15 billion dollars. And that's only for the storage. Now comes the entire management. Of course, we don't want to digitize all the human genome, but rather the plants and animal species genome. So it's a huge data program. And what would be for you the success factors for this thing to really fly? 
And did you talk to organizations like WikiData or others, or where would it ideally be hosted? At a university or an international nonprofit, or who would be running the thing? Jutta: Yeah, I mean, it's just really big data. I think our first goal is not to think about having all predicted 5 to 10 million species sequenced on a population level. I think we need to think about the next step. And there it would make sense to start with species that are actually highly exploited, like many timber species and also many marine fishes. I think that's where we should start. And to host this kind of data, I think it should be in politically independent hands. So it should be with an NGO or with the U.N., some organization that is independent. Mic 1: Are you the first to think about this or are there existing initiatives? Jutta: There are actually existing initiatives. I have been in contact with the Forest Stewardship Council and they are actually starting to sample their concessions and have begun to build up the samples, they work together with Kew Botanical Gardens and the U.S. Forest Service. And right now they're analyzing the samples using isotopes, which is another method which is very powerful and can also produce geographic information. And so, yeah, so people are moving in this way. So, yeah, I think the idea is out there, we just have to start and we have to really do it and provide one infrastructure so that we can combine, for example, morphological data, isotope data and genomic data into one dataset, which will increase our resolution and our reliability. Angel: Okay. Microphone number two, please. Mic 2: Thank you for your valuable talk. My question would be: you started your talk with the possible decrease of leaf beetles. In the data set you showed on slide number six there was an increase in leaf beetle population until the 70s, something about that. Is there a possible explanation for that?
Jutta: Yeah, I believe there is, because people started to observe leaf beetles much more systematically. So it's sampling effort. And also at that time the people - so it's a multi-people collaboration that actually assembled this dataset - so the people who are part of this collaboration, they added their own private data sets. And that's why you have an increase, I think. While for the people from nineteen hundred, nineteen hundred ten, you only can use the data that is available in publications and samples in museums or in scientific collections. I think that is the reason why you have the sharp increase. Mic 2: Thank you. Angel: So we have another question at microphone number two. Mic 2: Thank you for your fine talk. Excuse me. Maybe my question is a bit off topic. Do you think the methods and roles that you identified in your talk could be transferred to the assessment of raw materials? I'm thinking about metals? Jutta: Maybe the data infrastructure, like if you wanted to collect raw metals or materials from all over the world in...a scientific collection and to have kind of a reference dataset, that might work, actually. But the genomics obviously won't. So for that part you would need to use different methods, from physics, obviously. But actually the infrastructure, certain parts will be quite similar. I think so, yes. Angel: So we have one more question from the Internet. Signal-Angel: Who does contract a freelance evolutionary biologist? Can you give an example of this kind of work you proposed? Jutta: So I see this gap between science and applications, that we need these applications and there's a huge potential for these applications. We know about illegal logging, and that is my background, but it doesn't seem to be much different, for example, in marine fisheries. We know that there is this huge amount of illegal logging and timber trade going on. And we need to have some assets actually that have the power to detect illegally traded timber.
So I think there is a huge need for these kinds of methods, and there are organizations who are interested in these kinds of methods. There are governments, there are companies, NGOs, customs, Interpol. So, yeah. Angel: Do we have any other questions? So thank you again Jutta for your talk and the valuable work you're doing. Please give a warm round of applause to Jutta. Applause 36C3 postroll music Subtitles created by c3subtitles.de in the year 2020. Join, and help us!