34c3 intro Hanno Böck: Yeah, so many of you probably know me from doing things around IT security, but I'm going to surprise you by almost not talking about IT security today. Instead I'm going to ask the question "Can we trust the scientific method?". I want to start with quite a simple example. If we do science, we start with a theory and then we try to test whether it's true, right? I said I'm not going to talk about IT security, but I chose an example from IT security, or kind of from IT security. There was a post on Reddit a while ago, a picture from some book which claimed that a Malachite crystal can protect you from computer viruses. Which, to me, doesn't sound very plausible. These are crystals, and if you put them on your computer, this book claims, they protect you from malware. But if we really want to know, we could do a study on this. And if you say people don't do studies on crazy things: that's wrong. People do studies on homeopathy and all kinds of crazy things that are completely implausible. So we can do a study on this, and what we will do is a randomized controlled trial, which is the gold standard for testing these kinds of things. This is our question: "Do Malachite crystals prevent malware infections?", and our study design is: we take a group of maybe 20 computer users and split them randomly into two groups. One group gets one of these crystals and is told: "Put it on your desk or on your computer.". The other group is our control group. That's very important, because if we want to know whether the crystals help, we need another group to compare against. And to rule out any kind of placebo effect, we give the control group a fake Malachite crystal, so we can compare the two groups against each other. Then we wait for maybe six months and check how many malware infections they had. Now, I didn't do that study, but I simulated it with a Python script, and given that I don't believe this theory is true, I simulated it as random data. I'm not going to go through the whole script, but I'm assuming there can be between 0 and 3 malware infections per person, it's totally random, and then I compare the two groups. And then I calculate something called a p-value, which is a very common thing in science whenever you do statistics. A p-value is, it's a bit technical, but it's the probability that you would get a result at least this extreme if there were no real effect. Which, put another way, means that in an idealized world one out of 20 such results is a false positive: one of them says something happens although it doesn't. And in many fields of science a p-value of 0.05 is considered significant, which corresponds to exactly that one error in twenty studies, but, as I said, only under idealized conditions. Since it's a script and I can run it in less than a second, I just ran it twenty times instead of once. So here are my 20 simulated studies, and most of them don't look very interesting: there are a few random variations, but nothing significant. Except for this one study, which says the people with the Malachite crystal had on average 1.8 malware infections and the people with the fake crystal had 0.8. So the crystal actually made it worse. But this result is significant, because it has a p-value of 0.03.
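For concreteness, here is a minimal sketch of what such a simulation could look like. The talk does not show the original script, so the group size of 10 per arm, the uniform 0-3 infection counts, and the use of Welch's t-test are assumptions made purely for illustration.

```python
# Minimal sketch of the simulated "Malachite" trial described above.
# Group size, the uniform 0-3 infection counts, and the choice of
# Welch's t-test are assumptions; the original script is not shown.
import random
from scipy import stats

def simulate_study(n_per_group=10):
    # Both groups are drawn from the same distribution: there is no real effect.
    crystal = [random.randint(0, 3) for _ in range(n_per_group)]
    fake = [random.randint(0, 3) for _ in range(n_per_group)]
    _, p = stats.ttest_ind(crystal, fake, equal_var=False)
    return sum(crystal) / n_per_group, sum(fake) / n_per_group, p

for i in range(20):  # run 20 "studies", as in the talk
    mean_crystal, mean_fake, p = simulate_study()
    flag = "  <-- 'significant'" if p < 0.05 else ""
    print(f"study {i+1:2d}: crystal {mean_crystal:.1f}  fake {mean_fake:.1f}  p={p:.3f}{flag}")
```

Run this a few times and, on average, about one of the twenty pure-noise "studies" will cross the 0.05 line, which is the whole point of the example.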
So of course we can publish that, assuming I had really done these studies. applause Hanno: And the other studies we just forget about. They were not interesting, right, and who cares about non-significant results? Okay, so you have just seen that I created a significant result out of random data. And that's concerning, because in science you can really do that. This phenomenon is called publication bias. What's happening here is that you do studies, and if they get a positive result, meaning you see an effect, you publish them, and if there's no effect you just forget about them. We learned earlier that a p-value of 0.05 means 1 in 20 studies is a false positive, but you usually don't see the studies that are not significant, because they don't get published. And you may wonder: "Okay, what's stopping a scientist from doing exactly this? What's stopping a scientist from just doing experiments until one of them looks like a real result, although it's just a random fluke?". And the disconcerting answer is: usually nothing. And this is not just a theoretical example. I want to give you an example that had quite some impact and was researched very well, and that is the research on antidepressants, so-called SSRIs. In 2008 there was a study, and the interesting situation here was that the US Food and Drug Administration, which is the authority that decides whether a medical drug can be put on the market, had knowledge of all the studies that had been done to register these medications. Some researchers looked at that and compared it with what had been published. They found that there were 38 studies which saw that these medications had a real effect, real improvements for patients, and of those 38 studies, 37 got published. But there were also 36 studies that said: "These medications don't really have any effect, they are not really better than a placebo.", and of those only 14 got published. And even of those 14, there were 11 where the researchers said the authors had spun the result so that it sounded like these medications do something. And there were also a bunch of studies that were simply not published because they had a negative result. It's clear that if you look only at the published studies and ignore the unpublished studies with negative results, these medications look much better than they really are. Unlike the earlier example, there is a real effect from antidepressants, but they are not as good as people believed in the past. So we've learned that, in theory, with publication bias you can create a result out of nothing. But if you're a researcher and you have a theory that's not true and you really want to publish something about it, that's not very efficient, because you have to do 20 studies on average to get one of these random results that looks real. So there are more efficient ways to get to a result from nothing. If you're doing a study, there are a lot of micro-decisions you have to make. For example, you may have dropouts from your study, where people move to another place or you can no longer reach them, so they are no longer part of your study, and there are different ways you can handle that. Then you may have corner-case results, where you're not entirely sure: "Is this an effect or not, and how do you decide? How exactly do you measure?".
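To see why publishing only the positive studies makes a treatment look better than it is, here is a hedged toy simulation. The true effect size, sample size, and noise level are invented numbers, not taken from the SSRI data; the sketch only shows the mechanism.

```python
# Toy illustration of publication bias: simulate many studies of a drug with
# a modest true effect, then compare the average effect across *all* studies
# with the average across only the "published" (positive and significant) ones.
# All numbers here are made up for illustration.
import random
import statistics
from scipy import stats

random.seed(0)
TRUE_EFFECT = 0.2   # small real improvement, in arbitrary units
N = 30              # patients per arm, per study

def run_study():
    treated = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(N)]
    control = [random.gauss(0.0, 1.0) for _ in range(N)]
    diff = statistics.mean(treated) - statistics.mean(control)
    _, p = stats.ttest_ind(treated, control)
    return diff, p

results = [run_study() for _ in range(1000)]
published = [d for d, p in results if p < 0.05 and d > 0]  # only positive, significant studies

print("true effect:                 ", TRUE_EFFECT)
print("mean effect, all studies:    ", round(statistics.mean([d for d, _ in results]), 2))
print("mean effect, published only: ", round(statistics.mean(published), 2))
```

The "published only" average comes out well above the true effect, which is exactly the pattern described for the antidepressant literature: a real but modest effect that looks much larger in the published record.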
And then you may be looking at different things: maybe there are different tests you can do on people, and you may control for certain variables, like "Do you analyze men and women separately?" or "Do you separate them by age?". So there are many decisions you can make while doing a study, and each of these decisions has a small effect on the result. And it may very often be that just by trying all the combinations you get a p-value that looks statistically significant, although there's no real effect. There's a term for this called p-hacking, which means you keep adjusting your methods until you get a significant result. And I'd like to point out that this is usually not a scientist saying: "Okay, today I'm going to p-hack my result, because I know my theory is wrong but I want to show it's true.". It's a subconscious process, because usually scientists believe in their theories. Honestly. They honestly think that their theory is true and that their research will show that. So they may subconsciously say: "Okay, if I analyze my data like this it looks a bit better, so I'll do that.". Subconsciously, they may p-hack themselves into getting a result that isn't really there. And again we can ask: "What is stopping scientists from p-hacking?". And the concerning answer is the same: usually nothing. And so I came to the conclusion that the scientific method is a way to create evidence for whatever theory you like, no matter whether it's true or not. You may say that's a pretty bold thing to say, and I'm saying it even though I'm not even a scientist, I'm just some hacker. But I'm not alone in this. There's a paper by a famous researcher, John Ioannidis, titled "Why most published research findings are false". He published this in 2005, and if you look at the title, he doesn't really question that most research findings are false; he only wants to give reasons why this is the case. He makes some very plausible assumptions, for example that many negative results don't get published and that there is some bias, and he comes to the very plausible conclusion that this is indeed the case. And this is not even very controversial: if you ask people who do what you could call science on science, or meta-science, people who look at scientific methodology, they will tell you: "Yeah, of course that's the case.". Some will even say: "Yeah, that's how science works, that's what we expect.". But I find it concerning. And if you take this seriously, it means that if you read about a study, for example in a newspaper, the default assumption should be "that's not true", while we usually tend to think the opposite. And if science is a method to create evidence for whatever you like, you can think about something really crazy, like "Can people see into the future? Does our mind have some kind of extra perception where we can sense things that will happen in an hour?". There was a psychologist called Daryl Bem who thought that this is the case, and he published a study on it titled "Feeling the future". He did a lot of experiments where he did something, and then something happened later, and he thought he had statistical evidence that what happened later influenced what happened earlier. I don't think that's very plausible, based on what we know about the universe, but it was published in a real psychology journal. And a lot of things were wrong with this study.
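A hedged sketch of how this kind of analytic flexibility inflates false positives: on data with no real effect, several plausible-looking analysis variants are tried and only the best p-value is kept. The specific variants here (subgroup splits, excluding "dropouts") are invented for illustration and not taken from any real study.

```python
# Sketch of p-hacking: on purely random data, try several analysis "choices"
# (whole sample, first half as "men", second half as "women", excluding two
# "dropouts") and keep the smallest p-value. The variants are invented; the
# point is only that this flexibility pushes false positives well above 5%.
import random
from scipy import stats

random.seed(1)

def analyses(treated, control):
    """Yield p-values for several plausible-looking analysis variants."""
    yield stats.ttest_ind(treated, control).pvalue                                        # everyone
    yield stats.ttest_ind(treated[:len(treated)//2], control[:len(control)//2]).pvalue    # "men"
    yield stats.ttest_ind(treated[len(treated)//2:], control[len(control)//2:]).pvalue    # "women"
    yield stats.ttest_ind(treated[2:], control[2:]).pvalue                                # drop two "dropouts"

def hacked_study(n=40):
    treated = [random.gauss(0, 1) for _ in range(n)]  # no real effect in either group
    control = [random.gauss(0, 1) for _ in range(n)]
    return min(analyses(treated, control))            # report the best-looking p-value

trials = 2000
false_positives = sum(hacked_study() < 0.05 for _ in range(trials))
print(f"'significant' results on pure noise: {false_positives / trials:.0%}")  # well above 5%
```

Even with only four variants per study, the rate of "significant" findings on pure noise clearly exceeds the nominal 5%, and a real study offers far more than four such choices.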
Basically, it's a very nice example of p-hacking, and there's even a book by Daryl Bem where he describes something that basically looks like p-hacking and says that this is how you do psychology. But the study was absolutely in line with the existing standards in experimental psychology, and a lot of people found that concerning. Because if you can show that precognition is real, that you can see into the future, then what else can you show, and how can we trust our results? Psychology has debated this a lot in the past couple of years, so there's a lot of talk about the replication crisis in psychology. Many effects that psychologists simply thought were true turned out not to appear when they tried to repeat the experiments, even though entire subfields were built on these results. I want to show you an example, one of the ones that is not discussed so much. There's a theory called moral licensing, and the idea is that if you do something good, or something you think is good, then afterwards you basically behave like an asshole, because you think "I already did something good, now I don't have to be so nice anymore.". There were some famous studies with the theory that people who consume organic food later become more judgmental, or less social, less nice to their peers. But just last week someone tried to replicate these original experiments. They tried it three times, with more subjects and better research methodology, and they totally couldn't find that effect. Yet what you've seen is lots of media articles about the original studies; I have not found a single article reporting that this could not be replicated. Maybe they will still come, but that's just a very recent example. Now I want to give you a small warning, because you may think "yeah, these psychologists, that all sounds very fishy, they even believe in precognition", but maybe your field is not much better, maybe you just don't know it yet, because nobody has started replicating studies in your field. There are other fields that have replication problems, some much worse. For example, the pharma company Amgen published something in 2012 where they said: "We have tried to replicate cancer research and preclinical research", that is, work in a petri dish or animal experiments, so not drugs on humans but what happens before you develop a drug, and they were not able to replicate 47 out of 53 studies. And these were, they said, landmark studies, so studies that had been published in the best journals. Now there are a few problems with this publication, because they have not published their replications and have not told us which studies they could not replicate. In the meantime, I think, they have published three of these replications, but most of it is still in the dark, which points to another problem: they did this in collaboration with the original researchers, and they could only do so by agreeing not to publish the results. It still sounds very concerning. But some fields don't have a replication problem simply because nobody is trying to replicate previous results, and then you will never know whether your results hold up. So what can be done about all this? Fundamentally, I think the core issue is that the scientific process is tied to its results: we do a study, and only after that do we decide whether it's going to be published.
Or we do a study and only after we have the data do we try to analyze it. So essentially we need to decouple the scientific process from its results, and one way of doing that is pre-registration. What you do there is that before you start a study, you register it in a public registry and say: "I'm going to do a study on this medication, or on this psychological effect, and this is how I'm going to do it.", and later people can check whether you really did that. This is more or less standard practice in medical drug trials, and the summary is: it does not work very well, but it's better than nothing. The problem is mostly enforcement: people register a study and then don't publish it, and nothing happens to them, even though they are legally required to publish it. There are two campaigns I'd like to point out. There's the AllTrials campaign, which was started by Ben Goldacre, a doctor from the UK, and they demand that every trial done on a medication should be published. And there's also a project by the same person, the COMPare project, where they check whether a medical trial that was registered and later published actually did what was registered, or whether the protocol was changed, and whether there was a reason for the change or whether it was changed just to get a result they otherwise wouldn't have gotten. But then again, these issues in medicine often get a lot of attention, and for good reasons, because if we have bad science in medicine, people die; that's pretty immediate and pretty massive. But whenever you read about this, you have to keep in mind that drug trials at least have pre-registration; most scientific fields don't bother doing anything like that. So whenever you hear about publication bias in medicine, you should always think: the same thing happens in many fields of science, and usually nobody is doing anything about it. And particularly to this audience I'd like to say: there's currently a big trend that people from computer science want to revolutionize medicine with big data and machine learning, which in principle is okay, but I know a lot of people in medicine are very worried about this. The reason is that these computer science people don't have the scientific standards that people in medicine expect, and might say: "We don't really need to do a study on this, it's obvious that this helps.". That is worrying, and I come from computer science and I understand very well why people from medicine are worried about this. There's an idea that goes even further than pre-registration, and it's called registered reports. A couple of years ago some scientists wrote an open letter that was published in the Guardian, and the idea is that you turn the scientific publication process upside down: if you want to do a study, the first thing you do with a registered report is submit your study design protocol to the journal, and the journal decides whether it will publish the study before seeing any result. That way you can prevent publication bias, and you prevent journals from only publishing the nice findings and ignoring the negative ones. Then you do the study, and it gets published independent of what the result was.
And there are of course other things you can do to improve science. There's a lot of talk about sharing data, sharing code, sharing methods, because if you want to replicate a study it's of course easier if you have access to all the details of how the original study was done. Then you could do large collaborations, because many studies are just too small: if you have a study with twenty people, you just don't get a very reliable outcome. So in many situations it would be better to get together ten teams of scientists and let them all do one big study together, and then you can reliably answer a question. And some people propose simply demanding higher statistical thresholds, because a p-value of 0.05 means practically nothing. There was recently a paper arguing for moving the decimal point one place to the left, to 0.005, and that alone would solve a lot of problems. In physics, for example, they have something called five sigma, which corresponds to a p-value of roughly 0.0000003, so in physics they have much higher statistical thresholds. Now, whatever scientific field you're working in, you might ask yourself: "If we have statistical results, are they pre-registered in any way? Do we publish negative results, where we tested an effect and got nothing? And are there replications of all relevant results?". I would say that if you answer all these questions with "no", which I think many people will, then you're not really doing science; what you're doing is the alchemy of our time. Applause Thanks. Herald: Thank you very much. Hanno: No, I have more, sorry, I have three more slides, that was not the finishing line. A big issue is also that there are bad incentives in science. A very standard way to evaluate the impact of science is citation counts: if your study is cited a lot, that's considered a good thing, and if your journal is cited a lot, that's a good thing, and that's for example the impact factor, but there are also other measurements. And universities like publicity, so if your study gets a lot of media reports, then your press department likes you. These incentives tend to favor interesting results, but they don't favor correct results, and that's bad, because realistically most results are not that interesting; most results will be "We had this interesting and counterintuitive theory, and it's totally wrong.". And then there's this idea that science is self-correcting: if you confront scientists with these issues, with publication bias and p-hacking, surely they will immediately change, that's what scientists do, right? I want to cite something here, sorry, it's a bit long: "There is some evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published.". That sounds like publication bias. And then it also says: "Significant results published in these fields are seldom verified by independent replication.", so it seems there's a replication problem. These wise words were written in 1959 by a statistician called Theodore Sterling, and because science is so self-correcting, in 1995 he complained that "this article presents evidence that published results of scientific investigations are not a representative sample of all scientific studies" and that "these results also indicate that practice leading to publication bias has not changed over a period of 30 years". And here we are in 2018, and publication bias is still a problem.
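The thresholds mentioned above can be put side by side. A small sketch, assuming the physics convention is the one-sided tail beyond five standard deviations (which works out to roughly the 0.0000003 figure); the "per 10,000 null studies" numbers describe only the idealized case with no publication bias or p-hacking.

```python
# Compare the common 0.05 threshold, the proposed 0.005, and the particle
# physics "five sigma" convention, and how many false positives each would
# let through per 10,000 true-null studies in the idealized case.
from scipy.stats import norm

thresholds = {
    "p < 0.05 (common)":    0.05,
    "p < 0.005 (proposed)": 0.005,
    "5 sigma (physics)":    norm.sf(5),  # one-sided tail beyond 5 standard deviations
}

for name, p in thresholds.items():
    print(f"{name:22s} p = {p:.7f}  ->  ~{p * 10_000:.4g} false positives per 10,000 null studies")
```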
So if science is self-correcting, then it's pretty damn slow in correcting itself, right? And finally I would like to ask you whether you're prepared for boring science, because ultimately, I think, we have a choice between what I would like to call TED-talk science and boring science. Applause With TED-talk science we get mostly positive, surprising, and interesting results; we have large effects, many citations, lots of media attention, and you may get a TED talk out of it. Unfortunately, it's usually not true. And I would like to propose boring science as the alternative, which is mostly negative results, pretty boring, small effects, but it may be closer to the truth. I would like to have boring science, but I know it's a pretty tough sell. Sorry, I didn't hear that. Yeah, thanks for listening. Applause Herald: Thank you. Hanno: Two questions, or? Herald: We don't have that much time for questions, three minutes, three minutes guys. Question one, shoot. Mic: This isn't a question, I just wanted to comment: Hanno, you missed out a very critical topic here, which is the use of Bayesian probability. You conflated p-values with the scientific method, which gave the rest of your talk, I felt, a slightly unnecessary anti-science slant. p-values aren't the be-all and end-all of the scientific method: a p-value is sort of calculating the probability that your data would happen given that the null hypothesis is true, whereas Bayesian probability would be calculating the probability that your hypothesis is true given the data, and more and more scientists are slowly starting to realize that this method is probably a better way of doing science than p-values. So this is probably a third alternative to your proposal of boring science: doing Bayesian statistics instead. Hanno: Sorry, yeah, I agree with you, unfortunately I only had half an hour here. Herald: Where are you going after this lecture, can they find you somewhere, in the bar? Hanno: I know him. Herald: You know, science is broken, but then scientists, it's a little bit like the next lecture that's waiting there, it's like: "you scratch my back and I scratch yours for publication". Hanno: Maybe two more minutes? Herald: One minute. Please go ahead. Mic: Yeah, hi, thank you for your talk. I'm curious: you've raised ways we can address this assuming good actors, people who want to do better science, where this happens out of ignorance or willful ignorance. What do we do about bad actors? For example, in the medical community, drug companies may really like the idea of being profitably incentivized to make what is essentially a placebo look like it does something in these randomized controlled trials. How do we begin to address people trying to maliciously p-hack or maliciously abuse the pre-registration system, or something like that? Hanno: I mean, it's a big question, right? But I think if the standards confine you so much that there's not much room to cheat, that's a way out. And also, I don't think deliberate cheating is that much of a problem; I really think the bigger problem is people who honestly believe that what they do is true. Herald: Okay, one last, you sir, please? Mic: So the value in science is often a count of publications, right?
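The contrast the questioner raises between p-values and Bayesian analysis can be made concrete with a toy example. The counts, the uniform Beta(1, 1) priors, and the choice of Fisher's exact test are all invented for illustration; this is one simple way to do a Bayesian comparison, not a full treatment.

```python
# Toy contrast of the two views: a p-value asks how surprising the data would
# be if there were no effect, while a Bayesian analysis asks how plausible the
# effect is given the data. All counts and priors here are made up.
import numpy as np
from scipy import stats

# Made-up counts: users with at least one infection, out of 10 per group.
infected_crystal, n_crystal = 7, 10
infected_fake, n_fake = 4, 10

# Frequentist view: p-value for "no difference" (Fisher's exact test).
_, p_value = stats.fisher_exact([[infected_crystal, n_crystal - infected_crystal],
                                 [infected_fake, n_fake - infected_fake]])

# Bayesian view: posterior probability that the crystal group's infection rate
# is actually higher, using independent Beta(1, 1) priors and posterior sampling.
rng = np.random.default_rng(0)
rate_crystal = rng.beta(1 + infected_crystal, 1 + n_crystal - infected_crystal, 100_000)
rate_fake = rng.beta(1 + infected_fake, 1 + n_fake - infected_fake, 100_000)

print(f"p-value (data given no effect):     {p_value:.2f}")
print(f"P(crystal rate > fake rate | data): {(rate_crystal > rate_fake).mean():.2f}")
```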
A count of citations and so on. So is it true that, to improve the situation you've described, it's the journals, the ones who decide what gets published, that should impose higher standards and raise the bar, for example by enforcing the publication of protocols before accepting studies? Is it the journals who should do the work here, or can we regular scientists do something as well? Hanno: I mean, you can publish in the journals that have better standards, right? There are journals that have these registered reports, but of course as a single scientist it's always difficult, because you're playing in a system that has all these wrong incentives. Herald: Okay guys, that's it, we have to shut down. Please: there is a reference, better science dot-org, go there, and one last request: give a really warm applause! Applause 34c3 outro subtitles created by c3subtitles.de in the year 2018. Join, and help us!