34c3 intro

Hanno Böck: Yeah, so many of you probably know me from doing things around IT security, but I'm going to surprise you by almost not talking about IT security today. Instead I'm going to ask the question "Can we trust the scientific method?". I want to start with quite a simple example. When we do science, we start with a theory and then we try to test whether it's true, right? Now, I said I'm not going to talk about IT security, but I chose an example from IT security, or kind of from IT security. A while ago there was a post on Reddit, a picture from some book which claimed that a Malachite crystal can protect you from computer viruses. Which, to me, doesn't sound very plausible. These are crystals, and if you put them on your computer, this book claims, they protect you from malware. But if we really want to know, we could do a study on this. And if you say people don't do studies on crazy things: that's wrong. People do studies on homeopathy and all kinds of crazy things that are completely implausible. So we can do a study on this, and what we will do is a randomized controlled trial, which is the gold standard for testing these kinds of things. This is our question: "Do Malachite crystals prevent malware infections?" And our study design is: we take a group of maybe 20 computer users and split them randomly into two groups. One group gets one of these crystals and the instruction: "Put it on your desk or on your computer." The other group is our control group. That's very important, because if we want to know whether the crystals help, we need another group to compare against. And to rule out any kind of placebo effect, we give the control group a fake Malachite crystal, so we can compare the two groups against each other. Then we wait for maybe six months and check how many malware infections they had. Now, I didn't do that study, but I simulated it with a Python script, and given that I don't believe this theory is true, I simulated it as random data. I'm not going to go through the whole script, but I'm assuming there can be between 0 and 3 malware infections per person, totally at random, and then I compare the two groups. And then I calculate something called a p-value, which is a very common thing in science whenever you do statistics.
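A minimal sketch of what such a simulation script might look like; the talk does not show the original code, so the group size of ten and the choice of a two-sample t-test here are assumptions:

```python
# Sketch of the simulated crystal study: two groups of pure noise,
# so by construction there is no real effect to find.
import numpy as np
from scipy import stats

rng = np.random.default_rng()

def simulated_study(group_size=10):
    # Each participant gets a random number of 0..3 malware infections.
    crystal = rng.integers(0, 4, size=group_size)
    fake = rng.integers(0, 4, size=group_size)
    _, p = stats.ttest_ind(crystal, fake)  # compare the two groups
    return crystal.mean(), fake.mean(), p

crystal_mean, fake_mean, p = simulated_study()
print(f"crystal: {crystal_mean:.1f}  fake: {fake_mean:.1f}  p = {p:.3f}")
```

If you run a sketch like this twenty times, the odds are good that one run comes out "significant" at p < 0.05, which is exactly the effect demonstrated next.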
A p-value is, it's a bit technical, but it's the probability that you would get this result if there were no effect. Put another way: in an idealized world, if you have 20 results, one of them is a false positive, meaning one of them says something happens although it doesn't. In many fields of science a p-value of 0.05 is considered significant, and that corresponds to those twenty studies: one error in twenty studies, but, as I said, under idealized conditions. And since it's a script and I can run it in less than a second, I just ran it twenty times instead of once. So here are my 20 simulated studies, and most of them look not very interesting: of course there are a few random variations, but nothing significant. Except for this one study. It says the people with the Malachite crystal had on average 1.8 malware infections and the people with the fake crystal had 0.8. So actually the crystal made it worse. But this result is significant, because it has a p-value of 0.03. So of course we can publish that, assuming I had really done these studies.

applause

Hanno: And the other studies we just forget about. I mean, they were not interesting, right? And who cares about non-significant results... Okay, so you have just seen that I created a significant result out of random data. And that's concerning, because people in science can really do that. This phenomenon is called publication bias. What's happening here is that you do studies, and if they get a positive result, meaning you see an effect, then you publish them, and if there's no effect, you just forget about them. We learned earlier that a p-value of 0.05 means 1 in 20 studies is a false positive, but you usually don't see the studies that are not significant, because they don't get published. And you may wonder: "Okay, what's stopping a scientist from doing exactly this? What's stopping a scientist from doing experiments until one of them looks like a real result, although it's just a random fluke?" And the disconcerting answer is: usually nothing. And this is not just a theoretical example. I want to give you an example that had quite some impact and that was researched very well, and that is research on antidepressants, so-called SSRIs.
In 2008 there was a study, and the interesting situation here was that the US Food and Drug Administration, the authority that decides whether a medical drug can be put on the market, had knowledge of all the studies that had been done to register these medications. Some researchers looked at that and compared it with what had been published. They figured out there were 38 studies showing that these medications had a real effect, real improvements for patients, and of those 38 studies, 37 got published. But there were also 36 studies saying these medications don't really have any effect, that they are not really better than a placebo, and of those, only 14 got published. And even among those 14, there were 11 where the researchers said the authors had spun the result so that it sounds like these medications do something. So there were a bunch of studies that were simply not published because they had a negative result. And it's clear that if you look only at the published studies and ignore the unpublished studies with negative results, then these medications look much better than they really are. Unlike in the earlier example, there is a real effect from antidepressants, but they are not as good as people believed in the past.

So we've learned that, in theory, with publication bias you can create a result out of nothing. But if you're a researcher with a theory that's not true and you really want to publish something about it, that's not very efficient, because you have to do 20 studies on average to get one of these random results that looks real. There are more efficient ways to get a result from nothing. If you're doing a study, there are a lot of micro decisions you have to make. For example, you may have dropouts from your study, people who move to another place or whom you can no longer reach, so they are no longer part of your study, and there are different ways to handle that. Then you may have corner-case results where you're not entirely sure: is this an effect or not, and how do you decide? How exactly do you measure? You may also be looking for different things; maybe there are different tests you can run on people. And you may control for certain variables: do you analyze men and women separately, or do you separate subjects by age? So there are many decisions you can make while doing a study.
And of course each of these decisions has a small effect on the result. Very often, just by trying all the combinations, you will get a p-value that looks statistically significant although there's no real effect. There's a term for this, p-hacking, which means adjusting your methods until you get a significant result. And I'd like to point out that this is usually not a scientist saying: "Okay, today I'm going to p-hack my result, because I know my theory is wrong but I want to show it's true." It's a subconscious process, because usually scientists believe in their theories. Honestly. They honestly think their theory is true and that their research will show that. So they may subconsciously say: "Okay, if I analyze my data like this, it looks a bit better, so I'll do that." Subconsciously, they may p-hack themselves into a result that isn't really there. And again we can ask: "What is stopping scientists from p-hacking?" And the concerning answer is the same: usually nothing. So I came to the conclusion that the scientific method is a way to create evidence for whatever theory you like, no matter whether it's true or not. You may say that's a pretty bold thing to say, and I'm saying it even though I'm not even a scientist, I'm just some hacker. But I'm not alone in this. There's a paper by a famous researcher, John Ioannidis, titled "Why Most Published Research Findings Are False." He published this in 2005, and if you look at the title, he doesn't really question that most research findings are false; he only wants to give reasons why this is the case. He makes some very plausible assumptions, given that many negative results don't get published and that you will have some bias, and he comes to the very plausible conclusion that this is indeed the case. And this is not even very controversial. If you ask people who do what you might call science on science, or meta-science, people who look at scientific methodology, they will tell you: "Yeah, of course that's the case." Some will even say: "Yeah, that's how science works, that's what we expect." But I find it concerning. And if you take this seriously, it means: if you read about a study, say in a newspaper, the default assumption should be that it's not true, while we might usually assume the opposite.
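A minimal sketch of this kind of p-hacking; the analysis variants below (removing "dropouts", excluding an "outlier", switching to a different test) are hypothetical stand-ins for the micro decisions described above, applied to pure null data:

```python
# How trying several analysis variants inflates the false positive rate,
# even though both groups are drawn from the same distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng()

def analyses(a, b):
    """Yield p-values from several defensible-looking analysis choices."""
    yield stats.ttest_ind(a, b).pvalue               # the plain comparison
    yield stats.mannwhitneyu(a, b).pvalue            # a different test
    yield stats.ttest_ind(a[:-2], b).pvalue          # two "dropouts" removed
    yield stats.ttest_ind(a[a < a.max()], b).pvalue  # one "outlier" excluded

n_experiments = 1000
false_positives = sum(
    any(p < 0.05 for p in analyses(rng.normal(size=30), rng.normal(size=30)))
    for _ in range(n_experiments)
)

# With one pre-specified test this would be about 5%; with four
# correlated tries it comes out noticeably higher.
print(f"false positive rate: {false_positives / n_experiments:.1%}")
```

The point is not any single trick: each variant looks defensible on its own, but the freedom to pick among them after seeing the data is what manufactures significance.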
And if science is a method to create evidence for whatever you like, you can think about something really crazy, like: can people see into the future? Does our mind have some extrasensory perception where we can sense things that will happen in an hour? There was a psychologist called Daryl Bem who thought that this is the case, and he published a study on it, titled "Feeling the Future". He did a lot of experiments where he did something, and then something happened later, and he thought he had statistical evidence that what happened later influenced what happened earlier. I don't think that's very plausible, based on what we know about the universe, but it was published in a real psychology journal. And a lot of things were wrong with this study. Basically, it's a very nice example of p-hacking; there's even a book by Daryl Bem where he describes something that basically looks like p-hacking and says that's how you do psychology. But the study was absolutely in line with the existing standards in experimental psychology. And a lot of people found that concerning. If you can show that precognition is real, that you can see into the future, then what else can you show, and how can we trust our results? Psychology has debated this a lot in the past couple of years; there's a lot of talk about the replication crisis in psychology. Many effects that psychologists simply thought were true turned out not to hold: when they tried to repeat the experiments, they couldn't get the same results, even though entire subfields were built on them.

And I want to show you an example, one of the ones that is not discussed so much. There's a theory called moral licensing. The idea is that if you do something good, or something you think is good, then afterwards you basically behave like an asshole, because you think: I already did something good, now I don't have to be so nice anymore. There were some famous studies with the theory that people who consume organic food later become more judgmental, or less social, less nice to their peers. But just last week someone tried to replicate these original experiments. They tried three times, with more subjects and better research methodology, and they totally couldn't find the effect. What you've seen, though, is lots of media articles about the original result. I have not found a single article reporting that it could not be replicated. Maybe those will come, but yeah, that's just a very recent example.
But now I want to give you a small warning, because you may think: "Yeah, these psychologists, that all sounds very fishy, and they even believe in precognition." But maybe your field is not much better; maybe you just don't know it yet, because nobody has started replicating studies in your field. There are other fields that have replication problems, and some are much worse. For example, the pharma company Amgen published something in 2012 where they said: "We have tried to replicate cancer research, preclinical research", that is, work in a petri dish or animal experiments, not drugs tested on humans, but what happens before you develop a drug. And they were unable to replicate 47 out of 53 studies. And these were, they said, landmark studies, studies that had been published in the best journals. Now, there are a few problems with this publication, because they did not publish their replications, and they did not tell us which studies they could not replicate. In the meantime I think they have published three of these replications, but most of it remains in the dark. That points to another problem: they said they did this in collaboration with the original researchers, and they could only do so by agreeing not to publish the results. It still sounds very concerning. But some fields don't have a replication problem only because nobody is trying to replicate previous results; then you will never know whether your results hold up. So what can be done about all this? Fundamentally, I think the core issue is that the scientific process is tied together with results: we do a study, and only afterwards do we decide whether it's going to be published. Or we do a study, and only once we have the data do we try to analyze it. Essentially, we need to decouple the scientific process from its results. One way of doing that is pre-registration: before you start doing a study, you register it in a public register and say "I'm going to do a study on this medication, or on this psychological effect, and this is how I'm going to do it", and later people can check whether you really did that. This is more or less standard practice in medical drug trials, and the summary is: it does not work very well, but it's better than nothing. The problem is mostly enforcement: people register a study and then don't publish it, and nothing happens to them, even though they are legally required to publish it.
And there are two campaigns I'd like to point out. There's the AllTrials campaign, which was started by Ben Goldacre, a doctor from the UK, and they demand that every trial done on a medication should be published. There's also a project by the same guy, the COMPare project, which checks whether a medical trial that was registered and later published actually did what was registered: did they change something in their protocol, and was there a reason for it, or did they just change it to get a result they otherwise wouldn't get? Then again, these issues in medicine often get a lot of attention, and for good reasons, because if we have bad science in medicine, then people die; that's pretty immediate and pretty massive. But whenever you read about this, you have to keep in mind that drug trials at least have pre-registration; most scientific fields don't bother doing anything like that. So whenever you hear about publication bias in medicine, you should always think: the same thing happens in many fields of science, and usually nobody is doing anything about it. And particularly to this audience I'd like to say: there's currently a big trend of people from computer science wanting to revolutionize medicine, with big data and machine learning and these things. In principle that's okay, but I know a lot of people in medicine are very worried about it, and the reason is that these computer science people don't have the scientific standards that people in medicine expect. They might say: "We don't really need to do a study on this, it's obvious that it helps." And that is worrying. I come from computer science, and I understand very well why people from medicine are worried about this. Then there's an idea that goes even further than pre-registration, called registered reports. A couple of years ago some scientists wrote an open letter that was published in the Guardian, and the idea is that you turn the scientific publication process upside down. If you want to do a study, the first thing you do with a registered report is submit your study design protocol to the journal, and the journal decides whether to publish it before they see any result. That way you can prevent publication bias, and you prevent journals from publishing only the nice findings and ignoring the negative ones.
And then you do the study, and it gets published, but it gets published independent of what the result was. There are of course other things you can do to improve science. There's a lot of talk about sharing data, sharing code, sharing methods, because if you want to replicate a study, it's of course easier if you have access to all the details of how the original study was done. Then we could do large collaborations, because many studies are just too small: with a study of twenty people you just don't get a very reliable outcome. In many situations it might be better to bring together ten teams of scientists and let them all do one big study together, and then you can answer a question reliably. Some people also propose stricter statistical thresholds, because a p-value of 0.05 means practically nothing. There was recently a paper that argued for just putting the dot one position further to the left, at 0.005, and that would already solve a lot of problems. And in physics, for example, they have something called five sigma, which corresponds to a p-value of roughly 0.0000003; so in physics they have much stricter statistical thresholds. Now, whatever scientific field you're working in, you might ask yourself: "If we have statistical results, are they pre-registered in any way? Do we publish negative results, where we tested for an effect and got nothing? And are there replications of all relevant results?" I would say, if you answer all these questions with "no", which I think many people will, then you're not really doing science; what you're doing is the alchemy of our time.

Applause

Hanno: Thanks.

Herald: Thank you very much...

Hanno: No, I have more, sorry, I have three more slides, that was not the finishing line. A big issue is also that there are bad incentives in science. A very standard way to evaluate the impact of science is citation counts: if your scientific study is cited a lot, this is considered a good thing, and if your journal is cited a lot, this is a good thing; that's, for example, the impact factor, but there are also other measurements. And universities like publicity, so if your study gets a lot of media reports, your press department likes you.
And these incentives tend to favor interesting results, but they don't favor correct results. That is bad, because if we are realistic, most results are not that interesting; most results will be: "We had this interesting and counterintuitive theory, and it's totally wrong." Then there's this idea that science is self-correcting. So if you confront scientists with these issues, with publication bias and p-hacking, surely they will immediately change; that's what scientists do, right? I want to cite something here, sorry, it's a bit long: "There is some evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published." That sounds like publication bias. And then it also says: "Significant results published in these fields are seldom verified by independent replication", so it seems there's a replication problem too. These wise words were written in 1959 by a statistician called Theodore Sterling. And because science is so self-correcting, in 1995 he complained again: "This article presents evidence that published results of scientific investigations are not a representative sample of all scientific studies. These results also indicate that practice leading to publication bias has not changed over a period of 30 years." And here we are in 2018, and publication bias is still a problem. So if science is self-correcting, then it's pretty damn slow at correcting itself, right? And finally I would like to ask you whether you're prepared for boring science, because ultimately, I think, we have a choice between what I would like to call TED-talk science and boring science.

Applause

With TED-talk science we get mostly positive, surprising, interesting results; we have large effects, many citations, lots of media attention, and you may get a TED talk out of it. Unfortunately, usually it's not true. So I would like to propose boring science as the alternative: mostly negative results, pretty boring, small effects, but it may be closer to the truth. And I would like to have boring science, but I know it's a pretty tough sell. Sorry, I didn't hear that. Yeah, thanks for listening.

Applause

Herald: Thank you.

Hanno: Two questions, or?

Herald: We don't have that much time for questions, three minutes, three minutes guys. Question one - shoot.

Mic: This isn't a question, but I just wanted to comment. Hanno, you missed out a very critical topic here, which is the use of Bayesian probability.
You conflated p-values with the scientific method, which gave the rest of your talk, I felt, a slightly unnecessary anti-science slant. P-values aren't the be-all and end-all of the scientific method: a p-value is sort of calculating the probability that your data would occur given that the null hypothesis is true, whereas Bayesian probability would be calculating the probability that your hypothesis is true given the data. And more and more scientists are slowly starting to realize that this method is probably a better way of doing science than p-values. So this is probably a third alternative to your proposal of boring science: doing Bayesian probability instead.

Hanno: Sorry, yeah, I agree with you; unfortunately I only had half an hour here.

Herald: Where are you going after this? Like, where are we going after this lecture, can they find you somewhere in the bar?

Hanno: I know him...

Herald: You know, science is broken, but then scientists, it's a little bit like the next lecture actually that's waiting there, it's like: "you scratch my back and I scratch yours for publication".

Hanno: Maybe two more minutes?

Herald: One minute. Please go ahead.

Mic: Yeah, hi, thank you for your talk. I'm curious: you've raised ways we can address this assuming good actors, people who want to do better science, where this happens out of ignorance or willful ignorance. What do we do about bad actors? For example the medical community, drug companies: maybe they really like the idea of being profitably incentivized by these randomized controlled trials to make essentially a placebo do something. How do we begin to address them deliberately trying to maliciously p-hack, or maliciously abuse the pre-registration system, or something like that?

Hanno: I mean, it's a big question, right? But I think if the standards confine you so much that there's not much room to cheat, that's a way out. And also, I don't think deliberate cheating is that much of a problem; I actually think the bigger problem is people honestly believing that what they do is true.

Herald: Okay, one last, you sir, please?

Mic: So the value in science is often a count of publications, right?
A count of citations, and so on. So, to improve the situation you've described, is it true that journals, those whose publications are available, the prospective ones, should impose higher standards? That it's the journals who must raise the bar, who should enforce publication of protocols before accepting papers, and so on? So is it journals who should do the work on that, or can we regular scientists also do something?

Hanno: I mean, you can publish in the journals that have better standards, right? There are journals that have these registered reports. But of course, as a single scientist it's always difficult, because you're playing in a system that has all these wrong incentives.

Herald: Okay guys, that's it, we have to shut down. Please. There is a reference, better science dot-org, go there. And one last request: give really warm applause!

Applause

34c3 outro

subtitles created by c3subtitles.de in the year 2018. Join, and help us!