[34c3 intro]

Hanno Böck: Many of you probably know me from doing things around IT security, but I'm going to surprise you by almost not talking about IT security today. Instead I'm going to ask the question: "Can we trust the scientific method?" I want to start with a quite simple example. When we do science, we start with a theory and then we try to test whether it's true, right? Now, I said I'm not going to talk about IT security, but I chose an example from IT security, or kind of from IT security. There was a post on Reddit a while ago, a picture from some book which claimed that a malachite crystal can protect you from computer viruses. Which, to me, doesn't sound very plausible: these are crystals, and if you put them on your computer, this book claims they protect you from malware. But of course, if we really want to know, we could do a study on this. And if you say people don't do studies on crazy things: that's wrong. People do studies on homeopathy and all kinds of things that are completely implausible. So we can do a study on this, and what we will do is a randomized controlled trial, which is kind of the gold standard for testing these kinds of things. This is our question: "Do malachite crystals prevent malware infections?" And our study design is: we take a group of maybe 20 computer users and split them randomly into two groups. One group gets one of these crystals with the instruction "put it on your desk or on your computer". The other group is our control group. That's very important, because if we want to know whether the crystals help, we need another group to compare against. And to rule out any kind of placebo effect, we give the control group a fake malachite crystal, so we can compare the two groups against each other. Then we wait for maybe six months and check how many malware infections they had.

Now, I didn't do that study, but I simulated it with a Python script, and given that I don't believe this theory is true, I simulated it with random data. I'm not going to go through the whole script, but I'm assuming there can be between 0 and 3 malware infections per person, totally at random, and then I compare the two groups. Then I calculate something called a p-value, which is a very common thing in science whenever you do statistics. A p-value is a bit technical, but it's the probability that, if there is no effect, you would still get this result.
Put another way: in an idealized world, if you have 20 results, one of them is a false positive, meaning one of them says something happens although it doesn't. In many fields of science a p-value of 0.05 is considered significant, which corresponds to those twenty studies: one error in twenty studies, but, as I said, under idealized conditions.

Since it's a script and I can run it in less than a second, I just did it twenty times instead of once. So here are my 20 simulated studies, and most of them look not very interesting; of course there are a few random variations, but nothing significant. Except for this one study: it says the people with the malachite crystal had on average 1.8 malware infections and the people with the fake crystal had 0.8. So the crystal actually made it worse. But this result is significant, because it has a p-value of 0.03. So of course we can publish that, assuming I really did these studies.

[Applause]

Hanno: And the other studies we just forget about. They were not interesting, right, and who cares about non-significant results? Okay, so you have just seen that I created a significant result out of random data. And that's concerning, because in science you can really do that. This phenomenon is called publication bias. What's happening here is that you do studies, and if they get a positive result, meaning you see an effect, you publish them, and if there's no effect, you just forget about them. We learned earlier that a p-value of 0.05 means 1 in 20 studies is a false positive, but you usually don't see the studies that are not significant, because they don't get published. And you may wonder: "What's stopping a scientist from doing exactly this? What's stopping a scientist from doing so many experiments until one of them looks like a real result, although it's just a random fluke?" And the disconcerting answer is: usually nothing.
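(A minimal sketch of what such a simulation might look like; the speaker's actual script isn't shown, so the group size of 10, the uniform 0-to-3 infection counts and the use of a t-test here are assumptions.)

```python
# Hypothetical re-creation of the simulated malachite trial: both groups get
# random infection counts, so any "significant" difference is pure noise.
import random
from scipy import stats

def simulate_study(n_per_group=10):
    crystal = [random.randint(0, 3) for _ in range(n_per_group)]  # real crystal
    fake = [random.randint(0, 3) for _ in range(n_per_group)]     # fake crystal
    p = stats.ttest_ind(crystal, fake).pvalue  # p-value for the group difference
    return sum(crystal) / n_per_group, sum(fake) / n_per_group, p

random.seed(0)  # arbitrary seed, only to make the run reproducible
studies = [simulate_study() for _ in range(20)]   # 20 studies instead of one
best = min(studies, key=lambda s: s[2])           # cherry-pick the best-looking one
print(f"crystal mean {best[0]:.1f}, fake mean {best[1]:.1f}, p = {best[2]:.3f}")
# Roughly one run in twenty crosses p < 0.05 by chance alone; publish that one,
# quietly forget the other nineteen, and you have "evidence" from pure noise.
```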
And this is not just a theoretical example. I want to give you an example that had quite some impact and that has been researched very well: research on antidepressants, so-called SSRIs. In 2008 there was a study, and the interesting situation here was that the US Food and Drug Administration, which is the authority that decides whether a medical drug can be put on the market, had knowledge of all the studies that had been done to register these medications. Some researchers looked at that and compared it with what had been published. They figured out there were 38 studies finding that these medications had a real effect, real improvements for patients. Of those 38 studies, 37 got published. But there were also 36 studies that said these medications don't really have any effect, that they are not really better than a placebo, and of those only 14 got published. And even among those 14 there were 11 where the researchers said the results had been spun in a way that made it sound like the medications do something. So there were a bunch of studies that were simply not published because they had a negative result. And it's clear that if you look only at the published studies and ignore the unpublished negative ones, these medications look much better than they really are. Unlike the earlier example, there is a real effect from antidepressants, but they are not as good as people believed in the past.

So we've learned that, in theory, with publication bias you can create a result out of nothing. But if you're a researcher with a theory that's not true and you really want to publish something about it, that's not very efficient, because you have to do 20 studies on average to get one of these random results that look real. There are more efficient ways to get a result from nothing. When you're doing a study, there are a lot of micro-decisions you have to make. For example, you may have dropouts from your study: people move away or you can no longer reach them, so they are no longer part of your study, and there are different ways you can handle that. You may have corner-case results where you're not entirely sure: is this an effect or not, how do you decide, how exactly do you measure? You may also be looking at different things; maybe there are different tests you can run on people, and you may control for certain variables: do you split men and women and look at them separately, do you separate them by age? So there are many decisions you can make while doing a study, and each of these decisions has a small effect on the result. Very often, just by trying all the combinations, you will get a p-value that looks statistically significant although there is no real effect. There's a term for this, p-hacking, which means you just keep adjusting your methods until you get a significant result.
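(A rough sketch of how such micro-decisions can be mined for significance; this is not the speaker's code, and the dataset, the "defensible" analysis variants and the cut-offs are invented for illustration.)

```python
# Hypothetical illustration of p-hacking: one null dataset, many analysis choices.
import itertools
import random
from scipy import stats

random.seed(7)
# Fake participants: (group, age, outcome) with no real group effect at all.
people = [(random.choice(["treatment", "control"]),
           random.randint(18, 70),
           random.gauss(0, 1)) for _ in range(80)]

def analyze(drop_outliers, min_age, max_age):
    """One of many possible, individually defensible analyses of the same data."""
    subset = [p for p in people if min_age <= p[1] <= max_age]
    if drop_outliers:
        subset = [p for p in subset if abs(p[2]) < 2]
    treat = [p[2] for p in subset if p[0] == "treatment"]
    ctrl = [p[2] for p in subset if p[0] == "control"]
    return stats.ttest_ind(treat, ctrl).pvalue

# Try every combination of choices and keep whichever looks best.
variants = itertools.product([True, False], [18, 30], [50, 70])
best = min(variants, key=lambda v: analyze(*v))
print("best-looking analysis:", best, "p =", round(analyze(*best), 3))
# With eight variants on pure noise, one of them drifting toward p < 0.05 is
# not unusual; a motivated analyst only has to convince themselves that the
# winning variant was the "right" one all along.
```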
I'd like to point out that this is usually not a scientist saying: "Okay, today I'm going to p-hack my result, because I know my theory is wrong but I want to show it's true." It's a subconscious process, because scientists usually believe in their theories. Honestly. They honestly think their theory is true and that their research will show that. So they may subconsciously say: "If I analyze my data like this it looks a bit better, so I will do this." Subconsciously, they may p-hack themselves into a result that isn't really there. And again we can ask: "What is stopping scientists from p-hacking?" The concerning answer is the same: usually nothing.

So I came to the conclusion that the scientific method is a way to create evidence for whatever theory you like, no matter whether it's true or not. You may say that's a pretty bold thing to say, and I'm saying it even though I'm not even a scientist, I'm just some hacker. But I'm not alone in this. There's a paper by a famous researcher, John Ioannidis, titled "Why most published research findings are false". He published it in 2005, and if you look at the title, he doesn't really question that most research findings are false; he only wants to give reasons why this is the case. He makes some very plausible assumptions, for example that many negative results don't get published and that there will be some bias, and he comes to the very plausible conclusion that this is indeed the case. And it's not even very controversial. If you ask people who do what you could call science on science, or meta-science, people who look at scientific methodology, they will tell you: "Yeah, of course that's the case." Some will even say: "That's how science works, that's what we expect." But I find it concerning. And if you take this seriously, it means: if you read about a study, say in a newspaper, the default assumption should be "that's not true", while we usually assume the opposite.

And if science is a method to create evidence for whatever you like, you can think about something really crazy, like: can people see into the future? Does our mind have some extrasensory perception so we can sense things that will happen in an hour? There was a psychologist called Daryl Bem who thought this is the case, and he published a study on it. It was titled "Feeling the Future".
He did a lot of experiments where he did something and then something else happened later, and he thought he had statistical evidence that what happened later influenced what happened earlier. I don't think that's very plausible, based on what we know about the universe, but it was published in a real psychology journal. A lot of things were wrong with this study; basically, it's a very nice example of p-hacking, and there's even a book by Daryl Bem where he describes something that basically looks like p-hacking and says that's how you do psychology. But the study was absolutely in line with the existing standards in experimental psychology, and a lot of people found that concerning. If you can show that precognition is real, that you can see into the future, then what else can you show, and how can we trust our results? Psychology has debated this a lot in the past couple of years, so there's a lot of talk about the replication crisis in psychology. Many effects that psychology simply assumed were true turned out not to reappear when people tried to repeat the experiments, even though entire subfields were built on those results.

I want to show you an example, one of the ones that is not discussed so much. There's a theory called moral licensing. The idea is that if you do something good, or something you think is good, then later you basically behave like an asshole, because you think "I already did something good, now I don't have to be so nice anymore." There were some famous studies with the theory that when people consume organic food, they later become more judgmental, less social, less nice to their peers. But just last week someone tried to replicate the original experiments. They tried it three times, with more subjects and better research methodology, and they totally couldn't find that effect. What you see here are lots of media articles; I have not found a single article reporting that this could not be replicated. Maybe they will come, but this is just a very recent example.

Now I have a small warning for you, because you may think "yeah, these psychologists, that all sounds very fishy, they even believe in precognition", but maybe your field is not much better. Maybe you just don't know it yet, because nobody has started replicating studies in your field.
And there are other fields that have replication problems, some much worse. For example, the pharma company Amgen published something in 2012 where they said: "We have tried to replicate cancer research", preclinical research, that is, work in a petri dish or animal experiments, not drugs tested on humans but what happens before you develop a drug. And they were unable to replicate 47 out of 53 studies. These were, they said, landmark studies, studies that had been published in the best journals. Now, there are a few problems with this publication, because they did not publish their replications and did not tell us which studies they could not replicate. In the meantime I think they have published three of these replications, but most of it remains in the dark, which points to another problem: they say they could only do this by collaborating with the original researchers and agreeing not to publish the results. It still sounds very concerning. And some fields don't have a replication problem simply because nobody is trying to replicate previous results, in which case you will never know whether your results hold up.

So what can be done about all this? Fundamentally, I think the core issue is that the scientific process is tied to its results: we do a study, and only afterwards do we decide whether it's going to be published; or we do a study, and only after we have the data do we decide how to analyze it. Essentially we need to decouple the scientific process from its results, and one way of doing that is pre-registration. What you do there is, before you start a study, you register it in a public registry and say: "I'm going to do a study on this medication, or on this psychological effect, and this is how I'm going to do it." Later on, people can check whether you really did that. This is more or less standard practice in medical drug trials; the summary is that it does not work very well, but it's better than nothing. The problem is mostly enforcement: people register a study, then don't publish it, and nothing happens to them, even though they are legally required to publish it. There are two campaigns I'd like to point out. There's the AllTrials campaign, started by Ben Goldacre, a doctor from the UK, which demands that every trial done on a medication should be published.
There's also a project by the same person, the COMPare project, which checks whether, when a medical trial was registered and later published, the researchers did what they said they would, or whether they changed something in their protocol, and whether there was a reason for the change or they just changed it to get a result they otherwise wouldn't get. These issues in medicine get a lot of attention, and for good reasons: if we have bad science in medicine, people die, and that's pretty immediate and pretty massive. But whenever you read about this, keep in mind that drug trials at least have pre-registration; most scientific fields don't bother doing anything like that. So whenever you hear about publication bias in medicine, remember that the same thing happens in many fields of science, and usually nobody is doing anything about it.

And particularly to this audience I'd like to say: there's currently a big trend of people from computer science wanting to revolutionize medicine with big data and machine learning, which in principle is okay. But I know a lot of people in medicine are very worried about this, because these computer science people don't always have the scientific standards that people in medicine expect, and might say "we don't really need to do a study on this, it's obvious that it helps". That is worrying. I come from computer science, and I understand very well why people in medicine are worried about this.

There's an idea that goes even further than pre-registration, called registered reports. A couple of years ago some scientists wrote an open letter that was published in the Guardian, and the idea is that you turn the scientific publication process upside down. If you want to do a study, the first thing you do with a registered report is submit your study design, your protocol, to the journal, and the journal decides whether it will publish it before seeing any results. That way you can prevent publication bias, and you prevent journals from only publishing the nice findings and ignoring the negative ones. Then you do the study, and it gets published, independent of what the result was. There are of course other things you can do to improve science. There's a lot of talk about sharing data, sharing code, sharing methods, because if you want to replicate a study, it's much easier if you have access to all the details of how the original study was done.
Then you could say: okay, we could do large collaborations, because many studies are just too small; with a study of twenty people you just don't get a very reliable outcome. So in many situations it might be better to get ten teams of scientists together, let them all do one big study, and then you can answer a question reliably. Some people also propose higher statistical thresholds, because a p-value of 0.05 means practically nothing. There was recently a paper arguing that you should just move the dot one place to the left, to 0.005, and that this alone would solve a lot of problems. In physics, for example, they have something called five sigma, which is, I think, zero point, then five zeroes, then a three, or something like that; in any case, in physics they have much higher statistical thresholds.

Now, whatever scientific field you're working in, you might ask yourself: if we have statistical results, are they pre-registered in any way? Do we publish negative results, as in "we tested an effect and got nothing"? Are there replications of all relevant results? I would say that if you answer all these questions with "no", which I think many people will, then you're not really doing science; what you're doing is the alchemy of our time.

[Applause]

Hanno: Thanks.

Herald: Thank you very much...

Hanno: No, I have more, sorry, I have three more slides, that was not the finishing line. A big issue is also that there are bad incentives in science. A very standard way to evaluate the impact of science is citation counts: if your study is cited a lot, that's considered a good thing, and if your journal is cited a lot, that's a good thing; that's the impact factor, for example, but there are other measurements too. Universities also like publicity, so if your study gets a lot of media reports, your press department likes you. These incentives tend to favor interesting results, but they don't favor correct results, and that's bad, because realistically most results are not that interesting. Most results will be "we had this interesting and counterintuitive theory, and it's totally wrong."

And then there's this idea that science is self-correcting. So if you confront scientists with these issues, with publication bias and p-hacking, surely they will immediately change; that's what scientists do, right? I want to quote something here, sorry, it's a bit long: "There is some evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published."
That sounds like publication bias. And then it also says: "Significant results published in these fields are seldom verified by independent replication", so it seems there's a replication problem. These wise words were written in 1959 by a statistician called Theodore Sterling. And because science is so self-correcting, in 1995 he complained that "this article presents evidence that published results of scientific investigations are not a representative sample of all scientific studies" and that "these results also indicate that practice leading to publication bias has not changed over a period of 30 years". And here we are in 2018, and publication bias is still a problem. So if science is self-correcting, it's pretty damn slow at correcting itself, right?

Finally, I would like to ask whether you are prepared for boring science, because ultimately, I think, we have a choice between what I would like to call TED-talk science and boring science.

[Applause]

With TED-talk science we get mostly positive, surprising and interesting results, large effects, many citations, lots of media attention, and you may get a TED talk out of it. Unfortunately, it's usually not true. I would like to propose boring science as the alternative: mostly negative results, pretty boring, small effects, but it may be closer to the truth. I would like to have boring science, but I know it's a pretty tough sell. Sorry, I didn't hear that. Yeah, thanks for listening.

[Applause]

Herald: Thank you.

Hanno: Two questions, or?

Herald: We don't have that much time for questions. Three minutes, three minutes, guys. Question one, shoot.

Mic: This isn't a question, but I just wanted to comment: Hanno, you missed a very critical topic here, which is the use of Bayesian probability. You did conflate p-values with the scientific method, which gave the rest of your talk, I felt, a slightly unnecessary anti-science slant. P-values aren't the be-all and end-all of the scientific method. A p-value is essentially the probability that your data would occur given that the null hypothesis is true, whereas Bayesian probability would be calculating the probability that your hypothesis is true given the data, and more and more scientists are slowly starting to realize that this is probably a better way of doing science than p-values. So this is perhaps a third alternative to your proposal of boring science: doing Bayesian probability instead.

Hanno: Sorry, yeah, I agree with you; unfortunately I only had half an hour here.
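(A small back-of-the-envelope illustration of the distinction the questioner is drawing, and of the Ioannidis-style argument mentioned earlier; the prior probability, power and threshold below are invented illustrative numbers, not figures from the talk.)

```python
# Hypothetical numbers: if only 10% of tested hypotheses are actually true,
# studies have 80% power, and alpha = 0.05, what fraction of "significant"
# results reflect a real effect?  This is just Bayes' theorem:
#   P(true | significant) = P(significant | true) * P(true) / P(significant)
prior_true = 0.10   # assumed share of tested hypotheses that are really true
power = 0.80        # assumed probability of detecting a true effect
alpha = 0.05        # the significance threshold discussed in the talk

p_significant = power * prior_true + alpha * (1 - prior_true)
p_true_given_significant = power * prior_true / p_significant
print(f"P(hypothesis true | p < {alpha}) = {p_true_given_significant:.2f}")
# About 0.64 with these assumed numbers; with p-hacking, publication bias or a
# lower prior, the share of published positives that are real drops further.
```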
Herald: Where are you going after this lecture? Can people find you somewhere, in the bar?

Hanno: I know him...

Herald: You know, "science is broken", but then, scientists... It's a little bit like the next lecture that's actually waiting here: "you scratch my back and I scratch yours for publication."

Hanno: Maybe two more minutes?

Herald: One minute. Please go ahead.

Mic: Yeah, hi, thank you for your talk. I'm curious: you've raised ways we can address this assuming good actors, people who want to do better science, where this happens out of ignorance or willful ignorance. What do we do about bad actors? For example, in the medical community, drug companies might really like the idea of being profitably incentivized by these randomized controlled trials to make essentially a placebo look like it does something. How do we begin to address people who are maliciously trying to p-hack or maliciously abusing the pre-registration system, or something like that?

Hanno: I mean, it's a big question, right? But I think if the standards confine you so much that there's not much room to cheat, that's a way out. And also, I don't think deliberate cheating is that much of a problem; I really think the bigger problem is people who honestly believe that what they do is true.

Herald: Okay, one last one. You, sir, please?

Mic: The value in science is often a count of publications, right? A count of citations, and so on. So is it true that, to improve the situation you've described, journals should impose higher standards? That the journals are the ones who must raise the bar, who should enforce publication of protocols before accepting papers, and so on? Is it the journals who should do the work on that, or can we regular scientists do something as well?

Hanno: I mean, you can publish in the journals that have better standards, right? There are journals that have these registered reports. But of course, as a single scientist it's always difficult, because you're playing in a system that has all these wrong incentives.

Herald: Okay guys, that's it, we have to shut down. Please: there is a reference, better science dot org, go there. And one last request: give a really warm applause!

[Applause]

[34c3 outro]

Subtitles created by c3subtitles.de in the year 2018. Join, and help us!