0:00:00.099,0:00:14.890
34c3 intro
0:00:14.890,0:00:19.090
Hanno Böck: Yeah, so many of you probably[br]know me from doing things around IT
0:00:19.090,0:00:25.000
security, but I'm gonna surprise you to[br]almost not talk about IT security today.
0:00:25.000,0:00:32.189
But I'm gonna ask the question "Can we[br]trust the scientific method?". I want to
0:00:32.189,0:00:38.809
start this by giving you quite a[br]simple example. So when we do science,
0:00:38.809,0:00:45.210
we start with a theory and then we[br]try to test if it's true, right? So I
0:00:45.210,0:00:49.760
mean I said I'm not going to talk about IT[br]security but I chose an example from IT
0:00:49.760,0:00:56.690
security or kind of from IT security. So[br]there was a post on Reddit a while ago,
0:00:56.690,0:01:01.329
a picture from some book which claimed that[br]if you use a Malachite crystal that can
0:01:01.329,0:01:06.240
protect you from computer viruses.[br]Which... to me doesn't sound very
0:01:06.240,0:01:11.009
plausible, right? Like, these are crystals and[br]if you put them on your computer, this book
0:01:11.009,0:01:18.590
claims this protects you from malware. But[br]of course if we really want to know, we
0:01:18.590,0:01:23.990
could do a study on this. And if you say[br]people don't do studies on crazy things:
0:01:23.990,0:01:28.770
that's wrong. I mean people do studies on[br]homeopathy or all kinds of crazy things
0:01:28.770,0:01:34.549
that are completely implausible. So we can[br]do a study on this and what we will do is
0:01:34.549,0:01:39.509
we will do a randomized control trial,[br]which is kind of the gold standard of
0:01:39.509,0:01:46.310
doing a test on these kinds of things. So[br]this is our question: "Do Malachite
0:01:46.310,0:01:52.479
crystals prevent malware infections?" and[br]how we would test that, our study design
0:01:52.479,0:01:58.399
is: ok, we take a group of maybe 20[br]computer users. And then we split them
0:01:58.399,0:02:06.009
randomly to two groups, and then one group[br]we'll give one of these crystals and tell
0:02:06.009,0:02:10.919
them: "Put it on your desk or on your[br]computer." Then the other group
0:02:10.919,0:02:15.800
is our control group. That's very[br]important because if we want to know if
0:02:15.800,0:02:20.940
they help we need another group to compare[br]it to. And to rule out that there are any
0:02:20.940,0:02:27.130
kinds of placebo effects, we give this[br]control group a fake Malachite crystal so
0:02:27.130,0:02:32.260
we can compare them against each other.[br]And then we wait for maybe six months and
0:02:32.260,0:02:39.310
then we check how many malware infections[br]they had. Now, I didn't do that study, but
0:02:39.310,0:02:45.090
I simulated it with a Python script and[br]given that I don't believe that this
0:02:45.090,0:02:50.310
theory is true I just simulated this as[br]random data. So I'm not going to go
0:02:50.310,0:02:55.090
through the whole script but I'm just like[br]generating, I'm assuming there can be
0:02:55.090,0:02:59.950
between 0 and 3 malware infections and[br]it's totally random and then I compare the
0:02:59.950,0:03:04.790
two groups. And then I calculate something[br]which is called a p-value which is a very
0:03:04.790,0:03:10.730
common thing in science whenever you do[br]statistics. A p-value is, it's a bit
0:03:10.730,0:03:17.290
technical, but it's the probability that[br]you would get this result if there were
0:03:17.290,0:03:23.570
no effect. Which, put another way,[br]means: if you have 20 results in an
0:03:23.570,0:03:29.260
idealized world then one of them is a[br]false positive which means one of them
0:03:29.260,0:03:34.510
says something happens although it[br]doesn't. And in many fields of science
0:03:34.510,0:03:41.180
a p-value of 0.05 is considered[br]significant, which corresponds to these twenty
0:03:41.180,0:03:48.620
studies. So one error in twenty studies,[br]but, as I said, under idealized conditions.
0:03:48.620,0:03:53.330
And as it's a script I can run[br]in less than a second, I just ran it twenty
0:03:53.330,0:03:59.821
times instead of once. So here are my 20[br]simulated studies and most of them look
0:03:59.821,0:04:06.360
not very interesting so of course we have[br]a few random variations but nothing very
0:04:06.360,0:04:12.460
significant. Except if you look at this[br]one study, it says the people with the
0:04:12.460,0:04:17.160
Malachite crystal had on average 1.8[br]malware infections and the people with the
0:04:17.160,0:04:24.670
fake crystal had 0.8. So it means actually[br]the crystal made it worse. But also this
0:04:24.670,0:04:32.100
result is significant because it has a[br]p-value of 0.03. So of course we can
0:04:32.100,0:04:36.110
publish that, assuming I really did these[br]studies.
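The simulation described above can be sketched roughly like this (a hypothetical reconstruction, not Hanno's actual script; the group size of 10 per arm, the seed, and the use of a permutation test are my assumptions):

```python
# Hypothetical sketch of the talk's simulation: 20 trials, each comparing
# two groups of 10 users whose malware counts (0-3) are pure noise,
# with a two-sided permutation test for the p-value.
import random
import statistics

def permutation_p_value(a, b, iters=10_000, rng=random.Random(0)):
    """How often a random regrouping of the pooled data produces a mean
    difference at least as extreme as the observed one."""
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = a + b
    hits = 0
    for _ in range(iters):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)])
                   - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / iters

rng = random.Random(34)  # arbitrary seed
significant = []
for study in range(20):
    crystal = [rng.randint(0, 3) for _ in range(10)]  # crystal group
    placebo = [rng.randint(0, 3) for _ in range(10)]  # fake-crystal group
    p = permutation_p_value(crystal, placebo)
    if p < 0.05:
        significant.append((study, p))

# With pure noise, roughly 1 in 20 studies comes out "significant".
print(significant)
```

Publishing only the entries of `significant` and forgetting the rest is exactly the publication-bias mechanism the talk describes.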
0:04:36.110,0:04:40.600
applause[br]Hanno: And the other studies we just forget
0:04:40.600,0:04:45.850
about. I mean they were not interesting[br]right and who cares? Non significant
0:04:45.850,0:04:52.990
results... Okay so you have just seen that[br]I created a significant result out of
0:04:52.990,0:05:00.590
random data. And that's concerning because[br]people in science - I mean you can really do
0:05:00.590,0:05:07.850
that. And this phenomenon is called[br]publication bias. So what's happening here
0:05:07.850,0:05:13.130
is that, you're doing studies and if they[br]get a positive result - meaning you're
0:05:13.130,0:05:18.990
seeing an effect, then you publish them[br]and if there's no effect you just forget
0:05:18.990,0:05:26.670
about them. We learned earlier that with[br]this p-value of 0.05 means 1 in 20 studies
0:05:26.670,0:05:32.760
is a false positive, but you usually don't[br]see the studies that are not significant,
0:05:32.760,0:05:39.320
because they don't get published. And you[br]may wonder: "Ok, what's stopping a
0:05:39.320,0:05:43.500
scientist from doing exactly this? What's[br]stopping a scientist from just doing so
0:05:43.500,0:05:47.750
many experiments till one of them looks[br]like it's a real result although it's just
0:05:47.750,0:05:54.710
a random fluke?". And the disconcerting[br]answer to that is: usually nothing.
0:05:56.760,0:06:03.620
And this is not just a theoretical[br]example. I want to give you an example,
0:06:03.620,0:06:09.110
that has quite some impact and that was[br]researched very well, and that is a
0:06:09.110,0:06:17.980
research on antidepressants so called[br]SSRIs. And in 2008 there was a study, the
0:06:17.980,0:06:22.680
interesting situation here was, that the[br]US Food and Drug Administration, which is
0:06:22.680,0:06:29.480
the authority that decides whether a[br]medical drug can be put on the market,
0:06:29.480,0:06:35.490
they had knowledge about all the studies[br]that had been done to register this
0:06:35.490,0:06:40.380
medication. And then some researchers[br]looked at that and compared it with what
0:06:40.380,0:06:45.810
has been published. And they figured out[br]there were 38 studies that saw that these
0:06:45.810,0:06:51.040
medications had a real effect, had real[br]improvements for patients. And from those
0:06:51.040,0:06:56.790
38 studies 37 got published. But then[br]there were 36 studies that said: "These
0:06:56.790,0:07:00.010
medications don't really have any[br]effect.", "They are not really better than
0:07:00.010,0:07:06.530
a placebo effect" and out of those only 14[br]got published. And even from those 14
0:07:06.530,0:07:11.010
there were 11 where the researchers said,[br]okay, they had spun the result in a way
0:07:11.010,0:07:17.920
that it sounds like these medications do[br]something. But there were also a bunch of
0:07:17.920,0:07:21.870
studies that were just not published[br]because they had a negative result. And
0:07:21.870,0:07:26.390
it's clear that if you look at the[br]published studies only and you ignore the
0:07:26.390,0:07:29.320
studies with a negative result that[br]haven't been published, then these
0:07:29.320,0:07:34.290
medications look much better than they[br]really are. And it's not like the earlier
0:07:34.290,0:07:38.240
example there is a real effect from[br]antidepressants, but they are not as good
0:07:38.240,0:07:40.210
as people have believed in the past.
0:07:43.020,0:07:45.860
So we've learnt in theory with publication bias
0:07:45.860,0:07:50.520
you can create a result out of nothing.[br]But if you're a researcher and you have a
0:07:50.520,0:07:54.790
theory that's not true but you really want[br]to publish something about it, that's not
0:07:54.790,0:07:59.699
really efficient, because you have to do[br]20 studies on average to get one of these
0:07:59.699,0:08:06.130
random results that look like real[br]results. So there are more efficient ways
0:08:06.130,0:08:12.780
to get to a result from nothing. If you're[br]doing a study then there are a lot of
0:08:12.780,0:08:17.320
micro decisions you have to make, for[br]example you may have dropouts from your
0:08:17.320,0:08:22.150
study where people, I don't know, move[br]to another place or you can no longer
0:08:22.150,0:08:26.020
reach them, so they are no longer part of[br]your study. And there are different things
0:08:26.020,0:08:30.480
how you can handle that. Then you may have[br]corner-case results, where you're not
0:08:30.480,0:08:34.509
entirely sure: "Is this an effect or not[br]and how do you decide?", "How do you
0:08:34.509,0:08:39.639
exactly measure?". And then also you may[br]be looking for different things, maybe
0:08:39.639,0:08:46.620
there are different tests you can do on[br]people, and you may control for certain
0:08:46.620,0:08:51.639
variables, like "Do you split men and women[br]into separate groups?", "Do you look at them
0:08:51.639,0:08:56.430
separately?" or "Do you separate them by[br]age?". So there are many decisions you can
0:08:56.430,0:09:02.050
make while doing a study. And of course[br]each of these decisions has a small effect
0:09:02.050,0:09:10.399
on the result. And it may very often be,[br]that just by trying all the combinations
0:09:10.399,0:09:15.230
you will get a p-value that looks like[br]it's statistically significant, although
0:09:15.230,0:09:20.670
there's no real effect. So and there's[br]this term called p-Hacking which means
0:09:20.670,0:09:25.550
you're just adjusting your methods long[br]enough, that you get a significant result.
0:09:27.050,0:09:32.550
And I'd like to point out here, that this[br]is usually not that a scientist says: "Ok,
0:09:32.550,0:09:36.259
today I'm going to p-hack my result,[br]because I know my theory is wrong but I
0:09:36.259,0:09:42.420
want to show it's true.". But it's a[br]subconscious process, because usually the
0:09:42.420,0:09:47.399
scientists believe in their theories.[br]Honestly. They honestly think that their
0:09:47.399,0:09:52.040
theory is true and that their research[br]will show that. So they may subconsciously
0:09:52.040,0:09:58.279
say: "Ok, if I analyze my data like this[br]it looks a bit better so I will do this.".
0:09:58.279,0:10:05.079
So subconsciously, they may p-hack[br]themselves into getting a result that's
0:10:05.079,0:10:11.449
not really there. And again we can ask:[br]"What is stopping scientists from
0:10:11.449,0:10:22.009
p-hacking?". And the concerning answer is[br]the same: usually nothing. And so I came to
0:10:22.009,0:10:26.069
the conclusion: "Ok, the[br]scientific method is a way to create
0:10:26.069,0:10:31.899
evidence for whatever theory you like. No[br]matter if it's true or not.". And you may
0:10:31.899,0:10:35.720
say: "That's a pretty bold thing to say.".[br]and I'm saying this even though I'm not
0:10:35.720,0:10:42.480
even a scientist. I'm just like some[br]hacker who, whatever... But I'm not alone
0:10:42.480,0:10:47.759
in this, like there's a paper from a[br]famous researcher John Ioannidis, who
0:10:47.759,0:10:51.529
said: "Why most published research[br]findings are false.". He published this in
0:10:51.529,0:10:57.170
2005 and if you look at the title, he[br]doesn't really question that most research
0:10:57.170,0:11:02.560
findings are false. He only wants to give[br]reasons why this is the case. And he makes
0:11:02.560,0:11:08.499
some very plausible assumptions, like[br]that many negative results don't get
0:11:08.499,0:11:12.129
published and that you will have some[br]bias. And he comes to the very plausible
0:11:12.129,0:11:17.180
conclusion, that this is the case and this[br]is not even very controversial. If you ask
0:11:17.180,0:11:23.491
people who are doing what you can call[br]science on science or meta science, who
0:11:23.491,0:11:28.410
look at scientific methodology, they will[br]tell you: "Yeah, of course that's the
0:11:28.410,0:11:32.079
case.". Some will even say: "Yeah, that's[br]how science works, that's what we
0:11:32.079,0:11:37.689
expect.". But I find it concerning. And if[br]you take this seriously, it means: if you
0:11:37.689,0:11:43.160
read about a study, like in a newspaper,[br]the default assumption should be 'that's
0:11:43.160,0:11:51.179
not true' - while we might usually think[br]the opposite. And if science is a method
0:11:51.179,0:11:55.709
to create evidence for whatever you like,[br]you can think about something really
0:11:55.709,0:12:00.939
crazy, like "Can people see into the future?",[br]"Does our mind have[br]
0:12:00.939,0:12:09.720
some extra perception where we can[br]sense things that happen in an hour?". And
0:12:09.720,0:12:15.559
there was a psychologist called Daryl Bem[br]and he thought that this is the case and
0:12:15.559,0:12:20.399
he published a study on it. It was titled[br]"Feeling the Future". He did a lot of
0:12:20.399,0:12:25.449
experiments where he did something, and[br]then something later happened, and he
0:12:25.449,0:12:29.569
thought he had statistical evidence that[br]what happened later influenced what
0:12:29.569,0:12:34.999
happened earlier. So, I don't think that's[br]very plausible - based on what we know
0:12:34.999,0:12:41.550
about the universe, but yeah... and it was[br]published in a real psychology journal.
0:12:41.550,0:12:46.680
And a lot of things were wrong with this[br]study. Basically, it's a very nice example
0:12:46.680,0:12:51.009
of p-hacking, and there's even a book by[br]Daryl Bem where he describes something
0:12:51.009,0:12:55.040
which basically looks like p-hacking,[br]where he says that's how you do
0:12:55.040,0:13:03.870
psychology. But the study was absolutely[br]in line with the existing standards in
0:13:03.870,0:13:08.759
experimental psychology. And a lot of[br]people found that concerning. So, if you can
0:13:08.759,0:13:13.619
show that precognition is real, that you[br]can see into the future, then what else
0:13:13.619,0:13:19.139
can you show and how can we trust our[br]results? And psychology has debated this a
0:13:19.139,0:13:21.880
lot in the past couple of years. So[br]there's a lot of talk about the
0:13:21.880,0:13:30.009
replication crisis in psychology. And many[br]effects that psychology just thought were
0:13:30.009,0:13:35.040
true, they figured out, okay, if they try[br]to repeat these experiments, they couldn't
0:13:35.040,0:13:40.759
get these results even though entire[br]subfields were built on these results.
0:13:44.369,0:13:48.069
And I want to show you an example, which[br]is one of the ones that is not discussed so
0:13:48.069,0:13:55.540
much. So there's a theory which is called[br]moral licensing. And the idea is that if
0:13:55.540,0:14:00.649
you do something good, or something you[br]think is good, then later basically you
0:14:00.649,0:14:04.880
behave like an asshole. Because you think[br]I already did something good now, I don't
0:14:04.880,0:14:10.689
have to be so nice anymore. And there were[br]some famous studies that had the theory,
0:14:10.689,0:14:17.870
that when people consume organic food,[br]they later become more judgmental, or less
0:14:17.870,0:14:27.949
social, less nice to their peers. But just[br]last week someone tried to replicate these
0:14:27.949,0:14:32.720
original experiments. And they tried it[br]three times with more subjects and better
0:14:32.720,0:14:39.010
research methodology and they totally[br]couldn't find that effect. But like what
0:14:39.010,0:14:43.790
you've seen here is lots of media[br]articles. I have not found a single
0:14:43.790,0:14:51.179
article reporting that this could not be[br]replicated. Maybe they will come but yeah
0:14:51.179,0:14:57.360
that's just a very recent example. But[br]now I have a small warning for you,
0:14:57.360,0:15:01.319
because you may think now "yeah these[br]psychologists, that all sounds very
0:15:01.319,0:15:05.329
fishy and they even believe in[br]precognition and whatever", but maybe your
0:15:05.329,0:15:09.889
field is not much better maybe you just[br]don't know about it yet because nobody
0:15:09.889,0:15:15.990
else has started replicating studies in[br]your field. And there are other fields
0:15:15.990,0:15:21.670
that have replication problems and some[br]much worse for example the pharma company
0:15:21.670,0:15:27.279
Amgen in 2012 they published something[br]where they said "We have tried to
0:15:27.279,0:15:32.940
replicate cancer research and preclinical[br]research" that is stuff in a petri dish or
0:15:32.940,0:15:38.869
animal experiments so not drugs on humans[br]but what happens before you develop a drug
0:15:38.869,0:15:44.699
and they were only able to replicate 6[br]out of 53 studies. And these were, they
0:15:44.699,0:15:50.050
said landmark studies, so studies that[br]have been published in the best journals.
0:15:50.050,0:15:54.099
Now there are a few problems with this[br]publication because they have not
0:15:54.099,0:15:58.760
published their replications; they have not[br]told us which studies these were that they
0:15:58.760,0:16:02.730
could not replicate. In the meantime I[br]think they have published three of these
0:16:02.730,0:16:07.290
replications but most of it is a bit in[br]the dark which points to another problem
0:16:07.290,0:16:10.689
because they say they could only do this[br]by collaborating with the original
0:16:10.689,0:16:16.109
researchers and agreeing that[br]they would not publish the
0:16:16.109,0:16:22.379
results. But it still sounds very[br]concerning. And some fields don't have a
0:16:22.379,0:16:27.170
replication problem only because nobody is[br]trying to replicate previous results; I
0:16:27.170,0:16:34.269
mean then you will never know if your[br]results hold up. So what can be done about
0:16:34.269,0:16:42.930
all this and fundamentally I think the[br]core issue here is that the scientific
0:16:42.930,0:16:49.970
process is tied together with results, so[br]we do a study and only after that we
0:16:49.970,0:16:54.759
decide whether it's going to be published.[br]Or we do a study and only after we have
0:16:54.759,0:17:01.230
the data do we decide how to analyze it. So[br]essentially we need to decouple the
0:17:01.230,0:17:09.800
scientific process from its results and[br]one way of doing that is pre-registration
0:17:09.800,0:17:14.490
so what you're doing there is that before[br]you start doing a study you will register
0:17:14.490,0:17:20.500
it in a public register and say "I'm gonna[br]do a study like on this medication or
0:17:20.500,0:17:25.670
whatever, on this psychological effect,[br]and that's how I'm gonna do it", and then later
0:17:25.670,0:17:33.980
on people can check if you really did[br]that. And this
0:17:33.980,0:17:41.179
is more or less standard practice in[br]medical drug trials the summary about it
0:17:41.179,0:17:47.130
is it does not work very well but it's[br]better than nothing. So, and the problem
0:17:47.130,0:17:52.029
is mostly enforcement: people register a[br]study and then don't publish it, and
0:17:52.029,0:17:57.190
nothing happens to them even though they[br]are legally required to publish it. And
0:17:57.190,0:18:01.889
there are two campaigns I'd like to point[br]out. There's the AllTrials campaign, which
0:18:01.889,0:18:08.149
has been started by Ben Goldacre, a[br]doctor from the UK, and they demand
0:18:08.149,0:18:13.330
that every trial that's done on a[br]medication should be published. And
0:18:13.330,0:18:18.870
there's also a project by the same guy, the[br]COMPare project, and they are trying to see
0:18:18.870,0:18:25.380
if a medical trial has been registered and[br]later published did they do the same or
0:18:25.380,0:18:29.480
did they change something in their[br]protocol and was there a reason for it or
0:18:29.480,0:18:36.799
did they just change it to get a result[br]which they otherwise wouldn't get. But then
0:18:36.799,0:18:41.080
again, these issues in medicine[br]often get a lot of attention, and for good
0:18:41.080,0:18:46.820
reasons because if we have bad science in[br]medicine then people die, that's pretty
0:18:46.820,0:18:52.960
immediate and pretty massive. But if you[br]read about this you always have to think
0:18:52.960,0:18:58.510
that for these issues in drug trials, at least[br]there is pre-registration; most
0:18:58.510,0:19:04.330
scientific fields don't bother doing[br]anything like that. So whenever you hear
0:19:04.330,0:19:08.470
something, maybe about publication[br]bias in medicine, you should always think:
0:19:08.470,0:19:12.630
the same thing happens in many fields of[br]science and usually nobody is doing
0:19:12.630,0:19:18.809
anything about it. And particularly to[br]this audience I'd like to say there's
0:19:18.809,0:19:23.580
currently a big trend that people from[br]computer science want to revolutionize
0:19:23.580,0:19:30.300
medicine: big data and machine learning,[br]these things, which in principle is ok but
0:19:30.300,0:19:34.750
I know a lot of people in medicine are[br]very worried about this and the reason is,
0:19:34.750,0:19:39.470
that these computer science people don't[br]have the same scientific standards as
0:19:39.470,0:19:44.399
people in medicine expect, and might[br]say "Yeah, we don't really need to do
0:19:44.399,0:19:50.450
a study on this it's obvious that this[br]helps" and that is worrying and I come
0:19:50.450,0:19:53.580
from computer science and I very well[br]understand that people from medicine are
0:19:53.580,0:20:00.540
worried about this. So there's an idea[br]that goes even further than pre-registration,
0:20:00.540,0:20:05.210
and it's called registered reports. A[br]couple of years ago some scientists
0:20:05.210,0:20:10.539
wrote an open letter that was published[br]in the Guardian, and the idea
0:20:10.539,0:20:16.451
there is that you turn the scientific[br]publication process upside down, so if you
0:20:16.451,0:20:21.210
want to do a study, the first thing you[br]would do with a registered report is, you
0:20:21.210,0:20:27.000
submit your study design[br]protocol to the journal, and then the
0:20:27.000,0:20:33.110
journal decides whether they will publish[br]that before they see any result, because
0:20:33.110,0:20:36.990
then you can prevent publication bias, and[br]you prevent journals from only publishing
0:20:36.990,0:20:42.710
the nice findings and ignoring the negative[br]findings. And then you do the study and
0:20:42.710,0:20:46.330
then it gets published but it gets[br]published independent of what the result
0:20:46.330,0:20:53.830
was. And there are of course other things you[br]can do to improve science. There's a lot
0:20:53.830,0:20:58.610
of talk about sharing data, sharing code,[br]sharing methods because if you want to
0:20:58.610,0:21:04.130
replicate a study it's of course easier if[br]you have access to all the details of how the
0:21:04.130,0:21:11.090
original study was done. Then you could[br]say "Okay we could do large
0:21:11.090,0:21:15.269
collaborations" because many studies are[br]just too small if you have a study with
0:21:15.269,0:21:19.630
twenty people you just don't get a very[br]reliable outcome. So maybe in many
0:21:19.630,0:21:25.669
situations it would be better to get together[br]10 teams of scientists and let them all do
0:21:25.669,0:21:31.640
a big study together and then you can[br]reliably answer a question. And also some
0:21:31.640,0:21:36.390
people propose stricter[br]statistical thresholds; that p-value of
0:21:36.390,0:21:42.260
0.05 means practically nothing. There was[br]recently a paper that argued to just
0:21:42.260,0:21:47.880
put the dot one place to[br]the left and use 0.005, and that would
0:21:47.880,0:21:55.029
already solve a lot of problems. And for[br]example in physics they have
0:21:55.029,0:22:00.870
something called five sigma, which is, I think,[br]zero point, then five zeroes, and a 3, or
0:22:00.870,0:22:08.350
something like that so in physics they[br]have much higher statistical thresholds.
0:22:08.350,0:22:13.210
Now whatever if you're working in any[br]scientific field you might ask yourself
0:22:13.210,0:22:20.200
like: "If we have statistical results, are[br]they pre-registered in any way, and do we
0:22:20.200,0:22:26.380
publish negative results?", like, we tested[br]an effect and got nothing, and are there
0:22:26.380,0:22:32.350
replications of all relevant results and I[br]would say if you answer all these
0:22:32.350,0:22:36.289
questions with "no" which I think many[br]people will do, then you're not really
0:22:36.289,0:22:41.510
doing science; what you're doing is the[br]alchemy of our time.
0:22:41.510,0:22:50.220
Applause[br]Thanks.
0:22:50.220,0:22:54.499
Herald: Thank you very much.[br]Hanno: No, I have more, sorry, I have
0:22:54.499,0:23:03.060
three more slides, that was not the[br]finishing line. A big issue is also that
0:23:03.060,0:23:09.830
there are bad incentives in science, so a[br]very standard thing to evaluate the impact
0:23:09.830,0:23:15.710
of science is citation counts, where you say:[br]"if your scientific study is cited a lot
0:23:15.710,0:23:18.960
then this is a good thing and if your[br]journal is cited a lot this is a good
0:23:18.960,0:23:22.390
thing", and this is for example the impact[br]factor, but there are also other
0:23:22.390,0:23:27.059
measurements. And also universities like[br]publicity so if your study gets a lot of
0:23:27.059,0:23:33.490
media reports then your press department[br]likes you. And these incentives tend to
0:23:33.490,0:23:40.200
favor interesting results but they don't[br]favor correct results and this is bad
0:23:40.200,0:23:44.899
because if we are realistic most results[br]are not that interesting, most results
0:23:44.899,0:23:49.879
will be "Yeah we have this interesting and[br]counterintuitive theory and it's totally
0:23:49.879,0:24:00.470
wrong" and then there's this idea that[br]science is self-correcting. So if you
0:24:00.470,0:24:05.320
confront scientists with these issues, with[br]publication bias and p-hacking, surely
0:24:05.320,0:24:11.909
they will immediately change that's what[br]scientists do right? And I want to cite
0:24:11.909,0:24:16.259
something here, sorry, it's a bit[br]long: "There is some evidence that in fields
0:24:16.259,0:24:21.329
where statistical tests of significance are commonly[br]used, research which yields nonsignificant
0:24:21.329,0:24:28.730
results is not published." That sounds[br]like publication bias. And then it also
0:24:28.730,0:24:32.450
says: "Significant results published in[br]these fields are seldom verified by
0:24:32.450,0:24:37.889
independent replication" so it seems[br]there's a replication problem. These wise
0:24:37.889,0:24:46.750
words were said in 1959 by a[br]statistician called Theodore Sterling, and
0:24:46.750,0:24:52.059
because science is so self-correcting, in[br]1995 he complained again: his article
0:24:52.059,0:24:56.389
presents evidence that "published results of[br]scientific investigations are not a
0:24:56.389,0:25:01.240
representative sample of all scientific[br]studies". "These results also indicate that
0:25:01.240,0:25:06.899
practice leading to publication bias has[br]not changed over a period of 30 years" and
0:25:06.899,0:25:13.030
here we are in 2018 and publication bias[br]is still a problem. So if science is self-
0:25:13.030,0:25:21.090
correcting then it's pretty damn slow in[br]correcting itself, right? And finally I
0:25:21.090,0:25:27.400
would like to ask you, if you're prepared[br]for boring science, because ultimately, I
0:25:27.400,0:25:31.950
think, we have a choice between what I[br]would like to call TEDTalk science and
0:25:31.950,0:25:40.980
boring science..[br]Applause
0:25:40.980,0:25:46.779
So with TEDTalk science we get mostly[br]positive and surprising results and
0:25:46.779,0:25:53.380
interesting results; we have large effects,[br]many citations, lots of media attention, and
0:25:53.380,0:26:00.139
you may have a TED talk about it.[br]Unfortunately usually it's not true and I
0:26:00.139,0:26:03.820
would like to propose boring science as[br]the alternative which is mostly negative
0:26:03.820,0:26:11.620
results, pretty boring, small effects but[br]it may be closer to the truth. And I would
0:26:11.620,0:26:18.230
like to have boring science but I know[br]it's a pretty tough sell. Sorry I didn't
0:26:18.230,0:26:35.280
hear that. Yeah, thanks for listening.[br]Applause
0:26:35.280,0:26:38.480
Herald: Thank you.[br]Hanno: Two questions, or?
0:26:38.480,0:26:41.030
Herald: We don't have that much time for[br]questions, three minutes, three minutes
0:26:41.030,0:26:45.250
guys. Question one - shoot.[br]Mic: This isn't a question but I just
0:26:45.250,0:26:48.700
wanted to comment Hanno you missed out a[br]very critical topic here, which is the use
0:26:48.700,0:26:53.130
of Bayesian probability. So you did[br]conflate p-values with the scientific
0:26:53.130,0:26:57.260
method, which it isn't, which gave the rest[br]of your talk, I felt, a slightly unnecessary
0:26:57.260,0:27:02.380
anti-science slant. P-values aren't[br]the be-all and end-all of the scientific
0:27:02.380,0:27:06.840
method. A p-value is sort of calculating[br]the probability that your data will happen
0:27:06.840,0:27:10.860
given that the null hypothesis is true, whereas[br]Bayesian probability would be calculating
0:27:10.860,0:27:15.960
the probability that your hypothesis is[br]true given the data and more and more
0:27:15.960,0:27:19.559
scientists are slowly starting to realize[br]that this sort of method is probably a
0:27:19.559,0:27:25.809
better way of doing science than p-values.[br]So this is probably a third alternative
0:27:25.809,0:27:29.950
to your proposal of boring science:[br]doing it instead with Bayesian
0:27:29.950,0:27:34.029
probability.[br]Hanno: Sorry yeah, I agree with you I
0:27:34.029,0:27:37.530
unfortunately I only had[br]half an hour here.
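The commenter's distinction, P(data | null hypothesis) versus P(hypothesis | data), can be made concrete with a small Bayes' rule sketch (the prior, power, and alpha numbers are purely illustrative assumptions, not from the talk):

```python
# Illustrative Bayes' rule sketch: the probability that an effect is
# real, given a "significant" result, depends on the prior plausibility
# of the hypothesis, not just on the p-value threshold.
def posterior(prior, power=0.8, alpha=0.05):
    true_pos = prior * power         # real effects flagged significant
    false_pos = (1 - prior) * alpha  # null effects flagged significant
    return true_pos / (true_pos + false_pos)

# If only 1 in 10 tested theories is actually true, a result with
# p < 0.05 corresponds to a real effect only about 64% of the time:
print(round(posterior(0.1), 2))  # 0.64
```

This is one way to see why implausible hypotheses (Malachite crystals, precognition) should not be rescued by a single significant p-value.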
0:27:37.530,0:27:40.610
Herald: Where are you going after this[br]like where are we going after this lecture
0:27:40.610,0:27:46.269
can they find you somewhere in the bar?[br]Hanno: I know him..
0:27:46.269,0:27:50.559
Herald: You know science is broken but[br]then scientists it's a little bit like the
0:27:50.559,0:27:54.990
next lecture actually that's waiting there[br]it's like: "you scratch my back and I
0:27:54.990,0:27:59.160
scratch yours for publication". Hanno:[br]Maybe two more minutes?
0:27:59.160,0:28:04.870
Herald: One minute.[br]Please go ahead.
0:28:04.870,0:28:11.820
Mic: Yeah hi, thank you for your talk. I'm[br]curious so you've raised, you know, ways
0:28:11.820,0:28:15.529
we can address this assuming good actors,[br]assuming people who want to do better
0:28:15.529,0:28:20.769
science that this happens out of ignorance[br]or willful ignorance. What do we do about
0:28:20.769,0:28:26.389
bad actors. So for example the medical[br]community drug companies, maybe they
0:28:26.389,0:28:29.539
really like the idea of being profitably[br]incentivized by these randomized controlled
0:28:29.539,0:28:34.929
trials, to make essentially a placebo[br]do something. How do we begin to address
0:28:34.929,0:28:40.639
those currently trying to maliciously p-hack[br]or maliciously abuse the pre-reg system or
0:28:40.639,0:28:44.409
something like that?[br]Hanno: I mean it's a big question, right?
0:28:44.409,0:28:50.660
But I think if the standards are kind of[br]confining you so much that there's not
0:28:50.660,0:28:56.380
much room to cheat, that's a way out, right?[br]And also I don't think
0:28:56.380,0:29:00.110
deliberate cheating is that much of a[br]problem, I actually really think the
0:29:00.110,0:29:07.120
bigger problem is people honestly[br]believe what they do is true.
0:29:07.120,0:29:15.640
Herald: Okay one last, you sir, please?[br]Mic: So the value in science is often an
0:29:15.640,0:29:20.559
a count of publications, right? A count of[br]citations and so on. So is it true that
0:29:20.559,0:29:24.799
to improve this situation you've[br]described, journals, whose publications
0:29:24.799,0:29:31.120
are what's available to us,[br]should impose higher standards? So the
0:29:31.120,0:29:37.470
journals are those who must raise the[br]bar; they should enforce publication of
0:29:37.470,0:29:43.330
protocols before accepting, etc.[br]So is it the journals who should
0:29:43.330,0:29:49.340
do work on that, or can we regular[br]scientists do something also? Hanno: I mean you
0:29:49.340,0:29:53.270
can publish in the journals that have[br]better standards, right? There are
0:29:53.270,0:29:59.299
journals that have these registered[br]reports, but of course I mean as a single
0:29:59.299,0:30:03.360
scientist is always difficult because[br]you're playing in a system that has all
0:30:03.360,0:30:06.580
these wrong incentives.[br]Herald: Okay guys that's it, we have to
0:30:06.580,0:30:12.670
shut down. Please. There is a reference[br]better science dot-org, go there, and one
0:30:12.670,0:30:16.299
last request give really warm applause!
0:30:16.299,0:30:24.249
Applause
0:30:24.249,0:30:29.245
34c3 outro
0:30:29.245,0:30:46.000
subtitles created by c3subtitles.de[br]in the year 2018. Join, and help us!