1
00:00:00,099 --> 00:00:14,890
34c3 intro
2
00:00:14,890 --> 00:00:19,090
Hanno Böck: Yeah, so many of you probably
know me from doing things around IT
3
00:00:19,090 --> 00:00:25,000
security, but I'm gonna surprise you by
almost not talking about IT security today.
4
00:00:25,000 --> 00:00:32,189
But I'm gonna ask the question "Can we
trust the scientific method?". I want to
5
00:00:32,189 --> 00:00:38,809
start this by giving you quite a
simple example. So if we do science like
6
00:00:38,809 --> 00:00:45,210
we start with the theory and then we are
trying to test if it's true, right? So I
7
00:00:45,210 --> 00:00:49,760
mean I said I'm not going to talk about IT
security but I chose an example from IT
8
00:00:49,760 --> 00:00:56,690
security or kind of from IT security. So
there was a post on Reddit a while ago,
9
00:00:56,690 --> 00:01:01,329
a picture from some book which claimed that
if you use a Malachite crystal that can
10
00:01:01,329 --> 00:01:06,240
protect you from computer viruses.
Which... to me doesn't sound very
11
00:01:06,240 --> 00:01:11,009
plausible, right? Like, these are crystals and
if you put them on your computer, this book
12
00:01:11,009 --> 00:01:18,590
claims this protects you from malware. But
of course if we really want to know, we
13
00:01:18,590 --> 00:01:23,990
could do a study on this. And if you say
people don't do studies on crazy things:
14
00:01:23,990 --> 00:01:28,770
that's wrong. I mean people do studies on
homeopathy or all kinds of crazy things
15
00:01:28,770 --> 00:01:34,549
that are completely implausible. So we can
do a study on this and what we will do is
16
00:01:34,549 --> 00:01:39,509
we will do a randomized controlled trial,
which is kind of the gold standard of
17
00:01:39,509 --> 00:01:46,310
doing a test on these kinds of things. So
this is our question: "Do Malachite
18
00:01:46,310 --> 00:01:52,479
crystals prevent malware infections?" and
how we would test that, our study design
19
00:01:52,479 --> 00:01:58,399
is: ok, we take a group of maybe 20
computer users. And then we split them
20
00:01:58,399 --> 00:02:06,009
randomly to two groups, and then one group
we'll give one of these crystals and tell
21
00:02:06,009 --> 00:02:10,919
them: "Put them on your desk or on your
computer.". Then we need, the other group
22
00:02:10,919 --> 00:02:15,800
is our control group. That's very
important because if we want to know if
23
00:02:15,800 --> 00:02:20,940
they help we need another group to compare
it to. And to rule out that there are any
24
00:02:20,940 --> 00:02:27,130
kinds of placebo effects, we give this
control group a fake Malachite crystal so
25
00:02:27,130 --> 00:02:32,260
we can compare them against each other.
And then we wait for maybe six months and
26
00:02:32,260 --> 00:02:39,310
then we check how many malware infections
they had. Now, I didn't do that study, but
27
00:02:39,310 --> 00:02:45,090
I simulated it with a Python script and
given that I don't believe that this
28
00:02:45,090 --> 00:02:50,310
theory is true I just simulated this as
random data. So I'm not going to go
29
00:02:50,310 --> 00:02:55,090
through the whole script but I'm just like
generating, I'm assuming there can be
30
00:02:55,090 --> 00:02:59,950
between 0 and 3 malware infections and
it's totally random and then I compare the
31
00:02:59,950 --> 00:03:04,790
two groups. And then I calculate something
which is called a p-value which is a very
32
00:03:04,790 --> 00:03:10,730
common thing in science whenever you do
statistics. A p-value is, it's a bit
33
00:03:10,730 --> 00:03:17,290
technical, but it's the probability that
if you have no effect that you would get
34
00:03:17,290 --> 00:03:23,570
this result. Which kind of in another way
means, if you have 20 results in an
35
00:03:23,570 --> 00:03:29,260
idealized world then one of them is a
false positive which means one of them
36
00:03:29,260 --> 00:03:34,510
says something happens although it
doesn't. And in many fields of science
37
00:03:34,510 --> 00:03:41,180
this p-value of 0.05 is considered
significant which is like these twenty
38
00:03:41,180 --> 00:03:48,620
studies. So one error in twenty studies
but as I said under idealized conditions.
39
00:03:48,620 --> 00:03:53,330
So, as it's a script and I can run it
in less than a second I just did it twenty
40
00:03:53,330 --> 00:03:59,821
times instead of once. So here are my 20
simulated studies and most of them look
41
00:03:59,821 --> 00:04:06,360
not very interesting so of course we have
a few random variations but nothing very
42
00:04:06,360 --> 00:04:12,460
significant. Except if you look at this
one study, it says the people with the
43
00:04:12,460 --> 00:04:17,160
Malachite crystal had on average 1.8
malware infections and the people with the
44
00:04:17,160 --> 00:04:24,670
fake crystal had 0.8. So it means actually
the crystal made it worse. But also this
45
00:04:24,670 --> 00:04:32,100
result is significant because it has a
p-value of 0.03. So of course we can
46
00:04:32,100 --> 00:04:36,110
publish that, assuming I really did these
studies.
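The script itself is not shown in the talk, so the following is only a minimal sketch of what such a simulation might look like. The group size of 10 per arm, the permutation test used to get a p-value, and all function names are assumptions made for illustration; the talk only says that infections are random between 0 and 3 and that the run was repeated 20 times.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, rng, rounds=2000):
    """Two-sided permutation test on the difference of group means.

    Assumption: the talk does not say which test was used; a permutation
    test is chosen here because it needs only the standard library.
    """
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # tolerance for float rounding
            hits += 1
    return hits / rounds


def simulate_study(rng, group_size=10):
    """One fake trial: both groups draw 0..3 infections purely at random."""
    crystal = [rng.randint(0, 3) for _ in range(group_size)]
    fake = [rng.randint(0, 3) for _ in range(group_size)]
    return mean(crystal), mean(fake), permutation_p_value(crystal, fake, rng)


def run_studies(n_studies=20, seed=1):
    rng = random.Random(seed)
    return [simulate_study(rng) for _ in range(n_studies)]


if __name__ == "__main__":
    for i, (m_crystal, m_fake, p) in enumerate(run_studies(), start=1):
        flag = "  <-- looks 'significant'" if p < 0.05 else ""
        print(f"study {i:2d}: crystal={m_crystal:.1f} fake={m_fake:.1f} p={p:.3f}{flag}")
```

Because both groups come from the same random source, any run flagged below 0.05 is a false positive by construction. How many runs get flagged depends on the seed.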
47
00:04:36,110 --> 00:04:40,600
Applause
B.: And the other studies we just forget
48
00:04:40,600 --> 00:04:45,850
about. I mean they were not interesting
right and who cares? Non significant
49
00:04:45,850 --> 00:04:52,990
results... Okay so you have just seen that
I created a significant result out of
50
00:04:52,990 --> 00:05:00,590
random data. And that's concerning because
people in science - I mean you can really do
51
00:05:00,590 --> 00:05:07,850
that. And this phenomenon is called
publication bias. So what's happening here
52
00:05:07,850 --> 00:05:13,130
is that, you're doing studies and if they
get a positive result - meaning you're
53
00:05:13,130 --> 00:05:18,990
seeing an effect, then you publish them
and if there's no effect you just forget
54
00:05:18,990 --> 00:05:26,670
about them. We learned earlier that
this p-value of 0.05 means 1 in 20 studies
55
00:05:26,670 --> 00:05:32,760
is a false positive, but you usually don't
see the studies that are not significant,
56
00:05:32,760 --> 00:05:39,320
because they don't get published. And you
may wonder: "Ok, what's stopping a
57
00:05:39,320 --> 00:05:43,500
scientist from doing exactly this? What's
stopping a scientist from just doing so
58
00:05:43,500 --> 00:05:47,750
many experiments till one of them looks
like it's a real result although it's just
59
00:05:47,750 --> 00:05:54,710
a random fluke?". And the disconcerting
answer to that is, it's usually nothing.
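The arithmetic behind "just keep running experiments" can be sketched in a few lines. This assumes every study tests a true null effect, the studies are independent, and the threshold is 0.05; the function names are made up for illustration.

```python
def chance_of_at_least_one_false_positive(n_studies, alpha=0.05):
    """Probability that at least one of n independent null studies has p < alpha."""
    return 1 - (1 - alpha) ** n_studies


def expected_studies_until_false_positive(alpha=0.05):
    """Mean of a geometric distribution with success probability alpha."""
    return 1 / alpha


if __name__ == "__main__":
    for n in (1, 5, 20, 60):
        chance = chance_of_at_least_one_false_positive(n)
        print(f"{n:2d} studies -> {chance:.1%} chance of a publishable fluke")
    print("expected studies until the first fluke:",
          expected_studies_until_false_positive())
```

Under these assumptions, 20 attempts give roughly a 64% chance of at least one "significant" result, and on average it takes 20 attempts to get one.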
60
00:05:56,760 --> 00:06:03,620
And this is not just a theoretical
example. I want to give you an example,
61
00:06:03,620 --> 00:06:09,110
that has quite some impact and that was
researched very well, and that is the
62
00:06:09,110 --> 00:06:17,980
research on antidepressants, so-called
SSRIs. And in 2008 there was a study, the
63
00:06:17,980 --> 00:06:22,680
interesting situation here was, that the
US Food and Drug Administration, which is
64
00:06:22,680 --> 00:06:29,480
the authority that decides whether a
medical drug can be put on the market,
65
00:06:29,480 --> 00:06:35,490
they had knowledge about all the studies
that had been done to register this
66
00:06:35,490 --> 00:06:40,380
medication. And then some researchers
looked at that and compared it with what
67
00:06:40,380 --> 00:06:45,810
has been published. And they figured out
there were 38 studies that saw that these
68
00:06:45,810 --> 00:06:51,040
medications had a real effect, had real
improvements for patients. And from those
69
00:06:51,040 --> 00:06:56,790
38 studies 37 got published. But then
there were 36 studies that said: "These
70
00:06:56,790 --> 00:07:00,010
medications don't really have any
effect.", "They are not really better than
71
00:07:00,010 --> 00:07:06,530
a placebo effect" and out of those only 14
got published. And even from those 14
72
00:07:06,530 --> 00:07:11,010
there were 11, where the researcher said,
okay, they have spun the result in a way
73
00:07:11,010 --> 00:07:17,920
that it sounds like these medications do
something. But there were also a bunch of
74
00:07:17,920 --> 00:07:21,870
studies that were just not published
because they had a negative result. And
75
00:07:21,870 --> 00:07:26,390
it's clear that if you look at the
published studies only and you ignore the
76
00:07:26,390 --> 00:07:29,320
studies with a negative result that
haven't been published, then these
77
00:07:29,320 --> 00:07:34,290
medications look much better than they
really are. And it's not like the earlier
78
00:07:34,290 --> 00:07:38,240
example there is a real effect from
antidepressants, but they are not as good
79
00:07:38,240 --> 00:07:40,210
as people have believed in the past.
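To see how much the picture shifts, the figures quoted above can be put side by side. This is only a small sketch using the numbers exactly as stated in the talk (38 positive studies with 37 published, 36 negative studies with 14 published, 11 of those presented as positive); the variable and function names are illustrative.

```python
# Figures as quoted in the talk
REGISTERED_POSITIVE = 38
REGISTERED_NEGATIVE = 36
PUBLISHED_POSITIVE = 37
PUBLISHED_NEGATIVE = 14          # negative studies that got published at all
PUBLISHED_NEGATIVE_SPUN = 11     # of those, written up to sound positive


def share_positive_in_registry():
    """Share of all registered studies that found a real effect."""
    total = REGISTERED_POSITIVE + REGISTERED_NEGATIVE
    return REGISTERED_POSITIVE / total


def share_positive_in_literature():
    """Share of published studies that read as positive, including spun ones."""
    total = PUBLISHED_POSITIVE + PUBLISHED_NEGATIVE
    return (PUBLISHED_POSITIVE + PUBLISHED_NEGATIVE_SPUN) / total


if __name__ == "__main__":
    print(f"positive according to the registry:   {share_positive_in_registry():.0%}")
    print(f"positive according to the literature: {share_positive_in_literature():.0%}")
```

With these numbers, about half of the registered studies were positive, while someone reading only the journals would see well over 90% positive-looking studies.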
80
00:07:43,020 --> 00:07:45,860
So we've learnt in theory with publication bias
81
00:07:45,860 --> 00:07:50,520
you can create a result out of nothing.
But if you're a researcher and you have a
82
00:07:50,520 --> 00:07:54,790
theory that's not true but you really want
to publish something about it, that's not
83
00:07:54,790 --> 00:07:59,699
really efficient, because you have to do
20 studies on average to get one of these
84
00:07:59,699 --> 00:08:06,130
random results that look like real
results. So there are more efficient ways
85
00:08:06,130 --> 00:08:12,780
to get to a result from nothing. If you're
doing a study then there are a lot of
86
00:08:12,780 --> 00:08:17,320
micro decisions you have to make, for
example you may have dropouts from your
87
00:08:17,320 --> 00:08:22,150
study where people, I don't know they move
to another place or you can no longer
88
00:08:22,150 --> 00:08:26,020
reach them, so they are no longer part of
your study. And there are different things
89
00:08:26,020 --> 00:08:30,480
how you can handle that. Then you may have
corner-case results, where you're not
90
00:08:30,480 --> 00:08:34,509
entirely sure: "Is this an effect or not
and how do you decide?", "How do you
91
00:08:34,509 --> 00:08:39,639
exactly measure?". And then also you may
be looking for different things, maybe
92
00:08:39,639 --> 00:08:46,620
there are different tests you can do on
people, and you may control for certain
93
00:08:46,620 --> 00:08:51,639
variables like "Do you split men and women
into separate?", "Do you see them
94
00:08:51,639 --> 00:08:56,430
separately?" or "Do you separate them by
age?". So there are many decisions you can
95
00:08:56,430 --> 00:09:02,050
make while doing a study. And of course
each of these decisions has a small effect
96
00:09:02,050 --> 00:09:10,399
on the result. And it may very often be,
that just by trying all the combinations
97
00:09:10,399 --> 00:09:15,230
you will get a p-value that looks like
it's statistically significant, although
98
00:09:15,230 --> 00:09:20,670
there's no real effect. So and there's
this term called p-Hacking which means
99
00:09:20,670 --> 00:09:25,550
you're just adjusting your methods long
enough, that you get a significant result.
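A rough sketch of this effect, under loudly stated assumptions: outcomes are pure noise with unit variance, sex and age are recorded but do nothing, and the "micro decisions" are limited to five analyses (everyone, men only, women only, young only, old only). None of these specifics come from the talk, and the z-test with known variance is a simplification chosen to keep the example in the standard library.

```python
import math
import random


def z_test_p(a, b):
    """Two-sided z-test for equal means, assuming unit variance in both groups."""
    z = (sum(a) / len(a) - sum(b) / len(b)) / math.sqrt(1 / len(a) + 1 / len(b))
    return math.erfc(abs(z) / math.sqrt(2))


def make_group(rng, n):
    # (outcome, sex, age): the outcome is noise, the attributes have no effect
    return [(rng.gauss(0, 1), rng.choice("mf"), rng.choice(("young", "old")))
            for _ in range(n)]


ANALYSES = [
    lambda p: True,              # the pre-planned analysis: everyone
    lambda p: p[1] == "m",       # men only
    lambda p: p[1] == "f",       # women only
    lambda p: p[2] == "young",   # young only
    lambda p: p[2] == "old",     # old only
]


def candidate_p_values(treated, control):
    p_values = []
    for keep in ANALYSES:
        a = [p[0] for p in treated if keep(p)]
        b = [p[0] for p in control if keep(p)]
        if a and b:  # skip a subgroup that happens to be empty
            p_values.append(z_test_p(a, b))
    return p_values


def false_positive_rates(n_studies=2000, n_per_group=40, seed=0):
    """Return (rate with one fixed analysis, rate when picking the best analysis)."""
    rng = random.Random(seed)
    honest = hacked = 0
    for _ in range(n_studies):
        ps = candidate_p_values(make_group(rng, n_per_group),
                                make_group(rng, n_per_group))
        honest += ps[0] < 0.05
        hacked += min(ps) < 0.05
    return honest / n_studies, hacked / n_studies


if __name__ == "__main__":
    honest_rate, hacked_rate = false_positive_rates()
    print(f"one pre-planned analysis: {honest_rate:.1%} false positives")
    print(f"best of five analyses:    {hacked_rate:.1%} false positives")
```

The fixed analysis should stay near the nominal 5%, while picking the best of several analyses pushes the false positive rate well above it, even though no single step looks like cheating.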
100
00:09:27,050 --> 00:09:32,550
And I'd like to point out here, that this
is usually not that a scientist says: "Ok,
101
00:09:32,550 --> 00:09:36,259
today I'm going to p-hack my result,
because I know my theory is wrong but I
102
00:09:36,259 --> 00:09:42,420
want to show it's true.". But it's a
subconscious process, because usually the
103
00:09:42,420 --> 00:09:47,399
scientists believe in their theories.
Honestly. They honestly think that their
104
00:09:47,399 --> 00:09:52,040
theory is true and that their research
will show that. So they may subconsciously
105
00:09:52,040 --> 00:09:58,279
say: "Ok, if I analyze my data like this
it looks a bit better so I will do this.".
106
00:09:58,279 --> 00:10:05,079
So subconsciously, they may p-hack
themselves into getting a result that's
107
00:10:05,079 --> 00:10:11,449
not really there. And again we can ask:
"What is stopping scientists from
108
00:10:11,449 --> 00:10:22,009
p-hacking?". And the concerning answer is
the same: usually nothing. And I came to
109
00:10:22,009 --> 00:10:26,069
this conclusion that I say: "Ok, the
scientific method it's a way to create
110
00:10:26,069 --> 00:10:31,899
evidence for whatever theory you like. No
matter if it's true or not.". And you may
111
00:10:31,899 --> 00:10:35,720
say: "That's a pretty bold thing to say.".
And I'm saying this even though I'm not
112
00:10:35,720 --> 00:10:42,480
even a scientist. I'm just like some
hacker who, whatever... But I'm not alone
113
00:10:42,480 --> 00:10:47,759
in this, like there's a paper from a
famous researcher John Ioannidis, who
114
00:10:47,759 --> 00:10:51,529
said: "Why most published research
findings are false.". He published this in
115
00:10:51,529 --> 00:10:57,170
2005 and if you look at the title, he
doesn't really question that most research
116
00:10:57,170 --> 00:11:02,560
findings are false. He only wants to give
reasons why this is the case. And he makes
117
00:11:02,560 --> 00:11:08,499
some very plausible assumptions, if you look
at that: many negative results don't get
118
00:11:08,499 --> 00:11:12,129
published, and that you will have some
bias. And he comes to a very plausible
119
00:11:12,129 --> 00:11:17,180
conclusion, that this is the case and this
is not even very controversial. If you ask
120
00:11:17,180 --> 00:11:23,491
people who are doing what you can call
science on science or meta science, who
121
00:11:23,491 --> 00:11:28,410
look at scientific methodology, they will
tell you: "Yeah, of course that's the
122
00:11:28,410 --> 00:11:32,079
case.". Some will even say: "Yeah, that's
how science works, that's what we
123
00:11:32,079 --> 00:11:37,689
expect.". But I find it concerning. And if
you take this seriously, it means: if you
124
00:11:37,689 --> 00:11:43,160
read about a study, like in a newspaper,
the default assumption should be 'that's
125
00:11:43,160 --> 00:11:51,179
not true' - while we might usually think
the opposite. And if science is a method
126
00:11:51,179 --> 00:11:55,709
to create evidence for whatever you like,
you can think about something really
127
00:11:55,709 --> 00:12:00,939
crazy, like "Can people see into the future?",
"Does our mind have
128
00:12:00,939 --> 00:12:09,720
some extra perception where we can
sense things that happen in an hour?". And
129
00:12:09,720 --> 00:12:15,559
there was a psychologist called Daryl Bem
and he thought that this is the case and
130
00:12:15,559 --> 00:12:20,399
he published a study on it. It was titled
"feeling the future". He did a lot of
131
00:12:20,399 --> 00:12:25,449
experiments where he did something, and
then something later happened, and he
132
00:12:25,449 --> 00:12:29,569
thought he had statistical evidence that
what happened later influenced what
133
00:12:29,569 --> 00:12:34,999
happened earlier. So, I don't think that's
very plausible - based on what we know
134
00:12:34,999 --> 00:12:41,550
about the universe, but yeah... and it was
published in a real psychology journal.
135
00:12:41,550 --> 00:12:46,680
And a lot of things were wrong with this
study. Basically, it's a very nice example
136
00:12:46,680 --> 00:12:51,009
for p-hacking, and there's even a book by
Daryl Bem, where he describes something
137
00:12:51,009 --> 00:12:55,040
which basically looks like p-hacking,
where he says that's how you do
138
00:12:55,040 --> 00:13:03,870
psychology. But the study was absolutely
in line with the existing standards in
139
00:13:03,870 --> 00:13:08,759
Experimental Psychology. And that a lot of
people found concerning. So, if you can
140
00:13:08,759 --> 00:13:13,619
show that precognition is real, that you
can see into the future, then what else
141
00:13:13,619 --> 00:13:19,139
can you show and how can we trust our
results? And psychology has debated this a
142
00:13:19,139 --> 00:13:21,880
lot in the past couple of years. So
there's a lot of talk about the
143
00:13:21,880 --> 00:13:30,009
replication crisis in psychology. And many
effects that psychology just thought were
144
00:13:30,009 --> 00:13:35,040
true, they figured out, okay, if they try
to repeat these experiments, they couldn't
145
00:13:35,040 --> 00:13:40,759
get these results even though entire
subfields were built on these results.
146
00:13:44,369 --> 00:13:48,069
And I want to show you an example, which
is one of the ones that is not discussed so
147
00:13:48,069 --> 00:13:55,540
much. So there's a theory which is called
moral licensing. And the idea is that if
148
00:13:55,540 --> 00:14:00,649
you do something good, or something you
think is good, then later basically you
149
00:14:00,649 --> 00:14:04,880
behave like an asshole. Because you think
I already did something good now, I don't
150
00:14:04,880 --> 00:14:10,689
have to be so nice anymore. And there were
some famous studies that had the theory,
151
00:14:10,689 --> 00:14:17,870
that if people consume organic food, then
later they become more judgmental, or less
152
00:14:17,870 --> 00:14:27,949
social, less nice to their peers. But just
last week someone tried to replicate these
153
00:14:27,949 --> 00:14:32,720
original experiments. And they tried it
three times with more subjects and better
154
00:14:32,720 --> 00:14:39,010
research methodology and they totally
couldn't find that effect. But like what
155
00:14:39,010 --> 00:14:43,790
you've seen here is lots of media
articles. I have not found a single
156
00:14:43,790 --> 00:14:51,179
article reporting that this could not be
replicated. Maybe they will come but yeah
157
00:14:51,179 --> 00:14:57,360
that's just a very recent example. But
now I want to have a small warning for you
158
00:14:57,360 --> 00:15:01,319
because you may think now "yeah these
psychologists, that all sounds very
159
00:15:01,319 --> 00:15:05,329
fishy and they even believe in
precognition and whatever", but maybe your
160
00:15:05,329 --> 00:15:09,889
field is not much better maybe you just
don't know about it yet because nobody
161
00:15:09,889 --> 00:15:15,990
else has started replicating studies in
your field. And there are other fields
162
00:15:15,990 --> 00:15:21,670
that have replication problems and some
much worse for example the pharma company
163
00:15:21,670 --> 00:15:27,279
Amgen in 2012 they published something
where they said "We have tried to
164
00:15:27,279 --> 00:15:32,940
replicate cancer research and preclinical
research" that is stuff in a petri dish or
165
00:15:32,940 --> 00:15:38,869
animal experiments so not drugs on humans
but what happens before you develop a drug
166
00:15:38,869 --> 00:15:44,699
and they were only able to replicate 6
out of 53 studies. And these were they
167
00:15:44,699 --> 00:15:50,050
said landmark studies, so studies that
have been published in the best journals.
168
00:15:50,050 --> 00:15:54,099
Now there are a few problems with this
publication because they have not
169
00:15:54,099 --> 00:15:58,760
published their replications; they have not
told us which studies these were that they
170
00:15:58,760 --> 00:16:02,730
could not replicate. In the meantime I
think they have published three of these
171
00:16:02,730 --> 00:16:07,290
replications but most of it is a bit in
the dark which points to another problem
172
00:16:07,290 --> 00:16:10,689
because they say they did this because
they collaborated with the original
173
00:16:10,689 --> 00:16:16,109
researchers and they only did this by
agreeing that they would not publish the
174
00:16:16,109 --> 00:16:22,379
results. But it still sounds very
concerning. But some fields don't have a
175
00:16:22,379 --> 00:16:27,170
replication problem because just nobody is
trying to replicate previous results I
176
00:16:27,170 --> 00:16:34,269
mean then you will never know if your
results hold up. So what can be done about
177
00:16:34,269 --> 00:16:42,930
all this and fundamentally I think the
core issue here is that the scientific
178
00:16:42,930 --> 00:16:49,970
process is tied together with results, so
we do a study and only after that we
179
00:16:49,970 --> 00:16:54,759
decide whether it's going to be published.
Or we do a study and only after we have
180
00:16:54,759 --> 00:17:01,230
the data we're trying to analyze it. So
essentially we need to decouple the
181
00:17:01,230 --> 00:17:09,800
scientific process from its results and
one way of doing that is pre-registration
182
00:17:09,800 --> 00:17:14,490
so what you're doing there is that before
you start doing a study you will register
183
00:17:14,490 --> 00:17:20,500
it in a public register and say "I'm gonna
do a study like on this medication or
184
00:17:20,500 --> 00:17:25,670
whatever on this psychological effect" and
that's how I'm gonna do it and then later
185
00:17:25,670 --> 00:17:33,980
on people can check if you really did
that. And yeah that's what I said. And this
186
00:17:33,980 --> 00:17:41,179
is more or less standard practice in
medical drug trials. The summary about it
187
00:17:41,179 --> 00:17:47,130
is it does not work very well but it's
better than nothing. So, and the problem
188
00:17:47,130 --> 00:17:52,029
is mostly enforcement so people register
a study and then don't publish it and
189
00:17:52,029 --> 00:17:57,190
nothing happens to them even though they
are legally required to publish it. And
190
00:17:57,190 --> 00:18:01,889
there are two campaigns I'd like to point
out: there's the AllTrials campaign, which
191
00:18:01,889 --> 00:18:08,149
has been started by Ben Goldacre he's a
doctor from the UK and they like demand
192
00:18:08,149 --> 00:18:13,330
that every trial that's done on
medication should be published. And
193
00:18:13,330 --> 00:18:18,870
there's also a project by the same guy the
COMPare project, and they are trying to see
194
00:18:18,870 --> 00:18:25,380
if a medical trial has been registered and
later published did they do the same or
195
00:18:25,380 --> 00:18:29,480
did they change something in their
protocol and was there a reason for it or
196
00:18:29,480 --> 00:18:36,799
did they just change it to get a result,
which they otherwise wouldn't get. But then
197
00:18:36,799 --> 00:18:41,080
again like these issues in medicine they
often get a lot of attention and for good
198
00:18:41,080 --> 00:18:46,820
reasons because if we have bad science in
medicine then people die, that's pretty
199
00:18:46,820 --> 00:18:52,960
immediate and pretty massive. But if you
read about this you always have to think
200
00:18:52,960 --> 00:18:58,510
that these issues in drug trials at least
they have pre-registration, most
201
00:18:58,510 --> 00:19:04,330
scientific fields don't bother doing
anything like that. So whenever you hear
202
00:19:04,330 --> 00:19:08,470
something about maybe about publication
bias in medicine you should always think
203
00:19:08,470 --> 00:19:12,630
the same thing happens in many fields of
science and usually nobody is doing
204
00:19:12,630 --> 00:19:18,809
anything about it. And particularly to
this audience I'd like to say there's
205
00:19:18,809 --> 00:19:23,580
currently a big trend that people from
computer science want to revolutionize
206
00:19:23,580 --> 00:19:30,300
medicine: big data and machine learning,
these things, which in principle is ok but
207
00:19:30,300 --> 00:19:34,750
I know a lot of people in medicine are
very worried about this and the reason is,
208
00:19:34,750 --> 00:19:39,470
that these computer science people don't
have the same scientific standards as
209
00:19:39,470 --> 00:19:44,399
people in medicine expect, and might
say "Yeah, we don't really need to do
210
00:19:44,399 --> 00:19:50,450
a study on this it's obvious that this
helps" and that is worrying and I come
211
00:19:50,450 --> 00:19:53,580
from computer science and I very well
understand that people from medicine are
212
00:19:53,580 --> 00:20:00,540
worried about this. So there's an idea
that goes even further than pre-registration
213
00:20:00,540 --> 00:20:05,210
and it's called registered reports. A
couple of years ago some scientists
214
00:20:05,210 --> 00:20:10,539
wrote an open letter to the Guardian,
which was published there, and the idea
215
00:20:10,539 --> 00:20:16,451
there is that you turn the scientific
publication process upside down, so if you
216
00:20:16,451 --> 00:20:21,210
want to do a study the first thing you
would do with a registered report is, you
217
00:20:21,210 --> 00:20:27,000
submit your design your study design
protocol to the journal and then the
218
00:20:27,000 --> 00:20:33,110
journal decides whether they will publish
that before they see any result, because
219
00:20:33,110 --> 00:20:36,990
then you can prevent publication bias and
then you prevent journals from only publishing
220
00:20:36,990 --> 00:20:42,710
the nice findings and ignoring the negative
findings. And then you do the study and
221
00:20:42,710 --> 00:20:46,330
then it gets published but it gets
published independent of what the result
222
00:20:46,330 --> 00:20:53,830
was. And there are of course other things you
can do to improve science, there's a lot
223
00:20:53,830 --> 00:20:58,610
of talk about sharing data, sharing code,
sharing methods because if you want to
224
00:20:58,610 --> 00:21:04,130
replicate a study it's of course easier if
you have access to all the details how the
225
00:21:04,130 --> 00:21:11,090
original study was done. Then you could
say "Okay we could do large
226
00:21:11,090 --> 00:21:15,269
collaborations" because many studies are
just too small if you have a study with
227
00:21:15,269 --> 00:21:19,630
twenty people you just don't get a very
reliable outcome. So maybe in many
228
00:21:19,630 --> 00:21:25,669
situations it would be better to get together
10 teams of scientists and let them all do
229
00:21:25,669 --> 00:21:31,640
a big study together and then you can
reliably answer a question. And also some
230
00:21:31,640 --> 00:21:36,390
people propose just to get higher
statistical thresholds that p-value of
231
00:21:36,390 --> 00:21:42,260
0.05 means practically nothing. There was
recently a paper that argued we
232
00:21:42,260 --> 00:21:47,880
should just put the dot one place to
the left and have 0.005 and that would
233
00:21:47,880 --> 00:21:55,029
already solve a lot of problems. And for
example in physics they have
234
00:21:55,029 --> 00:22:00,870
something called Sigma 5 which is I think
zero point and then 5 zeroes and 3 or
235
00:22:00,870 --> 00:22:08,350
something like that so in physics they
have much higher statistical thresholds.
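For a sense of scale, a sigma level can be converted into a p-value using the normal distribution. The sketch below assumes the one-sided convention commonly used for the five-sigma criterion in particle physics; the function names are illustrative.

```python
import math


def one_sided_p(sigma):
    """Probability of a standard normal value at least `sigma` above the mean."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))


def two_sided_p(sigma):
    """Probability of a standard normal value at least `sigma` away from the mean."""
    return math.erfc(sigma / math.sqrt(2))


if __name__ == "__main__":
    print(f"5 sigma, one-sided:    p = {one_sided_p(5):.2e}")
    print(f"1.96 sigma, two-sided: p = {two_sided_p(1.96):.3f}")
```

Five sigma comes out at roughly 2.9e-07 (about 0.0000003), so the speaker's recollection is in the right region, and the familiar 0.05 threshold corresponds to only about 1.96 sigma in the two-sided case.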
236
00:22:08,350 --> 00:22:13,210
Now whatever if you're working in any
scientific field you might ask yourself
237
00:22:13,210 --> 00:22:20,200
like "If we have statistical results, are
they pre-registered in any way and do we
238
00:22:20,200 --> 00:22:26,380
publish negative results?" like we tested
an effect and we got nothing and are there
239
00:22:26,380 --> 00:22:32,350
replications of all relevant results and I
would say if you answer all these
240
00:22:32,350 --> 00:22:36,289
questions with "no" which I think many
people will do, then you're not really
241
00:22:36,289 --> 00:22:41,510
doing science what you're doing is the
alchemy of our time.
242
00:22:41,510 --> 00:22:50,220
Applause
Thanks.
243
00:22:50,220 --> 00:22:54,499
Herald: Thank you very much..
Hanno: No I have more, sorry, I have
244
00:22:54,499 --> 00:23:03,060
three more slides, that was not the
finishing line. Big issue is also that
245
00:23:03,060 --> 00:23:09,830
there are bad incentives in science, so a
very standard thing to evaluate the impact
246
00:23:09,830 --> 00:23:15,710
of science is citation counts, where you say
"if your scientific study is cited a lot
247
00:23:15,710 --> 00:23:18,960
then this is a good thing and if your
journal is cited a lot this is a good
248
00:23:18,960 --> 00:23:22,390
thing", and this is for example the impact
factor but there are also other
249
00:23:22,390 --> 00:23:27,059
measurements. And also universities like
publicity so if your study gets a lot of
250
00:23:27,059 --> 00:23:33,490
media reports then your press department
likes you. And these incentives tend to
251
00:23:33,490 --> 00:23:40,200
favor interesting results but they don't
favor correct results and this is bad
252
00:23:40,200 --> 00:23:44,899
because if we are realistic most results
are not that interesting, most results
253
00:23:44,899 --> 00:23:49,879
will be "Yeah we have this interesting and
counterintuitive theory and it's totally
254
00:23:49,879 --> 00:24:00,470
wrong" and then there's this idea that
science is self-correcting. So if you
255
00:24:00,470 --> 00:24:05,320
confront scientists with these issues with
publication bias and p-hacking, surely
256
00:24:05,320 --> 00:24:11,909
they will immediately change that's what
scientists do right? And I want to cite
257
00:24:11,909 --> 00:24:16,259
something here with this sorry it's a bit
long, but "There is some evidence that
258
00:24:16,259 --> 00:24:21,329
in fields where statistical tests of significance are commonly
used, research which yields nonsignificant
259
00:24:21,329 --> 00:24:28,730
results is not published." That sounds
like publication bias and then it also
260
00:24:28,730 --> 00:24:32,450
says: "Significant results published in
these fields are seldom verified by
261
00:24:32,450 --> 00:24:37,889
independent replication" so it seems
there's a replication problem. These wise
262
00:24:37,889 --> 00:24:46,750
words were said in 1959 by a
statistician called Theodore Sterling and
263
00:24:46,750 --> 00:24:52,059
because science is so self-correcting in
1995 he complained that this article
264
00:24:52,059 --> 00:24:56,389
presents evidence that published results of
scientific investigations are not a
265
00:24:56,389 --> 00:25:01,240
representative sample of all scientific
studies. "These results also indicate that
266
00:25:01,240 --> 00:25:06,899
practice leading to publication bias has
not changed over a period of 30 years" and
267
00:25:06,899 --> 00:25:13,030
here we are in 2018 and publication bias
is still a problem. So if science is self-
268
00:25:13,030 --> 00:25:21,090
correcting then it's pretty damn slow in
correcting itself, right? And finally I
269
00:25:21,090 --> 00:25:27,400
would like to ask you, if you're prepared
for boring science, because ultimately, I
270
00:25:27,400 --> 00:25:31,950
think, we have a choice between what I
would like to call TEDTalk science and
271
00:25:31,950 --> 00:25:40,980
boring science..
Applause
272
00:25:40,980 --> 00:25:46,779
.. so with TED talk science we get mostly
positive and surprising results and
273
00:25:46,779 --> 00:25:53,380
interesting results, we have large effects,
many citations lots of media attention and
274
00:25:53,380 --> 00:26:00,139
you may have a TED talk about it.
Unfortunately usually it's not true and I
275
00:26:00,139 --> 00:26:03,820
would like to propose boring science as
the alternative which is mostly negative
276
00:26:03,820 --> 00:26:11,620
results, pretty boring, small effects but
it may be closer to the truth. And I would
277
00:26:11,620 --> 00:26:18,230
like to have boring science but I know
it's a pretty tough sell. Sorry I didn't
278
00:26:18,230 --> 00:26:35,280
hear that. Yeah, thanks for listening.
Applause
279
00:26:35,280 --> 00:26:38,480
Herald: Thank you.
Hanno: Two questions, or?
280
00:26:38,480 --> 00:26:41,030
Herald: We don't have that much time for
questions, three minutes, three minutes
281
00:26:41,030 --> 00:26:45,250
guys. Question one - shoot.
Mic: This isn't a question but I just
282
00:26:45,250 --> 00:26:48,700
wanted to comment: Hanno, you missed out a
very critical topic here, which is the use
283
00:26:48,700 --> 00:26:53,130
of Bayesian probability. So you did
conflate p-values with the scientific
284
00:26:53,130 --> 00:26:57,260
method, which gave the rest
of your talk, I felt, a slightly unnecessary
285
00:26:57,260 --> 00:27:02,380
anti-science slant. P-values aren't
the be-all and end-all of the scientific
286
00:27:02,380 --> 00:27:06,840
method. So p-values are sort of calculating
the probability that your data will happen
287
00:27:06,840 --> 00:27:10,860
given that the null hypothesis is true, whereas
Bayesian probability would be calculating
288
00:27:10,860 --> 00:27:15,960
the probability that your hypothesis is
true given the data and more and more
289
00:27:15,960 --> 00:27:19,559
scientists are slowly starting to realize
that this sort of method is probably a
290
00:27:19,559 --> 00:27:25,809
better way of doing science than p-values.
So this is probably a third alternative
291
00:27:25,809 --> 00:27:29,950
to your sort of proposal of boring science:
doing the other side, Bayesian
292
00:27:29,950 --> 00:27:34,029
probability.
Hanno: Sorry, yeah, I agree with you.
293
00:27:34,029 --> 00:27:37,530
Unfortunately I only had
half an hour here.
294
00:27:37,530 --> 00:27:40,610
Herald: Where are you going after this,
like, where are we going after this lecture?
295
00:27:40,610 --> 00:27:46,269
Can they find you somewhere in the bar?
Hanno: I know him..
296
00:27:46,269 --> 00:27:50,559
Herald: You know, science is broken, but
then scientists... it's a little bit like the
297
00:27:50,559 --> 00:27:54,990
next lecture, actually, that's waiting there;
it's like: "you scratch my back and I
298
00:27:54,990 --> 00:27:59,160
scratch yours for publication".
Hanno: Maybe two more minutes?
299
00:27:59,160 --> 00:28:04,870
Herald: One minute.
Please go ahead.
300
00:28:04,870 --> 00:28:11,820
Mic: Yeah hi, thank you for your talk. I'm
curious: so you've raised, you know, ways
301
00:28:11,820 --> 00:28:15,529
we can address this assuming good actors,
assuming people who want to do better
302
00:28:15,529 --> 00:28:20,769
science that this happens out of ignorance
or willful ignorance. What do we do about
303
00:28:20,769 --> 00:28:26,389
bad actors? So, for example, the medical
community, drug companies, maybe they
304
00:28:26,389 --> 00:28:29,539
really like the idea of being profitably
incentivized by these randomized control
305
00:28:29,539 --> 00:28:34,929
trials, to make out essentially a placebo
do something. How do we begin to address
306
00:28:34,929 --> 00:28:40,639
them currently trying to maliciously p-hack
or maliciously abuse the pre-reg system or
307
00:28:40,639 --> 00:28:44,409
something like that?
Hanno: I mean it's a big question, right?
308
00:28:44,409 --> 00:28:50,660
But I think if the standards are kind of
confining you so much that there's not
309
00:28:50,660 --> 00:28:56,380
much room to cheat, that's a way out, right?
And also I don't think
310
00:28:56,380 --> 00:29:00,110
deliberate cheating is that much of a
problem, I actually really think the
311
00:29:00,110 --> 00:29:07,120
bigger problem is people honestly
believe what they do is true.
312
00:29:07,120 --> 00:29:15,640
Herald: Okay one last, you sir, please?
Mic: So the value in science is often a
313
00:29:15,640 --> 00:29:20,559
count of publications, right? A count of
citations and so on. So is it true that
314
00:29:20,559 --> 00:29:24,799
to improve this situation you've
described, journals whose publications
315
00:29:24,799 --> 00:29:31,120
are available, who are like prospective,
should impose higher standards, so the
316
00:29:31,120 --> 00:29:37,470
journals are those who must, like, raise the
bar; they should enforce publication of
317
00:29:37,470 --> 00:29:43,330
protocols before, like, accepting, etc.,
etc. So is it journals who should, like,
318
00:29:43,330 --> 00:29:49,340
do work on that, or can we regular
scientists do something also?
Hanno: I mean you
319
00:29:49,340 --> 00:29:53,270
can publish in the journals that have
better standards, right? There are
320
00:29:53,270 --> 00:29:59,299
journals that have these registered
reports, but of course I mean as a single
321
00:29:59,299 --> 00:30:03,360
scientist it's always difficult because
you're playing in a system that has all
322
00:30:03,360 --> 00:30:06,580
these wrong incentives.
Herald: Okay guys that's it, we have to
323
00:30:06,580 --> 00:30:12,670
shut down. Please. There is a reference:
better science dot-org, go there, and one
324
00:30:12,670 --> 00:30:16,299
last request: give a really warm applause!
325
00:30:16,299 --> 00:30:24,249
Applause
326
00:30:24,249 --> 00:30:29,245
34c3 outro
327
00:30:29,245 --> 00:30:46,000
subtitles created by c3subtitles.de
in the year 2018. Join, and help us!