34c3 intro

Hanno Böck: Yeah, so many of you probably know me from doing things around IT security, but I'm going to surprise you by almost not talking about IT security today. Instead I'm going to ask the question "Can we trust the scientific method?". I want to start with quite a simple example. When we do science, we start with a theory and then we try to test whether it's true, right? Now, I said I'm not going to talk about IT security, but I chose an example from IT security, or kind of from IT security. A while ago there was a post on Reddit, a picture from some book which claimed that a Malachite crystal can protect you from computer viruses. Which, to me, doesn't sound very plausible. These are crystals, and if you put them on your computer, this book claims, they protect you from malware. But if we really want to know, we could do a study on this. And if you say people don't do studies on crazy things: that's wrong. People do studies on homeopathy and all kinds of crazy things that are completely implausible. So we can do a study on this, and what we will do is a randomized controlled trial, which is the gold standard for testing these kinds of things. This is our question: "Do Malachite crystals prevent malware infections?" And our study design is: we take a group of maybe 20 computer users and split them randomly into two groups. One group gets one of these crystals and the instruction: "Put it on your desk or on your computer." The other group is our control group. That's very important, because if we want to know whether the crystals help, we need another group to compare against. And to rule out any kind of placebo effect, we give the control group a fake Malachite crystal, so we can compare the two groups against each other. Then we wait for maybe six months and check how many malware infections they had. Now, I didn't do that study, but I simulated it with a Python script, and given that I don't believe this theory is true, I simulated it as random data. I'm not going to go through the whole script, but I'm assuming there can be between 0 and 3 malware infections per person, totally at random, and then I compare the two groups. And then I calculate something called a p-value, which is a very common thing in science whenever you do statistics.
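A minimal sketch of what such a simulation script might look like; the talk does not show the original code, so the group size of ten and the choice of a two-sample t-test here are assumptions:

```python
# Sketch of the simulated crystal study: two groups of pure noise,
# so by construction there is no real effect to find.
import numpy as np
from scipy import stats

rng = np.random.default_rng()

def simulated_study(group_size=10):
    # Each participant gets a random number of 0..3 malware infections.
    crystal = rng.integers(0, 4, size=group_size)
    fake = rng.integers(0, 4, size=group_size)
    _, p = stats.ttest_ind(crystal, fake)  # compare the two groups
    return crystal.mean(), fake.mean(), p

crystal_mean, fake_mean, p = simulated_study()
print(f"crystal: {crystal_mean:.1f}  fake: {fake_mean:.1f}  p = {p:.3f}")
```

If you run a sketch like this twenty times, the odds are good that one run comes out "significant" at p < 0.05, which is exactly the effect demonstrated next.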
A p-value is, it's a bit technical, but it's the probability that you would get this result if there were no effect. Put another way: in an idealized world, if you have 20 results, one of them is a false positive, meaning one of them says something happens although it doesn't. In many fields of science a p-value of 0.05 is considered significant, and that corresponds to those twenty studies: one error in twenty studies, but, as I said, under idealized conditions. And since it's a script and I can run it in less than a second, I just ran it twenty times instead of once. So here are my 20 simulated studies, and most of them look not very interesting: of course there are a few random variations, but nothing significant. Except for this one study. It says the people with the Malachite crystal had on average 1.8 malware infections and the people with the fake crystal had 0.8. So actually the crystal made it worse. But this result is significant, because it has a p-value of 0.03. So of course we can publish that, assuming I had really done these studies.

applause

Hanno: And the other studies we just forget about. I mean, they were not interesting, right? And who cares about non-significant results... Okay, so you have just seen that I created a significant result out of random data. And that's concerning, because people in science can really do that. This phenomenon is called publication bias. What's happening here is that you do studies, and if they get a positive result, meaning you see an effect, then you publish them, and if there's no effect, you just forget about them. We learned earlier that a p-value of 0.05 means 1 in 20 studies is a false positive, but you usually don't see the studies that are not significant, because they don't get published. And you may wonder: "Okay, what's stopping a scientist from doing exactly this? What's stopping a scientist from doing experiments until one of them looks like a real result, although it's just a random fluke?" And the disconcerting answer is: usually nothing. And this is not just a theoretical example. I want to give you an example that had quite some impact and that was researched very well, and that is research on antidepressants, so-called SSRIs.
In 2008 there was a study, and the interesting situation here was that the US Food and Drug Administration, the authority that decides whether a medical drug can be put on the market, had knowledge of all the studies that had been done to register these medications. Some researchers looked at that and compared it with what had been published. They figured out there were 38 studies showing that these medications had a real effect, real improvements for patients, and of those 38 studies, 37 got published. But there were also 36 studies saying these medications don't really have any effect, that they are not really better than a placebo, and of those, only 14 got published. And even among those 14, there were 11 where the researchers said the authors had spun the result so that it sounds like these medications do something. So there were a bunch of studies that were simply not published because they had a negative result. And it's clear that if you look only at the published studies and ignore the unpublished studies with negative results, then these medications look much better than they really are. Unlike in the earlier example, there is a real effect from antidepressants, but they are not as good as people believed in the past.

So we've learned that, in theory, with publication bias you can create a result out of nothing. But if you're a researcher with a theory that's not true and you really want to publish something about it, that's not very efficient, because you have to do 20 studies on average to get one of these random results that looks real. There are more efficient ways to get a result from nothing. If you're doing a study, there are a lot of micro decisions you have to make. For example, you may have dropouts from your study, people who move to another place or whom you can no longer reach, so they are no longer part of your study, and there are different ways to handle that. Then you may have corner-case results where you're not entirely sure: is this an effect or not, and how do you decide? How exactly do you measure? You may also be looking for different things; maybe there are different tests you can run on people. And you may control for certain variables: do you analyze men and women separately, or do you separate subjects by age? So there are many decisions you can make while doing a study.
And of course each of these decisions has a small effect on the result. Very often, just by trying all the combinations, you will get a p-value that looks statistically significant although there's no real effect. There's a term for this, p-hacking, which means adjusting your methods until you get a significant result. And I'd like to point out that this is usually not a scientist saying: "Okay, today I'm going to p-hack my result, because I know my theory is wrong but I want to show it's true." It's a subconscious process, because usually scientists believe in their theories. Honestly. They honestly think their theory is true and that their research will show that. So they may subconsciously say: "Okay, if I analyze my data like this, it looks a bit better, so I'll do that." Subconsciously, they may p-hack themselves into a result that isn't really there. And again we can ask: "What is stopping scientists from p-hacking?" And the concerning answer is the same: usually nothing. So I came to the conclusion that the scientific method is a way to create evidence for whatever theory you like, no matter whether it's true or not. You may say that's a pretty bold thing to say, and I'm saying it even though I'm not even a scientist, I'm just some hacker. But I'm not alone in this. There's a paper by a famous researcher, John Ioannidis, titled "Why Most Published Research Findings Are False." He published this in 2005, and if you look at the title, he doesn't really question that most research findings are false; he only wants to give reasons why this is the case. He makes some very plausible assumptions, given that many negative results don't get published and that you will have some bias, and he comes to the very plausible conclusion that this is indeed the case. And this is not even very controversial. If you ask people who do what you might call science on science, or meta-science, people who look at scientific methodology, they will tell you: "Yeah, of course that's the case." Some will even say: "Yeah, that's how science works, that's what we expect." But I find it concerning. And if you take this seriously, it means: if you read about a study, say in a newspaper, the default assumption should be that it's not true, while we might usually assume the opposite.
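A minimal sketch of this kind of p-hacking; the analysis variants below (removing "dropouts", excluding an "outlier", switching to a different test) are hypothetical stand-ins for the micro decisions described above, applied to pure null data:

```python
# How trying several analysis variants inflates the false positive rate,
# even though both groups are drawn from the same distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng()

def analyses(a, b):
    """Yield p-values from several defensible-looking analysis choices."""
    yield stats.ttest_ind(a, b).pvalue               # the plain comparison
    yield stats.mannwhitneyu(a, b).pvalue            # a different test
    yield stats.ttest_ind(a[:-2], b).pvalue          # two "dropouts" removed
    yield stats.ttest_ind(a[a < a.max()], b).pvalue  # one "outlier" excluded

n_experiments = 1000
false_positives = sum(
    any(p < 0.05 for p in analyses(rng.normal(size=30), rng.normal(size=30)))
    for _ in range(n_experiments)
)

# With one pre-specified test this would be about 5%; with four
# correlated tries it comes out noticeably higher.
print(f"false positive rate: {false_positives / n_experiments:.1%}")
```

The point is not any single trick: each variant looks defensible on its own, but the freedom to pick among them after seeing the data is what manufactures significance.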
And if science is a method to create evidence for whatever you like, you can think about something really crazy, like: can people see into the future? Does our mind have some extrasensory perception where we can sense things that will happen in an hour? There was a psychologist called Daryl Bem who thought that this is the case, and he published a study on it, titled "Feeling the Future". He did a lot of experiments where he did something, and then something happened later, and he thought he had statistical evidence that what happened later influenced what happened earlier. I don't think that's very plausible, based on what we know about the universe, but it was published in a real psychology journal. And a lot of things were wrong with this study. Basically, it's a very nice example of p-hacking; there's even a book by Daryl Bem where he describes something that basically looks like p-hacking and says that's how you do psychology. But the study was absolutely in line with the existing standards in experimental psychology. And a lot of people found that concerning. If you can show that precognition is real, that you can see into the future, then what else can you show, and how can we trust our results? Psychology has debated this a lot in the past couple of years; there's a lot of talk about the replication crisis in psychology. Many effects that psychologists simply thought were true turned out not to hold: when they tried to repeat the experiments, they couldn't get the same results, even though entire subfields were built on them.

And I want to show you an example, one of the ones that is not discussed so much. There's a theory called moral licensing. The idea is that if you do something good, or something you think is good, then afterwards you basically behave like an asshole, because you think: I already did something good, now I don't have to be so nice anymore. There were some famous studies with the theory that people who consume organic food later become more judgmental, or less social, less nice to their peers. But just last week someone tried to replicate these original experiments. They tried three times, with more subjects and better research methodology, and they totally couldn't find the effect. What you've seen, though, is lots of media articles about the original result. I have not found a single article reporting that it could not be replicated. Maybe those will come, but yeah, that's just a very recent example.
But now I want to give you a small warning, because you may think: "Yeah, these psychologists, that all sounds very fishy, and they even believe in precognition." But maybe your field is not much better; maybe you just don't know it yet, because nobody has started replicating studies in your field. There are other fields that have replication problems, and some are much worse. For example, the pharma company Amgen published something in 2012 where they said: "We have tried to replicate cancer research, preclinical research", that is, work in a petri dish or animal experiments, not drugs tested on humans, but what happens before you develop a drug. And they were unable to replicate 47 out of 53 studies. And these were, they said, landmark studies, studies that had been published in the best journals. Now, there are a few problems with this publication, because they did not publish their replications, and they did not tell us which studies they could not replicate. In the meantime I think they have published three of these replications, but most of it remains in the dark. That points to another problem: they said they did this in collaboration with the original researchers, and they could only do so by agreeing not to publish the results. It still sounds very concerning. But some fields don't have a replication problem only because nobody is trying to replicate previous results; then you will never know whether your results hold up. So what can be done about all this? Fundamentally, I think the core issue is that the scientific process is tied together with results: we do a study, and only afterwards do we decide whether it's going to be published. Or we do a study, and only once we have the data do we try to analyze it. Essentially, we need to decouple the scientific process from its results. One way of doing that is pre-registration: before you start doing a study, you register it in a public register and say "I'm going to do a study on this medication, or on this psychological effect, and this is how I'm going to do it", and later people can check whether you really did that. This is more or less standard practice in medical drug trials, and the summary is: it does not work very well, but it's better than nothing. The problem is mostly enforcement: people register a study and then don't publish it, and nothing happens to them, even though they are legally required to publish it.
And there are two campaigns I'd like to point out. There's the AllTrials campaign, which was started by Ben Goldacre, a doctor from the UK, and they demand that every trial done on a medication should be published. There's also a project by the same guy, the COMPare project, which checks whether a medical trial that was registered and later published actually did what was registered: did they change something in their protocol, and was there a reason for it, or did they just change it to get a result they otherwise wouldn't get? Then again, these issues in medicine often get a lot of attention, and for good reasons, because if we have bad science in medicine, then people die; that's pretty immediate and pretty massive. But whenever you read about this, you have to keep in mind that drug trials at least have pre-registration; most scientific fields don't bother doing anything like that. So whenever you hear about publication bias in medicine, you should always think: the same thing happens in many fields of science, and usually nobody is doing anything about it. And particularly to this audience I'd like to say: there's currently a big trend of people from computer science wanting to revolutionize medicine, with big data and machine learning and these things. In principle that's okay, but I know a lot of people in medicine are very worried about it, and the reason is that these computer science people don't have the scientific standards that people in medicine expect. They might say: "We don't really need to do a study on this, it's obvious that it helps." And that is worrying. I come from computer science, and I understand very well why people from medicine are worried about this. Then there's an idea that goes even further than pre-registration, called registered reports. A couple of years ago some scientists wrote an open letter that was published in the Guardian, and the idea is that you turn the scientific publication process upside down. If you want to do a study, the first thing you do with a registered report is submit your study design protocol to the journal, and the journal decides whether to publish it before they see any result. That way you can prevent publication bias, and you prevent journals from publishing only the nice findings and ignoring the negative ones.
And then you do the study, and it gets published, but it gets published independent of what the result was. There are of course other things you can do to improve science. There's a lot of talk about sharing data, sharing code, sharing methods, because if you want to replicate a study, it's of course easier if you have access to all the details of how the original study was done. Then we could do large collaborations, because many studies are just too small: with a study of twenty people you just don't get a very reliable outcome. In many situations it might be better to bring together ten teams of scientists and let them all do one big study together, and then you can answer a question reliably. Some people also propose stricter statistical thresholds, because a p-value of 0.05 means practically nothing. There was recently a paper that argued for just putting the dot one position further to the left, at 0.005, and that would already solve a lot of problems. And in physics, for example, they have something called five sigma, which corresponds to a p-value of roughly 0.0000003; so in physics they have much stricter statistical thresholds. Now, whatever scientific field you're working in, you might ask yourself: "If we have statistical results, are they pre-registered in any way? Do we publish negative results, where we tested for an effect and got nothing? And are there replications of all relevant results?" I would say, if you answer all these questions with "no", which I think many people will, then you're not really doing science; what you're doing is the alchemy of our time.

Applause

Hanno: Thanks.

Herald: Thank you very much...

Hanno: No, I have more, sorry, I have three more slides, that was not the finishing line. A big issue is also that there are bad incentives in science. A very standard way to evaluate the impact of science is citation counts: if your scientific study is cited a lot, this is considered a good thing, and if your journal is cited a lot, this is a good thing; that's, for example, the impact factor, but there are also other measurements. And universities like publicity, so if your study gets a lot of media reports, your press department likes you.
And these incentives tend to favor interesting results, but they don't favor correct results. That is bad, because if we are realistic, most results are not that interesting; most results will be: "We had this interesting and counterintuitive theory, and it's totally wrong." Then there's this idea that science is self-correcting. So if you confront scientists with these issues, with publication bias and p-hacking, surely they will immediately change; that's what scientists do, right? I want to cite something here, sorry, it's a bit long: "There is some evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published." That sounds like publication bias. And then it also says: "Significant results published in these fields are seldom verified by independent replication", so it seems there's a replication problem too. These wise words were written in 1959 by a statistician called Theodore Sterling. And because science is so self-correcting, in 1995 he complained again: "This article presents evidence that published results of scientific investigations are not a representative sample of all scientific studies. These results also indicate that practice leading to publication bias has not changed over a period of 30 years." And here we are in 2018, and publication bias is still a problem. So if science is self-correcting, then it's pretty damn slow at correcting itself, right? And finally I would like to ask you whether you're prepared for boring science, because ultimately, I think, we have a choice between what I would like to call TED-talk science and boring science.

Applause

With TED-talk science we get mostly positive, surprising, interesting results; we have large effects, many citations, lots of media attention, and you may get a TED talk out of it. Unfortunately, usually it's not true. So I would like to propose boring science as the alternative: mostly negative results, pretty boring, small effects, but it may be closer to the truth. And I would like to have boring science, but I know it's a pretty tough sell. Sorry, I didn't hear that. Yeah, thanks for listening.

Applause

Herald: Thank you.

Hanno: Two questions, or?

Herald: We don't have that much time for questions, three minutes, three minutes guys. Question one - shoot.

Mic: This isn't a question, but I just wanted to comment. Hanno, you missed out a very critical topic here, which is the use of Bayesian probability.
You conflated p-values with the scientific method, which gave the rest of your talk, I felt, a slightly unnecessary anti-science slant. P-values aren't the be-all and end-all of the scientific method: a p-value is sort of calculating the probability that your data would occur given that the null hypothesis is true, whereas Bayesian probability would be calculating the probability that your hypothesis is true given the data. And more and more scientists are slowly starting to realize that this method is probably a better way of doing science than p-values. So this is probably a third alternative to your proposal of boring science: doing Bayesian probability instead.

Hanno: Sorry, yeah, I agree with you; unfortunately I only had half an hour here.

Herald: Where are you going after this? Like, where are we going after this lecture, can they find you somewhere in the bar?

Hanno: I know him...

Herald: You know, science is broken, but then scientists, it's a little bit like the next lecture actually that's waiting there, it's like: "you scratch my back and I scratch yours for publication".

Hanno: Maybe two more minutes?

Herald: One minute. Please go ahead.

Mic: Yeah, hi, thank you for your talk. I'm curious: you've raised ways we can address this assuming good actors, people who want to do better science, where this happens out of ignorance or willful ignorance. What do we do about bad actors? For example the medical community, drug companies: maybe they really like the idea of being profitably incentivized by these randomized controlled trials to make essentially a placebo do something. How do we begin to address them deliberately trying to maliciously p-hack, or maliciously abuse the pre-registration system, or something like that?

Hanno: I mean, it's a big question, right? But I think if the standards confine you so much that there's not much room to cheat, that's a way out. And also, I don't think deliberate cheating is that much of a problem; I actually think the bigger problem is people honestly believing that what they do is true.

Herald: Okay, one last, you sir, please?

Mic: So the value in science is often a count of publications, right?
A count of citations, and so on. So, to improve the situation you've described, is it true that journals, those whose publications are available, the prospective ones, should impose higher standards? That it's the journals who must raise the bar, who should enforce publication of protocols before accepting papers, and so on? So is it journals who should do the work on that, or can we regular scientists also do something?

Hanno: I mean, you can publish in the journals that have better standards, right? There are journals that have these registered reports. But of course, as a single scientist it's always difficult, because you're playing in a system that has all these wrong incentives.

Herald: Okay guys, that's it, we have to shut down. Please. There is a reference, better science dot-org, go there. And one last request: give really warm applause!

Applause

34c3 outro

subtitles created by c3subtitles.de in the year 2018. Join, and help us!