-
34c3 intro
-
Hanno Böck: Yeah, so many of you probably
know me from doing things around IT
-
security, but I'm going to surprise you by
almost not talking about IT security today.
-
But I'm gonna ask the question "Can we
trust the scientific method?". I want to
-
start this with quite a simple
example. So if we do science, we
-
start with a theory and then we try
to test if it's true, right? So I
-
mean I said I'm not going to talk about IT
security but I chose an example from IT
-
security or kind of from IT security. So
there was a post on Reddit a while ago,
-
a picture from some book which claimed that
if you use a Malachite crystal, it can
-
protect you from computer viruses.
Which... to me doesn't sound very
-
plausible, right? Like, these are crystals and
if you put them on your computer, this book
-
claims this protects you from malware. But
of course if we really want to know, we
-
could do a study on this. And if you say
people don't do studies on crazy things:
-
that's wrong. I mean people do studies on
homeopathy or all kinds of crazy things
-
that are completely implausible. So we can
do a study on this and what we will do is
-
we will do a randomized controlled trial,
which is kind of the gold standard of
-
doing a test on these kinds of things. So
this is our question: "Do Malachite
-
crystals prevent malware infections?" and
how we would test that, our study design
-
is: ok, we take a group of maybe 20
computer users. And then we split them
-
randomly into two groups, and then one group
we'll give one of these crystals and tell
-
them: "Put it on your desk or on your
computer.". Then the other group
-
is our control group. That's very
important because if we want to know if
-
they help we need another group to compare
it to. And to rule out that there are any
-
kinds of placebo effects, we give the
control group a fake Malachite crystal so
-
we can compare them against each other.
And then we wait for maybe six months and
-
then we check how many malware infections
they had. Now, I didn't do that study, but
-
I simulated it with a Python script and
given that I don't believe that this
-
theory is true I just simulated this as
random data. So I'm not going to go
-
through the whole script but I'm just like
generating, I'm assuming there can be
-
between 0 and 3 malware infections and
it's totally random and then I compare the
-
two groups. And then I calculate something
which is called a p-value which is a very
-
common thing in science whenever you do
statistics. A p-value is, it's a bit
-
technical, but it's the probability that,
if there is no real effect, you would still get
-
this result. Which, put another way,
means: if you have 20 results in an
-
idealized world, then one of them is a
false positive, which means one of them
-
says something happens although it
doesn't. And in many fields of science
-
a p-value of 0.05 is considered
significant, which corresponds to these twenty
-
studies. So one error in twenty studies,
but as I said, under idealized conditions.
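A minimal sketch of such a simulation, assuming two groups of 10 people and a simple two-sample t-test (the actual script may differ in its details):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng()

    def simulated_study(group_size=10):
        # Both groups get between 0 and 3 malware infections, completely
        # at random - i.e. the crystal has no effect whatsoever.
        crystal = rng.integers(0, 4, size=group_size)
        fake = rng.integers(0, 4, size=group_size)
        # Compare the two group means with a two-sample t-test.
        p = stats.ttest_ind(crystal, fake).pvalue
        return crystal.mean(), fake.mean(), p

    for i in range(20):
        m_crystal, m_fake, p = simulated_study()
        note = "significant!" if p < 0.05 else ""
        print(f"study {i + 1:2d}: crystal {m_crystal:.1f}, "
              f"fake {m_fake:.1f}, p = {p:.2f} {note}")

On average, roughly one of these twenty all-noise studies will come out "significant" at p < 0.05.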
-
And since it's a script and I can run it
in less than a second, I just did it twenty
-
times instead of once. So here are my 20
simulated studies and most of them look
-
not very interesting so of course we have
a few random variations but nothing very
-
significant. Except if you look at this
one study, it says the people with the
-
Malachite crystal had on average 1.8
malware infections and the people with the
-
fake crystal had 0.8. So it means actually
the crystal made it worse. But also this
-
result is significant because it has a
p-value of 0.03. So of course we can
-
publish that, assuming I really did these
studies.
-
applause
Hanno: And the other studies we just forget
-
about. I mean they were not interesting
right, and who cares? Non-significant
-
results... Okay so you have just seen that
I created a significant result out of
-
random data. And that's concerning because
people in science - I mean you can really do
-
that. And this phenomenon is called
publication bias. So what's happening here
-
is that, you're doing studies and if they
get a positive result - meaning you're
-
seeing an effect, then you publish them
and if there's no effect you just forget
-
about them. We learned earlier that a
p-value of 0.05 means 1 in 20 studies
-
is a false positive, but you usually don't
see the studies that are not significant,
-
because they don't get published. And you
may wonder: "Ok, what's stopping a
-
scientist from doing exactly this? What's
stopping a scientist from just doing so
-
many experiments till one of them looks
like it's a real result although it's just
-
a random fluke?". And the disconcerning
answer to that is, it's usually nothing.
-
And this is not just a theoretical
example. I want to give you an example,
-
that has quite some impact and that was
researched very well, and that is a
-
research on antidepressants so called
SSRIs. And in 2008 there was a study, the
-
interesting situation here was, that the
US Food and Drug Administration, which is
-
the authority that decides whether a
medical drug can be put on the market,
-
they had knowledge about all the studies
that had been done to register this
-
medication. And then some researchers
looked at that and compared it with what
-
has been published. And they figured out
there were 38 studies that saw that these
-
medications had a real effect, had real
improvements for patients. And from those
-
38 studies 37 got published. But then
there were 36 studies that said: "These
-
medications don't really have any
effect.", "They are not really better than
-
a placebo effect" and out of those only 14
got published. And even from those 14
-
there were 11, where the researchers said,
okay, they have spun the result in a way
-
that it sounds like these medications do
something. But there were also a bunch of
-
studies that were just not published
because they had a negative result. And
-
it's clear that if you look at the
published studies only and you ignore the
-
studies with a negative result that
haven't been published, then these
-
medications look much better than they
really are. And unlike the earlier
example, there is a real effect from
example there is a real effect from
antidepressants, but they are not as good
-
as people have believed in the past.
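To make the distortion concrete, here is a quick back-of-the-envelope sketch using the numbers just mentioned (the calculation is mine, the counts are from the talk):

    # Numbers from the FDA-registry comparison described above.
    positive_total, positive_published = 38, 37
    negative_total, negative_published = 36, 14
    spun_positive = 11   # published negative studies written up as positive

    published = positive_published + negative_published                        # 51
    true_positive_share = positive_total / (positive_total + negative_total)   # ~51%
    published_positive_share = positive_published / published                  # ~73%
    apparent_positive_share = (positive_published + spun_positive) / published # ~94%

    print(f"positive among all registered studies: {true_positive_share:.0%}")
    print(f"positive among published studies:      {published_positive_share:.0%}")
    print(f"appearing positive in the literature:  {apparent_positive_share:.0%}")

So about half of the registered studies were positive, but almost all of what a reader of the literature sees looks positive.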
-
So we've learnt in theory with publication bias
-
you can create a result out of nothing.
But if you're a researcher and you have a
-
theory that's not true but you really want
to publish something about it, that's not
-
really efficient, because you have to do
20 studies on average to get one of these
-
random results that look like real
results. So there are more efficient ways
-
to get to a result from nothing. If you're
doing a study then there are a lot of
-
micro decisions you have to make, for
example you may have dropouts from your
-
study where people, I don't know they move
to another place or they - you no longer
-
reach them, so they are no longer part of
your study. And there are different things
-
how you can handle that. Then you may have
corner-case results, where you're not
-
entirely sure: "Is this an effect or not
and how do you decide?", "How do you
-
exactly measure?". And then also you may
be looking for different things, maybe
-
there are different tests you can do on
people, and you may control for certain
-
variables like "Do you split men and women
into separate?", "Do you see them
-
separately?" or "Do you separate them by
age?". So there are many decisions you can
-
make while doing a study. And of course
each of these decisions has a small effect
-
on the result. And it may very often be,
that just by trying all the combinations
-
you will get a p-value that looks like
it's statistically significant, although
-
there's no real effect. So there's
this term called p-hacking, which means
-
you're just adjusting your methods long
enough, that you get a significant result.
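As a rough illustration, here is a toy simulation sketch (purely random noise, not any real study's analysis): we try a handful of ordinary-looking analysis choices, different outcome measures and subgroups, and quietly keep only the best p-value.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng()

    def p_hacked_study(n=40):
        # Purely random data: a "treatment" flag, two outcome measures,
        # plus sex and age as covariates. There is no real effect anywhere.
        treated = rng.integers(0, 2, size=n).astype(bool)
        outcome_a = rng.normal(size=n)
        outcome_b = rng.normal(size=n)
        sex = rng.integers(0, 2, size=n)
        age = rng.integers(18, 70, size=n)

        p_values = []
        for outcome in (outcome_a, outcome_b):             # which measure to report
            for subset in (np.ones(n, dtype=bool),         # everyone
                           sex == 0, sex == 1, age < 40):  # or just a subgroup
                group1 = outcome[treated & subset]
                group2 = outcome[~treated & subset]
                if len(group1) > 1 and len(group2) > 1:
                    p_values.append(stats.ttest_ind(group1, group2).pvalue)
        # The p-hacker reports only the most favorable analysis.
        return min(p_values)

    results = [p_hacked_study() for _ in range(1000)]
    print("share of 'significant' findings:", np.mean(np.array(results) < 0.05))
    # Far above the nominal 5%, even though nothing real is going on.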
-
And I'd like to point out here, that this
is usually not that a scientist says: "Ok,
-
today I'm going to p-hack my result,
because I know my theory is wrong but I
-
want to show it's true.". But it's a
subconscious process, because usually the
-
scientists believe in their theories.
Honestly. They honestly think that their
-
theory is true and that their research
will show that. So they may subconsciously
-
say: "Ok, if I analyze my data like this
it looks a bit better so I will do this.".
-
So subconsciously, they may p-hack
themselves into getting a result that's
-
not really there. And again we can ask:
"What is stopping scientists from
-
p-hacking?". And the concerning answer is
the same: usually nothing. And I came to
-
this conclusion that I say: "Ok, the
scientific method is a way to create
-
evidence for whatever theory you like. No
matter if it's true or not.". And you may
-
say: "That's a pretty bold thing to say.".
And I'm saying this even though I'm not
-
even a scientist. I'm just like some
hacker who, whatever... But I'm not alone
-
in this, like there's a paper from a
famous researcher John Ioannidis, who
-
said: "Why most published research
findings are false.". He published this in
-
2005 and if you look at the title, he
doesn't really question that most research
-
findings are false. He only wants to give
reasons why this is the case. And he makes
-
some very plausible assumptions, like
that many negative results don't get
-
published, and that you will have some
bias. And he comes to a very plausible
-
conclusion, that this is the case and this
is not even very controversial. If you ask
-
people who are doing what you can call
science on science or meta science, who
-
look at scientific methodology, they will
tell you: "Yeah, of course that's the
-
case.". Some will even say: "Yeah, that's
how science works, that's what we
-
expect.". But I find it concerning. And if
you take this seriously, it means: if you
-
read about a study, like in a newspaper,
the default assumption should be 'that's
-
not true' - while we might usually think
the opposite. And if science is a method
-
to create evidence for whatever you like,
you can think about something really
-
crazy, like "Can people see into the future?",
"Does our mind have
-
some extra perception where we can
sense things that happen in an hour?". And
-
there was a psychologist called Daryl Bem
and he thought that this is the case and
-
he published a study on it. It was titled
"feeling the future". He did a lot of
-
experiments where he did something, and
then something later happened, and he
-
thought he had statistical evidence that
what happened later influenced what
-
happened earlier. So, I don't think that's
very plausible - based on what we know
-
about the universe, but yeah... and it was
published in a real psychology journal.
-
And a lot of things were wrong with this
study. Basically, it's a very nice example
-
for p-hacking, and there's even a book by
Daryl Bem, where he describes something
-
which basically looks like p-hacking,
where he says that's how you do
-
psychology. But the study was absolutely
in line with the existing standards in
-
Experimental Psychology. And a lot of
people found that concerning. So, if you can
-
show that precognition is real, that you
can see into the future, then what else
-
can you show and how can we trust our
results? And psychology has debated this a
-
lot in the past couple of years. So
there's a lot of talk about the
-
replication crisis in psychology. And many
effects that psychology just thought were
-
true, they figured out, okay, if they try
to repeat these experiments, they couldn't
-
get these results even though entire
subfields were built on these results.
-
And I want to show you an example, which
is one of the ones that is not discussed so
-
much. So there's a theory which is called
moral licensing. And the idea is that if
-
you do something good, or something you
think is good, then later basically you
-
behave like an asshole. Because you think
I already did something good now, I don't
-
have to be so nice anymore. And there were
some famous studies that had the theory
-
that if people consume organic food, they
later become more judgmental, or less
-
social, less nice to their peers. But just
last week someone tried to replicate these
-
original experiments. And they tried it
three times with more subjects and better
-
research methodology and they totally
couldn't find that effect. But like what
-
you've seen here is lots of media
articles. I have not found a single
-
article reporting that this could not be
replicated. Maybe they will come, but yeah,
-
it's just a very recent example. But
now I want to have a small warning for you
-
because you may think now "yeah these
psychologists, that all sounds very
-
fishy and they even believe in
precognition and whatever", but maybe your
-
field is not much better maybe you just
don't know about it yet because nobody
-
else has started replicating studies in
your field. And there are other fields
-
that have replication problems and some
much worse for example the pharma company
-
Amgen in 2012 they published something
where they said "We have tried to
-
replicate cancer research and preclinical
research" that is stuff in a petri dish or
-
animal experiments so not drugs on humans
but what happens before you develop a drug
-
and they were not able to replicate 47
out of 53 studies. And these were, they
-
said, landmark studies, so studies that
have been published in the best journals.
-
Now there are a few problems with this
publication because they have not
-
published their replications, they have not
told us which studies these were that they
-
could not replicate. In the meantime I
think they have published three of these
-
replications, but most of it is a bit in
the dark, which points to another problem,
-
because they say they did this in
collaboration with the original
-
researchers, and they could only do this by
agreeing that they would not publish the
-
results. But it still sounds very
concerning. But some fields don't have a
-
replication problem only because nobody is
trying to replicate previous results; I
-
mean, then you will never know if your
results hold up. So what can be done about
-
all this? Fundamentally, I think the
core issue here is that the scientific
-
process is tied together with results, so
we do a study and only after that we
-
decide whether it's going to be published.
Or we do a study and only after we have
-
the data we're trying to analyze it. So
essentially we need to decouple the
-
scientific process from its results and
one way of doing that is pre-registration
-
so what you're doing there is that before
you start doing a study you will register
-
it in a public register and say "I'm gonna
do a study like on this medication or
-
whatever on this psychological effect" and
that's how I'm gonna do it and then later
-
on people can check if you really did
that. And yeah that's what I said. And this
-
is more or less standard practice in
medical drug trials; the summary about it
-
is: it does not work very well, but it's
better than nothing. And the problem
-
is mostly enforcement, so people register a
study and then don't publish it, and
-
nothing happens to them even though they
are legally required to publish it. And
-
there are two campaigns I'd like to point
out, there's the AllTrials campaign, which
-
has been started by Ben Goldacre, he's a
doctor from the UK, and they demand
-
that every trial that's done on a
medication should be published. And
-
there's also a project by the same guy, the
COMPare project, and they are trying to see,
-
if a medical trial has been registered and
later published, did they do the same or
-
did they change something in their
protocol, and was there a reason for it, or
-
did they just change it to get a result
which they otherwise wouldn't get. But then
-
again like these issues in medicine they
often get a lot of attention and for good
-
reasons because if we have bad science in
medicine then people die, that's pretty
-
immediate and pretty massive. But if you
read about this you always have to think
-
that in drug trials at least they have
pre-registration; most
-
scientific fields don't bother doing
anything like that. So whenever you hear
-
something, maybe about publication
bias in medicine you should always think
-
the same thing happens in many fields of
science and usually nobody is doing
-
anything about it. And particularly to
this audience I'd like to say there's
-
currently a big trend that people from
computer science want to revolutionize
-
medicine: big data and machine learning,
these things, which in principle is ok but
-
I know a lot of people in medicine are
very worried about this and the reason is,
-
that these computer science people don't
have the same scientific standards as
-
people in medicine expect, and might
say "Yeah, we don't really need to do
-
a study on this it's obvious that this
helps" and that is worrying and I come
-
from computer science and I very well
understand that people from medicine are
-
worried about this. So there's an idea
that goes even further than pre-registration
-
and it's called registered reports. A
couple of years ago some scientists
-
wrote an open letter that was published
in the Guardian, and the idea
-
there is that you turn the scientific
publication process upside down, so if you
-
want to do a study the first thing you
would do with a registered report is, you
-
submit your study design
protocol to the journal, and then the
-
journal decides whether they will publish
that before they see any result, because
-
then you can prevent publication bias and
then you prevent that journals only publish
-
the nice findings and ignore the negative
findings. And then you do the study and
-
then it gets published but it gets
published independent of what the result
-
was. And there are of course other things you
can do to improve science, there's a lot
-
of talk about sharing data, sharing code,
sharing methods because if you want to
-
replicate a study it's of course easier if
you have access to all the details how the
-
original study was done. Then you could
say "Okay we could do large
-
collaborations" because many studies are
just too small if you have a study with
-
twenty people you just don't get a very
reliable outcome. So maybe in many
-
situations it would be better to get together
10 teams of scientists and let them all do
-
a big study together and then you can
reliably answer a question. And also some
-
people propose just to use stricter
statistical thresholds, that a p-value of
-
0.05 means practically nothing. There was
recently a paper that argued to
-
just put the dot one more position to
the left and use 0.005, and that would
-
already solve a lot of problems. And for
example in physics they have
-
something called five sigma, which is, I think,
zero point, then six zeroes and a 3, or
-
something like that, so in physics they
have much stricter statistical thresholds.
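For reference, a small sketch of how those thresholds compare (one-sided normal tail probabilities; the exact convention used in particle physics differs slightly in detail):

    from scipy.stats import norm

    # One-sided tail probability corresponding to a given number of sigmas.
    for sigma in (2, 3, 5):
        p = norm.sf(sigma)   # survival function = 1 - CDF
        print(f"{sigma} sigma ~ p = {p:.1e}")
    # 2 sigma ~ 2.3e-02, 3 sigma ~ 1.3e-03, 5 sigma ~ 2.9e-07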
-
Now whatever if you're working in any
scientific field you might ask yourself
-
like "If we have statistic results are
they pre registered in any way and do we
-
publish negative results?" like we tested
an effect and we got nothing and are there
-
replications of all relevant results and I
would say if you answer all these
-
questions with "no" which I think many
people will do, then you're not really
-
doing science, what you're doing is the
alchemy of our time.
-
Applause
Thanks.
-
Herald: Thank you very much..
Hanno: No I have more, sorry, I have
-
three more slides, that was not the
finishing line. Big issue is also that
-
there are bad incentives in science, so a
very standard thing to evaluate the impact
-
of science is citation counts, where you say
"if your scientific study is cited a lot
-
then this is a good thing and if your
journal is cited a lot this is a good
-
thing" and this for example the impact
factor but there are also other
-
measurements. And also universities like
publicity so if your study gets a lot of
-
media reports then your press department
likes you. And these incentives tend to
-
favor interesting results but they don't
favor correct results and this is bad
-
because if we are realistic most results
are not that interesting, most results
-
will be "Yeah we have this interesting and
counterintuitive theory and it's totally
-
wrong" and then there's this idea that
science is self-correcting. So if you
-
confront scientists with these issues, with
publication bias and p-hacking, surely
-
they will immediately change, that's what
scientists do right? And I want to cite
-
something here, sorry, it's a bit
long, but: "There is some evidence that
-
inferior statistical tests are commonly
used; research which yields non-significant
-
results is not published." That sounds
like publication bias and then it also
-
says: "Significant results published in
these fields are seldom verified by
-
independent replication" so it seems
there's a replication problem. These wise
-
words were said in 1959 by a
statistician called Theodore Sterling, and
-
because science is so self-correcting, in
1995 he complained that "this article
-
presents evidence that published results of
scientific investigations are not a
-
representative sample of all scientific
studies." "These results also indicate that
-
practice leading to publication bias has
not changed over a period of 30 years" and
-
here we are in 2018 and publication bias
is still a problem. So if science is self-
-
correcting then it's pretty damn slow in
correcting itself, right? And finally I
-
would like to ask you, if you're prepared
for boring science, because ultimately, I
-
think, we have a choice between what I
would like to call TEDTalk science and
-
boring science..
Applause
-
.. so with TEDTalk science we get mostly
positive and surprising results and
-
interesting results, we have large effects,
many citations, lots of media attention, and
-
you may have a TED talk about it.
Unfortunately usually it's not true and I
-
would like to propose boring science as
the alternative which is mostly negative
-
results, pretty boring, small effects but
it may be closer to the truth. And I would
-
like to have boring science but I know
it's a pretty tough sell. Sorry I didn't
-
hear that. Yeah, thanks for listening.
Applause
-
Herald: Thank you.
Hanno: Two questions, or?
-
Herald: We don't have that much time for
questions, three minutes, three minutes
-
guys. Question one - shoot.
Mic: This isn't a question but I just
-
wanted to comment Hanno you missed out a
very critical topic here, which is the use
-
of Bayesian probability. So you did
conflate p-values with the scientific
-
method, which isn't.. which gave the rest
of your talk, I felt, a slightly unnecessary
-
anti-science slant. P-values aren't
the be-all and end-all of the scientific
-
method so p-values is sort of calculating
the probability that your data will happen
-
given that the null hypothesis is true, whereas
Bayesian probability would be calculating
-
the probability that your hypothesis is
true given the data and more and more
-
scientists are slowly starting to realize
that this sort of method is probably a
-
better way of doing science than p-values.
So this is probably a third alternative
-
to your sort of proposal of boring science:
doing, on the other side, Bayesian
-
probability.
Hanno: Sorry yeah, I agree with you I
-
unfortunately I only had
half an hour here.
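A minimal sketch of the contrast the questioner is drawing, with made-up numbers that are not from the talk: the p-value asks how probable the data is under the null hypothesis, while a simple two-hypothesis Bayesian calculation asks how probable a hypothesis is given the data.

    from scipy import stats

    # Hypothetical numbers, not from the talk: out of 100 malware infections
    # in a big trial, 40 hit the crystal group and 60 hit the control group.
    k, n = 40, 100

    # Frequentist p-value: probability of data at least this extreme,
    # given the null hypothesis that infections split 50/50 between groups.
    p_value = stats.binomtest(k, n, p=0.5).pvalue

    # Bayesian view: probability of a hypothesis given the data. Compare
    # H0 (50/50 split) with H1 (crystal halves the risk, i.e. a 1/3 share),
    # assuming equal prior probability for both hypotheses.
    like_h0 = stats.binom.pmf(k, n, 0.5)
    like_h1 = stats.binom.pmf(k, n, 1 / 3)
    posterior_h1 = like_h1 / (like_h0 + like_h1)

    print(f"p-value under H0:            {p_value:.3f}")
    print(f"posterior probability of H1: {posterior_h1:.3f}")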
-
Herald: Where are you going after this
like where are we going after this lecture
-
can they find you somewhere in the bar?
Hanno: I know him..
-
Herald: You know science is broken but
then scientists it's a little bit like the
-
next lecture actually that's waiting there
it's like: "you scratch my back and I
-
scratch yours for publication".
Hanno: Maybe two more minutes?
-
Herald: One minute.
Please go ahead.
-
Mic: Yeah hi, thank you for your talk. I'm
curious so you've raised, you know, ways
-
we can address this assuming good actors,
assuming people who want to do better
-
science that this happens out of ignorance
or willful ignorance. What do we do about
-
bad actors? So for example the medical
community drug companies, maybe they
-
really like the idea of being profitably
incentivized by these randomized controlled
-
trials, to make essentially a placebo
appear to do something. How do we begin to address
-
those currently trying to maliciously p-hack
or maliciously abuse the pre-reg system or
-
something like that?
Hanno: I mean it's a big question, right?
-
But I think if the standards are kind of
confining you so much that there's not
-
much room to cheat, that's a way out, right,
and a basis. And also I don't think
-
deliberate cheating is that much of a
problem, I actually really think the
-
bigger problem is people honestly
believe what they do is true.
-
Herald: Okay one last, you sir, please?
Mic: So the value in science is often a
-
count of publications, right? A count of
citations and so on, so is it true that
-
to improve this situation you've
described, journals whose publications
-
are available, who are like prospective,
should impose higher standards, so the
-
journals are those who must like raise the
bar, they should enforce publication of
-
protocols before like accepting and etc
etc. So is it journals who should, like,
-
do work on that or can we regular
scientists do something also?
Hanno: I mean you
-
can publish in the journals that have
better standards, right? There are
-
journals that have these registered
reports, but of course I mean as a single
-
scientist it is always difficult because
you're playing in a system that has all
-
these wrong incentives.
Herald: Okay guys that's it, we have to
-
shut down. Please. There is a reference
better science dot-org, go there, and one
-
last request give really warm applause!
-
Applause
-
34c3 outro
-
subtitles created by c3subtitles.de
in the year 2018. Join, and help us!