34c3 intro
Hanno Böck: Yeah, so many of you probably
know me from doing things around IT
security, but I'm gonna surprise you by
almost not talking about IT security today.
But I'm gonna ask the question "Can we
trust the scientific method?". I want to
start this with quite a
simple example. So when we do science,
we start with a theory and then we
try to test whether it's true, right? So I
mean I said I'm not going to talk about IT
security but I chose an example from IT
security or kind of from IT security. So
there was a post on Reddit a while ago,
a picture from some book which claimed that
if you use a Malachite crystal, it can
protect you from computer viruses.
Which... to me doesn't sound very
plausible, right? Like, these are crystals and
if you put them on your computer, this book
claims this protects you from malware. But
of course if we really want to know, we
could do a study on this. And if you say
people don't do studies on crazy things:
that's wrong. I mean people do studies on
homeopathy or all kinds of crazy things
that are completely implausible. So we can
do a study on this and what we will do is
we will do a randomized controlled trial,
which is kind of the gold standard of
doing a test on these kinds of things. So
this is our question: "Do Malachite
crystals prevent malware infections?" and
how we would test that, our study design
is: ok, we take a group of maybe 20
computer users. And then we split them
randomly into two groups. One group
gets one of these crystals, and we tell
them: "Put it on your desk or on your
computer.". The other group
is our control group. That's very
important, because if we want to know if
the crystals help, we need another group
to compare against. And to rule out any
kind of placebo effect, we give this
control group a fake Malachite crystal, so
we can compare the two groups against each other.
And then we wait for maybe six months and
then we check how many malware infections
they had. Now, I didn't do that study, but
I simulated it with a Python script and
given that I don't believe that this
theory is true I just simulated this as
random data. So I'm not going to go
through the whole script, but I'm just
generating data: I'm assuming each person
can have between 0 and 3 malware infections,
totally at random, and then I compare the
two groups. And then I calculate something
which is called a p-value which is a very
common thing in science whenever you do
statistics. A p-value is, it's a bit
technical, but it's the probability that
you would get a result like this if there
is actually no effect. And in many fields
of science a p-value below 0.05 is
considered significant. Put another way:
in an idealized world, if there is no
effect, one in twenty results will still
be a false positive, meaning it says
something happens although it doesn't.
So one error in twenty studies,
but as I said, under idealized conditions.
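A minimal sketch of what such a simulation could look like, assuming infection counts between 0 and 3 per person and a simple t-test for the group comparison (illustrative only, not necessarily the exact script used in the talk):

```python
# Minimal sketch of the simulated crystal study: purely random data, no real effect.
# Assumptions: 10 users per group, 0-3 infections each, Welch's t-test for the comparison.
import random
from scipy import stats

def simulate_study(n_per_group=10):
    crystal = [random.randint(0, 3) for _ in range(n_per_group)]
    fake = [random.randint(0, 3) for _ in range(n_per_group)]
    # p-value: probability of a difference at least this large if there is no real effect
    p = stats.ttest_ind(crystal, fake, equal_var=False).pvalue
    return sum(crystal) / n_per_group, sum(fake) / n_per_group, p

mean_crystal, mean_fake, p = simulate_study()
print(f"crystal group: {mean_crystal:.1f} infections, fake crystal: {mean_fake:.1f}, p = {p:.3f}")
```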
And since it's a script and I can run it
in less than a second, I just did it twenty
times instead of once. So here are my 20
simulated studies, and most of them don't
look very interesting: of course we have
a few random variations, but nothing very
significant. Except if you look at this
one study, it says the people with the
Malachite crystal had on average 1.8
malware infections and the people with the
fake crystal had 0.8. So it means actually
the crystal made it worse. But also this
result is significant because it has a
p-value of 0.03. So of course we can
publish that, assuming I really did these
studies.
Applause
Hanno: And the other studies we just forget
about. I mean they were not interesting
right, and who cares? Non-significant
results... Okay, so you have just seen that
I created a significant result out of
random data. And that's concerning because
people in science - I mean, you can really do
that. And this phenomenon is called
publication bias. So what's happening here
is that, you're doing studies and if they
get a positive result - meaning you're
seeing an effect, then you publish them
and if there's no effect you just forget
about them. We learned earlier that this
p-value of 0.05 means 1 in 20 studies
is a false positive, but you usually don't
see the studies that are not significant,
because they don't get published. And you
may wonder: "Ok, what's stopping a
scientist from doing exactly this? What's
stopping a scientist from just doing so
many experiments till one of them looks
like it's a real result although it's just
a random fluke?". And the disconcerning
answer to that is, it's usually nothing.
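To see the mechanics of publication bias in a few lines, one can run many such simulated null studies and "publish" only the significant ones; a rough sketch, reusing the same illustrative setup as above:

```python
# Publication bias on pure noise: simulate many studies with no real effect
# and keep only those with p < 0.05. Group sizes and the t-test are the same
# illustrative assumptions as in the sketch above.
import random
from scipy import stats

def null_study(n_per_group=10):
    a = [random.randint(0, 3) for _ in range(n_per_group)]
    b = [random.randint(0, 3) for _ in range(n_per_group)]
    return stats.ttest_ind(a, b, equal_var=False).pvalue

pvalues = [null_study() for _ in range(1000)]
published = [p for p in pvalues if p < 0.05]
# Roughly 5% of these pure-noise studies look "significant" and get published;
# the other ~95% end up in the file drawer.
print(f"{len(published)} of {len(pvalues)} null studies came out significant")
```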
And this is not just a theoretical
example. I want to give you an example
that had quite some impact and that was
researched very well, and that is
research on antidepressants, so-called
SSRIs. And in 2008 there was a study; the
interesting situation here was, that the
US Food and Drug Administration, which is
the authority that decides whether a
medical drug can be put on the market,
they had knowledge about all the studies
that had been done to register this
medication. And then some researchers
looked at that and compared it with what
has been published. And they figured out
there were 38 studies that saw that these
medications had a real effect, had real
improvements for patients. And from those
38 studies 37 got published. But then
there were 36 studies that said: "These
medications don't really have any
effect.", "They are not really better than
a placebo effect" and out of those only 14
got published. And even from those 14
there were 11 where the researchers said:
okay, they have spun the result in a way
that it sounds like these medications do
something. But there were also a bunch of
studies that were just not published
because they had a negative result. And
it's clear that if you look at the
published studies only and you ignore the
studies with a negative result that
haven't been published, then these
medications look much better than they
really are. And it's not like the earlier
example: there is a real effect from
antidepressants, but they are not as good
as people have believed in the past.
So we've learnt that, in theory, with publication
bias you can create a result out of nothing.
But if you're a researcher and you have a
theory that's not true but you really want
to publish something about it, that's not
really efficient, because you have to do
20 studies on average to get one of these
random results that look like real
results. So there are more efficient ways
to get to a result from nothing. If you're
doing a study then there are a lot of
micro decisions you have to make, for
example you may have dropouts from your
study, where people, I don't know, move
to another place or you can no longer
reach them, so they are no longer part of
your study. And there are different ways
you can handle that. Then you may have
corner-case results, where you're not
entirely sure: "Is this an effect or not,
and how do you decide?", "How exactly do
you measure?". And then you may also
be looking for different things, maybe
there are different tests you can do on
people, and you may control for certain
variables, like "Do you split men and women
into separate groups and look at them
separately?" or "Do you separate them by
age?". So there are many decisions you can
make while doing a study. And of course
each of these decisions has a small effect
on the result. And it may very often be,
that just by trying all the combinations
you will get a p-value that looks like
it's statistically significant, although
there's no real effect. So there's
this term called p-hacking, which means
you're just adjusting your methods long
enough that you get a significant result.
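As a rough sketch of how this plays out: take data with no real effect, try a handful of plausible-sounding analysis choices, and keep the best p-value. The specific variants below, like excluding "dropouts" or capping the infection counts, are invented for illustration:

```python
# p-hacking sketch: pure-noise data, several "reasonable" analysis variants,
# report whichever gives the smallest p-value. All variants are invented
# for illustration; none of them is fraudulent on its own.
import random
from scipy import stats

n_per_group = 10
group = ["crystal"] * n_per_group + ["fake"] * n_per_group
infections = [random.randint(0, 3) for _ in range(2 * n_per_group)]
rows = list(zip(group, infections))

def p_value(rows):
    a = [i for g, i in rows if g == "crystal"]
    b = [i for g, i in rows if g == "fake"]
    return stats.ttest_ind(a, b, equal_var=False).pvalue

variants = {
    "all participants": p_value(rows),
    "exclude one 'dropout'": p_value(rows[:-1]),
    "exclude two 'dropouts'": p_value(rows[:-2]),
    "count only severe infections (>= 2)": p_value([(g, int(i >= 2)) for g, i in rows]),
    "cap infections at 2": p_value([(g, min(i, 2)) for g, i in rows]),
}

best = min(variants, key=variants.get)
# Each variant alone keeps the nominal 5% false-positive rate, but picking the
# best of several inflates the chance of a "significant" result well beyond 5%.
print(f"best-looking analysis: {best}, p = {variants[best]:.3f}")
```

Run it on fresh random data a few times and the best-looking variant dips below 0.05 noticeably more often than the nominal one in twenty.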
And I'd like to point out here, that this
is usually not that a scientist says: "Ok,
today I'm going to p-hack my result,
because I know my theory is wrong but I
want to show it's true.". But it's a
subconscious process, because usually the
scientists believe in their theories.
Honestly. They honestly think that their
theory is true and that their research
will show that. So they may subconsciously
say: "Ok, if I analyze my data like this
it looks a bit better so I will do this.".
So subconsciously, they may p-hack
themselves into getting a result that's
not really there. And again we can ask:
"What is stopping scientists from
p-hacking?". And the concerning answer is
the same: usually nothing. And so I came to
this conclusion: "Ok, the
scientific method is a way to create
evidence for whatever theory you like,
no matter if it's true or not.". And you may
say: "That's a pretty bold thing to say.".
And I'm saying this even though I'm not
even a scientist. I'm just like some
hacker who, whatever... But I'm not alone
in this: there's a paper from a
famous researcher, John Ioannidis,
titled "Why most published research
findings are false". He published this in
2005 and if you look at the title, he
doesn't really question that most research
findings are false. He only wants to give
reasons why this is the case. And he makes
some very plausible assumptions, for example
that many negative results don't get
published and that you will have some
bias, and he comes to a very plausible
conclusion: that this is the case. And this
is not even very controversial. If you ask
people who are doing what you can call
science on science or meta science, who
look at scientific methodology, they will
tell you: "Yeah, of course that's the
case.". Some will even say: "Yeah, that's
how science works, that's what we
expect.". But I find it concerning. And if
you take this seriously, it means: if you
read about a study, like in a newspaper,
the default assumption should be 'that's
not true' - while we might usually think
the opposite. And if science is a method
to create evidence for whatever you like,
you can think about something really
crazy, like "Can people see into the future?",
"Does our mind have
some extra perception where we can
sense things that happen in an hour?". And
there was a psychologist called Daryl Bem
and he thought that this is the case and
he published a study on it. It was titled
"feeling the future". He did a lot of
experiments where he did something, and
then something later happened, and he
thought he had statistical evidence that
what happened later influenced what
happened earlier. So, I don't think that's
very plausible - based on what we know
about the universe, but yeah... and it was
published in a real psychology journal.
And a lot of things were wrong with this
study. Basically, it's a very nice example
of p-hacking, and there's even a book by
Daryl Bem where he describes something
which basically looks like p-hacking
and says that's how you do
psychology. But the study was absolutely
in line with the existing standards in
experimental psychology. And that a lot of
people found concerning. So, if you can
show that precognition is real, that you
can see into the future, then what else
can you show and how can we trust our
results? And psychology has debated this a
lot in the past couple of years. So
there's a lot of talk about the
replication crisis in psychology. And for
many effects that psychologists just
thought were true, they figured out that,
if they try to repeat these experiments,
they can't get these results, even though
entire subfields were built on them.
And I want to show you an example, which
is one of the ones that is not discussed so
much. So there's a theory which is called
moral licensing. And the idea is that if
you do something good, or something you
think is good, then later basically you
behave like an asshole. Because you think
I already did something good now, I don't
have to be so nice anymore. And there were
some famous studies with the theory
that when people consume organic food,
they later become more judgmental, or less
social, less nice to their peers. But just
last week someone tried to replicate these
original experiments. And they tried it
three times, with more subjects and better
research methodology, and they totally
couldn't find that effect. But what
you've seen here is lots of media
articles. I have not found a single
article reporting that this could not be
replicated. Maybe they will come, but yeah,
that's just a very recent example. But
now I want to have a small warning for you
because you may think now "yeah these
psychologists, that all sounds very
fishy and they even believe in
precognition and whatever", but maybe your
field is not much better maybe you just
don't know about it yet because nobody
else has started replicating studies in
your field. And there are other fields
that have replication problems and some
much worse for example the pharma company
Amgen in 2012 they published something
where they said "We have tried to
replicate cancer research and preclinical
research" that is stuff in a petri dish or
animal experiments so not drugs on humans
but what happens before you develop a drug
and they were only able to replicate 47
out of 53 studies. And these were they
said landmark studies, so studies that
have been published in the best journals.
Now there are a few problems with this
publication, because they have not
published their replications, they have not
told us which studies these were that they
could not replicate. In the meantime I
think they have published three of these
replications, but most of it is a bit in
the dark, which points to another problem:
they say they did this in collaboration
with the original researchers, and they
only got that by agreeing that they would
not publish the results. But it still
sounds very concerning. And some fields
only don't have a replication problem
because just nobody is trying to replicate
previous results; I mean, then you will
never know if your results hold up.
So what can be done about
all this? Fundamentally, I think the
core issue here is that the scientific
process is tied together with its results:
we do a study, and only after that do we
decide whether it's going to be published.
Or we do a study, and only after we have
the data do we decide how to analyze it. So
essentially we need to decouple the
scientific process from its results, and
one way of doing that is pre-registration.
What you're doing there is that before
you start doing a study, you register
it in a public register and say: "I'm gonna
do a study on this medication or
on this psychological effect, and
that's how I'm gonna do it", and then later
on people can check if you really did
that. And this is more or less standard
practice in medical drug trials; the
summary about it is that it does not work
very well, but it's better than nothing.
The problem is mostly enforcement: people
register a study and then don't publish it,
and nothing happens to them, even though
they are legally required to publish it. And
there are two campaigns I'd like to point
out. There's the AllTrials campaign, which
was started by Ben Goldacre, a
doctor from the UK, and they demand
that every trial that's done on a
medication should be published. And
there's also a project by the same guy, the
COMPare project, and they are trying to see,
when a medical trial has been registered and
later published, whether they did the same
thing or changed something in their
protocol, and whether there was a reason for
it or they just changed it to get a result
which they otherwise wouldn't get. But then
again, these issues in medicine often
get a lot of attention, and for good
reasons, because if we have bad science in
medicine then people die; that's pretty
immediate and pretty massive. But if you
read about this you always have to keep in
mind that drug trials at least
have pre-registration; most
scientific fields don't bother doing
anything like that. So whenever you hear
something, maybe about publication
bias in medicine, you should always think:
the same thing happens in many fields of
science, and usually nobody is doing
anything about it. And particularly to
this audience I'd like to say there's
currently a big trend that people from
computer science want to revolutionize
medicine: big data and machine learning,
these things, which in principle is ok.
But I know a lot of people in medicine are
very worried about this, and the reason is
that these computer science people don't
have the same scientific standards that
people in medicine expect, and might
say: "Yeah, we don't really need to do
a study on this, it's obvious that this
helps". And that is worrying; I come
from computer science and I understand
very well that people from medicine are
worried about this. So there's an idea
that goes even further than pre-registration,
and it's called registered reports. A
couple of years ago some scientists
wrote an open letter that was published
in the Guardian, and the idea
there is that you turn the scientific
publication process upside down: if you
want to do a study, the first thing you
do with a registered report is
submit your study design, your
protocol, to the journal, and then the
journal decides whether they will publish
it before they see any result. That way
you can prevent publication bias, and
you prevent the journals from only publishing
the nice findings and ignoring the negative
findings. And then you do the study, and
it gets published, but it gets
published independent of what the result
was. And there are of course other things you
can do to improve science, there's a lot
of talk about sharing data, sharing code,
sharing methods, because if you want to
replicate a study, it's of course easier if
you have access to all the details of how the
original study was done. Then you could
say: "Okay, we could do large
collaborations", because many studies are
just too small; if you have a study with
twenty people, you just don't get a very
reliable outcome. So maybe in many
situations it would be better to get together
10 teams of scientists and let them all do
a big study together, and then you can
reliably answer a question. And also some
people propose stricter
statistical thresholds, because that p-value
of 0.05 means practically nothing. There was
recently a paper that argued for
just putting the dot one more place to
the left, at 0.005, and that would
already solve a lot of problems. And in
physics, for example, they have
something called five sigma, which is, I think,
zero point, then five zeroes and a three, or
something like that, so in physics they
have much stricter statistical thresholds.
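These thresholds can be compared directly in a couple of lines; a small sketch, taking the five-sigma value as the one-sided tail probability of a normal distribution beyond five standard deviations:

```python
# Comparing significance thresholds: the common 0.05, the proposed 0.005,
# and the "five sigma" convention from particle physics (one-sided normal tail).
from scipy.stats import norm

thresholds = {
    "common threshold": 0.05,
    "proposed stricter threshold": 0.005,
    "five sigma (one-sided)": norm.sf(5),  # about 0.0000003
}
for name, p in thresholds.items():
    print(f"{name}: p < {p:.7f} (roughly 1 in {round(1 / p):,})")
```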
Now, whatever scientific field you're
working in, you might ask yourself:
"If we have statistical results, are
they pre-registered in any way? Do we
publish negative results, like 'we tested
an effect and we got nothing'? And are there
replications of all relevant results?". And I
would say, if you answer all these
questions with "no", which I think many
people will, then you're not really
doing science; what you're doing is the
alchemy of our time.
Applause
Thanks.
Herald: Thank you very much..
Hanno: No I have more, sorry, I have
three more slides, that was not the
finishing line. A big issue is also that
there are bad incentives in science. A
very standard thing to evaluate the impact
of science is citation counts, where you say:
"if your scientific study is cited a lot
then this is a good thing, and if your
journal is cited a lot this is a good
thing". That's for example the impact
factor, but there are also other
measurements. And also, universities like
publicity, so if your study gets a lot of
media reports, then your press department
likes you. And these incentives tend to
favor interesting results, but they don't
favor correct results. And this is bad,
because if we are realistic, most results
are not that interesting; most results
will be "Yeah, we have this interesting and
counterintuitive theory and it's totally
wrong". And then there's this idea that
science is self-correcting. So if you
confront scientists with these issues,
with publication bias and p-hacking, surely
they will immediately change that; that's
what scientists do, right? And I want to cite
something here, sorry, it's a bit
long: "There is some evidence that in fields
where statistical tests of significance are
commonly used, research which yields
nonsignificant results is not published."
That sounds like publication bias.
And then it also
says: "Significant results published in
these fields are seldom verified by
independent replication", so it seems
there's a replication problem. These wise
words were said in 1959 by a
statistician called Theodore Sterling. And
because science is so self-correcting, in
1995 he complained that this article
"presents evidence that published results of
scientific investigations are not a
representative sample of all scientific
studies" and that "these results also
indicate that practices leading to publication
bias have not changed over a period of 30 years", and
here we are in 2018 and publication bias
is still a problem. So if science is self-
correcting then it's pretty damn slow in
correcting itself, right? And finally I
would like to ask you, if you're prepared
for boring science, because ultimately, I
think, we have a choice between what I
would like to call TED talk science and
boring science..
Applause
.. so with TED talk science we get mostly
positive and surprising and
interesting results, we have large effects,
many citations, lots of media attention, and
you may get a TED talk about it.
Unfortunately usually it's not true and I
would like to propose boring science as
the alternative which is mostly negative
results, pretty boring, small effects but
it may be closer to the truth. And I would
like to have boring science but I know
it's a pretty tough sell. Sorry I didn't
hear that. Yeah, thanks for listening.
Applause
Herald: Thank you.
Hanno: Two questions, or?
Herald: We don't have that much time for
questions, three minutes, three minutes
guys. Question one - shoot.
Mic: This isn't a question, but I just
wanted to comment: Hanno, you missed out a
very critical topic here, which is the use
of Bayesian probability. So you did
conflate p-values with the scientific
method, which gave the rest
of your talk, I felt, a slightly unnecessary
anti-science slant. P-values aren't
the be-all and end-all of the scientific
method. A p-value is sort of calculating
the probability that your data would happen
given that the null hypothesis is true, whereas
Bayesian probability would be calculating
the probability that your hypothesis is
true given the data, and more and more
scientists are slowly starting to realize
that this sort of method is probably a
better way of doing science than p-values.
So this is probably a third alternative
to your proposal of boring science:
doing it the other way around, with Bayesian
probability.
Hanno: Sorry, yeah, I agree with you,
unfortunately I only had
half an hour here.
Herald: Where are you going after this,
like, where are we going after this lecture,
can they find you somewhere, in the bar?
Hanno: I know him..
Herald: You know, science is broken, but
then, scientists... it's a little bit like the
next lecture actually that's waiting there,
it's like: "you scratch my back and I
scratch yours for publication".
Hanno: Maybe two more minutes?
Herald: One minute.
Please go ahead.
Mic: Yeah, hi, thank you for your talk. I'm
curious: so you've raised, you know, ways
we can address this assuming good actors,
assuming people who want to do better
science, where this happens out of ignorance
or willful ignorance. What do we do about
bad actors? So for example in the medical
community, drug companies: maybe they
really like the idea of being profitably
incentivized by these randomized controlled
trials to make essentially a placebo
do something. How do we begin to address
them currently trying to maliciously p-hack
or maliciously abuse the pre-registration
system or something like that?
Hanno: I mean it's a big question, right?
But I think if the standards are
confining you so much that there's not
much room to cheat, that's a way out, right?
And also, I don't think
deliberate cheating is that much of a
problem; I actually think the
bigger problem is people honestly
believing that what they do is true.
Herald: Okay one last, you sir, please?
Mic: So the value in science is often a
count of publications, right? A count of
citations and so on. So is it true that,
to improve this situation you've
described, it's the journals who should
impose higher standards, the journals are
the ones who must raise the
bar, they should enforce publication of
protocols before accepting, etc.
etc.? So is it the journals who should, like,
do the work on that, or can we regular
scientists do something also?
Hanno: I mean, you
can publish in the journals that have
better standards, right? There are
journals that have these registered
reports, but of course, I mean, as a single
scientist it's always difficult because
you're playing in a system that has all
these wrong incentives.
Herald: Okay guys that's it, we have to
shut down. Please. There is a reference,
better science dot-org, go there, and one
last request: give a really warm applause!
Applause
34c3 outro
subtitles created by c3subtitles.de
in the year 2018. Join, and help us!