Thomas Fricke: Thank you very much for the invitation. So, second talk tomorrow – thank you – um, today. This is my background: more or less, I do Kubernetes security and critical infrastructure. I founded several companies, and now my main focus is Kubernetes security. If you look deeper into this rabbit hole of Kubernetes, you should be a little bit scared, and I want to explain why. The first layer is the application, and the application normally runs in containers. And the containers – what is not really well known – have access to service accounts in Kubernetes, which is one of the major flaws in Kubernetes at the moment. If you take over the service account, you might be able to take over a cluster. And if you can take over a cluster, you might take over a node, and then your entire cloud service account – which is the work of somebody else whom I will mention later in these slides. So let's look at what happens. The target: I have an application exposed to the internet, and I want to own the entire cluster from the outside. The application might be vulnerable. Examples? Yes, lots of them. One example I want to present is ImageTragick. You should never run eval or exec statements in any framework – be it PHP, NodeJS, or any other – and execute commands in the context of your application, because something can go wrong, and developers are responsible for this. Let's see what it looks like. This is the attack model, based on a real attack. I thought it was old and had been fixed in 2016, but there was a new overview by Emil Lerner, who showed again that yes, you can exploit this attack in current versions of ImageMagick. So, it works. ImageMagick is used for uploading images: you convert the image to a different format, scale the size, and if something goes wrong with this image, you can own the entire container. 
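For context, the classic ImageTragick (CVE-2016-3714) trick was a small MVG file that ImageMagick happily parses, with a shell command smuggled into the `fill url(...)` coder. A sketch of such a payload, modeled on the published 2016 proof of concept (the URL and the `ls` command are placeholders, not part of this talk's demo):

```
push graphic-context
viewbox 0 0 640 480
fill 'url(https://example.com/image.jpg"|ls "-la)'
pop graphic-context
```

When a vulnerable ImageMagick converts this "image", the quoted command after the pipe is executed in the context of the application.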
This also works for non-containerized applications: if you have a server running something with ImageMagick on it, please be careful. OK. If we have mastered this step, the next step is getting access to the service account. And this is enabled by default in Kubernetes. So, you have a Kubernetes design flaw, because the service account is exposed to the container where the application runs. The next step for an attacker is the installation of additional software. To take over, you need a curl, or kubectl, or chmod, and then you own the service account and can actually execute commands by uploading pictures via ImageTragick. Responsible for this flaw is the image creator. Let's see what else can happen. To get total control, you also need a role binding to the cluster-admin role. This is not enabled by default, but the internet is always good for bad advice. If you copy installation requirements or recommendations from the internet, somebody else might take over your entire cluster. Let's look deeper into it. The worst practice here is what you can see in the Elasticsearch installation recommendation. They mention that they have a newer version, but they use cluster-admin permissions here to install Elasticsearch in your Kubernetes cluster. So they recommend it, and a lot of other applications also have this in their installation requirements – which is a little bit outdated, but quite common. Never, ever do this, please. It can also come with Helm charts, so there are Helm charts where the cluster-admin role is included. Here you see it: it was in Apache Heron, which is an Apache project, and it uses the cluster-admin role, so with a helm install you might be affected by this flaw too. With these four steps, which are effectively three steps, you have a cluster application exposed, and through that path you can take over the entire cluster from the outside and do anything the cluster-admin role can do. 
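The kind of role binding this bad advice produces looks roughly like the following manifest (a sketch; the binding name is hypothetical). It grants cluster-admin to the default service account, which every Pod in that namespace can read from its token mount:

```yaml
# DO NOT apply this: it hands cluster-admin to the token that is
# automounted into every Pod using the default service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: give-away-the-cluster   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
```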
Effectively, this cluster-admin role binding is like a doormat attack: you have the best cryptography and the most expensive locks on one side, and then you put the key under the doormat or under the flowerpot at the door. This is not really what you want. I can do an example walkthrough which shows how it goes. I've published all my training notebooks on GitHub. Here is the way you can build this outdated ImageTragick version in OpenShift. I use CRC, the CodeReady Containers version. It's based on the ImageTragick proof of concept by Mike Williams. Here you run it and create a vulnerable image. It's a little bit lengthy – it's compiled inside, and so on – so you don't get the full version, which is why I don't show it here. But effectively, at the end you have a vulnerable application in a container, internally, in OpenShift. And that's exactly what we need to run the application. Here is the exploit. The exploit starts with the deployment of this container, which is standard Kubernetes; here "oc" is like kubectl. So, you get an overview. Additionally, in OpenShift you have a very simple way of creating a route, which is connected to a hostname, and then you can upload via that hostname. You expose the deployment, you expose the service which is created, you expose the route, and finally you have access. The next step: you get this route, and here you have a URL you can use. In a full demo, I would simply call this URL, and then I can upload images here. I've created these files, which are valid PostScript files, but you see at the end there is a full command. And here, because there is a curl in the container, I can download a version of kubectl. Effectively, the containers – especially the Red Hat containers – are not as vulnerable as others, but you always have a writable temp directory, which is enough to deploy some software. 
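That writable temp directory really is all it takes. A minimal local sketch of the idea (file name and payload are stand-ins, no real kubectl or cluster involved): anything an attacker can write and chmod, they can run, no root required.

```shell
# Stand-in for "curl a binary into /tmp": write an executable payload.
printf '#!/bin/sh\necho pwned\n' > /tmp/demo-tool

# No root needed: a plain chmod "activates" the planted binary.
chmod +x /tmp/demo-tool

# The attacker's tool now runs inside the container.
/tmp/demo-tool   # prints "pwned"
```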
So, we curl kubectl from the internet, put it into temp, and then we use a simple chmod command to activate kubectl. Now we can call kubectl commands from inside an image – which is more or less the death knell, placed exactly at the right spot. We have a working exploit now, and, warning, it might already work like this in older versions of Kubernetes. In newer versions we need an additional pill of poison, and this is exactly the role binding to the cluster-admin role, which needs to be done so that we have full access from the outside. If we do this, and bind cluster-admin to the same service account which is already exposed inside the container, we can execute commands with this kubectl: we can create deployments by uploading pictures. Which is exactly what you never want, but an attacker now has full access to your cluster simply by uploading prepared malicious pictures. Here is an example: you can create and delete containers and deployments this way; you can effectively do everything. And again, the problem here comes from the application side: if you have a vulnerable version of ImageMagick, you can include commands, and you can definitely install software on the Kubernetes server side. There are several attempts to fix this. For example, you can use better images, like Red Hat does – this is the Red Hat Health Index, which is quite good – but effectively these images only have the advantage that nothing runs as root. You run the same things under another user ID, and that user is still allowed to write to the temp directory; effectively, you don't need root to install software. So, the container followed good practice – no root inside, an immutable root file system – but the curl, which is completely unnecessary, was also deployed, we had write access to temp, and we had a chmod. And that is the first thing you would prevent. 
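The single most effective fix discussed in this talk, turning off the token automount, can be expressed on the service account and repeated on the Pod (names and image are hypothetical; the fields are standard Kubernetes):

```yaml
# Opt out of mounting the service account token into containers.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: my-app            # hypothetical namespace
automountServiceAccountToken: false
---
# The Pod-level setting overrides the service account's setting,
# so set it here too for Pods that never talk to the API server.
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: my-app
spec:
  serviceAccountName: app-sa
  automountServiceAccountToken: false
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
```

With this in place, the exploit chain above stops at the container: there is no token for the stolen kubectl to use.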
If you take away only one thing from this talk, please go look into your service accounts and try to disable the automountServiceAccountToken feature. All the service accounts which are not running operators don't need this service account token mounted. If you have an operator, it might be broken now, and the setting can be overwritten in the Pod definition, but effectively this entire example would not work without this service account token. So, we have fixed that. We cannot fix the application, because that is something somebody else is creating for us, and we might even have a flaw which is not yet known – there might be a zero-day. The next thing we must prevent is the installation of software. Fix the images: use really immutable images. Temp only if you need it. The PID is 1 anyway. OK, you might have some variable data, but you should use containers built from scratch: no curl, no wget. This also affects the Red Hat UBIs, and most of the standard images have this flaw: you have a full operating system inside, with all the tools you like. But this is not your territory – it's just a toolbox for the attacker. So please run only trusted images, build your own images, and build them from scratch. This is my example of how to harden the container, which I have also uploaded to GitHub; it is based on nginx:alpine. nginx:alpine is normally a very small container, but you can do more. You can use the script in this repository to get only the tools you need. This is not statically linked, because the original nginx is not statically linked, but it's very close. This means you only positively install the software you need. It is dynamically linked – therefore the -d – so we use ldd to extract all the dynamically linked libraries, and then all the configuration files which are necessary: the passwd and group files, OK, some licenses, and share. 
You need some directories for logging, and then you can install it from scratch, because this script installs everything into a directory /tmp/harden. With this multi-stage build you can install all you need from /tmp/harden: the next container stage is based on scratch, and you can use nginx more or less the same way you would use an application which is statically linked. So now we have created a hardened image without kubectl or curl inside, and we are much closer to a secure application. The next thing is the role binding to the cluster-admin role: don't do this. If something in your application goes wrong, there are additional measures you can take to prevent the application from breaking out of the container. You can separate the internet exposure of services or ingresses in Kubernetes from privileged operations. So you have node settings – Elasticsearch is doing a lot of these things; well, "a lot" is not really true, it does a sysctl. Some applications mount hostPaths or connect to the host's inter-process communication, which is not necessary if you have exposed them. So separate the applications that need this from the applications that don't. Cluster-admin should be more or less restricted to very privileged operators. And by the way, Argo is also a very privileged operator. Don't run Argo on a Kubernetes cluster in a security-critical environment, because I've seen that Argo also binds to cluster-admin. It doesn't mean that Argo by default is unsafe, but it's a very complex application, and I would definitely run it in a separate cluster, not in the critical cluster. And what does an architecture fix look like? Here you have the lifecycle of a Pod, with time going from left to right. Here you see that once the container is ready, it can be accessed from the internet. 
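The separation just described – privileged setup finished before anything is exposed – can be sketched as a Pod with an init container (names and image tags are hypothetical; the `vm.max_map_count` sysctl is the one Elasticsearch actually requires):

```yaml
# Sketch: run the privileged sysctl in an init container that is never
# reachable from the network; the exposed container stays unprivileged.
apiVersion: v1
kind: Pod
metadata:
  name: elasticsearch          # hypothetical
spec:
  initContainers:
  - name: set-max-map-count
    image: busybox:1.36
    command: ["sysctl", "-w", "vm.max_map_count=262144"]
    securityContext:
      privileged: true         # privileged, but done before exposure
  containers:
  - name: elasticsearch
    image: elasticsearch:8.6.0 # hypothetical tag
    securityContext:
      privileged: false        # the internet-facing part stays unprivileged
```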
And if you do something in the init phase, like a sysctl, please do it inside a container which is not connected to the internet – treat it like the pause container: limited, restricted, and not really connected to the network. So, this is something which covers the architecture. Additionally, I already mentioned the network policies, which will come later. This is our threat matrix: we have exposed and non-exposed services, and we have unprivileged and privileged things. The dangerous ones are the privileged ones which are exposed. Normally, you only have an exposed privileged application if you have an IDE running in Kubernetes – something like RStudio, which is not what I would like to see in critical infrastructure – or a web UI to a GitOps framework. Normally, you only have a web application exposed. What should not be exposed under normal conditions: operators, sysctls, build systems, host operators, and so on. If you do this, it's virtually impossible to own the cluster. You should do all three, because with security in depth you can make a mistake on one of these levels, and the other levels keep you from being exploited. You can do even more isolation: on the network side you have network policies for egress; on the node side you can activate seccomp, gVisor, and the common frameworks SELinux and AppArmor. You can use PodSecurityPolicies or, in the future, the Open Policy Agent to prevent the node from being hacked. For identity and access management, you should use individual service accounts for all your tasks – so yes, you end up with a lot of roles – and you should use role-based access control to check this. OK, but I promised we can go even deeper, and this needs a little help from your cloud administrator. Here is the example from Nico Meisenzahl, who has a very similar example on hijacking Kubernetes, and he is doing it, obviously, in one of the clouds. 
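The node-side knobs just listed map onto the Pod securityContext. A sketch of a hardened default (names and image are hypothetical; the fields are standard Kubernetes):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app           # hypothetical
spec:
  securityContext:
    runAsNonRoot: true         # refuse to start as root
    seccompProfile:
      type: RuntimeDefault     # enable the runtime's default seccomp filter
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    securityContext:
      readOnlyRootFilesystem: true        # no writable /tmp for attackers
      allowPrivilegeEscalation: false     # no setuid tricks
      capabilities:
        drop: ["ALL"]                     # drop every Linux capability
```

With `readOnlyRootFilesystem: true`, even a container that still ships curl has nowhere to put a downloaded kubectl.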
And what he has found out is that you can get access to the azure.json file, which has user-assigned identities. These are not the Kubernetes identities; this is the Azure identity. You can get a token, a subscription, and a resource group, and then you can use a curl command, with this token, to change things through the API for this resource group in this subscription. So, you might be able to hack your node with a privileged container and then take over your cloud account. And he told me that the same is true for the other clouds, so something similar might work in AWS and GCP. So please also protect your cloud account. Understand your identity and access management in the cloud – at least someone in the team should understand it – and limit the underlying account to the bare minimum. It might even be a good idea to block access to addresses like 169.254.something. And the other clouds, as I already mentioned, might also be affected. My call to the cloud providers: don't deliver account data in containers or nodes. This is not necessary. Yes, it's very comfortable – just as the service account token is very comfortable for running operators – but it's a major security flaw, and it might be that you lose all your accounts and data. Conclusion: we have a full attack chain from the application to the cloud account, and it's your task to prevent it and fix it. This is called shared responsibility: the cloud providers effectively only care for the infrastructure, but not really for the security in your cloud. This is your task. OK, thank you for your attention, I hope it was interesting. Please ask your questions; I'm now open for the Q&A. Applause Herald: Thank you for the talk. Is this working? Yeah. So, do we have any questions from the internet? I don't see any coming in so far, but we are, I think, usually a bit ahead, so I'll ask one: Q: What do you think? 
Whose responsibility is it, mainly, to fix these insecurities? Do you think this can be fixed by better defaults in these infrastructures and configuration files? Is this to be fixed by better tutorials and better education for the DevOps engineers? Where is the main point of responsibility? A: I would definitely prefer to have secure default installations. But then you have this shared responsibility in the contracts: from a certain point on, you are responsible for the security of the account. And we have seen this complexity, because this might be 20 steps. Every step is very simple and every step looks very harmless, but all the steps together might create a full exploit of a cloud. So this must be overseen, and it's very hard for developers who are cloud native and are focusing on the application to keep an overview of the security. Developers now have 10 or 100 times more code on their hard disk than ten years ago. And this means developers are not really able to make a full judgment about what is going on in terms of security. When developers talk about security, either they are specialized in it, or they have not seen things like this. What I normally notice: developers are not aware of these problems. Q: OK. And what do you think we can do about the complexity? Do you think we need better education for people to actually understand the systems? Or is there a way in cloud infrastructures to reduce the complexity? A: Better education, and do all the simple fixes. These are five steps, and the fixes are also very simple. And you have to check them, and then you need a tool, because you might have 20 clusters like this, and every cluster has 20 applications, so this might get quite complicated. So you need tools for an overview, and in the training material you see examples of how you can check your Kubernetes clusters for exploits like this. Herald: OK, thank you very much. Thanks for being here. 
We will continue in about half an hour with the next talk, which will again be in German. Thanks. Thomas: Thank you very much. Applause Outro: Everything is licensed under CC BY 4.0, and it is all for the community and for everyone. Subtitles created by c3subtitles.de in the year 2022. Join in and help us!