-
Thomas Fricke: Thank you very much for the
invitation. So second talk tomorrow –
-
Thank you – um, today. So this is my
background. More or less, I do Kubernetes
-
security and critical infrastructure,
founded several companies, and now my
-
main focus is on Kubernetes security. This
rabbit hole of Kubernetes, if you
-
look deeper into it, then you should be a
little bit scared, and I want to explain
-
why. The first approach is the
application, and then the application
-
normally is run in containers. And the
containers, what is not really well known,
-
have access to service accounts in
Kubernetes, which is one of the major
-
flaws in Kubernetes at the moment. If you
take over the service account, it might be
-
that you can take over a cluster. And if
you can take over a cluster, you might
-
take over a node and then your entire
cloud service account, which is the work
-
of somebody else, I will mention later on
these slides. So let's look what happens:
-
So, the target is I have an application
exposed to the internet, and I want to own
-
the entire cluster from outside.
Application might be vulnerable. Examples?
-
Yeah, lots of them. One example I want to
present is ImageTragick. You normally
-
should not do eval or exec statements in
any framework – should be PHP, NodeJS, or
-
any other framework – and execute commands
in the context of your application,
-
because something can go wrong and
developers are responsible for this. Let's
-
see what it looks like: This is the attack
model based on an attack. I thought it was
-
old and has been fixed in 2016, but now
there was a new overview by Emil Lerner,
-
who again showed, yes, you can, in current
versions of ImageMagick, exploit this
-
attack. So, it works. ImageMagick is for
uploading images, so you convert the image
-
in a different format, scale the size, and
then if you do something wrong in this
-
image, you can own the entire container.
This also works for non-containerized
-
applications if you have a server running
something with ImageMagick on it. Please
-
be careful. OK. If we have mastered this
step, the next step is, yes, we want
-
access to the service account. And this is
by default enabled in Kubernetes: So, you
-
have a Kubernetes design flaw because your
service account is exposed to the
-
container where the application runs.
The next step of an attacker is
-
installation of additional software. So,
you want to take over. You need a curl or
-
kubectl or chmod, and then you are owner
of the service account and can actually do
-
commands by uploading pictures in
ImageTragick. So responsible for this flaw
-
is the image creator. Let's see what else
can happen. To get total control, you also
-
need role-binding to a cluster-admin role.
This is not enabled by default, but the
-
internet is always good for bad advice. So
if you copy the installation requirements
-
or recommendations from the internet,
somebody else might take over the entire
-
cluster. Let's look deeper into it: Worst
practice here is what you can see in the
-
elastic installation recommendation: They
just mentioned they have a newer version,
-
but they use the cluster admin permissions
here to install ElasticSearch in your
-
Kubernetes cluster. So they recommend it
and a lot of other applications also have
-
this – which is a little bit outdated, but
it's quite common – in the installation
-
requirements. Never, ever do this, please.
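
A minimal sketch of how one might look for this kind of binding, assuming the ClusterRoleBinding objects have already been exported as JSON (for example the `items` list from `kubectl get clusterrolebindings -o json`) and parsed into Python dictionaries. The function name `find_cluster_admin_bindings` and the sample data are hypothetical and not from the talk; only the field names (`roleRef`, `subjects`, `kind`, `name`, `namespace`) follow the Kubernetes RBAC API.

```python
def find_cluster_admin_bindings(bindings):
    """Return (binding name, namespace, account name) for every
    ServiceAccount subject that is bound to the cluster-admin ClusterRole."""
    findings = []
    for binding in bindings:
        role_ref = binding.get("roleRef", {})
        if role_ref.get("kind") != "ClusterRole":
            continue
        if role_ref.get("name") != "cluster-admin":
            continue
        # "subjects" can be missing or null on a binding, so fall back to [].
        for subject in binding.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount":
                findings.append(
                    (
                        binding.get("metadata", {}).get("name"),
                        subject.get("namespace"),
                        subject.get("name"),
                    )
                )
    return findings


# Hypothetical sample data, shaped like parsed ClusterRoleBinding objects.
SAMPLE_BINDINGS = [
    {
        "metadata": {"name": "example-admin-binding"},
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [
            {"kind": "ServiceAccount", "namespace": "default", "name": "default"}
        ],
    },
    {
        "metadata": {"name": "example-view-binding"},
        "roleRef": {"kind": "ClusterRole", "name": "view"},
        "subjects": [
            {"kind": "ServiceAccount", "namespace": "default", "name": "reader"}
        ],
    },
]

if __name__ == "__main__":
    for name, namespace, account in find_cluster_admin_bindings(SAMPLE_BINDINGS):
        print(f"{name}: service account {namespace}/{account} is cluster-admin")
```

Every hit is a service account whose token, if it is mounted into an exposed container, would give an attacker the cluster-admin role.
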
It also can come with Helm Charts, so you
-
have Helm Charts where the cluster-admin
role is included. Here you see it, it was
-
in Apache Heron, which is an Apache
project, and it uses the cluster-admin
-
role, so by a helm install you might be
affected by this flaw too. So with these
-
four steps, which effectively are three
steps, you have a cluster application
-
exposed, and through that path, you can
take over the entire cluster from the
-
outside, and do anything that the cluster-
admin role can do. Effectively, this
-
cluster-admin role-binding is like a
doormat attack, so you have the best
-
cryptography, the most expensive locks on
one side and then you put the key under
-
the doormat or under the flower at the
door or something like that. This is
-
something which is not really what you
want. I can do an example walkthrough
-
which shows how it goes. So, I've
published all my training notebooks on
-
GitHub. Here's the way you can build this
outdated ImageTragick version in
-
OpenShift. So, I use CRC, which is the
CodeReady Containers version. It's based
-
on the ImageTragick proof of concept by
Mike Williams. And here you run and create
-
a vulnerable image. A little bit lengthy.
It's compiled inside and so on. So, don't
-
get a full version, which is the reason
why I don't show it here, but effectively
-
at the end, you have a vulnerable
application in a container internal and in
-
OpenShift. And that's exactly what we need
to run the application. Here is the
-
exploit. And the exploit starts with the
deployment of this container, which is
-
standard Kubernetes. Here "oc" is like
kubectl. So, you get an overview.
-
Additionally, in OpenShift, you have a
very simple version of creating a route,
-
which is connected to a hostname, and then
you can upload it by using that hostname.
-
You expose the deployment, you expose the
service which is created, you expose the
-
route finally, and then you have access.
The next step is you get this route and
-
then here you have a URL, which you can
use. And in a full demo, I would just
-
simply call this URL and then I can upload
images here. I've created these files,
-
which are valid PostScript files, but you
see at the end there is a full command.
-
And here, because there's a curl in the
container, I can download a version of
-
kubectl. Effectively, the containers,
especially the Red Hat containers, are not so
-
vulnerable as others, but you have always
writable temp, which is enough to deploy
-
some software. So, we curl kubectl from
the internet, put it into temp, and then
-
we use a simple chmod command to activate
kubectl. So now we can call kubectl
-
commands from inside an image. It's a
death bells, more or less so. Exactly at
-
the right place. We have a working exploit
now and warning, it might also already
-
work in older versions of Kubernetes.
Because in newer versions we need some
-
pill of poison, additionally, and this is
exactly this cluster role binding to the
-
cluster admin, which needs to be done,
that we have full access from the outside
-
and if we do this, and expose our cluster
admin account to the same account, which
-
is already exposed inside the container,
we can execute commands with this kubectl
-
so we can create deployments by uploading
pictures. Which is exactly what you never
-
want, but an attacker now has full access
to your cluster by simply uploading
-
prepared malicious pictures. Can do this.
So this is an example here: just create
-
and delete containers and deployments
this way, you can effectively do
-
everything. And again, this is the problem
here from the application side. If you
-
have a vulnerable version of ImageMagick,
you can include commands, and you can
-
definitely install software on the
Kubernetes server side. There are several
-
attempts to fix this. For example, you can use
better images like Red Hat does, so this
-
is a Red Hat health index, which is quite
good, but effectively these images have
-
the advantage only that you do not run
anything as root. But you run the same as
-
another user ID, and this same user
is allowed to write to the temp directory,
-
effectively, yeah, you don't need root for
installing software. So, the container
-
also was good practice, no root inside, it
has an immutable root file system, but the
-
curl which is completely unnecessary, was
also deployed, we had write access to
-
temp. We had a chmod. And the first thing
you would do to prevent all the stuff I'm doing
-
here is, and if you don't
learn anything else from this talk, please go
-
look into your service account and try to
disable the automountServiceAccountToken
-
features, so all of the service accounts
which are not running operators don't need
-
this service account open. If you have an
operator, it might be broken now and it
-
can be, um, overwritten by the Pod
definition, but effectively this entire
-
example would not work without this
service account token. So, we have fixed
-
that. We cannot fix the application
because this is something, uh, somebody
-
else is creating for us, and we might even
have a flaw which is not affected, so
-
there might be a zero-day. The next thing
we must prevent is the installation of
-
software. Fix the images, so use really
immutable images. Temp only if you need
-
it. PID is 1, anyway. Uh, OK, you might
have some variable data, but you should
-
use containers from scratch, no curl, no
wget, and this also affects Red Hat UBIs.
-
And most of the standard images have this
flaw, so you have a full operating system
-
inside with all the tools you like. But
this is not your territory. It's just,
-
yeah, it's a tool for the attacker. So
please run only trusted images, build your
-
own images and build them from scratch.
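
The fixes discussed so far can be checked mechanically. The following is a sketch only, assuming a Pod spec has been parsed into a Python dictionary; the function name `audit_pod_spec` and the sample specs are hypothetical. The field names `automountServiceAccountToken`, `securityContext`, `readOnlyRootFilesystem` and `runAsNonRoot` are the ones the Kubernetes Pod API uses. As a simplification, only the Pod-level token setting and the container-level security context are inspected, although Kubernetes also allows these to be set on the ServiceAccount object and on the Pod-level security context.

```python
def audit_pod_spec(pod_spec):
    """Return a list of human-readable findings for one Pod spec dictionary."""
    findings = []
    # Simplification: only the Pod-level field is checked here. The
    # ServiceAccount object can also carry automountServiceAccountToken.
    if pod_spec.get("automountServiceAccountToken") is not False:
        findings.append("service account token is mounted")
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        context = container.get("securityContext") or {}
        if context.get("readOnlyRootFilesystem") is not True:
            findings.append(f"{name}: root filesystem is writable")
        if context.get("runAsNonRoot") is not True:
            findings.append(f"{name}: may run as root")
    return findings


# Hypothetical examples: a hardened spec and one that relies on defaults.
HARDENED_SPEC = {
    "automountServiceAccountToken": False,
    "containers": [
        {
            "name": "web",
            "securityContext": {
                "readOnlyRootFilesystem": True,
                "runAsNonRoot": True,
            },
        }
    ],
}
DEFAULT_SPEC = {"containers": [{"name": "web"}]}

if __name__ == "__main__":
    for finding in audit_pod_spec(DEFAULT_SPEC):
        print(finding)
```

A check like this does not notice a curl or a writable temp directory inside the image, so it complements building from scratch rather than replacing it.
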
This is my example I also have uploaded to
-
GitHub, how to harden the container, which
is based on nginx alpine. nginx alpine
-
normally is a very small container, but
you can do more. You can use the script,
-
which is in this repository, just to get
only the tools you need. So this is not
-
statically linked because the original
nginx is not statically linked. But it's
-
very close. This means you only positively
install the software you need. This is
-
dynamically linked, therefore the -d, so we
use ldd. Extract all the dynamically linked
-
libraries and then all the configuration
files which are necessary. It is the
-
password registry group. OK. Some licenses
and share. Need some directories for
-
logging and then you can install it from
scratch because this script installs it in
-
a directory /temp/harden, and with
this multi-stage build you can install
-
everything you need from /temp/harden. And
then the next container is based on
-
scratch and you can use nginx the same way
you would use, more or less, an
-
application which is statically linked. So
now we have created a hardened image
-
without kubectl or curl inside. So, we are
much closer to a secure application. The
-
next thing is, yeah, role binding to
cluster admin role. Don't do this. If
-
something in your application goes wrong,
you have additional measures, which you
-
can take just to prevent the application
from breaking out of the container. So, you
-
can separate the internet exposure of
services or ingresses in Kubernetes from
-
privileged operations. So you have node
settings. ElasticSearch is doing a lot of
-
these things, so a lot is really not true
so, doing a sysctl. Some applications have
-
hostPaths on or have connection to the
host inter-process communication, which is
-
not necessary if you have exposed it and
then separate the applications which need
-
this from the applications which don't
need it. So, cluster admin should be more
-
or less restricted to very privileged
operators. And by the way, Argo is also a
-
very privileged operator. Don't run
Argo on a Kubernetes cluster in a security
-
critical environment because I've seen
Argo also is binding to cluster admin. It
-
doesn't mean that Argo by default is
unsafe, but it's a very complex
-
application and I would definitely run it
in a separate cluster, not in the critical
-
cluster. And what does an architecture fix
look like? Here you have the lifecycle of
-
a Pod, so the time is going from left
to right. Here you see if the container is
-
ready, it can be accessed from the
internet. And if you do something from the
-
init system, like a sysctl, please do it
inside a container which is not connected
-
to the internet, just to use the pause
container, as a pause container to limit
-
it and restrict it and that is not really
connected to the network. So, this is
-
something which covers the architecture.
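
One way to read this separation, sketched under assumptions: the privileged sysctl step runs in an init container, which finishes before the application container starts, and the exposed application container itself stays unprivileged. The talk refers to the pause container; using an init container here is an interpretation, not the speaker's confirmed setup. The function name `build_pod_manifest`, the image names and the sysctl value are illustrative placeholders; the manifest fields (`initContainers`, `securityContext`, `privileged`, `allowPrivilegeEscalation`, `automountServiceAccountToken`) are standard Kubernetes Pod fields.

```python
import json


def build_pod_manifest(app_image, init_image):
    """Build a Pod manifest with a privileged init step and an
    unprivileged application container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "web", "labels": {"app": "web"}},
        "spec": {
            # The application does not talk to the Kubernetes API.
            "automountServiceAccountToken": False,
            "initContainers": [
                {
                    # Runs to completion before the app container starts,
                    # so it never serves traffic from the internet.
                    "name": "sysctl",
                    "image": init_image,
                    "command": ["sysctl", "-w", "vm.max_map_count=262144"],
                    "securityContext": {"privileged": True},
                }
            ],
            "containers": [
                {
                    "name": "app",
                    "image": app_image,
                    "securityContext": {
                        "privileged": False,
                        "allowPrivilegeEscalation": False,
                        "readOnlyRootFilesystem": True,
                        "runAsNonRoot": True,
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder image names, for illustration only.
    manifest = build_pod_manifest("example.org/app:1.0", "example.org/init:1.0")
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the printed output could be reviewed and then applied like any other manifest.
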
Additionally, I already mentioned here the
-
network policy which will come later, so
this is our threat matrix. We have exposed
-
and not exposed services. You have
unprivileged and privileged things. The
-
dangerous ones are the privileged ones
which are exposed, but normally you only
-
have an exposed privileged application if
you have an IDE running in Kubernetes,
-
which is not what I would like to see in
critical infrastructure, something like
-
RStudio, or have a web UI to a GitOps
framework. And normally you only have a
-
web application. And what should not be
exposed under normal conditions is an
-
operators, sysctl, build systems, host
operators and so on. If you do this, it's
-
virtually not possible to own the cluster.
You should do all three, because if you
-
have security in depth, you can make a
mistake on one of these levels and the
-
other levels keep you from
being exploited. You can even do more
-
isolation. On the network side, you have
network policies for egress. On the node
-
side, you can activate seccomp, gVisor,
and the common frameworks, SELinux and
-
AppArmor. You can use PodSecurity
policies, or in the future, the Open
-
Policy Agent to prevent the node from
being hacked. For the identity and access
-
management, you should use individual
service accounts for all your tasks. So
-
you have a lot of roles. You
should use role based access control to
-
check this. OK, but I promise, yes, we can
go even deeper, and this needs a little
-
help from your cloud administrator and
here, the example from Nico Meisenzahl,
-
who does a very similar example on
hijacking Kubernetes, and he's doing it,
-
obviously in one of the clouds. And what
he has found out is you can get access to
-
the azure.json file, which has user
assigned identities. These are not the
-
Kubernetes identities. This is the Azure
identity. You can get a token, you can get
-
a subscription, you can get a resource
group and then you can use a curl command,
-
with this token, to change things on the
API version of this resource group with
-
this subscription. So, you might be able
to hack your node with the privileged
-
container and then take over your cloud
account. And he told me that this is also
-
true for the other clouds, so it might
even work something similar in AWS and
-
GCP. So please, also protect your cloud
account. Understand your identity and
-
access management in the cloud. So, at
least, someone in the team should
-
understand it. And limit also the
underlying account to the bare minimum. It
-
might even be a good idea to block access
to addresses like 169.254.something. And the
-
other clouds, as I already mentioned, also
might be affected. And my call to the
-
cloud providers, is don't deliver account
data in containers or nodes. This is not
-
necessary. It's yes, it's very
comfortable, as the service account
-
token is very comfortable for running
operators, but it's a major security flaw
-
and it might be that you lose all your
accounts and data. Conclusion: We have a
-
full attack chain from the application to
the cloud account. And it's your task to
-
prevent it and fix it. This is called
shared responsibility, so the cloud
-
providers effectively only care for the
infrastructure, but not really for the
-
security in your cloud. This is your
task. OK. Thank you for your attention, I
-
hope it was interesting. Please ask your
questions. And now I'm open for the Q&A.
-
Applause
-
Herald: Thank you for the talk. This is
working? Yeah. So do we have any questions
-
from the internet? I don't see any coming
in so far, but we are, I think, usually a bit
-
ahead so I'll ask one:
Q: What do you think? So who has the
-
main responsibility to fix these
insecurities? Do you think this can be
-
fixed by better defaults in these
infrastructures and configuration files?
-
Is this to be fixed by better tutorials
and better education for the DevOps
-
engineers? What is the main point of
responsibility?
-
A: I definitely would prefer to have
secure default installations. But then you
-
have this shared responsibility in the
contracts: From a certain point, you are
-
responsible for the security of the
account, and we have seen this complexity
-
because this might be 20 steps. Every step
is very simple and every step is looking
-
very harmless, but all the steps together
might create a full exploit of a cloud. So
-
this must be overseen, and it's very hard
for developers who are cloud native and
-
are focusing on the application to have an
overview of the security. Developers now
-
have 10 or 100 times more code on the hard
disk than ten years before. And this means
-
developers are not able really to have a
full judgment about what is going on in
-
terms of security. When
developers talk about security, either
-
they are specialized in it or they have
not seen things like this. What I normally
-
notice: The developers are not aware of
these problems.
-
Q: OK. And what do you think, what can we
do about the complexity? So do you think
-
we need better education for people to
actually understand the systems? Or is
-
there a way in cloud infrastructures to
reduce the complexity?
-
A: Better education? And do all the simple
fixes. These are five steps, and the fixes
-
are also very simple. And you have to
check them and then you need a tool
-
because you might have 20 clusters like
this. Every cluster has 20 applications,
-
so this might be quite complicated. So you
need tools for an overview and in the
-
training material, you see examples of how
you can check your Kubernetes clusters for
-
exploits like this.
Herald: OK, thank you very much. Thanks
-
for being here. We will continue in about
half an hour with the next talk, then
-
again in German. Thanks.
Thomas: Thank you very much. Applause
-
Outro: Everything is licensed under CC BY
4.0. And it is all for the community, to
-
the unknown and for everyone.
-
Subtitles created by c3subtitles.de
in the year 2022. Join, and help us!