Thomas Fricke: Thank you very much for the invitation. So, second talk tomorrow – thank you – um, today. This is my background: more or less, I do Kubernetes security and critical infrastructure. I founded several companies, and now my main focus is Kubernetes security. If you look deeper into this rabbit hole of Kubernetes, you should be a little bit scared, and I want to explain why. The first layer is the application, and the application normally runs in containers. And the containers – what is not really well known – have access to service accounts in Kubernetes, which is one of the major flaws in Kubernetes at the moment. If you take over the service account, you might be able to take over a cluster. And if you can take over a cluster, you might take over a node, and then your entire cloud service account – which is the work of somebody else whom I will mention later in these slides. So let's look at what happens. The target: I have an application exposed to the internet, and I want to own the entire cluster from the outside. The application might be vulnerable. Examples? Yes, lots of them. One example I want to present is ImageTragick. You should never run eval or exec statements in any framework – be it PHP, NodeJS, or any other – and execute commands in the context of your application, because something can go wrong, and developers are responsible for this. Let's see what it looks like. This is the attack model, based on a real attack. I thought it was old and had been fixed in 2016, but there was a new overview by Emil Lerner, who showed again that yes, you can exploit this attack in current versions of ImageMagick. So, it works. ImageMagick is used for uploading images: you convert the image to a different format, scale the size, and if something goes wrong with this image, you can own the entire container. 
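For context, the classic ImageTragick (CVE-2016-3714) trick was a small MVG file that ImageMagick happily parses, with a shell command smuggled into the `fill url(...)` coder. A sketch of such a payload, modeled on the published 2016 proof of concept (the URL and the `ls` command are placeholders, not part of this talk's demo):

```
push graphic-context
viewbox 0 0 640 480
fill 'url(https://example.com/image.jpg"|ls "-la)'
pop graphic-context
```

When a vulnerable ImageMagick converts this "image", the quoted command after the pipe is executed in the context of the application.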
This also works for non-containerized applications: if you have a server running something with ImageMagick on it, please be careful. OK. If we have mastered this step, the next step is getting access to the service account. And this is enabled by default in Kubernetes. So, you have a Kubernetes design flaw, because the service account is exposed to the container where the application runs. The next step for an attacker is the installation of additional software. To take over, you need a curl, or kubectl, or chmod, and then you own the service account and can actually execute commands by uploading pictures via ImageTragick. Responsible for this flaw is the image creator. Let's see what else can happen. To get total control, you also need a role binding to the cluster-admin role. This is not enabled by default, but the internet is always good for bad advice. If you copy installation requirements or recommendations from the internet, somebody else might take over your entire cluster. Let's look deeper into it. The worst practice here is what you can see in the Elasticsearch installation recommendation. They mention that they have a newer version, but they use cluster-admin permissions here to install Elasticsearch in your Kubernetes cluster. So they recommend it, and a lot of other applications also have this in their installation requirements – which is a little bit outdated, but quite common. Never, ever do this, please. It can also come with Helm charts, so there are Helm charts where the cluster-admin role is included. Here you see it: it was in Apache Heron, which is an Apache project, and it uses the cluster-admin role, so with a helm install you might be affected by this flaw too. With these four steps, which are effectively three steps, you have a cluster application exposed, and through that path you can take over the entire cluster from the outside and do anything the cluster-admin role can do. 
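The kind of role binding this bad advice produces looks roughly like the following manifest (a sketch; the binding name is hypothetical). It grants cluster-admin to the default service account, which every Pod in that namespace can read from its token mount:

```yaml
# DO NOT apply this: it hands cluster-admin to the token that is
# automounted into every Pod using the default service account.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: give-away-the-cluster   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: default
```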
Effectively, this cluster-admin role binding is like a doormat attack: you have the best cryptography and the most expensive locks on one side, and then you put the key under the doormat or under the flowerpot at the door. This is not really what you want. I can do an example walkthrough which shows how it goes. I've published all my training notebooks on GitHub. Here is the way you can build this outdated ImageTragick version in OpenShift. I use CRC, the CodeReady Containers version. It's based on the ImageTragick proof of concept by Mike Williams. Here you run it and create a vulnerable image. It's a little bit lengthy – it's compiled inside, and so on – so you don't get the full version, which is why I don't show it here. But effectively, at the end you have a vulnerable application in a container, internally, in OpenShift. And that's exactly what we need to run the application. Here is the exploit. The exploit starts with the deployment of this container, which is standard Kubernetes; here "oc" is like kubectl. So, you get an overview. Additionally, in OpenShift you have a very simple way of creating a route, which is connected to a hostname, and then you can upload via that hostname. You expose the deployment, you expose the service which is created, you expose the route, and finally you have access. The next step: you get this route, and here you have a URL you can use. In a full demo, I would simply call this URL, and then I can upload images here. I've created these files, which are valid PostScript files, but you see at the end there is a full command. And here, because there is a curl in the container, I can download a version of kubectl. Effectively, the containers – especially the Red Hat containers – are not as vulnerable as others, but you always have a writable temp directory, which is enough to deploy some software. 
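That writable temp directory really is all it takes. A minimal local sketch of the idea (file name and payload are stand-ins, no real kubectl or cluster involved): anything an attacker can write and chmod, they can run, no root required.

```shell
# Stand-in for "curl a binary into /tmp": write an executable payload.
printf '#!/bin/sh\necho pwned\n' > /tmp/demo-tool

# No root needed: a plain chmod "activates" the planted binary.
chmod +x /tmp/demo-tool

# The attacker's tool now runs inside the container.
/tmp/demo-tool   # prints "pwned"
```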
So, we curl kubectl from the internet, put it into temp, and then we use a simple chmod command to activate kubectl. Now we can call kubectl commands from inside an image – which is more or less the death knell, placed exactly at the right spot. We have a working exploit now, and, warning, it might already work like this in older versions of Kubernetes. In newer versions we need an additional pill of poison, and this is exactly the role binding to the cluster-admin role, which needs to be done so that we have full access from the outside. If we do this, and bind cluster-admin to the same service account which is already exposed inside the container, we can execute commands with this kubectl: we can create deployments by uploading pictures. Which is exactly what you never want, but an attacker now has full access to your cluster simply by uploading prepared malicious pictures. Here is an example: you can create and delete containers and deployments this way; you can effectively do everything. And again, the problem here comes from the application side: if you have a vulnerable version of ImageMagick, you can include commands, and you can definitely install software on the Kubernetes server side. There are several attempts to fix this. For example, you can use better images, like Red Hat does – this is the Red Hat Health Index, which is quite good – but effectively these images only have the advantage that nothing runs as root. You run the same things under another user ID, and that user is still allowed to write to the temp directory; effectively, you don't need root to install software. So, the container followed good practice – no root inside, an immutable root file system – but the curl, which is completely unnecessary, was also deployed, we had write access to temp, and we had a chmod. And that is the first thing you would prevent. 
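The single most effective fix discussed in this talk, turning off the token automount, can be expressed on the service account and repeated on the Pod (names and image are hypothetical; the fields are standard Kubernetes):

```yaml
# Opt out of mounting the service account token into containers.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: my-app            # hypothetical namespace
automountServiceAccountToken: false
---
# The Pod-level setting overrides the service account's setting,
# so set it here too for Pods that never talk to the API server.
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: my-app
spec:
  serviceAccountName: app-sa
  automountServiceAccountToken: false
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
```

With this in place, the exploit chain above stops at the container: there is no token for the stolen kubectl to use.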
If you take away only one thing from this talk, please go look into your service accounts and try to disable the automountServiceAccountToken feature. All the service accounts which are not running operators don't need this service account token mounted. If you have an operator, it might be broken now, and the setting can be overwritten in the Pod definition, but effectively this entire example would not work without this service account token. So, we have fixed that. We cannot fix the application, because that is something somebody else is creating for us, and we might even have a flaw which is not yet known – there might be a zero-day. The next thing we must prevent is the installation of software. Fix the images: use really immutable images. Temp only if you need it. The PID is 1 anyway. OK, you might have some variable data, but you should use containers built from scratch: no curl, no wget. This also affects the Red Hat UBIs, and most of the standard images have this flaw: you have a full operating system inside, with all the tools you like. But this is not your territory – it's just a toolbox for the attacker. So please run only trusted images, build your own images, and build them from scratch. This is my example of how to harden the container, which I have also uploaded to GitHub; it is based on nginx:alpine. nginx:alpine is normally a very small container, but you can do more. You can use the script in this repository to get only the tools you need. This is not statically linked, because the original nginx is not statically linked, but it's very close. This means you only positively install the software you need. It is dynamically linked – therefore the -d – so we use ldd to extract all the dynamically linked libraries, and then all the configuration files which are necessary: the passwd and group files, OK, some licenses, and share. 
You need some directories for logging, and then you can install it from scratch, because this script installs everything into a directory /tmp/harden. With this multi-stage build you can install all you need from /tmp/harden: the next container stage is based on scratch, and you can use nginx more or less the same way you would use an application which is statically linked. So now we have created a hardened image without kubectl or curl inside, and we are much closer to a secure application. The next thing is the role binding to the cluster-admin role: don't do this. If something in your application goes wrong, there are additional measures you can take to prevent the application from breaking out of the container. You can separate the internet exposure of services or ingresses in Kubernetes from privileged operations. So you have node settings – Elasticsearch is doing a lot of these things; well, "a lot" is not really true, it does a sysctl. Some applications mount hostPaths or connect to the host's inter-process communication, which is not necessary if you have exposed them. So separate the applications that need this from the applications that don't. Cluster-admin should be more or less restricted to very privileged operators. And by the way, Argo is also a very privileged operator. Don't run Argo on a Kubernetes cluster in a security-critical environment, because I've seen that Argo also binds to cluster-admin. It doesn't mean that Argo by default is unsafe, but it's a very complex application, and I would definitely run it in a separate cluster, not in the critical cluster. And what does an architecture fix look like? Here you have the lifecycle of a Pod, with time going from left to right. Here you see that once the container is ready, it can be accessed from the internet. 
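The separation just described – privileged setup finished before anything is exposed – can be sketched as a Pod with an init container (names and image tags are hypothetical; the `vm.max_map_count` sysctl is the one Elasticsearch actually requires):

```yaml
# Sketch: run the privileged sysctl in an init container that is never
# reachable from the network; the exposed container stays unprivileged.
apiVersion: v1
kind: Pod
metadata:
  name: elasticsearch          # hypothetical
spec:
  initContainers:
  - name: set-max-map-count
    image: busybox:1.36
    command: ["sysctl", "-w", "vm.max_map_count=262144"]
    securityContext:
      privileged: true         # privileged, but done before exposure
  containers:
  - name: elasticsearch
    image: elasticsearch:8.6.0 # hypothetical tag
    securityContext:
      privileged: false        # the internet-facing part stays unprivileged
```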
And if you do something in the init phase, like a sysctl, please do it inside a container which is not connected to the internet – treat it like the pause container: limited, restricted, and not really connected to the network. So, this is something which covers the architecture. Additionally, I already mentioned the network policies, which will come later. This is our threat matrix: we have exposed and non-exposed services, and we have unprivileged and privileged things. The dangerous ones are the privileged ones which are exposed. Normally, you only have an exposed privileged application if you have an IDE running in Kubernetes – something like RStudio, which is not what I would like to see in critical infrastructure – or a web UI to a GitOps framework. Normally, you only have a web application exposed. What should not be exposed under normal conditions: operators, sysctls, build systems, host operators, and so on. If you do this, it's virtually impossible to own the cluster. You should do all three, because with security in depth you can make a mistake on one of these levels, and the other levels keep you from being exploited. You can do even more isolation: on the network side you have network policies for egress; on the node side you can activate seccomp, gVisor, and the common frameworks SELinux and AppArmor. You can use PodSecurityPolicies or, in the future, the Open Policy Agent to prevent the node from being hacked. For identity and access management, you should use individual service accounts for all your tasks – so yes, you end up with a lot of roles – and you should use role-based access control to check this. OK, but I promised we can go even deeper, and this needs a little help from your cloud administrator. Here is the example from Nico Meisenzahl, who has a very similar example on hijacking Kubernetes, and he is doing it, obviously, in one of the clouds. 
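The node-side knobs just listed map onto the Pod securityContext. A sketch of a hardened default (names and image are hypothetical; the fields are standard Kubernetes):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app           # hypothetical
spec:
  securityContext:
    runAsNonRoot: true         # refuse to start as root
    seccompProfile:
      type: RuntimeDefault     # enable the runtime's default seccomp filter
  containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    securityContext:
      readOnlyRootFilesystem: true        # no writable /tmp for attackers
      allowPrivilegeEscalation: false     # no setuid tricks
      capabilities:
        drop: ["ALL"]                     # drop every Linux capability
```

With `readOnlyRootFilesystem: true`, even a container that still ships curl has nowhere to put a downloaded kubectl.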
And what he has found out is that you can get access to the azure.json file, which has user-assigned identities. These are not the Kubernetes identities; this is the Azure identity. You can get a token, a subscription, and a resource group, and then you can use a curl command, with this token, to change things through the API for this resource group in this subscription. So, you might be able to hack your node with a privileged container and then take over your cloud account. And he told me that the same is true for the other clouds, so something similar might work in AWS and GCP. So please also protect your cloud account. Understand your identity and access management in the cloud – at least someone in the team should understand it – and limit the underlying account to the bare minimum. It might even be a good idea to block access to addresses like 169.254.something. And the other clouds, as I already mentioned, might also be affected. My call to the cloud providers: don't deliver account data in containers or nodes. This is not necessary. Yes, it's very comfortable – just as the service account token is very comfortable for running operators – but it's a major security flaw, and it might be that you lose all your accounts and data. Conclusion: we have a full attack chain from the application to the cloud account, and it's your task to prevent it and fix it. This is called shared responsibility: the cloud providers effectively only care for the infrastructure, but not really for the security in your cloud. This is your task. OK, thank you for your attention, I hope it was interesting. Please ask your questions; I'm now open for the Q&A. Applause Herald: Thank you for the talk. Is this working? Yeah. So, do we have any questions from the internet? I don't see any coming in so far, but we are, I think, usually a bit ahead, so I'll ask one: Q: What do you think? 
Whose responsibility is it, mainly, to fix these insecurities? Do you think this can be fixed by better defaults in these infrastructures and configuration files? Is this to be fixed by better tutorials and better education for the DevOps engineers? Where is the main point of responsibility? A: I would definitely prefer to have secure default installations. But then you have this shared responsibility in the contracts: from a certain point on, you are responsible for the security of the account. And we have seen this complexity, because this might be 20 steps. Every step is very simple and every step looks very harmless, but all the steps together might create a full exploit of a cloud. So this must be overseen, and it's very hard for developers who are cloud native and are focusing on the application to keep an overview of the security. Developers now have 10 or 100 times more code on their hard disk than ten years ago. And this means developers are not really able to make a full judgment about what is going on in terms of security. When developers talk about security, either they are specialized in it, or they have not seen things like this. What I normally notice: developers are not aware of these problems. Q: OK. And what do you think we can do about the complexity? Do you think we need better education for people to actually understand the systems? Or is there a way in cloud infrastructures to reduce the complexity? A: Better education, and do all the simple fixes. These are five steps, and the fixes are also very simple. And you have to check them, and then you need a tool, because you might have 20 clusters like this, and every cluster has 20 applications, so this might get quite complicated. So you need tools for an overview, and in the training material you see examples of how you can check your Kubernetes clusters for exploits like this. Herald: OK, thank you very much. Thanks for being here. 
We will continue in about half an hour with the next talk, which will again be in German. Thanks. Thomas: Thank you very much. Applause Outro: Everything is licensed under CC BY 4.0, and it is all for the community and for everyone. Subtitles created by c3subtitles.de in the year 2022. Join in and help us!