Hacking Containers, Kubernetes and Clouds

  • 0:04 - 0:09
    Thomas Fricke: Thank you very much for the
    invitation. So second talk tomorrow –
  • 0:09 - 0:14
    Thank you – um, today. So this is my
    background. More or less, I do Kubernetes
  • 0:14 - 0:19
    security and critical infrastructure,
    founded several companies, and now my
  • 0:19 - 0:25
    main focus is on Kubernetes security. This
    rabbit hole of Kubernetes, if you
  • 0:25 - 0:31
    look deeper into it, then you should be a
    little bit scared, and I want to explain
  • 0:31 - 0:37
    why. The first approach is the
    application, and then the application
  • 0:37 - 0:43
    normally is run in containers. And the
    containers, what is not really well known,
  • 0:43 - 0:48
    have access to service accounts in
    Kubernetes, which is one of the major
  • 0:48 - 0:54
    flaws in Kubernetes at the moment. If you
    take over the service account, it might be
  • 0:54 - 1:01
    that you can take over a cluster. And if
    you can take over a cluster, you might
  • 1:01 - 1:08
    take over a node and then your entire
    cloud service account, which is the work
  • 1:08 - 1:15
    of somebody else I will mention later on
    these slides. So let's look at what happens:
  • 1:15 - 1:22
    So, the target is I have an application
    exposed to the internet, and I want to own
  • 1:22 - 1:29
    the entire cluster from outside.
    Application might be vulnerable. Examples?
  • 1:29 - 1:37
    Yeah, lots of them. One example I want to
    present is ImageTragick. You normally
  • 1:37 - 1:44
    should not use eval or exec statements in
    any framework – be it PHP, Node.js, or
  • 1:44 - 1:51
    any other framework – and execute commands
    in the context of your application,
  • 1:51 - 1:56
    because something can go wrong and
    developers are responsible for this. Let's
  • 1:56 - 2:05
    see what it looks like: This is the attack
    model based on an attack. I thought it was
  • 2:05 - 2:13
    old and has been fixed in 2016, but now
    there was a new overview by Emil Lerner,
  • 2:13 - 2:22
    who again showed, yes, you can, in current
    versions of ImageMagick, exploit this
  • 2:22 - 2:33
    attack. So, it works. ImageMagick is for
    uploading images, so you convert the image
  • 2:33 - 2:41
    into a different format, scale the size, and
    then if you do something wrong in this
  • 2:41 - 2:47
    image, you can own the entire container.
    This also works for non-containerized
  • 2:47 - 2:53
    applications if you have a server running
    something with ImageMagick on it. Please
  • 2:53 - 3:02
    be careful. OK. If we have mastered this
    step, the next step is, yes, we want
  • 3:02 - 3:08
    access to the service account. And this is
    by default enabled in Kubernetes: So, you
  • 3:08 - 3:14
    have a Kubernetes design flaw because your
    service account is exposed to the
  • 3:14 - 3:20
    container where the application runs.
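
As a rough illustration of the default described here, a minimal Python sketch that checks from inside a container whether a service account token is mounted. The path is the standard Kubernetes mount point; the function name `token_is_mounted` is an assumption made for this example and is not from the talk.

```python
from pathlib import Path

# Standard location where Kubernetes mounts the service account token
# into a container when automounting is enabled.
DEFAULT_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def token_is_mounted(token_path: Path = DEFAULT_TOKEN_PATH) -> bool:
    """Return True if a non-empty token file exists at token_path.

    Hypothetical helper for illustration; it only looks at the file,
    it never reads or uses the token.
    """
    try:
        return token_path.is_file() and token_path.stat().st_size > 0
    except OSError:
        # Unreadable or missing parent directories count as "not mounted".
        return False


if __name__ == "__main__":
    if token_is_mounted():
        print("A service account token is mounted in this container.")
    else:
        print("No service account token found at the default path.")
```

If this reports a mounted token in a container that never talks to the Kubernetes API, that container is a candidate for disabling the automount.
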
    The next step of an attacker is
  • 3:20 - 3:27
    installation of additional software. So,
    you want to take over. You need curl or
  • 3:27 - 3:35
    kubectl or chmod, and then you are the owner
    of the service account and can actually do
  • 3:35 - 3:43
    commands by uploading pictures in
    ImageTragick. So responsible for this flaw
  • 3:43 - 3:51
    is the image creator. Let's see what else
    can happen. To get total control, you also
  • 3:51 - 3:58
    need role-binding to a cluster-admin role.
    This is not enabled by default, but the
  • 3:58 - 4:04
    internet is always good for bad advice. So
    if you copy the installation requirements
  • 4:04 - 4:11
    or recommendations from the internet,
    somebody else might take over the entire
  • 4:11 - 4:23
    cluster. Let's look deeper into it: Worst
    practice here is what you can see in the
  • 4:23 - 4:34
    Elastic installation recommendation: They
    just mentioned they have a newer version,
  • 4:34 - 4:43
    but they use the cluster admin permissions
    here to install Elasticsearch in your
  • 4:43 - 4:54
    Kubernetes cluster. So they recommend it
    and a lot of other applications also have
  • 4:54 - 5:00
    this – which is a little bit outdated, but
    it's quite common – in the installation
  • 5:00 - 5:08
    requirements. Never, ever do this, please.
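
To make the warning concrete, here is a minimal Python sketch that scans ClusterRoleBinding objects for service accounts bound to `cluster-admin`. It assumes the input has the shape of the `items` list returned by `kubectl get clusterrolebindings -o json`; the function name and the sample binding are hypothetical and only serve the illustration.

```python
from typing import Any, Dict, List


def risky_cluster_admin_subjects(bindings: List[Dict[str, Any]]) -> List[str]:
    """List service accounts that are bound to the cluster-admin ClusterRole.

    Each finding has the form "<binding name>: <namespace>/<service account>".
    """
    findings: List[str] = []
    for binding in bindings:
        role_ref = binding.get("roleRef") or {}
        if role_ref.get("kind") != "ClusterRole":
            continue
        if role_ref.get("name") != "cluster-admin":
            continue
        binding_name = (binding.get("metadata") or {}).get("name", "<unnamed>")
        for subject in binding.get("subjects") or []:
            # Users and groups are ignored here; the talk is about
            # service accounts that are reachable from inside a container.
            if subject.get("kind") == "ServiceAccount":
                namespace = subject.get("namespace", "<none>")
                findings.append(f"{binding_name}: {namespace}/{subject.get('name')}")
    return findings


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real installation guide.
    sample = [
        {
            "metadata": {"name": "example-binding"},
            "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
            "subjects": [
                {"kind": "ServiceAccount", "name": "default", "namespace": "demo"}
            ],
        }
    ]
    for finding in risky_cluster_admin_subjects(sample):
        print(finding)
```

Any finding for a service account that also runs an internet-exposed workload matches the attack path described in this talk.
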
    It also can come with Helm Charts, so you
  • 5:08 - 5:17
    have Helm Charts where the cluster-admin
    role is included. Here you see it, it was
  • 5:17 - 5:23
    in Apache Heron, which is an Apache
    project, and it uses the cluster-admin
  • 5:23 - 5:40
    role, so by a helm install you might be
    affected by this flaw too. So with these
  • 5:40 - 5:47
    four steps, which effectively are three
    steps, you have a cluster application
  • 5:47 - 5:54
    exposed, and through that path, you can
    take over the entire cluster from the
  • 5:54 - 6:02
    outside, and do anything the
    cluster-admin role can do. Effectively, this
  • 6:02 - 6:10
    cluster-admin role binding is like a
    doormat attack, so you have the best
  • 6:10 - 6:18
    cryptography, the most expensive locks on
    one side and then you put the key under
  • 6:18 - 6:24
    the doormat or under the flower at the
    door or something like that. This is
  • 6:24 - 6:34
    something which is not really what you
    want. I can do an example walkthrough
  • 6:34 - 6:40
    which shows how it goes. So, I've
    published all my training notebooks on
  • 6:40 - 6:50
    GitHub. Here's the way you can build this
    outdated ImageTragick version in
  • 6:50 - 6:58
    OpenShift. So, I use CRC, which is the
    CodeReady Containers version. It's based
  • 6:58 - 7:05
    on the ImageTragick proof of concept by
    Mike Williams. And here you run and create
  • 7:05 - 7:14
    a vulnerable image. A little bit lengthy.
    It's compiled inside and so on. So, don't
  • 7:14 - 7:20
    get a full version, which is the reason
    why I don't show it here, but effectively
  • 7:20 - 7:30
    at the end, you have a vulnerable
    application in a container internal and in
  • 7:30 - 7:38
    OpenShift. And that's exactly what we need
    to run the application. Here is the
  • 7:38 - 7:49
    exploit. And the exploit starts with the
    deployment of this container, which is
  • 7:49 - 7:54
    standard Kubernetes. Here "oc" is like
    kubectl. So, you get an overview.
  • 7:55 - 8:00
    Additionally, in OpenShift, you have a
    very simple way of creating a route,
  • 8:00 - 8:07
    which is connected to a hostname, and then
    you can upload it by using that hostname.
  • 8:09 - 8:16
    You expose the deployment, you expose the
    service which is created, you expose the
  • 8:16 - 8:23
    route finally, and then you have access.
    The next step is you get this route and
  • 8:23 - 8:31
    then here you have a URL, which you can
    use. And in a full demo, I would just
  • 8:31 - 8:38
    simply call this URL and then I can upload
    images here. I've created these files,
  • 8:38 - 8:45
    which are valid postscript files, but you
    see at the end there is a full command.
  • 8:45 - 8:51
    And here, because there's a curl in the
    container, I can download a version of
  • 8:51 - 8:58
    kubectl. Effectively, the containers,
    especially the Red Hat containers, are not as
  • 8:58 - 9:08
    vulnerable as others, but you always have
    writable temp, which is enough to deploy
  • 9:08 - 9:17
    some software. So, we curl kubectl from
    the internet, put it into temp, and then
  • 9:17 - 9:26
    we use a simple chmod command to activate
    kubectl. So now we can call kubectl
  • 9:26 - 9:40
    commands from inside an image. It's a
    death bells, more or less so. Exactly at
  • 9:40 - 9:46
    the right place. We have a working exploit
    now and, warning, it might already
  • 9:46 - 9:53
    work in older versions of Kubernetes,
    because in newer versions we need some
  • 9:54 - 10:02
    pill of poison additionally, and this is
    exactly this cluster role binding to
  • 10:02 - 10:07
    cluster-admin, which needs to be done
    so that we have full access from the outside
  • 10:07 - 10:16
    and if we do this, and expose our cluster
    admin account to the same account, which
  • 10:16 - 10:22
    is already exposed inside the container,
    we can execute commands with this kubectl
  • 10:22 - 10:29
    so we can create deployments by uploading
    pictures. Which is exactly what you never
  • 10:29 - 10:35
    want, but an attacker now has full access
    to your cluster by simply uploading
  • 10:36 - 10:46
    prepared malicious pictures. You can do this.
    So this is an example here: just create
  • 10:46 - 10:51
    and delete containers and deployments
    this way. You can effectively do
  • 10:51 - 11:11
    everything. And again, this is the problem
    here from the application side. If you
  • 11:11 - 11:21
    have a vulnerable version of ImageMagick,
    you can include commands, and you can
  • 11:21 - 11:27
    definitely install software on the
    Kubernetes server side. There are several
  • 11:27 - 11:35
    tries to fix this. For example, you can use
    better images like Red Hat does, so this
  • 11:35 - 11:40
    is a Red Hat health index, which is quite
    good, but effectively these images have
  • 11:40 - 11:48
    the advantage only that you do not run
    anything as root. But you run the same as
  • 11:49 - 11:55
    another user ID, and if that same user
    is allowed to write to the temp directory,
  • 11:55 - 12:04
    effectively, yeah, you don't need root for
    installing software. So, the container
  • 12:04 - 12:12
    also was good practice, no root inside, it
    has an immutable root file system, but the
  • 12:12 - 12:17
    curl, which is completely unnecessary, was
    also deployed, we had write access to
  • 12:17 - 12:23
    temp, and we had a chmod. And the first thing
    you would do to prevent all the stuff I'm doing
  • 12:23 - 12:29
    here is this, and if you don't
    learn anything else from this talk, please go
  • 12:30 - 12:36
    and look into your service account and try to
    disable the automountServiceAccountToken
  • 12:37 - 12:42
    feature, so all of the service accounts
    which are not running operators don't need
  • 12:42 - 12:49
    this service account open. If you have an
    operator, it might be broken now and it
  • 12:49 - 12:58
    can be, um, overridden by the Pod
    definition, but effectively this entire
  • 12:58 - 13:05
    example would not work without this
    service account token. So, we have fixed
  • 13:05 - 13:10
    that. We cannot fix the application
    because this is something, uh, somebody
  • 13:10 - 13:15
    else is creating for us, and we might even
    have a flaw which is not detected, so
  • 13:15 - 13:20
    there might be a zero-day. The next thing
    we must prevent is the installation of
  • 13:20 - 13:30
    software. Fix the images, so use really
    immutable images. Temp only if you need
  • 13:30 - 13:40
    it. PID is 1, anyway. Uh, OK, you might
    have some variable data, but you should
  • 13:40 - 13:48
    use containers from scratch, no curl, no
    wget. This also affects Red Hat UBIs,
  • 13:48 - 13:53
    and most of the standard images have this
    flaw, so you have a full operating system
  • 13:53 - 14:02
    inside with all the tools you like. But
    this is not your territory. It's just,
  • 14:02 - 14:08
    yeah, it's a tool for the attacker. So
    please run only trusted images, build your
  • 14:08 - 14:17
    own images and build them from scratch.
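
Building from scratch means copying only the binary and the shared libraries it actually needs. As a minimal sketch of that idea, and not the hardening script from the repository, the following Python function extracts library paths from the text printed by `ldd`. The function name and the sample output are assumptions made for this example.

```python
from typing import List


def shared_library_paths(ldd_output: str) -> List[str]:
    """Return the absolute library paths found in the output of `ldd`.

    Handles lines of the form "name => /path (address)" and
    "/path (address)". Entries without a path, such as the virtual
    vDSO object or "not found" results, are skipped.
    """
    paths: List[str] = []
    for line in ldd_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        if "=>" in parts:
            index = parts.index("=>")
            candidate = parts[index + 1] if index + 1 < len(parts) else ""
        else:
            candidate = parts[0]
        if candidate.startswith("/") and candidate not in paths:
            paths.append(candidate)
    return paths


if __name__ == "__main__":
    # Illustrative output only; real output depends on the binary and the
    # base image it was built on.
    sample = (
        "\tlinux-vdso.so.1 (0x00007ffc0a1f2000)\n"
        "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1d2a000000)\n"
        "\t/lib64/ld-linux-x86-64.so.2 (0x00007f1d2a400000)\n"
    )
    for path in shared_library_paths(sample):
        print(path)
```

The resulting list is what a multi-stage build would copy into the final `scratch` stage next to the binary itself.
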
    This is my example I also have uploaded to
  • 14:17 - 14:23
    GitHub, how to harden the container, which
    is based on nginx alpine. nginx alpine
  • 14:23 - 14:28
    normally is a very small container, but
    you can do more. You can use the script,
  • 14:28 - 14:34
    which is in this repository, just to get
    only the tools you need. So this is not
  • 14:34 - 14:40
    statically linked because the original
    nginx is not statically linked. But it's
  • 14:40 - 14:56
    very close. This means you only positively
    install the software you need. This is
  • 14:56 - 15:02
    dynamically linked, therefore the -d, so we
    use ldd, extract all the dynamically linked
  • 15:02 - 15:10
    libraries and then all the configuration
    files which are necessary. It is the
  • 15:10 - 15:18
    password and group files. OK. Some licenses
    and share. You need some directories for
  • 15:18 - 15:23
    logging and then you can install it from
    scratch because this script installs it in
  • 15:23 - 15:32
    a directory /tmp/harden, and with
    this multi-stage build you can install
  • 15:32 - 15:42
    all that you need from /tmp/harden. And
    then the next container is based on
  • 15:42 - 15:49
    scratch and you can use nginx the same way
    you would use, more or less, an
  • 15:49 - 15:57
    application which is statically linked. So
    now we have created a hardened image
  • 15:57 - 16:04
    without kubectl or curl inside. So, we are
    much closer to a secure application. The
  • 16:04 - 16:11
    next thing is, yeah, role binding to
    cluster admin role. Don't do this. If
  • 16:11 - 16:18
    something in your application goes wrong,
    you have additional measures, which you
  • 16:18 - 16:25
    can take just to prevent the application
    from breaking out of the container. So, you
  • 16:25 - 16:30
    can separate the internet exposure of
    services or ingresses in Kubernetes from
  • 16:30 - 16:37
    privileged operations. So you have node
    settings. Elasticsearch is doing a lot of
  • 16:37 - 16:44
    these things, so a lot is really not true
    so, doing a sysctl. Some applications have
  • 16:44 - 16:52
    hostPath mounts or have a connection to the
    host inter-process communication, which is
  • 16:52 - 16:58
    not necessary if you have exposed it and
    then separate the applications which need
  • 16:58 - 17:04
    this from the applications which don't
    need it. So, cluster admin should be more
  • 17:04 - 17:09
    or less restricted to very privileged
    operators. And by the way, Argo is also a
  • 17:09 - 17:15
    very privileged operator. Don't run
    Argo on a Kubernetes cluster in a security
  • 17:15 - 17:21
    critical environment because I've seen
    Argo also is binding to cluster admin. It
  • 17:21 - 17:27
    doesn't mean that Argo by default is
    unsafe, but it's a very complex
  • 17:27 - 17:34
    application and I would definitely run it
    in a separate cluster, not in the critical
  • 17:34 - 17:41
    cluster. And what does an architecture fix
    look like? Here you have the lifecycle of
  • 17:41 - 17:48
    a Pod, so time goes from left
    to right. Here you see if the container is
  • 17:48 - 17:56
    ready, it can be accessed from the
    internet. And if you do something from the
  • 17:56 - 18:04
    init system, like a sysctl, please do it
    inside a container which is not connected
  • 18:04 - 18:10
    to the internet, just to use the pause
    container, as a pause container to limit
  • 18:10 - 18:16
    it and restrict it and that is not really
    connected to the network. So, this is
  • 18:16 - 18:23
    something which covers the architecture.
    Additionally, I already mentioned here the
  • 18:23 - 18:28
    network policy which will come later, so
    this is our threat matrix. We have exposed
  • 18:28 - 18:33
    and not exposed services. You have
    unprivileged and privileged things. The
  • 18:33 - 18:39
    dangerous ones are the privileged ones
    which are exposed, but normally you only
  • 18:39 - 18:46
    have an exposed privileged application if
    you have an IDE running in Kubernetes,
  • 18:46 - 18:51
    which is not what I would like to see in
    critical infrastructure, something like
  • 18:51 - 18:58
    RStudio, or have a web UI to a GitOps
    framework. And normally you only have a
  • 18:58 - 19:04
    web application. And what should not be
    exposed under normal conditions are
  • 19:04 - 19:12
    operators, sysctl, build systems, host
    operators and so on. If you do this, it's
  • 19:12 - 19:21
    virtually impossible to own the cluster.
    You should do all three because if you
  • 19:21 - 19:26
    have security in depth, you can make a
    mistake on one of these levels and the
  • 19:26 - 19:32
    other levels keep you from
    being exploited. You can even do more
  • 19:32 - 19:39
    isolation. On the network side, you have
    network policies for egress. On the node
  • 19:39 - 19:44
    side, you can activate seccomp, gVisor,
    and the common frameworks, SELinux,
  • 19:44 - 19:50
    AppArmor. You can use PodSecurity
    policies, or in the future, the Open
  • 19:50 - 19:56
    Policy Agent to prevent the node from
    being hacked. For the identity and access
  • 19:56 - 20:03
    management, you should use individual
    service accounts for all your tasks. So
  • 20:03 - 20:09
    you have a lot of roles. You
    should use role based access control to
  • 20:09 - 20:18
    check this. OK, but I promise, yes, we can
    go even deeper, and this needs a little
  • 20:18 - 20:28
    help from your cloud administrator and
    here, the example from Nico Meisenzahl,
  • 20:28 - 20:35
    who does a very similar example on
    hijacking Kubernetes, and he's doing it,
  • 20:35 - 20:43
    obviously in one of the clouds. And what
    he has found out is you can get access to
  • 20:43 - 20:49
    the azure.json file, which has user
    assigned identities. These are not the
  • 20:49 - 20:55
    Kubernetes identities. This is the Azure
    identity. You can get a token, you can get
  • 20:55 - 21:02
    a subscription, you can get a resource
    group and then you can use a curl command,
  • 21:02 - 21:07
    with this token, to change things on the
    API version of this resource group with
  • 21:07 - 21:12
    this subscription. So, you might be able
    to hack your node with the privileged
  • 21:12 - 21:17
    container and then take over your cloud
    account. And he told me that this is also
  • 21:17 - 21:25
    true for the other clouds, so something similar
    might even work in AWS and
  • 21:25 - 21:33
    GCP. So please, also protect your cloud
    account. Understand your identity and
  • 21:33 - 21:38
    access management in the cloud. So, at
    least, someone in the team should
  • 21:38 - 21:44
    understand it. And limit also the
    underlying account to the bare minimum. It
  • 21:44 - 21:51
    might even be a good idea to block access
    to addresses like 169.254.something. And the
  • 21:51 - 21:57
    other clouds, as I already mentioned, also
    might be affected. And my call to the
  • 21:57 - 22:03
    cloud providers, is don't deliver account
    data in containers or nodes. This is not
  • 22:03 - 22:08
    necessary. Yes, it's very
    comfortable, as the service account
  • 22:08 - 22:13
    token is very comfortable for running
    operators, but it's a major security flaw
  • 22:13 - 22:23
    and it might be that you lose all your
    accounts and data. Conclusion: We have a
  • 22:23 - 22:29
    full attack chain from the application to
    the cloud account. And it's your task to
  • 22:29 - 22:36
    prevent it and fix it. This is called
    shared responsibility, so the cloud
  • 22:36 - 22:41
    providers effectively only care for the
    infrastructure, but not really for the
  • 22:41 - 22:46
    security in your cloud. This is your
    task. OK. Thank you for your attention, I
  • 22:46 - 22:52
    hope it was interesting. Please ask your
    questions. And now I'm open for the Q&A.
  • 22:52 - 22:58
    Applause
  • 22:58 - 23:05
    Herald: Thank you for the talk. This is
    working? Yeah. So do we have any questions
  • 23:05 - 23:12
    from the internet? I don't see any coming
    in so far, but we are, I think, usually a bit
  • 23:12 - 23:17
    ahead so I'll ask one:
    Q: What do you think? So who mainly has the
  • 23:17 - 23:21
    responsibility to fix these
    insecurities? Do you think this can be
  • 23:21 - 23:27
    fixed by better defaults in these
    infrastructures and configuration files?
  • 23:27 - 23:32
    Is this to be fixed by better tutorials
    and better education for the DevOps
  • 23:32 - 23:35
    engineers? What is the main point of
    responsibility?
  • 23:35 - 23:45
    A: I definitely would prefer to have
    secure default installations. But then you
  • 23:45 - 23:51
    have this shared responsibility in the
    contracts: From a certain point, you are
  • 23:51 - 23:57
    responsible for the security of the
    account, and we have seen this complexity
  • 23:58 - 24:09
    because this might be 20 steps. Every step
    is very simple and every step is looking
  • 24:09 - 24:16
    very harmless, but all the steps together
    might create a full exploit of a cloud. So
  • 24:16 - 24:24
    this must be overseen, and it's very hard
    for developers who are cloud native and
  • 24:24 - 24:30
    are focusing on the application to have an
    overview of the security. Developers now
  • 24:30 - 24:37
    have 10 or 100 times more code on the hard
    disk than ten years ago. And this means
  • 24:37 - 24:43
    developers are not able really to have a
    full judgment about what is going on in
  • 24:43 - 24:49
    terms of security. When
    developers talk about security, either
  • 24:49 - 24:56
    they are specialized in it or they have
    not seen things like this. What I normally
  • 24:56 - 25:00
    notice: The developers are not aware of
    these problems.
  • 25:00 - 25:07
    Q: OK. And what do you think, what can we
    do about the complexity? So do you think
  • 25:07 - 25:10
    we need better education for people to
    actually understand the systems? Or is
  • 25:10 - 25:14
    there a way in cloud infrastructures to
    reduce the complexity?
  • 25:14 - 25:24
    A: Better education. And do all the simple
    fixes. These are five steps, and the fixes
  • 25:24 - 25:30
    are also very simple. And you have to
    check them and then you need a tool
  • 25:30 - 25:36
    because you might have 20 clusters like
    this. Every cluster has 20 applications,
  • 25:36 - 25:41
    so this might be quite complicated. So you
    need tools for an overview and in the
  • 25:41 - 25:47
    training material, you see examples of how
    you can check your Kubernetes clusters for
  • 25:47 - 25:52
    exploits like this.
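
The training material is not reproduced in this transcript. As a rough illustration of the kind of check meant, here is a minimal Python sketch that audits a single Pod spec for settings discussed in the talk: the auto-mounted service account token, host IPC, hostPath volumes, privileged containers and writable root file systems. The function name and the messages are hypothetical, and the sketch only looks at the Pod spec, not at the ServiceAccount object, where automounting can also be disabled.

```python
from typing import Any, Dict, List


def audit_pod_spec(spec: Dict[str, Any]) -> List[str]:
    """Return a list of findings for one Pod spec (the `spec` field of a Pod)."""
    findings: List[str] = []

    # The token is mounted unless this field is explicitly set to False.
    if spec.get("automountServiceAccountToken") is not False:
        findings.append("service account token may be auto-mounted")

    if spec.get("hostIPC"):
        findings.append("hostIPC is enabled")

    for volume in spec.get("volumes") or []:
        if "hostPath" in volume:
            findings.append(f"volume {volume.get('name')} uses hostPath")

    for container in spec.get("containers") or []:
        name = container.get("name", "<unnamed>")
        context = container.get("securityContext") or {}
        if context.get("privileged"):
            findings.append(f"container {name} is privileged")
        if not context.get("readOnlyRootFilesystem"):
            findings.append(f"container {name} has a writable root filesystem")

    return findings


if __name__ == "__main__":
    # Hypothetical Pod spec that relies on Kubernetes defaults.
    default_spec = {"containers": [{"name": "web", "image": "example/web:1.0"}]}
    for finding in audit_pod_spec(default_spec):
        print(finding)
```

Run over every Pod in every cluster, a check like this gives the overview asked for here; an empty result does not prove that a workload is safe.
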
    Herald: OK, thank you very much. Thanks
  • 25:52 - 25:58
    for being here. We will continue in about
    half an hour with the next talk, then
  • 25:58 - 26:04
    again in German. Thanks.
    Thomas: Thank you very much. Applause
  • 26:04 - 26:13
    Outro: Everything is licensed under CC BY
    4.0. And it is all for the community, to
  • 26:13 - 26:14
    the unknown and for everyone.
  • 26:14 - 26:15
    Subtitles created by c3subtitles.de
    in the year 2022. Join, and help us!
Video Language:
English
Duration:
26:15
