36c3 preroll Herald: So like many operators, in my group we actually use a lot of ESXi servers. You would think that after using these things for 10 years I would know how to speak, but I do not. We use these for virtualizing machines. Some of these actually run sandboxes or, you know, run kind of dubious software. So we really do want to prevent these processes from jumping from the virtual environment to the hypervisor environment. Today we have f1yyy; he wants to be known by f1yyy, so I'm respecting that. He's from Chaitin Security Research Lab, and he's going to show us the exploits that he discovered at, I think it was, the last Chinese GeekPwn capture the flag. He's going to show us how these things work. I would like to ask you to help me welcome f1yyy onto the stage. applause f1yyy: Hello. Thanks for the introduction. Good evening, everybody. I'm f1yyy, a Senior Security Researcher at Chaitin Technology. I'm going to present The Great Escape of ESXi: Breaking Out of a Sandboxed Virtual Machine. We demonstrated this exploit chain before at GeekPwn 2018. I will share our experience of escaping the sandbox on ESXi, and I will also introduce the work we have done on that sandbox. Now let's start. We come from the Chaitin Security Research Lab. We have researched many practical targets in recent years, including PS4 jailbreak, Android rooting, IoT offensive research, and so on. Some of us also play CTF with Team b1o0p and Tea Deliverers. We recently won the championship at the HITCON final. We are also the organizers of the Real World CTF. We've created some very hard challenges this year, so if you are interested, we welcome you to participate in our CTF game. Now, before we start our journey of escaping the virtual machine, we need to figure out what a virtual machine escape is. I'd like to ask some of you: has anyone used virtualization software? 
If you have used virtualization software, like VMware Workstation, Hyper-V, VirtualBox and so on, please raise your hand. Okay, okay, okay. Thanks, thanks, thanks. Many. So if you are a Software Engineer or a Security Researcher, you have probably used virtualization software. But if you have heard of the term virtual machine escape, please raise your hand again. Oh, oh, surprised. Thanks, thanks, thanks. It surprises me that you all know about that, but I have to introduce it again. What's a virtual machine escape? You know, in most circumstances the guest OS runs on the hypervisor, and the hypervisor handles some sensitive instructions executed by the guest OS. The host OS emulates virtual hardware and handles RPC requests from the guest OS. That's the architecture of normal virtualization software. Guest OSes are isolated from each other and cannot affect the host OS. However, if there are some bugs, some vulnerabilities, in the host OS, it's possible for the guest OS to escape from the virtualization environment. It can exploit these vulnerabilities and finally execute arbitrary code on the host. So this is the virtual machine escape. Then why did we choose ESXi as our target? The first reason is that more and more companies are using, or plan to use, a private cloud to store their private data, including these companies, and vSphere is an enterprise solution offered by VMware. It's popular among companies. If you are a network administrator at a company, you may know about VMware vSphere. And ESXi is the hypervisor for VMware vSphere, so it's widely used in private clouds. That's the first reason. The second one is that it's a challenging target for us. There have been several exploits of VMware Workstation in recent years. Hackers escaped from VMware Workstation by exploiting vulnerabilities that exist in graphics cards, network cards, USB devices and so on. 
But there has been no public escape of ESXi before, so it's a challenging target for us, and we love a challenge. Then why is ESXi so challenging? The first reason, I think, is that there is little documentation about its architecture. The only thing we have found is a white paper offered by VMware. The white paper only includes some definitions and pictures, without details. So let's take a brief look at the architecture of ESXi first. ESXi is an enterprise bare-metal hypervisor and it includes two parts. One part is the kernel, the VMkernel developed by VMware, and the other part is the User Worlds. The VMkernel is a POSIX-like operating system, and it uses an in-memory filesystem. That means that files stored in this system are not persistent. The VMkernel also manages hardware and schedules resources for ESXi. It also includes VMware drivers, I/O stacks and some User World APIs offered to the User Worlds. The term "User World" is used by VMware to refer to the processes running in the VMkernel operating system, and "User Worlds" means a group of these processes. Such a process can only use a limited /proc directory and limited signals, and it can only use some of the POSIX API. For example, there are User World processes like hostd, ssh, vmx and so on. So this is the architecture of ESXi. I would like to give you an example to show how a virtual machine works on ESXi. The VMX process in the User World communicates with the VMM by using some undocumented, customized system calls. And the VMM initializes the environment for the guest OS. When the guest OS executes some sensitive instructions, it causes a VM exit and returns to the VMM. The VMX process also emulates virtual hardware and handles RPC requests from the guest. That's how a virtual machine works on ESXi. Then, how can we escape from the virtual machine on ESXi? 
If there is a vulnerability in the virtual hardware of the VMX, we can write a driver, or write an exploit, to escape from it. The driver communicates with the virtual hardware and can exploit the vulnerability. And finally we can execute shellcode in the VMX process. That means we have successfully escaped from the virtual machine on ESXi. The second reason why ESXi is so challenging is the User World API. The VMX uses many undocumented and customized system calls, and if you want to reverse some code of the VMX it is hard to understand which API the VMX is using. But luckily, after decompressing the k.b00 file, we found two system call tables with symbols, so this file is useful if we want to reverse some code of the VMX. This is the second reason. Thirdly, there are some security mitigations here, including ASLR and NX. It means that we need to leak some address information before we start our exploit, to break the randomness of the address space. Furthermore, after testing we found that there is another mitigation on ESXi. There is a sandbox that isolates the VMX process. So even if you can execute some shellcode in the VMX process, you cannot execute any commands and you cannot read any sensitive files unless you escape from the sandbox as well. And finally, we think that the VMX of ESXi has a smaller attack surface. After comparing the VMX binary between Workstation and ESXi, we found that some functions have been moved from the VMX in the User World to the VMkernel. For example, the packet transmission function of the e1000 net card has been moved from the VMX to the VMkernel. And if you have read some security advisories published by VMware recently, you may have noticed that there are many vulnerabilities in the packet transmission part of the e1000 net card. And all these vulnerabilities only affect Workstation. 
So we think that the VMX of ESXi has a smaller attack surface. Now let's start the journey of escaping from ESXi. Let's overview the entire exploit chain first. We use two memory corruption vulnerabilities in our exploit. The first one is an uninitialized stack usage vulnerability, whose CVE number is CVE-2018-6981. And the second is an uninitialized stack read vulnerability, whose CVE number is CVE-2018-6982. We can do an arbitrary address free by using the first vulnerability, and we can get an information leak from the second one. After combining these two vulnerabilities we can do arbitrary shellcode execution in the VMX process. And finally we use a logic vulnerability to escape the sandbox of the VMX and get a reverse root shell from ESXi. So that's the entire exploit chain we use. Now let's start with the first one. The first vulnerability is an uninitialized stack usage vulnerability. It exists in the VMXNET3 net card. When the VMXNET3 net card tries to execute the command UPDATE_MAC_FILTERS, it uses a structure on the stack, the PhysMemPage structure. This structure is used to represent the memory mapping between the guest and the host, and it is also used to transport data between the guest and the host. VMXNET3 calls the function DMA_MEM_CREATE to initialize the structure on the stack first, then it uses this structure to execute the command. And finally it uses PhysMemRelease to destroy the PhysMemPage structure. So it seems that there are no problems here. But if we look at the function DMA_MEM_CREATE, we notice that there is a check before the initialization of the PhysMemPage structure. It checks the address argument and the size argument, and if the check passes it initializes the structure. But if the check fails, it never initializes the structure on the stack. And finally we found that we can control the address argument by writing a value to one of the registers of VMXNET3. 
What is worse is that in the function PhysMemRelease there is no check of whether the PhysMemPage structure has been initialized; it just frees a pointer in this structure. So that's it. If we can pad the data on the stack, it's possible for us to do an arbitrary address free. We can pad a fake PhysMemPage structure on the stack, then make the check fail in the function DMA_MEM_CREATE, and finally, when it comes to PhysMemRelease, it will free a pointer from our fake PhysMemPage structure. So we tried to find a function to pad the data on the stack. There is a common pattern in software development: when we need a buffer, we keep the data on the stack if the size is small, and otherwise we allocate it on the heap. And we found a function that fits this pattern. This function is used when our guest OS executes the instruction outsb. It checks the size, and if the size is smaller than 0x8000 it uses the stack to store the data. Then it copies the data we send from the guest onto the stack. So we can use this function to pad the data on the stack. Then how do we combine these to do an arbitrary address free? First, we use the outsb instruction in the guest OS to pad the data on the stack. This data should contain a fake PhysMemPage structure, and the page count of this fake structure should be zero. The page array of this fake PhysMemPage structure should be the address we want to free. Then we set some registers of vmxnet3 to make the check fail in the function DMA_MEM_CREATE. And finally, we order the vmxnet3 net card to execute the command to update MAC filters, and then the VMX uses PhysMemRelease to destroy the structure we padded before. This structure is the fake structure we padded in the first step, and PhysMemRelease checks whether the page count is 0. If it's 0, it frees the page array of this fake PhysMemPage structure. 
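As a rough illustration of the bug pattern just described, here is a toy Python model. It is only a sketch under stated assumptions: `ToyVmx`, its method names, and the exact form of the check are hypothetical stand-ins, the "stack" is a single reusable slot, and "freeing" just records an address. It is not VMware's code and does not touch real memory.

```python
from dataclasses import dataclass


@dataclass
class PhysMemPage:
    page_count: int
    page_array: int  # address that the release routine may free


class ToyVmx:
    """Toy model: one reusable stack slot plus a log of freed addresses."""

    def __init__(self):
        self.stack_slot = None  # whatever the previous stack user left behind
        self.freed = []

    def handle_outsb(self, guest_struct):
        # Small guest buffers are copied to the stack, leaving residue there.
        self.stack_slot = guest_struct

    def dma_mem_create(self, addr, size):
        # Hypothetical check: on failure the slot is NOT initialized.
        if addr == 0 or size == 0:
            return False
        self.stack_slot = PhysMemPage(page_count=1, page_array=addr)
        return True

    def phys_mem_release(self):
        # No check that the structure was ever initialized.
        page = self.stack_slot
        if page is not None and page.page_count == 0:
            self.freed.append(page.page_array)

    def update_mac_filters(self, addr, size):
        # The result of dma_mem_create is ignored, as in the bug described.
        self.dma_mem_create(addr, size)
        self.phys_mem_release()


vmx = ToyVmx()
vmx.update_mac_filters(addr=0x1000, size=64)   # normal path: nothing stale
vmx.handle_outsb(PhysMemPage(page_count=0, page_array=0x41414141))
vmx.update_mac_filters(addr=0, size=0)         # check fails, stale data used
print([hex(a) for a in vmx.freed])
```

In this model the second call frees the guest-chosen address only because the failed check leaves the earlier outsb data in place, which is the essence of the uninitialized stack usage.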
So we can do an arbitrary address free now by using the first vulnerability, the uninitialized stack usage. Here comes the next one. The second vulnerability also exists in the vmxnet3 net card. When the vmxnet3 net card executes the command get_coalesce, it first gets a length from the guest, and the length must be 16. Then it initializes the first 8 bytes of a structure on the stack. But it forgets to initialize the next 8 bytes of this structure and just writes the structure back to our guest OS. So we can leak 8 bytes of uninitialized stack data from the host to our guest. And after debugging the VMX process, we realized that there are fixed offsets between the images, so it's possible for us to get all the information about the address space by using this vulnerability. Now, what do we have? We can do an arbitrary address free by using the first one, and we can get all the information about the address space by using the second one. What do we want to do? We want to do arbitrary shellcode execution in the VMX. So how do we combine these two vulnerabilities to achieve our target? It's hard to do arbitrary shellcode execution with an arbitrary address free, but it's easy with an arbitrary address write. So our target changes into how to do an arbitrary address write by using an arbitrary address free. Then we realized that we need a structure, and this structure should include a pointer we can write through and a size, so that once we can overwrite this structure, we can do arbitrary address writes easily. When we first tried to exploit this vulnerability, we used some structures in the heap, but we found that we cannot manipulate the heap layout reliably because the VMX frequently allocates and releases memory. So we cannot use structures in the heap. And after reversing some code of the VMX, we found a structure. The structure's name is channel and it's used in VMware RPCI. What's VMware RPCI? 
VMware has a series of RPC mechanisms to support communication between the guest and the host. And it has an interesting name: backdoor. RPCI is one of them. And the other thing you may be familiar with is VMware Tools. I'd like to ask again: if anyone has installed VMware Tools in your guest OS, please raise your hand again. Oh, not as many as before. So if you use VMware Workstation, you have probably installed VMware Tools in your guest, because once you have installed it you can use some convenient functions such as copying and pasting text and files between the guest and the host, dragging and dropping files, creating shared folders and so on. VMware Tools is implemented by using some RPCI commands. And here are some examples of RPCI commands. For example, we can use info-set guestinfo to set some information about our guest, and we can use info-get to retrieve this information. Then what happens when we execute an RPCI command in our guest? For example, we execute the RPCI command 'info-set guestinfo.a 123' in our guest OS. What happens in the VMX? It causes a VM exit first and finally ends up in the RPCI handler of the VMX. Then the RPCI handler chooses a subcommand by checking the values of the registers of our guest OS. The RPC tool in our guest OS uses the subcommand 'Open' first to open a channel and initialize it. Then it uses the subcommand 'SendLen' to set the size of our channel and allocate heap memory to store the data of our RPC command, and then it uses the 'SendData' subcommand to fill the memory allocated before with that data. Once the length of the data sent from the guest equals the size set by the 'SendLen' subcommand, the VMX calls the corresponding RPCI command handler function after a string comparison. And finally, the tool uses the 'Close' subcommand to destroy the channel structure, which includes setting the size to zero and freeing the data pointer. 
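The Open, SendLen, SendData and Close sequence described above can be sketched as a small state machine. This is a toy Python model, not the VMX implementation: the class names, the 4-byte chunking, the channel count, and the reply format are all assumptions made for illustration, and the command table only knows the two commands mentioned in the talk.

```python
class Channel:
    def __init__(self):
        self.is_open = False
        self.size = 0
        self.data = bytearray()  # stands in for the heap buffer


class ToyRpci:
    """Toy model of the four subcommands; not the real handler."""

    def __init__(self, channel_count=8):
        self.channels = [Channel() for _ in range(channel_count)]
        self.guestinfo = {}

    def open(self, n):
        ch = self.channels[n]
        ch.is_open, ch.size, ch.data = True, 0, bytearray()

    def send_len(self, n, size):
        ch = self.channels[n]
        if not ch.is_open:
            raise ValueError("channel is not open")
        ch.size = size
        ch.data = bytearray()  # the real code allocates `size` bytes here

    def send_data(self, n, chunk):
        ch = self.channels[n]
        if not ch.is_open or len(ch.data) + len(chunk) > ch.size:
            raise ValueError("unexpected data")
        ch.data += chunk
        # Dispatch only once the announced length has fully arrived.
        if len(ch.data) == ch.size:
            return self._dispatch(bytes(ch.data))
        return None

    def close(self, n):
        ch = self.channels[n]
        ch.is_open, ch.size, ch.data = False, 0, bytearray()

    def _dispatch(self, command):
        # Pick a handler by comparing the leading command string.
        name, _, rest = command.decode().partition(" ")
        if name == "info-set":
            key, _, value = rest.partition(" ")
            self.guestinfo[key] = value
            return b""
        if name == "info-get":
            return self.guestinfo.get(rest, "").encode()
        return b"unknown command"


def run_command(rpci, n, text):
    """Guest-side helper: send one command in 4-byte pieces (an assumption)."""
    payload = text.encode()
    rpci.open(n)
    rpci.send_len(n, len(payload))
    reply = None
    for i in range(0, len(payload), 4):
        reply = rpci.send_data(n, payload[i:i + 4])
    rpci.close(n)
    return reply


rpci = ToyRpci()
run_command(rpci, 0, "info-set guestinfo.a 123")
print(run_command(rpci, 0, "info-get guestinfo.a"))
```

The point of the model is the shape of the channel structure: a size plus a data pointer that the guest fills through SendData, which is what makes it attractive as a target for an overwrite.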
That's what happens when we execute this RPCI command in our guest. Furthermore, there is a channel structure array in the data segment that we can use. So this is a perfect structure for our exploit. Now we've got all the things we want. We've got two vulnerabilities and we've got the structure we want. How do we combine them? We noticed that the VMX uses ptmalloc of Glibc to manage its heap. So we chose to use a fast-bin attack. What's the fast-bin attack? The fast-bin attack is a method to exploit heap vulnerabilities of ptmalloc by abusing its singly-linked list. And it's the easiest method to exploit ptmalloc, I think. It's also the first method to exploit ptmalloc that I learned when I started to learn how to exploit. Then, after considering the checks that exist in Glibc, we decided to free the address of the reply index of channel[N]. By doing that, Glibc treats this address as a fake chunk and checks the chunk's size. And the size field of the fake chunk is also the size field of channel[N], so we can set a valid value as the size of channel[N] to bypass the check. Once we've freed this address, this fake chunk is put into the fast-bin linked list. Then we can reallocate this fake chunk by using another channel, N+2. Now we have a data pointer pointing at the reply index of channel[N], and we can easily overwrite channel[N+1] by using channel[N+2]. We can send data to channel[N+2], and it will overwrite some parts of channel[N+1]. So it's easy now for us to do an arbitrary address write by faking some parts of the channel structure. Do you remember our target? Our target is to do arbitrary shellcode execution in the VMX, and we can do an arbitrary address write now. There are many ways to do arbitrary shellcode execution by using an arbitrary address write. We chose to use ROP. We can overwrite the '.got.plt' segment. 
We fake the channel[N+1] structure first, overwriting the data pointer of channel[N+1] with the address of the .got.plt segment. Then we can overwrite a function pointer in the .got.plt segment. So once the VMX calls the function we overwrote, it jumps to our ROP gadget. So it's also easy for us to do arbitrary shellcode execution by using ROP. So now we can do arbitrary shellcode execution in the VMX process. Thinking that we had fully escaped from the virtual machine on ESXi, we tried to execute some commands by using the system call execve, but it failed. We tried to open and read some sensitive files, such as the password file, and it failed again. Then we realized that there is a sandbox. We cannot execute any commands unless we escape the sandbox as well. The next part is how we analyzed and escaped the sandbox. After realizing that there is a sandbox in ESXi, we reversed some code of the VMkernel and found a kernel module named VM Kernel SAS control system. This module implements fine-grained checks for system calls. And it seems that this sandbox is a rule-based sandbox. So we tried to find the configuration file of this sandbox. We finally found it in the directory /etc/vmware/secpolicy/domains, and it seems that VMware offers many different sandboxes to the different processes in the User World, such as app and plugin, and globalVMDom is the file for our VMX process and for our VM. After reading it, it was obvious to us that the /var/run directory is the only directory where we have read and write permissions. Then we looked at the files in this directory. We got a lot of pid files, like crond, dcui, inetd and so on. And it was also obvious that the inetd.conf configuration file is the only configuration file we can write. What's inetd? inetd is open source software; it's a super-server daemon that provides internet services. Then we analyzed the contents of inetd.conf. 
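The slide with the file contents is not reproduced in this transcript. As a rough aid, the classic inetd.conf layout is one service per line with whitespace-separated fields: service name, socket type, protocol, wait flag, user, server program, and then the program's arguments. The Python sketch below parses such a line and shows how changing the server program field changes what inetd would launch for that service. The sample line and the helper names are hypothetical illustrations, not the actual ESXi file content.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class InetdEntry:
    service: str
    socket_type: str
    protocol: str
    wait: str
    user: str
    program: str
    args: List[str]


def parse_entry(line: str) -> InetdEntry:
    """Split one classic inetd.conf line into its fields."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("expected at least 6 fields")
    return InetdEntry(*fields[:6], args=fields[6:])


def render_entry(entry: InetdEntry) -> str:
    """Turn an entry back into a single configuration line."""
    head = [entry.service, entry.socket_type, entry.protocol,
            entry.wait, entry.user, entry.program]
    return " ".join(head + entry.args)


# Hypothetical sample line in the classic format.
sample = "authd stream tcp nowait root /sbin/authd authd"
entry = parse_entry(sample)
print(entry.service, "->", entry.program)

# Swapping the program field changes what the service's port would run.
changed = replace(entry, program="/bin/sh", args=["sh"])
print(render_entry(changed))
```

Because inetd only reads this file when it starts or is told to reload, an edited entry has no effect until the daemon re-reads its configuration.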
The content of inetd.conf on ESXi is here. We can see that it defines two services, ssh and authd. And each entry defines which binary is used by the service. For example, /sbin/authd is used by the authd service. Also, after some testing, we realized that the authd service is always enabled, while the sshd service is not. So this is the only configuration file we can write. So we got an idea. How about overwriting this configuration file? We can overwrite the binary path for authd like this: we can change /sbin/authd to /bin/sh. So once we can restart the inetd process, we can bind a shell to the port authd is using. Then we needed to find a way to restart the inetd process. We analyzed the configuration file of the sandbox again, and we found that the kill system call is one we can use in the VMX process. So we just use kill with HUP to restart the inetd process. Once the inetd process restarts, we can execute any commands by sending them to the port that authd is using. So that's the method we used to escape from the sandbox. And here's a demo. Oh, sorry. Oh, it seems I cannot play this video, but it's OK. You can find it on YouTube. We created this demo after GeekPwn 2018; we get a reverse shell after executing the exploit in our guest OS. That's all. And if you want more details about our exploit chain, please check our paper here. Thanks. applause Herald: So I don't think I'm actually worthy to share the stage with f1yyy, that was awesome. If you have questions, we have microphones; you need to come up to a microphone, line up behind it, and we'll take your question. Meanwhile, does the signal angel have anything? No questions yet. Do we not have questions from the audience? There is one. Can I have number six, please? Mic 6: Did you talk to VMware about this little hack? f1yyy: We reported all these vulnerabilities to VMware after GeekPwn 2018, and it has been one year since they fixed them. 
Mic 6: OK, thanks. Herald: That's definitely a relief. Number one, please. Mic 1: First of all, thanks for the great talk. I just want to know if there is any meaningful thing a system administrator can do to lock down the sandbox further, so that we can have some preventative tasks, basically, for our ESXi setups. Or if there is nothing we can do except patching, of course. f1yyy: Could you repeat your question? It was too fast for me. Sorry about that. Mic 1: Basically, is there anything you can do as an administrator to lock down the sandbox even more, so that this is impossible or harder than what you showed? f1yyy: OK. This is the first question. You can set the sandbox down by executing a command on the ESXi shell. I didn't put the command here. I found the command to set the sandbox down. You can find it by searching the documents about ESXi. Wait, wait, wait, wait. I found it by myself, by using the commands offered on the ESXi shell. It's not documented by VMware. OK, I will share this command on my Twitter later. Sorry about that. I didn't put this command into my slides. Mic 1: But would this have prevented the attack? f1yyy: Prevented it? Herald: By making that change, by running that command, would it be possible to prevent the attack that you just showed? f1yyy: The sandbox is used to protect the VMX process. So if you update your ESXi, I think it will be safe. Herald: Okay, great. We have a question from the Internet. Signal Angel: Yes. Does this exploit also work on non-AMD-V or VT-x enabled VMs using binary translation? Herald: Is it more universal than just AMD-V? f1yyy: Yeah, can you repeat that again? I just heard the, okay. Signal Angel: Does it also work on non-AMD-V or VT-x enabled VMs using binary translation? f1yyy: Yes, because all these vulnerabilities exist in the virtual hardware. You will need to use virtual hardware in your virtual machine. Herald: So, any further questions? 
I'm not seeing anybody at the microphones. Any further questions from the internet? That's it then. Good. Please, everybody, help me in thanking f1yyy for this fantastic talk. applause 36c3 postroll music Subtitles created by c3subtitles.de in the year 2020. Join, and help us!