36c3 preroll
Herald: So like many operators, in my
group, we actually use a lot of ESXi
servers. You would think that after using
these things for 10 years, I would know
how to speak, but I do not. We use these
for virtualizing machines. Some of
these actually run as sandboxes or, you
know, run kind of dubious software.
So we really do want to prevent these
processes from jumping from the virtual
environment to the hypervisor environment.
Today we have f1yyy; he wants
to be known as f1yyy, so I'm respecting
that. He's from Chaitin Security Research Lab,
and he's going to show us the exploits
that he showed at, I think it was,
the last Chinese GeekPwn capture the flag.
He's gonna show us how these things work,
and with that, I would
like to ask you to help me welcome f1yyy
onto the stage.
applause
f1yyy: Hello. Thanks for the introduction.
Good evening, everybody. I'm f1yyy, a
Senior Security Researcher at Chaitin
Technology. I'm going to present The Great
Escape of ESXi; Breaking Out of a
Sandboxed Virtual Machine. We have
demonstrated this exploit chain before at
GeekPwn 2018. I will introduce our
experience of escaping the sandbox on the
ESXi. I will also introduce the work we
have done about the sandbox on the ESXi.
Now let's start it. We come from the
Chaitin Security Research Lab. We have
researched many practical targets in
recent years, including PS4 jailbreak,
Android rooting, IoT offensive research,
and so on. Some of us also play CTF with
Team b1o0p and Tea Deliverers. We recently
won the championship at the HITCON final. We
are also the organizer of the Real World
CTF. We've created some very hard
challenges this year. So if you are
interested in it, we welcome you to
participate in our CTF game. Now, before
we start our journey of escaping the
virtual machine, we need to figure out
what a virtual machine escape is. I'd like to
ask: has anyone here used
virtualization software? If you have used
virtualization software, like VMware
Workstation, Hyper-V, VirtualBox and so
on, please raise your hand. Okay, okay,
okay. Thanks, thanks, thanks. Many. So if
you are a Software Engineer or a Security
Researcher, you have probably used
virtualization software. But if anyone has
heard of the term virtual machine escape,
please raise
your hand again. Oh, I'm surprised.
Thanks, thanks, thanks. It surprises me
that you all know about that, but I have
to introduce it again. What's a virtual
machine escape? You know, in most
circumstances the guest OS runs on the
hypervisor, and the hypervisor will handle
some sensitive instructions executed by
the guest OS. The host OS emulates virtual
hardware and handles RPC requests from the
guest OS. That's the architecture of
normal virtualization software. The
guest OSes are isolated from each other and
cannot affect the host OS. However, if
there are some bugs or
vulnerabilities in the host OS,
it's possible for the guest OS to escape
from the virtualization environment. An attacker
can exploit these vulnerabilities and
finally execute arbitrary code
on the host. So this is the virtual
machine escape. Then why did we choose ESXi as
our target? The first reason is we know
that more and more companies are using or
plan to use a private cloud to store their
private data, including these companies.
vSphere is an enterprise solution
offered by VMware. It's popular among
companies. If you are a network manager of a
company, you may know about VMware
vSphere. And ESXi is the hypervisor
for VMware vSphere, so it's widely used in
private clouds. That's the first reason.
The second one is that it's a challenging
target for us. There have been several
exploits of VMware Workstation in
recent years. Hackers escaped from
VMware Workstation by exploiting some
vulnerabilities. These vulnerabilities
exist in graphics cards, network cards,
USB devices and so on. But there has been
no public escape of ESXi before, so it's a
challenging target for us and we love a
challenge. Then why is ESXi so
challenging? The first reason I think is
that there is little documentation about its
architecture. The only thing we have found
is a white paper offered by VMware. The
white paper only includes some definitions
and pictures without details. So let's
take a brief look at the architecture of
ESXi first. ESXi is an enterprise bare-metal
hypervisor and it includes two
parts. One is the kernel: it uses the VMkernel
developed by VMware. The other part is
the User Worlds. The
VMkernel is a POSIX-like operating system,
and it uses an in-memory filesystem. It
means that all files stored in this system
are not persistent. The VMkernel also
manages hardware and schedules resources
for ESXi. The VMkernel also includes VMware
drivers, I/O stacks and some User World
APIs offered to the User Worlds. The
term "User World" is used by VMware to
refer to the processes running on the VMkernel
operating system, and "User Worlds"
means a group of these processes.
These processes can only use a limited /proc
directory and limited signals, and they can
use only some of the POSIX APIs. For
example, there are User World
processes like hostd, ssh, vmx and so on.
So this is the architecture of ESXi. I
would like to give you an example to show
how a virtual machine works on ESXi. The
VMX process in the User World can
communicate with the VMM by using some
undocumented, customized system calls. And
the VMM will initialize the environment
for the guest OS. When guest OS executes
some sensitive instructions, it will cause
a VMExit and return to VMM. The VMX
process also emulates virtual hardware and
handles RPC requests from the guest.
That's how a virtual machine works on
ESXi. Then, how can we escape from the
virtual machine on ESXi? If there is a
vulnerability in the virtual hardware of
the VMX, we can write a driver, or write
an exploit, to escape from it. The driver
will communicate with the virtual hardware
and it can exploit the vulnerability. And
finally we can execute shellcode in the
VMX process. So it means that we have
successfully escaped from the virtual
machine on the ESXi. The second reason
why ESXi is so challenging is the
User World API. The VMX uses many
undocumented and customized system calls,
and if you want to reverse some code of
VMX it is hard for you to understand which
API the VMX is using. But luckily we found
two system call tables, with symbols, after decompressing
the k.b00 file, so this file
will be useful if you want to reverse some
code of the VMX. This is the second
reason. Thirdly, there are some security
mitigations here, including ASLR and NX.
It means that we need to leak some address
information before we start our exploit
to break the randomization of the address
space. Furthermore, after testing we found
that there is another mitigation on
ESXi. There is a sandbox that isolates the
VMX process. So even if you can execute
some shellcode in the VMX process, you cannot
execute any commands and you cannot read
any sensitive files unless you escape
from the sandbox as well. And finally, we
think that the VMX of ESXi has a smaller
attack surface. After comparing the
VMX binaries of Workstation and
ESXi, we found that there are some functions
that have been moved from the VMX in User
World to the VMkernel. For example, the
packet transmission function of the e1000
net card has been moved from the VMX to
the VMkernel. And if you have read some
security advisories published by VMware
recently, you may notice that there are
many vulnerabilities in the packet
transmission part of the e1000 net card,
and all these vulnerabilities only affect
Workstation. So we think that the VMX of
ESXi has a smaller attack surface. Now
let's start the journey of escaping from
ESXi. Let's overview the entire
exploit chain first. We use 2 memory
corruption vulnerabilities in our exploit.
The first one is an uninitialized stack
usage vulnerability whose CVE number is
CVE-2018-6981. The second is an
uninitialized stack read vulnerability, and
its CVE number is CVE-2018-6982. We
can do an arbitrary address free by using the
first vulnerability, and we can get an
information leak from the second one.
After combining these two
vulnerabilities we can do arbitrary
shellcode execution in the VMX process. And
finally we use a logic vulnerability to
escape the sandbox of VMX and get a reverse
root shell from the ESXi. So that's the
entire exploit chain we use. Now let's
start the first one. The first
vulnerability is an uninitialized stack
usage vulnerability. It exists in the VMXNET3
netcard. When the VMXNET3 netcard
tries to execute the command
UPDATE_MAC_FILTERS it will use a structure
on the stack, the PhysMemPage structure.
This structure is used to represent the
memory mapping between the guest and the
host. And it's also been used to transport
data between the guest and the host. Then
the VMXNET3 will call the function
DMA_MEM_CREATE to initialize the structure
on the stack first, then it will use this
structure to execute the command. And
finally it uses PhysMemRelease to destroy
the PhysMemPage
structure. So it seems that there are no
problems here. But if we look at the
function DMA_MEM_CREATE, we can notice
that there is a check before the
initialization of the PhysMemPage
structure. It checks the address
argument and the size argument, and if the
check passes, it initializes the
structure. But if the check fails, it
never initializes the structure on the
stack. And we found that we can
control the address argument by writing a
value to one of the registers of VMXNET3.
What is worse is that the function
PhysMemRelease does not check
whether the PhysMemPage structure has
been initialized; it just frees a
pointer from this structure. So that's
it. If we can pad the data on the
stack, it's possible for us to do an arbitrary
address free. We can pad a fake
PhysMemPage structure on the stack,
then make the check fail in the function
DMA_MEM_CREATE, and finally when it
comes to PhysMemRelease it will
free a pointer from our PhysMemPage
structure. So we just need to find a
function to pad the data on the stack.
There is a design pattern in software
development: when we need a buffer,
we store the data on the stack if the size is small,
and otherwise we
put it on the heap. And we found a
function that fits this pattern. This
function is used when our guest OS
executes the instruction outsb. It
checks the size, and if the size is smaller
than 0x8000 it uses the stack to store
the data. Then it copies the
data we send from the guest onto the
stack. So we can use this function to pad
the data on the stack. Then how do we
combine this to do an arbitrary address free?
We can use the outsb instruction in the guest OS
first to pad the data on the stack. This
data should contain a fake PhysMemPage
structure, and the page count of this fake
structure should be zero. The page array
of this fake PhysMemPage structure
should be the address we want to free.
Then we set some registers of the vmxnet3
to make the check fail in the function
DMA_MEM_CREATE. And finally, we order the
vmxnet3 netcard to execute the command to
update MAC filters, and then the VMX
will use PhysMemRelease to destroy the
structure we padded before. This structure is
the fake structure we padded in the first
step. PhysMemRelease checks whether the page count is 0, and if
it is 0, it frees the page
array of this fake PhysMemPage structure.
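The three steps just described can be sketched as a toy model in Python. This is only an illustration of the bug pattern, not VMX code: the 64-byte `STACK`, the two-field layout of the structure, the stand-in check in `dma_mem_create`, and the `FREED` list that plays the role of `free()` are all assumptions made for the sketch.

```python
import struct

# Toy model: one reusable "stack" region shared by successive calls,
# the way real stack frames reuse the same memory.
STACK = bytearray(64)
FREED = []  # addresses handed to the simulated free()

# Hypothetical layout: page_count (8 bytes), then page_array pointer (8 bytes).
LAYOUT = "<QQ"


def handle_outsb(data: bytes) -> None:
    """Small guest payloads are copied into a stack buffer (the padding step)."""
    STACK[: len(data)] = data


def dma_mem_create(addr: int, size: int) -> bool:
    """Initializes the structure only when the address/size check passes."""
    if addr == 0 or size == 0:  # stand-in for the real check
        return False            # structure left untouched: stale data remains
    struct.pack_into(LAYOUT, STACK, 0, 1, 0x1000)
    return True


def phys_mem_release() -> None:
    """No 'was it initialized?' check: trusts whatever is on the stack."""
    page_count, page_array = struct.unpack_from(LAYOUT, STACK, 0)
    if page_count == 0:
        FREED.append(page_array)


def update_mac_filters(addr: int, size: int) -> None:
    dma_mem_create(addr, size)  # result ignored, which is the bug pattern
    phys_mem_release()


# Guest side: pad a fake structure, then make the check fail.
target = 0x41414140
handle_outsb(struct.pack(LAYOUT, 0, target))
update_mac_filters(addr=0, size=0)
print(hex(FREED[-1]))
```

In this model the address chosen by the guest ends up in `FREED`, which is the arbitrary address free primitive.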
So we can do arbitrary address free now by
using the first uninitialized stack usage
vulnerability. Here comes the next one. The
second vulnerability also exists in the
vmxnet3 net card. When the vmxnet3 net card
tries to execute the command get_coalesce, it
will first get a length from the guest,
and the length must be 16. Then it
initializes the first 8 bytes of a
structure on the stack. But it forgets
to initialize the next 8 bytes of
this structure and just writes the
structure back to our guest OS. So we can
leak 8 bytes of uninitialized data on the
stack from the host to our guest. And
after debugging the VMX process, we
realized that there are fixed offsets
between the images, so it's possible for
us to get all the information about the
address space by using this vulnerability.
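A minimal sketch of this leak, again as a toy Python model rather than real VMX code. The helper `earlier_call`, the base address `IMAGE_BASE`, and the offset `LEAKED_OFFSET` are made-up values for illustration; only the "initialize 8 bytes, copy back 16" shape comes from the talk.

```python
import struct

STACK = bytearray(16)  # toy stack slot reused by successive calls


def earlier_call(ptr: int) -> None:
    """Some earlier function leaves a pointer behind in the upper 8 bytes."""
    struct.pack_into("<Q", STACK, 8, ptr)


def get_coalesce(length: int) -> bytes:
    """Initializes only the first 8 bytes, then copies all 16 to the guest."""
    if length != 16:
        raise ValueError("length must be 16")
    struct.pack_into("<Q", STACK, 0, 0)
    return bytes(STACK[:16])


# Hypothetical numbers: a randomized image base and a fixed, known offset.
IMAGE_BASE = 0x555500000000
LEAKED_OFFSET = 0x1234

earlier_call(IMAGE_BASE + LEAKED_OFFSET)
_, leaked = struct.unpack("<QQ", get_coalesce(16))
recovered_base = leaked - LEAKED_OFFSET
print(hex(recovered_base))
```

Because the offsets between images are fixed, one leaked pointer is enough in this model to recover the base and therefore every other address.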
Now, what do we have now? We can do
arbitrary address free by using the first
one. And we can get all information about
the address space by using the second one.
What do we want to do? We want to do
arbitrary shell code execution in the
VMX. So how do we combine these two
vulnerabilities to achieve our target?
It's hard for us to do arbitrary shell
code execution by using arbitrary address
free. But it's easy for us to do arbitrary
shell code execution by using an arbitrary
address write. So our target changes into
how to do arbitrary address write by using
arbitrary address free. Then we realized
that we need a structure, and this
structure should include a pointer we can
write through and a size. Once we can
overwrite this structure, we can do
arbitrary address writes easily. When we
first tried to exploit this vulnerability,
we used some structures in the heap, but
we found that we cannot manipulate the
heap's layout reliably because VMX
frequently allocates and releases
memory. So we cannot use the structures in
the heap. After reversing some code of
VMX, we found a structure. The
structure's name is channel and it's used
in VMware RPCI. What's VMware RPCI?
VMware has a series of RPC mechanisms to
support communication between the guest
and the host. And it has an
interesting name: backdoor. RPCI is one of
them. The other one we may be familiar
with is VMware Tools. I'd like to ask
again: if anyone has installed VMware Tools
in your guest OS, please raise your hands
again. Oh, not as many as before. So if
you use VMware Workstation, you have
probably installed VMware Tools in
your guest, because once you install it,
you can use some convenient functions such
as copying and pasting text and files between
the guest and the host, dragging and dropping
files, creating shared folders and so on.
VMware Tools is implemented by using some
RPCI commands. And here are some examples
of RPCI commands. For
example, we can use info-set guestinfo to
set some information about our guest and
we can use info-get to retrieve this
information back. Then what happens when
we execute this RPCI command in our guest?
For example, say we execute the RPCI
command 'info-set guestinfo.a 123' in our
guest OS. What happens in VMX? It will
cause a VM exit first and finally it will arrive
at the RPCI handler of VMX. Then the RPCI
handler will choose a subcommand to use by
checking the values of the registers of our
guest OS. The RPC tool in our guest OS
will use the subcommand 'Open' first to
open a channel and initialize it. Then it
will use the subcommand 'SendLen' to set
the size of our channel and allocate heap
memory to store the data of our RPC
command. Next, it will use the
'SendData' subcommand to fill
the memory we allocated before with the data. Once
the length of the data we sent from the
guest equals the size set by the
'SendLen' subcommand, the VMX will call the
corresponding RPCI command handler
function after a string comparison. And
finally, it will use the 'Close' subcommand
to destroy the channel structure, including
setting the size to zero and freeing the
data pointer. That's what happens when we
execute this RPCI command in our guest.
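The Open, SendLen, SendData, Close sequence can be sketched as a small state machine. This is a toy Python model under stated assumptions: the `Channel` fields, the 4-byte chunking in `guest_rpc`, and the reply strings are hypothetical, and only the order of subcommands and the two command names come from the talk.

```python
from dataclasses import dataclass
from typing import Dict, Optional

GUESTINFO: Dict[str, str] = {}  # host-side store used by the two commands


@dataclass
class Channel:
    size: int = 0
    data: Optional[bytearray] = None
    received: int = 0
    is_open: bool = False


def handle_command(command: str) -> str:
    """Pick a handler by comparing the command name (replies are made up)."""
    parts = command.split(" ", 2)
    if parts[0] == "info-set" and len(parts) == 3:
        GUESTINFO[parts[1]] = parts[2]
        return "ok"
    if parts[0] == "info-get" and len(parts) == 2:
        return GUESTINFO.get(parts[1], "")
    return "unknown command"


def op_open(ch: Channel) -> None:
    ch.size, ch.data, ch.received, ch.is_open = 0, None, 0, True


def op_send_len(ch: Channel, length: int) -> None:
    ch.size = length
    ch.data = bytearray(length)  # stands in for the heap allocation
    ch.received = 0


def op_send_data(ch: Channel, chunk: bytes) -> Optional[str]:
    end = ch.received + len(chunk)
    if ch.data is None or end > ch.size:
        raise ValueError("more data than announced by SendLen")
    ch.data[ch.received:end] = chunk
    ch.received = end
    if ch.received == ch.size:  # full command received: dispatch it
        return handle_command(ch.data.decode())
    return None


def op_close(ch: Channel) -> None:
    ch.size, ch.data, ch.is_open = 0, None, False  # size zeroed, data freed


def guest_rpc(ch: Channel, command: str) -> Optional[str]:
    payload = command.encode()
    op_open(ch)
    op_send_len(ch, len(payload))
    reply = None
    for i in range(0, len(payload), 4):
        reply = op_send_data(ch, payload[i:i + 4])
    op_close(ch)
    return reply


channel = Channel()
print(guest_rpc(channel, "info-set guestinfo.a 123"))
print(guest_rpc(channel, "info-get guestinfo.a"))
```

The point for the exploit is that the guest decides both the size and the contents of the data buffer, and the structure holds a data pointer next to that size.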
Furthermore, there is a channel structure
array in the data segment we can use. So
this is a perfect structure for our
exploit. Now we've got all the things we
want. We've got two vulnerabilities and
we've got the structure we want. How do we
combine them? We noticed that the VMX uses
ptmalloc of Glibc to manage its heap. So
we just chose to use a fast-bin attack.
What's the fast-bin attack? The fast-bin
attack is a method to exploit heap
vulnerabilities of ptmalloc by abusing the
singly-linked fast-bin list. It's the easiest
method to exploit ptmalloc, I
think. It's also the first method to
exploit ptmalloc that I learned when I
just started to learn how to
exploit. After considering the checks
in Glibc, we decided to free
the address of the reply index of channel[N].
By doing that, Glibc will treat
this address as a fake chunk, and Glibc
will check the current chunk's size.
The size field of the fake
chunk is also the size of
channel[N], so we can set a valid value
as the size of channel[N] to bypass
the check.
Once we've freed this address, this fake
chunk will be put into the fast-bin linked
list. Then we can reallocate this
fake chunk by using another channel, N+2.
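The free-then-reallocate step can be modeled with a heavily simplified fast-bin allocator. This is a sketch, not glibc: real ptmalloc performs more checks than the single size check here, and the channel layout (0x20 bytes per entry, size at offset 8, reply index at offset 0x10) and the toy addresses are assumptions chosen so that the fake chunk's size field lands on the size of channel[N].

```python
import struct

MEM = bytearray(0x100)  # toy address space; addresses are offsets into MEM
FASTBINS = {}           # chunk size -> address of first free chunk (0 = empty)


def read_q(addr: int) -> int:
    return struct.unpack_from("<Q", MEM, addr)[0]


def write_q(addr: int, value: int) -> None:
    struct.pack_into("<Q", MEM, addr, value)


def chunk_size(chunk: int) -> int:
    return read_q(chunk + 8) & ~0xF  # size field sits 8 bytes into the chunk


def free(ptr: int) -> None:
    chunk = ptr - 16                 # user pointer -> chunk header
    size = chunk_size(chunk)
    if not 0x20 <= size <= 0x80:     # simplified fast-bin size check
        raise RuntimeError("free(): invalid size")
    write_q(ptr, FASTBINS.get(size, 0))  # fd pointer: singly-linked list
    FASTBINS[size] = chunk


def malloc(request: int) -> int:
    size = max(0x20, (request + 8 + 15) & ~0xF)
    chunk = FASTBINS.get(size, 0)
    if chunk == 0:
        raise MemoryError("fast bin empty in this toy model")
    if chunk_size(chunk) != size:
        raise RuntimeError("malloc(): memory corruption (fast)")
    FASTBINS[size] = read_q(chunk + 16)  # pop the head of the list
    return chunk + 16


# Hypothetical channel array layout in the data segment.
CHANNELS, CHANNEL_SIZE, OFF_SIZE, OFF_REPLY = 0x40, 0x20, 0x08, 0x10
N = 0
chan_n = CHANNELS + N * CHANNEL_SIZE
chan_n1 = CHANNELS + (N + 1) * CHANNEL_SIZE

write_q(chan_n + OFF_SIZE, 0x40)  # guest sets channel[N].size to a valid size
free(chan_n + OFF_REPLY)          # the arbitrary address free primitive
data_ptr = malloc(0x30)           # allocation made on behalf of channel[N+2]
print(hex(data_ptr), hex(chan_n1))
```

In this model the buffer returned for channel[N+2] starts at the reply index of channel[N] and extends over channel[N+1], which is what makes the following overwrite possible.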
Now we have a data pointer pointing at the
reply index of channel[N], and we can
easily overwrite channel[N+1] by using
channel[N+2]. We can send data to
channel[N+2] and it will overwrite
some parts of channel[N+1]. So it's
easy now for us to do an arbitrary address
write by faking some parts of the channel
structure. Do you remember our target? Our
target is to do arbitrary shellcode
execution in VMX, and we can do arbitrary
address writes now. There are many ways to
do arbitrary shellcode execution by using
an arbitrary address write. We chose to use
ROP. We can overwrite the '.got.plt'
segment. We fake the channel[N+1]
structure first, overwriting the data
pointer of channel[N+1] with the address of the
.got.plt segment. Then we can overwrite
a function pointer in the .got.plt
segment. So once the VMX uses the
function we overwrote, it will jump to our
ROP gadget. So it's also easy for us to do
arbitrary shellcode execution by using
ROP. So now we can do arbitrary shellcode
execution in the VMX process. Thinking
that we had escaped from the virtual
machine of the ESXi fully, we tried to
execute some commands by using the system
call execve, but it failed. We tried to
open and read some sensitive files,
like the password file, and it failed again. Then we
realized that there is a sandbox. We cannot
execute any commands unless we escape the
sandbox as well. The next part
covers how we analyzed and
escaped the sandbox. After realizing that
there is a sandbox in the ESXi, we reversed
some code of the VMkernel and we found a
kernel module named VM Kernel SAS
control system. This
module implements fine-grained checks
for system calls. And it seems that
this sandbox is a rule-based sandbox. So
we tried to find the configuration
file of this sandbox. We finally found it
in this directory,
/etc/vmware/secpolicy/domains, and it
seems that there are many different
sandboxes offered by VMware for the
different processes in the User World, like
app, plugin and so on. globalVMDom is the file
for our VMX process and for our VM. After
reading that, it's obvious to us that the
/var/run directory is the only directory
where we have read and write permissions.
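To make the idea of a rule-based sandbox concrete, here is a toy path check in Python. The rule table and its format are hypothetical and are not the syntax of the real policy files; the only detail taken from the talk is that /var/run is the one directory with both read and write permission.

```python
import posixpath

# Hypothetical rule table: path prefix -> granted permissions.
RULES = {
    "/var/run": "rw",
    "/lib": "r",
}


def is_allowed(path: str, mode: str) -> bool:
    """Default deny; the longest matching prefix decides the permissions."""
    norm = posixpath.normpath(path)  # collapse ".." before matching
    best = None
    for prefix, perms in RULES.items():
        if norm == prefix or norm.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, perms)
    return best is not None and all(m in best[1] for m in mode)


print(is_allowed("/var/run/inetd.conf", "w"))
print(is_allowed("/etc/shadow", "r"))
```

With rules like these, shellcode in the sandboxed process cannot read files outside the listed prefixes, so the interesting question becomes which writable files under /var/run matter to other processes.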
Then we looked at the files in this
directory. We found a lot of pid files,
like crond, dcui, inetd and so on. And
it's also obvious that the inetd.conf
configuration file is the only configuration file we
can write. What's inetd? inetd is open
source software, a super-server
daemon that provides internet services.
Then we analyzed the contents of
inetd.conf. The content of the inetd.conf
on the ESXi is shown here. We can see that it
defines two services, ssh and authd.
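As an illustration of what such a file contains, here is a small parser for the classic inetd.conf column layout (service, socket type, protocol, wait flag, user, program, arguments). The sample lines are hypothetical: the ESXi file may carry extra fields, and the sshd path is made up; only the two service names and /sbin/authd come from the talk.

```python
SAMPLE = """\
# service socket proto wait   user program     args
ssh       stream tcp   nowait root /sbin/sshd  sshd -i
authd     stream tcp   nowait root /sbin/authd authd
"""


def parse_inetd_conf(text: str) -> dict:
    """Map each service name to the fields inetd uses to launch it."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete service line
        entries[fields[0]] = {
            "socket_type": fields[1],
            "protocol": fields[2],
            "wait": fields[3],
            "user": fields[4],
            "program": fields[5],
            "args": fields[6:],
        }
    return entries


services = parse_inetd_conf(SAMPLE)
for name, entry in services.items():
    print(name, entry["user"], entry["program"])
```

The `program` column is the part that matters here: it names the binary inetd runs, as the listed user, when a connection arrives for that service.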
Part of it defines which binary will
be used by each service. For
example, /sbin/authd will be used by the
authd service. Also, after some testing,
we realized that the authd service is
always enabled, while the sshd service is
not. So this is the only configuration file we
can write. So we got an idea. How about
overwriting this configuration file? We can
overwrite the binary path for authd like
this: we can change /sbin/authd to
/bin/sh. Then once we can restart the inetd
process, we can bind a shell to the port
authd is using. So we just need to find a way to
restart the inetd process. We analyzed the
configuration file of the sandbox again, and
we found out that the kill system call is one we can
use in the VMX process. Then we just use
kill with SIGHUP to restart the inetd
process. Once the inetd process restarts,
we can execute any commands by sending
them to the port that authd is using. So
that's the method we use to escape from
the sandbox. And here's a demo.
Oh, sorry.
Oh, it seems I cannot play this
video, but it's OK. You can find it on
YouTube. We created this demo after
GeekPwn 2018. We get a reverse shell
after executing the exploit in our guest
OS. That's all. And if you want to get
more details about our exploit chain,
please check our paper here and that's
all. Thanks.
applause
Herald: So I don't think I'm actually
worthy to share the stage with
f1yyy, that was awesome. If you have
questions, we have microphones, you need
to come up to the microphone, line up
behind them and we'll take your question.
Meanwhile, does the signal angel have
anything? No questions yet. Do we not have
questions from the audience? There is one.
Can I have number six, please?
Mic 6: Did you talk to VMware about this
little hack?
f1yyy: We reported all these
vulnerabilities to VMware after
GeekPwn 2018, and it has been one year
since they fixed them.
Mic 6: OK, Thanks.
Herald: That's definitely a relief. Number
one, please.
Mic 1: First of all, thanks for the great
talk. I just want to know if there is any
meaningful thing a system administrator
can do to lock down the sandbox further so
that we can have some preventative,
basically tasks, for our ESXi setups. Or
if there is nothing we can do except
patching, of course.
f1yyy: Could you repeat your question?
It's so fast for me. Sorry about that.
Mic 1: Basically, is there anything you
can do as an administrator to lock down
the sandbox even more so that this is
impossible or that it is harder than what
you showed?
f1yyy: OK. This is the first question.
You can set the sandbox down by executing
a command on the ESXi shell. I didn't put
the command here. I found the command to
set the sandbox down. You can find it by
searching the documents about the ESXi.
Wait, wait, wait, wait. I found it, just
by myself by using the command offered on
the ESXi shell. It's not documented by the
VMWare. OK, I will share this command on
my Twitter later. Sorry about that. I
didn't put this command into my slides.
Mic 1: But would this have prevented the
attack?
f1yyy: Prevented it?
Herald: By making that change, by running
that command, would it be possible to prevent
the attack that you just showed?
f1yyy: The sandbox is used to protect the
VMX process. So if you update your ESXi, I
think that it will be safe.
Herald: Okay, great. We have a
question from the Internet.
Signal Angel: Yes. Does this exploit also
work on non-AMD-V or VT-x enabled VMs using binary
translation?
Herald: Is it more universal than
just AMD-V?
f1yyy: Yeah, can you repeat that again?
I just hear the, okay.
Signal Angel: Does it also work on non-AMD-V
or VT-x enabled VMs using binary
translation?
f1yyy: Yes, because all these
vulnerabilities exist in the virtual
hardware. You will need to use virtual
hardware in your virtual machine.
Herald: So any further questions? I'm not
seeing anybody on the microphones. Any
further questions from the internet?
That's it then. Good. Please, everybody, help
me in thanking f1yyy for this fantastic talk.
applause
36c3 postroll music
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!