<i>36c3 preroll</i>

Herald: So like many operators, in my
group, we actually use a lot of ESXi

servers. You would think that after using
these things for 10 years, I would know

how to speak, but I do not. We use these
for virtualizing machines. Some of... some of

these actually run on sandboxes or, you
know, run kind of dubious software on it.

So we really do want to prevent these
processes from jumping from the virtual

environment to the hypervisor environment.
We have today - we have - f1yyy, he wants

to be known by f1yyy, so I'm respecting
that; and he's from Chaitin Security Labs,

and he's going to show us how the exploits
that he discovered in the, I think it was

the last Chinese GeekPwn capture the flag.
He's gonna show us how these things work,

and with that, I would
like to ask you to help me welcome f1yyy

onto the stage.

<i>applause</i>

f1yyy: Hello. Thanks for the introduction.
Good evening, everybody. I'm f1yyy, a

Senior Security Researcher at Chaitin
Technology. I'm going to present The Great

Escape of ESXi; Breaking Out of a
Sandboxed Virtual Machine. We have

demonstrated this exploit chain before at
GeekPwn 2018. I will introduce our

experience of escaping the sandbox on the
ESXi. I will also introduce the work we

have done about the sandbox on the ESXi.
Now let's start it. We come from the

Chaitin Security Research Lab. We have
researched many practical targets in

recent years, including PS4 jailbreak,
Android rooting, IoT offensive research,

and so on. Some of us also play CTF with
Team b1o0p and Tea Deliverers. We recently

won the championship at the HITCON final. We
are also the organizer of the Real World

CTF. We've created some very hard
challenges this year. So if you are

interested in it, we welcome you to
participate in our CTF game. Now, before

we start our journey to escaping the
virtual machine, we need to figure out

what is virtual machine escape? I'd like to
ask you: has anyone here used

virtualization software? If you have used
the virtualization software, like VMware

Workstation, Hyper-V, VirtualBox and so
on, please raise your hand. Okay, okay,

okay. Thanks, thanks, thanks. Many. So if
you are a Software Engineer or a Security

Researcher, you'll probably have used
virtualization software. But if anyone has

heard of the word virtual machine escape,
if you have heard of that, please raise

your hand again. Oh, oh, surprised.
Thanks, thanks, thanks. It surprises me

that all of you know about that, but I have
to introduce that again. What's virtual

machine escape? You know, in most
circumstances the host OS runs on the

hypervisor and the hypervisor will handle
some sensitive instructions executed by

the guest OS. Host OS emulates virtual
hardware and handles RPC requests from the

guest OS. That's the architecture of
normal virtualization software. And the

guest OSes are isolated from each other and
cannot affect the host OS. However, if

there are some bugs, or if there are some
vulnerabilities existing in the host OS,

it's possible for the guest OS to escape
from the virtualization environment. They

can exploit these vulnerabilities. And
finally, they can execute arbitrary code

on the host. So this is the Virtual
Machine Escape. Then why did we choose ESXi as

our target? The first reason is we know
that more and more companies are using or

plan to use a private cloud to store their
private data, including these companies,

and vSphere is an enterprise solution
offered by VMware. It's popular among

companies. If you are a network manager of a
company, you may know about VMware

vSphere. And the ESXi is the hypervisor
for VMware vSphere, so it's widely used in

private cloud. That's the first reason.
The second one is that it's a challenging

target for us. There are several
exploitations of VMware Workstation in

recent years. Hackers escape from the
VMware Workstation by exploiting some

vulnerabilities. These vulnerabilities
exist in graphic cards, network cards and

USB devices and so on. But, there has been
no public escape of ESXi before, so it's a

challenging target for us and we love
challenges. Then why is the ESXi so

challenging? The first reason I think is
that there is little documentation about its

architecture. The only thing we have found
is a white paper offered by VMware. The

white paper only includes some definitions
and pictures without details. So let's

take a brief look at the architecture of
ESXi first. ESXi is an enterprise bare-

metal hypervisor and it includes two
parts. One is the kernel, the VMKernel

developed by VMware, and the other
part is the User Worlds. The

VMKernel is a POSIX-like operating system.
And it uses an in-memory filesystem. It

means that all files stored in this system
are not persistent. And the VMKernel also

manages hardware and schedules resources
for ESXi. VMKernel also includes VMware

drivers, I/O stacks and some User World
APIs offered to the User Worlds. And the

term "User World" is used by VMware to
refer to a process running in the VMKernel

operating system, and the term "User Worlds"
means a group of these processes.

These processes can only use a limited /proc
directory and limited signals, and they can

just use some of the POSIX API. For
example, there are some User World

processes like hostd, ssh, vmx and so on.
Then this is the architecture of ESXi. I

would like to give you an example to show
how a virtual machine works on ESXi. The

VMX process in the User World can
communicate with the VMM by using some

undocumented customized system calls. And
the VMM will initialize the environment

for the guest OS. When guest OS executes
some sensitive instructions, it will cause

a VMExit and return to VMM. The VMX
process also emulates virtual hardware and

handles RPC requests from the guest.
That's how a virtual machine works on

ESXi. Then, how can we escape from the
virtual machine on ESXi? If there is a

vulnerability in the virtual hardware of
the VMX, we can write a driver, or write

an exploit, to escape from it. The driver
will communicate with the virtual hardware

and it can exploit the vulnerability. And
finally we can execute shellcode in the

VMX process. So it means that we have
successfully escaped from the virtual

machine on the ESXi. So the second reason
why ESXi is so challenging is the

User World API. The VMX uses many
undocumented and customized system calls

and if you want to reverse some code of
VMX it is hard for you to understand which

API the VMX is using. But luckily we found
two system call tables after decompressing

the k.b00 file. The two system call
tables we found have symbols, so this file

will be useful if we want to reverse some
code of the VMX. This is the second

reason. Thirdly, there are some security
mitigations here, including ASLR and NX.

It means that we need to leak some address
information before we start our exploit

to break the randomness of the address
space. Furthermore after testing we found

that there is another mitigation on the
ESXi. There is a sandbox that isolates the

VMX process. So even if you can execute
some shellcode in the VMX process you can

not execute any commands, you can not read
any sensitive files, unless you escape

from the sandbox as well. And finally, we
think that the VMX of ESXi has a smaller

attack surface. After comparing the
VMX binary between the Workstation and the

ESXi we found that there are some functions
that have been moved from the VMX in User

World to the VMKernel. For example the
packet transmission function in the e1000

net card has been moved from the VMX to
the VMKernel. And if you have read some

security advisories published by VMware
recently, you can notice that there are

many vulnerabilities existing in the packet
transmission part of the e1000 net card.

And all these vulnerabilities only affect
Workstation. So we think that the VMX of

ESXi has a smaller attack surface. Now
let's start the journey of escaping from

the ESXi. Let's overview the entire
exploit chain first. We use 2 memory

corruption vulnerabilities in our exploit.
The first one is an uninitialized stack

usage vulnerability whose CVE number is
CVE-2018-6981. And the second is an

uninitialized stack read vulnerability and
the CVE number is CVE-2018-6982. And we

can do arbitrary address free by using the
first vulnerability, and we can get

information leakage from the second one.
After combining these two

vulnerabilities we can do arbitrary
shellcode execution in VMX process. And

finally we use a logic vulnerability to
escape the sandbox of VMX and get a reverse

root shell from the ESXi. So that's the
entire exploit chain we use. Now let's

start the first one. The first
vulnerability is an uninitialized stack

usage vulnerability. It exists in VMXNET3
netcard. When the VMXNET3 netcard

tries to execute command
UPDATE_MAC_FILTERS it will use a structure

on the stack, the PhysMemPage structure.
This structure is used to represent the

memory mapping between the guest and the
host. And it's also used to transfer

data between the guest and the host. Then
the VMXNET will call function

DMA_MEM_CREATE to initialize the structure
on the stack first, then it will use this

structure to execute this command. And
finally it uses PhysMemRelease to destroy

the structure, the physical memory page
structure. So it seems that there are no

problems here. But if we look at the
function DMA memory create, we can notice

that there is a check before the
initialization of the PhysMemoryPage

structure. It will check the argument
address and the argument size and if the

check passes then it will initialize the
structure. But if the check fails, it will

never initialize the structure on the
stack. And finally we found that we can

control the address argument by writing a
value to one of the registers of VMXNET3.

What is worse is that in function
PhysMemoryRelease there is no check

whether the PhysMemoryPage structure has
been initialized and it just frees a

pointer of this structure. So that's all
about it. If we can pad the data on the

stack it's possible for us to do arbitrary
address free. We can pad a fake

PhysMemoryPage structure on the stack and
then make the check fail in the function

DMA memory create and finally when it
comes to the PhysMemoryRelease it will

free a pointer of our PhysMemoryPage
structure. So we just try to find a

function to pad the data on the stack.
There is a design pattern in software

development, where we store the data into
the stack, if the size is small, when we

allocate some memory. And otherwise we
will put it on the heap. And we found a

function that fits this pattern. This
function will be used when our guest OS

executes the instruction outsb. It will
check the size, if the size is smaller

than 0x8000 it will use the stack to store
the data. And finally it will copy the

data we send from the guest into the
stack. So we can use this function to pad

the data on the stack. Then how do we
combine this to do arbitrary address free?

We can use outsb instruction in guest OS
first to pad the data on the stack. This

data should contain fake PhysMemoryPage
structure and the page count of this fake

structure should be zero. The page array
of this fake PhysMemoryPage structure

should be the address we want to free.
Then we set some registers of the vmxnet3

to make the check fail in the function DMA
memory create. And finally, we order the

vmxnet3 netcard to execute the command to
update MAC filters and then in the VMX it

will use the PhysMemRelease to destroy the
structure we padded before. This structure is

the fake structure we padded in the first
step and it will check whether the page count is 0. If

it's 0, it will free the page
array of this fake PhysMemPage structure.
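
The steps above can be modeled in a few lines. This is a minimal sketch in Python, not VMX code: the class ToyVmx, the 16-byte structure layout, the stack size and the exact check in dma_mem_create are all assumptions made for illustration. It only shows why stale stack data becomes the argument of a free when initialization is skipped.

```python
import struct

STACK_SIZE = 64
FMT = "<QQ"  # hypothetical layout: 8-byte page_count, 8-byte page_array pointer


class ToyVmx:
    """Toy model: one stack region reused by successive calls, plus a free log."""

    def __init__(self):
        self.stack = bytearray(STACK_SIZE)
        self.freed = []  # addresses handed to the (simulated) free

    def handle_outsb(self, data: bytes) -> None:
        # Small payloads are staged on the stack and left behind on return.
        if len(data) < 0x8000:
            n = min(len(data), STACK_SIZE)
            self.stack[:n] = data[:n]

    def dma_mem_create(self, addr: int, size: int) -> bool:
        # The structure is initialized only if the check passes.
        if addr == 0 or size == 0:  # hypothetical check
            return False
        self.stack[:16] = struct.pack(FMT, 1, 0)
        return True

    def phys_mem_release(self) -> None:
        # No check that the structure was ever initialized.
        page_count, page_array = struct.unpack_from(FMT, self.stack)
        if page_count == 0:
            self.freed.append(page_array)

    def update_mac_filters(self, addr: int, size: int) -> None:
        self.dma_mem_create(addr, size)  # result ignored, as in the bug
        self.phys_mem_release()


vmx = ToyVmx()
target = 0x7F0012345678                         # address the guest wants freed
vmx.handle_outsb(struct.pack(FMT, 0, target))   # step 1: pad the stack
vmx.update_mac_filters(addr=0, size=0)          # steps 2 and 3: failed check, then release
print([hex(a) for a in vmx.freed])
```

With a valid address and size the structure is overwritten by dma_mem_create, so the stale data is never interpreted and nothing attacker-chosen is freed.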

So we can do arbitrary address free now by
using the first uninitialized stack usage

vulnerability. Here comes the next one. The
second vulnerability also exists in the

vmxnet3 net card. When the vmxnet3 net card
tries to execute command get_coalesce, it

will first get a length from the guest,
and the length must be 16. Then it

initializes the first eight bytes of a
structure on the stack. But it just

forgets to initialize the next 8 bytes of
this structure and just writes this

structure back to our guest OS. So we can
leak 8 bytes of uninitialized data on the

stack from the host to our guest. And
after debugging the guest VMX process, we

realized that there are fixed offsets
between the images, so it's possible for

us to get all the information about the
address space by using this vulnerability.
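
As a rough illustration of this leak, here is a minimal Python sketch. The function names, the zero value written into the first half, and the example addresses and offset are assumptions for illustration only; the real handler and structure are not shown in the talk.

```python
import struct


def get_coalesce(stale_stack: bytes, length: int) -> bytes:
    """Toy handler: fills only the first 8 of 16 bytes before replying."""
    if length != 16:
        raise ValueError("length must be 16")
    buf = bytearray(stale_stack[:16])  # stands in for an uninitialized stack struct
    buf[:8] = struct.pack("<Q", 0)     # only the first half is initialized
    return bytes(buf)                  # all 16 bytes go back to the guest


def image_base_from_leak(reply: bytes, known_offset: int) -> int:
    """Recover a base address from the leaked pointer and a fixed offset."""
    (leaked_ptr,) = struct.unpack_from("<Q", reply, 8)
    return leaked_ptr - known_offset


# Hypothetical values: a pointer into some image was left on the stack earlier.
BASE = 0x555500000000
OFFSET = 0x1234
stale = b"A" * 8 + struct.pack("<Q", BASE + OFFSET)

reply = get_coalesce(stale, 16)
print(hex(image_base_from_leak(reply, OFFSET)))
```

Because the offsets between images are fixed, one leaked pointer is enough to compute the others by simple subtraction and addition.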

Now, what do we have now? We can do
arbitrary address free by using the first

one. And we can get all information about
the address space by using the second one.

What do we want to do? We want to do
arbitrary shellcode execution in the

VMX. So how do we combine these two
vulnerabilities to achieve our target?

It's hard for us to do arbitrary
shellcode execution by using arbitrary address

free. But it's easy for us to do arbitrary
shellcode execution by using an arbitrary

address write. So our target changes into
how to do arbitrary address write by using

arbitrary address free. Then we realized
that we need a structure and this

structure should include pointers we can
write and the size. So once we can

overwrite this structure, we can do
arbitrary address writes easily. When we

first tried to exploit this vulnerability,
we used some structures in the heap, but

we've found that we can not manipulate the
heap's layout stably because VMX

frequently allocates and releases
memory. So we cannot use the structures in

the heap. And after reversing some code of
VMX, we have found a structure. The

structure's name is channel and it's used
in VMware RPCI. What's VMware RPCI?

VMware has a series of RPC mechanisms to
support communication between the guest

and the host. And here it has an
interesting name: backdoor. RPCI is one of

them. And the other one we may be familiar
with is VMware Tools. I'd like to ask

again if anyone has installed VMware Tools
in your guest OS, please raise your hands

again. Oh, not as many as before. So if
you use VMware Workstation, you'll

probably have installed VMware Tools in
your guest because once you installed it,

you can use some convenient functions such
as copy and paste of text and files between

the guest and the host, drag and drop
files, create shared folder and so on.

VMware Tools are implemented by using some
RPCI commands. And here are some examples

of RPCI commands. For
example, we can use info-set guestinfo to

set some information about our guest and
we can use info-get to retrieve this

information back. Then what happens when
we execute this RPCI command in our guest?

For example, if we execute this RPCI
command 'info-set guestinfo.a 123' in our

guest OS, what happens in VMX? It will
cause a VMExit first and finally it will return

to the RPCI handler of VMX. Then the RPCI
handler will choose a subcommand to use by

checking the value of the registers of our
guest OS. The RPC tool in our guest OS

will use the subcommand 'Open' first to
open a channel and initialize it. Then it

will use a subcommand 'SendLen' to set
the size of our channel and allocate heap

memory to store the data of our RPC
command, and subsequently it will use the

'SendData' subcommand to fill the
memory we allocated before with the data. And once

the length of the data we sent from the
guest equals the size set by the

'SendLen' subcommand the VMX will use a
corresponding RPCI command handler

function after a string comparison. And
finally, it will use a 'Close' subcommand

to destroy the channel structure including
setting the size to zero and freeing the

data pointer. That's what happens when we
execute this RPCI command in our guest.
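
The Open, SendLen, SendData, Close sequence can be sketched as a small state machine. This is a toy Python model under stated assumptions: the class and method names, the 4-byte chunk size and the reply tuples are hypothetical, and only the two commands mentioned in the talk are handled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Channel:
    size: int = 0
    data: Optional[bytearray] = None
    received: int = 0


class ToyRpci:
    """Toy host-side handler for the four subcommands described in the talk."""

    def __init__(self):
        self.channels: Dict[int, Channel] = {}
        self.store: Dict[str, str] = {}

    def open(self, cid: int) -> None:
        self.channels[cid] = Channel()

    def send_len(self, cid: int, size: int) -> None:
        ch = self.channels[cid]
        ch.size, ch.data, ch.received = size, bytearray(size), 0

    def send_data(self, cid: int, chunk: bytes) -> Optional[Tuple[bool, str]]:
        ch = self.channels[cid]
        end = ch.received + len(chunk)
        if ch.data is None or end > ch.size:
            raise ValueError("more data than announced by SendLen")
        ch.data[ch.received:end] = chunk
        ch.received = end
        # Dispatch only once the announced length has been received.
        if ch.received == ch.size:
            return self._dispatch(bytes(ch.data).decode())
        return None

    def close(self, cid: int) -> None:
        ch = self.channels.pop(cid)
        ch.size, ch.data = 0, None  # size reset, buffer released

    def _dispatch(self, command: str) -> Tuple[bool, str]:
        name, _, rest = command.partition(" ")  # string comparison on the name
        if name == "info-set":
            key, _, value = rest.partition(" ")
            self.store[key] = value
            return (True, "")
        if name == "info-get":
            if rest in self.store:
                return (True, self.store[rest])
            return (False, "no value")
        return (False, "unknown command")


def run_command(rpci: ToyRpci, cid: int, command: str, chunk: int = 4):
    """Guest-side helper: send one command through the four subcommands."""
    payload = command.encode()
    rpci.open(cid)
    rpci.send_len(cid, len(payload))
    reply = None
    for i in range(0, len(payload), chunk):
        reply = rpci.send_data(cid, payload[i:i + chunk])
    rpci.close(cid)
    return reply


rpci = ToyRpci()
set_reply = run_command(rpci, 0, "info-set guestinfo.a 123")
get_reply = run_command(rpci, 0, "info-get guestinfo.a")
print(set_reply, get_reply)
```

The point that matters for the exploit is visible in the Channel fields: each channel carries a size and a data pointer that the guest can influence through ordinary subcommands.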

Furthermore, there is a channel structure
array in the data segment we can use. So

this is a perfect structure for our
exploit. Now we've got all the things we

want. We've got two vulnerabilities and
we've got the structure we want. How do we

combine this? We notice that the VMX uses
ptmalloc of Glibc to manage its heap. So

we just choose to use a fast-bin attack.
What's the fast-bin attack? Fast-bin

attack is a method to exploit heap
vulnerabilities of ptmalloc by using the

singly-linked list. And it's the easiest
exploit method to exploit ptmalloc, I

think. It's also the first method to
exploit ptmalloc that I learned when I

just started to learn how to
exploit. Then after considering the check

existing in the Glibc, we decided to free
the address at the Reply Index of channel.

Because by doing that, the Glibc will treat
this address as a fake chunk and the Glibc

will check the current chunk's size. And
after doing that, the size of the fake

chunk is also the size of the
'channel[N]', so we can set a valid value

to the size of the 'channel[N]' to bypass
the check. So we can bypass the check.
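
To make the size check concrete, here is a heavily simplified Python model of a fast bin. It is a sketch, not glibc: ToyHeap ignores flag bits and every other ptmalloc check, and the channel array base address, stride and field offsets are hypothetical values chosen so that the size field sits directly before the reply index field.

```python
WORD = 8


class ToyHeap:
    """Very small fast-bin model: a singly-linked free list per chunk size."""

    def __init__(self):
        self.mem = {}      # address -> 8-byte word
        self.fastbin = {}  # chunk size -> head chunk address (0 means empty)

    def free(self, user_ptr: int) -> None:
        chunk = user_ptr - 2 * WORD           # header: prev_size, then size
        size = self.mem.get(chunk + WORD, 0)  # word right before user_ptr
        if not (0x20 <= size <= 0x80) or size % 0x10:
            raise MemoryError("invalid size")
        self.mem[user_ptr] = self.fastbin.get(size, 0)  # fd link in user data
        self.fastbin[size] = chunk

    def malloc(self, request: int) -> int:
        size = max(0x20, (request + WORD + 0xF) & ~0xF)
        chunk = self.fastbin.get(size, 0)
        if chunk == 0:
            raise MemoryError("toy heap only serves fast bins")
        self.fastbin[size] = self.mem.get(chunk + 2 * WORD, 0)
        return chunk + 2 * WORD


# Hypothetical layout of the channel array in the data segment.
CHANNEL_BASE = 0x601000
STRIDE = 0x40
OFF_SIZE = 0x30
OFF_REPLY_INDEX = 0x38


def field(n: int, off: int) -> int:
    return CHANNEL_BASE + n * STRIDE + off


heap = ToyHeap()
heap.mem[field(0, OFF_SIZE)] = 0x40    # guest-chosen size doubles as chunk size
target = field(0, OFF_REPLY_INDEX)
heap.free(target)                      # the arbitrary free from the first bug
buf = heap.malloc(0x30)                # buffer of another channel reuses it
print(hex(buf), hex(field(1, 0)))
```

In this model the reallocated buffer starts at the reply index of the first channel and runs into the next channel structure, which is what makes the later overwrite possible.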

Once we've freed this address this fake
chunk will be put into the fast-bin linked

list first. Then we can reallocate this
fake chunk by using another channel, N+2.

Now we have a data pointer pointed at the
reply index of channel[N] and we can

easily overwrite the channel[N+1] by using
channel[N+2]. We can send data to

channel[N+2] and finally it will overwrite
some parts of the channel[N+1]. So it's

easy now for us to do arbitrary address
write by faking some parts of the channel

structure. Do you remember our target? Our
target is to do arbitrary shellcode

execution in VMX and we can do arbitrary
address write now. There are many ways to

do arbitrary shellcode execution by using
arbitrary address write. We choose to use

ROP. We can overwrite the '.got.plt'
segment. We can fake the channel[N+1]

structure first, overwrite the data
pointer at channel[N+1] to the address of

.got.plt segment. Then we can overwrite
the function pointer on the .got.plt

segment. So once the VMX uses this
function we overwrite, it will jump to our

ROP gadget. So it's also easy for us to do
arbitrary shellcode execution by using

ROP. So now we can do arbitrary shellcode
execution in the VMX process. Thinking

that we had escaped from the virtual
machine of the ESXi fully, we tried to

execute some commands by using the system
call execve, but it failed. We tried to

open and read some sensitive files, just
like the password file, and it failed again. Then we

realized that there is a sandbox. We cannot
execute any commands unless we escape the

sandbox as well. The next part comes
to how we analyzed and

escaped the sandbox. After realizing that
there is a sandbox in the ESXi, we reversed

some code of the VMkernel and we found a
kernel module named the VMkernel access

control system. And this system, this
module, implements the fine grained checks

for the system call. And it seems that
this sandbox is a rule-based sandbox. So

we just tried to find the configuration
file of this sandbox. We finally found it

at this directory,
/etc/vmware/secpolicy/domains, and it

seems that there are many different
sandboxes offered by VMWare to the

different processes in the userworld, like
app, plugin and so on. globalVMDom is the file

for our VMX process and for our VM. After
reading that, it's obvious for us that the

/var/run directory is the only directory
where we have read and write permissions.
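
A rule-based check of this kind can be pictured as a longest-prefix lookup. The sketch below is a toy Python model: the rule list, its syntax and the function names are assumptions and do not reproduce the real format of the files under /etc/vmware/secpolicy/domains.

```python
# Hypothetical rules: (path prefix, granted permissions).
POLICY = [
    ("/", ""),           # default: nothing allowed
    ("/lib", "r"),       # read-only
    ("/var/run", "rw"),  # the one place with read and write access
]


def under(path: str, prefix: str) -> bool:
    """True if path is the prefix itself or lies inside that directory."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def allowed(path: str, want: str) -> bool:
    """Apply the most specific matching rule to the requested permissions."""
    matching = [rule for rule in POLICY if under(path, rule[0])]
    _, perms = max(matching, key=lambda rule: len(rule[0]))
    return set(want) <= set(perms)


print(allowed("/var/run/inetd.conf", "w"), allowed("/etc/passwd", "r"))
```

Reading a policy like this, the writable locations stand out immediately, which is how the search narrows to a single directory.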

Then we look at the files existing in this
directory. We got a lot of pid files just

like crond, dcui, inetd and so on. And
it's also obvious that the inetd.conf

configuration file is the only configuration file we
can write. What's inetd? inetd is open

source software and it's a super-server
daemon that provides internet services.
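
For readers who have not seen an inetd.conf before: in the classic format each entry is one line with the service name, socket type, protocol, wait flag, user, server program and its arguments. The Python sketch below parses that classic format; the sample line is illustrative and is not copied from an ESXi host.

```python
from typing import List, NamedTuple


class InetdEntry(NamedTuple):
    service: str
    socket_type: str
    protocol: str
    wait: str
    user: str
    server: str      # binary that inetd launches for this service
    args: List[str]


def parse_inetd_conf(text: str) -> List[InetdEntry]:
    """Parse classic inetd.conf lines, skipping blanks and comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            continue  # malformed line, ignored in this sketch
        entries.append(InetdEntry(*fields[:6], fields[6:]))
    return entries


SAMPLE = """\
# illustrative entry, not taken from a real host
authd stream tcp nowait root /sbin/authd authd
"""

entries = parse_inetd_conf(SAMPLE)
print(entries[0].service, entries[0].server)
```

The server field is the interesting one: whatever path stands there is what inetd starts when a connection arrives on that service's port, and inetd rereads the file when it receives a hangup signal.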

Then we just analyze the contents of the
inetd.conf. The content of the inetd.conf

is here on the ESXi. We can find that it
defines two services, ssh and authd.

And part of it defines which binary will
be used by different services. For

example, the authd binary will be used by the
authd service. Also after some testing,

we realized that the authd service is
always enabled, while the sshd service is

not. So this is the only configuration file we
can write. So we got an idea. How about

overwriting this configuration file? We can
overwrite the binary path for authd like

that: we can change /sbin/authd to
/bin/sh. So once we can restart the inetd

process we can bind the shell to the port
authd is using. Then we just need to find a way to

restart the inetd process. We analyzed the
configuration file of the sandbox again, and

we found that the kill system call is one we can
use in the VMX process. Then we just use

kill -HUP to restart the inetd
process. Once the inetd process restarts,

we can execute any commands by sending
them to the port the authd is using. So

that's the method we use to escape from
the sandbox. And here's a demo.

Oh, sorry.

Oh, it seems not, I cannot play this
video, but it's OK. You can find it on

YouTube and we created this demo after
the GeekPwn 2018, we get a reverse shell

after executing the exploit in our guest
OS. That's all. And if you want to get

more details about our exploit chain,
please check our paper here and that's

all. Thanks.

<i>applause</i>

Herald: So I don't think I'm actually
worthy to share the stage with

f1yyy, that was awesome. If you have
questions, we have microphones, you need

to come up to the microphone, line up
behind them and we'll take your question.

Meanwhile, does the signal angel have
anything? No questions yet. Do we not have

questions from the audience? There is one.
Can I have number six, please?

Mic 6: Did you talk to VMware about this
little hack?

f1yyy: We have reported all these
vulnerabilities to VMware after the

GeekPwn 2018, and it has been one year
since they repaired it.

Mic 6: OK, Thanks.
Herald: That's definitely a relief. Number

one, please.
Mic 1: First of all, thanks for the great

talk. I just want to know if there is any
meaningful thing a system administrator

can do to lock down the sandbox further so
that we can have some preventative,

basically tasks, for our ESXi setups. Or
if there is nothing we can do except

patching, of course.
f1yyy: Could you repeat your question?

It's so fast for me. Sorry about that.
Mic 1: Basically, is there anything you

can do as an administrator to lock down
the sandbox even more so that this is

impossible or that it is harder than what
you showed?

f1yyy: OK. This is the first question.
You can set the sandbox down by executing

a command on the ESXi shell. I didn't put
the command here. I found the command to

set the sandbox down. You can find it by
searching the documents about the ESXi.

Wait, wait, wait, wait. I found it, just
by myself by using the command offered on

the ESXi shell. It's not documented by
VMware. OK, I will share this command on

my Twitter later. Sorry about that. I
didn't put this command into my slides.

Mic 1: But would this have prevented the
attack?

f1yyy: Prevented it?
Herald: By doing that change, by doing

that command, would it be possible to prevent
the attack that you just showed?

f1yyy: The sandbox is used to protect the
VMX process. So if you update your ESXi, I

think that it will be safe.
Herald: Okay, great. We have a

question from the Internet.
Signal Angel: Yes. Does this exploit also

work on non-AMD-V or VT-x enabled VMs using binary
translation?

Herald: Is it more universal than
just AMD-V and VT-x?

f1yyy: Yeah, can you repeat that again?
I just hear the, okay.

Signal Angel: Does it also work on non-AMD-V
or VT-x enabled VMs using binary

translation?
f1yyy: Yes, because all these

vulnerabilities exist in the virtual
hardware. You will need to use virtual

hardware in your virtual machine.
Herald: So any further questions? I'm not

seeing anybody on the microphones. Any
further questions from the internet?

That's it then. Good. Please, everybody help
me in thanking f1yyy for this fantastic talk.

<i>applause</i>

<i>36c3 postroll music</i>

Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!