#days: Ange Albertini: Such a weird processor - messing with x86 opcodes

0:05 - 0:10

Such a weird processor - messing with x86 opcodes... and a little bit of PE [Portable Executable]
0:11 - 0:19

So welcome. ...And especially let me know if I speak too quickly. Um, so -- who I am -- oh, yes so
0:19 - 0:28

I will talk about opcodes and a little bit about the PE [portable executable] file format and their oddities. So, I've been
0:28 - 0:35

a reverse engineer for some years, for some time. I created a project called Corkami.
0:35 - 0:42

Also in the past I worked on the MAME arcade emulator, and professionally I am a malware analyst, but
0:42 - 0:49

this is only on behalf of my hobbies; these are my own experiments and research at home.
0:49 - 0:57

So, let me introduce Corkami. Corkami is just the name of the RCE [reverse code engineering] project I created.
0:57 - 1:04

I tried to keep it just to the technical stuff, no ads, no login required.
1:04 - 1:06

Really direct to the good stuff.
1:06 - 1:12

I try to update it and make it useful, so I also created cheat sheets and the kind of easy documents
1:12 - 1:15

that I would use for work on a daily basis,
1:15 - 1:18

but it's only a hobby; I do that once the kids are asleep
1:18 - 1:23

and late at night, so it probably doesn't look as professional
1:23 - 1:25

and as good as I would like it to be.
1:25 - 1:31

So right now, Corkami, the form of Corkami, is wiki pages and cheat sheets
1:31 - 1:38

and I focus on creating as many relevant proofs of concept as possible [Hi Bob!]
1:38 - 1:43

so the binaries are hand-written, usually I don't use a compiler, I create the PE (structure) myself
1:43 - 1:46

so that it's only focusing on the exact interesting point
1:46 - 1:49

and you don't have a lot of noise -- you probably don't even
1:49 - 1:51

need IDA to actually understand what's going on
1:51 - 1:55

because I try to focus only on what's important.
1:55 - 1:58

The binaries are all directly available to download so you can
1:58 - 2:01

really test your debugger, your tools, your knowledge
2:01 - 2:04

and just get them directly from that.
2:04 - 2:07

So far, I've focused on PDF, assembly and the PE...
2:07 - 2:11

...file format. A few other things, but those are mainly the most
2:11 - 2:15

covered subjects of my website. And I share that under a
2:15 - 2:19

very permissive license, BSD, so you can reuse them commercially
2:19 - 2:25

or whatever. Even the images are done in an open-source format.
2:25 - 2:30

So the story behind this presentation is that some time ago
2:30 - 2:32

I was young and innocent and I thought that CPUs, being
2:32 - 2:38

electronic, whatever, had to be perfectly logical, with no problems
2:38 - 2:42

and then I was tricked by malware. And basically
2:42 - 2:46

IDA wasn't able to work on it, so I decided to go back
2:46 - 2:50

to the basics and study assembly and PE files from scratch.
2:50 - 2:53

I created in the meantime documents on Corkami
2:54 - 2:57

and now I'm presenting to you more or less the final results,
2:57 - 3:01

or the good programs' results. If I had just been a
3:01 - 3:06

guy who learned assembly, I probably wouldn't be at HashDays
3:06 - 3:10

to talk about it, if I hadn't got a few achievements against
3:10 - 3:14

various tools. So basically I made all the disassemblers that I tried fail
3:14 - 3:21

and I also created a few crashes - in IDA. I insist that all
3:21 - 3:26

the authors were notified and most of the bugs are already fixed, but
3:26 - 3:31

basically it was like this in 6.1 -- you get a direct crash -- but
3:31 - 3:33

now it's fixed in 6.2, and everything.
3:33 - 3:37

And Hiew [Hacker's View] - that's the latest version - but the newest unreleased one,
3:37 - 3:40

well, the newest beta, fixed that and so on.
3:40 - 3:45

So the agenda for the presentation: I will first start with
3:45 - 3:51

an easy introduction, but I assume that most of you already know or are familiar with disassembly, right?
3:52 - 3:57

Yes. And another question: are you all familiar with
3:58 - 4:03

or have you already run into undocumented disassembly in your ... or never?
4:03 - 4:06

Like, you trust IDA and that's all.
4:07 - 4:11

Like, is it a common thing to have an undocumented disassembly in IDA?
4:11 - 4:14

Raise your arms -- okay, not so much.
4:14 - 4:20

Okay. So then after the introduction (that will go quickly),
4:20 - 4:25

I will mention a few tricks, then introduce CoST, the program that I created.
4:25 - 4:29

And I will also talk a little bit more about the PE file format.
4:30 - 4:34

So, as you all have assembly knowledge, I will go quickly over that.
4:34 - 4:37

So basically, you compile a binary, there is assembly, there is
4:38 - 4:44

some relevance, some common points between the [source] code and the assembled [generated] code.
4:44 - 4:48

Then of course there is a relation between the opcode and the [assembly] code, you all know that.
4:49 - 4:53

What is important is that the assembly is generated by the compiler, but then, from the assembly,
4:54 - 5:00

what's actually kept in the binary is only the opcodes themselves, which are understood
5:00 - 5:03

directly by the CPU, which means the CPU just knows
5:03 - 5:07

what to do with the bytes, it doesn't care if you or the
5:07 - 5:11

tool you're using know what it will do, because it just does it.
5:11 - 5:16

And the problem is that what most people read is usually not the opcodes but the disassembly
5:16 - 5:21

and if the disassembler doesn't give you any result, well,
5:21 - 5:25

we're stuck, we're blind, we don't know what execution will do.
5:25 - 5:28

And the other problem is that, because of the variable opcode length, you
5:28 - 5:30

don't know where the next instruction starts, because you
5:30 - 5:32

don't know how to disassemble the current one.
5:32 - 5:40

So, here I just create one undocumented opcode in a simple program.
5:40 - 5:48

So basically we just '_emit' -- [it's] a keyword in -- that's Visual Studio 2010 Ultimate --
5:48 - 5:52

you will get a byte that is unidentified at disassembly
5:52 - 5:59

so you get question marks, so basically this program,
5:59 - 6:01

even though it costs several thousand dollars, is not able
6:01 - 6:05

to -- it doesn't know what will happen.
6:05 - 6:09

So usually if you do that... Oh, yeah, if you check the Intel documentation
6:09 - 6:14

there is nothing to see at the D6 opcode, there is nothing to see there.
6:14 - 6:18

Microsoft doesn't say anything, Intel doesn't say anything,
6:18 - 6:21

so usually if you try that you could expect bad results.
6:21 - 6:27

So, undocumented usually means a crash, or not the expected result.
6:27 - 6:30

But here, in this case, this specific case, no problem.
6:30 - 6:35

We don't know what it was; if we follow Intel or Microsoft documentation, we don't know what happened.
6:35 - 6:41

But if we -- the CPU just does its stuff. So what happened is that actually
6:41 - 6:49

D6 [SALC] is a very simple opcode that doesn't do much, but somehow it's not documented by Intel
6:49 - 6:54

[but] it's documented by AMD, and most of the [undocumented] opcodes are actually documented by AMD
6:54 - 6:58

but not Intel. I don't know why, if anyone has any idea why...
6:58 - 7:04

It's quite a trivial opcode, but it's not -- Intel still says there's nothing there. Okay.
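
D6 is commonly known as SALC ("set AL from carry"). The Python function below is a minimal sketch of what the CPU computes for it in 32-bit mode. It models the opcode rather than executing it, and the function name is hypothetical. As far as I know, D6 only writes AL, leaves the flags alone, and is invalid in 64-bit mode.

```python
def salc(eax: int, carry_flag: bool) -> int:
    """Model of opcode D6 (SALC) in 32-bit mode: AL = 0xFF if CF is set, else 0x00.

    Only AL changes; the upper 24 bits of EAX and the flags are untouched.
    """
    al = 0xFF if carry_flag else 0x00
    return (eax & 0xFFFF_FF00) | al


# Same AL result as SBB AL, AL, except that SBB would also modify the flags.
print(hex(salc(0x1234_5678, True)))   # 0x123456ff
print(hex(salc(0x1234_5678, False)))  # 0x12345600
```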
7:04 - 7:08

So the common users of those undocumented opcodes are malware
7:08 - 7:13

and packers, just to prevent automated analysis or easy reverse-engineering.
7:14 - 7:22

What's funny is that if you follow Intel's documentation you will have many holes, but Intel's own disassembler,
7:22 - 7:26

Xed, which is free to use (it is not open source), just handles
7:26 - 7:36

all these opcodes correctly, while Microsoft's Visual Studio and WinDbg blindly follow the documentation.
7:36 - 7:43

So you will get question marks even though Intel knows perfectly well what it does.
7:43 - 7:52

So it's like "[...] do as I disassemble and don't read my documentation."
7:52 - 8:01

So - of course - you could argue that WinDbg is only made to debug what the
8:01 - 8:07

Microsoft compiler created, but then it kind of rules out WinDbg as a malware debugging tool,
8:08 - 8:17

because you just inserted D6, it's trivial, and WinDbg is just not able to tell you what the instruction
8:17 - 8:25

is. So it's not very useful for malware analysis -- as a malware analysis debugger.
8:25 - 8:33

So, another problem is that, of course, each of the
8:33 - 8:37

undocumented facts is out there somewhere: maybe one
8:37 - 8:42

you will find in a trojan, one in a packer, and so on, but it's not so easy
8:42 - 8:47

to find a good, exhaustive, clean test set to actually
8:47 - 8:49

gather all these undocumented facts. So, for example,
8:49 - 8:53

a colleague mentions an undocumented
8:53 - 8:56

opcode or behaviour, and then you say "oh yeah, it's
8:56 - 8:59

in MebRoot [MBR infector], if you skip to this part of the file or whatever",
8:59 - 9:03

and then, you know, first it's malware, so you cannot
9:03 - 9:08

really spread that, and then there is a lot of noise -- the malware payload or something before and
9:08 - 9:15

after -- so it's not so easy to analyse. So that's why I focused on creating a small and clean test
9:15 - 9:21

set that would actually focus on just one particular instruction or fact.
9:22 - 9:28

So, now let's start, at last, the real stuff, and a few of the undocumented opcodes.
9:28 - 9:37

But before I actually started [studying], [I was] wondering what the actual possibilities of the CPUs were; I didn't even know
9:37 - 9:44

what the possibilities were, which opcodes are still supported or not by the CPU.
9:44 - 9:52

And I think it's a bit like English, everybody, or most people in the world, would be able to read and
9:52 - 9:57

understand these words, and if you['ve] see[n] some disassembly [before] then well you are used to seeing these opcodes,
9:57 - 10:04

they are generated by all the compilers, and they are so common that if they are not here, then we are a bit
10:04 - 10:08

ill at ease, and if it's something different, we would probably be surprised.
10:09 - 10:20

So this is standard English, but the first Intel x86 CPUs were made in the 70s, so it'd be the same as if you took
10:20 - 10:27

Shakespearean English, so you could say that it's still English, but mmm... You know, I don't know what that means actually...
10:27 - 10:30

or maybe I forgot -- at least, I quickly forgot -- and it's a bit the same
10:30 - 10:36

for those opcodes which are still supported by all the CPUs that we have -- all the Intel CPUs -- but
10:36 - 10:41

we probably don't know what they actually do, and that's a problem.
10:41 - 10:46

One of the proofs of concept that I made was only using these old opcodes, and these
10:46 - 10:53

old opcodes are actually doing something, so if someone is familiar with reading that, maybe I should
10:53 - 10:59

ask "how old are you?", because I'm used to the PUSH/JUMP/CALLs, but when it's about this,
10:59 - 11:06

mmm... what exactly is being done? And it's still working on an i7, and it's still usable by malware,
11:06 - 11:14

packers or anything, and yet some of them are totally unused now and they are still fully working on
11:14 - 11:16

modern CPUs.
11:16 - 11:21

And of course, it's a bit like English, it's an evolving language, and a bit like maybe the oldest generations
11:21 - 11:27

of people -- of humans wouldn't be used to the buzzwords - the latest buzzwords.
11:27 - 11:35

Some opcodes are only present in the most recent CPUs, and you have direct opcodes for
11:35 - 11:41

CRC32 or AES decryption, string matching, and other complex operations, in just one opcode.
11:41 - 11:48

So this, this is possible, this exists in modern CPUs. Not all of them, of course.
11:48 - 11:54

One thing that I like is the MOVBE -- move big endian -- opcode, because move big endian is the rejected
11:54 - 12:02

offspring: it's only implemented in the Atom CPU, which means this netbook supports this opcode
12:02 - 12:09

and the 64-bit i7 doesn't have this opcode, even though it has CRC32 or maybe AES [op]codes, so...
12:09 - 12:12

so much for complete backward compatibility.
12:12 - 12:20

There is no physical CPU, as far as I know, that can execute both CRC32 and MOVBE.
12:20 - 12:24

And of course, MOVBE is quite meaningless in itself because you already have an opcode for
12:24 - 12:32

endianness swapping [BSWAP]. So I don't know, this small computer has an opcode that most PCs don't.
12:32 - 12:35

Okay. Why? I don't know. If you know...
12:35 - 12:38

[Audience member:] "Is this opcode documented in the CPU feature set?"
12:38 - 12:38

Yeah.
12:38 - 12:42

Yeah, it's totally -- this MOVBE -- it's totally documented, it's official.
12:42 - 12:47

[Audience member:] "But, no; is it like a CPU flag just for this instruction or is it implicit by 'this
12:47 - 12:50

is an Atom CPU'?"
12:51 - 12:58

Uh... Yeah, I don't know. I checked the value with CPUID but I don't know if it's relevant to the... but
12:58 - 13:07

I think it has its own flag. ...but the CPUID result is so big that I don't remember it all.
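
On the audience question: as far as I know, MOVBE has its own feature flag, CPUID leaf 1, ECX bit 22. That bit is separate from the SSE4.2 bit (20) that covers CRC32. The Python sketch below does two things. It decodes such an ECX value, and it models what MOVBE computes compared with a plain little-endian MOV. It does not perform a CPUID call; the ECX value would have to come from real hardware. All helper names are hypothetical.

```python
# CPUID leaf 1, ECX feature bits for the instructions mentioned in the talk.
CPUID_1_ECX_BITS = {
    "SSE4.2 (CRC32)": 20,
    "MOVBE": 22,
    "POPCNT": 23,
    "AES-NI": 25,
}


def supported(ecx: int) -> set:
    """Names of the features whose bit is set in a CPUID.01H:ECX value."""
    return {name for name, bit in CPUID_1_ECX_BITS.items() if (ecx >> bit) & 1}


def mov_load32(memory: bytes, offset: int) -> int:
    """Plain MOV r32, m32 on x86: little-endian load."""
    return int.from_bytes(memory[offset:offset + 4], "little")


def movbe_load32(memory: bytes, offset: int) -> int:
    """MOVBE r32, m32: big-endian load (same result as MOV followed by BSWAP)."""
    return int.from_bytes(memory[offset:offset + 4], "big")


data = bytes([0x11, 0x22, 0x33, 0x44])
print(hex(mov_load32(data, 0)))    # 0x44332211
print(hex(movbe_load32(data, 0)))  # 0x11223344
```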
13:08 - 13:13

Uh, another thing, a bit specific to Windows in my case, because I focus on malware, is that before you execute
13:13 - 13:22

any opcode at all, I was looking at the register values when you start a program, and I found
13:22 - 13:28

out that the default register values, when you start a program and you haven't executed, theoretically, any opcode,
13:28 - 13:33

- theoretically - actually give you some information that is actively used in malware.
13:33 - 13:40

So for example, at the entry point, EAX tells you whether it's an older generation (XP or before),
13:40 - 13:42

or Vista or later.
13:42 - 13:51

This is not so used by malware, I don't recall seeing it, but GS: if GS is null, then it's a 32-bit
13:51 - 13:54

system, and if it's not, it's a 64-bit system.
13:54 - 13:56

I will actually use that later in one of the tricks.
13:56 - 14:04

And also, the relations between the registers -- there are many registers on the Intel CPUs -- are not
14:04 - 14:10

always very clear. I was surprised that when you do an FPU operation, it changes the FPU status, the
14:10 - 14:18

FPU registers themselves, but also the MMX registers, and somehow all the documentation I saw on the
14:18 - 14:25

internet always maps ST0 and MM0 to each other, which makes sense, but actually if you
14:25 - 14:30

modify -- if you just do a single FPU operation, it will actually modify not MM0, but MM7.
14:31 - 14:36

So if you do an FPU operation like "load PI" [FLDPI] and then you check the value of MM7, that could be used
14:36 - 14:39

as a trick, or it's just the way it is.
14:39 - 14:45

And [it's] like that in all the documentation, Wikipedia and so on, that I could find about the overlapping of the registers.
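
Here is why it is MM7 and not MM0, as far as I understand the architecture. The MMX registers alias the physical x87 registers R0 to R7, while ST(0) is whichever physical register the TOP field points to. A load first decrements TOP. Starting from TOP = 0 (the state after FNINIT, assumed here), the first push therefore lands in R7. The toy Python model below follows those assumptions. It keeps only the 64-bit significand and ignores tags and exponents, and the class name is hypothetical.

```python
# 64-bit significand of pi as loaded by FLDPI under the default rounding mode.
PI_SIGNIFICAND = 0xC90F_DAA2_2168_C235


class X87Model:
    """Toy x87 register file: 8 physical registers R0..R7 plus the TOP pointer.

    Simplifications (assumptions): only the 64-bit significand is stored,
    and tags, exponents and exceptions are ignored.
    """

    def __init__(self) -> None:
        self.top = 0                 # TOP after FNINIT
        self.physical = [0] * 8      # significands of R0..R7

    def load(self, significand: int) -> None:
        """Any FLD-style push: decrement TOP, then write the new ST(0)."""
        self.top = (self.top - 1) % 8
        self.physical[self.top] = significand

    def st(self, i: int) -> int:
        """ST(i) is relative to TOP."""
        return self.physical[(self.top + i) % 8]

    def mm(self, n: int) -> int:
        """MMn aliases physical register Rn, not ST(n)."""
        return self.physical[n]


fpu = X87Model()
fpu.load(PI_SIGNIFICAND)            # models FLDPI
print(fpu.top)                      # 7
print(fpu.mm(7) == fpu.st(0))       # True
print(fpu.mm(0))                    # 0
```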
14:45 - 14:53

Another thing, which was used as an anti-emulation trick in XP, is that the FPU also changes CR0,
14:53 - 14:59

so you have quite an unexpected anti-emulation trick just by using an FPU operation.
14:59 - 15:09

So here it is: basically 'store machine status word' [SMSW] is an old 286 CPU opcode -- mnemonic, that was
15:09 - 15:18

created in the 286 era, before protected mode was complete, and so it allows you
15:18 - 15:26

to read the value of CR0, even from user mode, while 'MOV reg, CR0' is actually a privileged opcode.
15:26 - 15:34

For some reason, the higher word of the destination register is officially undefined in the documentation, so Intel
15:34 - 15:40

just says "this is the value -- the lower word is correct, but you cannot expect the real value [for the rest]". And
15:40 - 15:45

I don't know why they say that, because it's actually the value - the higher bits - of CR0.
15:45 - 15:53

And under XP, when you do FPU operations, the value of CR0 will be modified, and it eventually reverts
15:53 - 16:00

by itself. So you can just do SMSW and note the result, then
16:00 - 16:05

do an FPU operation, and the result should be different, and then eventually the result will revert
16:05 - 16:10

to the original value. So it's quite a tricky and unexpected anti-emulator.
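
The talk doesn't say which CR0 bit flips. One plausible candidate is TS (bit 3), but that is an assumption, not something stated here. An OS doing lazy FPU context switching sets TS on a task switch and clears it once the thread uses the FPU. The Python sketch below only decodes two SMSW readings and reports which documented low-word flags changed. The sample values and function names are made up.

```python
# Names of the CR0 bits that fit in the 16-bit machine status word (MSW).
MSW_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE"}


def decode_msw(value: int) -> list:
    """List the MSW flag names set in the low word of an SMSW result."""
    return [name for bit, name in sorted(MSW_BITS.items()) if (value >> bit) & 1]


def msw_changes(before: int, after: int) -> list:
    """Flags that differ between two SMSW results (only the documented low word is compared)."""
    return decode_msw((before ^ after) & 0xFFFF)


# Hypothetical readings: SMSW before an FPU instruction, then SMSW right after it.
first, second = 0x8001_003B, 0x8001_0033
print(decode_msw(first))            # ['PE', 'MP', 'TS', 'ET', 'NE']
print(msw_changes(first, second))   # ['TS']
```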
16:11 - 16:19

You have a similar trick on 32-bit Windows, where GS is not stored in the context, which means that on a
16:19 - 16:25

thread switch the value of GS is lost, so if you just wait for a while, GS will eventually
16:25 - 16:32

reset to 0. So if you set GS and you are stepping manually, this is slow and it causes a thread switch,
16:32 - 16:40

so GS is instantly lost. And also, like the previous trick, if you just wait for GS not to be...
16:40 - 16:45

if you just loop as long as GS is not 0, this will, on a real system, eventually exit from the loop.
16:45 - 16:53

But the first time, it blew my mind: I was really wondering what could happen there, there's no other thread,
16:53 - 16:58

and of course in my proof of concept, it directly starts like this. What happens? What should happen now,
16:58 - 17:02

on a real system? Eventually, it's reset to 0.
17:02 - 17:10

Another thing is that, of course, it's reset to 0, but not instantly, so if you wait for GS to be reset
17:10 - 17:17

and then [set it again and] loop again, this second loop can only end at the next thread switch, which means it should
17:17 - 17:23

take a minimum amount of time, so you can use that for anti-emulation timing tricks.
17:25 - 17:32

Of course, I was also thinking that NOP is perfect, because NOP is NOP, it does nothing.
17:33 - 17:44

But originally NOP is 'exchange eax with eax' [xchg eax, eax], or 'ax with ax', but the problem is that NOP [encoded as] 0x90 is always doing nothing,
17:44 - 17:51

but you have another encoding [87 C0] for 'exchange EAX, EAX', which again
17:51 - 17:54

doesn't do anything in 32b, but like all the other 32b opcodes
17:54 - 17:58

in 64b mode, it actually clears the higher DWORD [of RAX]
17:58 - 18:02

so you have an XCHG EAX, EAX that does something,
18:02 - 18:05

even though at first it looks like it would do nothing
18:05 - 18:09

but fortunately, in this case, the 90 NOP is still doing nothing
18:10 - 18:14

and this is probably now common in malware and stuff
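
Here is a minimal Python model of the difference between the two encodings in 64-bit mode. It assumes a starting RAX whose upper half is non-zero, and the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def exec_90(rax: int) -> int:
    """90: a true NOP, even in 64-bit mode; RAX is unchanged."""
    return rax


def exec_87_c0(rax: int) -> int:
    """87 C0 (XCHG EAX, EAX with a ModRM byte) in 64-bit mode.

    Exchanging EAX with itself keeps the low dword, but like any 32-bit
    register write in 64-bit mode the result is zero-extended into RAX.
    """
    return rax & MASK32  # bits 32..63 are cleared


rax = 0x1122_3344_5566_7788
print(hex(exec_90(rax)))     # 0x1122334455667788
print(hex(exec_87_c0(rax)))  # 0x55667788
```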
18:14 - 18:18

HINT NOP is the multi-byte nop
18:18 - 18:23

that actually gives a hint to the CPU about what will be executed next
18:23 - 18:24

whatever the address here [the memory operand referenced by the HINT NOP]
18:24 - 18:26

it won't trigger an exception
18:26 - 18:30

but as you can see, it's really a multi-byte opcode -- it can be a very long nop
18:31 - 18:32

which is weird to say
18:32 - 18:35

another thing is, once again, it's partially undocumented by Intel
18:37 - 18:44

the full range of HINT NOP encodings is bigger in the AMD documentation
18:44 - 18:48

and another thing is that, because it's a multi-byte opcode,
18:48 - 18:51

if you insert those bytes at the end of a page
18:51 - 18:54

then it will fetch the rest of the instruction [ModRM, SIB, displacement] from the next page
18:55 - 18:56

so it could trigger an exception,
18:56 - 19:00

so it's a nop that could trigger an exception if it's at the end of a page
19:01 - 19:04

so, thank you Intel -- or whatever, I don't know, I'm not sure
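
The length is what matters here, and the Python sketch below shows how to compute it. It measures a 0F 1F /0 multi-byte NOP in 32-bit mode from its ModRM and SIB bytes, allowing optional 66 prefixes. It also checks whether an instruction at a given address runs past a 4 KB page boundary. The memory operand is never accessed, but the CPU still has to fetch every instruction byte, so a NOP that straddles into an unmapped page faults. The sketch only handles this one opcode and ignores the 67 address-size prefix. The names are hypothetical.

```python
PAGE_SIZE = 0x1000


def nop_length(code: bytes) -> int:
    """Length of a 32-bit-mode multi-byte NOP: optional 66 prefixes, then 0F 1F /r."""
    i = 0
    while code[i] == 0x66:             # operand-size prefix: does not change addressing
        i += 1
    if code[i:i + 2] != b"\x0f\x1f":
        raise ValueError("not a 0F 1F multi-byte NOP")
    i += 2
    modrm = code[i]
    i += 1
    mod, rm = modrm >> 6, modrm & 7
    if mod == 3:
        return i                       # register operand: nothing else to fetch
    if rm == 4:                        # a SIB byte follows
        sib = code[i]
        i += 1
        if mod == 0 and (sib & 7) == 5:
            i += 4                     # SIB with no base register: disp32
    elif mod == 0 and rm == 5:
        i += 4                         # absolute [disp32]
    if mod == 1:
        i += 1                         # disp8
    elif mod == 2:
        i += 4                         # disp32
    return i


def crosses_page(address: int, length: int) -> bool:
    """True if the bytes [address, address + length) span two pages."""
    return address // PAGE_SIZE != (address + length - 1) // PAGE_SIZE


nop8 = bytes.fromhex("0f1f840000000000")       # 8-byte NOP
print(nop_length(nop8))                         # 8
print(crosses_page(0x1FFC, nop_length(nop8)))  # True: the last 4 bytes are on the next page
```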
19:04 - 19:06

MOV, once again, I thought...
19:06 - 19:10

MOV being MOV, it should be perfectly logical
19:11 - 19:15

sadly not... first... all this is documented, but it's tricky
19:15 - 19:19

because -- there were even bugs for that in all the disassemblers I tried, I think
19:19 - 19:21

well, except Xed, maybe
19:23 - 19:29

you cannot do MOV to or from CR0 with a memory operand
19:29 - 19:32

so the documentation says that the mod bits of the Mod/RM byte are ignored
19:33 - 19:35

it doesn't mean it's illegal, it's just ignored
19:35 - 19:37

so if you encode this [a memory operand], which looks like it could lead to a crash
19:37 - 19:39

it's actually interpreted as that [a register operand]
19:39 - 19:42

and as far as I can remember, you'd fail all the disassemblers with that
19:42 - 19:44

until recently [ ;) ]
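
The Python sketch below shows how a decoder could honour that rule for the two MOV-to/from-control-register opcodes in 32-bit mode. A naive decoder that treats the ModRM byte as usual would consume displacement bytes that actually belong to the next instruction. Only these two opcodes are handled, reserved control registers are not checked, and the function name is hypothetical.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_cr(code: bytes) -> tuple:
    """Decode 0F 20 /r (MOV r32, CRn) or 0F 22 /r (MOV CRn, r32) in 32-bit mode.

    Returns (text, length). The mod bits of the ModRM byte are ignored, so the
    operand is always a register and no SIB or displacement bytes follow.
    """
    if len(code) < 3 or code[0] != 0x0F or code[1] not in (0x20, 0x22):
        raise ValueError("not a MOV to/from a control register")
    modrm = code[2]
    cr = (modrm >> 3) & 7          # reg field selects the control register
    gpr = REG32[modrm & 7]         # r/m field selects the general-purpose register
    if code[1] == 0x20:
        return f"mov {gpr}, cr{cr}", 3
    return f"mov cr{cr}, {gpr}", 3


# ModRM 0x80 looks like "[eax + disp32]", but it still decodes as a register move,
# and the 4 following bytes belong to the next instruction.
print(decode_mov_cr(bytes([0x0F, 0x20, 0x80, 0x90, 0x90, 0x90, 0x90])))
# ('mov eax, cr0', 3)
```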
19:44 - 19:50

MOVSXD is a 64b opcode, it's sign-extending, so theoretically
19:50 - 19:55

it should work from a smaller register to a bigger register
19:55 - 19:58

but if you use it without a REX.W prefix, which is discouraged,
19:58 - 20:00

you can actually make it work like a standard MOV
20:01 - 20:04

and the other way around:
20:04 - 20:09

MOV from a selector to a 32b register actually works
20:09 - 20:12

so many disassemblers were disassembling that as MOV AX, CS
20:12 - 20:16

because that would make both operands the same size,
20:16 - 20:19

but the upper word of the target register
20:19 - 20:23

is officially 'undefined', but actually there is nothing funny here,
20:23 - 20:25

there's no random value, it's zeroes
20:25 - 20:29

so basically, it makes it equivalent to MOV EAX, CS
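
Here is a small Python model of both cases, assuming register operands only. Per Intel's manual, a 32-bit destination for MOV from a segment register gets a zeroed high word on P6-family and later processors, and the high word is undefined on earlier ones. The selector value 0x1B is just an example, and the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def movsxd(src32: int, rex_w: bool) -> int:
    """64-bit destination value after 63 /r (register form) in 64-bit mode."""
    src32 &= MASK32
    if rex_w and src32 & 0x8000_0000:
        return src32 | (MASK64 ^ MASK32)   # sign-extend bit 31 into bits 32..63
    return src32                           # positive value, or no REX.W: a plain 32-bit move, zero-extended


def mov_ax_sreg(eax: int, selector: int) -> int:
    """66 8C /r, MOV AX, Sreg: 16-bit destination, upper word of EAX preserved."""
    return (eax & 0xFFFF_0000) | (selector & 0xFFFF)


def mov_eax_sreg(eax: int, selector: int) -> int:
    """8C /r with a 32-bit destination: the old EAX is ignored, the upper word ends up zero."""
    return selector & 0xFFFF


print(hex(movsxd(0x8000_0000, rex_w=True)))   # 0xffffffff80000000
print(hex(movsxd(0x8000_0000, rex_w=False)))  # 0x80000000
print(hex(mov_ax_sreg(0xDEAD_BEEF, 0x1B)))    # 0xdead001b
print(hex(mov_eax_sreg(0xDEAD_BEEF, 0x1B)))   # 0x1b
```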
20:30 - 20:32

BSWAP is one of my favorites
20:32 - 20:35

because I think it's like an administration
20:35 - 20:38

it's supposed to just swap the endianness of the registers
20:38 - 20:42

but because of -- external reasons
20:42 - 20:45

it's never really doing the work you expect
20:45 - 20:50

so, only in 64b, it's actually correctly swapping the endianness
20:50 - 20:51

as you would expect
20:51 - 20:55

on EAX [32b], in 64b [mode], like all the 32b opcodes,
20:55 - 20:58

it will actually clear the higher dword -- ok!
20:58 - 21:02

and, on a word [16-bit register], it's actually 'undefined' again
21:02 - 21:04

but it's commonly used in malware and packers
21:04 - 21:07

because it just resets [the register]
21:07 - 21:09

so it's like a XOR AX, AX
21:09 - 21:14

so, with this unexplainable result, I understand
21:14 - 21:18

that Intel probably doesn't want to explain -- just say it's 'undefined'
21:18 - 21:20

because they would be too ashamed to explain
21:20 - 21:22

why we get this funny result
21:24 - 21:31

BSWAP AX is also wrongly disassembled by WinDbg and so on
21:33 - 21:35

it will be disassembled as BSWAP EAX
21:35 - 21:37

and actually, you clear the register
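
The three behaviours described above appear below as a hedged Python model. The function names are hypothetical. The 16-bit case encodes the talk's observation (AX cleared), not a documented rule, and other CPUs or emulators may differ.

```python
def bswap32(value: int) -> int:
    """BSWAP r32: reverse the 4 bytes of a dword."""
    return int.from_bytes((value & 0xFFFF_FFFF).to_bytes(4, "little"), "big")


def bswap_rax(rax: int) -> int:
    """BSWAP r64 (REX.W): reverse all 8 bytes."""
    return int.from_bytes((rax & 0xFFFF_FFFF_FFFF_FFFF).to_bytes(8, "little"), "big")


def bswap_eax_in_64bit_mode(rax: int) -> int:
    """BSWAP EAX in 64-bit mode: swapped dword, zero-extended (bits 32..63 cleared)."""
    return bswap32(rax)


def bswap_ax_as_observed(eax: int) -> int:
    """66 0F C8, BSWAP AX: officially undefined.

    Modelled as the talk describes it: AX is cleared, like XOR AX, AX,
    and the upper word of EAX is kept.
    """
    return eax & 0xFFFF_0000


print(hex(bswap_rax(0x1122_3344_5566_7788)))                # 0x8877665544332211
print(hex(bswap_eax_in_64bit_mode(0xAABB_CCDD_1122_3344)))  # 0x44332211
print(hex(bswap_ax_as_observed(0x1234_5678)))               # 0x12340000
```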
21:42 - 21:44

can everybody understand this code?
21:47 - 21:50

does anybody see the potential trap?
21:53 - 21:56

so, it pushes the address on the stack,
21:56 - 22:00

then RETN takes the address from the stack,
22:00 - 22:03

and, basically, you just jump to an immediate value,
22:10 - 22:11

execution ordering?
22:11 - 22:13

yeah, the execution starts here
22:14 - 22:17

[inaudible]
22:17 - 22:20

no -- ok, it's not the point here
22:20 - 22:26

and of course, if you -- this is OllyDbg 1, it's fixed in OllyDbg 2
22:26 - 22:28

but OllyDbg 1 is even trying to be nice,
22:28 - 22:30

telling you -- this is an automatic comment -- that RET
22:30 - 22:32

is used as a jump to [the pushed address]
22:33 - 22:36

and, as you can see, not exactly the same [happens]
22:36 - 22:37

so, what happened?
22:37 - 22:38

no one sees?
22:40 - 22:42

so, basically, here, you have a 66 prefix on RETN
22:43 - 22:46

which actually makes RETN return to IP, and not EIP [it pops only 16 bits]
22:47 - 22:55

so, actually, you don't jump to 401008, but to 00001008
22:56 - 22:59

and in this proof of concept, I mapped the NULL page
22:59 - 23:01

and I created -- added some code at this address
23:01 - 23:06

so, this is actually not a return to 401008
23:06 - 23:10

but the problem is that, officially, this is also called a 'return'
23:10 - 23:15

it's not [different from the standard one] -- the disassemblers have now added their own way of disassembling it
23:15 - 23:19

like 'small retn', ret.16, or something like this
23:19 - 23:22

but actually officially, it's the same mnemonic
23:22 - 23:27

so, [this is] the latest Hiew, I think, and that's OllyDbg 1
23:28 - 23:31

maybe the latest OllyDbg 2 fixed that
23:31 - 23:33

but you can still be tricked just by that
23:33 - 23:41

the 66 prefix - the jump to IP - also works on CALLs, RETs, LOOPs, [and JMPs]
23:41 - 23:44

so all the flow control opcodes
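
Here is a minimal Python model of why the 66-prefixed RETN lands at 00001008. With a 16-bit operand size it pops only a word, and since the stack is little-endian, that word is the low half of the dword pushed by PUSH 401008. It zero-extends that word into EIP and leaves ESP 2 bytes short of where a normal RETN would put it. The class and function names are hypothetical.

```python
class Stack:
    """Little-endian stack in a bytearray; esp is an index that moves down on push."""

    def __init__(self, size: int = 64) -> None:
        self.memory = bytearray(size)
        self.esp = size

    def push32(self, value: int) -> None:
        self.esp -= 4
        self.memory[self.esp:self.esp + 4] = (value & 0xFFFF_FFFF).to_bytes(4, "little")

    def pop(self, size: int) -> int:
        value = int.from_bytes(self.memory[self.esp:self.esp + size], "little")
        self.esp += size
        return value


def retn(stack: Stack) -> int:
    """C3: pop a 32-bit return address into EIP."""
    return stack.pop(4)


def retn_66(stack: Stack) -> int:
    """66 C3: 16-bit operand size, pop a word into IP; EIP is that word zero-extended."""
    return stack.pop(2)


normal, prefixed = Stack(), Stack()
normal.push32(0x0040_1008)
prefixed.push32(0x0040_1008)
print(hex(retn(normal)), normal.esp)          # 0x401008 64
print(hex(retn_66(prefixed)), prefixed.esp)   # 0x1008 62
```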
23:45 - 23:47

so, I won't enumerate all the tricks,
23:47 - 23:51

because otherwise you'll probably die of boredom
23:51 - 23:55

if you want more, I created a page on Corkami [x86.corkami.com],
23:55 - 24:00

and I already made some graphs and cheat sheets
24:00 - 24:04

to have an easy [table] -- list of opcodes
24:04 - 24:07

and, that's already too much theory for now...
24:07 - 24:12

So, I don't like just -- reading stuff and not having something to feed my debugger
24:12 - 24:13

so I created CoST
24:13 - 24:16

which stands for Corkami Standard Test
24:16 - 24:21

CoST is a single binary, there are no options,
24:21 - 24:25

you just run it, and it will just execute a lot of different tests
24:25 - 24:28

and then, I also made it a hardened PE,
24:28 - 24:35

so it may also help you to test the PE side of your tools
24:35 - 24:36

or your knowledge
24:36 - 24:40

but, because as a hardened PE it's actually quite difficult to debug,
24:40 - 24:42

I also made an easy PE mode so that
24:42 - 24:47

you can study only the assembly, and not have too much trouble
24:47 - 24:48

debugging it
24:49 - 24:51

so, CoST contains a lot of tests
24:57 - 24:59

classic stuff -- very trivial stuff
24:59 - 25:03

then, a few more complex things, like JMP to IP, IRET...
25:03 - 25:05

undocumented opcodes
25:05 - 25:10

CPU-specific, like MOVBE, POPCNT, CRC32
25:10 - 25:17

also some detections of OS and VM by using common opcodes
25:17 - 25:25

like, the 'red pill trick'... yeah, just SLDT execution, and you get a value, and you compare...
25:25 - 25:28

but it's 'the blue pill', or whatever...
25:29 - 25:33

and also some OS bugs because sometimes, Windows XP
25:33 - 25:35

was doing the wrong job trying to tell you which
25:35 - 25:38

exception had just happened, and that would be a way
25:38 - 25:44

to tell the difference between an actual OS and an emulator that would try to be logical
25:45 - 25:49

CoST is written in assembly, so there's no extra [compiler-generated code]
25:50 - 25:52

it's not compiled, it's not generated, but
25:52 - 25:56

to make it self-documenting, I created internal exports
25:56 - 26:00

so that each section of the file is easy to browse [to],
26:00 - 26:05

so that you will know -- if you quickly want to jump to the 64b part
26:06 - 26:08

then it's easier via the exports
26:08 - 26:13

and also I wanted it to print messages in the most convenient way
26:13 - 26:18

so, if you keep printing messages, then it will make the assembly
26:18 - 26:21

wider, I mean longer to scroll, so I used
26:21 - 26:25

Vectored Exception Handling, and a fake opcode
26:25 - 26:28

so that you have the comments of what's gonna happen,
26:28 - 26:30

appearing directly in the code
26:30 - 26:34

so it's kind of self-documenting, without a debug symbols file
26:34 - 26:38

and, you saw, it doesn't have much output
26:38 - 26:41

but actually it has a lot of debug output
26:41 - 26:47

like 100 -- I forgot -- messages. It's even saying '[trick] I'm gonna do this'
26:47 - 26:49

and then, 'I'm gonna do that...', so
26:49 - 26:55

trying to make it helpful yet a bit hard to disassemble
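
The talk doesn't say which fake opcode CoST uses. The Python model below therefore picks UD2 (0F 0B) as a hypothetical marker, followed inline by a NUL-terminated message. It mimics what a vectored exception handler would do: check the faulting bytes, log the text, and resume right after the terminator. In a real Windows handler, resuming means updating the context's EIP and returning EXCEPTION_CONTINUE_EXECUTION. This is a sketch of the idea, not CoST's code, and the names are hypothetical.

```python
MARKER = b"\x0f\x0b"  # hypothetical 'fake opcode': UD2 always raises an invalid-opcode exception


def handle_fault(code: bytes, eip: int, log: list) -> int:
    """Model of a vectored exception handler for inline debug messages.

    If the faulting instruction is the marker, append the NUL-terminated text
    that follows it to `log` and return the address to resume at.
    """
    if code[eip:eip + len(MARKER)] != MARKER:
        raise RuntimeError("not our marker: let the next handler deal with it")
    start = eip + len(MARKER)
    end = code.index(b"\0", start)          # raises ValueError if unterminated
    log.append(code[start:end].decode("ascii", errors="replace"))
    return end + 1                          # first byte after the terminator


# NOP, marker + message, then RETN: execution resumes on the C3 byte.
code = b"\x90" + MARKER + b"[trick] I'm gonna do this\0" + b"\xc3"
messages = []
resume = handle_fault(code, 1, messages)
print(messages, hex(code[resume]))
```

A disassembler that doesn't know about the handler will try to decode the message text as instructions, which fits the "helpful yet a bit hard to disassemble" goal.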
26:57 - 27:00

can anyone understand what this code is doing?
27:00 - 27:01

this is one of my favourites
27:02 - 27:05

we can't see the opcodes
27:06 - 27:07

no, there's no [opcode] trick this time
27:17 - 27:19

so, basically you push some arguments on the stack
27:19 - 27:21

you jump to here
27:21 - 27:26

basically, with the return far [RETF]... I pushed 'push_eip' on the stack
27:26 - 27:28

with a 33 word [the 0x33 selector]
27:28 - 27:30

so basically I will RETurn Far to this
27:30 - 27:35

basically I will return back to this EIP in selector 33
27:35 - 27:39

if this is in a 64b OS, and this is a 32b process
27:39 - 27:42

you will return back to execution here, in 64b mode
27:42 - 27:47

because selector 33 is the selector for 64b mode
27:47 - 27:49

which you can access from a 32b process
27:49 - 27:54

so basically this code will be executed first in the current selector
27:56 - 28:01

as you see, and then it's executed back on selector 33,
28:01 - 28:04

which means in 64b mode
28:04 - 28:08

so you have the same EIP, you have the same opcodes
28:08 - 28:10

but the disassembly will be different,
28:10 - 28:14

and I chose some opcodes that make mnemonics
28:14 - 28:17

specific to each side, the 32b or the 64b side
28:17 - 28:22

so, it's already quite a b*tch to disassemble
28:22 - 28:27

because, same EIP, so unless you're careful about the selector,
28:27 - 28:29

well, it's a problem
28:30 - 28:36

[Errata: you can debug this kind of code, check my berlinsides presentation (screencast on slide 58)]
28:38 - 28:45

http://bsx2.corkami.com, slide 58 [screencast]
28:47 - 28:50

if you run over it, you return to the original selector,
28:50 - 28:52

which is why there is the PUSH CS here
28:52 - 28:56

and you go back with the original selector
28:56 - 28:58

execution will go through quickly
28:58 - 29:00

but you cannot step through that code [WRONG, you can with WinDbg+wow64exts]
29:00 - 29:03

so, killing the disassemblers, and the debuggers
29:03 - 29:04

and yet, simple
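
For reference, here is a widely circulated form of this far-return trick, not necessarily CoST's exact bytes. It pushes the 64-bit selector, uses CALL $+5 to get the current address, patches that address to point past the RETF, and then returns far. The Python sketch below keeps those bytes and replays their effect on a list-based stack. As far as I know, on WoW64 0x23 is the 32-bit code selector and 0x33 the 64-bit one. The same bytes can also decode differently on each side: for instance, 48 is DEC EAX in 32-bit mode but a REX.W prefix in 64-bit mode. The function name is hypothetical.

```python
# Hypothetical stub, 32-bit code:
#   6A 33           push 0x33            ; target code selector
#   E8 00 00 00 00  call $+5             ; pushes the address of the next instruction
#   83 04 24 05     add dword [esp], 5   ; make it point just past the RETF
#   CB              retf                 ; pops EIP, then CS
STUB = bytes.fromhex("6a33" "e800000000" "83042405" "cb")


def replay_stub(base: int) -> tuple:
    """Return (EIP, CS) after the RETF, for the stub loaded at address `base`."""
    stack = []
    stack.append(0x33)              # push 0x33 (pushed as a dword)
    stack.append(base + 2 + 5)      # call $+5: return address = end of the CALL
    stack[-1] += 5                  # add dword [esp], 5 skips the ADD (4 bytes) and RETF (1 byte)
    eip = stack.pop()               # retf: first pop is the new EIP...
    cs = stack.pop() & 0xFFFF       # ...second pop is the new CS (low word of the dword)
    return eip, cs


eip, cs = replay_stub(0x0040_1000)
print(hex(eip), hex(cs))  # 0x40100c 0x33
```

Execution continues at the byte right after the RETF, at the same address but now decoded as 64-bit code.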
29:04 - 29:07

so, here is the result that you get when you load CoST
29:07 - 29:10

in the latest -- well, the latest public version of Hiew
29:10 - 29:13

I think it's gonna be fixed
29:13 - 29:16

so, this is a HINT NOP that's not documented by Intel
29:16 - 29:20

and it's a bit forgotten by most disassemblers
29:20 - 29:24

so, WinDbg and Hiew are giving you
29:24 - 29:29

undocumented, well -- question marks, or the Hiew style of question marks
29:29 - 29:34

then, since -- that was originally what I planned to present at Hashdays
29:34 - 29:39

but then, I decided to bring a few tricks into CoST itself, on the PE side of things
29:39 - 29:42

so, this is the header, so it has MZ, and then some text
29:42 - 29:44

so you can 'type cost.exe'
29:44 - 29:46

and it has some text - I made it type-able
29:46 - 29:51

and the NT headers - the 'PE' header, the one starting with PE
29:51 - 29:54

is actually starting at the bottom of the file -- the bottom of the file is here
29:54 - 29:55

so it's a footer
29:55 - 29:58

and I made it so the values are quite critical
29:58 - 30:01

so, they are not the ones you would expect
30:01 - 30:03

so this is the result that you would get when you were
30:03 - 30:05

loading CoST under IDA 6.1
30:07 - 30:10

so, well, some values were random and everything
30:11 - 30:15

but, if you have -- with CoST, you can set the value of a register
30:15 - 30:17

then test and compare it
30:17 - 30:19

but you cannot test all the possibilities of PE files
30:19 - 30:21

with a single file, because you have to choose
30:21 - 30:25

so, for example, CoST has no sections, weird alignments and everything
30:25 - 30:27

but you cannot make all the possible cases [in a single file]
30:27 - 30:31

so, I went on and I created another page on Corkami
30:31 - 30:37

with, as usual, the proofs of concept, some graphs about the PE files and everything
30:37 - 30:40

I don't consider it finished but I consider it good enough to break
30:40 - 30:41

a bit of everything
30:42 - 30:46

now, I've already created more than 100 PoCs, which try
30:46 - 30:51

zero sections, big alignments, huge alignments, and I have some funny results...
30:51 - 30:55

so, here is the 'virtual section table vs Hiew'
30:55 - 31:00

so, when you're in low alignments, you can have no sections,
31:00 - 31:03

or the section table can be empty
31:03 - 31:08

so basically, I made the SizeOfOptionalHeader big enough that the section table points into virtual memory space
31:08 - 31:11

which means the section table is outside of the PE file [full of 00, in virtual space]
31:11 - 31:16

and Hiew doesn't like this. As a consequence, it doesn't even think it's a PE file,
31:16 - 31:18

while it's fully working, but this trick only works under XP
31:18 - 31:25

because Windows 7 is a bit more picky about the unused section table values
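
The Python sketch below, which uses only the standard struct module, shows how a parser locates the section table. The table starts right after the optional header, at e_lfanew + 4 + 20 + SizeOfOptionalHeader, so an oversized SizeOfOptionalHeader pushes it past the end of the file. The buffer holds synthetic headers, not a loadable PE, and the function names are hypothetical.

```python
import struct

IMAGE_SIZEOF_FILE_HEADER = 20
IMAGE_SIZEOF_SECTION_HEADER = 40


def section_table_location(image: bytes) -> tuple:
    """Return (offset, number_of_sections) of the section table described by the headers."""
    if image[:2] != b"MZ":
        raise ValueError("no MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("no PE signature at e_lfanew")
    file_header = e_lfanew + 4
    (number_of_sections,) = struct.unpack_from("<H", image, file_header + 2)
    (size_of_optional_header,) = struct.unpack_from("<H", image, file_header + 16)
    # The section table follows the optional header, whatever size is declared.
    return file_header + IMAGE_SIZEOF_FILE_HEADER + size_of_optional_header, number_of_sections


def section_table_in_file(image: bytes) -> bool:
    offset, count = section_table_location(image)
    return offset + count * IMAGE_SIZEOF_SECTION_HEADER <= len(image)


# Synthetic headers only (not a loadable PE): PE signature at 0x80, 1 section.
headers = bytearray(0x200)
headers[:2] = b"MZ"
struct.pack_into("<I", headers, 0x3C, 0x80)
headers[0x80:0x84] = b"PE\0\0"
struct.pack_into("<H", headers, 0x84 + 2, 1)
for size_of_optional_header in (0xE0, 0x1000):
    struct.pack_into("<H", headers, 0x84 + 16, size_of_optional_header)
    print(hex(section_table_location(headers)[0]), section_table_in_file(headers))
# 0x178 True
# 0x1098 False
```

A tool that assumes the table is always inside the file will misparse or reject such a PE. Per the talk, XP's loader instead reads zeros from virtual space.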
31:29 - 31:34

so when you've got some ASCII art in the Data Directories
31:34 - 31:37

you can probably guess that there is something going on
31:37 - 31:40

if you have a better ASCII art suggestion, I'm all ears
31:40 - 31:43

so, basically, this is the 'Dual PE header' that was presented by
31:43 - 31:45

Reversing Labs at BlackHat
31:45 - 31:48

so, are you familiar with that?
31:50 - 31:52

so, basically, you extend the SizeOfHeaders so that
31:52 - 31:59

the NT headers will actually be mapped at the bottom of the file
31:59 - 32:03

so that when it's far enough to reach section [not file] alignment
32:04 - 32:05

and when you load that, in memory
32:05 - 32:07

the first section will actually be mapped over it
32:10 - 32:13

the first part of the OPTIONAL_HEADER is the one used on disk
32:13 - 32:16

so, this is what is used to check if the file will load
32:16 - 32:20

but the Data Directories are read from the values in memory
32:20 - 32:25

so, first, the OPTIONAL_HEADER is parsed, mapped in memory
32:25 - 32:29

then the section folds itself over the bottom part of the header
32:29 - 32:31

and then the true Data directories that were originally
32:31 - 32:34

at the start of the section will be taken into account
32:34 - 32:39

so all of this is garbage, visible on disk, and it follows the SizeOfOptionalHeader
32:39 - 32:44

but actually, in memory, this is not what gets parsed
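The folding can be reproduced with a toy model of the mapping order described here: the headers are copied first, then each section is copied to its VirtualAddress, overwriting whatever was there. This is a hedged Python sketch, not the Windows loader; the layout, the 0xDEADBEEF "garbage" and the 0x2000 "real" value are made up for illustration.

```python
import struct

def map_image(file_bytes: bytes, size_of_headers: int, sections, size_of_image: int) -> bytearray:
    """Toy mapper: copy the headers, then each (VirtualAddress, PointerToRawData, SizeOfRawData)."""
    memory = bytearray(size_of_image)
    headers = file_bytes[:size_of_headers]
    memory[:len(headers)] = headers
    for virtual_address, raw_pointer, raw_size in sections:
        chunk = file_bytes[raw_pointer:raw_pointer + raw_size]
        assert virtual_address + len(chunk) <= size_of_image, "section outside the image"
        # Later writes win: a section mapped at 0x1000 folds over header bytes mapped there
        memory[virtual_address:virtual_address + len(chunk)] = chunk
    return memory

# Made-up layout: SizeOfHeaders = 0x1100, so header bytes reach past SectionAlignment (0x1000)
file_bytes = bytearray(0x1200)
struct.pack_into("<I", file_bytes, 0x1000, 0xDEADBEEF)  # "garbage" data directory seen on disk
struct.pack_into("<I", file_bytes, 0x1100, 0x2000)      # real value, stored at the section's start
memory = map_image(bytes(file_bytes), 0x1100, [(0x1000, 0x1100, 0x100)], 0x2000)

print(hex(struct.unpack_from("<I", file_bytes, 0x1000)[0]))  # value a disk-based parser reads
print(hex(struct.unpack_from("<I", memory, 0x1000)[0]))      # value read after mapping
```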
32:45 - 32:47

another weird thing is that the export names can just be
32:47 - 32:51

absolutely anything, up to a null character
32:51 - 32:53

which means non-ASCII, whatever
32:53 - 32:56

and another funny thing is that
32:56 - 32:57

Hiew displays them inline
32:57 - 32:59

so you can just add your own ads,
32:59 - 33:02

because those are just export names, and one of the export
33:02 - 33:05

[name] is actually more than 16 KB
33:05 - 33:08

so that it's good enough to create a buffer overflow
33:08 - 33:10

if your tool is not careful about that
33:10 - 33:14

and it's also possible to have a NULL export [name], just a single NULL character
33:14 - 33:15

and you can import a NULL API
33:15 - 33:17

no problem
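Since an export name is just bytes up to a NUL, a parser should bound the read rather than copy it into a fixed buffer. A minimal Python sketch of one defensive approach; the 256-byte cap is an arbitrary assumption (not a Windows limit) and the function name is hypothetical.

```python
def read_export_name(image: bytes, offset: int, max_len: int = 256) -> bytes:
    """Read a NUL-terminated export name without trusting its length."""
    if not 0 <= offset < len(image):
        raise ValueError("export name offset points outside the file")
    end = image.find(b"\x00", offset, offset + max_len + 1)
    if end == -1:
        # No terminator within the cap (or before EOF): truncate instead of overflowing
        return image[offset:offset + max_len]
    # May legitimately be b"" (a "NULL" export name) or contain non-ASCII bytes
    return image[offset:end]

name = read_export_name(b"\xe9t\xe9\x00", 0)
print(name.decode("latin-1"))  # display non-ASCII names without assuming an encoding
```

Returning raw bytes keeps the decision about how to display odd names (or ad text) with the caller.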
33:19 - 33:23

I also just tried to see the different possibilities
33:23 - 33:26

and created a few files that had the maximum number of sections
33:26 - 33:31

the limit is 96 under XP, and 64K under Vista and [Windows] 7
33:31 - 33:33

which means, well
33:33 - 33:36

OllyDbg 2 - the latest OllyDbg - gives you a funny message
33:36 - 33:38

but it still loads the file.
33:38 - 33:40

OllyDbg 1 crashes directly on this file
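A tool that wants to mimic these limits could read NumberOfSections (the WORD at offset 2 of IMAGE_FILE_HEADER) and compare it with a per-version maximum. The sketch below takes the limits as stated in the talk; treating "64K" as 0xFFFF, the largest value the 16-bit field can hold, is an assumption, and the helper names are hypothetical.

```python
import struct

# Limits as stated in the talk; 0xFFFF is assumed for "64K" since the field is 16 bits wide
SECTION_LIMITS = {"XP": 96, "Vista/7": 0xFFFF}

def number_of_sections(image: bytes) -> int:
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    # IMAGE_FILE_HEADER starts after the 4-byte 'PE\0\0' signature; NumberOfSections is at +2
    return struct.unpack_from("<H", image, e_lfanew + 4 + 2)[0]

def accepted_by(count: int) -> list:
    """Windows versions whose section limit this count stays within."""
    return [version for version, limit in SECTION_LIMITS.items() if count <= limit]

print(accepted_by(96))      # ['XP', 'Vista/7']
print(accepted_by(0x1000))  # ['Vista/7']
```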
33:42 - 33:43

err... still some time?
33:45 - 33:48

and the last one, not very visual, but I noticed
33:48 - 33:52

that the AddressOfIndex of the TLS is overwritten on loading
33:52 - 33:59

and imports - the imports terminator doesn't need to be five null DWORDs
33:59 - 34:03

as long as the name [of the DLL] is 0, the import descriptor
34:03 - 34:05

is considered a terminator
34:05 - 34:09

so, basically, if you make AddressOfIndex point to the name of an import descriptor
34:10 - 34:15

you could get that overwritten, and then the imports will be truncated
34:15 - 34:16

will be considered truncated
34:16 - 34:20

and actually, the behavior is different under XP and Windows 7
34:20 - 34:25

so, under XP, it's overwritten after the imports are loaded,
34:25 - 34:28

so the whole imports table is not truncated,
34:28 - 34:32

while under Windows 7, it's happening before the imports are loaded,
34:32 - 34:35

which means you have the same PE, but different loading behaviour
34:35 - 34:37

under different versions of Windows
34:37 - 34:41

and the file works on both versions of Windows
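The interaction can be modelled in a few lines: import descriptors are 20 bytes (five DWORDs), Name is the fourth one, and, per the talk, a zero Name is enough to end the list. The Python sketch below simulates both orderings. It assumes the TLS index written to AddressOfIndex is 0 (the trick needs a zero there); all helper names and RVAs are hypothetical.

```python
import struct

DESCRIPTOR_SIZE = 20   # IMAGE_IMPORT_DESCRIPTOR: five DWORDs
NAME_OFFSET = 12       # Name is the fourth DWORD

def count_import_descriptors(memory, imports_rva: int) -> int:
    """Walk descriptors until one has Name == 0, the terminator condition described in the talk."""
    count = 0
    while struct.unpack_from("<I", memory, imports_rva + count * DESCRIPTOR_SIZE + NAME_OFFSET)[0] != 0:
        count += 1
    return count

def simulate_load(memory: bytearray, imports_rva: int, address_of_index: int, tls_first: bool) -> int:
    """Return how many DLLs get imported, depending on when the TLS index is written."""
    if tls_first:  # Windows 7 ordering, according to the talk
        struct.pack_into("<I", memory, address_of_index, 0)
        return count_import_descriptors(memory, imports_rva)
    loaded = count_import_descriptors(memory, imports_rva)  # XP ordering
    struct.pack_into("<I", memory, address_of_index, 0)
    return loaded

def make_imports(n: int, imports_rva: int = 0x100) -> bytearray:
    """Hypothetical image: n descriptors with non-zero Name RVAs, then an all-zero terminator."""
    memory = bytearray(0x400)
    for i in range(n):
        struct.pack_into("<I", memory, imports_rva + i * DESCRIPTOR_SIZE + NAME_OFFSET, 0x1000 + i)
    return memory

# AddressOfIndex aimed at the Name field of the second descriptor
target = 0x100 + 1 * DESCRIPTOR_SIZE + NAME_OFFSET
print(simulate_load(make_imports(3), 0x100, target, tls_first=False))  # XP-style ordering
print(simulate_load(make_imports(3), 0x100, target, tls_first=True))   # Windows 7-style ordering
```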
34:43 - 34:46

oh wait, before that... maybe I still have some time?
34:55 - 34:56

15 minutes left? OK
34:56 - 34:58

I'll do the demo
35:01 - 35:01

This is just to prove...
35:02 - 35:03

sorry?
35:23 - 35:25

This is the kind of PE file that I typically create
35:25 - 35:29

I only defined the [required] elements it needs to work
35:29 - 35:30

and this is actually a driver
35:30 - 35:34

so, even though I used some undocumented opcodes
35:37 - 35:39

It's a working driver and it doesn't have the usual
35:40 - 35:42

[compiler] stuff you have in a driver
35:44 - 35:47

just to say that this is the kind of PoC, clear to see
35:47 - 35:51

you don't have external stuff that bothers, that clutters your view
35:51 - 35:52

or your debugging
35:52 - 36:02

so, this one is just to see the possible values of CR0
36:02 - 36:07

via SMSW, whose upper 16 bits are theoretically undefined with a DWORD register
36:07 - 36:09

but it actually gives you the same value
36:09 - 36:11

[as] the standard MOV EAX, CR0
36:11 - 36:16

and here is MOV EAX, CR0 with the wrong Mod/RM
36:16 - 36:22

which, in the latest Hiew, is actually not disassembled at all
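For context on the "wrong Mod/RM": MOV r32, CRn is encoded 0F 20 /r, where the reg field selects the control register and the r/m field the general-purpose register, and Intel's manual states that the two mod bits are ignored. A generic ModR/M decoder that honours mod will mis-size such instructions. A small Python sketch of the CPU's view (the function name is hypothetical; reading CR0 this way is privileged, which is presumably why the demo is a driver):

```python
GPR32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_from_cr(code: bytes):
    """Decode '0F 20 /r' (MOV r32, CRn) as the CPU does: mod bits ignored, always 3 bytes."""
    if len(code) < 3 or code[:2] != b"\x0f\x20":
        raise ValueError("not a MOV r32, CRn encoding")
    modrm = code[2]
    cr = (modrm >> 3) & 7       # reg field: control register number
    gpr = GPR32[modrm & 7]      # r/m field: destination register, even when mod != 3
    return f"mov {gpr}, cr{cr}", 3

for encoding in (b"\x0f\x20\xc0", b"\x0f\x20\x00", b"\x0f\x20\x05"):
    print(encoding.hex(" "), "->", decode_mov_from_cr(encoding))
```

With 0F 20 05, a decoder applying the usual memory-operand rules (mod 00, r/m 101) would expect a 4-byte displacement and lose sync with the real instruction stream.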
36:38 - 36:39

let's hope it doesn't crash...
36:43 - 36:47

so, as you can see, you get exactly the same value
36:47 - 36:53

whether you're using the normal CR0, the 'invalid' one, or the 'undefined' one
36:55 - 36:57

the upper part is supposed to be undefined
36:57 - 37:00

usually, when it's undefined in Intel language, it's zeroes
37:00 - 37:02

but here it just works fine
37:02 - 37:03

and my machine didn't even crash
37:03 - 37:05

which means the driver is fine
37:05 - 37:07

so you can study small drivers
37:08 - 37:11

the first PoC that I presented here
37:12 - 37:15

was the one with old disassembly
37:15 - 37:18

does anyone still remember what the value is?
37:20 - 37:23

so basically, some opcodes are here for garbage
37:23 - 37:28

just to prove that they are actually [supported], they are just used as junk
37:28 - 37:30

but registers are actually modified [in the others]
37:30 - 37:38

and these opcodes from the 70s, or something -- the early 80s
37:38 - 37:41

still work perfectly on a modern CPU, even an i7
37:43 - 37:48

one of the PoCs I created is the one that actually tests the values
37:48 - 37:50

-- the initial values [of each register] -- so that you can see
37:51 - 37:55

what would be the possible values whether it's on XP or Windows 7
37:56 - 38:01

each time [TLS, EntryPoint, DllMain], I just save all the values of the registers
38:01 - 38:04

and then I compare them to possible values
38:04 - 38:06

so I test them one after the other
38:06 - 38:10

actually, on TLS, you have much more control of the values
38:10 - 38:16

because the values you will get in the TLS -- on loading the TLS
38:16 - 38:20

are the RVA [of the TLS data directory], the callbacks, the size of the TLS
38:20 - 38:23

you get that in -- I forgot exactly, but it's in the source...
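The comparison step can be sketched host-side in Python: each register maps either to a set of accepted values or to a predicate, which helps for values such as stack pointers where only a property can be checked. This is a hypothetical harness, not the PoC's actual code, and the values in the example are made up.

```python
from typing import Callable, Dict, Iterable, List, Union

# Each register maps to a collection of accepted values or to a predicate
Expectation = Union[Iterable[int], Callable[[int], bool]]

def check_snapshot(snapshot: Dict[str, int], expected: Dict[str, Expectation]) -> List[str]:
    """Return the registers whose saved value is missing or matches none of the accepted values."""
    failures = []
    for reg, rule in expected.items():
        value = snapshot.get(reg)
        if value is None:
            failures.append(reg)
        elif callable(rule):
            if not rule(value):
                failures.append(reg)
        elif value not in set(rule):
            failures.append(reg)
    return failures

# Made-up snapshot and expectations, purely for illustration
saved_at_entry_point = {"eax": 0, "ecx": 0x12345678, "esp": 0x0012FF8C}
accepted = {"eax": {0}, "esp": lambda v: v % 4 == 0, "ecx": {0, 1}}
print(check_snapshot(saved_at_entry_point, accepted))  # ['ecx']
```

Keeping one expectation table per context (TLS, EntryPoint, DllMain) and per OS version would let the same checker cover the XP and Windows 7 runs.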
38:26 - 38:33

running this will help you to mimic an OS better in your emulator
38:33 - 38:35

if that's what you're interested [in]
38:35 - 38:41

SMSW is actually the one comparing -- so, using SMSW,
38:41 - 38:46

then comparing the value, then checking whether the register changed
38:46 - 38:48

[after an FPU operation] and then when it reverts normally
38:48 - 38:52

a funny fact that I would like an explanation [for],
38:52 - 38:54

if you know it
38:54 - 39:01

is that actually, this behaviour is different if you run the file normally
39:01 - 39:04

or if you run it with a redirection
39:04 - 39:08

if you pipe the output, you get a 'fail' result
39:08 - 39:11

if you run the file normally, it just works
39:11 - 39:17

so, I would like -- here, I will just run it, and then I will run it to a file, and just TYPE the result
39:22 - 39:24

normal execution: OK
39:24 - 39:26

redirection: FAIL
39:26 - 39:29

if you guys have any explanation for that, I'm all ears
39:30 - 39:37

did you try redirecting to something else? Like, a COM [port]?
39:37 - 39:39

oh, I didn't try
39:42 - 39:44

so, you would pipe to another device, and ...
39:45 - 39:46

but then, how do you get it back?
39:46 - 39:48

printer, or ...
39:48 - 39:51

yeah, I don't have a COM device or...
39:54 - 39:56

yeah, I don't know
39:56 - 39:59

but it was a big surprise, because I had a test bench
39:59 - 40:01

and then, 'FAIL'... huh?
40:02 - 40:06

run, OK... so, I have no idea why...
40:07 - 40:08

the GS trick...
40:09 - 40:10

quite simple
40:10 - 40:15

and I also have some output
40:19 - 40:21

I modify GS, then it gets reset
40:21 - 40:23

then I wait for the result
40:23 - 40:26

then I'm doing 2 resets and checking the time in between
40:26 - 40:29

so that, it shouldn't happen too quickly
40:30 - 40:31

NOPs, so...
40:37 - 40:39

I'm testing the undocumented NOPs
40:39 - 40:44

testing the NOPs whose operand is on an invalid page
40:54 - 40:55

so, standard NOP
41:00 - 41:02

32b nop
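For reference, Intel documents 0F 1F /0 as a multi-byte NOP taking a ModR/M operand and states that it does not issue a memory operation, which is consistent with a NOP whose operand points to an invalid page not faulting. The Python sketch below computes the length of 90 and 0F 1F /0 forms under 32-bit addressing. It does not cover the other reserved or hint NOP opcodes, and the function names are hypothetical.

```python
def modrm_length(code: bytes, pos: int) -> int:
    """Bytes taken by ModR/M, optional SIB and displacement, with 32-bit addressing."""
    modrm = code[pos]
    mod, rm = modrm >> 6, modrm & 7
    length = 1                          # the ModR/M byte itself
    if mod == 3:                        # register operand: nothing else follows
        return length
    if rm == 4:                         # a SIB byte follows
        length += 1
        if mod == 0 and (code[pos + 1] & 7) == 5:
            length += 4                 # SIB without base register: disp32
    elif mod == 0 and rm == 5:
        length += 4                     # absolute disp32
    if mod == 1:
        length += 1                     # disp8
    elif mod == 2:
        length += 4                     # disp32
    return length

def nop_length(code: bytes):
    """Length of a 90 or 0F 1F /0 NOP at the start of code (optional 66 prefix), else None."""
    i = 1 if code[:1] == b"\x66" else 0  # operand-size prefix
    if code[i:i + 1] == b"\x90":
        return i + 1
    if code[i:i + 2] == b"\x0f\x1f" and len(code) > i + 2 and (code[i + 2] >> 3) & 7 == 0:
        return i + 2 + modrm_length(code, i + 2)
    return None

# nop dword [eax]: the memory operand is decoded but, per Intel, never accessed
print(nop_length(b"\x0f\x1f\x00"))  # 3
```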
41:07 - 41:15

so, all my 64b tests are still done in a 32b process so that you can run them on a normal OS
41:15 - 41:19

then it detects via GS if 64b [mode] is available
41:19 - 41:21

and in this case, you would get a different result
41:21 - 41:26

so, if you run it on 64b, which I don't have here, you would get
41:26 - 41:28

the actual tests on 64b
41:28 - 41:30

and the results printed out.
41:31 - 41:35

but still, it's not possible to debug that easily [wrong]
41:35 - 41:39

but at least, there's no trick over there, so it's easy to bring back to a 64b process
41:39 - 41:43

[to step over 64b code and return to the 32b process]
41:45 - 41:46

PUSH/RET
41:48 - 41:51

you print the output, and then...
41:52 - 41:57

Olly nicely tells you that you will jump to 401008
41:58 - 42:03

but actually -- here the display is actually correct
42:03 - 42:05

and the TLS already created a null page
42:05 - 42:07

which prints 'FAIL'
42:09 - 42:14

so, as expected, but there is no standard way to disassemble that correctly
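A push imm32 / ret pair is easy to resolve statically: the target is just the immediate, which is what a debugger can display before the code runs. The sketch below (hypothetical helper) reads it from raw bytes. It only reflects the bytes and memory as they are at analysis time, so anything a TLS callback changes earlier at runtime is invisible to it.

```python
import struct

def static_push_ret_target(code: bytes, pos: int = 0):
    """Return imm32 if code[pos:] is 'push imm32' (68 xx xx xx xx) then 'ret' (C3), else None."""
    if code[pos:pos + 1] == b"\x68" and code[pos + 5:pos + 6] == b"\xc3":
        return struct.unpack_from("<I", code, pos + 1)[0]
    return None

stub = b"\x68" + struct.pack("<I", 0x401008) + b"\xc3"
print(hex(static_push_ret_target(stub)))  # the address a static view would report
```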
42:15 - 42:23

I can't execute the working 64k sections.
42:23 - 42:27

and actually I'm executing all the code [the complete virtual space of all 64k sections]
42:27 - 42:29

the sections are quite big
42:29 - 42:32

and I'm modifying EAX so that all the 00 00 are executed
42:32 - 42:35

and just to do a printf in the end.
42:35 - 42:39

it actually takes a few seconds to execute on an i7
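The bytes 00 00 decode as add byte [eax], al, so a zero-filled section executes as a long run of 2-byte instructions, each reading and writing the byte at EAX. That is why EAX has to point to writable memory first. A toy Python model (hypothetical, emulating only this one instruction):

```python
def run_zero_sled(code: bytes, memory: bytearray, eax: int, al: int) -> int:
    """Execute consecutive '00 00' (add byte [eax], al) pairs; return the bytes consumed."""
    pc = 0
    while pc + 1 < len(code) and code[pc] == 0 and code[pc + 1] == 0:
        if not 0 <= eax < len(memory):
            # On real hardware this would be an access violation
            raise MemoryError(f"add [eax], al with eax={eax:#x} outside writable memory")
        memory[eax] = (memory[eax] + al) & 0xFF
        pc += 2
    return pc

scratch = bytearray(16)
consumed = run_zero_sled(bytes(10) + b"\xc3", scratch, eax=4, al=0)
print(consumed, scratch[4])  # 10 bytes consumed; with AL = 0 the byte is left unchanged
```

With the 64K sections of the PoC, the same loop simply runs over far more bytes, which matches the few-second delay before the printf.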
42:39 - 42:43

so it's actually quite funny to see... you launch it... even when the cache is loaded,
42:43 - 42:48

and the OS is ready to be fast... you launch it... and printf comes a few seconds later
42:50 - 42:58

virtual sections is the one that Hiew doesn't consider a PE at all -- this is the latest Hiew
43:00 - 43:02

well, it's been patched anyway
43:02 - 43:08

well, I can't browse PE now that it doesn't think it's a PE file...
43:08 - 43:13

but basically, it thinks that the OPTIONAL_HEADER points to the end of the file -- beyond the end of
43:13 - 43:14

the file
43:14 - 43:15

the folded header...
43:17 - 43:18

a few error messages...
43:18 - 43:20

because of the wrong data directories
43:20 - 43:23

and the actual DD are at the start of...
43:30 - 43:31

...the section
43:33 - 43:41

this would be the imports and the actual real DD
43:42 - 43:49

and last, the one with the TLS AddressOfIndex that is pointing...
43:58 - 44:01

...inside the imports, at the AddressOfName
44:02 - 44:04

so it will overwrite the loading [overwrite the pointer during loading]
44:04 - 44:11

and when you just load it, it just says 'it's XP' because
44:11 - 44:14

my imports were loaded this way, and not the other way.
44:14 - 44:17

and if you run that file [under W7], it will give you a different result
44:17 - 44:18

and then, the exports...
44:19 - 44:24

where some of the exports are actually very long
44:24 - 44:30

you can see that actually, here I'm taking over the disassembly
44:30 - 44:33

so I'm repeating the same fake opcodes and address
44:33 - 44:36

so you fool the disassembler that way
44:37 - 44:40

I think it's just a visual effect; there's no big problem
44:40 - 44:43

but it's a known problem that was fixed recently in IDA
44:43 - 44:47

that if you put an export in the middle of an instruction
44:47 - 44:49

the fake export will actually take over the disassembly,
44:49 - 44:52

and that would ruin the disassembly
44:52 - 44:56

there's actually a PoC for that in Corkami, of course
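Why an export inside an instruction hurts: if a disassembler treats every export address as a code entry point, it may start decoding in the middle of an instruction and produce a second, conflicting listing of the same bytes. The toy Python sketch below (a hypothetical decoder covering four opcodes only, with made-up bytes) decodes the same six bytes from offset 0 and from offset 1.

```python
def linear_sweep(code: bytes, start: int) -> list:
    """Tiny linear-sweep decoder for four opcodes; anything else is emitted as a raw byte."""
    listing = []
    pos = start
    while pos < len(code):
        op = code[pos]
        if op == 0xB8 and pos + 5 <= len(code):      # mov eax, imm32
            imm = int.from_bytes(code[pos + 1:pos + 5], "little")
            listing.append((pos, f"mov eax, {imm:#x}"))
            pos += 5
        elif op == 0xEB and pos + 2 <= len(code):    # jmp short rel8
            rel = int.from_bytes(code[pos + 1:pos + 2], "little", signed=True)
            listing.append((pos, f"jmp short {pos + 2 + rel:#x}"))
            pos += 2
        elif op == 0x90:
            listing.append((pos, "nop"))
            pos += 1
        elif op == 0xC3:
            listing.append((pos, "ret"))
            pos += 1
        else:
            listing.append((pos, f"db {op:#04x}"))
            pos += 1
    return listing

CODE = bytes([0xB8, 0x90, 0x90, 0xEB, 0x01, 0xC3])
print(linear_sweep(CODE, 0))  # from the real entry point
print(linear_sweep(CODE, 1))  # from an "export" placed inside the mov
```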
44:57 - 45:00

so, that's all for the demos
45:05 - 45:09

so, I wanted to know more about x86 and PE
45:10 - 45:12

which are far from perfectly documented
45:12 - 45:14

and are still not perfectly documented,
45:14 - 45:18

but at least, I've been covering some parts of it,
45:18 - 45:20

there are still some gray areas,
45:20 - 45:23

but at least, every day, I'm just learning a bit more,
45:23 - 45:26

and publishing my results and sharing them openly,
45:27 - 45:31

like WinDbg, if you follow only the official documentation,
45:32 - 45:36

you will only get bad results with the malware and packers out there,
45:36 - 45:40

if you - yourself - are interested, or you develop a tool, an emulator, an engine, whatever...
45:40 - 45:44

well you know you can just visit Corkami, read the pages,
45:44 - 45:48

download the PoCs, which are [freely] available,
45:48 - 45:50

and if you find any bugs - which might happen,
45:50 - 45:54

then send me a postcard, or a red-cross T-shirt
45:57 - 46:01

Thanks to Peter Ferrie, and all my reviewers, and people who contributed...
46:01 - 46:02

do you have any questions?
46:03 - 46:10

did you run them through AVs - antivirus scanners? you would find a sh*tload of 0days
46:10 - 46:22

no; I wouldn't be good at actually turning them into exploits or anything, so...
46:23 - 46:29

already breaking all the disassemblers and stuff was good enough for me
46:29 - 46:33

I found a crash in Intel XED, which was good enough
46:40 - 46:44

any other question? everybody survived the presentation?
46:45 - 46:47

it's a great talk, man
46:47 - 46:48

thank you!
46:48 - 46:50

THANK YOU! [for watching]

Title:: #days: Ange Albertini: Such a weird processor - messing with x86 opcodes
Description:: #days Security & Risk Conference: Ange Albertini: Such a weird processor - messing with x86 opcodes

Video Language:: English
Duration:: 47:07

	ange.albertini edited English subtitles for #days: Ange Albertini: Such a weird processor - messing with x86 opcodes