-
Such a weird processor - messing with x86 opcodes... and a little bit of PE [Portable Executable]
-
So welcome. ...And especially let me know if I speak too quickly. Um, so -- who I am -- oh, yes so
-
I will talk about opcodes and a little bit about the PE [portable executable] file format and their oddities. So, I've been
-
a reverse engineer for some years, for some time. I created a project called Corkami.
-
Also in the past I worked on the MAME arcade emulator, and professionally I am a malware analyst, but
-
this is only on the behalf of my hobbies, this is my own experiments and research at home.
-
So, let me introduce Corkami. Corkami is just the name of the project I created for RCE [reverse code engineering].
-
I tried to keep it just to the technical stuff, no ads, no login required.
-
Really direct to the good stuff.
-
I try to update it and make it useful, so I also created cheat sheets and the kind of easy documents
-
that I would use for work on a daily basis,
-
but it's only a hobby; I do that once the kids are asleep
-
and late at night, so it probably doesn't look as professional
-
and as good as I would like it to be.
-
So right now, Corkami, the form of Corkami, is wiki pages and cheat sheets
-
and I focus on creating as many as possible relevant proof of concepts [Hi Bob!]
-
so the binaries are hand-written, usually I don't use a compiler, I create the PE (structure) myself
-
so that it's only focusing on the exact interesting point
-
and you don't have a lot of noise even -- you don't probably
-
need IDA to actually understand what's going on
-
because I try to focus only on what's important.
-
The binaries are all directly available to download so you can
-
really test your debugger, your tools, your knowledge
-
and just get them directly from that.
-
So far, I've focused on the PDF, assembly and the PE..
-
...file format. A few other things too, but those are mainly the most
-
covered subject of my website. And I share that with a
-
very permissive license so BSD you can reuse them commercially
-
whatever. Even the images are done in open-source format.
-
So the story behind this presentation is that some time ago
-
I was young and innocent and I thought that CPUs, being
-
electronic - whatever - they had to be perfectly logical and no problems
-
and then I was tricked by malware. And basically
-
IDA wasn't able to work on it, so I decided to go back
-
to the basics and study assembly and PE files from scratch.
-
I created in the meantime documents on Corkami
-
and now I'm presenting you more or less the final results.
-
or at least good progress results. If I wasn't -- if I was just a
-
guy who learned assembly I probably wouldn't be in HashDays
-
to talk about it, if I didn't get a few achievements from
-
various tools. So basically I failed all the disassemblers that I tried
-
and I also created a few crashes - in IDA. I insist that all
-
the authors were notified and most of the bugs are already fixed, but
-
basically it was like this in 6.1 -- you get a direct crash -- but
-
now it's fixed in 6.2, and everything.
-
And Hiew [Hacker's view] - that's the latest version - but the newest and released,
-
- well, the newest beta - fixed that and so on.
-
So the agenda for the presentation is that I first try with
-
an easy introduction, but I assume that most of you already know or are familiar with disassembly, right?
-
Yes. And another question: are you all familiar with
-
or you already had an event of undocumented disassembly in your ... or never?
-
Like, you trust IDA and that's all.
-
Like, is it a common thing to have an undocumented disassembly in IDA?
-
Raise your arms -- okay, not so much.
-
Okay. So then after the introduction (that will go quickly),
-
I will mention a few tricks, then introduce CoST, the program that I created.
-
And I will also talk a little bit more about the PE file format.
-
So as you all have assembly knowledge I will go quickly on that.
-
So basically, you compile a binary, there is assembly, there is
-
some relevance, some common points between the [source] code and the assembled [generated] code.
-
Then of course there is a relation between the opcode and the [assembly] code, you all know that.
-
What is important is that the assembly is generated by the compiler, but
-
from that assembly, what's actually kept in the binary are only the opcodes themselves, which are understood
-
directly by the CPU, which means the CPU just knows
-
what to do with the bytes, it doesn't care if you or the
-
tool you're using know what it will do, because it just does it.
-
And the problem is that what we read is not usually the opcodes for most people but actually the disassembly
-
and if the disassembler doesn't give you any result, well,
-
we're stuck, we're blind, we don't know what execution will do.
-
And the other problem is because of the opcode length you
-
don't know what the next instruction will be because you
-
don't know how to disassemble it.
-
So, here I just create one undocumented opcode in a simple program.
-
So basically we just '_emit' -- [it's] a keyword in -- that's Visual Studio 2010 ultimate --
-
you will get a byte that is unidentified at disassembly
-
so you get question marks, so basically this program
-
even though it costs several thousand dollars is not able
-
to -- it doesn't know what will happen.
-
So usually if you do that... Oh, yeah, if you check the Intel documentation
-
there is nothing to see at the D6 opcode, there is nothing to see there.
-
Microsoft doesn't say anything, Intel doesn't say anything,
-
so usually if you try that you could expect bad results.
-
So, not documented, directly: usually it is a crash or not the expected result.
-
But here, in this case, this specific case, no problem.
-
We don't know what it was; if we follow the Intel or Microsoft documentation, we don't know what happened.
-
But if we -- the CPU just does its stuff. So what happened is that actually
-
D6 is a very simple opcode, that doesn't do much, but somehow it's not documented by Intel
-
[but] it's documented by AMD, and most of the opcodes are actually documented by AMD
-
but not Intel. I don't know why, if anyone has any idea why...
-
It's quite a trivial opcode, but it's not -- Intel still says there's nothing there. Okay.
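[Illustration, not from the talk] The D6 byte is commonly known as SALC ("set AL from carry"); the talk itself does not name it, so treat that name as an outside assumption. A minimal Python model of what it is generally understood to do, with a hypothetical function name:

```python
def salc(eax: int, carry_flag: bool) -> int:
    """Model of the D6 byte (commonly called SALC).

    AL becomes 0xFF when the carry flag is set and 0x00 otherwise;
    the upper 24 bits of EAX are left untouched.
    """
    al = 0xFF if carry_flag else 0x00
    return (eax & 0xFFFFFF00) | al


print(hex(salc(0x12345678, True)))
print(hex(salc(0x12345678, False)))
```

A disassembler that prints question marks for this byte still has to know it is one byte long, otherwise every following instruction is decoded from the wrong offset.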
-
So the common use for those undocumented opcodes is malware
-
and packers, just to prevent automated analysis or easy reverse-engineering.
-
What's funny is that, Intel, if you follow the documentation you will have many holes, but Intel's own disassembler,
-
Xed, which is free to use, though not open source, just handles
-
all these opcodes correctly, while Microsoft, and Visual Studio, and WinDBG, they follow blindly the documentation.
-
So you will get question marks even though Intel knows perfectly what it does.
-
So it's like "[...] do as I disassemble and don't read my documentation."
-
So - of course - you could argue that WinDBG is only made to debug what the compiler,
-
Microsoft compiler created, but then it kind of rules out WinDBG as a malware debugging tool,
-
because you just inserted D6, it's trivial, and WinDBG is just not able to tell you what the instructions
-
are. So it's not very useful for malware analysis -- for a malware analysis debugger
-
So, another problem that happens is that of course each of the
-
undocumented things, facts, are available, maybe one
-
you will have in a trojan, one in a packer, and everything, but it's not so easy
-
to find a good, exhaustive, clean test set to actually
-
gather all these undocumented facts.
-
So, for example, someone - a colleague - mentions an undocumented
-
opcode or behaviour, and then you say "oh yeah, it's
-
in MebRoot [MBR infector], or you skip this part of the file or whatever",
-
and then you are actually, you know first it's a malware so you have -- you cannot
-
really spread that, and then there is a lot of noise -- the malware payload or something before and
-
after -- so it's not so easy to analyse. So that's why I focused on creating a small and clean test
-
set that would actually provide --- insists just on one particular instruction or fact.
-
So, now let's start, at last, the real stuff, and a few of the undocumented opcodes.
-
But before I actually started [studying], [I was] wondering what the actual possibilities of the CPUs, I didn't even know
-
what are the possibilities, what are the opcodes that are still supported or not by the -- by the CPU.
-
And I think it's a bit like English, everybody, or most people in the world, would be able to read and
-
understand these words, and if you['ve] see[n] some disassembly [before] then well you are used to seeing these opcodes,
-
they are made by all the compilers and they are so common that if they are not here then we are a bit
-
ill-at-ease, and if it's something different then we probably would be surprised.
-
So this is standard English, but the Intel CPUs were made in the 70s, so it'd be the same as if you take
-
Shakespearean English, so you could say that it's still English, but mmm... You know, I don't know what that means actually...
-
or maybe I forgot, I quickly forgot at least, and it's a bit the same
-
for those opcodes which are still supported by all the CPUs that we have -- all the Intel CPUs -- but
-
we probably don't know what they actually do, and that's a problem.
-
I actually made, one of the proof of concepts that I made was only using these old opcodes, and these
-
old opcodes are actually doing something, so if someone is familiar with reading that, maybe I should
-
ask "how old are you?", because myself I am used to the PUSH/JUMP/CALLs, but when it's about this,
-
mmm... what is exactly being done. And it's still working on an i7, and it's still usable by malware,
-
packers or anything, and yet some of them are -- totally unused now and they are still fully working on
-
modern CPUs.
-
And of course, it's a bit like English, it's an evolving language, and a bit like maybe the oldest generations
-
of people -- of humans wouldn't be used to the buzzwords - the latest buzzwords.
-
These opcodes are sometimes present in the most recent CPUs, so, and you have direct opcodes for
-
CRC32 or AES decryption, string matching, and then some complex operation, in just one opcode.
-
So this, this is possible, this exists in modern CPUs. Not all of them, of course.
-
One thing that I like is the MOVBE -- move big endian -- opcode, because move big endian is the rejected
-
offspring, it's only implemented in the Atom CPU, which means this netbook has -- supports this opcode
-
and the i7 64-bit doesn't have this opcode, even though it will have CRC32 or maybe AES [op]code, so...
-
so much for complete backward compatibility.
-
There is no physical CPU as far as I know that can emulate -- execute CRC32 and MOVBE.
-
And of course, MOVBE is quite meaningless itself because you already have an opcode for the big --
-
endian-ness swapping. So I don't know, this small computer has an opcode that most PC's don't.
-
Okay. Why? I don't know. If you know...
-
[Audience member:] "Is this opcode documented in the CPU feature set?"
-
Yeah.
-
Yeah, it's totally -- this MOVBE -- it's totally documented, it's official.
-
[Audience member:] "But, no; is it like a CPU flag just for this instruction or is it implicit by 'this
-
is an Atom CPU'?"
-
Uh... Yeah, I don't know. I check the value by CPUID but I don't know if it's relevant to the... but
-
I think it's by itself. ...but the CPUID result is so big that I don't remember it all.
-
Uh, another thing, a bit specific to Windows in my case, because I focus on malware, is that before you do
-
actually any opcode, I was focusing on what are the register values when you start a program, and I found
-
out that the register values by default when you start a program and you haven't executed, theoretically, any opcode,
-
- theoretically - actually give you some information that is actively used in malware.
-
So for example, at the start point, EAX tells you whether it's an older generation (XP or before),
-
or Vista or later.
-
This is not so used by malwares, I don't recall seeing it, but GS, if GS is null, then it's a 32-bit
-
system, and if it's not it's a 64-bit system.
-
I will actually use that later in one of the tricks.
-
And also, the relations between the registers -- there are many registers on the Intel CPUs -- is not
-
sometimes very clear. I was surprised that when you do a FPU operation, it changes the FPU status, the
-
FPU registers themselves, but also the MMX registers, and somehow all the documentations I saw on the
-
internet are always mapping ST0 and MM0 in front of each other which makes sense, but actually if you
-
modify -- if you just do a single FPU operation, it will actually modify not MM0, but MM7.
-
So if you do an FPU operation like "load PI" [FLDPI] and then you check the value of MM7, that could be used
-
as a trick or it's just like the way it is.
-
And that's not what you'd expect from all the documentation, Wikipedia and so on, that I could find about the overlapping of the registers.
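[Illustration, not from the talk] One way to see why MM7 changes: the MMX registers alias the physical FPU registers, while ST0 is relative to the stack TOP pointer, which a load decrements. The class below is a toy model under that assumption; `X87Model` and its methods are hypothetical names, and a real MMn only exposes the 64-bit mantissa of the 80-bit register.

```python
class X87Model:
    """Toy model of the x87 register stack and its MMX aliasing."""

    def __init__(self):
        self.top = 0             # TOP field, assumed 0 on a freshly initialised FPU
        self.phys = [None] * 8   # physical registers R0..R7

    def fld(self, value):
        # A load is a push: TOP is decremented modulo 8, then ST0 is written.
        self.top = (self.top - 1) & 7
        self.phys[self.top] = value

    def st(self, i):
        # ST(i) is relative to TOP.
        return self.phys[(self.top + i) & 7]

    def mm(self, n):
        # MMn is NOT relative to TOP: it aliases physical register Rn.
        return self.phys[n]


fpu = X87Model()
fpu.fld("pi")   # stands in for FLDPI
print(fpu.top, fpu.st(0), fpu.mm(7), fpu.mm(0))
```

In this model the first load lands in physical register 7, so ST0 and MM7 show the same content while MM0 is still untouched.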
-
Another thing is that this was used as an anti-emulation trick in XP, that FPU also changes CR0
-
so you have quite an unexpected anti-emulation trick by just using FPU operation.
-
So here it is; basically 'store machine status word' [SMSW] is an older 286 CPU opcode -- mnemonic, that was
-
created at the 286 era, so before the protected mode was fully created, and so it allows you to access
-
to read the value of CR0, even from user mode, while the 'MOV CR0' is actually a privileged opcode.
-
For some reason, the higher word of the register is undefined officially by the documentation, so Intel
-
just says "this is the value -- the lowest value is correct but you cannot expect the real value". So for
-
some reason, I don't know why they say that, because it's actually the value - the higher bits - of CR0.
-
And under XP, when you do FPU operations, the value of CR0 will be modified, and eventually reverts
-
by itself. So you can have, just by doing -- SMSW, and then you expect the result, then
-
you do a FPU operation, then the result should be different, and then eventually the result will revert
-
to the original value. So it's quite a tricky and unexpected anti-emulator.
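[Illustration, not from the talk] The check described above boils down to comparing three SMSW readings. The sketch below only models that comparison; the function name and the sample values are made up for illustration and are not measured CR0 values.

```python
def cr0_check_passes(before: int, after_fpu: int, later: int) -> bool:
    """True when three SMSW readings follow the pattern described for real XP.

    before    : reading taken before any FPU operation
    after_fpu : reading taken right after an FPU operation (expected to differ)
    later     : reading taken after waiting (expected to revert to 'before')
    """
    changed = after_fpu != before
    reverted = later == before
    return changed and reverted


# Hypothetical readings: one bit flips after the FPU operation, then comes back.
print(cr0_check_passes(0x8001003B, 0x80010033, 0x8001003B))
# An emulator that returns a constant value would fail the check.
print(cr0_check_passes(0x8001003B, 0x8001003B, 0x8001003B))
```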
-
You have a similar trick on 32-bit Windows, where GS is not stored in the context, so it means that on
-
thread-switch the value of GS is lost, which means if you just wait for something, GS will eventually
-
reset to 0. So if you set GS and you are stepping manually, this is slow and this creates a thread-switch,
-
so instantly GS is lost. And also, like the previous trick, if you just wait for GS not to be...
-
if you just loop until GS is not 0, this on a real system, will eventually exit from the loop.
-
But the first time, it blew my mind, I was really wondering what can happen there, there's no other thread
-
and of course in my proof of concept, it directly starts like this. What happens? What should happen now ,
-
but on a real system? Eventually, it's reset to 0.
-
Another thing is that of course it's reset to 0, but not in 0 time, so if you do wait for GS's reset
-
and then another loop, this can only happen between two resets... thread switch, which means it should
-
take a minimum of time, so you can use that for timing -- anti-emulation timing tricks.
-
Of course, I was also thinking that NOP is perfect, because NOP is NOP, it does nothing.
-
But originally NOP is 'exchange eax with eax' [xchg eax, eax], or 'ax with ax', but the problem is that NOP [encoded as] 0x90 is always doing nothing,
-
but you also have another encoding [87 c0] to do an 'exchange EAX, EAX', which again
-
doesn't do anything on 32b, but, like all the other 32b operations
-
in 64b mode, it actually resets the higher DWORD [of RAX]
-
so you have an XCHG EAX [,EAX] that does something,
-
even though at first it looks like it would do nothing
-
but hopefully in this case the 90 NOP is still doing nothing
-
and this is probably now common in malwares and stuff
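[Illustration, not from the talk] A small Python model of the difference between the two encodings, assuming the usual rule that a 32-bit register write in 64-bit mode zero-extends into the full 64-bit register. The function names are hypothetical.

```python
def nop_90(rax: int) -> int:
    """The single-byte 90 encoding: a true NOP in every mode."""
    return rax


def xchg_eax_eax_87c0(rax: int, long_mode: bool) -> int:
    """The two-byte 87 C0 encoding of XCHG EAX, EAX.

    In 32-bit mode nothing observable changes. In 64-bit mode the 32-bit
    write to EAX clears the upper DWORD of RAX.
    """
    if long_mode:
        return rax & 0xFFFFFFFF
    return rax


value = 0x1122334455667788
print(hex(nop_90(value)))
print(hex(xchg_eax_eax_87c0(value, long_mode=True)))
```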
-
HINT NOP was the multi-byte nop
-
that actually gives a hint about what will be executed next, by the CPU
-
whatever the address here [in memory referenced HINT NOP]
-
it wouldn't trigger an exception
-
but as you can see, it's really a multi-byte opcode -- it can be a very long nop
-
that's weird to say
-
another thing is, once again it's partially undocumented by Intel
-
the full range of HINT NOP encoding is bigger on AMD documentation
-
and another thing is that, because it's a multi-byte opcode
-
if you - at the end of a page - insert those bytes
-
then it will look for the operands
-
then it could trigger an exception,
-
so it's a nop that could trigger an exception if at the end of the page
-
so, thank you Intel -- or whatever, I don't know, I'm not sure
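[Illustration, not from the talk] Whether a multi-byte NOP can fault depends only on where its bytes lie relative to a page boundary. The sketch assumes 4 KB pages and uses the well-known 8-byte form `0F 1F 84 00 00 00 00 00` as an example; `crosses_page` is a hypothetical helper, and the fault only happens if the following page is not mapped.

```python
PAGE_SIZE = 0x1000  # assuming standard 4 KB pages

# An 8-byte multi-byte NOP: opcode, ModRM, SIB and a 32-bit displacement.
LONG_NOP = bytes.fromhex("0f1f840000000000")


def crosses_page(address: int, length: int) -> bool:
    """True if an instruction of 'length' bytes at 'address' spills into the next page."""
    first_page = address // PAGE_SIZE
    last_page = (address + length - 1) // PAGE_SIZE
    return first_page != last_page


# Placed 4 bytes before the end of a page, the NOP needs bytes from the next page.
print(crosses_page(0x401FFC, len(LONG_NOP)))
# Placed 8 bytes before the end, it fits entirely in the page.
print(crosses_page(0x401FF8, len(LONG_NOP)))
```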
-
MOV, once again, I thought...
-
MOV being MOV, should be perfectly logical
-
sadly not... first... all this is documented, but it's tricky
-
because -- there were even bugs for that in all the disassemblers I tried, I think
-
well, except Xed, maybe
-
you cannot do a MOV to or from CR0 with a memory operand
-
so the documentation says that the Mod/RM is ignored
-
it doesn't mean it's illegal, it's just ignored
-
so if you do this, which could lead to a crash
-
it's actually interpreted as that
-
and as far as I can remember, you'd fail all the disassemblers with that
-
until recently [ ;) ]
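[Illustration, not from the talk] A minimal decoder for the `0F 20` form (MOV from a control register) that behaves as described: the two "mod" bits of the ModRM byte are never consulted, so a byte that looks like a memory operand still decodes as a register. The function name is hypothetical and only this one opcode is handled.

```python
GPR32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_from_cr(code: bytes) -> str:
    """Decode 0F 20 /r (MOV r32, CRn), ignoring the mod bits of ModRM."""
    if len(code) < 3 or code[0:2] != b"\x0f\x20":
        raise ValueError("not a 0F 20 (MOV r32, CRn) encoding")
    modrm = code[2]
    cr_number = (modrm >> 3) & 7   # 'reg' field selects the control register
    gpr = modrm & 7                # 'r/m' field selects the general register
    # The top two bits (modrm >> 6) are deliberately not looked at.
    return f"mov {GPR32[gpr]}, cr{cr_number}"


print(decode_mov_from_cr(bytes.fromhex("0f20c0")))  # the usual register form
print(decode_mov_from_cr(bytes.fromhex("0f2000")))  # looks like a memory form
```

A disassembler that rejects the second byte sequence as invalid, or shows a memory operand, disagrees with what the CPU executes.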
-
MOVSXD is a 64b opcode, is sign-extending, so theoretically
-
it should work from a smaller register to a bigger register
-
but if you use no REX prefix, which is discouraged
-
you can actually make it work like a standard MOV,
-
and the other way around,
-
MOV from a selector to a 32b register actually works
-
so many disassemblers were disassembling that as MOV AX, CS
-
because that would make both operands the same size,
-
but actually the upper word of the target register
-
is 'undefined' but actually there is no funny thing here,
-
there's no random value, it's zeroes
-
so basically, it makes it equivalent to MOV EAX, CS
-
BSWAP is one of my favorites
-
because I think it's like an administration
-
it's supposed to just swap the endianness of the registers
-
but because of -- external reasons
-
it's never really doing the work you expect
-
so, only in 64b, it's actually correctly swapping the endianness
-
as you would expect
-
on EAX [32b], in 64b [mode], like all the 32b opcodes,
-
it will actually clear the higher dword -- ok !
-
and, on word, it's actually 'undefined' again
-
but it's commonly used in malwares and packers
-
because it just resets [the register]
-
so it's like a XOR AX, AX
-
so, with this unexplainable result, I understand
-
that Intel probably doesn't want to explain -- just say it's 'undefined'
-
because they would be too ashamed to explain
-
why we get this funny result
-
BSWAP AX is also wrongly disassembled by WinDbg and so on
-
it will be disassembled as BSWAP EAX
-
and actually, you clear the register
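[Illustration, not from the talk] The three BSWAP behaviours described above, as a Python model. The 16-bit case is officially undefined; the model follows what the talk reports (AX is cleared) and should be read as an observation, not a guarantee. All function names are hypothetical.

```python
def bswap32(eax: int) -> int:
    """Plain 32-bit byte swap."""
    return int.from_bytes(eax.to_bytes(4, "little"), "big")


def bswap32_in_long_mode(rax: int) -> int:
    """BSWAP EAX in 64-bit mode: low DWORD swapped, upper DWORD cleared."""
    return bswap32(rax & 0xFFFFFFFF)


def bswap16_as_observed(eax: int) -> int:
    """BSWAP with a 66 prefix (BSWAP AX): AX ends up zero, like XOR AX, AX."""
    return eax & 0xFFFF0000


print(hex(bswap32(0x11223344)))
print(hex(bswap32_in_long_mode(0xAABBCCDD11223344)))
print(hex(bswap16_as_observed(0x11223344)))
```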
-
can everybody understand this code?
-
anybody sees the potential trap?
-
so, it pushes an address on the stack,
-
then RETN takes the address from the stack,
-
and, basically, you just jump to an immediate value,
-
execution ordering ?
-
yeah, the execution starts here
-
???
-
no -- ok, it's not the point here
-
and of course, if you -- this is OllyDbg 1, it's fixed in OllyDbg 2
-
but OllyDbg1 is even trying to be nice,
-
telling you -- this is an automatic comment -- that RET
-
is used as a jump to
-
and, as you can see, not exactly the same [happens]
-
so, what happened ?
-
no one sees ?
-
so, basically, here, you have a 66 prefix on RETN
-
which actually makes RETN return to IP, and not EIP
-
so, actually, you don't jump to 401008, but to 00001008
-
and in this proof of concept, I mapped the NULL page
-
and I created -- added some code at this address
-
so, this is actually not a return to this []
-
but the problem is that, officially, this is also called a 'return'
-
it's not [different from the standard one] -- the disassemblers added their own, now, way of disassembling it
-
like 'small retn', ret.16, or something like this
-
but actually officially, it's the same mnemonic
-
so, the latest Hiew, I think, and that's OllyDbg 1
-
maybe the latest OllyDbg 2 fixed that
-
but you can still be tricked just by that
-
the 66 prefix - the jump to IP - also works on CALLs, RETs, LOOPs, [and JMPs]
-
so all the flow control opcodes
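[Illustration, not from the talk] The effect of the 66 prefix on the return target is just a 16-bit truncation, which reproduces the addresses quoted above. `retn_target` is a hypothetical helper; a real 16-bit RETN also pops only two bytes from the stack, which this sketch does not model.

```python
def retn_target(pushed_address: int, has_66_prefix: bool) -> int:
    """Where execution continues after RETN, given the value pushed beforehand."""
    if has_66_prefix:
        return pushed_address & 0xFFFF      # return to IP: only 16 bits are used
    return pushed_address & 0xFFFFFFFF      # normal return to EIP


print(hex(retn_target(0x401008, has_66_prefix=False)))
print(hex(retn_target(0x401008, has_66_prefix=True)))
```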
-
so, I won't enumerate all the tricks,
-
because otherwise you'll die of boredom probably
-
if you want more, then I created a page on Corkami [x86.corkami.com],
-
and I already made some graphs and cheat sheets
-
to have an easy [table] -- list of opcodes
-
and, that's quite too much theory for now...
-
So, I don't like just -- reading stuff and not having something to feed my debugger
-
so I created CoST
-
which stands for Corkami Standard Test
-
CoST is a single binary, there is no option,
-
you just run it, and it will just execute a lot of different tests
-
and then, I also made it a hardened PE,
-
so it may also help you to test the PE side of your tools
-
or your knowledge
-
but, because in hardened PE, it's actually quite difficult to debug,
-
I also made an easy PE mode so that
-
you can study only the assembly, and not have too much trouble
-
debugging it
-
so, CoST contains a lot of tests
-
classic stuff -- very trivial stuff
-
then, some more complex stuff, like JMP to IP, IRET...
-
undocumented opcodes
-
CPU specific, like MOVBE, POPCNT, CRC32
-
also some detections of OS and VM by using common opcodes
-
like, the 'red pill trick'... yeah, just SLDT execution, and you get a value, and you compare...
-
but it's 'the blue pill', or whatever...
-
and also some OS bugs because sometimes, Windows XP
-
was doing the wrong job trying to tell you which was
-
the exception that just happened, and it would be a way
-
to make the difference between an actual OS and an emulator that would try to be logical
-
CoST is written in assembly, so, there's no extra
-
it's not compiled, it's not generated, but
-
to make it self-documented, I created internal exports
-
so that each section of the file is easy to browse [to],
-
so that you will know -- if you quickly want to jump to the 64b part
-
then it's easier via the exports
-
and also I wanted it to print messages in the most convenient way
-
so, if you keep printing messages, then it will make the assembly
-
wider, I mean longer to scroll, so I used
-
Vectored Exception Handling, and a fake opcode
-
so that you have the comments of what's gonna happen,
-
appearing directly in the code
-
so it's a kind of self-documented, without a debug symbols file
-
and, you saw, it doesn't have much of output
-
but actually it has a lot of debug output
-
like 100 -- I forgot -- messages. it's even saying '[trick] I'm gonna do this'
-
and then, 'i'm gonna do that...', so
-
trying to make it helpful yet a bit hard to disassemble
-
can anyone understand what this code is doing ?
-
this is one of my favourites
-
we can't see the opcodes
-
no, there's no [opcode] trick this time
-
so, basically you push some arguments on the stack
-
you jump to here
-
basically, with the return far [RETF]... I pushed 'push_eip' on the stack
-
with a 33 word
-
so basically I will RETurn Far to this
-
basically I will return back to this EIP in selector 33
-
if this is in a 64b OS, and this is a 32b process
-
you will return back to execution here, in 64b mode
-
because selector 33 is the selector for 64b mode
-
which you can access from a 32b process
-
so basically this code will be executed first in the current selector
-
as you see, and then it's executed back on selector 33,
-
which means in 64b mode
-
so you have the same EIP, you have the same opcodes
-
but the disassembly will be different,
-
and I chose some opcodes that will make mnemonics
-
specific to each side, 32b or 64b sides
-
so, it's already quite a b*tch to disassemble
-
because, same EIP, so unless you're careful about the selector,
-
well, it's a problem
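[Illustration, not from the talk] The talk does not say which bytes were chosen, but one familiar example of bytes that read differently on each side is the 40..4F range: single-byte INC/DEC in 32-bit mode, REX prefixes in 64-bit mode. The sketch below only classifies that range; `describe_first_byte` is a hypothetical name.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def describe_first_byte(opcode: int, long_mode: bool) -> str:
    """Describe a byte in the 40..4F range for the given mode."""
    if 0x40 <= opcode <= 0x4F:
        if long_mode:
            return "REX prefix"
        mnemonic = "inc" if opcode < 0x48 else "dec"
        return f"{mnemonic} {REG32[opcode & 7]}"
    return "not modelled here"


for mode in (False, True):
    print(mode, describe_first_byte(0x48, long_mode=mode))
```

Same address, same byte, two different listings: which one is right depends only on the code selector in use at that moment.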
-
[Errata: you can debug this kind of code, check my berlinsides presentation (screencast on slide 58)]
-
http://bsx2.corkami.com , slide 58 [screencast]
-
if you run over it, you return to the original selector,
-
which is why there is the PUSH CS here
-
and you go back with the original selector
-
execution will go through quickly
-
but you cannot step through that code [WRONG, you can with WinDbg+wow64exts]
-
so, killing the disassemblers, and the debuggers
-
and yet, simple
-
so, here is the result that you get when you run CoST
-
with the latest -- well the latest public version of Hiew
-
I think it's gonna be fixed
-
so, this is a HINT NOP that's not documented by Intel
-
and it's a bit forgotten by most disassemblers
-
so, WinDbg and Hiew are giving you
-
undocumented, well -- question marks, or the Hiew style of question marks
-
then, since -- that was originally what I planned to present at Hashdays
-
but then, I decided to bring a few tricks in CoST itself, on the PE side of things
-
so, this is the header, so it has MZ, and then some text
-
so you can 'type cost.exe'
-
and it has some text - I made it type-able
-
and the NT headers - the 'PE' header, the one starting with PE
-
is actually starting at the bottom of the file -- the bottom of the file is here
-
so it's a footer
-
and I made it so the values are quite critical
-
so, they are not the ones you would expect
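[Illustration, not from the talk] The "footer" works because a parser is supposed to follow the `e_lfanew` field at offset 0x3C of the MZ header rather than assume the PE signature sits near the top. The sketch builds a synthetic blob (not a loadable PE, and not CoST's real layout; the offsets and text are made up) and locates the signature the way a careful tool would.

```python
import struct


def find_nt_headers(data: bytes) -> int:
    """Return the file offset of the 'PE\\0\\0' signature, following e_lfanew."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("e_lfanew does not point to a PE signature")
    return e_lfanew


# Synthetic example: readable text right after 'MZ', signature near the end.
blob = bytearray(0x200)
blob[0:2] = b"MZ"
text = b" hypothetical readable text"
blob[2:2 + len(text)] = text
struct.pack_into("<I", blob, 0x3C, 0x1F0)
blob[0x1F0:0x1F4] = b"PE\x00\x00"

print(hex(find_nt_headers(bytes(blob))))
```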
-
so this is the result that you would get when you were
-
loading CoST under IDA 6.1
-
so, well, some values were random and everything
-
but, if you have -- with CoST, you can test and set the value of a register
-
then compare it
-
but you cannot test all the possibilities of PE files
-
with a single file, because you have to choose
-
so, for example, CoST has no section, weird alignments and everything
-
but you cannot make all the possible cases [in a single file]
-
so, I went on and I created another page on Corkami
-
with, as usual, the proof of concepts, some graphs about the PE files and everything
-
I don't consider it finished but I consider it good enough to break
-
a bit everything
-
now, I already created more than 100 PoCs, which try
-
0 section, big alignments, huge alignments, and I have some funny results...
-
so, here is the 'virtual section table vs Hiew'
-
so, when you're in low alignments, you can have no section,
-
or the section table can be empty
-
so basically, I made the SizeOfOptionalHeader point in virtual memory space
-
which means the section table is out of the PE file [full of 00, in virtual space]
-
and Hiew doesn't like this. As a consequence, it doesn't even think it's a PE file
-
while it's fully working, but this trick only works under XP
-
because Windows 7 is a bit more picky on the unused section table values
-
so when you got some ASCII art in the Data Directories
-
you can probably guess that there is something going on
-
if you have a better ASCII art suggestion, I'm all ears
-
so, basically, this is the 'Dual PE header' that was presented by
-
Reversing Labs in BlackHat
-
so, are you familiar with that ?
-
so, basically, you extend the SizeOfHeaders so that
-
the NT headers will be actually mapped at the bottom of the file
-
so that it's far enough to reach section [not file] alignment
-
and when you load that, in memory
-
the first section will actually be mapped over it
-
the first part of the OPTIONAL_HEADER is the one used on disk
-
so, this is what is used to check if the file will load
-
but the Data Directories are read from the values in memory
-
so, first, the OPTIONAL_HEADER is parsed, mapped in memory
-
then the section is folding itself over the bottom part of the header
-
and then the true Data directories that were originally
-
at the start of the section will be taken into account
-
so all this is garbage and visible on disk, it follows the SizeOfOptionalHeader
-
but actually in memory, this is not what is parsed
-
another weird thing is that the export names can just be
-
absolutely anything, until a null character
-
which means, non ASCII, whatever
-
and another funny thing is that
-
Hiew displays them in line
-
so you can just add your own ads,
-
because those are just export names, and one of the export
-
[name] is actually more than 16 Kb
-
so that it's good enough to create a buffer overflow
-
if your tool is not careful about that
-
and it's also possible to have a NULL export [name], just a character NULL
-
and you can import a NULL API
-
no problem
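[Illustration, not from the talk] A defensive way for a tool to read an export name: accept any bytes up to the first NUL, allow the empty name, and refuse names longer than a cap instead of copying them into a fixed buffer. `read_export_name` and the 4096-byte cap are hypothetical choices for this sketch.

```python
def read_export_name(data: bytes, offset: int, max_len: int = 4096) -> bytes:
    """Return the raw bytes of a NUL-terminated name starting at 'offset'."""
    end = data.find(b"\x00", offset)
    if end == -1:
        raise ValueError("name is not NUL-terminated")
    if end - offset > max_len:
        raise ValueError("name longer than the cap")
    return data[offset:end]   # may be empty, may contain non-ASCII bytes


print(read_export_name(b"\xff\x01 your ad here\x00rest", 0))
print(read_export_name(b"\x00abc", 0))
```

Returning bytes rather than a decoded string sidesteps the question of what encoding a name with arbitrary content is supposed to be in.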
-
I also just tried to see the different possibilities
-
created a few files that had the maximum number of sections
-
the limit is 96 under XP, and 64K under Vista and [Windows] 7
-
which means, well
-
OllyDbg 2 - the latest OllyDbg - gives you a funny message
-
but it still loads the file.
-
OllyDbg 1 crashes directly on this file
-
err...still some time ?
-
and the last one, not very visual, but I noticed
-
that the AddressOfIndex of the TLS is overwritten on loading
-
and imports - the terminator of imports doesn't need to be five null dwords
-
but only if the name [of the DLL] is 0, then the import descriptor
-
is considered a terminator
-
so, basically, if you make AddressOfIndex point to the name of an import descriptor
-
you could get that overwritten, and then the imports will be truncated
-
will be considered truncated
-
and actually, the behavior is different under XP or Windows 7
-
so, under XP, it's overwritten after imports loading,
-
so the whole imports table is not truncated,
-
while under Windows 7, it's happening before the imports are loaded,
-
which means you have the same PE, but different loading behaviour
-
under different versions of windows
-
and the file works on both versions of windows
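[Illustration, not from the talk] A sketch of the import-table walk with the terminator rule described above (only the Name field is checked), followed by a simulated overwrite of one Name field. The table contents and RVAs are made up, and writing the value 0 as the TLS index is an assumption for the example.

```python
import struct

# OriginalFirstThunk, TimeDateStamp, ForwarderChain, Name, FirstThunk
DESCRIPTOR = struct.Struct("<5I")
NAME_FIELD_OFFSET = 12


def walk_imports(table: bytes) -> list:
    """Return the Name RVAs of all descriptors before the first one with Name == 0."""
    names = []
    for off in range(0, len(table) - DESCRIPTOR.size + 1, DESCRIPTOR.size):
        _oft, _stamp, _fwd, name_rva, _ft = DESCRIPTOR.unpack_from(table, off)
        if name_rva == 0:       # the other four fields are not consulted
            break
        names.append(name_rva)
    return names


table = bytearray(
    DESCRIPTOR.pack(0x2000, 0, 0, 0x3000, 0x2100)
    + DESCRIPTOR.pack(0x2010, 0, 0, 0x3010, 0x2110)
    + DESCRIPTOR.pack(0, 0, 0, 0, 0)
)
before = walk_imports(bytes(table))

# Simulate AddressOfIndex pointing at the second descriptor's Name field
# and the loader storing an index of 0 there (assumed value).
struct.pack_into("<I", table, DESCRIPTOR.size + NAME_FIELD_OFFSET, 0)
after = walk_imports(bytes(table))

print(before, after)
```

Whether a loader sees the `before` or the `after` list depends on whether it reads the table before or after that write, which is the XP versus Windows 7 difference described above.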
-
oh wait, before that... maybe I still have some time ?
-
15 minutes left ? ok
-
I'll do the demo
-
This is just to prove...
-
sorry?
-
This is the kind of PE file that I typically create
-
I only defined [required] elements that just need to work
-
and this is actually a driver
-
so, even though I used some undocumented opcodes
-
It's a working driver and it doesn't have the usual
-
[compiler] stuff you have in a driver
-
just to say that this is the kind of PoC, clear to see
-
you don't have external stuff that bothers you, that bugs your view
-
or your debugging
-
so, this one is just to see the possible values of CR0
-
via the SMSW, theoretically undefined on DWORD
-
but it actually gives you the same value
-
[like] the standard MOV EAX, CR0
-
and here is MOV EAX, CR0 with the wrong Mod/RM
-
which, in the latest Hiew, is actually not disassembled at all
-
let's hope it doesn't crash...
-
so, as you can see, you get exactly the same value
-
whether you're using the normal CR0, the 'invalid' one, and the 'undefined'
-
the upper part is supposed to be undefined
-
usually when it's undefined, it's zeroes, in Intel language
-
but here it just works fine
-
and my machine didn't even crash
-
which means the driver is fine
-
so you can study small drivers
-
the first PoC that I presented here
-
was the one with old disassembly
-
anyone still knows what the value is?
-
so basically, some opcodes are here for garbage
-
just to prove that they are actually [supported], they are just used as junk
-
but registers are actually modified [in the others]
-
and these opcodes from the 70's, or something -- the early 80's
-
are still perfectly working on a modern CPU or even an i7
-
one of the PoCs I created is the one that actually tests the values
-
-- the initial values [of each register] -- so that you can see
-
what would be the possible values whether it's on XP or Windows 7
-
each time [TLS, EntryPoint, DllMain], I just save all the values of the registers
-
and then I compare them to possible values
-
so I test them one after each other
-
actually, on TLS, you have much more control of the values
-
because the values you will get in the TLS -- on loading the TLS
-
are the RVA [of the TLS data directory], the callbacks, the size of the TLS
-
you get that in -- I forgot exactly, but it's in the source...
-
running this will help you to mimic an OS better in your emulator
-
if that's what you're interested [in]
-
SMSW is actually the one comparing -- so, using SMSW,
-
then comparing the value, then checking whether the register changed
-
[after an FPU operation] and then when it reverts normally
-
a funny fact that I would like an explanation [for],
-
if you know it
-
is that actually, this behaviour is different if you run the file normally
-
or if you run it with a redirection
-
if you pipe the output, you get a 'fail' result
-
if you run the file normally, it just works
-
so, I would like -- here, I will just run it, and then I will run it to a file, and just TYPE the result
-
normal execution: OK
-
redirection: FAIL
-
if you guys have any explanation for that, I'm all ears
-
did you try redirecting to something else ? like, a COM
-
oh, I didn't try
-
so, you would pipe to another device, and ...
-
but then, how do you get it back ?
-
printer, or ...
-
yeah, I don't have a COM device or...
-
yeah, I don't know
-
but it was a big surprise, because I had a test bench
-
and then, 'FAIL'. .. uh ?
-
run, OK... so, I have no idea why...
-
the GS trick...
-
quite simple
-
and I also have some output
-
I modified GS, then it gets reset
-
then I wait for the result
-
then I'm doing 2 resets and checking the time in between
-
so that, it shouldn't happen too quickly
-
NOPs, so...
-
I'm testing the undocumented NOPs
-
testing the NOPs that are on an invalid page
-
so, standard NOP
-
32b nop
-
so, all my 64b tests are still done in a 32b process so that you can run them on a normal OS
-
then it detects via GS if 64b [mode] is available
-
and in this case, you would get a different result
-
so, if you run it on 64b, which I don't have here, you would get
-
the actual tests on 64b
-
and the results printed out.
-
but still, it's not possible to debug that easily [wrong]
-
but at least, there's no trick over there, so it's easy to bring back to a 64b process
-
[to step over 64b code and return to the 32b process]
-
PUSH/RET
-
you print the output, and then...
-
Olly nicely tells you that you will jump to 401008
-
but actually -- here the display is actually correct
-
and the TLS already created a null page
-
which prints 'FAIL'
-
so, as expected, but there is no standard way to disassemble that correctly
-
I can't execute the working 64k sections.
-
and actually I'm executing all the code [the complete virtual space of all 64k sections]
-
the sections are quite big
-
and I'm modifying EAX so that all the 00 00 are executed
-
and just to do a printf in the end.
-
it actually takes a few seconds to execute on an i7
-
so it's actually quite funny to see... you launch it... even when the cache is loaded,
-
and the OS is ready to be fast... you launch it... and printf comes a few seconds later
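-
[A small Python sketch, not from the talk, of why EAX matters here: in 32-bit mode the byte pair 00 00 decodes as ADD BYTE PTR [EAX], AL, so every pair of zero bytes writes through EAX, and EAX has to point at writable memory for the run of zeroes to execute. The decoder below only handles the simple register-indirect forms of opcode 00, and the region size is a hypothetical figure.]

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_add_rm8_r8(code):
    """Decode opcode 00 /r (ADD r/m8, r8), register-indirect forms only."""
    opcode, modrm = code[0], code[1]
    if opcode != 0x00:
        raise ValueError("not an ADD r/m8, r8 opcode")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (4, 5):
        raise ValueError("SIB, displacement and register forms are not handled")
    return f"ADD BYTE PTR [{REG32[rm]}], {REG8[reg]}"


def zero_fill_instruction_count(size_in_bytes):
    """Each 00 00 pair is one instruction, so a zero-filled region holds size // 2."""
    return size_in_bytes // 2


print(decode_add_rm8_r8(b"\x00\x00"))
# Hypothetical region size, for illustration only.
print(zero_fill_instruction_count(0x10000))
```

[Every one of those instructions is a read-modify-write on the byte EAX points to, which fits with the run taking a visible amount of time.]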
-
virtual sections is the one that Hiew doesn't recognize as a PE at all -- this is the latest Hiew
-
well, it's been patched anyway
-
well, I can't browse PE now that it doesn't think it's a PE file...
-
but basically, it thinks that the OPTIONAL_HEADER points to the end of the file -- beyond the end of
-
the file
-
the folded header...
-
a few error messages...
-
because of the wrong data directories
-
and the actual DD are at the start of...
-
...the section
-
this would be the imports and the actual real DD
-
and last, the one with the TLS AddressOfIndex that is pointing...
-
...inside the imports, at the AddressOfName
-
so it will overwrite the loading [overwrite the pointer during loading]
-
and when you just load it, it just says 'it's XP' because
-
my imports were loaded this way, and not the other way.
-
and if you run that file [under W7], it will give you a different result
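-
[A toy Python model, not from the talk, of the ordering effect described above. It assumes that the loader writes a TLS index of 0 to AddressOfIndex, that this address aliases the Name field of an import descriptor, and that a descriptor with a null Name ends the import walk. The function name, the DLL names and the use of strings instead of RVAs are all hypothetical; the model does not say which Windows version uses which order.]

```python
def load(descriptor_names, index_slot, tls_first, tls_index=0):
    """Return the import descriptors a toy loader would process.

    descriptor_names: Name field of each import descriptor, in table order.
    index_slot: which descriptor's Name field AddressOfIndex points at.
    tls_first: True if the TLS index is written before imports are walked.
    """
    names = list(descriptor_names)

    def write_tls_index():
        names[index_slot] = tls_index  # overwrites the aliased Name field

    def walk_imports():
        loaded = []
        for name in names:
            if not name:  # assumed terminator: descriptor with a null Name
                break
            loaded.append(name)
        return loaded

    if tls_first:
        write_tls_index()
        return walk_imports()
    loaded = walk_imports()
    write_tls_index()
    return loaded


imports = ["kernel32.dll", "msvcrt.dll"]  # hypothetical import table
print(load(imports, index_slot=1, tls_first=True))
print(load(imports, index_slot=1, tls_first=False))
```

[The same file yields two different sets of loaded imports depending only on the order of the two steps, which is what lets the PoC tell the systems apart.]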
-
and then, the exports...
-
where some of the exports are actually very long
-
you can see that actually, here I'm taking over the disassembly
-
so I'm repeating the same fake opcodes and address
-
so you fool the disassembler that way
-
I think it's just a visual effect, there are no big problems
-
but it's a known problem that was fixed recently in IDA
-
that if you put an export in the middle of the instruction
-
the fake export will actually take over the disassembly,
-
and that would ruin the disassembly
-
there's actually a PoC for that in Corkami, of course
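-
[A toy Python listing generator, not from the talk and not how IDA works internally, showing the effect of a label placed in the middle of an instruction. It assumes a naive tool that always restarts decoding at a label; the four-opcode table and the sample bytes are chosen only for illustration.]

```python
# Lengths and mnemonics for the few real x86 opcodes this toy understands.
LENGTHS = {0x90: 1, 0xC3: 1, 0xB8: 5, 0x68: 5}
NAMES = {0x90: "nop", 0xC3: "ret", 0xB8: "mov eax, imm32", 0x68: "push imm32"}


def listing(code, labels=()):
    """Linear listing that restarts decoding at every label (e.g. an export)."""
    out, pos = [], 0
    while pos < len(code):
        opcode = code[pos]
        end = pos + LENGTHS[opcode]
        # First label that falls strictly inside the current instruction, if any.
        cut = min((label for label in labels if pos < label < end), default=None)
        if cut is not None:
            out.append((pos, "db (data)"))  # bytes before the label become data
            pos = cut
        else:
            out.append((pos, NAMES[opcode]))
            pos = end
    return out


code = bytes([0xB8, 0x90, 0x90, 0x90, 0xC3, 0xC3])  # mov eax, 0xC3909090 / ret

print(listing(code))               # what actually executes from offset 0
print(listing(code, labels=(1,)))  # with a fake export at offset 1
```

[With the fake export, the immediate operand of the MOV is shown as NOPs and a RET, so the listing no longer matches what the CPU runs from offset 0.]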
-
so, that's all for the demos
-
so, I wanted to know more about x86 and PE
-
which are far from perfectly documented
-
and are still not perfectly documented,
-
but at least, I've been covering some parts of it,
-
there are still some gray areas,
-
but at least, every day, I'm just learning a bit more,
-
and publishing my results and sharing them openly,
-
like WinDbg, if you follow only the official documentation,
-
you will only get bad results, with the malware and packers out there,
-
if you - yourself - are interested, or you develop a tool, an emulator, an engine, whatever...
-
well you know you can just visit Corkami, read the pages,
-
download the PoCs, which are [freely] available,
-
and if you find any bugs - which might happen,
-
then send me a postcard, or a red-cross T-shirt
-
Thanks to Peter Ferrie, and all my reviewers, and people who contributed...
-
do you have any questions ?
-
did you run them through AVs - antivirus scanners? you would find a sh*tload of 0days
-
no, and then, I wouldn't be good at actually turning them into exploits or anything, so...
-
already breaking all the disassemblers and stuff was good enough for me
-
I found a crash in Intel XED, which was good enough
-
any other questions? everybody survived the presentation?
-
it's a great talk, man
-
thank you!
-
THANK YOU! [for watching]