Sunday, March 14, 2010

X86 Machine Code

A while ago someone asked how to encode and execute x86 machine code. I replied with a small tutorial on the subject, which I thought I should expand and share here.

X86 instructions range from 1 byte to 15 bytes long. An example of a 1-byte instruction is NOP (the no-operation instruction, 10010000b or 0x90). From a bird's eye view, the x86 instruction format looks like this:
byte: 0,1,2,3   4,5     6        7                   8,9,10,11     12,13,14,15
func: prefix    opcode  reg/mem  scaled indexed mem  displacement  imm data
*Although this layout adds up to 16 bytes, an actual instruction cannot exceed 15 bytes, because some of these fields are mutually exclusive.

Now let's encode a simple x86 instruction, "mov $0x8888, %eax". The MOV encoding varies with the operands, the memory addressing scheme and so on. Here we want to move the immediate value 0x8888 into register EAX. Because EAX is a 32-bit register, in 32-bit mode the immediate is stored as 4 bytes, least significant byte first, which gives the following format:
opcode    w (8/32bit)    eax    byte0     byte1     byte2     byte3
1011      1              000    10001000  10001000  00000000  00000000   or 0xB8 0x88 0x88 0x00 0x00

Now comes the fun part, executing the instruction :) I will use a simple C program to call the code and watch the results with gdb. There are two ways to call this code. The first and easy way is to just call it: cast the buffer to a function pointer and call it.

The second way, which is harder but more educational, involves poking around the stack a little bit. I won't fully explain it, but basically we call a function that overwrites its own return address on the stack with the address of the opcode buffer, so that when the function returns, execution continues at the buffer (think buffer overflow):
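char opcode[] = "\xB8\x88\x88\x00\x00"; /* mov $0x8888,%eax: B8 + 4-byte little-endian immediate */
void run(void)
{
    long *ret;
    ret = (long *)&ret + 2;  /* return address on stack (assumes 32-bit build, frame pointer, no stack protector) */
    *ret = (long)opcode;     /* now run() will return to opcode */
}

int main(void)
{
    /* ((void(*)(void))opcode)(); */
    run();
    return 0;
}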
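Compile with (on a current 64-bit system you will likely also need -m32 and -fno-stack-protector, and possibly -z execstack so the buffer is allowed to execute):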
gcc mov.c -o mov -ggdb
This is the complete gdb session:
$ gdb mov                 //start gdb on the program
(gdb) break run           //set break point at run() function 
(gdb) display /i $pc      //add a display to see the inst mnemonic 
(gdb) run                 //run the program  
(gdb) nexti               // skip instructions until you see ret
(gdb) nexti
0x0804835e in run () at mov.c:7
0x804835e <run+26>:  ret  //return to the opcode address
(gdb) nexti
0x0804954c in opcode ()
1: x/i $pc
0x804954c <opcode>:  mov  $0x8888,%eax  //Finally our hand coded instruction
(gdb) nexti               //one more nexti to execute the instruction 
(gdb) info registers      //dump registers  
 eax 0x8888       34952   //and eax now holds 0x8888 !
 ecx 0xbf877640  -1081641408
 edx 0xbf877620  -1081641440  

That's it for today. I hope this small tutorial has inspired you to start experimenting yourself.