ItsMods

Reverse Engineering: A Beginner's Guide to x86 Assembly
NOTE: I did not make this. Credit goes to "Guy" from gamerzplanet.
I thought it was a good explanation of assembly, so I decided to share it.



Assembly is considered the lowest level of programming languages; it's as low-level[24] as you can go while still writing human-readable code. Because every executable ultimately runs as machine code, and assembly is a direct, readable representation of that machine code, it is also very powerful when trying to learn what a specific executable does. For example, if a program encrypts certain types of files and you need to learn how its encryption algorithm[25] works, you would disassemble[26] the program. From there, assuming you know assembly, you may be able to understand what the program does (more importantly, what that algorithm is, which would allow you to write a decryption algorithm).

Disassemblers and debuggers usually display numbers in hexadecimal (base 16), so it should be understood how that number system is organized:

0 = 0, 1 = 1, 2 = 2, 3 = 3, 4 = 4, 5 = 5, 6 = 6, 7 = 7, 8 = 8, 9 = 9
A = 10
B = 11
C = 12
D = 13
E = 14
F = 15

(The above shows numbers from base 16, the hexadecimal system, to base 10, the standard decimal system)
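To check a conversion yourself, here is a minimal Python sketch (Python is only used as a calculator here, and `hex_to_decimal` is a hypothetical helper name). Each hex digit is worth 16 times the digit to its right, so the function builds the value one digit at a time:

```python
def hex_to_decimal(digits: str) -> int:
    # Multiply the running total by 16 and add the next digit's value
    value = 0
    for ch in digits:
        value = value * 16 + "0123456789ABCDEF".index(ch.upper())
    return value

print(hex_to_decimal("F"))          # 15
print(hex_to_decimal("10"))         # 16
print(hex_to_decimal("FF"))         # 255
print(hex_to_decimal("01009000"))   # 16814080
print(int("01009000", 16))          # same result using Python's built-in
print(f"{255:08X}")                 # 000000FF, padded to 8 digits like a debugger shows it
```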

Firstly, assembly is entirely about data manipulation (in general, that's all programming is: manipulating data and directing the hardware to do what you want). Put simply, three things are usually being modified:
1) The stack
2) Registers/Flags
3) The memory of a program

Now, to explain each of the above:
1) The stack is a region of memory that works like a stack of plates: the last value placed on top is the first one taken off. It is used for handing off parameters[9] to functions[9], saving the contents of registers, and storing other miscellaneous data.

2) Registers are small storage locations inside the CPU used for carrying out various operations (comparing data, arithmetic functions[27], logical operations[18], etc.; registers used this way are dubbed "general purpose registers"). Usually, they'll store numbers or addresses[19], in sizes from 8 bits up to 32 bits (it's possible to go higher than 32 bits, for example the 64-bit registers of x86-64, but most beginners won't need to know about that). Flags are single bits, stored together in a special register (EFLAGS), that record the outcome of the previous operation. For example, the overflow flag, or OF, is set to 1 when the result of a signed arithmetic operation[4] is too large or too small to fit in the destination register: adding 1 to 7FFFFFFF in a 32-bit register gives 80000000, which as a signed number has wrapped from the largest positive value to the most negative one, so OF is set to 1.
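To make the flags idea concrete, here is a simplified Python model of a 32-bit ADD (Python only stands in for the CPU here; `add32` and `to_signed` are hypothetical helper names). It computes the zero flag (ZF), the carry flag (CF, unsigned overflow) and the overflow flag (OF, signed overflow); a real CPU also sets other flags that this sketch ignores:

```python
MASK = 0xFFFFFFFF  # a 32-bit register holds values 0..FFFFFFFF

def to_signed(x):
    # Read a 32-bit pattern as a two's complement signed number
    return x - 0x100000000 if x & 0x80000000 else x

def add32(a, b):
    full = a + b
    result = full & MASK                 # only the low 32 bits fit
    flags = {
        "ZF": int(result == 0),          # zero flag: result is zero
        "CF": int(full > MASK),          # carry flag: unsigned result too big
        # overflow flag: the signed sum does not match the signed result
        "OF": int(to_signed(a) + to_signed(b) != to_signed(result)),
    }
    return result, flags

result, flags = add32(0x7FFFFFFF, 1)
print(f"{result:08X}", flags)  # 80000000 {'ZF': 0, 'CF': 0, 'OF': 1}
result, flags = add32(0xFFFFFFFF, 1)
print(f"{result:08X}", flags)  # 00000000 {'ZF': 1, 'CF': 1, 'OF': 0}
```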

3) Data in the program's memory is constantly being modified as well. Since the stack and registers can only hold so much data at once, in many cases it's more practical to keep data in the program's memory and modify it there. (It should be noted that this only happens in memory, not in the file on disk: if you modified a running program to display a random popup every 15 minutes, the popup would no longer appear once you exited the program and re-opened it later.)


Modifying the stack is done in a number of ways, the most common being the PUSH and POP instructions.

In assembly, each line is one instruction[4], which takes at most three operands (parameters) and as few as none.

The PUSH instruction accepts one parameter, which is added to the top of the stack. For example:

Code:
PUSH 5
The above would push the value 5 onto the stack, so that it would look like this:

Code:
00000005
Now, it should be mentioned that most functions begin by saving the base pointer (EBP, another type of register, which will be explained further later on) on the stack, so that EBP can then be set up as a fixed reference point for accessing the stack (usually with PUSH EBP followed by MOV EBP, ESP). Therefore, at the beginning of most functions, you'll find the following line:

Code:
PUSH EBP

Assuming EBP holds 00000000, this causes the stack to start out looking like this:

Code:
00000000
From there, I can push my data onto the stack (PUSH 5, as before):

Code:
00000005
00000000
Or, I can save one of my registers by using PUSH:

Code:
PUSH EAX
(NOTE: EAX is an example of a 32-bit register; a full list of available registers and what each one is used for will be covered later.)


Assuming the value of EAX was 7C90FFDD, the stack will look like:

Code:
7C90FFDD
00000005
00000000
POP does the reverse: POP EAX would remove the top value (7C90FFDD) from the stack and load it into EAX, leaving 00000005 on top again.


That covers standard modification of the stack - we'll cover more later, such as how functions access certain portions of the stack for parameters being handed off, etc.
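The PUSH/POP behaviour above can be sketched as a small Python model. One detail worth knowing for reading real code: on x86 the stack grows toward lower addresses, so PUSH subtracts 4 from ESP before writing and POP adds 4 back after reading. The class name, the dictionary used as memory and the starting ESP value are all assumptions made for illustration:

```python
class Stack:
    # Toy model of the 32-bit x86 stack; not how any real tool works
    def __init__(self, esp=0x0012FF80):   # hypothetical starting ESP
        self.base = esp        # where the stack started
        self.esp = esp         # ESP: address of the value on top of the stack
        self.memory = {}       # address -> 32-bit value

    def push(self, value):
        # PUSH: ESP moves DOWN by 4 bytes, then the value is written there
        self.esp -= 4
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        # POP: read the value at ESP, then move ESP back UP by 4 bytes
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def dump(self):
        # Hex values from the top of the stack (newest) down to the oldest
        return [f"{self.memory[a]:08X}" for a in range(self.esp, self.base, 4)]


s = Stack()
s.push(0x00000000)   # PUSH EBP  (assuming EBP holds 00000000)
s.push(5)            # PUSH 5
s.push(0x7C90FFDD)   # PUSH EAX  (assuming EAX holds 7C90FFDD)
print(s.dump())      # ['7C90FFDD', '00000005', '00000000']
eax = s.pop()        # POP EAX
print(f"{eax:08X}")  # 7C90FFDD
print(s.dump())      # ['00000005', '00000000']
```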

There are many types of registers, but to cover the bare basics, we'll start with the general purpose registers. Note that the following are all prefixed with the letter E, for "extended", to show that they are 32-bit registers. The lower 16 bits of each can be accessed by dropping the E; for example, the 16-bit register for EAX is AX:

EAX - Accumulator Register
EBX - Base Register
ECX - Counter Register (Used for looping[20])
EDX - Data Register (Used in multiplication and division)
ESI - Source Index (Used in memory operations, as the source)
EDI - Destination Index (Used in memory operations, as the destination)

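The above registers can also be accessed in smaller portions through their 16-bit and 8-bit equivalents: for EAX, the 16-bit register is AX, and AX is in turn split into two 8-bit registers, AH (the high byte) and AL (the low byte). Likewise, for (E)BX the 8-bit registers are BH and BL, and so on. (ESI and EDI only have the 16-bit forms SI and DI; they have no 8-bit halves in 32-bit code.) When reading a disassembly, keep in mind that writing to AL, AH or AX changes part of EAX, since they are pieces of the same register.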
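The way AX, AH and AL overlap EAX can be modeled with bit masks. This is a Python sketch for illustration only (`split_eax` and `write_al` are hypothetical names), using the EAX value from the stack example:

```python
def split_eax(eax):
    # AX, AH and AL are just slices of the same 32-bit value
    ax = eax & 0xFFFF          # low 16 bits
    ah = (eax >> 8) & 0xFF     # bits 8-15
    al = eax & 0xFF            # low 8 bits
    return ax, ah, al

def write_al(eax, value):
    # Writing AL replaces only the lowest byte of EAX
    return (eax & 0xFFFFFF00) | (value & 0xFF)

ax, ah, al = split_eax(0x7C90FFDD)
print(f"AX={ax:04X} AH={ah:02X} AL={al:02X}")  # AX=FFDD AH=FF AL=DD
print(f"{write_al(0x7C90FFDD, 0x41):08X}")     # 7C90FF41
```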


Modifying registers is essential for loading data from/to the stack or from/to data in the program memory. The most used instruction for loading data into a register is the MOV instruction.

To load what's stored at the address[19] 01009000 into register EAX:

Code:
MOV EAX, DWORD PTR DS:[01009000]
One new thing was introduced on top of the MOV instruction and the EAX register: DWORD PTR DS:[Address]

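DWORD (double word) means the value is 32 bits wide, and PTR stands for "pointer": together with the square brackets, this means the 32-bit value stored at address 01009000 is being loaded, not the number 01009000 itself. DS stands for "data segment", the segment register the address is taken relative to; it is the default for most data accesses.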

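To expand, there are six "segment registers". In 32-bit Windows programs, which use a flat memory model, CS, DS, SS and ES all cover the same address space, so the prefix mostly tells you what kind of access is being made:

CS - Code Segment (Used when fetching instructions)
DS - Data Segment (Default for most data accesses)
SS - Stack Segment (Default when addressing through ESP or EBP)
ES - Extra Segment (Rarely seen, apart from the destination of string instructions such as MOVS and STOS)
FS and GS - Additional segments (On Windows, FS points to data specific to the current thread)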

There are also three pointer registers (one of them, EBP, was already referenced earlier):

EBP - Base Pointer (A fixed reference point for the current function's portion of the stack)
ESP - Stack Pointer (Points to the top of the stack, i.e. the most recently pushed value)
EIP - Instruction Pointer (Points to the address of the next instruction to be executed)



Now, apart from the MOV instruction, there is also the LEA (Load Effective Address) instruction. Unlike MOV, LEA never reads memory: it calculates the address written inside the brackets and stores that address itself in the register. This makes it useful for loading pointers[29] into registers, and because the address calculation allows addition and multiplication, it is also used for quick math operations. (NOTE: Whereas MOV can also write data into memory, LEA's destination must always be a register.)

The syntax looks the same as MOV's:

Code:
LEA EAX, DWORD PTR SS:[EBP-4]

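Note the use of the stack being referenced: [EBP-4] is the 32-bit slot 4 bytes below the address held in the base pointer, EBP, which in a typical function is the first local variable. Because this is LEA, EAX receives the address EBP-4 itself, not the value stored there.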

A better example of LEA would be:

Code:
LEA EAX, [EAX+EBX*4+256]
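This computes EAX + EBX*4 + 256 and stores the result in EAX, without accessing memory. Note the use of multiplication via the asterisk, and even addition between registers (the multiplier, or scale, can only be 1, 2, 4 or 8).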
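The MOV/LEA difference can be modeled in a few lines of Python. Everything here is made up for illustration: the register values, the single memory entry standing in for a local variable at EBP-4, and the `effective_address` helper. The 256 is treated as a decimal number, as most assemblers would read it (in hex it is 100):

```python
# Hypothetical register values and memory contents, for illustration only
regs = {"EAX": 0x10, "EBX": 0x3, "EBP": 0x0012FF80}
memory = {0x0012FF7C: 0x2A}   # pretend local variable at EBP-4

def effective_address(base, index=0, scale=1, disp=0):
    # The [base + index*scale + disp] form; scale must be 1, 2, 4 or 8
    assert scale in (1, 2, 4, 8)
    return (base + index * scale + disp) & 0xFFFFFFFF

addr = effective_address(regs["EBP"], disp=-4)

# MOV EAX, DWORD PTR SS:[EBP-4]: reads the value stored at that address
print(hex(memory[addr]))   # 0x2a

# LEA EAX, DWORD PTR SS:[EBP-4]: loads the address itself, no memory read
print(hex(addr))           # 0x12ff7c

# LEA EAX, [EAX+EBX*4+256]: plain arithmetic, 16 + 3*4 + 256
print(effective_address(regs["EAX"], regs["EBX"], 4, 256))  # 284
```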


Now, onto the easy math operations:

ADD destination, source - Adds "source" to "destination", leaving the result in "destination"
SUB destination, source - Subtracts "source" from "destination", leaving the result in "destination"

SAL destination, count - Shifts the bits of destination to the left count times, which multiplies it by 2 for each shift (e.g.: 15, binary 1111, shifted once to the left becomes 30, binary 11110).

SAR destination, count - Shifts the bits of destination to the right count times while keeping its sign, which divides it by 2 for each shift, rounding down (e.g.: 15 shifted once to the right becomes 7; -16 shifted twice to the right becomes -4).

INC destination - Increment the destination (Add one to the given value)
DEC destination - Decrement the destination (Subtract one from the given value)
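The six instructions above can be modeled on 32-bit values in Python (a sketch for illustration; the function names simply mirror the mnemonics and are not any real library). Results are masked to 32 bits so they wrap around just like a register, and the shift count is masked to 5 bits, as x86 does for 32-bit shifts. Flags are not modeled here:

```python
MASK = 0xFFFFFFFF

def to_signed(x):
    # Read a 32-bit pattern as a two's complement signed number
    return x - 0x100000000 if x & 0x80000000 else x

def add(dst, src):
    return (dst + src) & MASK

def sub(dst, src):
    return (dst - src) & MASK

def inc(dst):
    return add(dst, 1)

def dec(dst):
    return sub(dst, 1)

def sal(dst, count):
    # Shift left: zeros come in from the right, bits fall off the left
    return (dst << (count & 31)) & MASK

def sar(dst, count):
    # Arithmetic shift right: the sign bit is copied in from the left
    return (to_signed(dst) >> (count & 31)) & MASK

print(sal(15, 1))                   # 30 (15 * 2)
print(sar(15, 1))                   # 7 (15 / 2, rounded down)
print(f"{sar(0xFFFFFFF0, 2):08X}")  # FFFFFFFC (-16 / 4 = -4)
print(f"{dec(0):08X}")              # FFFFFFFF (0 - 1 wraps around to -1)
```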


The final important topics in the basics of assembly are conditional statements (if condition then statement, if not condition then statement, etc.) and looping[20].

For comparing data, the CMP instruction is used:

Code:
CMP EAX, 1
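Now, the comparison has to end up somewhere. CMP subtracts the second operand from the first, throws the result away, and only sets the flags; the instruction that follows is usually a conditional jump, which reads those flags. If the condition it tests holds (EAX is greater than, less than, or equal to 1, or one of the other variants below), a jump to a specific address is made. If not, nothing is done and execution simply continues with the next instruction.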

e.g.:

Code:
CMP EAX, 1
JE 00401000
Here, JE jumps to 00401000 only if EAX is equal to 1. The common conditional jumps are:

JE - Jump if they're equal
JNE - Jump if they're not equal
JG - Jump if they're greater than (signed)
JGE - Jump if they're greater than or equal (signed)
JL - Jump if they're less than (signed)
JLE - Jump if they're less than or equal (signed)
JA - Jump if they're above/greater than (unsigned)
JAE - Jump if they're above/greater than or equal (unsigned)
JB - Jump if they're below/less than (unsigned)
JBE - Jump if they're below/less than or equal (unsigned)

JE and JNE work for any numbers. The greater/less jumps treat both values as signed, so they work correctly with negative numbers, while the above/below jumps treat them as unsigned (where, for example, FFFFFFFF is a very large positive number rather than -1).
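Here is a Python sketch of how CMP followed by each jump decides, for 32-bit values. For readability it compares the numbers directly instead of going through the flags the CPU actually uses, and `jump_taken` is a hypothetical helper name:

```python
def to_signed(x):
    # Read a 32-bit pattern as a two's complement signed number
    return x - 0x100000000 if x & 0x80000000 else x

def jump_taken(mnemonic, a, b):
    # Models "CMP a, b" followed by the given conditional jump
    sa, sb = to_signed(a), to_signed(b)
    conditions = {
        "JE": a == b, "JNE": a != b,
        # signed comparisons
        "JG": sa > sb, "JGE": sa >= sb, "JL": sa < sb, "JLE": sa <= sb,
        # unsigned comparisons
        "JA": a > b, "JAE": a >= b, "JB": a < b, "JBE": a <= b,
    }
    return conditions[mnemonic]

# EAX = FFFFFFFF: as a signed number that is -1, unsigned it is 4294967295
print(jump_taken("JG", 0xFFFFFFFF, 1))  # False (-1 is not greater than 1)
print(jump_taken("JA", 0xFFFFFFFF, 1))  # True (4294967295 is above 1)
```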

The other instruction for comparing two numbers is TEST, which performs an AND[18] of its two operands but, rather than storing the result, only sets the flags. The next instruction then checks whether the result of the AND was zero:

JZ - Jump if the result was zero
JNZ - Jump if the result was not zero

e.g.:

Assume EAX is 00000001

Code:
TEST EAX, 1
JNZ 00401000
Since EAX is 1, 1 AND 1 gives 1, which is not zero, so the jump to 00401000 will occur.
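TEST and the two jumps can be modeled the same way. In this Python sketch (hypothetical helper names again), `x86_test` returns the zero flag that TEST would set, and the jump helpers read it:

```python
def x86_test(a, b):
    # TEST a, b: compute a AND b, discard it, and set ZF = 1 if it was zero
    return int((a & b) == 0)

def jz(zf):
    return zf == 1    # JZ jumps when ZF is set

def jnz(zf):
    return zf == 0    # JNZ jumps when ZF is clear

eax = 0x00000001
print(jnz(x86_test(eax, 1)))   # True: 1 AND 1 = 1, not zero, so JNZ jumps
print(jnz(x86_test(0x6, 1)))   # False: 6 AND 1 = 0 (bit 0 is clear), no jump
```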

Now, these tactics can also be used to repeat steps, for example:

Code:
0100739D   MOV EAX,0
010073A2   CMP EAX,5
010073A5   JE 010073B1
010073AB   INC EAX
010073AC   JMP 010073A2
010073B1   RETN
The EAX register is set to zero, then EAX is compared to 5. If EAX has the value 5, JE jumps to the RETN instruction[21] to exit the function. Otherwise, execution continues: INC EAX adds 1 to EAX, and JMP 010073A2 jumps back to the CMP. This repeats until EAX is eventually 5, at which point JE jumps to the RETN.
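For comparison, here is the same loop written as a Python sketch, with each assembly instruction noted in a comment. It is a hand translation for illustration (`count_to_five` is a hypothetical name), not what any particular decompiler would output:

```python
def count_to_five():
    eax = 0              # 0100739D  MOV EAX,0
    while True:
        if eax == 5:     # 010073A2  CMP EAX,5 / 010073A5  JE 010073B1
            return eax   # 010073B1  RETN
        eax += 1         # 010073AB  INC EAX
        # 010073AC  JMP 010073A2: go back to the comparison

print(count_to_five())   # 5
```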



And that's the basics of assembly.
I'm not gonna read that... ;)
But ok!