Thursday, December 13, 2012

Assembly programming using gcc (part 1)


First of all, why the hell do we need to know about assembly programming when we have compiled high-level languages?
Because it is fun :)! Then, instead of why, we should ask: when do we need to use assembly instead of a high-level language?
In fact, assembly language is used primarily for direct hardware manipulation, access to specialized processor instructions, or to address critical performance issues. Typical uses are device drivers, low-level embedded systems, and real-time systems. We won't address all these issues; we'll focus only on how to write pieces of code in assembly on x86 hardware using the gcc tool chain.
How is it done? There are two ways:
  1. Inline assembly: Assembly instructions are written directly in the C code file
  2. Separate assembly code file: a .S file containing only assembly instructions, assembled and linked like any other C source file.

X86 Assembly and the gcc tool chain

GCC uses AT&T assembly syntax. Compared with Intel syntax, AT&T syntax uses the opposite order for source and destination operands: the source comes first and the destination last. For example, to move the constant 1 into the eax register we have:
AT&T]  movl $1, %eax
Intel] mov eax, 1
Note that:
1) Register operands are preceded by the "%" character, including segment registers.
2) Immediate operands are preceded by the "$" character.
3) The size of memory operands is specified by the last character of the instruction mnemonic: "b" (8-bit), "w" (16-bit), and "l" (32-bit).

Inline assembly

In C, the inline keyword asks the compiler to insert the code of a function into the code of its caller at the point where the call is made. Such functions are called "inline functions". The benefit of inlining is that it removes function-call overhead. In the same spirit, inline assembly is a way to tell the compiler to insert a fragment of assembly code at a specific position; the compiler does not compile the fragment but passes it on to the assembler.
asm() is the key statement to inject assembly code in a C program. Let's see its syntax:

 asm ( assembler template 
           : output operands                  /* optional */
           : input operands                   /* optional */
           : list of clobbered registers      /* optional */
           );
where:
  • assembler template is a sequence of quoted assembly statements, separated by ';' or '\n';
  • input and output operands are the C expressions that the code reads and writes, each preceded by a constraint string;
  • clobbered registers is a list of registers that the inlined assembly code modifies without declaring them as outputs.
Let's see an example. Imagine you want to write an extra super fast C function that performs the addition of two integers. Here is the code:
int MyAdd(int x, int y) { // add x and y and return the result
  int z;
  asm( "addl %%ebx, %%eax;\n"
      : "=a" (z)
      : "a" (x), "b" (y) );
  return z;
}
The assembly statement addl %%ebx, %%eax; adds the content of the ebx register to the content of the eax register and puts the result in eax. The doubled %% is needed because, when an asm statement has operands, a single % introduces an operand reference.
The first colon introduces the output operands. "=a" (z) means that the output is the eax register and that the content of this register must be written to the z variable. The '=' sign marks a write-only output operand. 'a' is called a constraint. Other constraints are:
r : any register
b : %ebx, %bx, %bl
c : %ecx, %cx, %cl
d : %edx, %dx, %dl
S : %esi, %si
D : %edi, %di
The second colon introduces the input operands section of the asm statement. All the input operands are listed here, with the same syntax used for the output operands. More than one operand can be specified by separating them with commas.
In the next example, a personalized integer division function, we'll see another method to use C variables inside the inline assembly.

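void MyDiv(int arg1, int arg2, int *quo, int *rem) {
  asm ( "movl %2, %%eax;"   /* dividend -> eax */
        "movl %3, %%ebx;"   /* divisor  -> ebx */
        "cltd;"             /* sign-extend eax into edx:eax */
        "idivl %%ebx;"      /* eax = quotient, edx = remainder */
        : "=&a" (*quo), "=&d" (*rem)
        : "g" (arg1), "g" (arg2)
        : "ebx", "cc" );
}

The meaning of this function is straightforward. First, the two operands arg1 and arg2 are loaded into eax and ebx respectively. Then cltd sign-extends eax into edx, because idivl divides the 64-bit value edx:eax by its operand (simply putting 0 in edx would give a wrong result for a negative dividend). Finally, idivl performs the integer division and puts the quotient and the remainder in the eax and edx registers respectively. It is easy to recognize the use of %n to represent the n-th operand, counting from 0 across the output and then the input section of the asm statement: here %2 is arg1 and %3 is arg2. Since the results must reach the caller, quo and rem are passed as pointers. The '&' in "=&a" and "=&d" tells gcc that these registers are written before all the inputs have been read, so no input may be placed in them, and ebx is in the clobber list because the code overwrites it.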

And what if we don't like AT&T syntax? Don't worry, gcc comes to our rescue with the directive ".intel_syntax noprefix;". To use Intel syntax, add this directive at the beginning of the inline assembly code. Do not forget to put things back as they were by adding the directive ".att_syntax prefix;" at the end to switch back to AT&T syntax. Otherwise, the assembler will try to assemble the rest of the program using Intel syntax while gcc continues to produce AT&T code.

Happy inline assembly programming :).
