Tuesday, 18 December 2012

Assembly programming using gcc (part 2)

In the previous post we saw the inline assembly programming features of gcc. It is now time to look at how to interface C programs with a separate assembly code file (a .S file containing only assembly instructions, assembled and linked like any other C file).
To do this we need to understand how the C compiler implements two well-known mechanisms: parameter passing by value and parameter passing by reference.
High-level languages have standard ways to pass data, known as calling conventions. For high-level code to interface with assembly language, the assembly language code must use the same conventions as the high-level language. These conventions allow one to create subprograms that are re-entrant. A re-entrant subprogram may be called safely at any point of a program (even inside the subprogram itself). The major concern of these conventions is the set of rules governing the use of the system stack. Before calling the function, the caller must push onto the stack either the values of the parameters, when passing by value, or the addresses of the parameters, when passing by reference.
Let us examine the following C code example:

void f(char a, char b, char c)
{
   char buffer1[6];
   int d;
}
int main(void) {
  f(1,2,3);
  return 0;
}

If we want to see how the C compiler implements the function's parameter passing mechanisms in assembly language, there are two possibilities:

  1. Feed gcc with this code and the -S option, i.e. gcc -S example.c;
  2. Disassemble the executable produced by the compiler.

Because in Eclipse it is so straightforward to see the disassembled code (just look at the disassembly window of the debugger perspective :)), I've captured this window in the figure on the left, where we can see the assembly translation of the previous C code in AT&T syntax.
Here we can see that the caller, before invoking the function f, places the values of the three parameters onto the stack:

20 f(1,2,3);
004013b4: movl $0x3,0x8(%esp)
004013bc: movl $0x2,0x4(%esp)
004013c4: movl $0x1,(%esp)
004013cb: call 0x40138c <f>


Note that gcc, to speed things up, prefers to use movl instructions that write directly into stack space it has already reserved, instead of push instructions.

The effect is the same as:
push $3
push $2
push $1
call f
So, the stack before the call of the function f, together with the memory layout of the executing process, is as depicted in the following figure:

Note that, depending on the implementation, the stack grows either downward (towards lower memory addresses) or upward. On Intel, Motorola, SPARC and MIPS processors the stack grows downward. What the stack pointer (SP) designates is also implementation dependent: it may point to the last item placed on the stack, or to the next free location after it. On Intel processors it points to the last item pushed.
When the call instruction is executed to invoke f, the microprocessor saves the return address (the address of the instruction following the call) onto the stack and jumps to the first instruction of f.
At the beginning of f, the EBP register is saved onto the stack and is then loaded with the current address of the top of the stack (ESP). Why? Because the C calling convention requires the value of EBP to be unmodified by the function call. If this is not done, it is very likely that the program will not work correctly. So the next two assembly instructions are the standard prologue of the skeleton of any assembly routine linkable with C programs:
f:
0040138c: push %ebp
0040138d: mov %esp,%ebp
Note that it is mandatory, at the end of the procedure, to restore the original value of EBP and to deallocate the local variables by restoring the initial stack pointer ESP. This is done with the following two assembly statements, which form the standard epilogue of our function skeleton:
mov %ebp,%esp   # deallocate locals
pop %ebp        # restore original EBP value
Going back to our disassembly listing, we can see that the next instruction written by the compiler is:
0040138f:   sub $0x1c,%esp
Here the compiler is making room on the stack for the local variables of f. This is done by decreasing the stack pointer ESP by 0x1c (decimal 28). Remember that the stack grows toward low memory addresses. But hey, wait a moment! Why does the compiler reserve 28 bytes for the two variables buffer1 and d, which need only sizeof(buffer1)+sizeof(d)=6+4=10 bytes?
The exact amount is the compiler's choice and can change with the gcc version and its alignment options, but for this listing the answer can be put together from several pieces:

  • First of all, the compiler keeps objects on the stack aligned to the word size. A word in our case is 4 bytes, or 32 bits. So our 6-byte buffer takes two words, that is 8 bytes instead of 6!
  • Then, we have to reserve room for d, which is one word or 4 bytes.
  • Then, the compiler wants to reserve room for a copy of the function parameters. There are three char parameters, but each char takes 1 word. So we have 12 bytes for the parameters.
  • Finally, we must take into account that we pushed the 4-byte EBP register onto the stack, which accounts for 4 more bytes.

Summing up all the previous quantities, we have 8+4+12+4=28 bytes. In this way, the compiler creates a safe 28-byte stack frame that isolates the current context of the function from any new frame that might be needed later, for example for calls to other functions inside f or for recursive calls of f. Puff... Puff... Puff.
This was a little boring! Wasn't it?
Anyway, let's try to summarize what we have learned so far. The skeleton of an assembly routine callable from a C program should be:
subprogram_label:
  push %ebp
  mov %esp,%ebp
  sub $SZ,%esp      # SZ = number of bytes needed by local variables
  # subprogram code
  mov %ebp,%esp
  pop %ebp
  ret

Note that the prologue and epilogue of a subprogram can be simplified by using two special instructions designed specifically for this purpose. The ENTER instruction performs the prologue code and the LEAVE instruction performs the epilogue. ENTER takes two immediate operands. The first operand is the number of bytes needed by local variables. For the C calling convention, the second operand is always 0. LEAVE has no operands.

To understand how to access the passed parameters, we can look at how the compiler did it in our simple program:
mov 0x8(%ebp),%ecx   # move the value of a into ecx
mov 0xc(%ebp),%edx   # move the value of b into edx
mov 0x10(%ebp),%eax  # move the value of c into eax

It means that in our stack frame we have:

Location   : data 
EBP + 0x10 : content of c
EBP + 0xc  : content of b
EBP + 0x8  : content of a
EBP + 4    : Return address
EBP        : saved EBP

In the case of parameters passed by reference, the caller pushes the address of the variable instead of its content; the called routine uses that address to read and, if needed, modify the content of the passed variable.

So far we have seen how to deal with a void function, but what happens when we need to write a non-void function?

The C calling conventions specify how this is done. Return values are passed via registers. All integral types (char, int, enum, etc.) are returned in the EAX register. If they are smaller than 32 bits, they are extended to 32 bits when stored in EAX. (How they are extended depends on whether they are signed or unsigned types.) 64-bit values are returned in the EDX:EAX register pair. Pointer values are also stored in EAX. Floating point values are stored in the ST0 register of the math co-processor.

The C calling conventions specify also that after the subprogram is over, the parameters that were pushed on the stack must be removed by the caller program.  Other conventions are different. For example, the Pascal calling convention specifies that the subprogram must remove the parameters before returning to the caller program.



Thursday, 13 December 2012

Assembly programming using gcc (part 1)


First of all, why the hell do we need to know about assembly programming when we have compiled high-level languages?
First of all, because it is fun :)! Then, instead of why, we should ask when we need to use assembly instead of high-level languages.
In fact, assembly language is used primarily for direct hardware manipulation, access to specialized processor instructions, or to address critical performance issues. Typical uses are device drivers, low-level embedded systems, and real-time systems. We won't address all these issues; we'll focus only on how to write pieces of code in assembly on x86 hardware using the gcc tool chain.
How is it done? There are two ways:
  1. Inline assembly: assembly instructions are written directly in the C code file.
  2. Separate assembly code file: a .S file containing only assembly instructions, assembled and linked like any other C file.

X86 Assembly and the gcc tool chain

GCC uses AT&T assembly syntax. AT&T syntax uses the opposite operand order to Intel syntax: the source comes first and the destination second. For example, to move the constant 1 into the eax register we have:
AT&T:  movl $1, %eax
Intel: mov eax, 1
Note that:
1) Register operands are preceded by the "%" character, including segment registers.
2) Immediate operands are preceded by the "$" character.
3) The size of memory operands is specified by the last character of the instruction mnemonic: "b" (8-bit), "w" (16-bit), and "l" (32-bit).

Inline assembly

In C, the inline keyword asks the compiler to insert the code of a function into the code of its caller at the point where the call is made. Such functions are called "inline functions". The benefit of inlining is that it reduces function-call overhead. Similarly, inline assembly is a way to tell the compiler to insert a fragment of assembly code at a given position without trying to compile it.
asm() is the key statement for injecting assembly code into a C program. Let's see its syntax:

 asm ( assembler template 
           : output operands                  /* optional */
           : input operands                   /* optional */
           : list of clobbered registers      /* optional */
           );
where:
  • assembler template is a list of quoted assembly statements, each terminated with '\n';
  • input and output operands are the C expressions read and written by the assembly code;
  • clobbered registers is a list of the registers that are modified by the inlined assembly code.
Let's see an example. Imagine you want to write an extra super fast C function that performs the addition of two integers. Here is the code:
int MyAdd(int x, int y) { // add x and y and return the result
int z;
  asm( "addl %%ebx, %%eax;\n"
      : "=a" (z)
      : "a" (x), "b" (y) );
  return z;
}
The assembly statement addl %%ebx, %%eax; adds the content of the ebx register to the content of the eax register and puts the result in eax.
The first colon introduces the output operands. "=a" (z) means that the output is the register eax and that the content of this register must be written to the variable z. The '=' sign marks a write-only output operand. 'a' is called a constraint. Other constraints are:
r : any register
b : %ebx, %bx, %bl
c : %ecx, %cx, %cl
d : %edx, %dx, %dl
S : %esi, %si
D : %edi, %di
The second colon introduces the input operands section of the asm statement. Here all the input operands are listed with the same syntax used for the output operands. More than one operand can be specified by separating them with commas.
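In the next example, a custom integer division function, we'll see another way to refer to C variables from inside the inline assembly code.

void MyDiv(int arg1, int arg2, int *quo, int *rem) {
  asm ( "movl %2, %%eax;"   /* dividend into eax */
        "cltd;"             /* sign-extend eax into edx:eax */
        "movl %3, %%ebx;"   /* divisor into ebx */
        "idivl %%ebx;"      /* eax = quotient, edx = remainder */
        : "=&a" (*quo), "=&d" (*rem)
        : "g" (arg1), "g" (arg2)
        : "ebx", "cc" );
}

The meaning of this function is straightforward. First, arg1 is loaded into eax and cltd sign-extends it into the edx:eax pair, which is the 64-bit dividend that idivl expects (simply putting 0 in edx would give wrong results for a negative arg1). Then arg2 is loaded into ebx. Finally, idivl performs the signed integer division and puts the quotient and the remainder in the eax and edx registers respectively. It is easy to recognize the use of %n to represent the n-th operand of the asm statement, counting from 0 and starting with the outputs: here %2 is arg1 and %3 is arg2. Since C passes parameters by value, the two results are handed back through the pointers quo and rem. The '&' in "=&a" and "=&d" tells gcc that these registers are written before all the inputs have been read, so no input may be placed in them, and ebx is listed as clobbered because the code overwrites it.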

And what if we don't like AT&T syntax? Don't worry, gcc comes to our rescue with the directive ".intel_syntax noprefix;". To use Intel syntax, add this directive at the beginning of the inline assembly code. Do not forget to put things back as they were by adding the directive ".att_syntax prefix;" at the end to switch back to AT&T syntax. Otherwise, the assembler will try to assemble the rest of the program using Intel syntax while gcc continues to produce AT&T code.

Happy inline assembly programming :).

Tuesday, 4 December 2012

Are constant expressions like 1+3 computed at compile time or at execution time?

The answer is: "Constant expressions are computed at compile time".
Let's see a practical demonstration.
Consider the following code:

int main() {
int a=1+3;
}

Compiling it with gcc and the -S option, we obtain:

 .file "ottimo.c"
 .def ___main; .scl 2; .type 32; .endef
 .text
.globl _main
 .def _main; .scl 2; .type 32; .endef
_main:
 pushl %ebp
 movl %esp, %ebp
 subl $8, %esp
 andl $-16, %esp
 movl $0, %eax
 addl $15, %eax
 addl $15, %eax
 shrl $4, %eax
 sall $4, %eax
 movl %eax, -8(%ebp)
 movl -8(%ebp), %eax
 call __alloca
 call ___main
 movl $4, -4(%ebp)
 leave
 ret


As we can see, the compiler, even without any optimization option selected, folds the constant expression and translates a = 1+3 as movl $4, -4(%ebp), that is, a = 4.

Q.E.D. :)

Global, local and static variables: which is the fastest?

This is the first post of the Questions&Answers series. Under the tag QeA I'll collect all the relevant answers I post in international forums on C/C++. Hope you'll enjoy it!

The answer is: "Static variables are faster than local variables", at least in unoptimized code. Local variables are allocated on the stack and are reached through an address relative to EBP, while a static variable lives at a fixed address. A true global variable also lives at a fixed address, so it is accessed in the same way as a static one. Let's see a practical demonstration!

If we consider the following simple source code:

static int a;
int fTest();
int main() {
  int b;
  a=a+11;
  b=b+22;
}
int fTest(){
  int c;
  c=c+33;
}

Compiling it using gcc with the -S option gives us the following:

 .file "variabili.c"
 .def ___main; .scl 2; .type 32; .endef
 .text
.globl _main
 .def _main; .scl 2; .type 32; .endef
_main:
 pushl %ebp
 movl %esp, %ebp
 subl $8, %esp
 andl $-16, %esp
 movl $0, %eax
 addl $15, %eax
 addl $15, %eax
 shrl $4, %eax
 sall $4, %eax
 movl %eax, -8(%ebp)
 movl -8(%ebp), %eax
 call __alloca
 call ___main
 addl $11, _a
 leal -4(%ebp), %eax
 addl $22, (%eax)
 leave
 ret
.globl _fTest
 .def _fTest; .scl 2; .type 32; .endef
_fTest:
 pushl %ebp
 movl %esp, %ebp
 subl $4, %esp
 leal -4(%ebp), %eax
 addl $33, (%eax)
 leave
 ret
.lcomm _a,16

Note that we used the int constants 11, 22 and 33 to easily identify the fragments of assembler code of interest.
We can observe that:
  1. As for the variable b (local to main) and the variable c (local to fTest), both live on the stack: the compiler first computes their address with leal -4(%ebp), %eax and then performs the addition on that address;
  2. As for the static int variable a, the compiler uses a single addl instruction with the absolute address of the space reserved for a by the .lcomm _a,16 assembly directive.
In conclusion, it seems that static variables perform better than local variables, at least in this unoptimized build.
Q.E.D. :)

Monday, 3 December 2012

Online C/C++ curiosities - No. 1

Can't work out the type of a variable or function declaration, or of an extreme cast?
Wouldn't it be nice if someone read it to you in human language?
Or can you describe the type of your pointer to pointer in words, but not manage to translate it into code?
For answers to all these questions, get some help from cdecl.org, the simultaneous translator from C Grammelot to English and back. On the site, at the top right, you will also find the link to the source of the C program that does the translating.