Sunday 20 October 2013

Links to understand process (O.S.)

http://www.cs.uic.edu/~jbell/CourseNotes/OperatingSystems/3_Processes.html

Thursday 26 September 2013

Linux : module_init() vs. core_initcall() vs. early_initcall()


 

http://stackoverflow.com/questions/18605653/linux-module-init-vs-core-initcall-vs-early-initcall/18605952#18605952
Thursday 21 March 2013

Anatomy of Linux loadable kernel modules


For details, check out the link:
http://www.ibm.com/developerworks/linux/library/l-lkm/index.html?ca=dgr-lnxw07LinuxLKM
The Linux kernel is what's known as a monolithic kernel, which means that the majority of the operating system functionality is called the kernel and runs in a privileged mode. This differs from a micro-kernel, which runs only basic functionality as the kernel (inter-process communication [IPC], scheduling, basic input/output [I/O], memory management) and pushes other functionality outside the privileged space (drivers, network stack, file systems). You'd think that Linux is then a very static kernel, but in fact it's quite the opposite. Linux can be dynamically altered at run time through the use of Linux kernel modules (LKMs).
Linux is not the only monolithic kernel that can be dynamically altered (and it wasn't the first). You'll find loadable module support in Berkeley Software Distribution (BSD) variants, Sun Solaris, in older kernels such as OpenVMS, and in other popular operating systems such as Microsoft® Windows® and Apple Mac OS X. Dynamically alterable means that you can load new functionality into the kernel, unload functionality from the kernel, and even add new LKMs that use other LKMs. The advantage of LKMs is that you can minimize the memory footprint of a kernel, loading only those elements that are needed (which can be an important feature in embedded systems).
An LKM has some fundamental differences from elements that are compiled directly into the kernel and also from typical programs. A typical program has a main function, whereas an LKM has module entry and exit functions (in version 2.6, you can name these functions anything you wish). The entry function is called when the module is inserted into the kernel, and the exit function is called when it's removed. Because the entry and exit functions are user-defined, the module_init and module_exit macros exist to define which functions these are. An LKM also includes a required and an optional set of module macros. These define the license of the module, the module's author, a description of the module, and more. Figure 1 provides a view of a very simple LKM.

Figure 1. Source view of a simple LKM 
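
Figure 1's source listing is not reproduced here. As a sketch of what such a module looks like (it must be built with the kernel's kbuild system against matching kernel headers, so it is not a standalone program, and the names hello_init and hello_exit are invented for illustration):

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

static int __init hello_init(void)   /* entry function, run at insmod */
{
    printk(KERN_INFO "hello: module loaded\n");
    return 0;
}

static void __exit hello_exit(void)  /* exit function, run at rmmod */
{
    printk(KERN_INFO "hello: module unloaded\n");
}

/* Tell the kernel which functions are the entry and exit points */
module_init(hello_init);
module_exit(hello_exit);

/* Module macros, stored in the .modinfo section of the .ko file */
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Example Author");
MODULE_DESCRIPTION("A minimal loadable kernel module");
```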
The version 2.6 Linux kernel provides a new (simpler) method for building LKMs. When built, you can use the typical user tools for managing modules (though the internals have changed): the standard insmod (installing an LKM), rmmod (removing an LKM), modprobe (wrapper for insmod and rmmod), depmod (to create module dependencies), and modinfo (to find the values for module macros). For more information on building LKMs for the version 2.6 kernel, check out Resources.
An LKM is nothing more than a special Executable and Linkable Format (ELF) object file. Typically, object files are linked to resolve their symbols and produce an executable. But because an LKM can't resolve its symbols until it's loaded into the kernel, the LKM remains an ELF object. You can use standard object tools on LKMs (which for version 2.6 have the suffix .ko, for kernel object). For example, if you used the objdump utility on an LKM, you'd find several familiar sections, such as .text (instructions), .data (initialized data), and .bss (Block Started by Symbol, or uninitialized data).
You'll also find additional sections in a module to support its dynamic nature. The .init.text section contains the module_init code, and the .exit.text section contains the module_exit code (see Figure 2). The .modinfo section contains the various macro text indicating the module license, author, description, and so on.

Figure 2. An example of an LKM with various ELF sections 
So, with that introduction to the basics of LKMs, let's dig in to see how modules get into the kernel and are managed internally.
The process of module loading begins in user space with insmod (insert module). The insmod command defines the module to load and invokes the init_module user-space system call to begin the loading process. The insmod command for the version 2.6 kernel has become extremely simple (70 lines of code), based on a change to do more work in the kernel. Rather than insmod doing any of the symbol resolution that's necessary (working with kerneld), the insmod command simply copies the module binary into the kernel through the init_module function, where the kernel takes care of the rest.
The init_module function works through the system call layer and into the kernel to a kernel function called sys_init_module (see Figure 3). This is the main function for module loading, making use of numerous other functions to do the difficult work. Similarly, the rmmod command results in a system call for delete_module, which eventually finds its way into the kernel with a call to sys_delete_module to remove the module from the kernel.

Figure 3. Primary commands and functions involved in module loading and unloading 
During module load and unload, the module subsystem maintains a simple set of state variables to indicate the operation of a module. If the module is being loaded, then the state is MODULE_STATE_COMING. If the module has been loaded and is available, it is MODULE_STATE_LIVE. Otherwise, if the module is being unloaded, then the state is MODULE_STATE_GOING.
Let's now look at the internal functions for module loading (see Figure 4). When the kernel function sys_init_module is called, it begins with a permissions check to see whether the caller can actually perform this operation (through the capable function). Then, the load_module function is called, which takes care of the mechanical work to bring the module into the kernel and perform the necessary plumbing (I review this shortly). The load_module function returns a module reference that refers to the newly loaded module. This module is loaded onto a doubly linked list of all modules in the system, and any threads currently waiting for module state change are notified through the notifier list. Finally, the module's init() function is called, and the module's state is updated to indicate that it is loaded and live.

Figure 4. The internal (simplified) module loading process 
The internal details of module loading are ELF module parsing and manipulation. The load_module function (which resides in ./linux/kernel/module.c) begins by allocating a block of temporary memory to hold the entire ELF module. The ELF module is then read from user space into the temporary memory using copy_from_user. As an ELF object, this file has a very specific structure that can be easily parsed and validated.
The next step is to perform a set of sanity checks on the loaded image (is it a valid ELF file? is it defined for the current architecture? and so on). When these sanity checks are passed, the ELF image is parsed and a set of convenience variables are created for each section header to simplify their access later. Because the ELF objects are based at offset 0 (until relocation), the convenience variables include the relative offset into the temporary memory block. During the process of creating the convenience variables, the ELF section headers are also validated to ensure that a valid module is being loaded.
Any optional module arguments are loaded from user space into another allocated block of kernel memory (step 4), and the module state is updated to indicate that it's being loaded (MODULE_STATE_COMING). If per-CPU data is needed (as determined in the section header checks), a per-CPU block is allocated.
In the prior steps, the module sections are loaded into kernel (temporary) memory, and you also know which are persistent and which can be removed. The next step (7) is to allocate the final location for the module in memory and move the necessary sections (indicated in the ELF headers by SHF_ALLOC, or the sections that occupy memory during execution). Another allocation is then performed of the size needed for the required sections of the module. Each section in the temporary ELF block is iterated, and those that need to be around for execution are copied into the new block. This is followed by some additional housekeeping. Symbol resolution also occurs, which can resolve to symbols that are resident in the kernel (compiled into the kernel image) or symbols that are transient (exported from other modules).
The new module is then iterated for each remaining section and relocations performed. This step is architecture dependent and therefore relies on helper functions defined for that architecture (./linux/arch/<arch>/kernel/module.c). Finally, the instruction cache is flushed (because the temporary .text sections were used), a bit more housekeeping is performed (free temporary module memory, setup the sysfs), and the module is finally returned to load_module.
Unloading the module is essentially a mirror of the load process, except that several sanity checks must occur to ensure safe removal of the module. Unloading a module begins in user space with the invocation of the rmmod (remove module) command. Inside the rmmod command, a system call is made to delete_module, which eventually results in a call to sys_delete_module inside the kernel (recall from Figure 3). Figure 5 illustrates the basic operation of the module removal process.

Figure 5. The internal (simplified) module unloading process 
When the kernel function sys_delete_module is invoked (with the name of the module to be removed passed in as the argument), the first step is to ensure that the caller has permission. Next, a list is checked to see whether any other modules depend on this module. There exists a list called modules_which_use_me that contains an element per dependent module. If this list is empty, no module dependencies exist, and the module is a candidate for removal (otherwise, an error is returned). The next test is to see whether the module is loaded. Nothing prohibits a user from calling rmmod on a module that's currently being installed, so this check ensures that the module is live. After a few more housekeeping checks, the penultimate step is to call the module's exit function (provided within the module itself). Finally, the free_module function is called.
When free_module is called, the module has been found to be safely removable. No dependencies exist now for the module, and the process of cleaning up the kernel for this module can begin. This process begins by removing the module from the various lists on which it was placed during installation (sysfs, the module list, and so on). Next, an architecture-specific cleanup routine is invoked (which can be found in ./linux/arch/<arch>/kernel/module.c). The modules that depended on this module are then iterated, and the module is removed from their lists. Finally, with the cleanup complete from the kernel's perspective, the various memory allocated for the module is freed, including the argument memory, per-CPU memory, and the module ELF memory (core and init).
In many applications, the need for dynamic loading of modules is important, but once loaded, it's not necessary for the modules to be unloaded. This allows the kernel to be dynamic at startup (loading modules based on the devices that are found) but not dynamic throughout operation. If it's not required to unload a module after it's loaded, you can make several optimizations to reduce the amount of code needed for module management. You can "unset" the kernel configuration option CONFIG_MODULE_UNLOAD to remove a considerable amount of kernel functionality related to module unloading.

Friday 8 March 2013

Difference Between Semaphores and Mutex






  1. A semaphore can serve as a mutex, but a mutex can never be a semaphore. This simply means that a binary semaphore can be used as a mutex, but a mutex can never exhibit the full functionality of a semaphore.
  2. Both semaphores and mutexes (at least on the latest kernel) are non-recursive in nature.
  3. No one owns a semaphore, whereas a mutex is owned and its owner is held responsible for it. This is an important distinction from a debugging perspective.
  4. In the case of a mutex, the thread that owns the mutex is responsible for freeing it. However, in the case of semaphores, this condition is not required: any other thread can signal to free the semaphore by using the sem_post() function.
  5. A mutex, by definition, is used to serialize access to a section of re-entrant code that cannot be executed concurrently by more than one thread. A semaphore, by definition, restricts the number of simultaneous users of a shared resource up to a maximum number.
  6. Another difference that matters to developers is that named semaphores are system-wide and remain in the form of files on the filesystem unless otherwise cleaned up. Mutexes are process-wide and get cleaned up automatically when a process exits.
  7. The nature of semaphores makes it possible to use them to synchronize related and unrelated processes, as well as threads. A mutex can be used only to synchronize threads and, at most, related processes (the pthread implementation of the latest kernel comes with a feature that allows a mutex to be used between related processes).
  8. According to the kernel documentation, mutexes are lighter than semaphores. This means that a program using semaphores has a higher memory footprint than a program using mutexes.
  9. From a usage perspective, a mutex has simpler semantics than a semaphore.

Check out this link for examples: http://www.geeksforgeeks.org/mutex-vs-semaphore/

Sunday 3 March 2013

Bitwise operations in C

#include <stdio.h>
/* Function to get the number of set bits in the binary representation of the passed number */
int countSetBits(unsigned long n)
{
        unsigned int count = 0;
        /* unsigned operand so that >> shifts in zeros, even when the top bit is set */
        for (size_t i = 0; i < 8 * sizeof n; ++i, n >>= 1)
        {
                if ((n & 1) == 1)
                        ++count;
        }
        return count;
}

/* Function to get the number of clear (reset) bits */
int countResetBits(unsigned long n)
{
        unsigned int count = 0;
        for (size_t i = 0; i < 8 * sizeof n; ++i, n >>= 1)
        {
                if ((n & 1) == 0)
                        ++count;
        }
        return count;
}

/* Program to test countSetBits and countResetBits */
int main(void)
{
        unsigned long i = 5;
        printf("%d", countResetBits(i));
        printf("\n%d", countSetBits(i));
        return 0;
}


//Decimal to Binary using Bitwise AND operator

void binary(unsigned int num)
{
    unsigned int mask = 32768;   /* mask = 1000 0000 0000 0000 (prints the low 16 bits) */
    printf("Binary Equivalent : ");

    while (mask > 0)
    {
        if ((num & mask) == 0)
            printf("0");
        else
            printf("1");
        mask = mask >> 1;   /* right shift the mask */
    }
}
Output :
Enter Decimal Number : 10
Binary Equivalent : 0000000000001010


Write a C program to multiply a given number by 4

#include <stdio.h>
int main(void)
{
    long number, tempnum;
    printf("Enter an integer\n");
    scanf("%ld", &number);
    tempnum = number;
    number = number << 2;   /* left shift by two bits multiplies by 4 */
    printf("%ld x 4 = %ld\n", tempnum, number);
    printf("Enter an integer\n");
    scanf("%ld", &number);
    tempnum = number;
    number = number << 2;   /* left shift by two bits */
    printf("%ld x 4 = %ld\n", tempnum, number);
    return 0;
}

/*------------------------------
Output
Enter an integer
15
15 x 4 = 60

RUN2
Enter an integer
262
262 x 4 = 1048
---------------------------------*/

C program to check odd or even using bitwise operator
#include <stdio.h>
int main(void)
{
    int n;
    printf("Enter an integer\n");
    scanf("%d", &n);

    if ((n & 1) == 1)   /* parentheses needed: == binds tighter than & */
        printf("Odd\n");
    else
        printf("Even\n");
    return 0;
}

Write a ‘C’ program to swap two numbers using bitwise operator

#include <stdio.h>
int main(void)
{
    int x, y;
    printf("\n enter the elements\n");
    scanf("%d%d", &x, &y);
    printf("\n before swapping x=%d, y=%d", x, y);
    /* XOR swap; note it fails if x and y alias the same variable */
    x = x ^ y;
    y = y ^ x;
    x = x ^ y;
    printf("\n after swapping x=%d y=%d", x, y);
    return 0;
}

Multiplication of two numbers using BITWISE operators
//How will you multiply two numbers using BITWISE operators
#include <stdio.h>
int main(void)
{
    int a, b, result;
    printf("\nEnter the numbers to be multiplied :");
    scanf("%d%d", &a, &b);
    result = 0;
    while (b != 0)              /* iterate until b == 0 */
    {
        if (b & 01)             /* bitwise AND of b with 01 tests the lowest bit */
            result = result + a;    /* add the current value of a to the result */
        a <<= 1;                /* left shift the value in 'a' by 1 */
        b >>= 1;                /* right shift the value in 'b' by 1 */
    }
    printf("\nResult:%d\n", result);
    return 0;
}

Bitwise operation for adding or subtracting a power of two
#include <stdio.h>
#include <limits.h>

#define INT_BIT (sizeof(int) * CHAR_BIT)   /* number of bits in an int */

int main(void) {
    int L, leftShift, i, changeMask;

    leftShift = 0; /* added or subtracted value will be 2^leftShift */
    i = 25;        /* value we are adding to or subtracting from */
    changeMask = 1 << leftShift;

    for (L = leftShift; L < (int)INT_BIT; L++) {
        i ^= changeMask;            /* ripple the carry bit upward via XOR */
        if ( /* ! */ (i & changeMask)) { /* comment the "!" in or out for
                                            addition or subtraction */
            break;
        }
        changeMask <<= 1;
    }

    printf("%i", i);

    return 0;
}

Program in C to manipulate bits, bitmanip.c
//========================
//Program in C to manipulate bits, bitmanip.c
//========================
#include <stdio.h>
int main(void)
{
    int n, b;
    printf("\n Enter a number and bit to manipulate: \n");
    scanf("%d%d", &n, &b);
    printf("You entered %d and %d", n, b);
    /* Code to set the bth bit in number n */
    n = n | (1 << b);
    /* to toggle:  n = n ^ (1 << b)      */
    /* to display the bth bit:  n & (1 << b)  */
    /* to clear the bth bit:  n = n & ~(1 << b) */
    printf("\n Entered number after setting %dth bit is %d\n", b, n);
    return 0;
}

Program in C to set nth bit, bitset.c
//========================
//Program in C to set nth bit, bitset.c
//========================
#include <stdio.h>
void showbits(int nn)
{
    unsigned int m;
    m = 1u << (sizeof(nn) * 8 - 1);   /* start at the most significant bit */
    while (m > 0)
    {
        if (nn & m)
            printf("1");
        else
            printf("0");
        m >>= 1;
    }
}
int main(void)
{
    int number, bits;
    printf("Enter a number and the bit to set:\n");
    scanf("%d%d", &number, &bits);
    printf("\n You entered: %d and %d\n", number, bits);
    showbits(number);
    number = number | (1 << (bits - 1));   /* OR sets the bit; XOR would only toggle it */
    printf("\n Number now is: %d\n", number);
    showbits(number);
    printf("\n");
    return 0;
}

Program in C to check bit at nth position, bitcheckatn.c
//========================
//Program in C to check bit at nth position, bitcheckatn.c
//========================
#include <stdio.h>
void showbits(int nn)
{
    unsigned int m;
    m = 1u << (sizeof(nn) * 8 - 1);   /* start at the most significant bit */
    while (m > 0)
    {
        if (nn & m)
            printf("1");
        else
            printf("0");
        m >>= 1;
    }
}
int main(void)
{
    int number, bits;
    printf("Enter a number and the bit position to check:\n");
    scanf("%d%d", &number, &bits);
    printf("\n You entered: %d and %d\n", number, bits);
    showbits(number);
    number = number & (1 << bits);   /* nonzero iff the bit at that position is set */
    printf("\n Now number is: %d and\n the bit at position %d is:\n", number, bits);
    showbits(number);
    printf("\n");
    return 0;
}
Entered character is digit or uppercase alphabet or lowercase alphabet
#include <stdio.h>
int main(void)
{
    char ch;
    printf("enter the character\n");
    scanf("%c", &ch);
    if (ch >= '0' && ch <= '9')
        printf("\nIt is a digit");
    if (ch >= 'A' && ch <= 'Z')
        printf("\nIt is an uppercase alphabet");
    if (ch >= 'a' && ch <= 'z')
        printf("\nIt is a lowercase alphabet");
    return 0;
}

Sum of digits of a number using a recursive function
#include <stdio.h>

int sumdig(int n)
{
    if (n == 0)
        return 0;
    return n % 10 + sumdig(n / 10);   /* last digit plus digit sum of the rest */
}

int main(void)
{
    int n, sum;
    printf("enter the number");
    scanf("%d", &n);
    sum = sumdig(n);
    printf("%d", sum);
    return 0;
}

Write a ‘C’ program to convert given decimal number into binary number
#include <stdio.h>
int main(void)
{
    int n, j, a[50], i = 0;
    printf("\n enter the value :-");
    scanf("%d", &n);
    while (n != 0)
    {
        a[i] = n % 2;   /* collect remainders, least significant bit first */
        i++;
        n = n / 2;
    }
    printf("\n binary conversion\n");
    for (j = i - 1; j >= 0; j--)
        printf("%d", a[j]);
    return 0;
}

Write a ‘C’ program to check whether a number is a palindrome

#include <stdio.h>
int main(void)
{
    int n, tmp, d, rev = 0;
    printf("enter the number=");
    scanf("%d", &n);
    tmp = n;
    while (n != 0)
    {
        d = n % 10;
        rev = (rev * 10) + d;   /* build the reversed number */
        n = n / 10;
    }
    if (tmp == rev)
        printf("entered number is a palindrome");
    else
        printf("entered number is not a palindrome");
    return 0;
}


Wednesday 27 February 2013

Execution life cycle of a C program

There are different ways in which we can execute programs. Compilers, interpreters, and virtual machines are some tools that we can use to accomplish this task. All these tools provide a way to simulate in hardware the semantics of a program. Although these different technologies exist with the same core purpose - to execute programs - they do it in very different ways. They all have advantages and disadvantages, and in this chapter we will look more carefully into these trade-offs. Before we continue, one important point must be made: in principle, any programming language can be compiled or interpreted. However, some execution strategies are more natural in some languages than in others.

Compiled Programs

Compilers are computer programs that translate a high-level programming language to a low-level programming language. The product of a compiler is an executable file, which is made of instructions encoded in a specific machine code. Hence, an executable program is specific to a type of computer architecture. Compilers designed for distinct programming languages might be quite different; nevertheless, they all tend to have the overall macro-architecture described in the figure below:
(Figure: overall macro-architecture of a compiler)
A compiler has a front end, the module in charge of transforming a program written in a high-level source language into an intermediate representation that the compiler will process in the next phases. It is in the front end that we have the parsing of the input program, as we have seen in the last two chapters. Some compilers, such as gcc, can parse several different input languages. In this case, the compiler has a different front end for each language that it can handle. A compiler also has a back end, which does code generation. If the compiler can target many different computer architectures, then it will have a different back end for each of them. Finally, compilers generally do some code optimization. In other words, they try to improve the program, given a particular criterion of efficiency, such as speed, space, or energy consumption. In general, the optimizer is not allowed to change the semantics of the input program.
The main advantage of execution via compilation is speed. Because the source program is translated directly to machine code, this program will most likely be faster than if it were interpreted. Nevertheless, as we will see in the next section, it is still possible, although unlikely, for an interpreted program to run faster than its machine-code equivalent. The main disadvantage of execution by compilation is portability. A compiled program targets a specific computer architecture and will not be able to run on different hardware.

The life cycle of a compiled program

A typical C program, compiled by gcc, for instance, will go through many transformations before being executed in hardware. This process is similar to a production line in which the output of a stage becomes the input to the next stage. In the end, the final product, an executable program, is generated. This long chain is usually invisible to the programmer. Nowadays, integrated development environments (IDE) combine the several tools that are part of the compilation process into a single execution environment. However, to demonstrate how a compiler works, we will show the phases present in the execution of a standard C file compiled with gcc. These phases, their products and some examples of tools are illustrated in the figure below.
(Figure: phases in the life cycle of a compiled C program, their products, and example tools)
The aim of the steps seen above is to translate a source file to code that a computer can run. First of all, the programmer uses a text editor to create a source file, which contains a program written in a high-level programming language. In this example, we are assuming C. Every sort of text editor can be used here. Some of them provide support in the form of syntax highlighting or an integrated debugger, for instance. Let's assume that we have just edited the following file, which we want to compile:
#define CUBE(x) (x)*(x)*(x)
int main() {
  int i = 0;
  int x = 2;
  int sum = 0;
  while (i++ < 100) {
    sum += CUBE(x);
  }
  printf("The sum is %d\n", sum);
}
After editing the C file, a preprocessor is used to expand the macros present in the source code. Macro expansion is a relatively simple task in C, but it can be quite complicated in languages such as lisp, for instance, which take care of avoiding typical problems of macro expansion such as variable capture. During the expansion phase, the body of the macro replaces every occurrence of its name in the program's source code. We can invoke gcc's preprocessor via a command such as gcc -E f0.c -o f1.c. The result of preprocessing our example program is the code below. Notice that the call CUBE(x) has been replaced by the expression (x)*(x)*(x).
int main() {
  int i = 0;
  int x = 2;
  int sum = 0;
  while (i++ < 100) {
    sum += (x)*(x)*(x);
  }
  printf("The sum is %d\n", sum);
}
In the next phase, we convert the source program into assembly code. This phase is what we normally call compilation: a text written in the C grammar will be converted into a program written in the x86 assembly grammar. It is during this step that we perform the parsing of the C program. In Linux we can translate the source file, e.g., f1.c, to assembly via the command cc1 f1.c -o f2.s, assuming that cc1 is the system's compiler. This command is equivalent to the call gcc -S f1.c -o f2.s. The assembly program can be seen in the left side of the figure below. This program is written in the assembly language used in the x86 architecture. There are many different computer architectures, such as ARM, PowerPC, and Alpha. The assembly language produced for any of them would be rather different from the program below. For comparison purposes, we have printed the ARM version of the same program at the right side of the figure. These two assembly languages follow very different design philosophies: x86 uses a CISC instruction set, while ARM follows more closely the RISC approach. Nevertheless, both files, the x86's and the ARM's, have a similar syntactic skeleton. The assembly language has a linear structure: a program is a list-like sequence of instructions. On the other hand, the C language has a syntactic structure that looks more like a tree, as we have seen in a previous chapter. Because of this syntactic gap, this phase contains the most complex translation step that the program will experience during its life cycle.
# Assembly of x86                           # Assembly of ARM
  .cstring                                  _main:
LC0:                                        @ BB#0:
  .ascii "The sum is %d\12\0"                 push       {r7, lr}
  .text                                       mov       r7, sp
.globl _main                                  sub       sp, sp, #16
_main:                                        mov       r1, #2
  pushl   %ebp                                mov  r0, #0
  movl    %esp, %ebp                          str     r0, [r7, #-4]
  subl    $40, %esp                           str  r0, [sp, #8]
  movl    $0, -20(%ebp)                       stm       sp, {r0, r1}
  movl    $2, -16(%ebp)                       b LBB0_2
  movl    $0, -12(%ebp)                     LBB0_1:
  jmp     L2                                  ldr       r0, [sp, #4]
L3:                                           ldr       r3, [sp]
  movl    -16(%ebp), %eax                     mul  r1, r0, r0
  imull   -16(%ebp), %eax                     mla  r2, r1, r0, r3
  imull   -16(%ebp), %eax                     str  r2, [sp]
  addl    %eax, -12(%ebp)                   LBB0_2:
L2:                                           ldr       r0, [sp, #8]
  cmpl    $99, -20(%ebp)                      add       r1, r0, #1
  setle   %al                                 cmp  r0, #99
  addl    $1, -20(%ebp)                       str       r1, [sp, #8]
  testb   %al, %al                            ble     LBB0_1
  jne     L3                                @ BB#3:
  movl    -12(%ebp), %eax                     ldr  r0, LCPI0_0
  movl    %eax, 4(%esp)                       ldr  r1, [sp]
  movl    $LC0, (%esp)                      LPC0_0:
  call    _printf                             add       r0, pc, r0
  leave                                       bl        _printf
  ret                                         ldr       r0, [r7, #-4]
                                              mov       sp, r7
                                              pop       {r7, lr}
                                              mov       pc, lr
It is during the translation from the high-level language to the assembly language that the compiler might apply code optimizations. These optimizations must obey the semantics of the source program. An optimized program should do the same thing as its original version. Nowadays compilers are very good at changing the program in such a way that it becomes more efficient. For instance, a combination of two well-known optimizations, loop unwinding and constant propagation, can optimize our example program to the point that the loop is completely removed. As an example, we can run the optimizer using the following command, assuming again that cc1 is the default compiler that gcc uses: cc1 -O1 f1.c -o f2.opt.s. The final program that we produce this time, f2.opt.s, is surprisingly concise:
  .cstring
LC0:
  .ascii "The sum is %d\12\0"
  .text
.globl _main
_main:
  pushl %ebp
  movl  %esp, %ebp
  subl  $24, %esp
  movl  $800, 4(%esp)
  movl  $LC0, (%esp)
  call  _printf
  leave
  ret
The next step in the compilation chain consists in the translation of the assembly language to binary code. The assembly program is still readable by people. The binary program, also called an object file, can, of course, be read by human beings, but there are not many human beings who are up to this task these days. Translating from assembly to binary code is a rather simple task, because both these languages have the same syntactic structure. Only their lexical structure differs. Whereas the assembly file is written with ASCII mnemonics, the binary file contains sequences of zeros and ones that the hardware processor recognizes. A typical tool used in this phase is the as assembler. We can produce an object file with the command as f2.s -o f3.o.
The object file is not executable yet. It does not contain enough information to specify where to find the implementation of the printf function, for example. In the next step of the compilation process we change this file so that the addresses of functions defined in external libraries become visible. Each operating system provides programmers with a number of libraries that can be used together with code that they create. A special program, the linker, can find the address of functions in these libraries, thus fixing the blank addresses in the object file. Different operating systems use different linkers. A typical tool, in this case, is ld or collect2. For instance, in order to produce the executable program on Mac OS X Leopard, we can use the command collect2 -o f4.exe -lcrt1.10.5.o f3.o -lSystem.
At this point we almost have an executable file, but our linked binary program must undergo one last transformation before we can see its output. All the addresses in the binary code are relative. We must replace these addresses with absolute values that point correctly to the targets of the function calls and other program objects. This last step is the responsibility of a program called the loader. The loader dumps an image of the program into memory and runs it.

Memory Layout of C Programs



A typical memory representation of a C program consists of the following sections.
1. Text segment
2. Initialized data segment
3. Uninitialized data segment
4. Stack
5. Heap

(Figure: a typical memory layout of a running process)
1. Text Segment:
A text segment, also known as a code segment or simply as text, is one of the sections of a program in an object file or in memory, and contains executable instructions.
As a memory region, the text segment may be placed below the heap and stack in order to prevent heap and stack overflows from overwriting it.
Usually, the text segment is sharable so that only a single copy needs to be in memory for frequently executed programs, such as text editors, the C compiler, the shells, and so on. Also, the text segment is often read-only, to prevent a program from accidentally modifying its instructions.
2. Initialized Data Segment:
The initialized data segment, usually called simply the data segment, is a portion of the virtual address space of a program that contains the global variables and static variables that are initialized by the programmer.
Note that the data segment is not read-only, since the values of the variables can be altered at run time.
This segment can be further classified into initialized read-only area and initialized read-write area.
For instance, the global string defined by char s[] = "hello world" in C, and a C statement like int debug = 1 outside main (i.e. global), would be stored in the initialized read-write area. A global C statement like const char* string = "hello world" causes the string literal "hello world" to be stored in the initialized read-only area, and the character pointer variable string in the initialized read-write area.
Ex: static int i = 10 will be stored in the data segment, and global int i = 10 will also be stored in the data segment.
3. Uninitialized Data Segment:
The uninitialized data segment is often called the "bss" segment, named after an ancient assembler operator that stood for "block started by symbol." Data in this segment is initialized by the kernel to arithmetic zero before the program starts executing.
The uninitialized data segment starts at the end of the data segment and contains all global variables and static variables that are initialized to zero or do not have an explicit initializer in the source code.
For instance, a variable declared static int i; would be contained in the BSS segment, and so would a global variable declared int j;
4. Stack:
The stack area traditionally adjoined the heap area and grew in the opposite direction; when the stack pointer met the heap pointer, free memory was exhausted. (With modern large address spaces and virtual memory techniques they may be placed almost anywhere, but they still typically grow in opposite directions.)
The stack area contains the program stack, a LIFO structure, typically located in the higher parts of memory. On the standard PC x86 computer architecture it grows toward address zero; on some other architectures it grows in the opposite direction. A "stack pointer" register tracks the top of the stack; it is adjusted each time a value is "pushed" onto the stack. The set of values pushed for one function call is termed a "stack frame"; a stack frame consists at minimum of a return address.
The stack is where automatic variables are stored, along with information that is saved each time a function is called. Each time a function is called, the address of where to return to and certain information about the caller's environment, such as some of the machine registers, are saved on the stack. The newly called function then allocates room on the stack for its automatic and temporary variables. This is how recursive functions in C can work: each time a recursive function calls itself, a new stack frame is used, so one set of variables doesn't interfere with the variables of another instance of the function.
5. Heap:
The heap is the segment where dynamic memory allocation usually takes place.
The heap area begins at the end of the BSS segment and grows to larger addresses from there. The heap area is managed by malloc, realloc, and free, which may use the brk and sbrk system calls to adjust its size (note that the use of brk/sbrk and a single "heap area" is not required to fulfill the contract of malloc/realloc/free; they may also be implemented using mmap to reserve potentially non-contiguous regions of virtual memory in the process' virtual address space). The heap area is shared by all shared libraries and dynamically loaded modules in a process.
Examples.
The size(1) command reports the sizes (in bytes) of the text, data, and bss segments. (For more details, refer to the man page of size(1).)
1. Check the following simple C program
#include <stdio.h>
int main(void)
{
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text       data        bss        dec        hex    filename
960        248          8       1216        4c0    memory-layout
2. Let us add one global variable to the program, and check the size of bss again.
#include <stdio.h>
int global; /* Uninitialized variable stored in bss*/
int main(void)
{
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text       data        bss        dec        hex    filename
 960        248         12       1220        4c4    memory-layout
3. Let us add one uninitialized static variable, which is also stored in bss.
#include <stdio.h>
int global; /* Uninitialized variable stored in bss*/
int main(void)
{
    static int i; /* Uninitialized static variable stored in bss */
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text       data        bss        dec        hex    filename
 960        248         16       1224        4c8    memory-layout
4. Let us initialize the static variable, which will then be stored in the data segment (DS).
#include <stdio.h>
int global; /* Uninitialized variable stored in bss*/
int main(void)
{
    static int i = 100; /* Initialized static variable stored in DS*/
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text       data        bss        dec        hex    filename
960         252         12       1224        4c8    memory-layout
5. Let us initialize the global variable, which will then be stored in the data segment (DS).
#include <stdio.h>
int global = 10; /* initialized global variable stored in DS*/
int main(void)
{
    static int i = 100; /* Initialized static variable stored in DS*/
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text       data        bss        dec        hex    filename
960         256          8       1224        4c8    memory-layout