ARM-based SoCs are very common these days in phones, tablets and other consumer electronics. In a SoC verification environment, C tests are written to exercise data transactions across the various IPs in the system. These C tests are converted to machine code following the procedure described on this page, and the result is then loaded into memory models for the processor to execute. So let's look at how a C program stored on a hard disk is transformed into a program executing on a processor. There are four logical steps, or phases, that you need to be aware of.

Compilation

A compiler transforms source code written in a high-level programming language such as C into the target processor's assembly language. A few decades ago, much software was written directly in assembly language, because compilers were not very good at using the ISA of the target processor to produce dense, optimized code, and memory (DRAM in particular) was expensive. There was therefore a big demand for assembly-language programmers who could write dense, highly optimized code for each processor. Nowadays, compilers produce very good, highly efficient assembly code.
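To make this concrete, here is a one-line C function. Assuming a 32-bit ARM cross compiler such as arm-none-eabi-gcc (the toolchain name and flags are assumptions; substitute your own), compiling it with -O2 -S typically yields assembly similar to the listing in the comment: under the ARM procedure call standard the two arguments arrive in registers r0 and r1, and the result is returned in r0. Treat the listing as indicative only, since the exact output depends on the compiler version, the options, and whether ARM or Thumb code is generated.

```c
/* add.c: compile with, e.g., arm-none-eabi-gcc -O2 -S add.c (assumed toolchain).
 * The generated add.s typically contains something like:
 *
 *   add:
 *       add     r0, r0, r1   @ a arrives in r0, b in r1; the sum is returned in r0
 *       bx      lr           @ return to the caller
 *
 * Exact output varies with compiler version, flags and ARM/Thumb mode.
 */
int add(int a, int b)
{
    return a + b;
}
```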

One example is GCC (the GNU Compiler Collection), produced by the GNU Project, which supports a wide variety of processor architectures and is in common use in the industry today. It is free software distributed under the GNU General Public License (GPL). To find out which version of GCC is installed on your server, type gcc --version in a Linux console.

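To compile a C file and produce an executable in one step, you can try the commands below; gcc runs the compiler, the assembler and the linker for you.

$> gcc "C_file" -o "name_executable"
$> gcc main.c -o HelloWorld

To stop after compilation and keep the assembly output, use gcc -S main.c (which writes main.s). To stop after assembly and keep the object file, use gcc -c main.c (which writes main.o).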

Assembler

An assembler converts assembly language into an object file, which consists of machine-language instructions, data, and the information needed to place the instructions properly in memory. It records every label used in branch and data-transfer instructions in a symbol table and uses that table to determine the address of each label. There are several object file formats; the most popular are COFF and ELF. Most object formats are organized as separate sections, each holding one kind of content:
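The label bookkeeping described above is often done in two passes. The sketch below is a simplified, hypothetical first pass over a toy syntax (a line ending in ':' defines a label; every other line is one instruction), assuming fixed 4-byte instructions as in the 32-bit ARM instruction set, plus the lookup a second pass would use to resolve branch targets. The names struct symbol, build_symbol_table and lookup_symbol are illustrative, not part of any real assembler.

```c
#include <stddef.h>
#include <string.h>

/* One symbol-table entry: a label and the address assigned to it. */
struct symbol {
    char name[32];
    unsigned long address;
};

/* Pass 1 over a hypothetical toy syntax: a line ending in ':' defines a
 * label; every other line is one 4-byte instruction (as in 32-bit ARM).
 * Returns the number of labels recorded, or -1 if the table is full or
 * a label name is too long. */
int build_symbol_table(const char *const lines[], size_t n_lines,
                       struct symbol table[], size_t max_symbols)
{
    unsigned long location = 0;   /* address of the next instruction */
    size_t count = 0;

    for (size_t i = 0; i < n_lines; i++) {
        size_t len = strlen(lines[i]);
        if (len > 0 && lines[i][len - 1] == ':') {
            if (count == max_symbols || len - 1 >= sizeof table[count].name)
                return -1;
            memcpy(table[count].name, lines[i], len - 1);
            table[count].name[len - 1] = '\0';
            table[count].address = location;   /* label = current address */
            count++;
        } else {
            location += 4;                     /* one instruction emitted */
        }
    }
    return (int)count;
}

/* Lookup used by pass 2 to resolve a label named in a branch or load.
 * Returns its address, or -1 if the label is not defined in this file
 * (an external reference that is left for the linker). */
long lookup_symbol(const struct symbol table[], size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return (long)table[i].address;
    return -1;
}
```

For the input start:, mov r0, #0, loop:, add r0, r0, #1, cmp r0, #10, bne loop, the labels start and loop are assigned addresses 0 and 4, so the second pass would encode bne loop using address 4. A label that is not found, such as a function defined in another file, is left unresolved for the linker.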

Section                       Contents
Header                        Descriptive and control information, including the size and position of the other sections
Code segment (.text)          Executable machine code
Data segment (.data)          Initialized static variables
Read-only data (.rodata)      Initialized static constants
BSS segment (.bss)            Uninitialized (zero-initialized) static data; only its size is stored in the file
External symbols              Definitions and references needed for linking (the symbol table)
Relocation information        Locations of instructions and data words that depend on absolute addresses and must be patched during linking
Dynamic linking information   Shared libraries needed by an executable
Debugging information         Lets a debugger associate machine instructions with C source lines
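To see which row of the table a C object ends up in, you can compile a small file such as the one below with gcc -c and inspect it. The section comments reflect what GCC typically does on an ELF target; placement can differ with other compilers, flags or linker scripts, so treat them as assumptions to verify with objdump -t sections.o or size sections.o. The file name sections.c and all identifiers are hypothetical.

```c
/* sections.c (hypothetical). Comments show where GCC typically places
 * each object on an ELF target; verify with `objdump -t sections.o`. */

int initialized_counter = 42;              /* .data: initialized static storage   */
int zeroed_counter;                        /* .bss (COMMON with -fcommon): no
                                              initializer, zero-filled at load    */
static int zeroed_buffer[256];             /* .bss: size recorded, no file space  */
const int lookup_table[4] = {1, 2, 4, 8};  /* .rodata: initialized constant data  */

/* .text: executable code */
int sum_table(void)
{
    int total = 0;
    for (int i = 0; i < 4; i++)
        total += lookup_table[i];
    /* zeroed_counter and zeroed_buffer[0] are still 0 unless written. */
    return total + initialized_counter + zeroed_counter + zeroed_buffer[0];
}
```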

Linker

Compiling and assembling every source file again after editing a few lines in a single file would be a waste of time. Standard libraries are also part of the code base, and they should not be recompiled and reassembled on every build. The solution is to compile and assemble each file separately and combine the resulting object files later, which is exactly what the linker does. It takes the individual object files and combines them in the following phases:
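The effect of separate compilation is easy to see with a small file that calls a library function. In the sketch below (names.c and name_length are hypothetical names), strlen is only declared through <string.h>; its machine code already sits in the precompiled C library, so only this file needs to be compiled.

```c
#include <string.h>   /* declares strlen; its code lives in the C library */

/* names.c (hypothetical): defines name_length but only *references*
 * strlen. The compiler and assembler leave strlen as an undefined symbol
 * in names.o; the linker resolves it against the prebuilt C library, so
 * the library itself is never recompiled. */
size_t name_length(const char *name)
{
    return strlen(name);
}
```

After gcc -c names.c, running nm names.o typically lists name_length with type T (defined in this object's text section) and strlen with type U (undefined), which the linker later resolves against the C library.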

  1. Place code and data modules symbolically in memory
  2. Determine the addresses of data and instruction labels
  3. Patch both internal and external references

In short, the linker uses the relocation information and the symbol table in each object file to resolve undefined labels, determines the memory location of each module and finally produces an executable file that can be run on a processor. The executable has the same format as an object file, except that it contains no unresolved references.
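The three linker phases can be sketched on a deliberately simplified, hypothetical object format: each module holds its code as 32-bit words, a list of the labels it defines, and relocation entries that name a word needing a symbol's absolute address. Real relocation records (for example ELF relocations for ARM) also cover PC-relative branches and other encodings, which this sketch ignores. All type and function names, and the load address used below, are illustrative.

```c
#include <stdint.h>
#include <string.h>

#define MAX_WORDS 16
#define MAX_ITEMS 4

/* Hypothetical, simplified object module: code as 32-bit words, the labels
 * it defines (byte offset from the module start) and relocation entries
 * naming the word that must receive a symbol's absolute address. */
struct definition { const char *name; uint32_t offset; };
struct relocation { size_t word_index; const char *symbol; };

struct object_module {
    uint32_t words[MAX_WORDS];
    size_t n_words;
    struct definition defs[MAX_ITEMS];
    size_t n_defs;
    struct relocation relocs[MAX_ITEMS];
    size_t n_relocs;
    uint32_t base;          /* assigned by the linker in phase 1 */
};

/* Phase 2: a label's final address is its module's base plus its offset. */
static int find_symbol(const struct object_module mods[], size_t n_mods,
                       const char *name, uint32_t *address)
{
    for (size_t m = 0; m < n_mods; m++)
        for (size_t d = 0; d < mods[m].n_defs; d++)
            if (strcmp(mods[m].defs[d].name, name) == 0) {
                *address = mods[m].base + mods[m].defs[d].offset;
                return 1;
            }
    return 0;
}

/* Links the modules in place, starting at load_address.
 * Returns 0 on success, -1 if any reference cannot be resolved. */
int link_modules(struct object_module mods[], size_t n_mods, uint32_t load_address)
{
    /* Phase 1: place the modules one after another in memory. */
    uint32_t next = load_address;
    for (size_t m = 0; m < n_mods; m++) {
        mods[m].base = next;
        next += (uint32_t)(mods[m].n_words * 4);
    }

    /* Phases 2 and 3: compute each referenced label's address on demand
     * and patch the word named by every relocation entry. */
    for (size_t m = 0; m < n_mods; m++) {
        for (size_t r = 0; r < mods[m].n_relocs; r++) {
            uint32_t address;
            if (mods[m].relocs[r].word_index >= mods[m].n_words)
                return -1;   /* malformed relocation entry */
            if (!find_symbol(mods, n_mods, mods[m].relocs[r].symbol, &address))
                return -1;   /* undefined symbol: no executable produced */
            mods[m].words[mods[m].relocs[r].word_index] = address;
        }
    }
    return 0;
}
```

For example, if a three-word module references helper, which is defined 4 bytes into a second module, linking at a hypothetical load address of 0x8000 places the second module at 0x800C and patches the reference to 0x8010. If helper is defined nowhere, link_modules reports failure instead of producing an executable.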

Loader

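If the system runs an operating system, the OS loader reads the executable file into memory and starts it in the following steps. In a bare-metal SoC test with no OS, the equivalent work is done by preloading the memory models, as described at the start of this page.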

  1. Read the executable file header to determine the size of the text and data segments
  2. Create an address space large enough for the text and data
  3. Copy the instructions and data from the executable file into memory
  4. Initialize the machine registers and point the stack pointer to the first free location
  5. Jump to a start-up routine that calls the main routine of the program
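In a bare-metal SoC testbench there is usually no operating system, so the testbench or the boot code performs these steps itself. The sketch below models them on a flat byte array standing in for a memory model. The executable header layout, the load address of 0 and all names are hypothetical simplifications; a real loader would read the ELF program headers instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical executable header (real formats such as ELF carry the same
 * information in program headers). Sizes are in bytes; the image bytes
 * that follow hold the text, then the initialized data. */
struct exe_header {
    uint32_t text_size;
    uint32_t data_size;
    uint32_t bss_size;
    uint32_t entry_offset;   /* start-up routine offset within the text */
};

/* Register state the loader hands to the processor. */
struct cpu_state {
    uint32_t pc;
    uint32_t sp;
};

/* Loads an image into a flat memory model starting at address 0.
 * Returns 0 on success, -1 if the image does not fit or is malformed. */
int load_executable(const struct exe_header *hdr, const uint8_t *image,
                    uint8_t *memory, size_t mem_size, struct cpu_state *cpu)
{
    /* Step 1: segment sizes come from the header. */
    size_t text = hdr->text_size, data = hdr->data_size, bss = hdr->bss_size;

    /* Step 2: check that the address space is large enough; this sketch
     * only requires the image to leave some memory free for the stack. */
    if (text + data + bss >= mem_size || hdr->entry_offset >= text)
        return -1;

    /* Step 3: copy instructions and data; BSS is not stored in the file,
     * so it is zero-filled instead. */
    memcpy(memory, image, text + data);
    memset(memory + text + data, 0, bss);

    /* Step 4: the stack grows down from the top of memory, kept 8-byte
     * aligned as the ARM procedure call standard requires. */
    cpu->sp = (uint32_t)(mem_size & ~(size_t)7);

    /* Step 5: start at the start-up routine, which then calls main. */
    cpu->pc = hdr->entry_offset;
    return 0;
}
```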