ikerhurtado.com
Iker Hurtado's pro blog
Developer | Entrepreneur | Investor
Software engineer (entrepreneur and investor at times). These days doing performant frontend and graphics on the web platform at Barcelona Supercomputing Center

Relearning C++: The process of writing, building and executing a program

5 Mar 2015   |   iker hurtado  
In this post I write down some notes from my study of the process of writing, building, loading into memory and executing a C++ program.

I start with this graphic, which explains the process well:

Now in text: here is a very good historical introduction to this process, taken from the 'Linkers' series by Ian Lance Taylor:

What does a linker do?

It’s simple: a linker converts object files into executables and shared libraries. Let’s look at what that means. For cases where a linker is used, the software development process consists of writing program code in some language: e.g., C or C++ or Fortran (but typically not Java, as Java normally works differently, using a loader rather than a linker). A compiler translates this program code, which is human readable text, into another form of human readable text known as assembly code. Assembly code is a readable form of the machine language which the computer can execute directly. An assembler is used to turn this assembly code into an object file. For completeness, I’ll note that some compilers include an assembler internally, and produce an object file directly. Either way, this is where things get interesting.

In the old days, when dinosaurs roamed the data centers, many programs were complete in themselves. In those days there was generally no compiler–people wrote directly in assembly code–and the assembler actually generated an executable file which the machine could execute directly. As languages like Fortran and Cobol started to appear, people began to think in terms of libraries of subroutines, which meant that there had to be some way to run the assembler at two different times, and combine the output into a single executable file. This required the assembler to generate a different type of output, which became known as an object file (I have no idea where this name came from). And a new program was required to combine different object files together into a single executable. This new program became known as the linker (the source of this name should be obvious).

Linkers still do the same job today. In the decades that followed, one new feature has been added: shared libraries.

Shared libraries were invented as an optimization for virtual memory systems running many processes simultaneously. People noticed that there is a set of basic functions which appear in almost every program. Before shared libraries, in a system which runs multiple processes simultaneously, that meant that almost every process had a copy of exactly the same code. This suggested that on a virtual memory system it would be possible to arrange that code so that a single copy could be shared by every process using it. The virtual memory system would be used to map the single copy into the address space of each process which needed it. This would require less physical memory to run multiple programs, and thus yield better performance.

I believe the first implementation of shared libraries was on SVR3, based on COFF. This implementation was simple, and basically assigned each shared library a fixed portion of the virtual address space. This did not require any significant changes to the linker. However, requiring each shared library to reserve an appropriate portion of the virtual address space was inconvenient.

SunOS4 introduced a more flexible version of shared libraries, which was later picked up by SVR4. This implementation postponed some of the operation of the linker to runtime. When the program started, it would automatically run a limited version of the linker which would link the program proper with the shared libraries. The version of the linker which runs when the program starts is known as the dynamic linker. When it is necessary to distinguish them, I will refer to the version of the linker which creates the program as the program linker. This type of shared libraries was a significant change to the traditional program linker: it now had to build linking information which could be used efficiently at runtime by the dynamic linker.

Header files

The most tedious part of preparing code for the build is that every entity must be declared before it is used. Header files take care of this: they collect the declarations in one place so that any source file can include them.

Some good practices to create header files:

  • Include header guards.
  • Header files should generally only be used for declarations.
  • Do not define variables in header files unless they are constants.
  • Each header file should have a specific task, and be as independent as possible.
  • Minimize the number of other header files that you include in a header file.
  • Do not use using declarations in header files.
  • Do not put relative paths in #include lines. Instead, tell the compiler or IDE where the header files live, which can generally be done by setting an 'include path' or 'search directory' in the project settings. That way, code files do not have to be edited if the path changes.
Good explanation about header files: 1.9 — Header files « Learn C++
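
As a minimal sketch of these guidelines, here is a hypothetical header geometry.h together with its source file geometry.cpp. The file names, the geometry namespace, kPi and circle_area are all invented for illustration, and the two files are shown in one listing so that it compiles as a single file:

```cpp
// ---- geometry.h (hypothetical header) ----
#ifndef GEOMETRY_H   // header guard: the body is skipped on a repeated #include
#define GEOMETRY_H

namespace geometry {

// A constant may be defined in a header.
constexpr double kPi = 3.14159265358979323846;

// Declaration only: the definition lives in geometry.cpp.
double circle_area(double radius);

}  // namespace geometry

#endif  // GEOMETRY_H

// ---- geometry.cpp (hypothetical source file) ----
// In a real project this file would start with: #include "geometry.h"
namespace geometry {

double circle_area(double radius) {
    return kPi * radius * radius;
}

}  // namespace geometry
```

Any other source file that includes geometry.h can call geometry::circle_area without seeing its definition; matching the call to the definition is left to the linker.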

Linker

The linker combines the program's compiled object files with each other and with the object code of libraries (for example, .a archives) to produce the executable.

A very detailed description on the linker: C++ and the linker | copton.net
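
To make this concrete, here is a small sketch with two hypothetical translation units, sum.cpp and add.cpp (the file names and both functions are invented for illustration). They are shown in one listing so that it compiles as a single file; the g++ commands in the comments assume a GCC toolchain:

```cpp
// ---- sum.cpp (hypothetical translation unit) ----
// Compile only, no linking:  g++ -c sum.cpp -o sum.o
// The compiler sees just this declaration, so sum.o contains an
// unresolved reference to add.
int add(int a, int b);

int sum_of_three(int a, int b, int c) {
    return add(add(a, b), c);
}

// ---- add.cpp (hypothetical translation unit) ----
// Compile only, no linking:  g++ -c add.cpp -o add.o
// add.o contains the definition that sum.o is missing.
int add(int a, int b) {
    return a + b;
}

// Link step (main.o would come from a file that defines main):
//   g++ main.o sum.o add.o -o app
// Here the linker matches the reference in sum.o with the definition in add.o.
```

If add.o were left out of the link command, the build would typically stop with an "undefined reference" error for add, which is the linker reporting that it could not find the definition.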

Static and dynamic libraries

A library is a package of code meant to be reused by many programs. A C++ library typically has two parts: a header file that declares its functionality and a precompiled binary that contains the implementation of that functionality. Some libraries are split into multiple files and/or have multiple header files.

Libraries are precompiled for several reasons. First, since libraries rarely change, they do not need to be recompiled often. Second, because precompiled objects are in machine language, distributing only the binary keeps people from reading or changing the source code.

Static libraries

A static library is a set of routines, external functions and variables that are compiled ahead of time and then linked directly into the target application, producing a stand-alone executable. After linking, all the functionality of the static library becomes part of your executable. On Linux, static libraries typically have an .a (archive) extension.

One advantage of static libraries is that the resulting executable can run without external dependencies. On the downside, because a copy of the library becomes part of every executable that uses it, a lot of space can be wasted. Static libraries also cannot be upgraded easily: to update the library, the entire executable needs to be replaced.
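
A rough sketch of how a static library could be built and used on Linux. The names mathutil, square and sum_of_squares are hypothetical, and the commands in the comments assume GCC and the standard ar archiver; both files are shown in one listing so that it compiles as a single file:

```cpp
// ---- mathutil.cpp (hypothetical library source) ----
// Build the archive:
//   g++ -c mathutil.cpp -o mathutil.o
//   ar rcs libmathutil.a mathutil.o
int square(int x) {
    return x * x;
}

// ---- app.cpp (hypothetical program that uses the library) ----
// Normally the declaration below would come from the library's header.
int square(int x);

int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}

// Link against the archive (main.o would come from a file that defines main):
//   g++ -c app.cpp -o app.o
//   g++ main.o app.o libmathutil.a -o app
// The code of square is copied into app, so libmathutil.a is not
// needed when app runs.
```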

Dynamic libraries

A dynamic library (or shared library) is a set of routines and variables that are loaded into an application at run time. When a program that uses a dynamic library is compiled, the library does not become part of the executable. On Linux, dynamic libraries typically have a .so (shared object) extension.

One advantage of dynamic libraries is that many programs can share one copy. Another advantage is that a dynamic library can be upgraded to a newer version without replacing all of the executables that use it.

Because dynamic libraries are not linked into your program, programs using them must explicitly load the library and look up the functions they want to call. This mechanism can be confusing, and it makes interfacing with a dynamic library awkward.

An import library is a library that automates the process of loading and using a dynamic library. On Linux, the shared object (.so) file serves as both the dynamic library and the import library.
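
To illustrate what explicit loading looks like, here is a minimal sketch that uses the POSIX dlopen/dlsym interface to call cos from the C math library at run time. The function name cosine_via_dlopen is invented for this example, and the library file name "libm.so.6" is an assumption that holds on typical glibc-based Linux systems but not everywhere:

```cpp
#include <dlfcn.h>  // POSIX: dlopen, dlsym, dlclose

// Loads the C math library at run time and calls its cos function.
// Returns false if the library or the symbol cannot be found.
// Assumption: the shared object is named "libm.so.6".
bool cosine_via_dlopen(double x, double& result) {
    void* handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == nullptr) {
        return false;  // library not found on this system
    }

    // dlsym returns a void*; it has to be cast to the right function type.
    using CosFn = double (*)(double);
    auto cos_fn = reinterpret_cast<CosFn>(dlsym(handle, "cos"));

    const bool ok = (cos_fn != nullptr);
    if (ok) {
        result = cos_fn(x);
    }
    dlclose(handle);
    return ok;
}
```

On some systems the program has to be linked with -ldl for these calls to resolve. Compare this with simply including a header and calling std::cos: the manual lookup and the cast are the awkward part that an import library hides.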

Using libraries

Some guidelines on using libraries:

The compiler needs to know where to look for the library's header file(s). On Linux, library headers are typically installed to /usr/include, which should already be part of the include file search path. If the files are installed elsewhere, we have to tell the compiler where to find them.

The linker needs to know where to look for the library file(s) for the library. This typically involves adding a directory to the list of places the linker looks for libraries. On Linux, libraries are typically installed to /usr/lib, which should already be a part of your library search path.

If using dynamic libraries, the program needs to know where to find them at run time. On Linux, libraries are typically installed to /usr/lib, which is in the default search path.

Good explanation about this topic: A.1 — Static and dynamic libraries « Learn C++
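
As a sketch of how these guidelines translate into build commands, assuming GCC on Linux: the function hypotenuse, the library name mylib and the /opt/mylib paths are hypothetical and exist only for illustration.

```cpp
// ---- app.cpp (hypothetical program) ----
#include <cmath>  // standard header, found on the default include path

double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}

// Everything above lives in default locations, so no extra flags are needed
// (a file that defines main is assumed to be part of the build):
//   g++ main.cpp app.cpp -o app
//
// For a library installed elsewhere, e.g. a hypothetical "mylib" with
//   /opt/mylib/include/mylib.h   (header)
//   /opt/mylib/lib/libmylib.so   (shared library)
// the search paths are passed explicitly:
//   g++ main.cpp app.cpp -I/opt/mylib/include -L/opt/mylib/lib -lmylib -o app
//     -I adds a directory to the header search path
//     -L adds a directory to the linker's library search path
//     -lmylib asks the linker for libmylib.so (or libmylib.a)
//
// At run time the dynamic linker must also find the .so, for example:
//   LD_LIBRARY_PATH=/opt/mylib/lib ./app
```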
